CheckMK Raw 2.0.0p9 (CRE) - Activating Changes Causes Hosts To Show As Down

I’m a bit stuck here. Almost every time we activate changes in CMK Raw 2.0.0p9, a number of hosts show as down with a “Null” summary. The number of affected hosts ranges from 1 to over 100.

I’ve not found any performance or configuration issues on the monitoring sites/hosts.
We have 6 total sites with around 1400 total hosts.

The agents on all hosts have been updated to match the server version.

Any idea where I should start looking? I would appreciate any help.

There are some points you can check.
Do the problem hosts belong to the same site every time, or are they distributed over your complete infrastructure?
Next, check what is used as the host status. Is it a normal ping, or do you use the state of the Check_MK service or something else?
As I see no metric icon on some of the hosts, I think the host check command is the problem.

Hi @andreas-doehler, what is best practice or recommended?

I would say there is no real best practice. The problem I saw at some 2.0 installations was that a Check_MK service goes critical with output “null” if it is checked exactly at activation time.
At the next check interval it is OK again. If you use the state of the Check_MK service as the host state, your host also has a chance to go critical when you activate changes. And the more hosts you have, the higher the chance that some of them end up critical.
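To put a rough number on “more hosts equal higher chance”: if each host independently had some small probability p of being checked during the activation, the chance that at least one of N hosts is hit is 1 - (1 - p)^N. A minimal sketch, assuming independence and a hypothetical per-host rate (p = 0.01 is made up for illustration, not measured on any Checkmk site):

```python
def prob_at_least_one_critical(p: float, n_hosts: int) -> float:
    """Chance that at least one of n_hosts shows a critical Check_MK service
    during one activation, assuming each host is hit independently with
    probability p (a simplifying assumption, not Checkmk behaviour)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_hosts < 0:
        raise ValueError("n_hosts must be non-negative")
    return 1.0 - (1.0 - p) ** n_hosts


if __name__ == "__main__":
    # p = 0.01 is a hypothetical per-host rate, chosen only for illustration.
    for n in (10, 100, 1400):
        print(n, round(prob_at_least_one_critical(0.01, n), 3))
```

Even a small per-host rate makes at least one affected host almost certain at around 1400 hosts, which fits the “1 - 100+” spread reported above.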

I only use service states as host states if I cannot ping the host.


Hi,
I have one main server and six other Check MK proxies for different regions.
I have just confirmed that the (null) issue also extends to the other monitoring proxies.
When I apply a change that affects devices monitored by other proxy nodes, I also get the null summary for devices on that remote proxy.
If I apply changes affecting both my main server and devices on other proxies, I see this problem for devices on the main server and on the affected proxies.
However, I use ping to monitor the host status, so I’m not getting new host down alerts, just the (null) summary on the Check MK service.
I use Check MK 2.0.0p9 CRE.
Regards.

That’s the same as on the system where I first saw it.
It was also p9 CRE.

Good morning, we migrated from 1.6p19 to 2.0.0p4 and we also have this problem (all hosts use ping).
We currently run a master and 4 slaves, all on CRE 2.0.0p8, and the problem continues. It is quite annoying :confused:

What problem exactly? If you use ping, you should not have the host down problem, only the Check_MK service shown with output “null”, right?

Hello, the problem is exactly the same as @AnthonyWingerter’s: when applying changes from the MASTER, some hosts on the SLAVE (not all, it is random) remain in null until the check runs again. All hosts use ping.
This never happened in 1.6; it has been happening since we migrated the infrastructure to 2.0. It is not serious, since it does not trigger notifications and the next check is OK, but it is quite annoying.


Hi, I have the same problem with 2.0.0p9 (it never happened with our 1.6 deployment). The only difference is that it’s not a “host down” alarm but a “service critical” alarm.
I ran many tests watching htop output, and I can confirm that the null outputs appear when activation overlaps with host and service checks.
In our environment, “periodic/bulk service discovery” also triggers these errors.

Note that at the beginning we had only one site (500 hosts / 6000 services), and we decided to split into a multisite environment because of the “null issue”. In that deployment, however, it was triggered by “periodic service discovery”, not by change activation.

Now, in the distributed multisite deployment (4 sites with 8/8/8/16 cores), the issue is triggered mainly by change activation rather than by “periodic service discovery”.
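If you want to cross-check the “activation overlaps with a running check” observation against your own data, one approach is to record when the activation ran and when each check ran, then flag the checks whose windows intersect. A minimal sketch: `Window` and `checks_hit_by_activation` are hypothetical helpers, not part of Checkmk, and the timestamps below are made up; in practice you would have to collect them yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A half-open time interval [start, end) in seconds (hypothetical helper)."""
    start: float
    end: float

    def overlaps(self, other: "Window") -> bool:
        # Two half-open intervals intersect iff each starts before the other ends.
        return self.start < other.end and other.start < self.end


def checks_hit_by_activation(activation: Window, checks: dict[str, Window]) -> list[str]:
    """Return the names of checks whose execution window overlaps the activation.

    Hypothetical helper: Checkmk does not provide this; the windows must come
    from timestamps you gather yourself.
    """
    return sorted(name for name, w in checks.items() if w.overlaps(activation))


if __name__ == "__main__":
    activation = Window(100.0, 104.0)  # made-up activation window
    checks = {
        "host-a": Window(95.0, 99.5),    # finished before the activation started
        "host-b": Window(102.0, 106.0),  # running while the activation ran
        "host-c": Window(104.0, 108.0),  # starts exactly when the activation ends
    }
    print(checks_hit_by_activation(activation, checks))
```

Half-open intervals are used so that a check starting exactly when the activation finishes is not counted as overlapping.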

Sorry, thanks @geppo!! For me too it is the service that goes CRIT.

Thanks all. This issue did not occur for us in v1.6 either.
It also triggers host down notifications for us.

This seems to be a bug in the current version.
Is there a way to open a bug report with the CMK team for this issue?

Thanks and regards,
-Anthony-


Report a bug…


by mail: feedback@checkmk.com

Thanks! I’ve submitted a bug report via email today.

Great! If you get any feedback, please let all of us know :slight_smile:!