(CMKV2.0) Host's check_mk service monitoring returns (null)

Hello,

We upgraded one of our RAW check_mk installations today to version 2.0.
Since then, the summary of the Check_MK service has been showing (null) on random hosts at random times.
This happens on all sorts of hosts as well (SNMP, Windows agent, Linux agent).

Has anyone else encountered this? Is there a solution for this issue? A known bug, maybe?

Thanks.

Best Regards,
Adam.

Hello,
any hints in the log files?
Are plugins or self-written changes active?

Ralf

Hi,

Thanks for responding. Oddly enough, the issue seems to have resolved itself.
It hasn't happened in the past 13+ hours, whereas it used to happen every 30 minutes.

The only other two issues I've noticed since the upgrade: my Livestatus usage on the Analyze Configuration page is at 100% all the time, and I'm using the maximum number of parallel Livestatus connections (20). Is there any way to increase this? The documentation points to a page in the global settings that does not exist.

Also, several ESXi hosts seem to be reporting (Service Check Timed Out) on the Check_MK service,
but I have yet to determine whether this is a false positive or not.

Thanks a bunch.

For this I would recommend running "cmk -D hostname" on the command line, to check that nothing is wrong with the configuration for this host. Old host tags or something like that.

The config setting is "Maximum concurrent Livestatus connections". On my system it shows up if I search for "livestatus" inside the Setup.

Hello,

Thanks for responding.
Regarding the ESXi host: I don't see anything wrong with the configuration. I'll try removing and re-adding it and see if that helps.
Alternatively, is there any way to make this alert not appear? As long as it's not a hard down and pings aren't being lost, I don't really want to see an event for it.

Regarding "Maximum concurrent Livestatus connections": this setting does not exist for me in the RAW edition of 2.0.

I found a setting for "Maximum parallel site activations", but it does nothing.

For the RAW edition you can edit the Nagios Livestatus broker module configuration.
Inside "~/etc/mk-livestatus/nagios.cfg" you will find "num_client_threads"; this is the same option the Enterprise edition exposes in the web settings.
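As a rough illustration (the broker line below is a made-up sample, and the edit is shown on a throwaway copy rather than a live site), the change boils down to bumping the `num_client_threads` value on the `broker_module` line and then restarting the core:

```shell
# Demo on a temporary copy; on a real RAW site you would edit
# ~/etc/mk-livestatus/nagios.cfg as the site user instead.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
broker_module=/omd/sites/mysite/lib/mk-livestatus/livestatus.o num_client_threads=20 /omd/sites/mysite/tmp/run/live
EOF

# Raise the thread count from 20 to 40 (pick a value that fits your load):
sed -i 's/num_client_threads=[0-9]*/num_client_threads=40/' "$cfg"
grep -o 'num_client_threads=[0-9]*' "$cfg"
```

On a live site, follow up with `omd restart nagios` so the core picks up the new value.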

Thanks! That worked. I tried increasing it to 80, but once I restarted check_mk it went straight up to 80 connections and 100% usage.
Is there any log I can check to see what's causing the problem? I assume I shouldn't increase it any further. :P

80 concurrent connections is very high. How big is the system?
You can look at "web.log" and also at the Nagios core log.
But I think it could be a problem with some web extension inside your installation.

Not that big: 200 hosts, 6000 services.
I just increased it from 20 to 80 to see if it solved anything. It didn't. :P
I looked at web.log and didn't see anything noteworthy, nor in the nagios.log file.
I also tried removing all of my MKP extensions.

Is there another log file I'm missing, or a method for debugging Livestatus queries in the RAW edition?
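One option worth sketching here (not from the thread; it assumes the standard OMD layout where the Livestatus socket lives at `~/tmp/run/live`): in the RAW edition you can send queries to the socket directly with the bundled `unixcat` tool and inspect the raw responses. The protocol is plain text, with a query ending in a blank line:

```shell
# Build a minimal Livestatus status query (plain-text protocol,
# terminated by an empty line):
query='GET status
Columns: connections connections_rate
OutputFormat: csv
'
printf '%s\n' "$query"
# On a live RAW site you would send it to the socket, e.g.:
#   printf '%s\n' "$query" | unixcat ~/tmp/run/live
```

Watching `connections` and `connections_rate` this way can show whether something is hammering the socket after an "apply changes".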

Hi @adamk,

since upgrading to version 2.0 I'm experiencing the same issue.
Since your last post was a while ago, have you found a solution?

Kind regards

Edit:
I'm using the RAW edition too, currently with the latest version, 2.0.0p6.

What we found most of the time when such a problem exists: somewhere there was an open browser with the old interface loaded that was not completely reloaded after the upgrade. :)

Hey thanks for the advice.
Unfortunately there was no change after reloading/restarting the browsers.

Furthermore, I tried setting a rule for "Maximum number of check attempts for service", with no effect.
Do you have any more advice?

Kind regards

This is also happening to me in a fresh 2.0.0p7 RAW edition install monitoring just three hosts: the cmk host itself, running Rocky Linux 8.4, and two environment sensors via SNMP.

Hi everyone,
I'm also having this problem, with clients randomly reporting a summary of (null).
I have tried increasing the number of client threads from 20 to 60 (1100+ clients), but it didn't seem to make a big difference. With version 1.6.0p24 it was set to 20 and I never had this (null) issue.
I also noticed (it could be just coincidence) that every time I apply a change, new alerts with summary (null) appear on the dashboard.
I recently migrated from 1.6.0p24 RAW to Checkmk RAW 2.0.0p8.
Any idea what it could be or where to look for messages/errors?
Many thanks!

See also: "Check_mk service monitoring returns (null), to many hosts?" - Troubleshooting - Checkmk Community

I had the same issue and fixed it, I hope this also works for you guys.


Hi,
Thank you for replying, although this doesn't seem to be my issue.
All my SNMP connections are working fine with the right credentials.
This problem mostly happens with servers that have the agent installed, and it seems to be triggered when I click "apply changes".
It looks like the ongoing connections to the agents are cut when applying new changes, and instead of being discarded they are reported as (null) because they were interrupted. This is just a guess, though; I couldn't get to the bottom of it.
Thanks for the info anyway. :)
Regards.

Hi, I have the same issue:
when applying any configuration change, random hosts report the Check_mk service with (null) in the summary. I have increased num_client_threads, but the problem continues.

OMD - Open Monitoring Distribution Version 2.0.0p8.cre
Description: Ubuntu 20.04.3 LTS

regards

Today we had the same problem discussed here.

I think the problem is, as also mentioned before, that the Nagios core doesn't wait for the running checks to finish, or actively kills them.

I reported this issue as soon as the first stable version of 2.0 was released. I never received any kind of answer to this report. In the current check_mk RAW version (2.0.0p15) the issue still exists. It seems that the check_mk developers are not interested in releasing a stable free version that can be used to monitor an infrastructure with more than 30-40 hosts. In other words, if you have to monitor an infrastructure with more than 30-40 hosts (and you don't want to use an alternative), you have two options:

  1. Buy paid version
  2. Use old stable version 1.6

I also have some RAW installations with over 100 hosts running without this problem, so this is not a generic problem with the RAW edition.