Smart Ping and "time since last check of the host"

Hi all

We are facing minor issues with very few hosts (~10 out of 7000+) that seem to randomly be shown as DOWN by smart ping in checkmk 1.6.0p17.

We are using checkmk’s default settings for check intervals (6s/6s), but changed the smart ping settings to “Expect one packet every 300 seconds”. On top of that, we changed the service check intervals to 5 minutes/3 minutes.
Now, for almost every host this seems to work perfectly well. However, about ten hosts on one customer’s site seem to randomly be shown as being DOWN. As so often, a continuous manual ping from checkmk to the same hosts shows neither any issues nor minor hiccups in response times. Also, when manually forcing a host check, the host turns UP again immediately. Afterwards, smart ping works for some time and then starts failing again. Sometimes it takes hours for it to fail again, sometimes just minutes.

What I have gathered so far is that on “non-problem” hosts, the time since the last check of the host is always lower than six seconds, whether the hosts are actually down or not. For the ten “problem hosts”, this is not true: the value keeps increasing, sometimes stating that the last check was minutes ago. Also, the host check duration always stays the same.
When encountering the issue, it looks like this:
[screenshot]
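
In case someone wants to compare with their own site: below is a rough sketch of how the hosts with a growing "time since last check" could be listed via Livestatus. It is only an illustration under a few assumptions, not something taken from our setup: the helper names `build_query`, `query_livestatus` and `stale_hosts` are made up, the six-second threshold simply mirrors the host check interval mentioned above, and the socket path has to be adjusted to the site in question.

```python
import json
import socket
import time

# Assumption: anything older than the 6s host check interval counts as "stale".
STALE_AFTER = 6


def build_query():
    """Livestatus query for the hosts table, returning JSON rows."""
    return (
        "GET hosts\n"
        "Columns: name state last_check execution_time\n"
        "OutputFormat: json\n"
        "\n"
    )


def query_livestatus(socket_path):
    """Send the query to the Livestatus UNIX socket and return the parsed rows."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(build_query().encode("utf-8"))
        # Signal end of request so Livestatus answers and closes the connection.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def stale_hosts(rows, now=None, max_age=STALE_AFTER):
    """Return (name, state, age_in_seconds) for hosts checked longer ago than max_age."""
    if now is None:
        now = time.time()
    result = []
    for name, state, last_check, _execution_time in rows:
        age = int(now - last_check)
        if age > max_age:
            result.append((name, state, age))
    # Oldest check first, so the worst offenders are at the top.
    return sorted(result, key=lambda entry: entry[2], reverse=True)
```

Run as the site user, it could be called as `stale_hosts(query_livestatus("/omd/sites/mysite/tmp/run/live"))`, where `mysite` is a placeholder for the actual site name. Calling it repeatedly over a few minutes should show whether the same ten hosts keep appearing with an increasing age.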

I am unsure whether this is an issue with check scheduling on this specific site, or with smart ping itself. Resource-wise and “checkmk helper”-wise the monitoring host seems to be okay. Also, the other ~80 hosts do not show the same behaviour.

Have any of you faced such an issue before?

Thank you and kind regards
Thierry
