Check hosts with PING parameters

We have monitoring set up to email us based on the “Check hosts with Ping (ICMP Echo Request)” service in CheckMK. Currently, if a host drops a single ping, we get an email saying it is critical, immediately followed by an email saying it is back in OK status. I have tried increasing the “Number of packets” and “Total timeout of check” parameters of this service, with no luck. Can anyone help me out with this?

Thanks in advance!

Hi @jperry, welcome to the CMK Forum.

Why don’t you use the “Maximum number of check attempts for service” rule in WATO for the service you’re talking about?
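To illustrate why that rule helps, here is a rough sketch of how a Nagios-style core decides when to notify. It is a simplified model, not Checkmk code: the function name `notifications` and the sample results are made up for illustration, and it ignores details such as changes between WARN and CRIT.

```python
def notifications(results, max_check_attempts):
    """Return (index, kind) tuples for notifications in a simplified
    Nagios-style soft/hard state model (illustrative assumption only)."""
    sent = []
    attempt = 0           # consecutive non-OK results seen so far
    hard_problem = False  # True once a problem was confirmed and notified
    for i, state in enumerate(results):
        if state == "OK":
            # A recovery is only reported if the problem was confirmed before.
            if hard_problem:
                sent.append((i, "RECOVERY"))
            attempt = 0
            hard_problem = False
        else:
            attempt += 1
            # The problem is confirmed after max_check_attempts failures in a row.
            if attempt >= max_check_attempts and not hard_problem:
                hard_problem = True
                sent.append((i, "PROBLEM"))
    return sent


# Hypothetical check history: one isolated drop, then a real outage.
results = ["OK", "CRIT", "OK", "OK", "CRIT", "CRIT", "CRIT", "OK"]
one_attempt = notifications(results, 1)
three_attempts = notifications(results, 3)
```

With one attempt, the isolated drop at index 1 produces a PROBLEM and a RECOVERY right away. With three attempts, only the outage that lasts three checks in a row is reported.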

Cheers,

I can try changing that and see if it works. We also get a ton of notifications for the Check_MK service going from OK to CRIT and then straight back from CRIT to OK. The plugin output says “Service Check Timed Out”. I have the “Check_MK Agent Timeout” set to 120 seconds, though, so I’m thinking I need to change another parameter.

I believe that you’re facing some network issues, given the packet loss and the long timeouts for agent access.
Is that agent just a normal agent, or does it run plugins that might be causing that timeout?

Increasing the “Maximum number of check attempts for service” rule may not be the most correct solution (in my opinion), but it is still an option.

Try to investigate any network issues you might have.

Cheers,

If I run a continuous ping to some of these servers from outside of Check_MK, I rarely see a dropped ping, and all response times are right around 1 ms, sometimes 2 ms. We are just using the normal agent with no extra plugins, and when the Check_MK notification status goes from CRIT to OK it usually says “[snmp] Success” with an execution time of under 5 seconds.

How often are you querying the agent (SNMP)? Try increasing the “Normal check interval for host checks” and “Retry check interval for host checks” rules (and the equivalent rules for services).
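As a rough guide to how these settings interact, the sketch below estimates how long a problem has to persist before it is confirmed. It assumes Nagios-style scheduling, where the first failed check counts as attempt 1 and each further attempt runs one retry interval later. The function name is hypothetical.

```python
def minutes_until_confirmed(max_check_attempts, retry_interval_min):
    """Minutes between the first failed check and the confirmed (hard) state,
    assuming Nagios-style retry scheduling (illustrative assumption only)."""
    # Attempt 1 is the first failure itself, so only the remaining
    # attempts each add one retry interval of delay.
    return (max_check_attempts - 1) * retry_interval_min


immediate = minutes_until_confirmed(1, 1)  # 1 attempt: notify on first failure
delayed = minutes_until_confirmed(3, 1)    # 3 attempts, 1-minute retries
```

So with three attempts and a one-minute retry interval, a blip shorter than about two minutes would not trigger a notification under this model.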

Which CMK version are you using? RAW / CME?

Cheers,

I think all the host checks are at 1-minute intervals. We are using Raw 1.6.0p5.

Hi @jperry and welcome to the checkmk community.

What kind of host are you checking? Linux, Windows, something special?

Some devices that are queried frequently via SNMP do not always answer, because they have built-in features that block IPs sending frequent requests. The option @ricardoftribeiro mentions, lowering the check rate on such devices, could help prevent this behavior.