I am monitoring some hosts that are far from the Check_MK instance, so the execution time for SNMP checks is sometimes higher than 60 seconds. Check_MK triggers events for the Check_MK service going OK -> Critical and Critical -> OK. In the notification settings I have excluded the Check_MK service, but it’s still triggering e-mails… This is the output of the check plugin:
CRIT - [snmp] keepalive timed outCRIT , Got no information from host, execution time 63.4 sec.
What can I do in this case, as I don’t want to be notified about these types of events?
Just to add that I have changed the max timeout in the check_mk_configuration file to 200 sec.
Also, for the folder where these hosts sit I have specified a timeout value of 180 sec, but the execution time is still being compared against 60 seconds…
The decision why a notification is sent can be tracked in the file ~/var/log/notify.log.
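If the log is long, a small filter can help narrow it down. This is only a rough sketch: it assumes nothing about the log format beyond plain text lines, and the function name `matching_lines` is made up for illustration.

```python
from pathlib import Path


def matching_lines(log_path, needles):
    """Return the lines of a text log that contain every string in `needles`.

    Plain case-sensitive substring matching; no assumption is made about
    the layout of the log lines themselves.
    """
    path = Path(log_path).expanduser()  # resolves a leading "~" to the home directory
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [
            line.rstrip("\n")
            for line in handle
            if all(needle in line for needle in needles)
        ]
```

Run as the site user, something like `matching_lines("~/var/log/notify.log", ["myhost", "Check_MK"])` would list the entries that mention both strings, where `myhost` is a placeholder for one of the affected hosts.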
I don’t see any trace of these notifications in that file.
What you are trying to do is just take a pain reliever instead of fixing the issue. If your SNMP check runs into a timeout, your monitoring is not working properly.
There could be several root causes. If it’s a remote host, it could be due to a slow network. This also depends heavily on the amount of data to be sent.
It could also be due to a slow SNMP agent, and we have also had issues with SNMP agents that don’t answer if requests come in too quickly.
1. Set up a rule “Timing settings for SNMP access” and increase “Response timeout” and “Number of retries” (we currently use 8 sec and 5 retries).
2. Set up a rule “Hosts not using Inline-SNMP”. With this rule Net-SNMP is used, which consumes slightly more resources but is much more robust.
3. Set up a rule “Normal check interval for service checks” and a rule “Retry check interval for service checks” and set both to at least 5 min.
4. Set up a rule “Service check timeout” and set it to something below the value used in step 3.
5. Set up a rule “Configuration of RRD databases of services” and set the step precision to the same value as used in step 3. You may need to migrate the RRD files of your hosts to the new layout with cmk.
Assign these rules to one host for testing, and if all works well, assign them to the others.
You can verify some of these rules in the monitoring view: on the Check_MK service, open the burger menu and click “Parameters for this service”.
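To make the relationships between these values easier to see, here is a rough Python sketch that checks a set of timings against the rules above: both intervals at least 5 min, “Service check timeout” below the check interval, and the RRD step equal to the check interval. The class and function names are made up for illustration, and the 240 sec service check timeout is just an example value. The worst-case figure assumes that each SNMP request is tried once plus the configured number of retries, which is how Net-SNMP counts retries; treat that as an assumption for your setup.

```python
from dataclasses import dataclass


@dataclass
class SnmpTimings:
    """Hypothetical container for the values set in the rules above (seconds)."""
    response_timeout_s: float       # "Response timeout"
    retries: int                    # "Number of retries"
    check_interval_s: float         # "Normal check interval for service checks"
    retry_interval_s: float         # "Retry check interval for service checks"
    service_check_timeout_s: float  # "Service check timeout"
    rrd_step_s: float               # step in "Configuration of RRD databases of services"


def worst_case_request_wait(t: SnmpTimings) -> float:
    """Longest wait for one unanswered request.

    Assumption: one initial attempt plus `retries` further attempts,
    each waiting the full response timeout.
    """
    return t.response_timeout_s * (1 + t.retries)


def find_problems(t: SnmpTimings) -> list:
    """Return a list of human-readable violations of the rules above."""
    problems = []
    if t.check_interval_s < 300:
        problems.append("normal check interval is below 5 min")
    if t.retry_interval_s < 300:
        problems.append("retry check interval is below 5 min")
    if t.service_check_timeout_s >= t.check_interval_s:
        problems.append("service check timeout is not below the check interval")
    if t.rrd_step_s != t.check_interval_s:
        problems.append("RRD step differs from the check interval")
    return problems


# 8 sec and 5 retries come from the post above; 240 sec is an example value.
timings = SnmpTimings(
    response_timeout_s=8,
    retries=5,
    check_interval_s=300,
    retry_interval_s=300,
    service_check_timeout_s=240,
    rrd_step_s=300,
)
print("worst case per request (s):", worst_case_request_wait(timings))
print("problems:", find_problems(timings))
```

With a 60 sec check interval and a 200 sec timeout, as described in the question, `find_problems` would flag both intervals and the timeout.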