Cached mrpe checks get checked every time the agent runs and the Current check attempts goes up

Hi,
we have a check that uses the /etc/check_mk/mrpe.cfg and is set to an interval of 300.
Now when the check failes check_mk checks this every 60s and the counter for failed attempts goes up.

Is this by design, or am I doing something wrong?

It also looks like checks are marked as failed when the current check attempts are not done yet

Hi,
did you looked at https://checkmk.com/cms_agent_linux.html chapter 4.2. Looks like that you hava a long running process.

Cheers,
Christian

Yes, this is a long running process. The whole test suite takes about 1h so we moved things to single checks that just need to run every XXX seconds.

This is by design.

The agent calls the Nagios Plugin via MRPE every 5 minutes, but it has to deliver the output with every call of the agent (i.e. every minute) because otherwise this would create an UNKNOWN for the service check.

That means that the service check itself is run every minute.

You also see the CRIT as soon as the state changes. The maximum number of check attempts (15 in your example) only delay the notification created by the monitoring core. Until the 15 check attempts are reached the service check is in a SOFT state, after that it’s in a HARD state. This can be used as a filter in the views.

So how do I tell check_mk ‘14 fails are ok, but 15 is bad’?

Hi,

as r.sander wrote, you need to configure the rule “maximum number of check attempts” to 15.
If you use a plugin or a local script you will be able to start the code in x minutes. The only thing you need ist to create a directory e.g. check_mk_agent/local/900, out your code in there and the sript will run every 15 Minutes. One big effort is, that the output is also cached.

Regards,
Christian

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.