Changing check interval

Hi,

As far as I know, the following is only possible with an active check (Nagios plugin):

Set the normal check interval to 1 hour.
If the check fails, set the check interval to 5 minutes.

The problem is that I need this feature for a check that is currently implemented as a special agent. This special agent generates several services, but a Nagios plugin can only create one service.


Ambre

The check interval also works for special agents, as they are also normal active checks.

That's correct: you can do this for active checks only.
The Check_MK service is an active check which executes your special agent.

If you change the scheduling of the Check_MK service, all discovered passive checks ‘inherit’ the settings. Please also consider changing the RRD settings to match the new timing.

I hope that helps

regards

Mike

Hi,

The check interval can be changed. I also changed the retry interval along with the maximum number of check attempts for services, but the retry interval did not change after a service failed.

I made the following settings:
Created a virtual host called NP_Test.
Created a rule for “Individual program call” (special agent) for this host.
(This special agent creates the services.)

For the host NP_Test I also created the following rules:
Added rule “Normal check interval for service checks” for service Check_MK$ with a 5 minute interval.
Added rule “Retry check interval for service checks” for service Check_MK$ with a 1 minute interval.
Added rule “Maximum number of check attempts for service” for service Check_MK$ and for specific services generated by the special agent.

The normal check interval changed to 5 minutes, but the retry interval did not change to 1 minute.


Ambre

Hi,

My conclusion is:
The “Retry check interval for service checks” can be changed for Nagios checks only.

All my attempts to define a separate retry check interval failed, except for Nagios checks.


Ambre

I would rephrase this to ‘for active checks only’.

Which check is this about? Can you please provide more details?
If you don't see the following in the service details, it is a passive check:

image

The following is taken over from the executing check:

image

In most cases the passive check takes these settings over from the Check_MK service.

regards

Mike

Hi,

Yes, the values are shown correctly, but with a 5m / 1m setting the retry is done every 5m and not every 1m. In other words, the retry setting is ignored except for Nagios checks.

I have a Nagios check with a 5m normal check interval and a 1m retry check interval, and it works exactly as expected. The check is executed every 5 minutes. On the first CRIT (soft CRIT) the check interval changes to 1m. After three failed checks the service turns red (hard CRIT). The maximum number of check attempts is set to 3.
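To make that timeline concrete, here is a minimal sketch of how an active check gets rescheduled. It is a simplified model for illustration only, not Checkmk or Nagios code; the function name `simulate_active_check` and the tuple layout are made up for this example.

```python
def simulate_active_check(results, normal=5, retry=1, max_attempts=3):
    """Return (minute, result, state_type) for consecutive executions of an active check.

    results: outcomes of consecutive executions, e.g. ["OK", "CRIT", ...].
    normal / retry: normal and retry check interval in minutes.
    """
    timeline = []
    minute = 0
    attempt = 0  # number of consecutive non-OK results so far
    for result in results:
        if result == "OK":
            attempt = 0
            state_type = "HARD"
            delay = normal
        else:
            attempt = min(attempt + 1, max_attempts)
            # The state stays SOFT until max_attempts is reached.
            state_type = "HARD" if attempt >= max_attempts else "SOFT"
            # While the state is SOFT, the scheduler uses the retry interval.
            delay = retry if state_type == "SOFT" else normal
        timeline.append((minute, result, state_type))
        minute += delay
    return timeline


for entry in simulate_active_check(["OK", "CRIT", "CRIT", "CRIT", "CRIT"]):
    print(entry)
```

With 5m / 1m and three attempts, the failing check in this model runs at minutes 5, 6 and 7 and becomes a hard CRIT two minutes after the first failure.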

When I configure this check as a special agent, the GUI shows 5m / 1m, but the retry interval is not applied when the check fails. Only the normal check interval takes effect.


Ambre

The passive checks can't trigger the Check_MK service in case of failures, unless this is done manually.

Hi Paulo,

I’m talking about active checks. See above.

Nagios checks → Working as expected.

Special agent → The retry interval does not change, although it is shown in WATO.


Br
Ambre

As far as I'm aware, when using special agents only the service named Check_MK is an active check. The discovered services are passive checks: they wait for the Check_MK service to deliver their results, so their results are only updated when the Check_MK service runs.

Hi all,

Can I consider this a bug, or how can it be explained?

For special agents the changed retry interval is shown in WATO, for example as 5m / 1m. But in reality it is not applied, because the retry interval remains the same as the normal check interval (5m). The same happens when I change the retry interval for the Check_MK service: the retry interval of the services does not change.

So either I’m completely missing something, or WATO is misleading in this case.

I can’t imagine that I’m the only person who is facing this issue.


Ambre

Hi @ambre,

Practical Workarounds

Here are the best ways to work around this right now:

  1. Increase “Maximum number of check attempts” in your rule. This is the closest you can get to real retry behavior for special agents.
  2. Run the special agent more frequently via cron (most reliable workaround). Example (as root or site user):

     # Run every 5 minutes instead of the configured interval.
     # The cmk options below are as posted; confirm them with `cmk --help` on your version.
     */5 * * * * su - <sitename> -c "cmk --special-agents --force <your_agent_name>"

  3. Use the REST API or command line for more control. You can also trigger the agent manually when needed.

Greetz Bernd

Whether this is a bug or not is hard to decide :wink:, as it has always behaved this way.

I could reproduce your problem in my test system. In my own systems I normally don't have this problem, as I try to stay with the 1 minute check interval and then also use the 1 minute retry.
If I need a longer check interval, the retry interval gets the same value.

The following screenshot shows your problem: the single check that is UNKN is a passive check of the Check_MK service shown on top.

You can see that the second check attempt is made 5 minutes after the first one, even with a retry interval of 1 minute.
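The observed timing can be reproduced with a small model. This is only a sketch under one assumption: the Check_MK service itself stays OK (the special agent runs fine), so it is always rescheduled with its normal interval. The function name `simulate_passive_service` is made up for illustration.

```python
def simulate_passive_service(results, checkmk_interval=5, retry=1, max_attempts=3):
    """Return (minute, result, state_type) for a passive service fed by the Check_MK service.

    Assumption: the Check_MK service stays OK, so it runs every
    checkmk_interval minutes no matter what the passive service reports.
    """
    timeline = []
    minute = 0
    attempt = 0  # number of consecutive non-OK results so far
    for result in results:
        if result == "OK":
            attempt = 0
            state_type = "HARD"
        else:
            attempt = min(attempt + 1, max_attempts)
            state_type = "HARD" if attempt >= max_attempts else "SOFT"
        timeline.append((minute, result, state_type))
        # `retry` is deliberately unused: a passive service is never scheduled
        # on its own, it only gets a new result when the Check_MK service runs.
        minute += checkmk_interval
    return timeline


for entry in simulate_passive_service(["OK", "CRIT", "CRIT", "CRIT"]):
    print(entry)
```

In this model the failing service gets its attempts at minutes 5, 10 and 15, so with three attempts it reaches the hard state 10 minutes after the first failure, regardless of the configured 1 minute retry interval.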

The only viable solution for this problem would be an alert handler for such a host that triggers the Check_MK service in case of a problem with one of the passive services.

Inside the alert handler you can then decide how long it waits before retrying the Check_MK service.
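A rough sketch of what such an alert handler could look like is shown here. It sends the external command `SCHEDULE_FORCED_SVC_CHECK` for the Check_MK service through the Livestatus socket. Several things are assumptions and should be checked on your site: that the handler receives the host name and service state in the environment variables `ALERT_HOSTNAME` and `ALERT_SERVICESTATE`, that the socket lives at `$OMD_ROOT/tmp/run/live`, and the 60 second delay. The function names are made up for illustration.

```python
#!/usr/bin/env python3
import os
import socket
import time


def build_reschedule_command(host, service="Check_MK", delay_seconds=60, now=None):
    """Build a Livestatus COMMAND line that forces a check of `service` on `host`."""
    now = int(time.time()) if now is None else int(now)
    check_time = now + delay_seconds  # this is the self-made "retry interval"
    return f"COMMAND [{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{check_time}\n"


def send_livestatus_command(command, socket_path):
    """Write one command line to the Livestatus UNIX socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode("utf-8"))


def main():
    # Assumed variable names; verify what your alert handler actually receives.
    host = os.environ.get("ALERT_HOSTNAME")
    state = os.environ.get("ALERT_SERVICESTATE", "OK")
    if not host or state == "OK":
        return  # nothing to retry
    # Assumed socket location inside an OMD site.
    socket_path = os.path.join(os.environ.get("OMD_ROOT", ""), "tmp/run/live")
    send_livestatus_command(build_reschedule_command(host), socket_path)


if __name__ == "__main__":
    main()
```

The delay passed to `build_reschedule_command` plays the role of the retry interval: the handler decides how soon the Check_MK service runs again, and the passive services get their next result at that time.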

Hi,

thanks to all of you for helping me to understand this issue.


ambre
