OMD performance showing CRIT alert

manojr2k · November 3, 2023, 1:26pm

CMK version: 2.2.0
OS version: Rocky Linux 9

Error message: I am getting this CRITICAL message on the Checkmk machine.
What needs to be done here?

Heavy · November 3, 2023, 1:55pm

The message means that the Checkmk fetcher processes are 95% busy. Once utilization reaches 100%, Checkmk agent checks have to wait for a free fetcher process, and your check latency increases.

You should consider increasing the number of fetcher processes. But be aware that fetcher processes consume memory, so keep an eye on the memory utilization. I typically run 50 fetcher processes on a Checkmk server VM with 16GB RAM.
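To put rough numbers on this trade-off, the sketch below estimates fetcher utilization and a fetcher count under a deliberately idealized model: every host is fetched once per check interval, each fetch keeps one fetcher busy for the average fetch duration, and nothing else competes for the fetchers. The function names and the 80% target are hypothetical illustrations, not Checkmk settings, and real utilization will be somewhat higher because of overhead.

```shell
#!/usr/bin/env bash
# Rough fetcher sizing under an idealized model (hypothetical helper names):
# each host is fetched once per check interval, and one fetch keeps one
# fetcher busy for fetch_ms. All times are in milliseconds.

# Print the estimated fetcher utilization in percent (rounded up).
fetcher_utilization_pct() {
  local hosts="$1" fetch_ms="$2" interval_ms="$3" fetchers="$4"
  # busy time per interval divided by available fetcher time per interval
  local busy=$(( hosts * fetch_ms * 100 ))
  local avail=$(( interval_ms * fetchers ))
  echo $(( (busy + avail - 1) / avail ))   # integer ceiling division
}

# Print the smallest fetcher count that keeps utilization <= target_pct.
fetchers_needed() {
  local hosts="$1" fetch_ms="$2" interval_ms="$3" target_pct="$4"
  local busy=$(( hosts * fetch_ms * 100 ))
  local per_fetcher=$(( interval_ms * target_pct ))
  echo $(( (busy + per_fetcher - 1) / per_fetcher ))
}
```

For example, `fetcher_utilization_pct 200 3000 60000 13` prints 77 (200 hosts at 3 s each on a 1-minute interval with 13 fetchers), and `fetchers_needed 200 3000 60000 80` prints 13. The host count and fetch time are made-up inputs; the memory caveat above still applies, since every additional fetcher costs RAM.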

manojr2k · November 3, 2023, 2:36pm

Hi Heavy,

thanks for the reply.
I will monitor the performance. Can you please tell me where I can find detailed information on the Core Monitoring settings, so that I can make the necessary changes for our requirements?

Heavy · November 3, 2023, 2:54pm

The documentation on CMC fetchers and checkers should be a good starting point.

manojr2k · November 3, 2023, 3:02pm

Thanks Heavy.
Can this information also help identify the stale services issue that shows up in the dashboard?

Heavy · November 3, 2023, 3:45pm

Stale services can have different reasons.

Among them is a (too) heavily loaded Checkmk server, so watch out for the metrics of the OMD performance check.

Other possible reasons include slow network connections and slow responses to Agent queries, maybe due to misbehaving 3rd party plugins or local checks.

manojr2k · November 3, 2023, 6:37pm

For us, the network connection cannot be the issue.
We are not using many third-party plugins.
We are using a local check plugin. How can I verify/measure the performance of our local check plugin (a shell script)?

manojr2k · November 3, 2023, 6:42pm

Maximum concurrent Checkmk fetchers: changed from 13 to 30
Maximum concurrent active checks: changed from 5 to 10

I have these stats at the moment, and I made the above changes in the global settings.
I am not sure what needs to be done about the Apache WARN.
Can you suggest anything?
Do I need to restart the site so that the performance counts change?

Heavy · November 3, 2023, 10:34pm

When you click on the blue information icon on the left, a text appears that tells you what to do.
The Apache check typically warns you that the default number of processes might be too high.

manojr2k · November 4, 2023, 5:55am

Hi Heavy,
thanks for the reply.

We are using a local check plugin. How can I verify/measure the performance of our local check plugin (a shell script), which is /usr/lib/check_mk_agent/local/localchq.sh?

I mean, how can I cross-check whether a plugin is causing the stale services that appear at some intervals?

Heavy · November 4, 2023, 10:27am

You can measure the execution time/load of a script with the time command, e.g.

# time /usr/lib/check_mk_agent/local/localchq.sh
[local check output]

real   0m0,014s
user   0m0,010s
sys    0m0,003s
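When an agent has several local checks, timing each one separately shows which script is slow. The sketch below assumes bash and the default local check directory used above; the function name is hypothetical. Run it as the same user as the agent (normally root), since a check may behave differently otherwise. Scripts in numeric subdirectories (cached checks) and non-executable files are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time each executable local check in a directory
# separately, so a slow script stands out. Prints "<script> <seconds>".
time_local_checks() {
  local dir="${1:-/usr/lib/check_mk_agent/local}"
  local f elapsed
  for f in "$dir"/*; do
    # Skip subdirectories (such as cached checks in local/600/)
    # and files that are not executable.
    [ -f "$f" ] && [ -x "$f" ] || continue
    # The bash `time` keyword reports on the shell's stderr: capture that,
    # discard the check's own output, and force '.' as decimal point.
    elapsed=$( { LC_ALL=C TIMEFORMAT='%R'; time "$f" >/dev/null 2>&1; } 2>&1 )
    printf '%s %s\n' "$(basename "$f")" "$elapsed"
  done
}
```

After sourcing the file, `time_local_checks | LC_ALL=C sort -k2 -rn` lists the slowest check first. Only the timings are printed; each script's own output is discarded.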

If you suspect that the local check takes too much time and is the source of the stale services, consider moving it into a subdirectory named after a cache interval in seconds, so that the agent executes it asynchronously and caches its output. See
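A minimal sketch of that change, assuming the standard Linux agent layout, where a numeric subdirectory of local/ is treated as a cache interval in seconds and the scripts in it are run in the background with their cached output reported in between. The helper name and the 600-second interval are illustrative choices, not values from this thread; pick an interval longer than your check interval, and keep in mind that the results then refresh less often.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a local check into a numeric subdirectory of
# the agent's local/ directory. The Linux agent treats the directory name
# as a cache interval in seconds and runs the script asynchronously.
make_local_check_async() {
  local script="$1" interval="$2"
  local localdir="${3:-/usr/lib/check_mk_agent/local}"
  # The subdirectory name must be a plain number of seconds.
  case "$interval" in
    ''|*[!0-9]*) echo "interval must be a number of seconds" >&2; return 1 ;;
  esac
  if [ ! -f "$localdir/$script" ]; then
    echo "no such local check: $localdir/$script" >&2
    return 1
  fi
  mkdir -p "$localdir/$interval" && mv "$localdir/$script" "$localdir/$interval/"
}
```

Usage (as root on the monitored host): `make_local_check_async localchq.sh 600`, after which the script lives in /usr/lib/check_mk_agent/local/600/.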

manojr2k · November 4, 2023, 10:51am

OK, thanks Heavy for the reply.

manojr2k · November 4, 2023, 11:37am

After some interval, I can see these messages in Monitor -> History -> Service check duration.

Do they have any relation to the stale services?

manojr2k · November 4, 2023, 5:58pm

Hi Heavy, I was just checking the performance of the local script.

time /usr/lib/check_mk_agent/local/localchq.sh
When I ran the script with the time command, it took the following time:

real 0m9.815s
user 0m1.946s
sys 0m5.231s

Am I supposed to improve this?

martin.hirschvogel · November 5, 2023, 6:29am

Either improve your scripts if possible, or add more fetchers.
Instead of 1s per host, your fetchers are busy for up to 10-11s just waiting. Thus, one fetcher can only handle around 5 hosts instead of 30-40…
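The arithmetic behind this, assuming Checkmk's default 1-minute check interval and ignoring scheduling overhead: a fetcher that is busy for about 11 s per host fits roughly 60 / 11 ≈ 5 hosts into one interval. In this idealized model the 30-40 figure corresponds to about 1.5-2 s per host; the exact assumptions behind it are not stated in the thread. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: idealized capacity of one fetcher, i.e. how many
# hosts it can fetch within one check interval if each fetch keeps it busy
# for fetch_ms milliseconds. Real numbers will be somewhat lower.
hosts_per_fetcher() {
  local interval_ms="$1" fetch_ms="$2"
  if [ "$fetch_ms" -le 0 ]; then
    echo "fetch time must be positive" >&2
    return 1
  fi
  echo $(( interval_ms / fetch_ms ))   # integer division rounds down
}
```

`hosts_per_fetcher 60000 11000` prints 5, `hosts_per_fetcher 60000 2000` prints 30, and `hosts_per_fetcher 60000 1500` prints 40.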

manojr2k · November 5, 2023, 6:44am

Hi Martin,
thanks for the reply.
Also, do I need to increase Maximum concurrent active checks? Currently I have set it to 20.

Is there any alternative to the script? As far as I know, shell scripts are normally slow.

andreas-doehler · November 5, 2023, 10:20am

No, it depends on how your scripts are written.

From the earlier posts: your agent execution time is very high for normal server systems. @martin.hirschvogel already gave the technical explanation, and I would say a normal Linux / Windows server should not need more than 1 or 2 seconds for the agent query.
There are special cases where it can take longer, but for the majority of systems a value between 0.5 and 2 seconds should be the target.
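To compare a host against that target, you can time one complete agent run. A minimal sketch, assuming the Linux agent is installed at /usr/bin/check_mk_agent (adjust the path if yours differs) and GNU date (available on Rocky Linux 9) for millisecond timestamps; the function name and the 2000 ms default are illustrative. Run it as root on the monitored host, as the agent normally runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time one full agent run and compare it with a
# target in milliseconds (default 2000 ms, the upper end of the 0.5-2 s
# range). Needs GNU date for the %N (nanoseconds) format.
check_agent_runtime() {
  local agent="${1:-/usr/bin/check_mk_agent}" limit_ms="${2:-2000}"
  local start end elapsed_ms
  start=$(date +%s%N)
  "$agent" >/dev/null 2>&1      # discard the agent output itself
  end=$(date +%s%N)
  elapsed_ms=$(( (end - start) / 1000000 ))
  if [ "$elapsed_ms" -le "$limit_ms" ]; then
    echo "OK ${elapsed_ms} ms"
  else
    echo "SLOW ${elapsed_ms} ms (target <= ${limit_ms} ms)"
    return 1
  fi
}
```

After sourcing the file, `check_agent_runtime` prints `OK <n> ms` or `SLOW <n> ms (target <= 2000 ms)` and returns non-zero when the run exceeds the limit.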

manojr2k · November 5, 2023, 10:31am

OK, thanks Andreas for the reply.