CPU saturation, OOM, and stale hosts

Easlay · February 23, 2026, 1:59pm

Good morning everyone,
I installed Checkmk Raw 2.4.0p20 on an Ubuntu 24.04 LTS server with 4 vCPUs and 8 GB of RAM.
I have about 320 active hosts with ~7,000 services. My problem is that the services/hosts keep crashing.
Looking at the processes on the server, I see many Python 3 scripts being executed, and the plugin installed on the host isn’t being used. Could you help me?

andreas-doehler · February 23, 2026, 2:20pm

As you use the RAW edition there is one important thing you should do on your system.

Configuration of parallel checks inside the Nagios core
- ~/etc/nagios/nagios.d/timing.cfg: max_concurrent_checks=0 should be changed to a value no higher than twice the number of CPU cores available
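
As a rough illustration of the change described above, here is a minimal Python sketch that rewrites the `max_concurrent_checks` line in the text of a Nagios timing file. The function name `set_max_concurrent_checks` and the `factor` parameter are made up for this example. The factor of 2 is only the upper bound mentioned in this post, not an official formula.

```python
import re


def set_max_concurrent_checks(cfg_text: str, cpu_cores: int, factor: int = 2) -> str:
    """Return cfg_text with max_concurrent_checks set to cpu_cores * factor.

    Hypothetical helper: factor=2 mirrors the "not more than twice the
    number of CPU cores" rule of thumb from this thread.
    """
    limit = max(1, cpu_cores * factor)
    line = f"max_concurrent_checks={limit}"
    # Match an existing setting on its own line, whatever its current value.
    pattern = re.compile(r"^[ \t]*max_concurrent_checks[ \t]*=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        return pattern.sub(line, cfg_text, count=1)
    # Setting not present yet: append it on a new line.
    sep = "" if (not cfg_text or cfg_text.endswith("\n")) else "\n"
    return f"{cfg_text}{sep}{line}\n"


example = "# timing settings\nmax_concurrent_checks=0\n"
print(set_max_concurrent_checks(example, cpu_cores=4))
```

For the 4 vCPU server in this thread, the sketch sets the value to 8. The core has to be restarted before a changed value takes effect.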

I would say for RAW edition 4 cores is way too low.

Easlay · February 23, 2026, 2:24pm

By doing this, don’t I risk losing checks or having them go stale?
What value would be ideal?

andreas-doehler · February 23, 2026, 3:21pm

If you get stale services, you know that you need more CPU cores.

As I said, no more than twice the number of CPU cores.

r.sander · February 23, 2026, 3:29pm

Or not higher than the number of hyper threading threads?

andreas-doehler · February 23, 2026, 3:29pm

You can also say it this way. But it only applies to RAW edition^^

r.sander · February 23, 2026, 3:43pm

In the Ansible facts there are processor_cores and processor_nproc. For VMs both are the same (because processor_threads_per_core is 1).

It is easier to just use processor_nproc to set max_concurrent_checks.
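
Outside of Ansible, a comparable number can be read directly in Python. The sketch below assumes that the number of CPUs usable by the current process is a reasonable stand-in for processor_nproc. The helper name `usable_cpus` is made up for this example.

```python
import os


def usable_cpus() -> int:
    """Number of CPUs the current process may run on (at least 1)."""
    try:
        # Linux: respects CPU affinity, similar to what `nproc` reports.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the logical CPU count.
        return os.cpu_count() or 1


print(f"max_concurrent_checks={usable_cpus()}")
```

On a VM with one thread per core this prints the vCPU count, which matches the more conservative suggestion in this thread.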

Easlay · February 23, 2026, 4:22pm

Thanks a lot for the tips.

Do you have any more tips for getting the best performance?

mbunkus · February 23, 2026, 4:51pm

A while ago I did some testing of how efficient hyper-threading is by compiling a rather demanding C++ application in parallel on an AMD Ryzen 9 5950X, which has 16 real CPU cores, 32 with HT. Due to how C++ works, each compilation unit (.cpp file) took quite a number of seconds to compile, meaning the overhead from starting processes etc. was dwarfed by the raw computational need. I won’t go into details and methodology too much, but the result was pretty clear:

Going from 8 parallel processes to 16 roughly halved compilation time — as expected, as I’m using real cores here. Going from 16 to 32, though, only resulted in a 20% gain, showing how little HT can effectively achieve in this kind of scenario. I usually only consider real cores when sizing new virtualization hosts, too, for the same reason.

paulosantanabr · February 23, 2026, 6:09pm

Which interval are you using for service checks? If you are using the default, you can try increasing it from 1 to 5 minutes, which might be enough. Be aware that SNMP devices might require more time and processing power than a standard Checkmk agent.
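
To get a feel for what the interval change means in numbers, here is a back-of-the-envelope sketch in Python. It assumes that each of the 320 hosts triggers one check run per interval, and that a check run takes 2 seconds on average. The 2 seconds is a purely hypothetical figure, so measure your own. The function names are made up for this example.

```python
import math


def check_starts_per_second(hosts: int, interval_s: float) -> float:
    """Average number of check runs the core has to start per second."""
    return hosts / interval_s


def concurrency_needed(hosts: int, interval_s: float, avg_check_s: float) -> int:
    """Average number of checks in flight: start rate times average duration."""
    return math.ceil(check_starts_per_second(hosts, interval_s) * avg_check_s)


HOSTS = 320          # from the original post
AVG_CHECK_S = 2.0    # hypothetical average duration of one check run

for interval_s in (60, 300):
    rate = check_starts_per_second(HOSTS, interval_s)
    slots = concurrency_needed(HOSTS, interval_s, AVG_CHECK_S)
    print(f"interval {interval_s:>3}s: {rate:.2f} starts/s, ~{slots} checks in flight")
```

Under these assumptions, a 1-minute interval needs about 11 checks running at once, which is more than the 8 allowed on a 4-core machine under the twice-the-cores limit. A 5-minute interval needs about 3.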

Easlay · February 24, 2026, 8:18am

Can you tell me the name of the setting for the check timing?

paulosantanabr · February 24, 2026, 9:26am

Look for “Normal check interval for service checks”