CPU usage after upgrade to 2.0.0p4

Lukasz · May 17, 2021, 7:08am

After upgrading CEE from 1.6.0p24 to 2.0.0p4, my CPU usage is much higher.

Fetcher helper usage is also always at 90-100%.

The server runs CentOS 7.9.2009 with 4 CPUs and 8 GB RAM, monitoring 400 hosts / 23000 services.
The version of the agents on the servers is 1.6.0p9-18.

In the output of the top command I see:
/omd/sites/chechmk/bin/cmk --checker
/omd/sites/chechmk/bin/fetcher (multiple times)

Is this normal?
What does the fetcher helper do, and how can I help it?
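
To put a number on what top shows, the process count and CPU share per helper type can be totalled from ps output. This is only a minimal sketch, not an official Checkmk tool: the function names `read_ps` and `helper_cpu` are made up for illustration, and matching on `bin/fetcher` and `--checker` simply assumes the command lines look like the two shown above. Note that ps reports `pcpu` averaged over a process's lifetime, so the values will differ from the instantaneous figures in top.

```python
import subprocess


def read_ps() -> str:
    """Return '%CPU command-line' for every process, without header lines."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pcpu=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def helper_cpu(ps_text: str) -> dict:
    """Count processes and sum %CPU per helper type (fetcher / checker)."""
    totals = {
        "fetcher": {"count": 0, "cpu": 0.0},
        "checker": {"count": 0, "cpu": 0.0},
    }
    for line in ps_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        cpu_text, args = parts
        # Assumption: helpers are recognisable by these command-line fragments.
        if "bin/fetcher" in args:
            kind = "fetcher"
        elif "--checker" in args:
            kind = "checker"
        else:
            continue
        try:
            cpu = float(cpu_text)
        except ValueError:
            continue  # first column was not a number
        totals[kind]["count"] += 1
        totals[kind]["cpu"] += cpu
    return totals
```

On the monitoring server, `helper_cpu(read_ps())` would return the totals for the running helpers.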

andreas-doehler · May 17, 2021, 8:29am

With this version you have two different helper processes: fetchers and checkers.
The fetchers only collect the data from the agents/devices and hand it to the checkers, which then evaluate it.
Can you check how many of these two helpers are configured on your system?
The settings are → Maximum concurrent Checkmk fetchers/checkers
You can also switch back to the old model by turning “Use separate fetchers and checkers” off.
This lets you test whether it makes a difference in the load.

If your fetchers are at nearly 100% usage, you need more of them.

Lukasz · May 17, 2021, 9:08am

These are my settings; I have never changed them:
[image]

When I switch back to the old model (“Use separate fetchers and checkers” turned off), CPU usage drops by 10%, but stale services rise to 20%.

My statistics are:
[image]

andreas-doehler · May 17, 2021, 9:36am

Latency looks OK. I would decrease the check helpers a little and increase the fetcher helpers.
You can also look at the graphs of “OMD nagios performance” to see how high your helper usage was with 1.6.

Lukasz · May 17, 2021, 9:44am

Thank you for your help with helpers, I will try it.

Have you also noticed that CPU usage grows this much when going from 1.6 to 2.0?

osksto · June 28, 2021, 11:33am

We are facing the same problem with 2.0.0p3 Enterprise.

CPU-wise we are OK. Is the solution to increase the number of concurrent fetcher helpers?

What is the solution here, is there a consensus? Lukasz, have you solved your case?

Thank you,
Oskar

andreas-doehler · June 28, 2021, 11:47am

Going from the combined helpers of 1.6 to separate helpers for checking and for fetching data can cause problems. You need to test and find the sweet spot between both.
If you only have agents with a short runtime, you need fewer fetchers than for the same number of hosts all checked via SNMP.
I think there is no general solution or universally good advice.
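
The runtime dependence can still be made concrete with a back-of-the-envelope estimate. The sketch below is not how Checkmk sizes anything internally; it is a simple queueing-style rule of thumb. The function name `fetchers_needed`, the 60-second check interval, the 80% target usage, and the 1 s / 10 s fetch runtimes are all assumptions chosen for illustration; only the 400 hosts come from this thread.

```python
import math


def fetchers_needed(hosts: int, fetch_seconds: float,
                    interval_seconds: float = 60.0,
                    target_usage: float = 0.8) -> int:
    """Estimate the fetcher count that keeps helper usage near target_usage.

    Assumes every host is fetched once per interval and that each fetch
    keeps one fetcher busy for fetch_seconds on average.
    """
    # Average number of fetchers that are busy at any moment.
    busy_fetchers = hosts * fetch_seconds / interval_seconds
    # Leave headroom so the helpers are not pinned at 100%.
    return math.ceil(busy_fetchers / target_usage)


# Hypothetical runtimes: fast agents (1 s) versus slow SNMP devices (10 s).
print(fetchers_needed(400, 1.0))   # 9
print(fetchers_needed(400, 10.0))  # 84
```

With the same number of hosts, a tenfold longer fetch runtime needs roughly ten times the fetchers, which is why measured runtimes from your own site matter more than any fixed recommendation.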

Lukasz · June 28, 2021, 12:01pm

Yes, I increased “Maximum concurrent Checkmk fetchers” in the global settings from 13 to 20, and it works.

Denis1 · November 10, 2021, 1:48pm

Same issue for me after updating to Checkmk CEE 2.0.0p12, with 1.3K SNMP hosts and 24K services. I had to gradually raise the number of fetcher helpers from 13 to 200!
Fetcher helper usage has now dropped from 99% to 54%.
Meanwhile, the monitoring core's service check rate has risen from 33 service checks/sec to 340/sec.
Not a big surprise: the memory of my Checkmk VM is now critical.
But I know the sysadmin guys well.
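
Since every fetcher is a separate process, the numbers above also hint at how far the count could be trimmed again to save memory. This is a rough sketch only: the function name `fetchers_for_target` and the 80% target are assumptions, and it presumes that the total busy time of the fetchers stays the same when their number changes. A reading taken while usage is near 100% is saturated, so it would only give a lower bound.

```python
import math


def fetchers_for_target(current_fetchers: int, current_usage_percent: int,
                        target_usage_percent: int) -> int:
    """Rescale the fetcher count so that usage lands near the target.

    Assumes the workload (total busy fetcher time) is unchanged.
    """
    for value in (current_usage_percent, target_usage_percent):
        if not 0 < value <= 100:
            raise ValueError("usage must be in the range 1..100 percent")
    # current_fetchers * usage / 100 is the average number of busy fetchers;
    # dividing by the target fraction gives the count needed for that target.
    return math.ceil(
        current_fetchers * current_usage_percent / target_usage_percent
    )


# 200 fetchers at 54% usage, aiming for a hypothetical 80%:
print(fetchers_for_target(200, 54, 80))  # 135
```

Any such reduction should be made gradually while watching stale services and check latency.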
