We have had a very strange problem since the upgrade to version 2.1 (Enterprise Edition): the first update was from 2.0.0p27 to 2.1.0p7, and today we installed version 2.1.0p9. The servers run CentOS 7.9 with the latest patches.
The CPU usage of the “cmc” process increases at random intervals (after 4, 6, or 9 hours) to a very high value: it starts to use a whole processor (the server has 2 virtual processors).
There are no errors in the log files. “strace” has not shown any differences compared with our other systems.
There are two ways to lower the CPU usage of the cmc process:
- omd restart site
- cmk -R
Has anyone encountered a similar problem? Could it be a problem with the OS, e.g. systemd?
Nobody with a similar problem?
The system has been moved to Rocky Linux 8.6 and the high CPU usage of cmc is still present. There are no error messages in the cmc log.
OMD - Open Monitoring Distribution Version 2.1.0p9.cee. The figure below shows the sudden increase in the CPU usage of the cmc process.
The “cmk -R” command helps:
Does anybody have any advice on what I should check? Could there be a problem with the configuration?
Those are very few resources for such a monitoring system. How many hosts do you monitor?
I have a small system with only 20 hosts and 2 cores, and it has no such problems. The only difference is the base OS: mine runs on a current Ubuntu/Debian.
It is also strange that after a restart your cmc process needs no resources.
There are ca. 400 hosts and 9,500 services. The strangest thing is that the system had no problems before the upgrade, when it ran version 2.0.0p16. The “cmc” process usually does not consume much CPU: a similar system with ca. 800 hosts (ca. 30,000 services) uses under 5% (but that server has 4 cores).
OK we will try to add 2 cores.
We added 2 CPU cores; however, after some time (ca. 1.5 h) the cmc process again uses ca. 100% of one core. Really strange. It is also interesting that restarting the cmc process with “cmk restart cmc” is not enough. Only the “cmk -R” or “omd restart site” commands are able to lower the CPU usage of the cmc process.
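Since the load stays pinned to a single core, one possible next step is to see which thread of the process accumulates the CPU time. The following is only a rough sketch, assuming Linux with the /proc filesystem; the helper names (cpu_ticks, sample_threads, busiest_threads) are made up for illustration and are not part of Checkmk:

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/.../stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so everything up to the last ')' is cut off before splitting.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3); utime is field 14, stime is field 15.
    return int(fields[11]) + int(fields[12])


def sample_threads(pid):
    """Return {thread id: CPU ticks used so far} for all threads of pid."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = cpu_ticks(f.read())
        except OSError:
            continue  # the thread exited while we were reading
    return ticks


def busiest_threads(before, after, top=3):
    """Return the (tid, tick delta) pairs with the largest increase."""
    deltas = {tid: after[tid] - before[tid] for tid in after if tid in before}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]


def report(pid, interval_s=5.0):
    """Sample twice and print the threads that used the most CPU in between."""
    before = sample_threads(pid)
    time.sleep(interval_s)
    after = sample_threads(pid)
    for tid, delta in busiest_threads(before, after):
        print(f"thread {tid}: {delta} ticks in {interval_s:.0f} s")
```

Calling report() with the PID of the cmc process while the load is high would list the thread IDs to look at more closely, for example with strace attached to that single thread.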
Does anybody have a similar issue?
Now you need to inspect the core log - it is possible that you need to change the log level of the cmc core to get some useful information.
Well, I configured the “debug” level for all the cmc core sections. With a script I was able to detect the exact time when the CPU usage of the cmc process rose to over 90%. It happens really quickly: within one second it goes from almost 0% to over 90%. Unfortunately there were no unusual messages in the cmc.log file, and the number of messages did not rise either. Really strange: there is no zombie process, and the system and web interface are still very responsive. However, I still believe that the cmc process should not use a whole core.
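The script itself was not posted in the thread. A minimal sketch of how such a check could look is shown here, assuming Linux with the /proc filesystem; the function names (cpu_ticks, cpu_percent, watch) and the default threshold and interval are illustrative assumptions, not the script that was actually used:

```python
import os
import time
from datetime import datetime

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so everything up to the last ')' is cut off before splitting.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3); utime is field 14, stime is field 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck=CLK_TCK):
    """CPU usage between two samples, where 100.0 means one full core."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / interval_s


def watch(pid, threshold=90.0, interval_s=1.0):
    """Print a timestamp whenever the usage crosses the threshold."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        prev = cpu_ticks(f.read())
    above = False
    while True:
        time.sleep(interval_s)
        with open(path) as f:
            cur = cpu_ticks(f.read())
        pct = cpu_percent(prev, cur, interval_s)
        prev = cur
        if (pct > threshold) != above:
            # Usage moved from below to above the threshold, or back.
            above = pct > threshold
            state = "above" if above else "below"
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} pid {pid}: {pct:.1f}% ({state} {threshold}%)")
```

Running watch() with the PID of the cmc process prints one line each time the usage jumps above or falls back below the threshold, which gives a timestamp to compare against cmc.log.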
There are several ways your configuration can affect the load of the micro core.
It can be normal that the core runs at 100% utilization on one CPU core, but that seems unlikely here.
You might want to look for recurring downtimes; if you have a lot of those, they can impact the core.
Thank you Robin for the comment. However, there is no recurring downtime configured. I will check the config in more detail, but it is strange because before the upgrade to 2.1 there was no cmc high cpu usage.