CMC crashes several times a week

CMK version:
2.1.0p24
OS version:
Debian 11
Error message:
None

cmc stops occasionally (1-3 times a week) without an error message. The cmc log only shows:

2023-03-21 22:45:28 [5] [main] [rrdcached at "/omd/sites/xxx/tmp/run/rrdcached.sock"] stopping...
2023-03-21 22:45:28 [5] [rrdcached] [rrdcached at "/omd/sites/xxx/tmp/run/rrdcached.sock"] closing connection
2023-03-21 22:45:28 [5] [rrdcached] [rrdcached at "/omd/sites/xxx/tmp/run/rrdcached.sock"] stopped
2023-03-21 22:45:28 [5] [main] [RRD helper 2423926] exited normally
2023-03-21 22:45:28 [5] [main] [carbon connection pool] stopping...
2023-03-21 22:45:28 [5] [carbon] [carbon connection pool] stopped
2023-03-21 22:45:28 [5] [main] [influxdb connection pool] stopping...
2023-03-21 22:45:28 [5] [influxdb] [influxdb connection pool] stopped
2023-03-21 22:45:28 [5] [core 2423865] [main] stopping config cleaner...

The cmc log level was set to info at the time. There is nothing about cmc in syslog, and rrdcached.log is empty.

In alert.log I found:

2023-03-21 22:45:37,361 [20] [cmk.base.alert_handling] ----------------------------------------------------------------------
2023-03-21 22:45:37,361 [20] [cmk.base.alert_handling] Starting alert handler helper.
2023-03-21 22:45:37,361 [20] [cmk.base.alert_handling] Global handler timeout: 60 sec (TERM), 120 sec (KILL)
2023-03-21 22:45:37,363 [20] [cmk.base.events] We are back after a restart.
2023-03-21 22:45:37,364 [20] [cmk.base.events] CMC has closed the connection. Shutting down.
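To line up the two excerpts, the timestamps can be compared directly. The following is only a minimal sketch: the helper names `parse_cmc_ts` and `parse_alert_ts` are made up for illustration, and the two sample lines are copied from the excerpts above. It assumes the cmc log uses second resolution and alert.log appends milliseconds after a comma, as seen in these lines.

```python
from datetime import datetime


def parse_cmc_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp of a cmc log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def parse_alert_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of an alert.log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


# Last cmc log line and first "Starting" line from alert.log, as posted above.
cmc_line = "2023-03-21 22:45:28 [5] [core 2423865] [main] stopping config cleaner..."
alert_line = (
    "2023-03-21 22:45:37,361 [20] [cmk.base.alert_handling] "
    "Starting alert handler helper."
)

gap = (parse_alert_ts(alert_line) - parse_cmc_ts(cmc_line)).total_seconds()
print(f"{gap:.3f} s between last cmc log line and alert helper start")
```

For the lines shown here the gap is a little over nine seconds. The same parsing could be used to search other logs for entries in the seconds before 22:45:28.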

My system has one site, 310 hosts and 19000 services (mostly cmk-agent, several SNMP, some old Nagios-style plugins).

Has anyone seen something similar? How can I find the cause? I have now set the cmc log level to debug.

It stopped again without any useful information in the log, even with the cmc log level set to debug.

Hi @HaZet1968 - it seems logging is not very useful in this instance. At least not from the application perspective :wink:

Please review the memory and CPU consumption of the machine and the processes.
If possible, deploy a separate Checkmk monitoring site and do proper process monitoring on the evil site. I’m always deploying an allseeingeye site to monitor my own Checkmk sites. That way, I can verify memory consumption and double-check my production workings.
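If a second site is not available right away, a rough stopgap is to sample the memory of the core process from `/proc` yourself. This is only a sketch under assumptions: it assumes a Linux host and that the core shows up with the process name `cmc` in `/proc/<pid>/comm`, which you should verify on your system. The function names are made up for illustration.

```python
import os


def parse_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def find_pids(name):
    """Return all PIDs whose /proc/<pid>/comm equals the given name."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return pids


def sample(name="cmc"):
    """Map each matching PID to its resident memory in kB.

    The default name "cmc" is an assumption, not taken from the Checkmk docs.
    """
    results = {}
    for pid in find_pids(name):
        try:
            with open(f"/proc/{pid}/status") as f:
                results[pid] = parse_rss_kb(f.read())
        except OSError:
            continue
    return results
```

Calling `sample()` once a minute (from cron, for example) and appending the result to a file with a timestamp would show whether memory was climbing before a stop. It does not replace the process monitoring described above.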

As for the Checkmk software itself - yes - some kind of error happened, and the logs should be more helpful…
