I was able to solve the problem by simply deleting the faulty RRDs of the affected hosts under ~/var/pnp4nagios/perfdata (I deleted the complete directory of each host).
The RRDs get recreated, et voilà: Nagios doesn't crash any more (and WATO shows me graphs again).
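For anyone wanting to try the same workaround, here is a minimal sketch. It is demonstrated on a throwaway directory so it can be run safely anywhere; in a real site the path would be $OMD_ROOT/var/pnp4nagios/perfdata/&lt;hostname&gt;, the host name "myhost" is just an example, and you would restart the core (e.g. omd restart nagios) afterwards:

```shell
# Stand-in for var/pnp4nagios/perfdata (a real site would use $OMD_ROOT)
PERFDATA=$(mktemp -d)
mkdir -p "$PERFDATA/myhost"
touch "$PERFDATA/myhost/_HOST__rta.rrd"   # fake faulty RRD for the demo

# Delete the whole directory of the affected host; pnp4nagios recreates
# the RRDs the next time perfdata for that host arrives.
rm -rf "$PERFDATA/myhost"

ls "$PERFDATA"   # directory is now empty
```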
So, is the problem solved, or does the core still crash when you perform the same action, e.g. Analyze recent notifications → replay this notification, or anything else?
Thanks for providing this information. I don't see anything strange. Can you also check the log files below, especially around the time when the core crashes?
Here's the only error I could find in $OMD_ROOT/var/pnp4nagios/log/perfdata.log:
2025-07-07 06:21:33 [809913] [0] RRDs::update ERROR rrdcached@unix:/omd/sites/*****/tmp/run/rrdcached.sock: illegal attempt to update using time 1751862069.000000 when last update time is 1751862081.000000 (minimum one second step)
Every time the service crashes, it seems to be because of such an error (an attempt to update a graph with a timestamp earlier than the last recorded update).
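The rule rrdtool enforces here can be illustrated with a small sketch. The timestamps are taken from the error message above; the variable names and the comparison itself are mine, not rrdtool's code:

```shell
# rrdtool accepts an update only if its timestamp is strictly greater than
# the time of the last stored update ("minimum one second step").
last_update=1751862081   # last update time from the error above
new_time=1751862069      # update that arrived ~12 seconds in the past
if [ "$new_time" -le "$last_update" ]; then
  verdict=rejected       # rrdcached logs "illegal attempt to update using time ..."
else
  verdict=accepted
fi
echo "$verdict"
```

So any perfdata entry that reaches rrdcached out of order, even by one second, is rejected with exactly this error.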
I've managed to stop the crashes for now by deleting all RRD files in var/pnp4nagios/perfdata, but it really isn't the best experience so far.
I have the same issue. I can't find any reason for the constant Nagios crashes.
I did a clean install from scratch with version 2.4.0p5 and everything was fine. I was adding some hosts and suddenly it started to crash. I upgraded to the latest version, 2.4.0p7, and applied the new version's config, but the issue persists.
After setting the debug level to -1, I can't find any useful logs anywhere.
tail -f /opt/omd/sites/monitor/var/pnp4nagios/log/perfdata.log
2025-07-08 18:16:36 [6021] [0] RRDs::update /omd/sites/monitor/var/pnp4nagios/perfdata/nas03.licorbeirao.com/_HOST__rta.rrd 1751994972:0.301
2025-07-08 18:16:36 [6021] [0] RRDs::update ERROR rrdcached@unix:/omd/sites/monitor/tmp/run/rrdcached.sock: illegal attempt to update using time 1751994972.000000 when last update time is 1751994991.000000 (minimum one second step)
I have the same log entry as you, but this is not the reason for the crashing.
It's unbelievable that this issue has been around since 2.2, from what I understand from forums online.
In my case, the Nagios service started to crash when I added a "Host check command" set to "TCP Connect". Mind you, the check worked, but somehow it crashed Nagios afterwards. So I set it to "Always assume host to be up", which is not ideal, but better than the crashes.
I'm having this problem as well, since the weekend. I am running Checkmk Raw Edition 2.5.0-2025.07.16 via Docker. I think this setup was upgraded from version 2.4.something, but that was a couple of weeks ago. I don't know what would have triggered the behaviour change; possibly the Docker daemon was restarted. I only have one agent installed; the other VMs are all monitored via ESX only. The VM with the agent was restarted in this period.
The following don’t report anything obviously suspicious:
cmk -U -vvv
cmk --debug -vvR
The only suspicious line in the Docker logs is this:
monitoring syslogd: /dev/xconsole: No such file or directory
When I restart either Nagios or the whole Checkmk container, I see warning emails arrive from each of my dozen monitored VMs (due to missing services), and then the OK message arrives from each of them in turn. This takes about two minutes, and at that point the nagios service goes down.