Dear Community,
I have two CheckMK instances, and today the same Python-related issue occurred on both. The web interface showed this error:
[CheckMK Instance name] Livestatus Error Unhandled exception: 400: Site connection not initiated (Heartbeat timeout after 2.0 sec).
Inside the mkeventd.log file:
"
… StatusServer] Error handling client : [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "/omd/sites/[Instance name]/lib/python/cmk/ec/main.py", line 3031, in serve
    "")
  File "/omd/sites/[Instance name]/lib/python/cmk/ec/main.py", line 3051, in handle_client
    for query in Queries(self, client_socket, self._logger):
  File "/omd/sites/[Instance name]/lib/python/cmk/ec/main.py", line 2511, in next
    data = self._socket.recv(4096)
error: [Errno 104] Connection reset by peer
"
The two instances run on different servers. I checked the firewall, but it was not the cause. In WATO > Distributed Monitoring everything looked fine and no errors appeared there. I also checked the logs for site backups and for activating changes, but no actions were performed there before the issue occurred.
liveproxyd.log only has entries from when I restarted the instance, and they contain no specific information:
2021-12-20 08:35:22,842 [20] [cmk.liveproxyd.(3445011).Manager] Got signal 15. Initiating shutdown…
2021-12-20 08:35:22,863 [20] [cmk.liveproxyd.(3445011).Manager] Good bye.
2021-12-20 08:35:22,865 [20] [cmk.liveproxyd] Successfully shut down.
2021-12-20 08:35:47,909 [20] [cmk.liveproxyd] ----------------------------------------------------------
2021-12-20 08:35:47,910 [20] [cmk.liveproxyd] Livestatus Proxy-Daemon (1.6.0p19) starting…
2021-12-20 08:35:47,911 [20] [cmk.liveproxyd] Configured 0 sites
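To tell a timeout apart from a reset connection the next time this happens, a small standalone check along these lines might help. It is only a minimal Python 3 sketch, not part of CheckMK: the helper names `livestatus_query` and `probe` are made up for illustration, and the socket path mentioned below is an assumption about a default OMD layout. It sends one Livestatus query and reports whether the connection answered, timed out, or was reset by the peer.

```python
import socket


def livestatus_query(address, query="GET status\nColumns: program_version\n", timeout=2.0):
    """Send one Livestatus query and return the raw response text.

    address: a Unix socket path (str) or a (host, port) tuple for TCP.
    Both helper names in this file are hypothetical, not CheckMK APIs.
    """
    family = socket.AF_UNIX if isinstance(address, str) else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(address)
        # A Livestatus query ends with an empty line.
        sock.sendall((query.rstrip("\n") + "\n\n").encode("utf-8"))
        # Half-close the connection so the server knows the request is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def probe(address, timeout=2.0):
    """Return (state, detail), where state is 'ok', 'timeout', 'reset' or 'error'."""
    try:
        return "ok", livestatus_query(address, timeout=timeout).strip()
    except socket.timeout:
        # No answer within the timeout, comparable to the heartbeat timeout in the GUI.
        return "timeout", ""
    except ConnectionResetError:
        # Errno 104, the same error that appears in mkeventd.log.
        return "reset", ""
    except OSError as exc:
        # Anything else, e.g. missing socket file or connection refused.
        return "error", str(exc)
```

On the monitoring server this could be called as `probe("/omd/sites/SITE/tmp/run/live")`, with `SITE` replaced by the instance name and assuming the Livestatus socket is at that location, or as `probe((host, port))` if Livestatus is exposed over TCP.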
Could you please give me a hint about what might have caused the problem? Restarting the instance fixes it, but that is not a long-term solution.
Thank you in advance!
Best Regards,
Elena