Server hung state monitoring

martin.schwarz · October 28, 2020, 5:19pm

What exactly does that “hung” state look like?

For example, I’ve had Linux VMs lose their disk access because the underlying storage system failed. The kernel still responded to pings, so Checkmk still saw the host as up. But the Checkmk agent couldn’t run anymore, and the “Check_MK” service reported a timeout. So for this kind of failure, make sure you send notifications for the “Check_MK” service as well, not only for the host state.
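
If you want to see the difference between “pingable” and “agent still answering” for yourself, here is a rough Python sketch. It assumes a plain-text agent listening on the default TCP port 6556 that sends its output as soon as you connect; if your agent is reached over SSH or an encrypted connection, this probe won’t apply as written. The function names and the state labels are made up for illustration and are not anything Checkmk itself uses.

```python
import socket

AGENT_PORT = 6556  # default Checkmk agent port; an assumption, adjust if yours differs


def agent_responds(host: str, port: int = AGENT_PORT, timeout: float = 10.0) -> bool:
    """Return True if the agent sends any data before the timeout expires."""
    try:
        # The timeout applies to both the connect and the later recv call.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return bool(conn.recv(1024))
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def classify(ping_ok: bool, agent_ok: bool) -> str:
    """Combine both probes into a hypothetical label (not a Checkmk state name)."""
    if ping_ok and agent_ok:
        return "healthy"
    if ping_ok:
        # Kernel answers pings, but userspace cannot deliver agent output.
        return "possibly hung"
    return "down"
```

In the storage failure I described, the ping would succeed while `agent_responds` returns `False`, so `classify(True, False)` gives `"possibly hung"`. That is the same signal the “Check_MK” service timeout gives you inside Checkmk.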