Server hung state monitoring

Hi,

Is there any way we can monitor a server's hung state in Checkmk?

What exactly does that “hung” state look like?

For example, I’ve had Linux VMs lose their disk access due to some failure of the underlying storage system. The kernel was still responding to pings, so Checkmk still saw the host as up. But the Checkmk agent couldn’t run anymore and produced a timeout for the “Check_MK” service. So for this kind of failure, make sure you send notifications for the “Check_MK” service as well.

I'm looking for Windows devices.

And what does that “hung state” look like for a Windows system? It probably causes agent timeouts just like on a Linux system. So again, you would notice this from timeouts on the “Check_MK” service.

“Server Hung” is too vague a description. Some Windows hung states are characterized by no network access (no ping, no remote login, etc.). Others show up as no login or a very slow login, or as application issues. No ping is easy to monitor, but if the server is thrashing due to resource issues (memory, disk, network I/O), the answer may not be as simple. If you have access to a server in what you are calling a hung state, note the symptoms. If not, you can examine the server data (Checkmk data, logs, etc.) for hints after the server is back up.
Once you identify a repeatable symptom, you may be able to monitor that component directly, or write a script to check it for you.
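As a rough illustration of "write a script to check it", here is a minimal sketch of a Checkmk local check that times a small disk write, on the assumption that slow disk I/O is the symptom you identified. The service name `Disk_write_latency`, the metric name `write_time`, and the 1 s / 5 s thresholds are made up for this example. It also assumes a Python interpreter is available on the host and that the agent is set up to run Python scripts from its local check directory (on Windows typically `C:\ProgramData\checkmk\agent\local`).

```python
#!/usr/bin/env python3
"""Sketch of a Checkmk local check: time a small disk write (illustrative only)."""
import os
import tempfile
import time


def time_disk_write(directory, size=4096):
    """Return the seconds needed to write and fsync a small temporary file."""
    start = time.perf_counter()
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        handle.write(b"\0" * size)
        handle.flush()
        os.fsync(handle.fileno())  # force the data to disk, not just the cache
    return time.perf_counter() - start


def classify(seconds, warn=1.0, crit=5.0):
    """Map a duration to a monitoring state: 0=OK, 1=WARN, 2=CRIT.

    The thresholds are assumptions for this sketch, not recommended values.
    """
    if seconds >= crit:
        return 2
    if seconds >= warn:
        return 1
    return 0


def local_check_line(state, service, metric, value, text):
    """Format one local check line: <state> <service> <metric>=<value> <text>."""
    return f"{state} {service} {metric}={value:.3f} {text}"


if __name__ == "__main__":
    elapsed = time_disk_write(tempfile.gettempdir())
    print(
        local_check_line(
            classify(elapsed),
            "Disk_write_latency",  # hypothetical service name
            "write_time",          # hypothetical metric name
            elapsed,
            f"small write took {elapsed:.3f} s",
        )
    )
```

Keep in mind that a script like this only reports while the agent can still run. If the machine hangs completely, the script will not answer either, and you are back to watching for timeouts on the “Check_MK” service.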

The servers are pingable, but users are not able to log in to them.
Mostly these servers are part of Auto shutdown for Azure.

There is a Nagios check that tests whether an RDP session can be established successfully according to X.224, which you can add as a classical Nagios plugin:


We have it running here exactly for this purpose (mainly on 2012R2), but it seems that it does not work properly with Windows Server 2016/2019.

BR Thomas
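For readers wondering what an X.224-based RDP check does on the wire, the following is a minimal sketch and not the plugin mentioned above. It opens a TCP connection to port 3389, sends an X.224 Connection Request with an RDP negotiation request, and treats an X.224 Connection Confirm as success. The function names, the 10 second timeout, and the requested protocol flags (0x03, i.e. TLS and CredSSP) are assumptions of this example. Whether a given Windows version answers this way depends on its RDP security settings.

```python
#!/usr/bin/env python3
"""Sketch of an X.224-style RDP reachability check using Nagios exit codes."""
import socket
import struct
import sys

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def build_connection_request(requested_protocols=0x03):
    """Build TPKT header + X.224 Connection Request + RDP negotiation request."""
    # Negotiation request: type 0x01, flags 0, length 8, protocols (little endian)
    neg_req = struct.pack("<BBHI", 0x01, 0x00, 8, requested_protocols)
    # X.224 CR: length indicator, code 0xE0, dst-ref, src-ref, class options
    x224 = struct.pack(">BBHHB", 6 + len(neg_req), 0xE0, 0, 0, 0)
    # TPKT: version 3, reserved byte, total packet length (big endian)
    tpkt = struct.pack(">BBH", 3, 0, 4 + len(x224) + len(neg_req))
    return tpkt + x224 + neg_req


def is_connection_confirm(data):
    """True if the reply is a TPKT v3 packet carrying an X.224 Connection Confirm."""
    # Byte 0: TPKT version; byte 5: X.224 PDU code (0xD0 = Connection Confirm)
    return len(data) >= 7 and data[0] == 3 and (data[5] & 0xF0) == 0xD0


def check_rdp(host, port=3389, timeout=10.0):
    """Return (exit_code, message) in the style of a Nagios plugin."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_connection_request())
            reply = sock.recv(1024)
    except OSError as exc:  # covers refused connections and timeouts
        return CRITICAL, f"RDP CRITICAL - {exc}"
    if is_connection_confirm(reply):
        return OK, "RDP OK - X.224 connection confirm received"
    return CRITICAL, "RDP CRITICAL - unexpected or empty reply"


# Only runs when a host is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    code, message = check_rdp(sys.argv[1])
    print(message)
    sys.exit(code)
```

Saved as an executable script, it would be called with the host address as its only argument. A server that still answers pings but no longer accepts RDP connections would then show up as CRITICAL.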

We had it working; it was set up via a host tag. But for the past few days I don’t see any RDP check available, even when we set the host tag to Windows.

Try to map the check explicitly to one host for testing first.

BR

I did. It worked previously, but now it is not working, and I'm not sure of the reason.

Do you have any documentation on how to install this in a Checkmk environment?

Hi,
please refer to the official guide: https://checkmk.com/cms.html
If you mean the plugin I mentioned at the beginning, you must install it in the Nagios plugin folder of your Checkmk site (~/local/lib/nagios/plugins) and then create a rule in “Active Checks” -> “Classical active and passive Monitoring checks”.

BR Thomas