Understanding check_mk-cpu.threads

Dear forum,

One of our Linux hosts currently reports about 11k threads, far more than our other hosts.
Can we rely on this check as a measure of the actual server load?

Here is a table of the hosts with some information about them.
None of the VMs is under any meaningful load (30% at most).
Uptime is between 40 and 200 days.

Why would this one server be so overloaded?

| Hostname | OS | vCPU | RAM | Thread Count | Running VMs |
|---|---|---|---|---|---|
| Host 02 | XCP-ng 8.0 | 80 | 1536 GB | 11300 | 27 |
| Host 03 | XCP-ng 8.0 | 80 | 2048 GB | 1450 | 25 |
| Host 04 | XCP-ng 8.0 | 80 | 2048 GB | 1348 | 20 |
| Host 05 | Citrix Xenserver 7.1 CU2 | 80 | 1536 GB | 1916 | 22 |
| Host 06 | Citrix Xenserver 7.1 CU2 | 31 | 512 GB | 1865 | 28 |

The description of the check reads:

Monitor the number of processes and threads. If too many processes
and threads are found then the check results in a warning or critical
state. The default levels are set to {2000} and {4000}.
Author: Mathias Kettner mk@mathias-kettner.de
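
To make the description concrete, here is a minimal sketch of how such upper levels could be evaluated. It is not the actual Checkmk implementation: the function name `classify_thread_count` is made up for illustration, and the assumption that a state is triggered when the count is greater than or equal to a level is mine.

```python
def classify_thread_count(count: int, warn: int, crit: int) -> str:
    """Map a thread count to a monitoring state using upper levels.

    Assumption: a level is reached when count >= level.
    """
    if count >= crit:
        return "CRIT"
    if count >= warn:
        return "WARN"
    return "OK"


# Thread counts taken from the table above, checked against the
# default levels quoted in the check description (2000 / 4000).
for host, threads in [("Host 02", 11300), ("Host 03", 1450), ("Host 05", 1916)]:
    print(host, classify_thread_count(threads, warn=2000, crit=4000))
```

With these defaults only Host 02 leaves the OK state, which says that its thread count is above the configured levels, not that the machine is actually overloaded.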

I don't think 11k threads on 80 vCPUs is too many.
I have a small 4-core home server here with some containers running. It has about 1.2k threads, a load of only 1, and a utilization of 20%.
If I remember correctly, a larger Oracle server in one of my monitoring installations has over 20k or 30k threads.
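
As a rough illustration of that comparison, the sketch below normalizes the thread counts by CPU count. Threads per CPU is only a crude yardstick chosen here for illustration, not a load metric and not something the check computes; the helper name `threads_per_cpu` is hypothetical.

```python
def threads_per_cpu(threads: int, cpus: int) -> float:
    """Crude normalization: how many threads exist per (v)CPU."""
    return threads / cpus


# Figures quoted in this thread (1.2k rounded to 1200).
systems = {
    "Host 02 (80 vCPU)": (11300, 80),
    "Home server (4 cores)": (1200, 4),
}

for name, (threads, cpus) in systems.items():
    print(f"{name}: {threads_per_cpu(threads, cpus):.1f} threads per CPU")
```

By this measure the small home server carries more threads per CPU than Host 02 while sitting at 20% utilization, which is why the raw thread count alone says little about load.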

Hi guys, how can I clear this alarm?
CRIT - Count: 37735 threads (warn/crit at 3000 threads/4000 threads)
Is rebooting the server enough, or are there commands that need to be run to get rid of the threads?

Please let me know if you need more information.
Thanks in advance. Best Regards.

Hi @brauliom and welcome to the forum,

First of all, the mere fact that an OS has a high number of threads doesn’t necessarily constitute a problem, as Andreas correctly pointed out in his previous reply. This depends on a multitude of factors, e.g., what exactly is running on that host, and whether it can handle this number of threads well.

So, your very first course of action should be to determine whether the services it provides are negatively affected while CMK reports this high number of threads. If they aren’t, whatever is causing this may be “normal” and you can safely increase the WARN/CRIT thresholds for this check. If the machine is indeed brought to its knees, you need to find out which processes are causing this.

So, in short: simply rebooting the host is most likely not a permanent solution if you cannot find out what is causing this behaviour.
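
One possible way to find the processes responsible on a Linux host is to read the `Threads:` field from each `/proc/<pid>/status` file. The following is only a sketch under that assumption; the function names `parse_status` and `top_thread_consumers` are made up for illustration and are not part of Checkmk.

```python
import os


def parse_status(text):
    """Return (name, threads) from the text of a /proc/<pid>/status file."""
    name, threads = "?", 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "Threads":
            threads = int(value.strip())
    return name, threads


def top_thread_consumers(proc_root="/proc", limit=10):
    """Return up to `limit` tuples (threads, pid, name), largest first."""
    results = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return results  # no procfs here (e.g. not a Linux system)
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        path = os.path.join(proc_root, entry, "status")
        try:
            with open(path, errors="replace") as fh:
                name, threads = parse_status(fh.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or unreadable
        results.append((threads, int(entry), name))
    results.sort(reverse=True)
    return results[:limit]


if __name__ == "__main__":
    for threads, pid, name in top_thread_consumers():
        print(f"{threads:>7}  {pid:>7}  {name}")
```

If one or two processes account for most of the threads, that is where to look next. If the count is spread evenly over many processes, raising the thresholds may be the more sensible answer.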

HTH,
Thomas


Thank you very much, Thomas. Yes, you’re right. If this number of threads causes a problem, I will request a maintenance window to reboot the machine.

Hi Braulio,

Umm… that is not what I said, but… you’re welcome. 🙂

Thomas
