One of our Linux hosts is reporting about 11k threads, far more than our other hosts.
Can we rely on this check as an indicator of the actual server load?
Here is a table of the hosts with some information about them.
None of the VMs are under any meaningful load (30% at most).
Uptime is between 40 and 200 days.
Why would this one server be so overloaded?
| Hostname | OS | vCPU | RAM | Thread Count | Running VMs |
|---|---|---|---|---|---|
| Host 02 | XCP-ng 8.0 | 80 | 1536 GB | 11300 | 27 |
| Host 03 | XCP-ng 8.0 | 80 | 2048 GB | 1450 | 25 |
| Host 04 | XCP-ng 8.0 | 80 | 2048 GB | 1348 | 20 |
| Host 05 | Citrix Xenserver 7.1 CU2 | 80 | 1536 GB | 1916 | 22 |
| Host 06 | Citrix Xenserver 7.1 CU2 | 31 | 512 GB | 1865 | 28 |
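To put the numbers in perspective, a rough calculation of threads per vCPU and per running VM may help. This is only a sketch using the figures from the table; the dictionary layout and the function name `thread_ratios` are made up for illustration.

```python
# Figures copied from the table: (vCPU, thread count, running VMs).
HOSTS = {
    "Host 02": (80, 11300, 27),
    "Host 03": (80, 1450, 25),
    "Host 04": (80, 1348, 20),
    "Host 05": (80, 1916, 22),
    "Host 06": (31, 1865, 28),
}


def thread_ratios(hosts):
    """Return {host: (threads per vCPU, threads per running VM)}."""
    return {
        name: (threads / vcpu, threads / vms)
        for name, (vcpu, threads, vms) in hosts.items()
    }


for name, (per_vcpu, per_vm) in thread_ratios(HOSTS).items():
    print(f"{name}: {per_vcpu:6.1f} threads/vCPU, {per_vm:6.1f} threads/VM")
```

Host 02 comes out at roughly 420 threads per running VM, while the other hosts sit between roughly 60 and 90. That shows the host is an outlier, but the ratio alone does not say whether it is actually overloaded.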
The description of the check reads:
Monitor the number of processes and threads. If too many processes
and threads are found then the check results in a warning or critical
state. The default levels are set to {2000} and {4000}.
Author: Mathias Kettner mk@mathias-kettner.de
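As a minimal sketch of what such warn/crit levels mean in practice: the function below maps a thread count to a state. It assumes the levels are upper bounds that trigger at or above the configured value; the name `thread_state` is hypothetical and this is not the check's actual code.

```python
def thread_state(count, warn=2000, crit=4000):
    """Map a thread count to OK/WARN/CRIT.

    Assumption: levels are upper bounds, reached at or above the value.
    The defaults are the ones quoted in the check description.
    """
    if count >= crit:
        return "CRIT"
    if count >= warn:
        return "WARN"
    return "OK"


# Thread counts of Host 02 and Host 03 against the default levels.
print(thread_state(11300))
print(thread_state(1450))
```

With the default levels, the 11300 threads on Host 02 land in CRIT regardless of how busy the host really is, which is why the levels are meant to be tuned per host.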
11k threads on 80 vCPUs is not too many, I think.
I have a small 4-core home server here with some containers running. It has about 1.2k threads, a load of only 1, and a utilization of 20%.
If I remember correctly, a bigger Oracle server in one of my monitoring installations has over 20k or 30k threads.
Hi guys, how can I clear this alarm?
CRIT - Count: 37735 threads (warn/crit at 3000 threads/4000 threads)
Is rebooting the server enough, or are there commands that need to be executed to remove the threads?
Please let me know if you need more information.
Thanks in advance. Best Regards.
First of all, the mere fact that an OS has a high number of threads doesn’t necessarily constitute a problem, as Andreas correctly pointed out in his previous reply. It depends on a multitude of factors, e.g., what exactly is running on that host, and whether it is capable of handling this number of threads well.
So, your very first course of action should be to determine whether the services the host provides are negatively affected while CMK is reporting this high number of threads. If they aren’t, whatever is causing this may be “normal”, and you can safely increase the WARN/CRIT thresholds for this check. If the machine is indeed brought to its knees, you need to find out which processes are causing this.
So, in short: simply rebooting the host is most likely not a permanent solution if you cannot find out what is causing this behaviour.
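To find out which processes own the threads, one option is to read the `Threads:` field from `/proc/<pid>/status` on the affected Linux host. The following is a minimal sketch, not a Checkmk feature; the function name `threads_per_process` and the `proc_root` parameter are made up for illustration.

```python
import os


def threads_per_process(proc_root="/proc"):
    """Return (threads, pid, name) tuples, highest thread count first."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc, nothing to report
    counts = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        name, threads = None, None
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                for line in fh:
                    if line.startswith("Name:"):
                        name = line.split(":", 1)[1].strip()
                    elif line.startswith("Threads:"):
                        threads = int(line.split(":", 1)[1])
        except (FileNotFoundError, NotADirectoryError,
                ProcessLookupError, PermissionError):
            continue  # process exited or is not readable
        if threads is not None:
            counts.append((threads, int(entry), name))
    # Sort on thread count and PID only; the name is just a label.
    return sorted(counts, key=lambda item: (item[0], item[1]), reverse=True)


if __name__ == "__main__":
    # Show the ten processes with the most threads.
    for threads, pid, name in threads_per_process()[:10]:
        print(f"{threads:8d}  {pid:8d}  {name}")
```

If one or two processes account for most of the total, that is where to look next. If the threads are spread evenly over many processes, the count may simply reflect what the host is running.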