Timotheus
(Timotheus)
March 7, 2022, 9:03am
1
CMK version:
2.0.0p20
OS version:
SLES 12.5
Recently the CPU load (15-minute average) on my CMK host has gone up significantly. Watching top on the console, I see a large number of python3 tasks popping up every minute. It looks like this:
top - 19:26:08 up 23 days, 9:16, 1 user, load average: 11.12, 11.52, 10.32
Tasks: 297 total, 45 running, 252 sleeping, 0 stopped, 0 zombie
%Cpu(s): 92.4 us, 7.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
KiB Mem: 4031712 total, 3462604 used, 569108 free, 198388 buffers
KiB Swap: 2103292 total, 104704 used, 1998588 free. 1859820 cached Mem
4010 monitor 20 0 46184 22452 8404 R 2.658 0.557 0:00.13 python3
4086 monitor 20 0 40440 18372 7740 R 2.658 0.456 0:00.08 python3
4003 monitor 20 0 46160 22344 8404 R 2.326 0.554 0:00.13 python3
4006 monitor 20 0 46184 22452 8400 R 2.326 0.557 0:00.12 python3
4008 monitor 20 0 46160 22516 8576 R 2.326 0.558 0:00.12 python3
4012 monitor 20 0 46160 22580 8640 R 2.326 0.560 0:00.12 python3
4013 monitor 20 0 46160 22536 8600 R 2.326 0.559 0:00.12 python3
4014 monitor 20 0 46156 22640 8700 R 2.326 0.562 0:00.12 python3
4018 monitor 20 0 46160 22548 8612 R 2.326 0.559 0:00.12 python3
4031 monitor 20 0 46216 22648 8712 R 2.326 0.562 0:00.12 python3
4032 monitor 20 0 45880 22472 8568 R 2.326 0.557 0:00.12 python3
4033 monitor 20 0 46160 22384 8452 R 2.326 0.555 0:00.13 python3
4034 monitor 20 0 45884 22696 8792 R 2.326 0.563 0:00.12 python3
4035 monitor 20 0 46160 22388 8452 R 2.326 0.555 0:00.12 python3
4036 monitor 20 0 45864 22456 8600 R 2.326 0.557 0:00.12 python3
4037 monitor 20 0 45884 22520 8616 R 2.326 0.559 0:00.12 python3
4038 monitor 20 0 45884 22512 8608 R 2.326 0.558 0:00.12 python3
Looking at the graph for CPU load, I see that this started 2 days ago and jumped significantly yesterday evening. I am not aware of any changes to the system that might have caused this.
I was running CMK 1.6 on this server until a few weeks ago. The CPU Load was typically under 0.5. It has been up slightly after upgrading to CMK 2.0, but still well under 1.0.
I would be grateful for pointers on finding the root of this behavior.
Hi,
please press “c” while running top. It will reveal the command line of those python3 processes.
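If an interactive top session is inconvenient, the same information can be read from /proc on Linux. The following is only a minimal sketch: the function names `parse_cmdline` and `python_processes` are made up for illustration, and the filter on `python3` simply mirrors the process name seen in this thread.

```python
from pathlib import Path


def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    return [part.decode(errors="replace") for part in raw.split(b"\0") if part]


def python_processes(proc_root="/proc"):
    """Return {pid: argv} for processes whose first argument ends in 'python3'."""
    root = Path(proc_root)
    found = {}
    if not root.is_dir():
        return found  # no procfs available (not Linux, or not mounted)
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directory names are process IDs
        try:
            argv = parse_cmdline((entry / "cmdline").read_bytes())
        except OSError:
            continue  # process exited meanwhile or is not readable
        if argv and argv[0].endswith("python3"):
            found[int(entry.name)] = argv
    return found


if __name__ == "__main__":
    for pid, argv in sorted(python_processes().items()):
        print(pid, " ".join(argv))
```

Because the check helpers are short-lived, a single snapshot may miss most of them; running the script a few times within a minute should give a better picture.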
Timotheus
(Timotheus)
March 7, 2022, 4:45pm
3
Hi,
Thanks for your response. The python processes all look like this:
/omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/zSW-JTr
I have 88 files in that directory. Two for each host. Are these defining SNMP checks?
Hi, these files are generated for all hosts, not only SNMP hosts. It doesn’t really matter, though: two files per host is entirely expected.
Can you post a screenshot of your top output after pressing “c” (command line) and “1” (one line per core with CPU details)?
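The "two files per host" observation can be checked with a few lines of Python. This is a sketch under an assumption that is not confirmed in this thread: that the two files for a host are named `<host>` and `<host>.py`. The helper names are hypothetical; the directory path is the one posted above.

```python
from collections import Counter
from pathlib import Path

# Path as posted in this thread; adjust the site name for other installations.
HOST_CHECKS = "/omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks"


def files_per_host(names):
    """Count files per host, treating 'name' and 'name.py' as one host (assumption)."""
    counts = Counter()
    for name in names:
        host = name[:-3] if name.endswith(".py") else name
        counts[host] += 1
    return counts


def unexpected(counts, expected=2):
    """Return only the hosts whose file count differs from the expected value."""
    return {host: n for host, n in counts.items() if n != expected}


if __name__ == "__main__":
    directory = Path(HOST_CHECKS)
    names = [p.name for p in directory.iterdir()] if directory.is_dir() else []
    counts = files_per_host(names)
    print(f"{len(names)} files for {len(counts)} hosts")
    for host, n in sorted(unexpected(counts).items()):
        print(f"{host}: {n} file(s)")
```

With the 88 files mentioned above and two files per host, this would report 44 hosts and no outliers.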
Timotheus
(Timotheus)
March 8, 2022, 6:53am
5
Good morning!
I hope this reveals more:
top - 07:50:14 up 1 day, 12:22, 1 user, load average: 5.13, 4.46, 4.35
Tasks: 256 total, 23 running, 232 sleeping, 0 stopped, 1 zombie
%Cpu0 : 68.4 us, 30.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 1.3 si, 0.0 st
KiB Mem: 4031712 total, 3701692 used, 330020 free, 157772 buffers
KiB Swap: 2103292 total, 45824 used, 2057468 free. 2602144 cached Mem
20058 monitor 20 0 51404 27724 9716 R 4.934 0.688 0:00.20 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/zSW-MH-Pforte
20043 monitor 20 0 55952 30328 9728 R 4.605 0.752 0:00.24 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/Benjamin
20033 monitor 20 0 139580 37988 10952 S 3.947 0.942 0:00.38 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/Arbel
20041 monitor 20 0 57180 31560 9724 R 3.947 0.783 0:00.30 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/T-SBC
20173 monitor 20 0 45584 22064 8552 R 3.947 0.547 0:00.12 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/USV-2
20038 monitor 20 0 139584 37832 10792 S 3.289 0.938 0:00.37 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/Golan
20013 monitor 20 0 60132 34836 9808 R 2.632 0.864 0:00.35 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/Yaffo
20015 monitor 20 0 56804 31432 9568 R 2.632 0.780 0:00.34 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/zSW-JTr
20016 monitor 20 0 56668 31456 9568 R 2.632 0.780 0:00.34 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/zSW-JRK
20034 monitor 20 0 60428 35188 9804 R 2.632 0.873 0:00.35 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/Haifa
1 root 20 0 42160 5664 4268 S 2.303 0.140 2:38.97 /usr/lib/systemd/systemd --switched-root --system --deserialize 23
20011 monitor 20 0 57064 31708 9740 R 2.303 0.786 0:00.33 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/zSW-HGG
20012 monitor 20 0 57624 32180 9740 R 2.303 0.798 0:00.33 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/zSW-JFre
20031 monitor 20 0 57512 32068 9876 S 1.974 0.795 0:00.32 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/Asaph
20030 monitor 20 0 57112 31820 9876 S 1.645 0.789 0:00.30 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/zSW-JFri-2
20352 monitor 20 0 33824 12996 5664 R 1.645 0.322 0:00.05 python3 /omd/sites/monitor/share/check_mk/agents/special/agent_vsphere -u vmonitor -s=6I%=B5D0&7jf3v!+CXgZ -i hostsystem,datastore,counters --direct --hostname Golan -P --spaces cut --no-cert-check xxx.xxx.xxx.xxx
20029 monitor 20 0 57200 31776 9824 S 1.316 0.788 0:00.30 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/zSW-JWh
20032 monitor 20 0 57064 31688 9824 S 1.316 0.786 0:00.29 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/zSW-MH
20357 monitor 20 0 33424 12844 5756 R 1.316 0.319 0:00.04 python3 /omd/sites/monitor/share/check_mk/agents/special/agent_vsphere -u vmonitor -s=6I%=B5D0&7jf3v!+CXgZ -i hostsystem,datastore,counters --direct --hostname Arbel -P --spaces cut --no-cert-check xxx.xxx.xxx.xxx
7 root 20 0 0 0 0 R 0.329 0.000 1:05.35 [ksoftirqd/0]
8 root 20 0 0 0 0 R 0.329 0.000 1:38.25 [rcu_sched]
1289 root 20 0 8776 1688 1628 S 0.329 0.042 0:00.28 /usr/sbin/xinetd -stayalive -dontfork
1601 monitor 20 0 64264 9984 880 S 0.329 0.248 5:26.79 /omd/sites/monitor/bin/redis-server *:0
20039 monitor 20 0 58432 33024 9948 S 0.329 0.819 0:00.28 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/Ephraim
20409 monitor 20 0 22084 7960 4624 R 0.329 0.197 0:00.01 /omd/sites/monitor/bin/python3 /omd/sites/monitor/var/check_mk/core/helper_config/latest/host_checks/Eran
32668 monitor 20 0 697304 12268 5268 S 0.329 0.304 1:29.15 /omd/sites/monitor/bin/nagios -ud /omd/sites/monitor/tmp/nagios/nagios.cfg
2 root 20 0 0 0 0 S 0.000 0.000 0:00.00 [kthreadd]
4 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 [kworker/0:0H]
6 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 [mm_percpu_wq]
9 root 20 0 0 0 0 S 0.000 0.000 0:00.00 [rcu_bh]
10 root rt 0 0 0 0 S 0.000 0.000 0:00.00 [migration/0]
You have just one CPU. That’s probably a bad idea for any monitoring system, since monitoring is a massively parallel workload. So I’m not surprised by the load.
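To put the load figures in relation to the number of CPUs, the load average can be divided by the CPU count. A rough rule of thumb (not an exact threshold) is that a sustained value well above 1.0 per CPU means runnable processes are queuing. A minimal sketch, with `load_per_cpu` as a hypothetical helper name:

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Normalize a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    try:
        averages = os.getloadavg()  # 1-, 5- and 15-minute load averages (Unix only)
    except (OSError, AttributeError):
        averages = ()
    for label, value in zip(("1 min", "5 min", "15 min"), averages):
        print(f"{label}: load {value:.2f}, per CPU {load_per_cpu(value, cpus):.2f}")
```

For the first top output in this thread (load averages 11.12, 11.52, 10.32 on a single CPU), the per-CPU values equal the raw load, so roughly ten processes were competing for one core on average.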
Timotheus
(Timotheus)
March 11, 2022, 3:52pm
7
Hi,
That was a change I made some weeks ago when doing vSphere tuning. The advice I received was to reduce all VMs to 1 CPU to minimize CPU overbooking. But it was understood that you need to watch and see what works. For a week or two things looked fine and then started to go sour. That is why I did not associate it with the CPU count right away.
Just this afternoon I needed to reboot because of a kernel update and decided to see if adding a CPU would help. It has been 2+ hours now and things are looking good. I will continue to observe the CPU Load over time and if things stay normal I will mark your post as the solution.
Thanks for your assistance!
Have a nice weekend!
I have a background as a VMware consultant and I have to say: sorry, this is very bad advice…
Modern operating systems all expect to run on multiple cores. If you provide just 1 CPU, you serialize your workload.
Timotheus
(Timotheus)
March 11, 2022, 5:34pm
9
Your advice is much appreciated.