Hi,
I’ve created a RHEL7 VM with 16 GB RAM and 8 CPUs, running version 2.0.0p8. Everything I need is working great apart from the system load. I’ve configured 480 clients monitoring ~19,500 services. The load average fluctuates between 30 and 50, and it went up to 300 when I ran a bulk discovery. This load seems excessive to me. If it is “normal” for this number of services, then I’ll build a distributed setup. All the agents are version 1.6, so I’m looking at updating those anyway. Any thoughts?
With the Raw Edition (Nagios core) and 480 clients at the default check interval of 1 minute, you get 480 calls to the precompiled checks plus 480 check_icmp invocations per minute
→ roughly 1,000 process creations per minute.
Now look at the runtime of a single “Check_MK” service — this is the time the system needs to query one client. I normally assume between 0.5 and 2 seconds for a typical Linux/Windows system.
That means you need between 500 and 2,000 CPU-seconds per minute (≈1,000 checks × 0.5–2 s each). Spread across your 8 CPUs, that is between 62 and 250 wall-clock seconds to get all the work done within the 1-minute interval.
So even in the best case you need more time than you have available.
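The arithmetic above can be sketched in a few lines (the helper name and numbers are illustrative, not part of Checkmk):

```python
def wall_seconds_needed(checks_per_min: int, secs_per_check: float, cpus: int) -> float:
    """CPU-seconds of check work per minute, divided across the available cores."""
    return checks_per_min * secs_per_check / cpus

# ~1,000 process creations per minute (480 check_mk + 480 check_icmp), 8 CPUs
best = wall_seconds_needed(1000, 0.5, 8)   # 62.5 s — already over the 60 s interval
worst = wall_seconds_needed(1000, 2.0, 8)  # 250.0 s
print(best, worst)
```

Both results exceed the 60-second check interval, which is why the scheduler falls behind and the load average climbs.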
For CEE, remove the ping process creation; the rest stays nearly the same.
The overall runtime of the Check_MK service may be lower there, but that too depends mostly on the queried systems.
An extreme example: with a short agent response time (0.1 s), an 8-core system can monitor around 4,000–5,000 servers. But if you only monitor switches (3–5 s response time), you will only be happy with a maximum of about 200 hosts.
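The same logic gives a rough upper bound on host count — a back-of-envelope sketch with assumed response times, not a Checkmk sizing formula:

```python
def max_hosts(cpus: int, interval_s: float, secs_per_check: float) -> int:
    """Rough ceiling: available CPU-seconds per check interval / runtime of one check."""
    return int(cpus * interval_s / secs_per_check)

print(max_hosts(8, 60, 0.1))  # fast agents: 4800
print(max_hosts(8, 60, 4.0))  # slow SNMP switches: 120
```

In practice you want to stay well below these ceilings to leave headroom for discovery runs and other processes on the server.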
Whether such a simple calculation also holds for 2.0, I can’t say yet. Last week I migrated the first bigger systems to 2.0; the next weeks will show whether it works there too.