I need help figuring out how to change the time interval at which my Checkmk instance retrieves information about the monitored services.
My current goal is to keep the monitoring interval particularly low (30s / 1min) for core agent operations and host reachability (I think the Check_MK service is sufficient, but I’d like to ask for your advice), while using a longer interval for less critical services such as performance monitoring (CPU, memory, network Rx/Tx, etc.).
I am currently running the latest version of Checkmk Community (2.5.0) in a Docker container. The host VM is based on the Ubuntu Server 24.04 cloud image.
I’m monitoring 17 hosts with about 300 services. The VM is very small (2 cores and 4 GB of RAM on a Proxmox host, which is a mini-PC with an Intel N100 processor), the load on it is very high, and I can’t get the service to run smoothly enough.
Most of the services come from agents installed on Linux VMs. My PVE hosts are monitored through API integrations, and some hosts are monitored via SNMP (network devices and HPE iLO).
Although I have several years of experience with other network and infrastructure monitoring platforms, I’m still quite new to Checkmk and really keen to learn, so I apologize in advance if I’ve used any terms incorrectly!
Thank you to everyone who wants to participate and lend a hand!
As you wrote, most of the data is retrieved from an agent, be it the Checkmk agent on Linux or Windows, an SNMP agent on network devices, or a special agent for other systems.
Each agent sends all the data it can gather at once. This makes it very efficient and allows automatic service discovery.
You cannot configure a different check interval for the services where an agent provides the data. They all run at the agent interval.
Only active checks (like for HTTP, SMTP or LDAP services) run independently and can be configured with their own check interval.
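To put rough numbers on this, here is a minimal sketch using the figures from the question (17 hosts, about 300 services). It assumes a simplified model in which every host is fetched exactly once per agent interval. The function name is made up for illustration, and the model ignores SNMP per-OID costs and the CPU time spent evaluating the check results.

```python
def fetches_per_minute(hosts: int, interval_s: float) -> float:
    """Agent fetches per minute if each host is queried once per interval."""
    return hosts * 60.0 / interval_s


# Figures taken from the question; the one-fetch-per-host model is an assumption.
HOSTS = 17
SERVICES = 300

for interval in (30, 60):
    rate = fetches_per_minute(HOSTS, interval)
    print(f"{interval}s agent interval: {rate:.0f} fetches/min for {SERVICES} services")
```

In this model the number of fetches depends on the number of hosts and the agent interval, not on the number of services, so lengthening the interval for individual agent-based services would not reduce the number of fetches.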
Here I would pay attention to the configuration of your monitoring core (Nagios). Running the Nagios core inside a Docker container is not ideal. Since you are already on a Proxmox host, I would run Checkmk as an LXC container directly on Proxmox rather than in a container inside a VM.
If you have only two cores, I would recommend setting the number of concurrent checks to 2, or at most 4, in the Nagios core configuration.
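As a sketch of what that setting could look like: Nagios has a main configuration directive called max_concurrent_checks that caps the number of service checks running in parallel. Which file inside the Checkmk site holds the Nagios main configuration is not stated here, so treat the placement as an assumption and check your own site before editing. The core has to be restarted afterwards.

```
# Nagios main configuration (exact file within the site is an assumption)
# Limit parallel service checks on a 2-core machine; 0 would mean no limit.
max_concurrent_checks=2
```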
Good day Robert, thank you so much for the details and the explanation.
My monitoring is entirely based on agents and SNMP, so from what I understand, what I was trying to do is simply not feasible given Checkmk’s architecture…
As mentioned in my previous message, my request stems from the need to make Checkmk monitoring more efficient, with the goal of running the service adequately on a very lightweight virtual machine.
At the moment, even though the monitoring doesn’t exceed 300 services across about 20 hosts, the load on the machine, especially in terms of CPU usage, is extremely high. Is there any tweak I can apply to make the monitoring more efficient?