Check_MK performance

LaSoe · February 7, 2023, 8:55pm

Overall resource consumption and the number of checks you can run on the Core have improved a lot, and activating certain changes is now also much faster. From our subjective point of view, however, activating our daily work changes (rules, thresholds, etc.) has not really become faster.

If you only need the OS metrics, the Agents do a great job. Once you add local checks, you quickly exceed the default 1 min interval, because the Unix-like Agents still run everything sequentially (and no, async is not a solution, it's a workaround). In terms of performance, the Unix-like Agents have not improved much in the last 5 years. The Windows Agent, on the other hand, has been rebuilt and in my opinion now offers better extensibility and control options than the Unix-like Agents.
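
To illustrate the sequential point: below is a minimal Python sketch, not how the Agent is implemented (the Unix-like Agents are shell scripts). It only shows why total runtime is the sum of all check runtimes when they run one after another, and roughly the runtime of the slowest check when they run concurrently. All names (`run_check`, `run_sequential`, `run_parallel`, `fake_check`) and the `Demo_*` service names are made up for illustration. The fake checks just sleep and print one line in local check style.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_check(cmd, timeout=10.0):
    """Run one check command; return its stdout lines, or [] if it times out."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return []
    return result.stdout.splitlines()


def run_sequential(cmds, timeout=10.0):
    """One check after another: total time is the sum of all runtimes."""
    lines = []
    for cmd in cmds:
        lines.extend(run_check(cmd, timeout))
    return lines


def run_parallel(cmds, max_workers=4, timeout=10.0):
    """Checks run concurrently; output order still follows the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda cmd: run_check(cmd, timeout), cmds)
        return [line for lines in results for line in lines]


def fake_check(name, seconds):
    """Hypothetical stand-in for a local check script: sleep, then print one line."""
    code = f"import time; time.sleep({seconds}); print('0 {name} - OK')"
    return [sys.executable, "-c", code]


if __name__ == "__main__":
    import time

    cmds = [fake_check(f"Demo_{i}", 0.5) for i in range(4)]
    for runner in (run_sequential, run_parallel):
        start = time.monotonic()
        out = runner(cmds)
        print(runner.__name__, f"{time.monotonic() - start:.1f}s", len(out), "lines")
```

With four checks of about half a second each, the sequential run should take around two seconds plus process start-up, and the parallel run noticeably less. Exact numbers depend on the machine.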

What bothers us most in our daily work with Checkmk is WATO, which has become slower with each release, especially since the introduction of the new GUI in 2.0. After many bug tickets, some things have improved significantly, but there is still a long way to go before we get a truly responsive GUI.

And finally, of course, it should be mentioned that many new functions, such as the REST API and check improvements, have been added, which speak in favor of an upgrade to 2.x. But before upgrading to a 2.x version, you should test it carefully ;-). We were very grateful to have a very competent tribe29 employee on call during our major release upgrades.

This is my personal, subjective observation based on our environment (2.1.cme, 2000 hosts, 200,000 services, 30 sites, all running on the same bare-metal server).