CheckMK Raw Performance

CMK version: 2.0.0p18 (CRE)
OS version: Debian 10

Hello guys,

Since we upgraded from 2.0.0p5 (CRE) to 2.0.0p17 (CRE), our CheckMK isn't running smoothly anymore.
Sometimes Nagios crashes (apparently because of CPU spikes), CPU usage is high, which leads to flapping, and in general the web UI responds very slowly: each page takes one to ten seconds to load when navigating.
It seems to be related to distributed monitoring and the slave sites, because when we disable the slave sites it runs much more smoothly (like normal). Attached is a sample config of our slave connections.

The VM frequently has high CPU peaks, which lead to flapping of all hosts and timeouts to the slave sites.
Attached is a screenshot of the CPU usage over the last 6 weeks. We did the upgrade to p17 on the 12th of December.

As you can see, it has not run smoothly since then. At the points where the CPU usage drops to nearly zero, Nagios had crashed and we needed to restart the server.

I've also attached the overview of our hosts and services.

We use distributed monitoring with one master site and 3 slaves.

Hardware of the Linux VM:
Debian 10
12 vCPU
12 GB RAM

Is there anything we can do to achieve more stability and performance?

We already found the official documentation on distributed monitoring:

There it says: “By reducing the status host’s proof interval from the default of sixty seconds to, e.g. five seconds, you can minimize the duration of a hangup”

Where can we adjust this?

Best regards
Tobias


Please have a look at this thread and the proposed solution.
It may be the same problem in your case, as it only affects the Raw Edition.

Hi Andreas,

Sorry, I didn't see this thread. Thank you very much!

After disabling encryption for the connections to the slave sites, it is running super fast and responsive now.

Big difference!

I will keep monitoring the CPU usage and flapping over the next few days, but it seems this solved the problem.
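To keep an eye on how quickly the slave sites answer, one option is to time a single Livestatus query against each site. The following is a minimal Python sketch, not something from this thread: it assumes the remote site's Livestatus is reachable over plain (unencrypted) TCP, and the port 6557 default and the function name `time_livestatus_query` are assumptions for illustration. It asks the `status` table for `program_version` and returns the elapsed time together with the raw reply.

```python
import socket
import time


def time_livestatus_query(host: str, port: int = 6557, timeout: float = 10.0):
    """Send one Livestatus query over plain TCP and return (seconds, raw_reply).

    Hypothetical helper; assumes Livestatus TCP access is enabled on the site
    and is NOT wrapped in TLS.
    """
    # Minimal query: the site's program version, answered as JSON.
    # The empty line at the end terminates the Livestatus request.
    query = b"GET status\nColumns: program_version\nOutputFormat: json\n\n"
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query)
        # Close our sending side; without KeepAlive the server replies and closes.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    elapsed = time.monotonic() - start
    return elapsed, b"".join(chunks).decode("utf-8")
```

Calling it periodically for each slave host (for example from cron) and logging the elapsed seconds gives a simple history to compare against the CPU graph; a timeout raises an `OSError` subclass, which is itself a useful signal.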

Have a nice weekend, best regards

Tobias
