Periodic service discovery problems

Hi

Check_MK RAW 2.0.0p12 CentOS 7

Hope someone can help or point me in the right direction.

I have set the following:

Global settings: Enable regular service discovery checks (deprecated) = unset

Service rule:
Periodic service discovery = every 2 hours

The problems are:
The CPU load goes to 100% and the server becomes unresponsive (the VM has 2 vCPUs).
Adding vCPUs makes no difference; it still goes to 100% even with 8 vCPUs.

After discovery has finished, the Check_MK Inventory service fails; rescheduling the check fixes it.
How do I make the checks run on each host after discovery has finished?

Have you tried splitting the “Periodic service discovery” rule into multiple rules for different groups, e.g. one for all Windows servers and one for all Linux servers?

Please keep in mind that only one “Periodic service discovery” rule per server can be active at the same time (first matching rule defines the parameter).
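The first-match behaviour can be illustrated with a small Python sketch. The `Host` and `Rule` classes, the folder names and the interval values are hypothetical stand-ins, not Checkmk's internal rule representation; the sketch only shows that rules are evaluated in order and the first matching one wins, so a broad rule placed above a specific one hides it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Host:
    name: str
    folder: str  # hypothetical grouping, e.g. "windows" or "linux"


@dataclass
class Rule:
    # Hypothetical rule: a condition on the host plus the value it sets.
    condition: Callable[[Host], bool]
    interval_minutes: int


def effective_interval(host: Host, rules: List[Rule]) -> Optional[int]:
    """Return the value of the first rule whose condition matches, else None."""
    for rule in rules:
        if rule.condition(host):
            return rule.interval_minutes  # later rules are never consulted
    return None


rules = [
    Rule(lambda h: h.folder == "windows", 120),
    Rule(lambda h: h.folder == "linux", 180),
    Rule(lambda h: True, 240),  # catch-all; must come last or it hides the others
]

print(effective_interval(Host("srv1", "linux"), rules))  # 180
```

If the catch-all rule were moved to the top of the list, every host would get 240 regardless of its folder.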

Hi

I tried setting a rule for each folder in WATO, but I am still getting high CPU load.

How many hosts are monitored on your system, and what type are they mostly (servers vs. SNMP devices)?

What is strange is that you wrote you use CMK 2, but the screenshot of the load is from a CMK 1.6 or older system.

Hi

I have 314 hosts:
25 AWS instances
91 SNMP devices
57 Linux servers
94 HTTP/HTTPS-only checks
40 Windows servers
7 ping-only

The v2 setup is a test system; I am monitoring it from our live system, which is v1.2.6p12.

The runtime of a single discovery is important in your case.
You have around 200 real hosts and some AWS instances.
If the discovery runs every two hours and all hosts start at the same time (which should only happen right after discovery is first activated), the system triggers these 200-odd active checks at once. This is a significant load.
What is the normal check interval on your system: the default 1-minute interval?
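To illustrate why simultaneous starts hurt, here is a small Python sketch that counts how many discoveries overlap at the worst moment. It assumes 200 hosts, a runtime of 10 seconds per discovery and start times either all at zero or spread evenly over the 2-hour interval; it does not model how the monitoring core actually schedules the discovery checks.

```python
def peak_concurrency(start_offsets, runtime):
    """Max number of discoveries running at the same instant.

    Each discovery occupies the half-open interval [start, start + runtime).
    """
    events = []
    for start in start_offsets:
        events.append((start, 1))             # discovery begins
        events.append((start + runtime, -1))  # discovery ends
    # At equal times, process ends (-1) before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


hosts = 200
runtime = 10.0           # seconds per discovery (assumed upper estimate)
interval = 2 * 60 * 60   # "every 2 hours", in seconds

simultaneous = [0.0] * hosts
staggered = [i * interval / hosts for i in range(hosts)]  # one start every 36 s

print(peak_concurrency(simultaneous, runtime))  # 200
print(peak_concurrency(staggered, runtime))     # 1
```

The total work is identical in both cases; only the peak differs, which is what drives the CPU to 100%.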

What I don’t know, as I have no AWS systems, is the normal runtime of the AWS special agent.

If a complete discovery takes between 5 and 10 seconds per host, one discovery run needs around 1000 to 2000 CPU seconds. You can measure the time per service discovery on the command line with “time cmk --check-discovery hostname”.
2 cores → roughly 8 to 17 minutes at 100% usage, with a correspondingly high load.
This only happens if all checks trigger at the same time.
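The estimate above can be reproduced with a few lines of Python. Like the estimate, it treats the whole per-host runtime as CPU time (real discoveries also spend time waiting on SNMP or HTTP responses), and it additionally shows the average utilization if the same work is spread evenly over the 2-hour interval. The 200 hosts and 2 cores are the figures from this thread.

```python
def discovery_cost(hosts, seconds_per_host, cores):
    """Return (total CPU seconds, wall-clock minutes if everything runs at once)."""
    cpu_seconds = hosts * seconds_per_host
    wall_minutes = cpu_seconds / cores / 60
    return cpu_seconds, wall_minutes


def average_utilization(cpu_seconds, interval_seconds, cores):
    """Fraction of total CPU capacity used if the work is spread evenly."""
    return cpu_seconds / (interval_seconds * cores)


INTERVAL = 2 * 60 * 60  # "every 2 hours", in seconds

for per_host in (5, 10):
    cpu, minutes = discovery_cost(200, per_host, 2)
    util = average_utilization(cpu, INTERVAL, 2)
    print(f"{per_host} s/host: {cpu} CPU s, ~{minutes:.1f} min burst, ~{util:.0%} if spread out")
# 5 s/host: 1000 CPU s, ~8.3 min burst, ~7% if spread out
# 10 s/host: 2000 CPU s, ~16.7 min burst, ~14% if spread out
```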

Hi

The average runtime for the AWS systems is 3 seconds.
I have the Periodic service discovery rules, but did not set times on them.
I set the normal check interval to 5 minutes to see whether that was causing the high CPU load, but I would like to get it back to 1 minute.
I will try setting a different time for each rule.

Is there a way to set a time for a rule to run?
I can only find “Never do discovery or activate changes in the following time ranges”.
To run the rule every 2 hours at a set time, would I have to add a lot of time ranges?
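Since the rule only lets you exclude time ranges, the allowed windows have to be expressed as their complement. The Python sketch below computes the ranges to enter for a given set of allowed windows. It assumes the rule takes “HH:MM” start/end pairs and, following the answer later in this thread, that the last range of the day must end at 23:59; the function names are hypothetical helpers, not part of Checkmk.

```python
def to_minutes(hhmm):
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def to_hhmm(minutes):
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def excluded_ranges(allowed, day_end="23:59"):
    """Return the complement of the allowed (start, end) windows within one day.

    Windows are "HH:MM" strings and are assumed not to overlap.
    """
    result = []
    cursor = 0  # minutes since midnight covered so far
    for start, end in sorted(allowed, key=lambda w: to_minutes(w[0])):
        s, e = to_minutes(start), to_minutes(end)
        if s > cursor:
            result.append((to_hhmm(cursor), to_hhmm(s)))
        cursor = max(cursor, e)
    if cursor < to_minutes(day_end):
        result.append((to_hhmm(cursor), day_end))
    return result


# Allow discovery only between 01:00 and 01:30.
print(excluded_ranges([("01:00", "01:30")]))
# [('00:00', '01:00'), ('01:30', '23:59')]

# A 15-minute window every 2 hours needs 13 excluded ranges.
windows = [(f"{h:02d}:00", f"{h:02d}:15") for h in range(1, 24, 2)]
print(len(excluded_ranges(windows)))  # 13
```

So yes, a fixed slot every 2 hours means a dozen or more ranges per rule; a single nightly window needs only two.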

Can anyone help with this error I am getting?
I am trying to set the discovery to run at 1 am.

The red highlighted field should be 23:59 to fix the error message.

Hi

Many thanks, I think I had a bit of a brain block :man_facepalming:

Hi

So far so good: I have got the CPU load down, but something still runs once a day and takes the server down.
I do not have any rules set for once a day, and I cannot see anything in the global settings.
It is something that starts when the site is started. Is there something that runs daily?