Periodic service discovery problems

Blue_Sky · November 2, 2021, 12:18pm

Hi

Check_MK RAW 2.0.0p12 CentOS 7

I hope someone can help or point me in the right direction.

Have set the following:

Global settings: Enable regular service discovery checks (deprecated) = unset

Service Rule:
Periodic service discovery = every 2 hours

Problems are:
CPU load goes to 100% and the server becomes unresponsive (number of vCPUs is 2).
Adding vCPUs makes no difference; it still goes to 100% even with 8 vCPUs.

After discovery has finished, the Check_MK Inventory check fails; rescheduling the check fixes it.
How do I set the discovery to run the checks on each host after discovery has finished?

LaSoe · November 2, 2021, 1:19pm

Have you tried splitting the “Periodic service discovery” rule into multiple rules for different groups, for example one for all Windows servers and one for all Linux servers?

Please keep in mind that only one “Periodic service discovery” rule per host can be active at the same time (the first matching rule defines the parameters).

Blue_Sky · November 3, 2021, 3:57pm

Hi

I tried setting the rule to run on each folder in WATO, but I am still getting high CPU.

andreas-doehler · November 3, 2021, 7:49pm

How many hosts are monitored in your system, and what type of hosts are they mostly (servers vs. SNMP devices)?

What is strange is that you wrote that you use CMK2, but the screenshot of the load is from a system running CMK 1.6 or older.

Blue_Sky · November 3, 2021, 9:31pm

Hi

I have 314 hosts:
25 AWS instances
91 SNMP
57 Linux servers
94 http/https only checks
40 Windows servers
7 ping only

The v2 setup is a test system; I am monitoring the test system from our live system, which is v1.2.6p12.

andreas-doehler · November 4, 2021, 9:30am

The runtime of a single discovery is the important factor in your case.
You have around 200 real hosts and some AWS instances.
If discovery starts every two hours and all hosts start at the same time (which should only happen after discovery is first activated), the system triggers these 200-odd active checks at once. That is a significant load.
What is the normal check interval on your system, the default 1 minute?

What I don’t know, as I have no AWS systems, is the normal runtime of the AWS special agent.

If a complete discovery takes between 5 and 10 seconds per host, one discovery run needs between 1000 and 2000 CPU seconds. You can measure the time per service discovery on the command line with “time cmk --check-discovery hostname”.
On 2 cores that means nearly 10 minutes at 100% usage, even at the lower estimate, plus a high load.
This only applies if all checks trigger at the same time.
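
The estimate above can be reproduced with a few lines of Python. This is only a rough sketch: the function name `discovery_wall_time` is made up for illustration, and it assumes every discovery starts at once, the work spreads perfectly over the cores, and nothing else is running on the server.

```python
def discovery_wall_time(hosts, seconds_per_host, cores):
    """Return (total CPU seconds, wall-clock minutes) for one discovery run.

    Assumes all discoveries start together and the cores are fully used,
    which is the worst case described above.
    """
    cpu_seconds = hosts * seconds_per_host
    wall_minutes = cpu_seconds / cores / 60
    return cpu_seconds, wall_minutes


# Roughly 200 real hosts, 5 to 10 seconds per discovery, 2 vCPUs.
for per_host in (5, 10):
    cpu, minutes = discovery_wall_time(hosts=200, seconds_per_host=per_host, cores=2)
    print(f"{per_host}s per host: {cpu} CPU seconds, about {minutes:.1f} min on 2 cores")
```

Replacing the per-host value with the figure measured by “time cmk --check-discovery hostname” should give a more realistic number for a given site.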

Blue_Sky · November 4, 2021, 1:15pm

Hi

The average run time for the AWS systems is 3 seconds.
I have the Periodic service discovery in rules, but did not set times on the rule.
I set the normal check interval to 5 minutes to see whether that was causing the high CPU,
but would like to get it back to 1 minute.
I will try setting a different time for each rule.

Blue_Sky · November 4, 2021, 1:34pm

Is there a way to set a time for a rule to run?
I can only find “Never do discovery or activate changes in the following time ranges”.
To run the rule every 2 hours at a set time, would I have to add a lot of time ranges?

Blue_Sky · November 8, 2021, 9:09am

Can anyone help with this? I get an error when
trying to set the discovery to run at 1am.

tosch · November 8, 2021, 11:42am

The red highlighted field should be 23:59 to fix the error message.
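
As a rough illustration of how the exclusion ranges for a single allowed window could be worked out, here is a small Python sketch. The helper `excluded_ranges` is hypothetical and not part of Checkmk; it assumes the form takes "HH:MM" values with 23:59 as the latest allowed end time, and that the allowed window does not cross midnight. Whether the boundary minute itself counts as excluded is not something this sketch can tell you.

```python
def excluded_ranges(allowed_start, allowed_end):
    """Return the (start, end) ranges of the day outside one allowed window.

    Times are zero-padded "HH:MM" strings, so plain string comparison
    orders them correctly. The day is assumed to end at 23:59.
    """
    day_start, day_end = "00:00", "23:59"
    if not (day_start <= allowed_start < allowed_end <= day_end):
        raise ValueError("window must satisfy 00:00 <= start < end <= 23:59")
    ranges = []
    if allowed_start > day_start:
        ranges.append((day_start, allowed_start))
    if allowed_end < day_end:
        ranges.append((allowed_end, day_end))
    return ranges


# Allow discovery only between 01:00 and 02:00.
print(excluded_ranges("01:00", "02:00"))
```

For a window from 01:00 to 02:00 this gives two ranges to enter in the rule: 00:00 to 01:00 and 02:00 to 23:59.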

Blue_Sky · November 8, 2021, 12:34pm

Hi

Many thanks, I think I had a bit of a brain block.

Blue_Sky · November 24, 2021, 9:50am

Hi

So far so good: I have got the CPU load down, but something still runs once a day
and takes the server down.
I do not have any rules set for once a day and cannot see anything in the global settings.
It is something that starts when the site is started. Is there a daily job that runs?
