check-mk-agent-async.service with 600 Seconds subfolder - systemd
# Created by Check_MK Agent Bakery.
# This file is managed via WATO, do not edit manually or you
# lose your changes next time when you update the agent.
# Connection and authentication
DBUSER='/:::::'
DBUSER_cdb1='c##checkmk:*::localhost:1521:cdb1'
DBUSER_cdb2='c##checkmk:*::localhost:1522:cdb2'
ASMUSER='/::SYSASM:::'
# Sections to run in foreground and wait for the result
SYNC_SECTIONS='instance performance systemparameter processes sessions longactivesessions logswitches undostat recovery_area recovery_status dataguard_stats locks'
# Sections to run in the background, at a slower interval cached
ASYNC_SECTIONS='tablespaces rman jobs resumable'
# Sections to run in foreground for ASM
SYNC_ASM_SECTIONS='instance processes'
# Sections to run in the background for ASM
ASYNC_ASM_SECTIONS='asm_diskgroup'
# Sections disabled for some selected SIDs
EXCLUDE_cdb1='jobs'
EXCLUDE_cdb2='jobs'
# Cache time (i.e. check interval) for async sections
CACHE_MAXAGE=600
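To make the interplay of ASYNC_SECTIONS and the per-SID EXCLUDE_ variables easier to see, here is a minimal sketch. The helper effective_sections is hypothetical and not part of mk_oracle. It assumes that EXCLUDE_<SID> holds a space-separated list of section names to drop for that SID, as the comment in the configuration suggests; the real plugin may treat the variable differently.

```shell
#!/bin/sh
# Hypothetical helper, not part of mk_oracle: print the sections from a list
# that remain for a SID after removing everything named in EXCLUDE_<SID>.
effective_sections() {
    sid=$1        # e.g. cdb1
    sections=$2   # e.g. "$ASYNC_SECTIONS"
    # Look up EXCLUDE_<sid>; an unset variable yields an empty string.
    eval "excluded=\${EXCLUDE_${sid}:-}"
    result=''
    for section in $sections; do
        case " $excluded " in
            *" $section "*) ;;                          # excluded for this SID
            *) result="${result:+$result }$section" ;;  # keep the section
        esac
    done
    printf '%s\n' "$result"
}

# Values copied from the configuration above.
ASYNC_SECTIONS='tablespaces rman jobs resumable'
EXCLUDE_cdb1='jobs'

effective_sections cdb1 "$ASYNC_SECTIONS"
```

Under these assumptions, cdb1 would be left with tablespaces, rman and resumable as background sections, all cached for CACHE_MAXAGE seconds.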
As part of this werk, there was an additional step for systemd-based systems that you need to perform regarding mk_oracle.
Ideally, you only need to move mk_oracle from /usr/lib/check_mk_agent/mk_oracle to /usr/lib/check_mk_agent//mk_oracle. The recommended number as per the werk is 60.
In your case, you have chosen 300. I have some more questions:
Are there any other plugins/local checks executed by the agent?
What is the normal check interval/retry check interval of the Check_MK service check of this host?
Is any “Service check timeout” rule defined for the Check_MK service check?
Can you share the output of ls -l /var/lib/check_mk_agent/cache?
Is it possible to reproduce the problem with 2 CDBs running on a single server?
There is no particular reason for 300 seconds (maybe to avoid producing too much traffic).
We defined no specific retry interval for this service.
1 machine with one database and 3 containers
1 machine with two databases and 2 containers each
1 machine with three databases and 2 containers each
[root@oralx3 300]# ls /var/lib/check_mk_agent/cache/ -la
total 28
drwxr-xr-x. 2 root root 212 May 12 13:52 .
drwxr-xr-x. 9 root root 4096 May 12 13:16 ..
-rw-r--r-- 1 root root 472 May 12 13:52 chrony.cache
-rw-r--r-- 1 root root 7016 May 12 13:48 oracle_cdb1.cache
-rw-r--r-- 1 root root 3495 May 8 15:10 oracle_cdb2.cache
-rw-r--r-- 1 root root 71 May 12 13:48 oracle_cdb2.cache.fail
-rw-r--r-- 1 root root 0 May 12 13:48 oracle_cdb2.cache.new.1415344
-rw-r--r-- 1 root root 392 May 12 13:16 plugins_cmk-update-agent.cache
-rw-r--r-- 1 root root 0 May 12 13:48 plugins_mk_oracle.cache
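In the listing above, oracle_cdb2.cache was last written on May 8 while the other files are from May 12, and both the .new file and plugins_mk_oracle.cache are empty. The following is a rough sketch of how such a directory could be scanned for these symptoms. The function check_cache_dir and its output labels are hypothetical and not shipped with the agent, and it assumes GNU stat (stat -c %Y) is available, as on most Linux hosts.

```shell
#!/bin/sh
# Hypothetical diagnostic, not shipped with the agent: flag cache files that
# are empty, older than the cache age, or leftover temporary/failure files.
check_cache_dir() {
    dir=$1       # cache directory to inspect
    maxage=$2    # maximum accepted age in seconds, e.g. CACHE_MAXAGE
    now=$(date +%s)
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        case "$name" in
            *.cache.new.*) echo "LEFTOVER $name"; continue ;;
            *.cache.fail)  echo "FAILED $name"; continue ;;
        esac
        if [ ! -s "$f" ]; then
            echo "EMPTY $name"
            continue
        fi
        mtime=$(stat -c %Y "$f")   # modification time in epoch seconds (GNU stat)
        age=$((now - mtime))
        if [ "$age" -gt "$maxage" ]; then
            echo "STALE $name (${age}s old)"
        fi
    done
    return 0
}
```

A call such as check_cache_dir /var/lib/check_mk_agent/cache 600 would then print one line per suspicious file and nothing for files that look healthy.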
I changed the value from 300 to 60, but the error is still showing up.
As a test, can you move the oracle_cdb* and plugins_mk_oracle* files to another folder, restart the cmk-agent-ctl-daemon.service, and check the content of /var/lib/check_mk_agent/cache after 2-3 check intervals?
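The test described above could be scripted roughly as follows. This is only a sketch: the function move_oracle_cache and the backup directory /root/cache-backup are hypothetical names chosen for illustration, while the file patterns and the service name are taken from the post. The restart and the final listing are left as comments so the block has no side effects when sourced.

```shell
#!/bin/sh
# Hypothetical helper: move the mk_oracle cache files out of the cache
# directory so the agent has to recreate them from scratch.
move_oracle_cache() {
    cache_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in "$cache_dir"/oracle_cdb* "$cache_dir"/plugins_mk_oracle*; do
        [ -e "$f" ] || continue          # skip patterns that matched nothing
        mv -- "$f" "$backup_dir"/ || return 1
    done
}

# Intended usage on the monitored host (run as root):
#   move_oracle_cache /var/lib/check_mk_agent/cache /root/cache-backup
#   systemctl restart cmk-agent-ctl-daemon.service
#   # wait 2-3 check intervals, then:
#   ls -l /var/lib/check_mk_agent/cache
```

Moving instead of deleting keeps the old cache files available for later comparison.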
-rw-r--r-- 1 root root 472 May 16 07:00 chrony.cache
-rw-r--r-- 1 root root 392 May 16 07:00 plugins_cmk-update-agent.cache
I updated to the latest version with no success.
It's clear: the last DB in the list fails.
On the server with one DB, that DB fails.
On the server with two DBs, the 2nd fails.
On the server with three DBs, the 3rd fails.
So something is probably missing at the end.
I am seeing the same problem with 2.4.0p1 against 23ai database with 1 CDB containing 1 PDB.
It seems to be related to the async checks only. I moved the “tablespaces rman resumable locks” checks to the sync section and it seems to be working fine.
Running the individual check SQLs is very quick. It seems to be related to the plugin not being able to read the cache files generated by the async run.
Good to hear that you have a working monitoring again.
I will follow up in-house on whether the original issue needs to be fixed in the source code and/or the documentation.
I also have the problem with Oracle 23ai and the async checks from the mk_oracle plugin.
Master 2.1.0p43
Slave 2.2.0p41
Host is attached on slave
Host configuration:
Agent 2.2.0p41 with systemd
mk_oracle plugin version 2.2.0p41
mk_oracle plugin = /usr/lib/check_mk_agent/plugins/60
I can’t put all the services into SYNC because then mk_oracle doesn’t run.
With ASYNC, I see in /var/lib/check_mk_agent/cache that there’s a database cache file there. After 10 minutes, a cache file with .new is created, and about 12 minutes later, the cache file .new is transferred to the other cache file. The ASYNC services are then in the Stale state for 10 minutes.
When can we expect a fix for this bug? It’s very important for us because many customers are switching to 23ai.
This is a bug. We can reproduce it locally. Moving to SYNC is not a solution, maybe a temporary workaround. The ASYNC section should work as well. We are looking into it. More updates will follow soon.
We have deployed a second PDB in the CDB for a customer, and we can no longer run ASYNC checks. The workaround of setting all checks to SYNC no longer works either. We do not use custom checks.
When can we expect a solution to this problem? This database is in production and highly critical.