Mk_oracle "Timed out plugin" for all CDBs starting with 2.4.0

holgermanz · May 12, 2025, 8:02am

2.4.0
latest Appliance

Agent plug-ins: 5,local checks: 1, Timed out Plugin(s):…

There was a topic in the past with the same problem. The solution from that topic, setting
MK_ORA_DEBUG=true
MK_ORA_LOGGING=true
also works with 2.4.

We never had this problem in the past, and the workaround is not really good (after baking new agents, I have to change the file manually again).

Best regards, Holger
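
Reapplying the two settings after each bake could be scripted instead of edited by hand. The following is only a minimal sketch: `ensure_setting` is a hypothetical helper, and the thread does not say which file the settings were added to, so the target path is an assumption the caller has to supply.

```shell
#!/bin/sh
# Hypothetical helper (not part of Checkmk): make sure KEY=VALUE is present
# in a shell-style config file. An existing assignment of KEY is rewritten,
# otherwise the line is appended. Requires GNU sed for "-i".
ensure_setting() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        # Replace the existing assignment in place.
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Key not present yet: append it.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Calling `ensure_setting <file> MK_ORA_DEBUG true` and `ensure_setting <file> MK_ORA_LOGGING true` after an agent update would restore the workaround. Running it twice leaves a single line per key, so it is safe to call from a post-install hook or a cron job.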

chauhan_sudhir · May 12, 2025, 9:51am

I have some questions:

On which OS version have you deployed the agent?
Is the Checkmk agent deployed as a systemd or xinetd service?
Can you share the /etc/check_mk/mk_oracle.cfg (of course, after masking the credentials)?

holgermanz · May 12, 2025, 11:07am

Hello,

Oracle Linux 8.5 and Oracle Linux 8.8
check-mk-agent-async.service with 600 Seconds subfolder - systemd

# Created by Check_MK Agent Bakery.
# This file is managed via WATO, do not edit manually or you
# lose your changes next time when you update the agent.

# Connection and authentication
DBUSER='/:::::'
DBUSER_cdb1='c##checkmk:*::localhost:1521:cdb1'
DBUSER_cdb2='c##checkmk:*::localhost:1522:cdb2'
ASMUSER='/::SYSASM:::'

# Sections to run in foreground and wait for the result
SYNC_SECTIONS='instance performance systemparameter processes sessions longactivesessions logswitches undostat recovery_area recovery_status dataguard_stats locks'

# Sections to run in the background, at a slower interval cached
ASYNC_SECTIONS='tablespaces rman jobs resumable'

# Sections to run in foreground for ASM
SYNC_ASM_SECTIONS='instance processes'

# Sections to run in the background for ASM
ASYNC_ASM_SECTIONS='asm_diskgroup'

# Sections disabled for some selected SIDs
EXCLUDE_cdb1='jobs'
EXCLUDE_cdb2='jobs'

# Cache time (i.e. check interval) for async sections
CACHE_MAXAGE=600

chauhan_sudhir · May 12, 2025, 11:10am

check-mk-agent-async.service with 600 Seconds subfolder - systemd

So mk_oracle is placed like this: /usr/lib/check_mk_agent/plugins/600/mk_oracle?

holgermanz · May 12, 2025, 11:17am

/usr/lib/check_mk_agent/plugins/300/mk_oracle

checkmk Agent rules>Oracle Databases
“Host uses systemd, Interval 5 Minutes.”

chauhan_sudhir · May 12, 2025, 11:41am

As part of this werk, there was an additional step for systemd-based systems that you need to perform regarding mk_oracle.

Ideally, you only need to move mk_oracle from /usr/lib/check_mk_agent/mk_oracle to /usr/lib/check_mk_agent//mk_oracle. The recommended number as per the werk is 60.

In your case, you have chosen 300. I have some more questions:

Are there any other plugins/local checks executed by the agent?
What is the normal check interval/retry check interval of the Check_MK service check of this host?
Is any “Service check timeout” rule defined for the Check_MK service check?
Can you share the output of: ls -l /var/lib/check_mk_agent/cache?
Is it possible to reproduce the problem with 2 CDBs running on a single server?

holgermanz · May 12, 2025, 12:26pm

There is no particular reason for 300 seconds (maybe to avoid producing too much traffic).
We defined no specific retry interval for this service.
1 machine with one database and 3 containers
1 machine with two databases and 2 containers each
1 machine with three databases and 2 containers each

[root@oralx3 300]# ls /var/lib/check_mk_agent/cache/ -la
total 28
drwxr-xr-x. 2 root root  212 May 12 13:52 .
drwxr-xr-x. 9 root root 4096 May 12 13:16 ..
-rw-r--r--  1 root root  472 May 12 13:52 chrony.cache
-rw-r--r--  1 root root 7016 May 12 13:48 oracle_cdb1.cache
-rw-r--r--  1 root root 3495 May  8 15:10 oracle_cdb2.cache
-rw-r--r--  1 root root   71 May 12 13:48 oracle_cdb2.cache.fail
-rw-r--r--  1 root root    0 May 12 13:48 oracle_cdb2.cache.new.1415344
-rw-r--r--  1 root root  392 May 12 13:16 plugins_cmk-update-agent.cache
-rw-r--r--  1 root root    0 May 12 13:48 plugins_mk_oracle.cache

I changed the value from 300 to 60, but the error is still showing up.
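
A listing like the one above can also be checked automatically. This is a rough sketch, not something shipped with the agent: `list_suspect_caches` is a hypothetical helper, and the file name patterns are taken only from the names visible in this listing. It relies on GNU find (`-maxdepth`, `-mmin`).

```shell
#!/bin/sh
# Hypothetical helper: report mk_oracle cache files that look stuck.
# $1 = cache directory, $2 = maximum expected age in minutes
#      (for example CACHE_MAXAGE divided by 60).
list_suspect_caches() {
    dir=$1 max_min=$2
    # Main cache files that were not refreshed within the expected interval.
    find "$dir" -maxdepth 1 -type f -name 'oracle_*.cache' -mmin "+$max_min" \
        | sed 's/^/stale: /'
    # Failure markers and temporary files left behind by a run.
    find "$dir" -maxdepth 1 -type f \
        \( -name 'oracle_*.cache.fail' -o -name 'oracle_*.cache.new.*' \) \
        | sed 's/^/leftover: /'
}
```

For example, `list_suspect_caches /var/lib/check_mk_agent/cache 10` would flag a cache file that is days old, as oracle_cdb2.cache is here. A `.new.*` file can also exist briefly while a run is still in progress, so a single hit is only a hint; one that never goes away is the interesting case.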

chauhan_sudhir · May 12, 2025, 12:38pm

As a test, can you move the oracle_cdb* and plugins_mk_oracle* files to another folder, restart the cmk-agent-ctl-daemon.service, and check the contents of /var/lib/check_mk_agent/cache after 2-3 check intervals?

Also, the agent + mk_oracle is 2.4 ?
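
The test described above could look roughly like this. `stash_oracle_caches` is a hypothetical helper written for illustration, and the backup directory is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: move the mk_oracle cache files out of the way
# so that the next plugin run has to recreate them from scratch.
# $1 = cache directory, $2 = directory to park the files in.
stash_oracle_caches() {
    cache_dir=$1 stash_dir=$2
    mkdir -p "$stash_dir" || return 1
    for f in "$cache_dir"/oracle_cdb* "$cache_dir"/plugins_mk_oracle*; do
        # Skip glob patterns that matched nothing.
        [ -e "$f" ] || continue
        mv -- "$f" "$stash_dir"/
    done
}
```

On the host this would be run as root, e.g. `stash_oracle_caches /var/lib/check_mk_agent/cache /root/mk_oracle_cache_backup`, followed by `systemctl restart cmk-agent-ctl-daemon.service` and, after 2-3 check intervals, `ls -l /var/lib/check_mk_agent/cache`.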

holgermanz · May 15, 2025, 8:20am

The Agent is also 2.4.
When removing the plugin, the cache directory only contains:
chrony.cache
plugins_cmk-update-agent.cache

nothing else.

chauhan_sudhir · May 15, 2025, 1:07pm

Can you share the output of: ls -l /var/lib/check_mk_agent/cache?

holgermanz · May 16, 2025, 5:04am

-rw-r--r-- 1 root root 472 May 16 07:00 chrony.cache
-rw-r--r-- 1 root root 392 May 16 07:00 plugins_cmk-update-agent.cache

I updated to the latest version with no success.

It's clear: the last DB in the list fails.
On the server with one DB, that one fails.
On the server with two DBs, the 2nd fails.
On the server with three DBs, the 3rd fails.

So something is probably missing at the end.

burgeau · May 18, 2025, 12:49am

I am seeing the same problem with 2.4.0p1 against a 23ai database with 1 CDB containing 1 PDB.

It seems to be related to the async checks only. I moved the “tablespaces rman resumable locks” checks to the sync section and it seems to be working fine.

The individual check SQL statements run very quickly when executed on their own. The problem seems to be that the plugin is not able to read the cache files generated by the async run.
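
For a config like the mk_oracle.cfg posted earlier in this thread, the combined sync list can be derived mechanically. This is only a sketch: `merge_sections` is a hypothetical helper, and the thread does not say how the plugin treats an empty ASYNC_SECTIONS, so only the merged SYNC_SECTIONS value is printed. A baked config file is also overwritten on the next agent update, as its header warns.

```shell
#!/bin/sh
# Hypothetical helper: print the union of two space-separated section
# lists, keeping the original order and dropping duplicates.
merge_sections() {
    merged=""
    for s in $1 $2; do
        case " $merged " in
            *" $s "*) ;;                          # already listed, skip
            *) merged="${merged:+$merged }$s" ;;  # append new section
        esac
    done
    printf '%s\n' "$merged"
}

# Values copied from the mk_oracle.cfg posted earlier in this thread.
SYNC='instance performance systemparameter processes sessions longactivesessions logswitches undostat recovery_area recovery_status dataguard_stats locks'
ASYNC='tablespaces rman jobs resumable'

# Print the line that would replace SYNC_SECTIONS in the config.
printf "SYNC_SECTIONS='%s'\n" "$(merge_sections "$SYNC" "$ASYNC")"
```

Sections excluded per SID (EXCLUDE_cdb1='jobs' in that config) are untouched by this; the helper only combines the two lists.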

holgermanz · May 22, 2025, 8:55am

Thank you Mark, after changing async checks to sync checks everything is working again.

HartmutLeister · May 22, 2025, 11:29am

Hello @holgermanz,

Good to hear that your monitoring is working again.
I will follow up in-house on whether the original issue needs to be fixed in the source code and/or the documentation.

Sunny Greetings
Hartmut

michael_kauschke · June 2, 2025, 7:47am

Hello @HartmutLeister,

I also have the problem with Oracle 23ai and the async checks from the mk_oracle plugin.
Master 2.1.0p43
Slave 2.2.0p41
Host is attached to the slave
Host configuration:
Agent 2.2.0p41 with systemd
mk_oracle plugin version 2.2.0p41
mk_oracle plugin = /usr/lib/check_mk_agent/plugins/60
I can’t put all the services into SYNC because then mk_oracle doesn’t run.
With ASYNC, I see in /var/lib/check_mk_agent/cache that there’s a database cache file there. After 10 minutes, a cache file with .new is created, and about 12 minutes later, the cache file .new is transferred to the other cache file. The ASYNC services are then in the Stale state for 10 minutes.
When can we expect a fix for this bug? It’s very important for us because many customers are switching to 23ai.

Regards,
Michael

chauhan_sudhir · June 3, 2025, 3:27pm

This is a bug. We can reproduce it locally. Moving to SYNC is not a solution, maybe a temporary workaround. The ASYNC section should work as well. We are looking into it. More updates will follow soon.

marcel.arentz · June 4, 2025, 12:30pm

For all affected people in this thread: Is this (only) happening on instances where you are also executing custom SQL queries?

tkriener · June 11, 2025, 5:39pm

I have no custom SQL queries, and I see the same issue with 2.4.0p3 and Oracle 19c.

marcel.arentz · June 12, 2025, 7:50am

Thank you for the confirmation. We’re still investigating the problem.

michael_kauschke · June 16, 2025, 12:03pm

We have deployed a second PDB in the CDB for a customer, and we can no longer run ASYNC checks. The workaround of setting all checks to SYNC no longer works either. We do not use custom checks.

When can we expect a solution to this problem? This database is in production and highly critical.