Mk_oracle plugin problem

CMK version: 2.0.0p19
OS version Agent: CentOS Linux release 7.9.2009 (Core)
OS version Server: AlmaLinux release 8.5 (Arctic Sphynx)

When running the plugin on the command line, the output is as expected, it takes only a second, and the exit status is 0:

MK_CONFDIR=/etc/check_mk MK_VARDIR=/var/lib/check_mk_agent /usr/lib/check_mk_agent/plugins/mk_oracle

It seems the plugin does not provide any output when executed via check-mk-agent. Only an empty file /var/lib/check_mk/cache/oracle_SID.cache.new is created.
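
A quick way to look for such leftover files is sketched below. The helper name `find_empty_new_caches` is made up for illustration, and the directory is simply passed in as an argument, so nothing here depends on how mk_oracle itself names or places its cache files.

```shell
#!/bin/bash
# find_empty_new_caches CACHE_DIR
# Hypothetical helper (not part of Checkmk): list zero-byte "*.cache.new"
# files directly inside CACHE_DIR.
find_empty_new_caches() {
    local cache_dir="$1"
    find "$cache_dir" -maxdepth 1 -type f -name '*.cache.new' -size 0 2>/dev/null
}

# Example call with the cache path mentioned in this post:
# find_empty_new_caches /var/lib/check_mk/cache
```

An empty `.new` file that stays around across several agent runs suggests that the background job which should fill it never finished.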

If CACHE_MAXAGE=0 is set in /etc/check_mk/mk_oracle.cfg, then it works, but I think no async processing is done in that case.

Has anyone seen this problem?

Hi @uleodolter,

what does the output of MK_CONFDIR=/etc/check_mk /usr/lib/check_mk_agent/plugins/mk_oracle -t (sensitive data included) look like?
I have never set the vardir variable for this call before, only the confdir.

Is there any output related to the mk_oracle plugin in the agent call, or are there no oracle sections at all?

When running just the connection test (option -t), I get empty sections; I think this is intended. When mk_oracle is started without any option or with option -l, I get the full Oracle output. There is only a problem when it is executed via the agent.

# MK_CONFDIR=/etc/check_mk /usr/lib/check_mk_agent/plugins/mk_oracle -t
<<<oracle_instance>>>
<<<oracle_sessions>>>
<<<oracle_logswitches>>>
<<<oracle_undostat>>>
<<<oracle_recovery_area>>>
<<<oracle_processes>>>
<<<oracle_recovery_status>>>
<<<oracle_longactivesessions>>>
<<<oracle_dataguard_stats>>>
<<<oracle_performance>>>
<<<oracle_locks>>>
<<<oracle_systemparameter>>>
<<<oracle_tablespaces>>>
<<<oracle_rman>>>
<<<oracle_jobs>>>
<<<oracle_resumable>>>
<<<oracle_iostats>>>
<<<oracle_instance>>>
<<<oracle_processes>>>
<<<oracle_asm_diskgroup>>>

---login----------------------------------------------------------------
    Operating System:       Linux
    ORACLE_HOME (oratab):   /opt/app/oracle/product/12r1
    Logincheck to Instance: sid1
    Version:                12.1
    Login ok User:	    CHECKMK on hostname Instance sid1
    SYNC_SECTIONS:          instance
sessions
logswitches
undostat
recovery_area
processes
recovery_status
longactivesessions
dataguard_stats
performance
locks
systemparameter
    ASYNC_SECTIONS:         tablespaces
rman
resumable
iostats
------------------------------------------------------------------------

Maybe you have to export the MK_* environment variables so that they can also be used by the async processes:

export MK_CONFDIR=/etc/check_mk; export MK_VARDIR=/var/lib/check_mk_agent; ...
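
The difference between a variable that is merely set and one that is exported can be checked with a small sketch like the following. It uses only `sh` and the variable name from this thread; nothing in it is specific to Checkmk, and the helper name `child_sees` is made up.

```shell
#!/bin/bash
# Print the value of MK_CONFDIR as seen by a child process.
child_sees() {
    sh -c 'echo "${MK_CONFDIR:-<unset>}"'
}

unset MK_CONFDIR
MK_CONFDIR=/etc/check_mk      # set in this shell only, not exported
not_exported=$(child_sees)

export MK_CONFDIR             # now part of the environment of all children
exported=$(child_sees)

echo "without export: $not_exported"
echo "with export:    $exported"
```

Note that the `VAR=value command` prefix form used earlier in the thread also places the variable in the environment of that command, and child processes inherit it from there, so an explicit `export` only matters when the variable is assigned on a line of its own.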

The output on the command line looks like this (only the beginning; there is a lot more information for performance and tablespaces):

# /usr/lib/check_mk_agent/plugins/mk_oracle
<<<oracle_instance>>>
<<<oracle_sessions>>>
<<<oracle_logswitches>>>
<<<oracle_undostat>>>
<<<oracle_recovery_area>>>
<<<oracle_processes>>>
<<<oracle_recovery_status>>>
<<<oracle_longactivesessions>>>
<<<oracle_dataguard_stats>>>
<<<oracle_performance>>>
<<<oracle_locks>>>
<<<oracle_systemparameter>>>
<<<oracle_tablespaces>>>
<<<oracle_rman>>>
<<<oracle_jobs>>>
<<<oracle_resumable>>>
<<<oracle_iostats>>>
<<<oracle_instance>>>
<<<oracle_processes>>>
<<<oracle_asm_diskgroup>>>
<<<oracle_instance:sep(124)>>>
PRM4|12.1.0.2.0|OPEN|ALLOWED|STARTED|3617138|2791881398|ARCHIVELOG|PRIMARY|NO|PRM4|080820170947|FALSE|0||0|||||-1|0
<<<oracle_sessions:sep(124)>>>
PRM4|48|776|-1
<<<oracle_logswitches:sep(124)>>>
PRM4|1
<<<oracle_undostat:sep(124)>>>
PRM4|272|3|2030|1250|0
<<<oracle_recovery_area:sep(124)>>>
<<<oracle_processes:sep(124)>>>
PRM4|81|500
...

Hi @uleodolter,

So in general the plugin itself doesn’t have a problem.
Can you show the contents of your plugin directory and cache directory? I just want to see whether something is wrong with the permissions or the executable flag.

I have now been able to fix the problem locally by setting MK_ORA_DEBUG=true in /usr/lib/check_mk_agent/plugins/mk_oracle. After that change the cache file is created correctly and updated whenever CACHE_MAXAGE expires. After looking into the plugin, I found that MK_ORA_DEBUG is used in only one place, where the cache file is generated, so I suspect the problem is in this if/else statement:

    # Cache file outdated and new job not yet running? Start it
    if [ -z "$use_cache_file" ] && [ ! -e "${cache_file}.new" ]; then
        if $MK_ORA_DEBUG; then
            echo "set -o noclobber; exec > \"${cache_file}.new\" || exit 1; ${cmd_name} && mv \"${cache_file}.new\" \"${cache_file}\" || rm -f \"${cache_file}\" \"${cache_file}.new\"" | /bin/bash
        else
            # When the command fails, the output is thrown away
            echo "set -o noclobber; exec > \"${cache_file}.new\" || exit 1; ${cmd_name} && mv \"${cache_file}.new\" \"${cache_file}\" || rm -f \"${cache_file}\" \"${cache_file}.new\"" | nohup /bin/bash >/dev/null 2>&1 &
        fi
    fi
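
The file handling in the snippet above can be tried out in isolation. The sketch below reuses the quoted command string in the foreground, in a temporary directory, with `echo` and `false` standing in for `${cmd_name}`. The helper `run_cached` is hypothetical and not part of mk_oracle, and error messages are silenced here only to keep the demo output short.

```shell
#!/bin/bash
# run_cached CACHE_FILE COMMAND
# Hypothetical helper: runs the quoted cache pattern in the foreground.
run_cached() {
    local cache_file="$1" cmd_name="$2"
    echo "set -o noclobber; exec > \"${cache_file}.new\" || exit 1; ${cmd_name} && mv \"${cache_file}.new\" \"${cache_file}\" || rm -f \"${cache_file}\" \"${cache_file}.new\"" | /bin/bash 2>/dev/null
}

demo_dir=$(mktemp -d)

# 1. The command succeeds: its output ends up in the cache file and the
#    .new file is renamed away.
run_cached "$demo_dir/ok.cache" "echo section-data"

# 2. The command fails: both the cache file and the .new file are removed.
run_cached "$demo_dir/fail.cache" "false"

# 3. A leftover .new file already exists: noclobber refuses the redirect,
#    the shell exits, and the command is never run.
: > "$demo_dir/stale.cache.new"
run_cached "$demo_dir/stale.cache" "echo section-data"

ls -1 "$demo_dir"
```

In the quoted plugin code both branches pipe the same command string into bash; the non-debug branch differs only in that it detaches the job with `nohup ... &` and discards its output. That points at the detached background process, rather than the file handling itself, as the place to look.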

I had the same problem, and it was connected to the environment variable TNS_ADMIN. mk_oracle.sh looks for the sqlnet.ora file in that location; if the file is absent there, mk_oracle.sh falls back to its config directory (/etc/check_mk). Either set this variable to the proper location (/opt/oracle/product/12cR2/db/network/admin in my case) or copy the sqlnet.ora file to MK_CONFDIR (/etc/check_mk).
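
To see which of the two locations would be picked up on a given host, a check along these lines may help. It follows the lookup order as described in this post, not the plugin's actual code, and `locate_sqlnet_ora` is a hypothetical helper name.

```shell
#!/bin/bash
# locate_sqlnet_ora
# Hypothetical helper (not part of mk_oracle): print the first directory
# that contains sqlnet.ora, checking TNS_ADMIN first and MK_CONFDIR second.
# The /etc/check_mk default is the config directory used in this thread.
locate_sqlnet_ora() {
    local dir
    for dir in "${TNS_ADMIN:-}" "${MK_CONFDIR:-/etc/check_mk}"; do
        if [ -n "$dir" ] && [ -f "$dir/sqlnet.ora" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "sqlnet.ora not found in TNS_ADMIN or MK_CONFDIR" >&2
    return 1
}

# Example call:
# TNS_ADMIN=/opt/oracle/product/12cR2/db/network/admin locate_sqlnet_ora
```

Running it once from an interactive shell and once with the environment the agent uses can show whether TNS_ADMIN differs between the two.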

Hi @uleodolter !
This problem may be related to async issues with systemd: "check_mk_agent: Fix issues with systemd"
Can you try to run the entire mk_oracle plugin asynchronously? Just move it to a subfolder named after the desired seconds to cache, e.g., 60: /usr/lib/check_mk_agent/plugins/60/mk_oracle
When using the agent bakery, you can also configure the ruleset “Set cache age for plugins (UNIX)”.
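
For a manually installed plugin, the move described above could look like the following sketch. `make_plugin_async` is a hypothetical helper, and the plugin directory is passed in as an argument so that it can be tried on a copy first.

```shell
#!/bin/bash
# make_plugin_async PLUGIN_DIR PLUGIN_NAME SECONDS
# Hypothetical helper: move a plugin into a subdirectory named after the
# cache interval, e.g. plugins/mk_oracle -> plugins/60/mk_oracle.
make_plugin_async() {
    local plugin_dir="$1" plugin="$2" seconds="$3"
    if [ ! -f "$plugin_dir/$plugin" ]; then
        echo "no such plugin: $plugin_dir/$plugin" >&2
        return 1
    fi
    mkdir -p "$plugin_dir/$seconds" &&
        mv "$plugin_dir/$plugin" "$plugin_dir/$seconds/$plugin" &&
        chmod +x "$plugin_dir/$seconds/$plugin"
}

# Example call with the paths from this thread (run as root):
# make_plugin_async /usr/lib/check_mk_agent/plugins mk_oracle 60
```

Moving the file rather than copying it avoids having the plugin executed twice, once synchronously and once from the subfolder.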

Please let us know if it fixes the problem, so we can mainline the fix.

@AndiU I tried what you suggested, but it didn’t work. I also restarted agent-async and agent-ctl afterwards, and nothing changed. If I run the agent manually, the sections for the stale services are missing from the output as well… So now I get to debug scripts again, as I have done so often lately…
I also wonder how stale services are detected by the monitoring in general… because if a service is not working, it should be CRIT and raise an error.

@AndiU seems like now it is working again… took some time to show up in the GUI

On another Oracle host I was wondering why many services are missing… so I ran the mk_oracle plugin manually and saw that it does return the RMAN section (oracle_rman), for example. But the cmk-agent-ctl dump doesn’t have this section…
The general problem is also that I expect every service the script provides to work, yet there is not even an error saying that this information could not be retrieved. So maybe I haven’t really understood yet how to work with service discovery…
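
One way to pin down exactly which sections go missing is to save both outputs to files and compare them. The sketch below is an assumption-laden helper, not a Checkmk tool: `sections_with_data` and `missing_sections` are made-up names, and a section only counts if at least one non-empty line follows its header, because the plugin prints empty headers for all sections up front.

```shell
#!/bin/bash
# sections_with_data: read agent or plugin output on stdin and print the
# names of all sections that contain at least one non-empty data line.
# Header options such as ":sep(124)" are stripped from the name.
sections_with_data() {
    awk '
        /^<<<.*>>>$/ {
            name = $0
            sub(/^<<</, "", name); sub(/>>>$/, "", name); sub(/:.*/, "", name)
            next
        }
        name != "" && NF > 0 { seen[name] = 1 }
        END { for (s in seen) print s }
    ' | sort
}

# missing_sections PLUGIN_OUTPUT_FILE AGENT_OUTPUT_FILE
# Print sections that have data in the first file but not in the second.
# Returns non-zero when nothing is missing (the exit status of grep).
missing_sections() {
    local agent_sections
    agent_sections=$(sections_with_data < "$2")
    sections_with_data < "$1" | grep -vxF -e "$agent_sections"
}

# Example, using the commands mentioned in this thread:
# MK_CONFDIR=/etc/check_mk /usr/lib/check_mk_agent/plugins/mk_oracle > /tmp/plugin.txt
# cmk-agent-ctl dump > /tmp/agent.txt
# missing_sections /tmp/plugin.txt /tmp/agent.txt
```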

Okay, now I’m a bit confused :slight_smile:

@AndiU seems like now it is working again… took some time to show up in the GUI

When you wrote this, did you run the plugin asynchronously, i.e. following my suggestion to put it under /usr/lib/check_mk_agent/plugins/60/mk_oracle? Or did you already roll it back?

On another Oracle host I was wondering why many services are missing… so I ran the mk_oracle plugin manually and saw that it does return the RMAN section (oracle_rman), for example. But the cmk-agent-ctl dump doesn’t have this section…

So what’s the setup on this host? Is the mk_oracle plugin configured to run asynchronously?

For the first host, yes, I ran it as you said. What confuses me is that the interval for each service is still different: the RMAN checks still have a cache of 10 minutes. So this 60-second setting probably only controls how often the plugin as a whole runs.

For the second issue I didn’t put it in a folder to let it run asynchronously.

So this 60-second setting probably only controls how often the plugin as a whole runs.

Yes! The mk_oracle plugin does its own caching. The only reason to run the entire plugin in asynchronous/cached mode is that otherwise it cannot fork new (long-running) processes.
That’s because the agent is (usually) invoked via a systemd socket on Linux, and systemd (in our setup) watches all forked processes and kills them as soon as the main process (i.e. the synchronous agent call) exits. Back when mk_oracle was first written this was not a problem, because xinetd was the common super server then, and it doesn’t care about forked processes.

It even seems ok to me that the cmk-agent-ctl dump doesn’t have the section, as it calls the agent via a local systemd socket.

Ah ok, so as long as it is not fixed we need that workaround for the Oracle plugin?! Ok, then I will change it there.

I haven’t understood the second part about cmk-agent-ctl yet. Why is it ok that the section isn’t there when I want to debug it via that dump command?! Or do you just mean that this should be solved by running it asynchronously?

Yes, I just wanted to say that the cmk-agent-ctl dump command is also affected by this misbehavior. Of course the dump is intended to show all sections properly.