Mk_oracle plugin problem

uleodolter · February 15, 2022, 1:24pm

CMK version: 2.0.0p19
OS version Agent: CentOS Linux release 7.9.2009 (Core)
OS version Server: AlmaLinux release 8.5 (Arctic Sphynx)

When running the plugin on the command line, the output is as expected, it takes only a second, and the exit status is 0:

MK_CONFDIR=/etc/check_mk MK_VARDIR=/var/lib/check_mk_agent /usr/lib/check_mk_agent/plugins/mk_oracle

It seems the plugin does not produce any output when executed via check-mk-agent. Only an empty file, /var/lib/check_mk/cache/oracle_SID.cache.new, is created.
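A quick way to check for such leftovers is to list the empty .cache.new files in the cache directory. This is only a sketch: list_empty_new_files is a made-up helper name, not part of Checkmk, and the directory in the usage line is simply the one mentioned in this post.

```shell
# Hypothetical helper (not part of Checkmk): print empty *.cache.new files
# below the given cache directory.
list_empty_new_files() {
    # -size 0 matches files whose size rounds up to zero 512-byte blocks,
    # i.e. files that are completely empty.
    find "$1" -type f -name '*.cache.new' -size 0
}

# Usage (path taken from the post, adjust to your setup):
#   list_empty_new_files /var/lib/check_mk/cache
```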

If CACHE_MAXAGE=0 is set in /etc/check_mk/mk_oracle.cfg, then it works, but I think no async processing is done in that case.

Anyone seen this problem?

tosch · February 15, 2022, 1:34pm

Hi @uleodolter,

What does the output of MK_CONFDIR=/etc/check_mk /usr/lib/check_mk_agent/plugins/mk_oracle -t (sensitive data included) look like?
I have never set the vardir variable for this call before, only the confdir.

Any output on the agent call related to the mk_oracle plugin or just no oracle sections at all?

uleodolter · February 15, 2022, 4:38pm

When running just the connection test (option -t), I get empty sections; I think this is intended. When mk_oracle is started without any option, or with option -l, I get the full Oracle output. The problem only occurs when it is executed via the agent.

# MK_CONFDIR=/etc/check_mk /usr/lib/check_mk_agent/plugins/mk_oracle -t
<<<oracle_instance>>>
<<<oracle_sessions>>>
<<<oracle_logswitches>>>
<<<oracle_undostat>>>
<<<oracle_recovery_area>>>
<<<oracle_processes>>>
<<<oracle_recovery_status>>>
<<<oracle_longactivesessions>>>
<<<oracle_dataguard_stats>>>
<<<oracle_performance>>>
<<<oracle_locks>>>
<<<oracle_systemparameter>>>
<<<oracle_tablespaces>>>
<<<oracle_rman>>>
<<<oracle_jobs>>>
<<<oracle_resumable>>>
<<<oracle_iostats>>>
<<<oracle_instance>>>
<<<oracle_processes>>>
<<<oracle_asm_diskgroup>>>

---login----------------------------------------------------------------
    Operating System:       Linux
    ORACLE_HOME (oratab):   /opt/app/oracle/product/12r1
    Logincheck to Instance: sid1
    Version:                12.1
    Login ok User:	    CHECKMK on hostname Instance sid1
    SYNC_SECTIONS:          instance
sessions
logswitches
undostat
recovery_area
processes
recovery_status
longactivesessions
dataguard_stats
performance
locks
systemparameter
    ASYNC_SECTIONS:         tablespaces
rman
resumable
iostats
------------------------------------------------------------------------

LaSoe · February 16, 2022, 6:24pm

Maybe you have to export the MK_* environment variables so that they can also be used by the async processes:

export MK_CONFDIR=/etc/check_mk; export MK_VARDIR=/var/lib/check_mk_agent; ...
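For comparison, a VAR=value prefix on a command line already places the variable in that command's environment, and processes started by that command inherit it. The sketch below illustrates this with two nested shells standing in for the plugin and a job it starts; confdir_seen_by_nested_shell is a hypothetical helper, not part of the agent.

```shell
# Hypothetical helper: run a shell inside a shell with MK_CONFDIR set only
# as a command prefix, and print what the innermost shell sees.
confdir_seen_by_nested_shell() {
    MK_CONFDIR="$1" sh -c 'sh -c "echo \"\${MK_CONFDIR:-unset}\""'
}

# Usage:
#   confdir_seen_by_nested_shell /etc/check_mk
```

So for a manual test run, the prefix form used earlier in the thread should behave like an explicit export as far as child processes are concerned.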

uleodolter · February 17, 2022, 6:04am

The output on the command line looks like this (only the beginning; there is a lot more information on performance and tablespaces):

# /usr/lib/check_mk_agent/plugins/mk_oracle
<<<oracle_instance>>>
<<<oracle_sessions>>>
<<<oracle_logswitches>>>
<<<oracle_undostat>>>
<<<oracle_recovery_area>>>
<<<oracle_processes>>>
<<<oracle_recovery_status>>>
<<<oracle_longactivesessions>>>
<<<oracle_dataguard_stats>>>
<<<oracle_performance>>>
<<<oracle_locks>>>
<<<oracle_systemparameter>>>
<<<oracle_tablespaces>>>
<<<oracle_rman>>>
<<<oracle_jobs>>>
<<<oracle_resumable>>>
<<<oracle_iostats>>>
<<<oracle_instance>>>
<<<oracle_processes>>>
<<<oracle_asm_diskgroup>>>
<<<oracle_instance:sep(124)>>>
PRM4|12.1.0.2.0|OPEN|ALLOWED|STARTED|3617138|2791881398|ARCHIVELOG|PRIMARY|NO|PRM4|080820170947|FALSE|0||0|||||-1|0
<<<oracle_sessions:sep(124)>>>
PRM4|48|776|-1
<<<oracle_logswitches:sep(124)>>>
PRM4|1
<<<oracle_undostat:sep(124)>>>
PRM4|272|3|2030|1250|0
<<<oracle_recovery_area:sep(124)>>>
<<<oracle_processes:sep(124)>>>
PRM4|81|500
...

tosch · February 17, 2022, 8:08am

Hi @uleodolter,

so in general the plugin doesn’t have a problem.
Can you show the contents of your plugin directory and cache directory? I just want to see whether something is wrong with the permissions or the executable flag.

uleodolter · February 28, 2022, 2:21pm

I have now been able to fix the problem locally by setting MK_ORA_DEBUG=true in /usr/lib/check_mk_agent/plugins/mk_oracle. After that change, the cache file is created correctly and updated every CACHE_MAXAGE. After looking into the plugin, I found that MK_ORA_DEBUG is used only once, where the cache file is generated, so I suspect the problem must be in this if/then/else/fi statement:

    # Cache file outdated and new job not yet running? Start it
    if [ -z "$use_cache_file" ] && [ ! -e "${cache_file}.new" ]; then
        if $MK_ORA_DEBUG; then
            echo "set -o noclobber; exec > \"${cache_file}.new\" || exit 1; ${cmd_name} && mv \"${cache_file}.new\" \"${cache_file}\" || rm -f \"${cache_file}\" \"${cache_file}.new\"" | /bin/bash
        else
            # When the command fails, the output is thrown away (ignored)
            echo "set -o noclobber; exec > \"${cache_file}.new\" || exit 1; ${cmd_name} && mv \"${cache_file}.new\" \"${cache_file}\" || rm -f \"${cache_file}\" \"${cache_file}.new\"" | nohup /bin/bash >/dev/null 2>&1 &
        fi
    fi
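The one-liner in both branches combines a lock file with a rename. Below is a foreground sketch of the same pattern, written as a standalone function; run_cached is a hypothetical name and is not taken from mk_oracle.

```shell
# Hypothetical sketch of the lock-and-rename pattern from the snippet above.
# The function body is a subshell, so noclobber and the redirection do not
# leak into the calling shell.
run_cached() (
    cache_file=$1
    shift
    set -o noclobber
    # With noclobber, ">" refuses to overwrite an existing file, so the
    # .new file doubles as a lock: a second job stops right here.
    exec > "${cache_file}.new" || exit 1
    # On success, move the fresh output into place; on failure, remove both.
    "$@" && mv "${cache_file}.new" "${cache_file}" ||
        rm -f "${cache_file}" "${cache_file}.new"
)

# Usage:
#   run_cached /tmp/demo.cache echo hello
```

As far as the quoted code shows, one consequence is that if the job is killed after creating the .new file but before the mv or rm runs, the empty .new file stays behind, and the outer check for "${cache_file}.new" then keeps any new job from starting.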

tszczesn · April 11, 2022, 10:51am

I had the same problem, and it was connected with the environment variable TNS_ADMIN. mk_oracle.sh looks there for the sqlnet.ora file; if the file is absent there, mk_oracle.sh falls back to its config directory (/etc/check_mk). Either set this variable to the proper location (/opt/oracle/product/12cR2/db/network/admin in my case) or copy the sqlnet.ora file to MK_CONFDIR (/etc/check_mk).
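To see which directory would end up supplying sqlnet.ora, a small check along these lines may help. It only mirrors the lookup order described in this post and is not taken from the plugin source; find_sqlnet_dir is a hypothetical helper.

```shell
# Hypothetical helper: print the directory that contains sqlnet.ora,
# preferring TNS_ADMIN ($1) over the config directory ($2).
find_sqlnet_dir() {
    tns_admin=$1
    confdir=$2
    if [ -n "$tns_admin" ] && [ -f "$tns_admin/sqlnet.ora" ]; then
        echo "$tns_admin"
    elif [ -f "$confdir/sqlnet.ora" ]; then
        echo "$confdir"
    else
        echo "no sqlnet.ora found in '$tns_admin' or '$confdir'" >&2
        return 1
    fi
}

# Usage:
#   find_sqlnet_dir "$TNS_ADMIN" /etc/check_mk
```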

AndiU · June 7, 2022, 7:01am

Hi @uleodolter !
This problem may be related to async issues with systemd: check_mk_agent: Fix issues with systemd
Can you try to run the entire mk_oracle plugin asynchronously? Just move it to a subfolder named after the desired seconds to cache, e.g., 60: /usr/lib/check_mk_agent/plugins/60/mk_oracle
When using the agent bakery, you can also configure the ruleset “Set cache age for plugins (UNIX)”.

Please let us know if it fixes the problem, so we can mainline the fix.
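On a host without the agent bakery, the move could be scripted roughly like this. It is a minimal sketch; make_plugin_async is a hypothetical helper name, and the paths in the usage line are the ones from this post.

```shell
# Hypothetical helper: move a plugin into a subfolder named after the
# cache interval in seconds, so the agent runs it asynchronously.
make_plugin_async() {
    plugin_dir=$1
    plugin=$2
    interval=$3
    mkdir -p "$plugin_dir/$interval" &&
        mv "$plugin_dir/$plugin" "$plugin_dir/$interval/$plugin"
}

# Usage:
#   make_plugin_async /usr/lib/check_mk_agent/plugins mk_oracle 60
```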

matze218 · June 10, 2022, 9:24am

@AndiU I tried what you suggested, but it didn’t work. I also restarted agent-async and agent-ctl afterwards, and nothing changed. If I run the agent manually, the services that are stale seem to be missing from the output anyway… So now I get to debug scripts again, as I have done so often lately…
I also wonder how stale services are recognized by the monitoring in general… because if a service is not working, it should be CRIT and throw an error.

matze218 · June 10, 2022, 9:37am

@AndiU seems like now it is working again… took some time to show up in the GUI

matze218 · June 10, 2022, 11:33am

On another Oracle host I was wondering why many services are missing… so I ran the mk_oracle Plugin manually and saw it got the section of RMAN (oracle_rman) for example. But the cmk-agent-ctl dump doesn’t have this section…
And the general problem is that I expect every service provided by the script to work, but there is not even an error saying that this information could not be retrieved. So maybe I haven’t really understood yet how to work with the service discovery…
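One way to make such a comparison less manual is to extract the section names from both outputs and diff them. The helpers below are a sketch with made-up names (list_sections, sections_missing_from); they only parse the <<<...>>> header lines.

```shell
# Hypothetical helper: read agent or plugin output on stdin and print the
# unique section names, without options such as ":sep(124)".
list_sections() {
    sed -n 's/^<<<\([^:>]*\).*>>>$/\1/p' | sort -u
}

# Hypothetical helper: print the sections listed in file $1 but not in $2.
# Both files must be sorted, which list_sections already guarantees.
sections_missing_from() {
    comm -23 "$1" "$2"
}

# Usage:
#   MK_CONFDIR=/etc/check_mk /usr/lib/check_mk_agent/plugins/mk_oracle | list_sections > plugin.txt
#   cmk-agent-ctl dump | list_sections > agent.txt
#   sections_missing_from plugin.txt agent.txt
```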

AndiU · June 14, 2022, 9:57am

Okay, now I’m a bit confused

matze218 wrote: "@AndiU seems like now it is working again… took some time to show up in the GUI"

When you wrote this, did you run the plugin asynchronously, i.e. following my suggestion to put it under /usr/lib/check_mk_agent/plugins/60/mk_oracle? Or did you already roll it back?

matze218 wrote: "On another Oracle host I was wondering why many services are missing… so I ran the mk_oracle Plugin manually and saw it got the section of RMAN (oracle_rman) for example. But the cmk-agent-ctl dump doesn’t have this section…"

So what’s the setup on this host? Is the mk_oracle plugin configured to run asynchronously?

matze218 · June 14, 2022, 10:15am

For the first host, yes, I ran it as you said. But what confuses me is that the interval for each service is still different: the RMAN checks still have a cache of 10 minutes. So this 60 seconds setting is then probably just for the whole plugin when it should run.

For the second issue I didn’t put it in a folder to let it run asynchronously.

AndiU · June 14, 2022, 11:35am

matze218 wrote: "So this 60 seconds setting is then probably just for the whole plugin when it should run."

Yes! The mk_oracle plugin does its own caching. The only reason to run the entire plugin in asynchronous/cached mode is that otherwise it cannot fork new (long-running) processes.
That’s because the agent is (usually) invoked via a systemd socket on Linux, and systemd (in our setup) watches all forked processes and kills them as soon as the main process (i.e. the synchronous agent call) exits. When mk_oracle was originally written, this was not a problem, because xinetd was the common super-server back then, and it doesn’t care about forked processes.

It even seems ok to me that the cmk-agent-ctl dump doesn’t have the section, as it calls the agent via a local systemd socket.

matze218 · June 14, 2022, 11:52am

Ah ok, so as long as it is not fixed, we need that workaround for the Oracle plugin?! Ok, then I will change it there.

I haven’t understood the second part about cmk-agent-ctl yet. Why is it ok that the section isn’t there when I want to debug via that dump command?! Or do you just mean that this should be solved by running it asynchronously?

AndiU · June 14, 2022, 12:09pm

Yes, I just wanted to say that the cmk-agent-ctl dump command is also affected by this misbehavior. Of course, the dump is intended to show all sections properly.

GarthH · February 8, 2023, 12:23am

Hello, is this still an issue with 2.0? We are troubleshooting cache file issues and found this thread. Are there any updates or new info, or is the solution still to force the plugin to run async (/60/) and then tune the async sections further with a bigger maxage in the config?

i42mapur · March 19, 2023, 7:05pm

Hello,

I’m using mk_oracle plugin version 2.1.0p21 and facing issues: random empty cache files, because of which the agent complains that some sections are missing. It happens randomly across databases. At any given point it works for some of the databases. Then I remove all the cache files, and on the next execution it works for another random set of databases.

Is there any advice here, please? Not even the standard plugin works fine for us. I’ve removed all the extra sections with no luck…

Best regards,
Rafael

chauhan_sudhir · March 20, 2023, 7:40am

If your server is using systemd, then you should look at this: mk_oracle on AIX, Solaris and UNIX: Solve sync. vs. async sections on hosts with systemd