I am facing an issue with Oracle monitoring: a particular Oracle instance and its checks vanish with the message “Login into Database failed”, and after some time all these UNKNOWN checks come back in the inventory, around the 3rd or 4th re-discovery.
I am using mk_oracle. The Oracle version is 12.2.0.1.0. Specifically, the ORA Instance service is flapping: it goes to “Unknown = Item not found in monitoring data” and after some time it is OK again.
Yes, I already tried this. The monitoring works fine except for the Oracle Instance service check, which is flapping with sporadic UNKNOWN <-> OK changes.
Can you please show us which sections of the plugin you have defined synchronously and which asynchronously?
If you check, for example, the Oracle jobs on every call of the agent and don’t cache the result, you may run into a situation where your queries against the database are too slow and you don’t get an answer within a monitoring cycle. This could cause a complete instance to be shown as stale or unknown.
By default, the jobs section runs synchronously. Can you check mk_oracle.cfg and post the values of SYNC_SECTIONS and ASYNC_SECTIONS?
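A quick way to pull those values out of the file is sketched below. It assumes the config sits at /etc/check_mk/mk_oracle.cfg (adjust the path if your agent uses a different config directory), and `show_sections` is only a hypothetical helper name, not part of the plugin.

```shell
#!/bin/sh
# show_sections CFG
# Prints the active (non-commented) section assignments from an
# mk_oracle.cfg style file, including the ASM variants if present.
show_sections() {
    cfg=$1
    grep -E '^[[:space:]]*(SYNC|ASYNC)_(ASM_)?SECTIONS=' "$cfg"
}

# Example call (path is an assumption, see above):
#   show_sections /etc/check_mk/mk_oracle.cfg
```

If nothing is printed, the file does not set these variables and the plugin falls back to its defaults.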
If your instance changes to unknown/stale, can you save your cached data within ~site/tmp/check_mk/cache/<host>? This could help analyse the problem.
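Something like the following could be used to keep a copy of that cache at the moment the service flips. This is only a sketch: `snapshot_cache` and the destination directory are made-up names, and it simply copies whatever is at the given path (file or directory) under a timestamped name.

```shell
#!/bin/sh
# snapshot_cache SRC DEST_DIR
# Copies SRC (a file or a directory) into DEST_DIR under a timestamped
# name and prints the path of the copy. Returns non-zero if SRC is missing.
snapshot_cache() {
    src=$1
    dest_dir=$2
    [ -e "$src" ] || { echo "no such cache path: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    target="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    cp -Rp "$src" "$target" || return 1
    printf '%s\n' "$target"
}

# Example call as the site user (replace <host>; ~/oracle-debug is a
# hypothetical destination):
#   snapshot_cache ~/tmp/check_mk/cache/<host> ~/oracle-debug
```

Taking one snapshot while the service is OK and one while it is UNKNOWN makes it easier to compare what the agent actually delivered in both states.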
Forgot to mention the important thing: I am not running mk_oracle directly. Instead, I am running it on a separate host and then piggybacking the results to Checkmk.
You said your instance is running 12.2.0.1? It could be a problem if you set a fixed version in your config file. In my opinion you shouldn’t set the version there and should let the plugin detect it directly.
And I did that. So far, I don’t see any UNKNOWN. I will monitor it. Do I still need the SYNC/ASYNC sections, as I am calling mk_oracle from this third server, which in turn polls everything from the RAC?
The sync and async sections have default values. We just found out that the statements for some checks are resource heavy and can put unnecessary stress on the database, like I said before about the Oracle jobs. Also, these sections aren’t that important, at least for us, so we decided to cache them for 10 minutes and reduce the stress on the database.
Our sync and async sections are defined like this:
# Sections to run in foreground and wait for the result
SYNC_SECTIONS='instance dataguard_stats logswitches longactivesessions performance processes recovery_area recovery_status sessions undostat'
# Sections to run in the background, at a slower interval cached
ASYNC_SECTIONS='locks resumable rman tablespaces ts_quotas'
# Sections to run in foreground for ASM
SYNC_ASM_SECTIONS='asm_diskgroup instance processes'
# Sections to run in the background for ASM
ASYNC_ASM_SECTIONS=''
# Cache time (i.e. check interval) for async sections
CACHE_MAXAGE=600
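To see how a given section ends up being handled by a config like the one above, a small helper along these lines may be useful. It is a sketch under the assumption that the file is a plain shell fragment of variable assignments (as shown above); `section_mode` is a hypothetical name, and "not listed" only means the file does not mention the section in either list.

```shell
#!/bin/sh
# section_mode CFG NAME
# Prints "sync", "async" or "not listed" for section NAME according to
# SYNC_SECTIONS / ASYNC_SECTIONS in CFG. CFG should be given with a path
# (absolute or ./relative), because "." searches PATH for bare names.
section_mode() {
    cfg=$1
    name=$2
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }
    # Subshell: the sourced variables do not leak into the caller.
    (
        SYNC_SECTIONS=
        ASYNC_SECTIONS=
        . "$cfg" || exit 1
        for s in $SYNC_SECTIONS; do
            [ "$s" = "$name" ] && { echo sync; exit 0; }
        done
        for s in $ASYNC_SECTIONS; do
            [ "$s" = "$name" ] && { echo async; exit 0; }
        done
        echo "not listed"
    )
}

# Example call (path is an assumption):
#   section_mode /etc/check_mk/mk_oracle.cfg tablespaces
```

With the values posted above, `tablespaces` would come out as async and `instance` as sync.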
Thanks @tosch
Since I use the mk_oracle plugin on a separate server which fetches all the information from the Oracle RAC remotely, should I still use these sections?
I think so, because with the remote setup you get additional latency for your check results. Please check the documentation first; it could be that the section names differ for remote Oracle instances or that you need some additional configuration parameters.