Oracle RAC Monitoring: Sporadic message "Login into Database failed"

davidwayne · February 12, 2021, 11:06am

I am facing an issue with Oracle monitoring: a particular Oracle instance and its checks vanish with the message “Login into Database failed”, and after some time all these UNKNOWN checks come back in the inventory, usually on the 3rd or 4th re-discovery.

What can I check here?

brm · February 12, 2021, 11:45am

Are you using mk_oracle or mk_oracle.ps1?

What is the Oracle version of your monitored database? Can you reproduce your observation on various hosts with an Oracle database?

davidwayne · February 12, 2021, 11:48am

I am using mk_oracle. The version is 12.2.0.1.0. Specifically, the ORA Instance service is flapping: it shows “Unknown = Item not found in monitoring data” and after some time it is OK again.

brm · February 12, 2021, 11:58am

Has there been any maintenance for the database recently?

Maybe you should try the standard debugging / diagnostic routines provided for mk_oracle, as described on docs.checkmk.com. Are there any informative hints in the log file when running with option -l?

davidwayne · February 12, 2021, 12:09pm

Yes, I already tried this. The monitoring works fine except for the Oracle Instance service check, which is flapping between sporadic UNKNOWN and OK.

brm · February 12, 2021, 12:18pm

Is the sporadic UNKNOWN completely random, or is there a certain interval?

tosch · February 12, 2021, 1:15pm

Can you please show us which sections of the plugin you have defined as synchronous and which as asynchronous?
If you check, for example, the Oracle jobs on each call of the agent and don’t cache the result, your queries against the database may be too slow, so you don’t get an answer within a monitoring cycle. This could cause a complete instance to be shown as stale or unknown.

davidwayne · February 12, 2021, 1:35pm

It’s random and doesn’t happen at a particular time of day.

davidwayne · February 12, 2021, 1:36pm

Hi @tosch
I installed the mk_oracle plugin and the mk_oracle.cfg via Puppet. I have not defined any sections myself.

tosch · February 12, 2021, 1:43pm

By default, the jobs section runs synchronously. Can you check the mk_oracle.cfg and post the values of SYNC_SECTIONS and ASYNC_SECTIONS?
If your instance changes to unknown/stale, can you save your cached data within ~site/tmp/check_mk/cache/<host>? This could help analyse the problem.
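
A minimal sketch of how such a copy could be kept, assuming a POSIX shell as the site user. The function name `snapshot_cache` and the destination directory are hypothetical; only the cache path comes from the description above.

```shell
#!/bin/sh
# Sketch: keep a timestamped copy of the cached agent output for one host.
# snapshot_cache and the destination directory are hypothetical names.
snapshot_cache() {
    cache_file=$1   # e.g. the site's tmp/check_mk/cache/<host> file
    dest_dir=$2     # any directory where the copies should be collected
    # Refuse to continue if the cache file does not exist.
    [ -f "$cache_file" ] || { echo "no cache file: $cache_file" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    # -p preserves the modification time, which shows when the data arrived.
    cp -p "$cache_file" "$dest_dir/$(basename "$cache_file").$stamp"
}
```

Run it as the site user while the service is UNKNOWN, for example `snapshot_cache ~/tmp/check_mk/cache/myhost ~/oracle_debug`, where `myhost` stands for the host in question.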

davidwayne · February 12, 2021, 1:53pm

In mk_oracle.cfg, I have only:

DBUSER='check_mk:xxxxxxxx'

davidwayne · February 12, 2021, 1:54pm

Which sections should I define?

davidwayne · February 12, 2021, 6:28pm

I forgot to mention an important detail: I am not running mk_oracle directly on the database server. Instead, I run it on a separate host and piggyback the results to Checkmk.

This is what my mk_oracle.cfg looks like:

REMOTE_ORACLE_HOME="/usr/lib/oracle/18.5/client64"
TNS_ADMIN="/usr/lib/oracle/18.5/client64/lib/network/admin"
REMOTE_INSTANCE_1='check_mk:mypassword::myRemoteHost:1521:myOracleHost:MYINST3:11.2:MYINST3'

Could this also be because I have the 18.5 client but want to monitor 11.2?

tosch · February 13, 2021, 11:36am

You said your instance is running 12.2.0.1? Setting a fixed version in your config file could be a problem. In my opinion, you shouldn’t set the version there; let the plugin detect it directly.

davidwayne · February 15, 2021, 6:42am

Thanks @tosch for your help so far.
So, I should do this:

REMOTE_INSTANCE_1='check_mk:mypassword::myRemoteHost:1521:myOracleHost:MYINST3::MYINST3'

And I did that. So far, I don’t see any UNKNOWN. I will keep monitoring it. Do I still need the SYNC and ASYNC sections, given that I am calling mk_oracle from this third server, which in turn polls everything from the RAC?
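
To double-check which field was emptied, the value can be split at the colons. This is only a sketch: the function name `show_remote_instance` is hypothetical, and the field order (user, password, role, host, port, piggyback host, SID, version, TNS alias) is an assumption based on the sample mk_oracle.cfg, so please verify it against the documentation.

```shell
#!/bin/sh
# Sketch: split a REMOTE_INSTANCE value into its colon-separated fields.
# Assumed field order (verify against the mk_oracle documentation):
# user:password:role:host:port:piggybackhost:sid:version:tnsalias
show_remote_instance() {
    # read keeps empty fields, so "::" yields an empty role or version.
    IFS=: read -r user password role host port piggyback sid version alias <<EOF
$1
EOF
    # The password is deliberately not printed.
    echo "host=$host port=$port piggyback=$piggyback sid=$sid version=${version:-unset} alias=$alias"
}
```

Called with the line above, `show_remote_instance 'check_mk:mypassword::myRemoteHost:1521:myOracleHost:MYINST3::MYINST3'` reports `version=unset`, while the earlier value reports `version=11.2`.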

tosch · February 15, 2021, 7:24am

The sync and async sections have default values. We found that the statements for some checks are resource-heavy and can put unnecessary stress on the database, as I said before about the Oracle jobs. These sections also aren’t that important, at least for us, so we decided to cache them for 10 minutes to reduce the load on the database.
Our sync and async sections are defined like this:

# Sections to run in foreground and wait for the result
SYNC_SECTIONS='instance dataguard_stats logswitches longactivesessions performance processes recovery_area recovery_status sessions undostat'

# Sections to run in the background, at a slower interval cached
ASYNC_SECTIONS='locks resumable rman tablespaces ts_quotas'

# Sections to run in foreground for ASM
SYNC_ASM_SECTIONS='asm_diskgroup instance processes'

# Sections to run in the background for ASM
ASYNC_ASM_SECTIONS=''

# Cache time (i.e. check interval) for async sections
CACHE_MAXAGE=600
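
When moving sections between the two lists, it is easy to leave a name in both. A small sketch of a check for that, assuming the lists are space-separated as above; the function name `sections_in_both` is hypothetical and not part of mk_oracle.

```shell
#!/bin/sh
# Sketch: print every section name that appears in both lists.
# Both arguments are space-separated lists such as SYNC_SECTIONS and ASYNC_SECTIONS.
sections_in_both() {
    for s in $1; do
        for a in $2; do
            [ "$s" = "$a" ] && echo "$s"
        done
    done
    return 0
}
```

After sourcing the config, `sections_in_both "$SYNC_SECTIONS" "$ASYNC_SECTIONS"` should print nothing for the lists shown above.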

davidwayne · February 16, 2021, 8:58am

Thanks @tosch
Since I use the mk_oracle plugin on a separate server that fetches all the information from the Oracle RAC remotely, should I still use these sections?

tosch · February 16, 2021, 1:42pm

I think so, because the remote setup adds latency to your check results. Please check the documentation first: the section names may differ for remote Oracle instances, or you may need some additional configuration parameters.

davidwayne · February 16, 2021, 2:03pm

I checked it already and the names are the same. I also tried putting all the sections under ASYNC_SECTIONS, but no luck so far.

davidwayne · February 16, 2021, 2:04pm

Could this also be because I use the Oracle 18.5 client on my third server while my Oracle RAC version is 11.2?