Empty .new. cache files

i42mapur · March 19, 2023, 7:22pm

CMK version:
2.0.0p29

OS version:
Oracle Linux 7.9

Error message:
No error message

I’m running the default “mk_oracle” plugin with no additional sections. ASYNC sections are sometimes executed properly, but sometimes I get empty .new. cache files that stay there for hours. While a .new. cache file exists, no further process is spawned to gather information for that database. If I remove all the cache files, new ones are generated, but again some of them (not necessarily the same ones as before) stay as empty .new. files, and the process associated with them (the PID after the .new.) no longer exists.
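To make the symptom easier to spot, here is a minimal sketch that lists .new.<pid> files whose writer process is gone. It assumes the file name ends in ".new." followed by the PID, as described above. The function names are made up for illustration and are not part of mk_oracle.

```shell
# Sketch only: helper names are hypothetical, not part of mk_oracle.

pid_alive() {
    # kill -0 sends no signal; it only tests whether the PID exists.
    # The /proc fallback (Linux) covers processes owned by another user.
    kill -0 "$1" 2>/dev/null || [ -d "/proc/$1" ]
}

find_stale_new_files() {
    dir=$1
    for f in "$dir"/*.new.*; do
        [ -e "$f" ] || continue        # glob matched nothing
        pid=${f##*.new.}               # text after the last ".new."
        case $pid in
            ''|*[!0-9]*) continue ;;   # suffix is not a plain PID
        esac
        # Print the file only if its writer process no longer exists.
        pid_alive "$pid" || printf '%s\n' "$f"
    done
}
```

Call it with the agent's cache directory, for example `find_stale_new_files /var/lib/check_mk_agent/cache` (that path is an assumption about a typical Linux agent install, so check your MK_VARDIR). The sketch only prints candidates and deletes nothing.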

As a result, I end up with plenty of vanished services.

Any advice, please?

Thanks and best regards,
Rafael

T.Schmitz · March 20, 2023, 7:34am

Hi Rafael,

it seems that an SQL statement executed against the Oracle database is hanging.
On our databases it is usually the Oracle Controlfile RMAN Backup SQL that causes the issue.
The database team usually fixes it in the database.

You can test this e.g. by disabling the corresponding ASYNC Section in mk_oracle.cfg via the EXCLUDE_<ORACLE_SID> parameter.
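As a rough illustration, such an entry in mk_oracle.cfg could look like the fragment below. ORCL is a placeholder SID, and excluding the "rman" section is an assumption based on the RMAN SQL being the suspect here, so check the section names your mk_oracle version supports.

```shell
# mk_oracle.cfg (sourced by the plugin as shell)
# ORCL is a placeholder; use your own ORACLE_SID.
# Skip the rman section for this instance while testing.
EXCLUDE_ORCL="rman"
```

If the empty .new. files stop appearing for that instance afterwards, the excluded section is the likely culprit.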

This is the Oracle Controlfile SQL that causes the issue on our DBs. Maybe you can run it on your databases to check how long it takes.

select name
      || '|' || 'COMPLETED'
      || '|'
      || '|' || to_char(CHECKPOINT_TIME, 'yyyy-mm-dd_hh24:mi:ss')
      || '|' || 'CONTROLFILE'
      || '|'
      || '|' || round((sysdate - CHECKPOINT_TIME) * 24 * 60)
      || '|' || '0'
from (select upper(decode(0, 0, d.NAME, i.instance_name)) name
            ,max(bcd.CHECKPOINT_TIME) CHECKPOINT_TIME
      from v$database d
      join v$BACKUP_CONTROLFILE_DETAILS bcd on d.RESETLOGS_CHANGE# = bcd.RESETLOGS_CHANGE#
      join v$instance i on 1=1
      group by upper(decode(0, 0, d.NAME, i.instance_name)));

Best Regards
Thomas

LaSoe · March 20, 2023, 9:06am

Hi Rafael,

in older mk_oracle versions the *.new files were not always deleted when the executed check was terminated and as a consequence the check was not executed anymore. In the current 2.0.0p35 version this should be fixed.

Nevertheless, you still have to find out which check sometimes hangs.

Regards, Lars

Rendanic · March 21, 2023, 6:38am

RMAN SQLs are executed against the controlfile of Oracle. There are no indexes, so Oracle has to read all RMAN data from the controlfile and filter the result in memory.
Sometimes fixed table stats help in creating better plans, but sometimes that won’t work either.

A known issue arises when a huge number of archivelogs are created between backups. Oracle increases the size of the controlfile to store the information and has to read all the empty records to find the current state of the existing archivelog data. Remember, there are no indexes available for the controlfile. This can cause heavy performance issues in the RMAN check of mk_oracle.

i42mapur · March 21, 2023, 7:10am

Hello,

thanks everybody for your help. In the end, the main point was that we’re using systemd. Following this document helped: mk_oracle on AIX, Solaris and UNIX: Solve sync. vs. async sections on hosts with systemd

After that, I had issues with long-running RMAN queries, but gathering fixed table stats fixed them.
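For anyone landing here later, a minimal sketch of how fixed table stats can be gathered. It assumes the standard DBMS_STATS.GATHER_FIXED_OBJECTS_STATS procedure run from SQL*Plus by a suitably privileged user. The wrapper function name is made up for illustration.

```shell
# Sketch only: prints a small SQL*Plus script so it can be reviewed
# before being piped into sqlplus. The function name is hypothetical.
fixed_stats_script() {
    cat <<'EOF'
set timing on
exec dbms_stats.gather_fixed_objects_stats;
exit
EOF
}
```

A possible invocation is `fixed_stats_script | sqlplus -S / as sysdba`, run as the Oracle OS user with ORACLE_SID set. Connecting as sysdba is an assumption, so agree on the right account and timing with your DBA team.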

Best regards,
Rafael
