New to CheckMk. Need help reducing size of /opt/omd/sites/checkmk/var/pnp4nagios/perfdata

miketduffy · January 9, 2021, 10:41am

Hi

Could you please point me in the direction of the config file that would reduce the depth of data we retain in /opt/omd/sites/checkmk/var/pnp4nagios/perfdata?
I need it set to 3 months, but I suspect it’s set to 12 months.

thanks in advance.

andreas-doehler · January 9, 2021, 4:55pm

First question: why? How big is your perfdata directory?
Keep in mind that the perfdata folder does not grow over time as you use the system; it only grows when you add more monitored systems.
Each RRD file is allocated at its full size when it is created. For one performance metric you need around 300-400 kByte of space. That means one network interface, which reports several metrics, needs about 4.2 MByte.
If you monitor 10k interfaces, that is only 42 GByte, which is not much space.
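
The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate: the 4.2 MByte per interface is the figure quoted in this post, and the function name is made up for illustration.

```python
# Rough disk estimate for interface perfdata, using the figure quoted above.
MBYTE_PER_INTERFACE = 4.2  # from the post; depends on RRA settings and metric count


def estimate_perfdata_gbyte(interfaces: int,
                            mbyte_per_interface: float = MBYTE_PER_INTERFACE) -> float:
    """Approximate perfdata size in GByte (decimal: 1 GByte = 1000 MByte)."""
    return interfaces * mbyte_per_interface / 1000


if __name__ == "__main__":
    print(f"{estimate_perfdata_gbyte(10_000):.1f} GByte")
```

Because the files are preallocated, the result depends only on how many services you monitor, not on how long the site has been running.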

If you really want to change this, you need to change the PNP4Nagios settings on your system. Since your data is stored in “~/var/pnp4nagios”, I think you are using the CRE (Raw Edition).
The config file is “~/etc/pnp4nagios/rra.cfg”. In the sample file you will find the default configuration.
Keep in mind that only newly created RRD files will get the new settings. Existing RRD files cannot be modified easily.
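
To judge what a given rra.cfg would retain, it can help to compute the retention of each archive line. The sketch below assumes the usual rrdtool form `RRA:CF:xff:steps:rows` and a 60-second base step; both the step and the example lines are assumptions for illustration, not values copied from a real rra.cfg.

```python
# Sketch: how long does one RRA definition retain data?
# Assumption: a 60-second base step (one value per minute); adjust if yours differs.
STEP_SECONDS = 60


def rra_retention_days(rra_line: str, step_seconds: int = STEP_SECONDS) -> float:
    """Retention of one 'RRA:CF:xff:steps:rows' definition, in days."""
    keyword, _cf, _xff, steps, rows = rra_line.strip().split(":")
    if keyword != "RRA":
        raise ValueError(f"not an RRA definition: {rra_line!r}")
    # Each row covers `steps` base intervals, and the archive holds `rows` rows.
    return step_seconds * int(steps) * int(rows) / 86400


# Illustrative values only, not taken from any real configuration.
example_lines = [
    "RRA:AVERAGE:0.5:1:2880",   # 1-minute resolution
    "RRA:AVERAGE:0.5:30:4320",  # 30-minute resolution
]

if __name__ == "__main__":
    for line in example_lines:
        print(f"{line} keeps {rra_retention_days(line):g} days")
```

For a limit of roughly 3 months, the longest archive would need step × steps × rows to come out near 90 days.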

openmindz · February 9, 2021, 7:24pm

Hello everyone,

Well another simple but somewhat more brutal solution would be a cronjob that deletes perfdata you no longer need. I’ve done that on a few instances now, and I haven’t had any issues with this approach… even if it’s perhaps not the most elegant way.

Background:

A colleague of mine has written a script to monitor Kubernetes pods, which also generates perfdata. The problem is that pods change their names when they die and are reborn, so we ended up with an unbelievable amount of invalid (and thus useless) perfdata. This was using almost 100GB of space after a year or so, so I simply deleted everything older than a day. Worked perfectly.
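
A minimal sketch of such a cleanup job, written in Python rather than as a shell one-liner. It is not the script actually used here: the function names, the default path and the restriction to `.rrd` and `.xml` files are assumptions for illustration. It defaults to a dry run so you can check the list before anything is deleted.

```python
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical defaults for illustration; adjust to your site.
PERFDATA_DIR = Path("/opt/omd/sites/checkmk/var/pnp4nagios/perfdata")
MAX_AGE_DAYS = 1


def find_stale_files(root: Path, max_age_days: float,
                     now: Optional[float] = None) -> List[Path]:
    """Return .rrd and .xml files under root not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for path in root.rglob("*"):
        if (path.is_file()
                and path.suffix in {".rrd", ".xml"}
                and path.stat().st_mtime < cutoff):
            stale.append(path)
    return sorted(stale)


def delete_stale_files(root: Path, max_age_days: float,
                       dry_run: bool = True) -> List[Path]:
    """Delete stale files; with dry_run=True (the default) only list them."""
    stale = find_stale_files(root, max_age_days)
    for path in stale:
        if not dry_run:
            path.unlink()
    return stale


# Example (dry run): delete_stale_files(PERFDATA_DIR, MAX_AGE_DAYS)
```

A file that is still being updated has a recent modification time, so only perfdata that has stopped receiving values is matched.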

HTH,
Openmindz

andreas-doehler · February 9, 2021, 10:12pm

Yes, this should be done on every larger system. But be careful to do this only in the Raw Edition. In the Enterprise Edition it is not so easy.

brm · February 10, 2021, 6:49am

Instead of the “brutal solution” of using a cronjob, I think Checkmk has a built-in setting for cleaning up pnp4nagios perfdata:

Global settings >> Site Management >> Automatic disc space cleanup >> Current settings >>
Delete files older than: xy days

Also see inline help for that parameter:

The historic events (state changes, downtimes etc.) of your hosts and services is stored in the monitoring history as plain text log files. One history log file contains the monitoring history of a given time period of all hosts and services. The files which are older than the configured time will be removed on the next execution of the disk space cleanup.
The historic metrics are stored in files for each host and service individually. When a host or service is removed from the monitoring, it’s metric files remain untouched on your disk until the files last update (modification time) is longer ago than the configure age.

In comparison to:
... >> Automatic disc space cleanup >> Current settings >>
Cleanup of abandoned host files older than (e.g. w.r.t. .rrd files)

And its inline help:

During monitoring there are several dedicated files created for each host. There are, for example, the discovered services, performance data and different temporary files created. During deletion of a host, these files are normally deleted. But there are cases, where the files are left on the disk until manual deletion, for example if you move a host from one site to another or deleting a host manually from the configuration.
The performance data (RRDs) and HW/SW inventory archive are never deleted during host deletion. They are only deleted automatically when you enable this option and after the configured period.

openmindz · February 10, 2021, 7:33am

Thanks to both of you for the additional insight, much appreciated.

andreas-doehler · February 10, 2021, 7:38am

A word of caution about the automatic cleanup: it can only remove data from hosts that were completely removed from monitoring. What I see on bigger systems is something like a big core switch changing the names of a huge number of interfaces, which results in many obsolete RRDs. These files cannot be removed with the “Automatic disc space cleanup”. For this task I use some self-made cron jobs.

The “Automatic disc space cleanup” is good for deleted hosts and older log files and so on.

brm · February 10, 2021, 7:44am

Thanks to andreas-doehler for that information, that’s really important to know. However, it doesn’t make handling old files any easier in practice…
