We’re running 2.3.0p18 but we’ve had this issue in all versions we’ve ever run.
If I use filestats.cfg to monitor files, that plugin is always particularly slow and can cause fetcher timeouts on the hosts where I look for files.
I use this in two ways. One is to check the age of the newest log file in a folder, to see whether a process is still writing to its log. The other is to check dataflow, i.e. that files land in the directories where I expect them as part of various integrations.
Sometimes I am looking at very large directories and this check is particularly slow.
I’ve increased the overall timeout on these hosts to 2 minutes, and check less frequently, but I am still getting some timeouts, always because of the filestats.cfg plugin.
Caution is advised with the "Show files in service details" option: if many files have been responsible for a status change, they will all be listed, which can lead to long lists and associated performance and view issues.
I don’t use that show files option. I set up a File Group in Agent rules to look for a specific file or a group of files. Often I’m using a regex and looking for one specific log file like this:
Section name:
Splunk
Globbing pattern for input files:
/opt/splunkforwarder/var/log/splunk/splunkd.log
Then I create a Service Monitoring rule and say alert if that particular File Group is older than 15 minutes.
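For reference, the agent-side result of that rule is a section in filestats.cfg on the host. A minimal sketch of what the single-file case above would look like (section name and path taken from my example; exact keys may vary by Checkmk version):

```ini
[Splunk]
# One concrete path, no wildcards: mk_filestats only has to stat this single file,
# so the section stays fast.
input_patterns: /opt/splunkforwarder/var/log/splunk/splunkd.log
```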
However, for dataflow checks, I’m sometimes just watching a huge directory and reporting if the newest file in that directory is older than 15 minutes or so. If the directory is huge, the check is really slow. I get that checking a large directory might take a while, but if I run ls -lt at a command prompt, I still get results almost immediately, yet Checkmk times out even when I give it 2 minutes to perform the check.
I’ve tried using an age filter to then only look for files over the past few hours, and within that, check the age of the most recent file.
Globbing is relatively slow, and mk_filestats then loops over all directories and descends into every directory found to collect all the files. Then stat is called for each folder and file in the list.
It may be easier to use a “simpler” ‘input_patterns’ and filter out the non-relevant directories and files with ‘filter_regex’ and/or ‘filter_regex_inverse’, so that only directories and files that are relevant by name are included.
If the filters are used cleverly, this can drastically shrink the list of directories and files to be checked: mk_filestats no longer has to descend into excluded directories and can simply ignore the files they contain. Because this filtering happens before stat is called, it reduces the number of files actually statted and thus improves speed.
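As a sketch of the approach above for the dataflow case, assuming the files land under a hypothetical /data/incoming and that only *.csv files matter (section name, path, and pattern are illustrative; check the filter semantics against your mk_filestats version before relying on them):

```ini
[dataflow]
# Simple, shallow pattern instead of a deep recursive glob.
input_patterns: /data/incoming/*
# Keep only files whose names are relevant; everything else is dropped
# before stat is called, so excluded entries cost almost nothing.
filter_regex: .*\.csv$
```

If your mk_filestats version supports the ‘output’ key, setting it to a reduced mode (e.g. reporting only extremes such as the newest/oldest file) can also cut down the work for a newest-file-age check on a huge directory.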
To better understand in detail what is happening and where the time is lost, run mk_filestats in verbose mode: “mk_filestats -vv”. With “-c” you can specify other config files to test different variants.