We are suffering from long execution times on Windows hosts when we use “Count, size and age of files”. At the moment we are monitoring about 200 files (PG backups). As soon as we deploy the file check, the execution time explodes from 2 sec. to about 50 sec.
But the problem does not occur on every Windows server. The number of files is nearly the same:
HostA: 190 files – execution time 1.7 sec
HostB: 207 files – execution time 48.8 sec
Both hosts use the same CMK rule to define which files should be watched. If we disable the file check, we are back to 2 sec. execution time.
Issue: The “Count, size and age of files” rule (mk_filestats / fileinfo) causes massive slowdowns on some Windows hosts. Agent runtime jumps from ~2 seconds to ~48–50 seconds, while the same rule runs fine on other hosts with similar file counts.
This is a well-known performance problem with the Windows agent when monitoring files.
Main causes:
The plugin uses globbing to scan directories. Even with only ~200 matching files, Windows can be very slow if the folder contains thousands of files, is on an HDD, is heavily fragmented, or is scanned by antivirus in real time.
Overly broad patterns (e.g. C:\Backups\** or any recursive search) make it dramatically worse.
Quick recommendations:
Make your include patterns as narrow and specific as possible (e.g. C:\PG_Backups\*.backup instead of broad wildcards).
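To check whether the slow host's folder really holds far more entries than the pattern matches, you could run something like the sketch below on both hosts and compare the numbers. It is a standalone helper, not part of the Checkmk agent; the function name count_entries is made up for illustration, and it only looks at a single directory (no recursion).

```python
import fnmatch
import os


def count_entries(directory, pattern):
    """Return (total_entries, matching_entries) for one directory.

    Hypothetical helper for diagnosis only. It walks the directory once,
    without recursion, and counts how many names match the wildcard pattern.
    """
    total = 0
    matching = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            total += 1
            # fnmatch applies shell-style wildcards to the bare file name
            if fnmatch.fnmatch(entry.name, pattern):
                matching += 1
    return total, matching
```

If the slow host reports a total that is many times larger than the matching count, the directory listing itself is the likely cost, and a narrower folder or pattern should help. If both hosts report similar totals, the cause is probably elsewhere (disk, antivirus, network share).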
What I don’t understand is that I have other Windows hosts with nearly the same number of files, and they work fine. Same folders, same patterns.
The output type you use can have a significant impact on runtime: file_stats lists all matching files, whereas extremes_only and count_only output only metrics.
It would also be interesting to determine whether the slowdown occurs during the file list generation (retrieving all filenames matching the pattern) or when reading the file statistics.
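One way to separate the two phases is a small timing script run directly on HostA and HostB. The sketch below is only an approximation under my own assumptions: it uses Python's glob and os.stat, which is not necessarily what the agent plugin does internally, and the function name time_phases is made up for illustration.

```python
import glob
import os
import time


def time_phases(pattern):
    """Time the filename listing and the stat calls separately.

    Hypothetical diagnostic helper. Returns a dict with the number of
    matched paths and the seconds spent in each phase.
    """
    start = time.perf_counter()
    # Phase 1: expand the pattern into a list of paths
    paths = glob.glob(pattern, recursive=True)
    listed = time.perf_counter()

    # Phase 2: read size and modification time for every match
    stats_read = 0
    for path in paths:
        try:
            info = os.stat(path)
            _ = (info.st_size, info.st_mtime)
            stats_read += 1
        except OSError:
            # File vanished or is not accessible; skip it
            continue
    done = time.perf_counter()

    return {
        "files": len(paths),
        "stats_read": stats_read,
        "list_seconds": listed - start,
        "stat_seconds": done - listed,
    }
```

Call it with the same pattern the rule uses and print the result on both hosts. A large list_seconds points at the directory scan (too many entries, slow disk), while a large stat_seconds points at per-file access, where real-time antivirus scanning is a common suspect.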