Timeout of check_mk_agent because of fileinfo

CMK version: 2.3.0p21
OS version: Ubuntu 24.04

We’re trying to monitor a backup server that collects data from about 100 servers. The problem is that the Checkmk agent always runs into a timeout, because it needs about 20 minutes to complete.
This is due to the large number of files we need to check.
I have the following setting in the fileinfo.cfg:

/data0/rsnapshot/*/*/*/sicherung/mysql/*.sql
/data0/rsnapshot/*/*/*/sicherung/etc/*.tar.gz
/data0/rsnapshot/*/*/*/sicherung/www/*.tar.gz

I also created a grouping pattern based on the results from the few times the check did run through.

Is there any way to outsource the fileinfo part so that I can run it asynchronously via a cron job, write the results to a text file, and just have the agent parse that file?
This would be much faster, and the check would no longer run into a timeout nearly every time.

Thanks in advance
Man-in-Black

First of all: don’t use fileinfo, use mk_filestats instead.

In mk_filestats you can play around with the filename, file size, and file age filters and the different aggregation options to get the best result for your needs. On servers with many files I often use count_only or extremes_only as the aggregation.
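As a sketch of what such a configuration could look like (section names and paths are taken from the original question; the option names are the ones the plugin documents, but double-check them against your agent version):

```ini
; mk_filestats.cfg — one section per file group
[mysql_backups]
input_patterns: /data0/rsnapshot/*/*/*/sicherung/mysql/*.sql
output: count_only

[etc_backups]
input_patterns: /data0/rsnapshot/*/*/*/sicherung/etc/*.tar.gz
output: extremes_only
```

With count_only the plugin only reports the number of matching files; extremes_only additionally reports the oldest/newest and smallest/largest file, which is usually enough to alert on a stalled backup.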

In addition, mk_filestats can also be executed async.
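For the asynchronous part: the Linux agent caches the output of any plugin placed in a subdirectory whose name is the cache interval in seconds, so the plugin no longer blocks the normal agent run. A sketch, assuming the standard agent layout (adjust the path and interval to your installation):

```shell
# Cache mk_filestats output for one hour instead of running it
# synchronously on every agent call.
mkdir -p /usr/lib/check_mk_agent/plugins/3600
mv /usr/lib/check_mk_agent/plugins/mk_filestats.py \
   /usr/lib/check_mk_agent/plugins/3600/mk_filestats.py
```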

Thanks for the reply.

I’ve tested it with mk_filestats,
but I got the same problem:
the agent still runs practically forever.

I think I will try to replicate the script’s logic: a cron-scheduled script writes the results to a text file, and the agent just parses that file.
That way the agent itself finishes in time.
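The agent has a documented mechanism for exactly this: any file below /var/lib/check_mk_agent/spool/ is appended verbatim to the agent output, and a numeric filename prefix sets the maximum allowed age of that output in seconds. A cron-driven sketch (the patterns and service names are illustrative, and for the demo it writes to a temp file instead of the real spool path):

```python
#!/usr/bin/env python3
"""Sketch: pre-compute backup file counts via cron and write them as a
ready-made <<<local>>> section for the Checkmk agent spool directory."""
import glob
import os
import tempfile

# Illustrative patterns; in production use your real backup paths.
PATTERNS = {
    "mysql": "/data0/rsnapshot/*/*/*/sicherung/mysql/*.sql",
    "etc": "/data0/rsnapshot/*/*/*/sicherung/etc/*.tar.gz",
}

def build_spool_lines(patterns):
    """Build a complete <<<local>>> agent section from glob patterns."""
    lines = ["<<<local>>>"]
    for name, pattern in patterns.items():
        count = len(glob.glob(pattern))
        lines.append(f"0 Backup_{name} count={count} {count} files found")
    return lines

# The real target would be e.g. /var/lib/check_mk_agent/spool/900_backups
# (the "900" prefix means the output is discarded once older than 900 s).
spool_file = os.path.join(tempfile.gettempdir(), "900_backups")
with open(spool_file, "w") as fh:
    fh.write("\n".join(build_spool_lines(PATTERNS)) + "\n")
```

Because the spool file already contains a full agent section, the agent run itself stays fast no matter how long the cron job takes.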

I’ve ended up writing my own local check script, which does the same but runs within a couple of seconds.

If you’re interested in it:

#!/usr/bin/env python3
"""Local check: summarize rsnapshot backup files per type (count, total
size, age of the newest file) and print Checkmk local-check output."""
import glob
import os
import time

def check_backups(pattern):
    """Return (count, total size, age of newest file, status) for a glob pattern."""
    files = glob.glob(pattern)
    count = len(files)
    total_size = sum(os.path.getsize(f) for f in files)
    newest_file = max(files, key=os.path.getmtime) if files else None
    newest_time = os.path.getmtime(newest_file) if newest_file else 0
    age_seconds = time.time() - newest_time

    # WARN after 24 h without a new backup, CRIT after 36 h.
    status = 0
    if age_seconds > 36 * 3600:
        status = 2
    elif age_seconds > 24 * 3600:
        status = 1

    return count, total_size, age_seconds, status

def format_time(seconds):
    days, remainder = divmod(int(seconds), 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, _ = divmod(remainder, 60)
    return f"{days}d {hours}h {minutes}m"

def format_size(size_bytes):
    size_mb = size_bytes / (1024 * 1024)
    if size_mb < 1000:
        return f"{size_mb:.2f} MB"
    size_gb = size_mb / 1024
    if size_gb < 1000:
        return f"{size_gb:.2f} GB"
    return f"{size_gb / 1024:.2f} TB"

CHECKS = [
    ("MySQL", "/data0/rsnapshot/*/*/*/sicherung/mysql/*.sql"),
    ("etc", "/data0/rsnapshot/*/*/*/sicherung/etc/*.tar.gz"),
    ("www", "/data0/rsnapshot/*/*/*/sicherung/www/*.tar.gz"),
]

for file_type, pattern in CHECKS:
    count, size, age, status = check_backups(pattern)
    status_text = ["OK", "WARN", "CRIT"][status]
    # Checkmk local-check format: <status> <service name> <perfdata> <output>
    print(f"{status} RBackup_{file_type} count={count}|size={size}|age={age:.0f} "
          f"{status_text} - {count} files, {format_size(size)}, {format_time(age)} old")

Not as customizable as the built-in version, but way faster and good enough for my specific use case.

Thanks for the help.
