Folder size/age of S3 buckets

Dear community,

What would be the best way to monitor folder sizes? We’re backing up customer data to our own minIO server (S3), but since the AWS monitoring only works with the 'real' AWS, I need at least a way to monitor the size of our (few) buckets. Bonus points for a way to monitor the modification date, too.

I found this thread, but it doesn’t feature a working solution: Monitoring Folder Size

Thanks in advance!

Ben

I am the OP of the thread you mentioned.
I settled on a self-made CheckMK local check written in Python 3, which creates one service for every directory I want to monitor across all my servers.
I have to admit, though, that I don’t know the first thing about AWS, so I’m not sure whether or how this works there.
Do you have a normal Linux VM there that you can access and on which you can run the regular CheckMK Linux agent plugins?
If so, local checks should work.

Everything you need to know about CheckMK local checks should be in the CheckMK local check docs

Yes!

Would you mind sharing your script?

I can do that, but not right now.
Will upload the script here in Code Tags when I have time (sometime today or this weekend).

Best regards,
pixelpoint


As promised, here is my code.
It is a bit overly complicated because of the cmk class.
We have 10+ different local checks tailored to our needs, so I wrote a CheckMK class that takes care of everything needed to produce uniform service output.

If you want to use it, deploy cmk.py in the same folder as the folder size check (Python finds it there because the script's own directory is on the import path).
If not, you will need to edit the parts that use CMKService and build the service output line yourself.
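If you skip cmk.py, the service line can also be built directly. Below is a minimal sketch, assuming the same `fs_size` metric and the 80 % / 90 % levels stated in the script's docstring; `format_dir_size_line` is a hypothetical helper name, not part of CheckMK. It prints sizes in bytes; the script's `sizeof_format()` could be used for the summary text instead.

```python
#!/usr/bin/env python3
"""Hypothetical replacement for CMKService in check_dir_size.py."""


def format_dir_size_line(folder, size, max_size=0):
    """Return one CheckMK local check line for a directory size.

    Example: P "DirSize /data/backup" fs_size=1500;1600;1800 1500 B / 2000 B
    State 'P' lets CheckMK compute the state from the metric's WARN/CRIT levels,
    and the quoted service name may contain spaces.
    """
    name = f'"DirSize {folder}"'
    if max_size:
        # WARN at 80 %, CRIT at 90 % of max_size (integer math, no float rounding)
        warn, crit = max_size * 80 // 100, max_size * 90 // 100
        return f"P {name} fs_size={size};{warn};{crit} {size} B / {max_size} B"
    # no limit configured: metric without levels, so the state stays OK
    return f"P {name} fs_size={size} {size} B"


if __name__ == "__main__":
    print(format_dir_size_line("/data/backup", 1500, 2000))
```

Each call returns exactly one line, so a local check can simply print one line per configured folder.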

check_dir_size.py

#!/usr/bin/env python3

""" This module monitors directory sizes and reports them to CheckMK as local check services.
    The state is calculated by CheckMK from the WARN and CRIT levels of the directory size metric.
    If a non-zero max_size has been configured (see: list host_paths), WARN is 80% and CRIT is 90% of max_size.
"""

###############
### imports ###
###############

import glob
import socket
from pathlib import Path
from cmk import CMKService

#################
### functions ###
#################

def get_foldersize(fullpath_str):
    """ Function expects a full path and will add the size of every file (except symlinks) of every subdirectory and return the total size in Bytes """
    fullpath_pathlib = Path(fullpath_str)
    total_size = 0
    for file in fullpath_pathlib.iterdir():
        # check for symlinks first: is_dir() follows symlinks, so a symlinked
        # directory would otherwise be traversed (and could cause endless recursion)
        if file.is_symlink():
            continue
        elif file.is_dir():
            total_size += get_foldersize(file)
        else:
            total_size += file.stat().st_size
    return total_size

def sizeof_format(num, suffix='B'):
    """ Function takes a size in Bytes and converts it to the matching unit like MiB / GiB / ... """
    for unit in [' ',' Ki',' Mi',' Gi',' Ti',' Pi',' Ei',' Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, ' Yi', suffix)

#################
### variables ###
#################

# constants
STATE_OK = 0
STATE_WARN = 1
STATE_CRIT = 2
STATE_UNKNOWN = 3
NLDELIMIT = "\\n"  # literal backslash + n, which CheckMK turns into a line break of the long output

# variables
host_paths = [  # a list of dictionaries with hostnames and folder paths, so you can monitor different folders on different hosts
    { 'hostname': 'SERVER_HOSTNAME', 'path': 'PATH_TO_FOLDER', 'max_size': 20*1024*1024*1024 } # size in Bytes, 0 = no limit
]

#################
### main code ###
#################

cmk_list = []

hostname = socket.gethostname()
if '.' in hostname:
    hostname = hostname.split('.')[0]

# search for the right hostname
for host in host_paths:
    if host['hostname'] == hostname:
        folder = host['path']
        size = get_foldersize(host['path'])
        max_size = host['max_size']

        cmk = CMKService(f"\"DirSize {folder}\"", uses_metrics=True, state_calculated=True)

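        # create short output string and metric (WARN at 80%, CRIT at 90% of max_size)
        if max_size != 0:
            cmk.add_short(f"{sizeof_format(size)} / {sizeof_format(max_size)}")
            cmk.add_metric('fs_size', size, int(max_size*0.8), int(max_size*0.9))
        else:
            cmk.add_short(f"{sizeof_format(size)}")
            # still add the metric (without levels), otherwise state 'P' has nothing to evaluate
            cmk.add_metric('fs_size', size)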

        cmk_list.append(cmk)

for cmkservice in cmk_list:
    print(cmkservice.get_serviceString())
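
The script above reports only the size; the modification-date part of the original question is not covered. Below is a sketch of a variant of get_foldersize() that also returns the newest file modification time, assuming the buckets are plain directories on the server's disk as in the setup discussed here; `scan_tree` is a hypothetical name. It uses `os.scandir`, whose entries can usually answer `is_dir()`/`is_symlink()` from the directory listing itself instead of issuing a separate stat call for each check, which may help with long runtimes, though it has not been benchmarked.

```python
import os


def scan_tree(path):
    """Return (total_size_in_bytes, newest_mtime) for all regular files below path.

    Symlinks are skipped, like in get_foldersize(). newest_mtime is a Unix
    timestamp (float), or None if the tree contains no files.
    """
    total_size = 0
    newest_mtime = None
    stack = [path]  # iterative instead of recursive: no recursion limit on deep trees
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_symlink():
                        continue
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        st = entry.stat(follow_symlinks=False)
                        total_size += st.st_size
                        if newest_mtime is None or st.st_mtime > newest_mtime:
                            newest_mtime = st.st_mtime
        except PermissionError:
            # unreadable directory: skip it instead of failing the whole check
            continue
    return total_size, newest_mtime
```

`time.time() - newest_mtime` gives the age of the newest file in seconds, which could be reported as a second metric (name of your choice) with WARN/CRIT levels, e.g. to alert when no backup has been written for a day.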

cmk.py

#!/usr/bin/env python3
"""
    This Module is used to aggregate all important things
    needed to create a CheckMK Service in a unified way.
"""

STATE_OK = 0
STATE_WARN = 1
STATE_CRIT = 2
STATE_UNKN = 3
NLDELIMIT = "\\n"

class CMKService():
    """
        This Class holds all information needed for CheckMK.
        - Service Name
        - Service Metrics
        - Service State
        - Service Output (short and long)
    """

    def __init__(self, name, uses_metrics=False, state_calculated=False):
        self._name = name
        self._uses_metrics = uses_metrics
        self._state_calculated = state_calculated

        self._state = STATE_UNKN
        self._metrics = []  # List of dictionaries
        self._short = []    # short service text messages, which you see in the summary
        self._long = []     # long service text messages, accessible via click on service

        if state_calculated:
            self._state_calculated = True
            self._state = 'P'

    def add_short(self, text):
        """ Adds a short (single line) service text """
        self._short.append(text)
    def add_long(self, text):
        """ Adds a long (multi-line) service text """
        self._long.append(text)

    def add_metric(self, name, value, warn=None, crit=None):
        """
            This method adds a Metric to the CheckMK Service.
            WARN and CRIT values are optional.
        """
        self._metrics.append({
            'name': name,
            'value': value,
            'warn': warn,
            'crit': crit,
        })

    def short_isEmpty(self):
        """ Returns True if no short service text has been added yet """
        return not self._short

    def get_serviceString(self):
        """
            This method returns a complete Service String.
            The output string is already formatted the way CheckMK needs it.
        """
        service_str = ''
        service_str += str(self._state)+' '+self._name

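        if self._uses_metrics:
            tmp = ''
            for metric in self._metrics:
                # if the string has a metric already (not 'falsy' in a boolean context),
                # add the CheckMK metric separator '|'
                if tmp: tmp += '|'
                tmp += metric['name']+'='+str(metric['value'])
                # CheckMK expects name=value;warn;crit, so keep the field positions
                # (e.g. "name=5;;10" if only CRIT is set) and don't drop levels of 0
                if metric['warn'] is not None or metric['crit'] is not None:
                    warn = '' if metric['warn'] is None else str(metric['warn'])
                    crit = '' if metric['crit'] is None else str(metric['crit'])
                    tmp += ';'+warn+';'+crit
            # an empty metrics field would break the space-separated line, so use '-'
            service_str += " "+(tmp or '-')
        else: service_str += ' -'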

        tmp = ''
        for short in self._short:
            if tmp: tmp += ' // '
            tmp += short
        service_str += " "+tmp

        tmp = ''
        for long in self._long:
            tmp += NLDELIMIT+long
        service_str += tmp

        return service_str

    def uses_metrics(self):
        """ Returns BOOL TRUE/FALSE if Service uses Metrics """
        return self._uses_metrics
    def state_calculated(self):
        """ Returns BOOL TRUE/FALSE if state is Self-calculated ('P') """
        return self._state_calculated

    # Getters / Setters
    @property
    def state(self):
        return self._state
    @state.setter
    def state(self, new_state):
        self._state = new_state


Thank you very much for sharing the scripts. They have been working without any issues 🙂

However, the script takes roughly 6 hours to complete. I really hope there’ll be a better way in the future, e.g. connecting to a local minIO instance with the S3 integration.
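
For an S3-based approach, the bucket can be listed through the S3 API instead of walking the disk. This is a sketch, assuming the boto3 package is installed and the minIO endpoint accepts access-key credentials. The service name and the metric names `bucket_size` / `bucket_age` are hypothetical. Note that this still lists every object, so it is not guaranteed to be faster than the filesystem walk.

```python
#!/usr/bin/env python3
"""Sketch: size and newest object of an S3 (e.g. minIO) bucket as a CheckMK local check line."""
from datetime import datetime, timezone


def summarize_objects(objects):
    """Return (object_count, total_bytes, newest_last_modified) for S3 object dicts.

    Each dict needs 'Size' (int) and 'LastModified' (datetime), the keys that
    list_objects_v2 returns in its 'Contents' list.
    """
    count, total, newest = 0, 0, None
    for obj in objects:
        count += 1
        total += obj["Size"]
        if newest is None or obj["LastModified"] > newest:
            newest = obj["LastModified"]
    return count, total, newest


def iter_bucket_objects(bucket, endpoint_url, access_key, secret_key):
    """Yield every object of a bucket, one list_objects_v2 page at a time (needs boto3)."""
    import boto3  # imported lazily so the rest of the module works without boto3

    client = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    for page in client.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        yield from page.get("Contents", [])  # pages of an empty bucket have no 'Contents'


def bucket_check_line(bucket, objects, now=None):
    """Build a local check line; 'P' lets CheckMK derive the state from the metrics."""
    now = now or datetime.now(timezone.utc)
    count, total, newest = summarize_objects(objects)
    name = f'"S3 bucket {bucket}"'  # hypothetical service name
    if newest is None:
        return f"P {name} bucket_size=0 empty bucket"
    age = int((now - newest).total_seconds())  # seconds since the newest upload
    # bucket_size / bucket_age are hypothetical metric names; append ;warn;crit to alert
    return f"P {name} bucket_size={total}|bucket_age={age} {count} objects, {total} B"
```

A local check could then print `bucket_check_line("backups", iter_bucket_objects("backups", "http://localhost:9000", ACCESS_KEY, SECRET_KEY))`. The bucket name, endpoint and key variables here are placeholders. Depending on the minIO configuration, `boto3.client` may additionally need a `region_name`.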
