ZFS plugin reports datasets as 100% full when a reservation is present

Hello,

We have multiple ZFS servers that we monitor with Checkmk. Until recently we only used quotas to limit the space available to users, and reporting worked correctly.

To avoid handing out more quota than there is actual space, we recently started implementing reservations, and this has led us to an issue with how the default ZFS plugin reports used space. All our datasets now report 100% usage. For example:

Filesystem /bigpool/bigdataset CRIT - 100% used (80 of 80 TB), (warn/crit at 85.0%/90.0%), trend: 0 B / 6 hours

df, zfs itself, and NFS all report the used and free space correctly:

root@bigpool:~# df -PTlk -t zfs
Filesystem          Type  1024-blocks        Used    Available Capacity Mounted on
.....
bigpool/bigdataset zfs   85899345920 68213454848  17685891072      80% /bigpool/bigdataset
root@bigpool:~# zfs list
NAME                       USED  AVAIL     REFER  MOUNTPOINT
.....
bigpool/bigdataset         80T  16.5T     63.5T  /bigpool/bigdataset

root@login:~# df -h | grep bigdataset
zfsserv:/bigpool/bigdataset                  80T   64T   17T  80% /nfs/bigpool/bigdataset

The problem seems to be caused by the “used” property requested in the zfs get command, which also counts the space set aside by the reservation. This is the line:

zfs get -t filesystem,volume -Hp name,quota,used,avail,mountpoint,type 2>/dev/null

The reporting code also uses this “used” value:

            # 1. Filesystems with a quota
            if "quota" in entry:
                used_mb = entry["used"]
                total_mb = entry["quota"]
                avail_mb = total_mb - used_mb

I would like to suggest using the “referenced” property from the zfs get command instead, since that is the actual used space of the dataset. Here are the relevant fields for that dataset:

root@bigpool:~# zfs get all bigpool/bigdataset
NAME                 PROPERTY              VALUE                  SOURCE
bigpool/bigdataset  type                  filesystem             -
bigpool/bigdataset  used                  80T                    -
bigpool/bigdataset  available             16.5T                  -
bigpool/bigdataset  referenced            63.5T                  -
bigpool/bigdataset  quota                 80T                    local
bigpool/bigdataset  mountpoint            /bigpool/bigdataset   default
bigpool/bigdataset  refreservation        80T                    local
bigpool/bigdataset  usedbysnapshots       0B                     -
bigpool/bigdataset  usedbydataset         63.5T                  -
bigpool/bigdataset  usedbychildren        0B                     -
bigpool/bigdataset  usedbyrefreservation  16.5T                  -
bigpool/bigdataset  written               63.5T                  -
bigpool/bigdataset  logicalused           75.2T                  -
bigpool/bigdataset  logicalreferenced     75.2T                  -
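
As far as I can tell, the 80T shown for “used” is the sum of the four usedby* values above, which matches how the ZFS documentation describes the property. Here is a minimal sketch of that arithmetic, with the values from the output typed in by hand as TiB. The dictionary and variable names are only for illustration and are not part of the plugin:

```python
# Values copied by hand from the `zfs get all` output above, in TiB.
# The dict layout is illustrative only; the plugin does not use it.
usedby = {
    "usedbydataset": 63.5,
    "usedbysnapshots": 0.0,
    "usedbychildren": 0.0,
    "usedbyrefreservation": 16.5,
}

# "used" as reported by ZFS: everything charged to the dataset,
# including space that is only reserved and not yet written.
used = sum(usedby.values())

# "referenced" as reported by ZFS for the same dataset.
referenced = 63.5

print(f"used       = {used} TiB")
print(f"referenced = {referenced} TiB")
print(f"difference = {used - referenced} TiB (usedbyrefreservation)")
```

So with a refreservation equal to the quota, “used” always equals the quota, no matter how much data has actually been written.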

I would like to suggest changing both the zfs get line and the reporting line to use “referenced” instead of “used”. They would then look like this:

zfs get -t filesystem,volume -Hp name,quota,referenced,avail,mountpoint,type 2>/dev/null

            # 1. Filesystems with a quota
            if "quota" in entry:
                used_mb = entry["referenced"]
                total_mb = entry["quota"]
                avail_mb = total_mb - used_mb
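
To show the effect of the change, here is a small sketch of the quota branch with the property name passed in as a parameter. The function name quota_usage and the TiB numbers are my own for illustration. The real plugin uses variables in MB (used_mb, total_mb) and is structured differently:

```python
def quota_usage(entry, used_key):
    """Return (used, avail, percent_used) for a dataset with a quota.

    Hypothetical helper: mirrors the three lines of the quota branch,
    but lets the caller choose which property counts as "used".
    """
    used = entry[used_key]
    total = entry["quota"]
    avail = total - used
    percent = 100.0 * used / total
    return used, avail, percent


# Values for bigpool/bigdataset from the output above, in TiB.
entry = {"quota": 80.0, "used": 80.0, "referenced": 63.5}

current = quota_usage(entry, "used")         # what the plugin reports today
proposed = quota_usage(entry, "referenced")  # what the suggested change reports

print("current :", current)   # (80.0, 0.0, 100.0)
print("proposed:", proposed)  # (63.5, 16.5, 79.375)
```

The proposed result lines up with the 80% that df and NFS show. One thing to keep in mind is that “referenced” does not include space held by snapshots or child datasets. Both are 0B for this dataset, so it makes no difference here.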

Edit: anonymized the dataset and pool names.