Hello,
We have multiple ZFS servers that we monitor with checkMK. Until recently we only used quotas to limit the space available to users, and reporting worked correctly.
To avoid handing out more user quota than there is actual space, we recently started implementing reservations, and this led us to find an issue with how the default zfs plugin reports used space. All our datasets now report 100% usage. For example:
Filesystem /bigpool/bigdataset CRIT - 100% used (80 of 80 TB), (warn/crit at 85.0%/90.0%), trend: 0 B / 6 hours
df, zfs itself, and NFS all report the used and free space correctly:
root@bigpool:~# df -PTlk -t zfs
Filesystem Type 1024-blocks Used Available Capacity Mounted on
.....
bigpool/bigdataset zfs 85899345920 68213454848 17685891072 80% /bigpool/bigdataset
root@bigpool:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
.....
bigpool/bigdataset 80T 16.5T 63.5T /bigpool/bigdataset
root@login:~# df -h | grep bigdataset
zfsserv:/bigpool/bigdataset 80T 64T 17T 80% /nfs/bigpool/bigdataset
The problem seems to be caused by the “used” property requested in the plugin's zfs get command, which is this line:
zfs get -t filesystem,volume -Hp name,quota,used,avail,mountpoint,type 2>/dev/null
The reporting code also uses this “used” value:
# 1. Filesystems with a quota
if "quota" in entry:
    used_mb = entry["used"]
    total_mb = entry["quota"]
    avail_mb = total_mb - used_mb
I would like to suggest using the “referenced” property from the zfs get command instead, since that is the space actually occupied by the data in the dataset. Here are the relevant properties for that dataset:
root@bigpool:~# zfs get all bigpool/bigdataset
NAME PROPERTY VALUE SOURCE
bigpool/bigdataset type filesystem -
bigpool/bigdataset used 80T -
bigpool/bigdataset available 16.5T -
bigpool/bigdataset referenced 63.5T -
bigpool/bigdataset quota 80T local
bigpool/bigdataset mountpoint /bigpool/bigdataset default
bigpool/bigdataset refreservation 80T local
bigpool/bigdataset usedbysnapshots 0B -
bigpool/bigdataset usedbydataset 63.5T -
bigpool/bigdataset usedbychildren 0B -
bigpool/bigdataset usedbyrefreservation 16.5T -
bigpool/bigdataset written 63.5T -
bigpool/bigdataset logicalused 75.2T -
bigpool/bigdataset logicalreferenced 75.2T -
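To make the 100% figure easier to see, here is a small arithmetic sketch using the rounded values from the output above. It relies on ZFS defining “used” as the sum of the four usedby* properties; the variable names are only for illustration and are not taken from the plugin.

```python
# Rounded values in TiB, copied from the `zfs get all` output above.
usedby = {
    "usedbydataset": 63.5,
    "usedbysnapshots": 0.0,
    "usedbychildren": 0.0,
    "usedbyrefreservation": 16.5,
}

# ZFS reports `used` as the sum of the usedby* properties, so the
# unfilled part of the refreservation is counted as used space.
used = sum(usedby.values())
quota = 80.0
referenced = 63.5

pct_used = 100.0 * used / quota
pct_referenced = 100.0 * referenced / quota

print(f"used       = {used} TiB -> {pct_used:.1f}% of quota")
print(f"referenced = {referenced} TiB -> {pct_referenced:.1f}% of quota")
```

With a refreservation equal to the quota, “used” always equals the quota, which is why every such dataset shows up as full.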
My suggestion is to change both the zfs get line and the reporting code to use “referenced” instead of “used”. They would then look like this:
zfs get -t filesystem,volume -Hp name,quota,referenced,avail,mountpoint,type 2>/dev/null
# 1. Filesystems with a quota
if "quota" in entry:
    used_mb = entry["referenced"]
    total_mb = entry["quota"]
    avail_mb = total_mb - used_mb
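As a rough, standalone illustration of the difference (not the plugin's actual code), the sketch below parses tab-separated `zfs get -Hp` style output and computes the quota usage from either property. The helper names `parse_zfs_get` and `quota_usage` are hypothetical, and the sample input is built from the rounded values in this post rather than captured from a real system.

```python
TIB = 2**40


def parse_zfs_get(output):
    """Parse `zfs get -Hp` output (name, property, value, source; tab-separated)."""
    entries = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, prop, value, _source = line.split("\t")
        entries.setdefault(name, {})[prop] = value
    return entries


def quota_usage(props, used_property="referenced"):
    """Return (used, total, avail) in bytes for a dataset that has a quota."""
    used = int(props[used_property])
    total = int(props["quota"])
    return used, total, total - used


# Illustrative input only: byte values derived from the rounded figures above.
sample = "\n".join(
    "\t".join(("bigpool/bigdataset", prop, str(value), source))
    for prop, value, source in [
        ("quota", 80 * TIB, "local"),
        ("used", 80 * TIB, "-"),
        ("referenced", int(63.5 * TIB), "-"),
    ]
)

props = parse_zfs_get(sample)["bigpool/bigdataset"]
for key in ("used", "referenced"):
    used, total, avail = quota_usage(props, key)
    print(f"{key}: {100 * used / total:.1f}% used, {avail / TIB:.1f} TiB free")
```

One trade-off to keep in mind: “referenced” does not include space held only by snapshots or by child datasets, while a quota does count those, so this change would under-report on datasets that have either.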
edit: anonymized dataset and pool names