Filesystem Check on NETAPP Used > 150%

CMK version: Checkmk RAW 2.3.0.p10
OS version: Debian 12 bookworm, Kernel 6.1.0-23-amd64

Error message:

Output of “cmk --debug -vvn hostname”:

Trying to acquire lock on /omd/sites/live/tmp/check_mk/counters/NETAPPCL-RZ2
Got lock on /omd/sites/live/tmp/check_mk/counters/NETAPPCL-RZ2
value store: loading from disk
Releasing lock on /omd/sites/live/tmp/check_mk/counters/NETAPPCL-RZ2
Released lock on /omd/sites/live/tmp/check_mk/counters/NETAPPCL-RZ2
Checkmk version 2.3.0p10
+ FETCHING DATA
  Source: SourceInfo(hostname='NETAPPCL-RZ2', ipaddress='10.100.4.45', ident='snmp', fetcher_type=<FetcherType.SNMP: 7>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7fad3beb9ca0]
Read from cache: SNMPFileCache(NETAPPCL-RZ2, path_template=/omd/sites/live/tmp/check_mk/data_source_cache/snmp/{mode}/{hostname}, max_age=MaxAge(checking=0, discovery=90.0, inventory=90.0), simulation=False, use_only_cache=False, file_cache_mode=6)
Not using cache (Too old. Age is 41 sec, allowed is 0 sec)
  SNMP scan:
       Getting OID .1.3.6.1.2.1.1.1.0: Running 'snmpget -v2c -c public -m "" -M "" -On -OQ -Oe -Ot 10.100.4.45 .1.3.6.1.2.1.1.1.0'
SNMP answer: ==> ["NetApp Release 9.13.1P9: Fri Apr 19 17:13:02 UTC 2024"]
b'NetApp Release 9.13.1P9: Fri Apr 19 17:13:02 UTC 2024'
       Getting OID .1.3.6.1.2.1.1.2.0: Running 'snmpget -v2c -c public -m "" -M "" -On -OQ -Oe -Ot 10.100.4.45 .1.3.6.1.2.1.1.2.0'
SNMP answer: ==> [.1.3.6.1.4.1.789.2.5]
b'.1.3.6.1.4.1.789.2.5'
       Using cached OID .1.3.6.1.2.1.1.1.0: 'NetApp Release 9.13.1P9: Fri Apr 19 17:13:02 UTC 2024'
       Using cached OID .1.3.6.1.2.1.1.2.0: '.1.3.6.1.4.1.789.2.5'
       Getting OID .1.3.6.1.4.1.789.1.5.4.1.29.*: Running 'snmpgetnext -Cf -v2c -c public -m "" -M "" -On -OQ -Oe -Ot 10.100.4.45 .1.3.6.1.4.1.789.1.5.4.1.29'
SNMP answer: ==> [801490212]
b'801490212'
       Using cached OID .1.3.6.1.2.1.1.1.0: 'NetApp Release 9.13.1P9: Fri Apr 19 17:13:02 UTC 2024'
       Using cached OID .1.3.6.1.2.1.1.1.0: 'NetApp Release 9.13.1P9: Fri Apr 19 17:13:02 UTC 2024'
       Using cached OID .1.3.6.1.2.1.1.2.0: '.1.3.6.1.4.1.789.2.5'
       Using cached OID .1.3.6.1.4.1.789.1.5.4.1.29.*: '801490212'
   SNMP scan found                    df_netapp snmp_uptime
Trying to acquire lock on /omd/sites/live/tmp/check_mk/snmp_scan_cache/NETAPPCL-RZ2.10.100.4.45
Got lock on /omd/sites/live/tmp/check_mk/snmp_scan_cache/NETAPPCL-RZ2.10.100.4.45
Releasing lock on /omd/sites/live/tmp/check_mk/snmp_scan_cache/NETAPPCL-RZ2.10.100.4.45
Released lock on /omd/sites/live/tmp/check_mk/snmp_scan_cache/NETAPPCL-RZ2.10.100.4.45
netapp_cpu: Fetching data (SNMP walk cache is enabled: Use any locally cached information)
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.4.1.789.1.2.1.3'
df_netapp: Fetching data (SNMP walk cache is enabled: Use any locally cached information)
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.4.1.789.1.5.4.1.2'
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.4.1.789.1.5.4.1.29'
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.4.1.789.1.5.4.1.30'
snmp_info: Fetching data (SNMP walk cache is enabled: Use any locally cached information)
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.2.1.1.1'
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.2.1.1.2'
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.2.1.1.4'
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.2.1.1.5'
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.2.1.1.6'
snmp_uptime: Fetching data (SNMP walk cache is enabled: Use any locally cached information)
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.2.1.1.3'
Running 'snmpbulkwalk -Cr10 -v2c -c public -m "" -M "" -Cc -OQ -OU -On -Ot 10.100.4.45 .1.3.6.1.2.1.25.1.1'
Write data to cache file /omd/sites/live/tmp/check_mk/data_source_cache/snmp/checking/NETAPPCL-RZ2
Trying to acquire lock on /omd/sites/live/tmp/check_mk/data_source_cache/snmp/checking/NETAPPCL-RZ2
Got lock on /omd/sites/live/tmp/check_mk/data_source_cache/snmp/checking/NETAPPCL-RZ2
Releasing lock on /omd/sites/live/tmp/check_mk/data_source_cache/snmp/checking/NETAPPCL-RZ2
Released lock on /omd/sites/live/tmp/check_mk/data_source_cache/snmp/checking/NETAPPCL-RZ2
[cpu_tracking] Stop [7fad3beb9ca0 - Snapshot(process=posix.times_result(user=0.010000000000000009, system=0.0, children_user=0.03, children_system=0.01, elapsed=0.5399999991059303))]
  Source: SourceInfo(hostname='NETAPPCL-RZ2', ipaddress='10.100.4.45', ident='piggyback', fetcher_type=<FetcherType.PIGGYBACK: 4>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7fad3ae302f0]
Read from cache: NoCache(NETAPPCL-RZ2, path_template=/dev/null, max_age=MaxAge(checking=0.0, discovery=0.0, inventory=0.0), simulation=False, use_only_cache=False, file_cache_mode=1)
No piggyback files for 'NETAPPCL-RZ2'. Skip processing.
No piggyback files for '10.100.4.45'. Skip processing.
Get piggybacked data
[cpu_tracking] Stop [7fad3ae302f0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
[cpu_tracking] Start [7fad3b1d2ba0]
+ PARSE FETCHER RESULTS
  HostKey(hostname='NETAPPCL-RZ2', source_type=<SourceType.HOST: 1>)  -> Add sections: ['df_netapp', 'netapp_cpu', 'snmp_info', 'snmp_uptime']
  HostKey(hostname='NETAPPCL-RZ2', source_type=<SourceType.HOST: 1>)  -> Add sections: []
Received no piggyback data
CPU utilization      Total CPU: 17.00%
Filesystem /vol/MDV_CRS_2851fd8ba91111ec8260d039ea930be3_A Used: 0.96% - 98.2 MB of 10.2 GB, trend per 1 day 0 hours: +19.8 MB, trend per 1 day 0 hours: +0.19%, Time left until disk full: 1 year 144 days
Filesystem /vol/MDV_CRS_2851fd8ba91111ec8260d039ea930be3_A/.snapshot Used: 0% - 0 B of 537 MB, trend per 1 day 0 hours: +0 B, trend per 1 day 0 hours: +0%
Filesystem /vol/MDV_CRS_2851fd8ba91111ec8260d039ea930be3_B Used: 0.02% - 1.83 MB of 10.2 GB, trend per 1 day 0 hours: +16.5 kB, trend per 1 day 0 hours: +<0.01%, Time left until disk full: 1691 years 47 days
Filesystem /vol/MDV_CRS_2851fd8ba91111ec8260d039ea930be3_B/.snapshot Used: 0% - 0 B of 537 MB, trend per 1 day 0 hours: +0 B, trend per 1 day 0 hours: +0%
Filesystem /vol/RZ2_NFS_Krones_2 Used: 40.64% - 1.34 TB of 3.30 TB, trend per 1 day 0 hours: -4.94 GB, trend per 1 day 0 hours: -0.15%
Filesystem /vol/RZ2_NFS_Preform_1 Used: 7.52% - 118 GB of 1.57 TB, trend per 1 day 0 hours: -17.4 MB, trend per 1 day 0 hours: -0.00%
Filesystem /vol/RZ2_NFS_Preform_1/.snapshot Used: 5.84% - 4.82 GB of 82.5 GB, trend per 1 day 0 hours: +16.3 MB, trend per 1 day 0 hours: +0.02%, Time left until disk full: 13 years 12 days
Filesystem /vol/RZ2_NFS_Preform_2 Used: 8.60% - 180 GB of 2.09 TB, trend per 1 day 0 hours: -831 MB, trend per 1 day 0 hours: -0.04%
Filesystem /vol/RZ2_NFS_Preform_2/.snapshot Used: 36.08% - 39.7 GB of 110 GB, trend per 1 day 0 hours: -276 MB, trend per 1 day 0 hours: -0.25%
Filesystem /vol/RZ2_NFS_Swap_Logs Used: 0.08% - 1.39 GB of 1.65 TB, trend per 1 day 0 hours: -1.51 GB, trend per 1 day 0 hours: -0.09%
Filesystem /vol/RZ2_NFS_VIT_01 Used: 28.42% - 594 GB of 2.09 TB, trend per 1 day 0 hours: +15.1 GB, trend per 1 day 0 hours: +0.72%, Time left until disk full: 99 days 8 hours
Filesystem /vol/RZ2_NFS_VIT_01/.snapshot Used: 153.99% - 169 GB of 110 GB (warn/crit at 94.00%/96.00% used)(!!), trend per 1 day 0 hours: +14.1 GB, trend per 1 day 0 hours: +12.83%, Time left until disk full: 0 seconds
Filesystem /vol/RZ2_NFS_VIT_SSI_2 Used: 48.80% - 805 GB of 1.65 TB, trend per 1 day 0 hours: +8.11 GB, trend per 1 day 0 hours: +0.49%, Time left until disk full: 104 days 3 hours
Filesystem /vol/TEST Used: 0.05% - 289 MB of 612 GB, trend per 1 day 0 hours: +17.3 kB, trend per 1 day 0 hours: +<0.01%, Time left until disk full: 97009 years 221 days
Filesystem /vol/TEST/.snapshot Used: 0% - 0 B of 32.2 GB, trend per 1 day 0 hours: +0 B, trend per 1 day 0 hours: +0%
Filesystem /vol/rz2_krones_2 Used: <0.01% - 17.8 MB of 2.20 TB, trend per 1 day 0 hours: +22.4 kB, trend per 1 day 0 hours: +<0.01%, Time left until disk full: 268446 years 35 days
Filesystem /vol/rz2_vit_ssi_2 Used: <0.01% - 17.6 MB of 4.05 TB, trend per 1 day 0 hours: +23.1 kB, trend per 1 day 0 hours: +<0.01%, Time left until disk full: 481430 years 199 days
Filesystem /vol/svm_nfs_rz2_root Used: 0.79% - 8.04 MB of 1.02 GB, trend per 1 day 0 hours: +1.57 MB, trend per 1 day 0 hours: +0.15%, Time left until disk full: 1 year 278 days
Filesystem /vol/svm_nfs_rz2_root/.snapshot Used: 4.31% - 2.31 MB of 53.7 MB, trend per 1 day 0 hours: -403 kB, trend per 1 day 0 hours: -0.75%
Filesystem /vol/vol0 Used: 10.19% - 75.1 GB of 738 GB, trend per 1 day 0 hours: -2.58 GB, trend per 1 day 0 hours: -0.35%
Filesystem /vol/vol0/.snapshot Used: 21.61% - 8.39 GB of 38.8 GB, trend per 1 day 0 hours: +1.13 GB, trend per 1 day 0 hours: +2.90%, Time left until disk full: 27 days 1 hour
Filesystem aggr0_NETAPPNODE1_RZ2/.snapshot Used: <0.01% - 20.5 kB of 43.2 GB, trend per 1 day 0 hours: -19.0 MB, trend per 1 day 0 hours: -0.04%
Filesystem aggr1_NETAPPNODE1_RZ2 Used: 13.58% - 2.01 TB of 14.8 TB, trend per 1 day 0 hours: -6.40 GB, trend per 1 day 0 hours: -0.04%
Filesystem aggr1_NETAPPNODE1_RZ2/.snapshot Used: 1.91% - 14.9 GB of 778 GB, trend per 1 day 0 hours: +6.36 GB, trend per 1 day 0 hours: +0.82%, Time left until disk full: 119 days 21 hours
SNMP Info            NetApp Release 9.13.1P9: Fri Apr 19 17:13:02 UTC 2024, NETAPPCL-RZ2, , 
Uptime               Up since 2024-06-06 15:34:22, Uptime: 75 days 20 hours
No piggyback files for 'NETAPPCL-RZ2'. Skip processing.
No piggyback files for '10.100.4.45'. Skip processing.
[cpu_tracking] Stop [7fad3b1d2ba0 - Snapshot(process=posix.times_result(user=0.010000000000000009, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.010000001639127731))]
[snmp] Success, [piggyback] Success (but no data found for this host), execution time 0.6 sec | execution_time=0.550 user_time=0.020 system_time=0.000 children_user_time=0.030 children_system_time=0.010 cmk_time_snmp=0.490 cmk_time_agent=0.000


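Hello Checkmk community,
Does anyone else have the problem that some filesystem checks on NetApp don't seem to get the correct maximum filesystem size? In the output above, /vol/RZ2_NFS_VIT_01/.snapshot is reported as 153.99% used (169 GB of 110 GB).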

Any advice or tips appreciated.

Best regards
Jonas

NetApp snapshots larger than the configured snapshot reserve “overflow” into the normal volume.
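As a rough illustration of that overflow, the sketch below recomputes the figures for /vol/RZ2_NFS_VIT_01/.snapshot from the rounded values in the check output (169 GB used, 110 GB reserve). The function name is hypothetical and not part of Checkmk; because the inputs are rounded, the result comes out near 153.6% rather than the 153.99% Checkmk reports.

```python
def snapshot_reserve_usage(used_gb: float, reserve_gb: float) -> tuple[float, float]:
    """Return (percent of the reserve used, GB spilled into the active volume).

    Hypothetical helper for illustration only; not a Checkmk or NetApp API.
    """
    if reserve_gb <= 0:
        raise ValueError("reserve_gb must be positive")
    percent_used = used_gb / reserve_gb * 100.0
    # Anything above the reserve is taken from the normal volume space.
    overflow_gb = max(0.0, used_gb - reserve_gb)
    return percent_used, overflow_gb


# Rounded values taken from the check output for /vol/RZ2_NFS_VIT_01/.snapshot
percent, overflow = snapshot_reserve_usage(used_gb=169.0, reserve_gb=110.0)
print(f"{percent:.1f}% of the reserve used, {overflow:.0f} GB overflow into the volume")
```

So a value above 100% is not a wrong maximum size: the reserve really is 110 GB, and roughly 59 GB of snapshot data is sitting in the normal volume space.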


Thank you Martin.

It seems I can't edit my original post (maybe not yet, because I am a forum noob), so here is where to change the setting:

In NetApp ONTAP System Manager, under Storage > Volumes in the settings for my volume, I found the “Snapshot Reserve %” value. The default seems to be 5%. Luckily, in my case the snapshots take up most of the storage of this volume, so I could increase it massively, to 20%.

Hurray for checkmk.
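To estimate what raising the snapshot reserve from 5% to 20% does to the check result, here is a minimal sketch. It rests on assumptions that are not confirmed in this thread: that the reserve is a percentage of the total volume size, that the total is the 2.09 TB data area plus the 110 GB reserve shown in the check output, and that Checkmk's TB/GB are binary units. The function name is hypothetical.

```python
def reserve_after_resize(volume_total_gb: float, reserve_percent: float,
                         snapshot_used_gb: float) -> tuple[float, float]:
    """Return (reserve size in GB, percent of that reserve used by snapshots).

    Hypothetical helper; assumes the reserve is a share of the total volume size.
    """
    if volume_total_gb <= 0 or reserve_percent <= 0:
        raise ValueError("volume_total_gb and reserve_percent must be positive")
    reserve_gb = volume_total_gb * reserve_percent / 100.0
    used_percent = snapshot_used_gb / reserve_gb * 100.0
    return reserve_gb, used_percent


# Assumption: total size = data area (2.09 TB) + snapshot reserve (110 GB)
VOLUME_TOTAL_GB = 2.09 * 1024 + 110
SNAPSHOT_USED_GB = 169.0  # rounded value from the check output

for percent in (5, 20):
    reserve_gb, used = reserve_after_resize(VOLUME_TOTAL_GB, percent, SNAPSHOT_USED_GB)
    print(f"reserve {percent}%: {reserve_gb:.0f} GB, snapshots use {used:.1f}% of it")
```

Under these assumptions, a 20% reserve is large enough to hold the current 169 GB of snapshots with room to spare. The space added to the reserve is no longer available to the active file system, so the data area shrinks accordingly.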


I'm just surprised that you use SNMP to monitor your NetApp.
There is a special agent available that uses NetApp's REST API.
I would recommend using that one.

regards

Michael

That’s what I would say too. But the one feature missing there is snapshot monitoring ;)
To get this information into my monitoring systems, I have to modify the NetApp special agent with every major version.

Here’s the corresponding request in the ideas portal: https://ideas.checkmk.com/suggestions/436474/monitor-netapp-snapshots

That’s a different request from what is really needed for NetApp volumes. Normally you want to know how big the snapshot reserve is and how much of it is used. The individual snapshots are not so important from my point of view. But that is just my opinion ;)
