NetApp (Service Check Timed Out)

CheckMK version:
2.0.0p17 raw

OS version of CheckMK server or monitored system:
Debian 10 on CheckMK server / NetApp Release 9.7P12

Description of the problem :
Data from the NetApp WebAPI is no longer received, although it was received the first time.

Output of → cmk --debug -vvI hostname

cmk --debug -vvI NAAFF01
Discovering services and host labels on: NAAFF01
NAAFF01:
+ FETCHING DATA
  Source: SourceType.HOST/FetcherType.SNMP
[cpu_tracking] Start [7fddb26e9880]
Loading autochecks from /omd/sites/aks/var/check_mk/autochecks/NAAFF01.mk
[SNMPFetcher] Fetch with cache settings: SNMPFileCache(base_path=PosixPath('/omd/sites/aks/tmp/check_mk/data_source_cache/snmp/NAAFF01'), max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=False, use_outdated=False, simulation=False)
Not using cache (Too old. Age is 331 sec, allowed is 120 sec)
[SNMPFetcher] Execute data source
  SNMP scan:
   SNMP STUFF........

[cpu_tracking] Start [7fddb2406910]
Calling: /omd/sites/aks/share/check_mk/agents/special/agent_netapp '172.17.2.80' 'monitoring' 'PASSWORD' '--no_counters' 'volumes'
[ProgramFetcher] Fetch with cache settings: DefaultAgentFileCache(base_path=PosixPath('/omd/sites/aks/tmp/check_mk/data_source_cache/special_netapp/NAAFF01'), max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=False, use_outdated=False, simulation=False)
Not using cache (Too old. Age is 970 sec, allowed is 120 sec)
[ProgramFetcher] Execute data source

AND THEN THE CHECK FREEZES

Hi @Nettcode and welcome to the checkmk community.

I suspect your NetApp is not responding in time, which is why your check freezes. Can you run the special agent command directly on the command line and measure how long it takes?

Hi @tosch,

Am I doing it right? If so, I get an error message.

root@cmk:/opt/omd/sites/aks/share/check_mk/agents/special# ./agent_netapp 172.17.2.80 monitoring PASSWORD
Traceback (most recent call last):
  File "./agent_netapp", line 9, in <module>
    from cmk.special_agents.agent_netapp import main
ModuleNotFoundError: No module named 'cmk'

Please always run Checkmk-related commands as the site user: switch with omd su <sitename> and run the command again. Please also add the --debug option; it may show additional errors during the call.

I trust you, but my company doesn’t :) The domain is blocked for me.

How long did this request take? If it takes longer than 120 seconds, that could be the problem. You can try disabling the counters; they take a good chunk of time to gather on larger filers.
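To answer the timing question reproducibly, a small wrapper like the following could be used. This is only a sketch: the helper name `time_command` is made up for illustration, and the 120-second default is simply the figure mentioned in this thread, not a documented Checkmk limit. It would have to be run as the site user, with the real agent command line substituted in.

```python
import subprocess
import sys
import time


def time_command(cmd, limit_seconds=120.0):
    """Run cmd and return (elapsed_seconds, finished_in_time, exit_code).

    The child process is killed if it runs longer than limit_seconds;
    in that case finished_in_time is False and exit_code is None.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # only the duration matters here
            stderr=subprocess.DEVNULL,
            timeout=limit_seconds,
        )
        finished, exit_code = True, result.returncode
    except subprocess.TimeoutExpired:
        finished, exit_code = False, None
    return time.monotonic() - start, finished, exit_code


# Harmless stand-in command; replace it with the agent_netapp call
# (path, address and credentials are site-specific).
elapsed, finished, exit_code = time_command([sys.executable, "-c", "pass"])
print(f"elapsed={elapsed:.1f}s finished={finished} exit_code={exit_code}")
```

If `finished` comes back `False`, the call did not return within the limit, which would match the freezing check described above.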

The domain has now been unblocked for me. I have attached the file for you. When I execute the “live” command, I don’t think it will complete. (I can press the Enter key and I am still in the query.)

I got the following message at the end: Querying class environment-sensors-get-iter: Timeout: Operation "environment_sensors_iterator::next_imp()" took longer than 50 seconds to complete [from mgwd on node "naaff01-01" (VSID: -1) to mgwd at 169.254.103.102]
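When the agent output is long, it can help to pull out which query class timed out and after how many seconds. The sketch below assumes the message keeps the wording quoted above; the pattern and the helper name `parse_timeout` are illustrative and not part of Checkmk or the NetApp API.

```python
import re

# Matches the timeout message quoted above. Straight and typographic
# double quotes are both accepted, since forum software may convert them.
TIMEOUT_PATTERN = re.compile(
    r"Querying class (?P<query_class>[\w-]+): Timeout: "
    r"Operation [\"“”](?P<operation>[^\"“”]+)[\"“”] took longer than "
    r"(?P<seconds>\d+) seconds"
)


def parse_timeout(line):
    """Return a dict describing the timed-out query, or None if no match."""
    match = TIMEOUT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "query_class": match.group("query_class"),
        "operation": match.group("operation"),
        "seconds": int(match.group("seconds")),
    }


message = (
    'Querying class environment-sensors-get-iter: Timeout: Operation '
    '"environment_sensors_iterator::next_imp()" took longer than 50 seconds '
    'to complete [from mgwd on node "naaff01-01" (VSID: -1) to mgwd at '
    '169.254.103.102]'
)
print(parse_timeout(message))
```

For the message above, this reports the query class `environment-sensors-get-iter` and a 50-second timeout, which points at the sensor query rather than the volume queries.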

Please delete the file; it contains some sensitive data!

@tosch I have deleted the post.

It took around 23 seconds to fetch the API data from the filer. Not that bad. (I see that was with counters disabled.)

I remember from our use of NetApp that it includes a DoS attack protection and shows some other odd behavior. In total, 41 queries are started against the API, which could result in your IP being blocked. I would suggest raising the monitoring interval to at least 5 minutes.
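To put the suggestion in numbers, here is a rough calculation of the API load at different intervals. The 41 queries per run come from the observation above; the 1-minute baseline is an assumption about the current check interval and should be replaced with the value actually configured.

```python
QUERIES_PER_RUN = 41  # number of API queries per agent run, as observed above


def queries_per_hour(interval_minutes, queries_per_run=QUERIES_PER_RUN):
    """API queries sent to the filer per hour at a given check interval."""
    return queries_per_run * 60 / interval_minutes


# 1 minute is an assumed baseline; 5 minutes is the suggested interval.
for interval in (1, 5):
    print(f"{interval} min interval: {queries_per_hour(interval):.0f} queries/hour")
```

Under these assumptions the load drops from 2460 to 492 queries per hour, which makes tripping a rate limit on the filer less likely.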

I already have the counters disabled, but to test your suggestion: where can I adjust the monitoring interval for the host?
