Help creating SNMP Check

Dear Forum.

I would like to monitor the CPU usage of my Netgear switches.
I found the OID for it in the SNMP walk:
.1.3.6.1.4.1.4526.11.1.1.4.9.0 5 Secs ( 70%) 60 Secs ( 80.12231%) 300 Secs ( 80%)

How would I add the 5 Sec value to a check?

I’m completely new to making custom checks and need some guidance.

I’m currently using cmk version 1.5.0p16 Enterprise.

Have a look at https://checkmk.com/cms_legacy_devel_snmpbased.html and the other articles on https://checkmk.com/cms_legacy_documentation.html#devel

This is what I have currently, which does not seem to work:

def inventory_netg_cpu(info):
   # Debug: lets see how the data we get looks like
   print info
   return []

def check_netg_cpu(item, params, info):
   return (3, "UNKNOWN - not yet implemented")

check_info["netg_cpu"] = {
    "check_function"        : check_netg_cpu,
    "inventory_function"    : inventory_netg_cpu,
    "service_description"   : "NIC %s",
    "snmp_info"             : ( ".1.3.6.1.4.1.4526.11.1.1.4.9.0", [ "2", "3", "8" ] )
}

And I can see the results from:

cmk -vv --oid .1.3.6.1.4.1.4526.11.1.1.4.9.0 --snmpwalk XS716T-switch

Walk on ".1.3.6.1.4.1.4526.11.1.1.4.9.0"...Executing BULKWALK of ".1.3.6.1.4.1.4526.11.1.1.4.9" on Z1P99-SWL01
.1.3.6.1.4.1.4526.11.1.1.4.9.0 => [    5 Secs ( 96.2594%)   60 Secs ( 97.2394%)  300 Secs ( 97.629%)] 'OCTETSTR'
1 variables.

I need cmk to retrieve the 5 Sec value.

This is the output of my check, cmk -v --checks netg_cpu -I switchname:

 [snmp] Execute data source
 [piggyback] Execute data source
+ EXECUTING DISCOVERY PLUGINS (1)
SUCCESS - Found nothing new

You’re not returning anything in the inventory function.
Perhaps return info?

From the NETGEAR-SWITCHING-MIB I see this OID relates to agentSwitchCpuProcessTotalUtilization. I am not sure what values you were looking for with 2, 3 and 8.

From memory, try this.

  def inventory_netg_cpu(info):
     # Debug: lets see how the data we get looks like
     import pprint;pprint.pprint(info)
     return [ (None, None) ]
  
  def check_netg_cpu(item, params, info):
     return (3, "UNKNOWN - not yet implemented")
  
  check_info["netg_cpu"] = {
      "inventory_function"    : inventory_netg_cpu,
      "check_function"        : check_netg_cpu,
      "service_description"   : "CPU Utilization (5 Sec.)",
      # Only apply the check to devices whose sysObjectID is under the Netgear prefix
      "snmp_scan_function"    : lambda oid: oid(".1.3.6.1.2.1.1.2.0").startswith(".1.3.6.1.4.1.4526.") and \
                                            oid(".1.3.6.1.4.1.4526.11.1.1") is not None,
      "snmp_info"             : ( ".1.3.6.1.4.1.4526.11.1.1.4.9", ["0"] )
  }

If you run an inventory with that check against your switch, it will print the contents of info. You can then understand how the data is returned and check it before returning the results.

cmk -v --checks netg_cpu -II XS716T-switch

In the check_netg_cpu function you can then extract the 5 second percentage and return that as the details to the check.
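
A minimal sketch of that extraction step, runnable outside Checkmk. It assumes info is a list of rows with the whole OCTETSTR in the first column; the function name check_netg_cpu_sketch and the wording of the output text are made up for illustration.

```python
import re

def check_netg_cpu_sketch(item, params, info):
    # Assumption: info is a list of rows, with the whole string in the first column,
    # e.g. [["    5 Secs ( 96.2594%)   60 Secs ( 97.2394%)  300 Secs ( 97.629%)"]]
    if not info or not info[0]:
        return (3, "UNKNOWN - no data received")
    # Capture the number inside the parentheses that follow "5 Secs"
    match = re.search(r"\b5 Secs \(\s*([\d.]+)%\)", info[0][0])
    if match is None:
        return (3, "UNKNOWN - unexpected format: %r" % info[0][0])
    util = float(match.group(1))
    return (0, "OK - CPU utilization (5 sec): %.2f%%" % util)

sample_info = [["    5 Secs ( 96.2594%)   60 Secs ( 97.2394%)  300 Secs ( 97.629%)"]]
result = check_netg_cpu_sketch(None, None, sample_info)
```

The sample string is the one from the walk output earlier in the thread. Levels and performance data are left out here on purpose.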

I have managed to make this

def inventory_my_netgear_cpu(info):
    pprint.pprint(info)
    return []


def check_my_netgear_cpu(item, params, info):
    return 3, 'not yet implemented'


    check_info['my_netgear_cpu'] = {
        'inventory_function': inventory_my_netgear_cpu,
        'check_function': check_my_netgear_cpu,
        'service_description': 'DESCR',
        'snmp_info': ('.1.3.6.1.4.1.4526.11.1.1.4.9', ["0"]),
        'snmp_scan_function': lambda oid: True,
    }

And that returns:

OMD[mysite]:~$ cmk --debug -vv --checks my_netgear_cpu switch1
[cpu_tracking] Start with phase 'busy'
Check_MK version 1.6.0p8
Try aquire lock on /omd/sites/mysite/tmp/check_mk/counters/switch1
Got lock on /omd/sites/mysite/tmp/check_mk/counters/switch1
Releasing lock on /omd/sites/mysite/tmp/check_mk/counters/switch1
Released lock on /omd/sites/mysite/tmp/check_mk/counters/switch1
Loading autochecks from /omd/sites/mysite/var/check_mk/autochecks/switch1.mk
+ FETCHING DATA
[cpu_tracking] Push phase 'snmp' (Stack: ['busy'])
 [snmp] No persisted sections loaded
 [snmp] Not using cache (Don't try it)
 [snmp] Execute data source
 [snmp] my_netgear_cpu: Fetching data
Executing WALK of ".1.3.6.1.4.1.4526.11.1.1.4.9" on switch1
.1.3.6.1.4.1.4526.11.1.1.4.9.0 => [    5 Secs ( 24.2257%)   60 Secs ( 26.82%)  300 Secs ( 28.5991%)] 'OCTETSTR'
 [snmp] Write data to cache file /omd/sites/mysite/tmp/check_mk/data_source_cache/snmp/switch1
Try aquire lock on /omd/sites/mysite/tmp/check_mk/data_source_cache/snmp/switch1
Got lock on /omd/sites/mysite/tmp/check_mk/data_source_cache/snmp/switch1
Releasing lock on /omd/sites/mysite/tmp/check_mk/data_source_cache/snmp/switch1
Released lock on /omd/sites/mysite/tmp/check_mk/data_source_cache/snmp/switch1
[cpu_tracking] Pop phase 'snmp' (Stack: ['busy', 'snmp'])
[cpu_tracking] Push phase 'agent' (Stack: ['busy'])
 [piggyback] No persisted sections loaded
 [piggyback] Execute data source
No piggyback files for 'switch1'. Skip processing.
No piggyback files for 'IP OF SWITCH'. Skip processing.
[cpu_tracking] Pop phase 'agent' (Stack: ['busy', 'agent'])
[cpu_tracking] End
OK - [snmp] Success, execution time 0.0 sec | execution_time=0.026 user_time=0.020 system_time=0.010 children_user_time=0.000 children_system_time=0.000 cmk_time_snmp=0.011 cmk_time_agent=-0.010

How would I go about getting the 5 Sec data as a metric?

Hi,
If you want to see the contents of info from your pprint call, you need the right command for discovery: cmk --debug -vvII … instead of -vv; then you will see the result. Otherwise, put your pprint in the check function.
Regards, Christian

If you send me the snmpwalk info, I can use it to run tests on my dev box.

cmk --snmpwalk switch1

The data will be stored under the snmpwalks directory at

~/var/check_mk/snmpwalks/switch1


Greg

This is my check

def parse_my_netgear_cpu(info):
    try:
        return float(info[0][0])
    except (IndexError, ValueError):
        return

def inventory_my_netgear_cpu(parsed):
    if parsed:
        return [(None, {})]

def check_my_netgear_cpu(item, params, parsed):
    if not parsed:
        return
    return check_cpu_util(parsed, params)

check_info['my_netgear_cpu'] = {
    'parse_function': parse_my_netgear_cpu,
    'inventory_function': inventory_my_netgear_cpu,
    'check_function': check_my_netgear_cpu,
    'service_description': 'CPU utilization',
    'snmp_info': ('.1.3.6.1.4.1.4526.11.1.1.4', ['9']),
    'snmp_scan_function': lambda oid: oid('.1.3.6.1.2.1.1.2.0', '').startswith('.1.3.6.1.4.1.1139'),
    'includes': ['cpu_util.include'],
    'group': 'cpu_utilization',
}

And when doing the: cmk --debug -vII --checks my_netgear_cpu switch1
I get this result:

Discovering services on: switch1
switch1:
+ FETCHING DATA
 [snmp] Execute data source
 [piggyback] Execute data source
No piggyback files for 'switch1'. Skip processing.
No piggyback files for 'IP OF SWITCH'. Skip processing.
+ EXECUTING DISCOVERY PLUGINS (1)
[[u'    5 Secs ( 23.2316%)   60 Secs ( 32.3220%)  300 Secs ( 30.4773%)']]
SUCCESS - Found no services, no host labels

How do I parse the result to get the 5 Sec value? (23.2316%)

You need to learn Python :wink:

perc5 = info[0][0].strip().split('(')[1].split(')')[0].replace('%','')
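
Note that the one-liner gives you a string, so it still needs float() before it can be compared against levels. The same extraction can also be sketched with a regular expression, which returns floats and picks up the 60 and 300 second averages as well. The helper name parse_netgear_cpu is made up for illustration; the sample string is the one printed in the discovery output above.

```python
import re

def parse_netgear_cpu(raw):
    """Return {seconds: percent} for every 'N Secs ( P%)' group found in raw."""
    pattern = re.compile(r"(\d+)\s+Secs\s+\(\s*([\d.]+)%\)")
    return {int(secs): float(pct) for secs, pct in pattern.findall(raw)}

sample = u"    5 Secs ( 23.2316%)   60 Secs ( 32.3220%)  300 Secs ( 30.4773%)"
averages = parse_netgear_cpu(sample)
perc5 = averages[5]  # the 5 second average as a float
```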

What is the next step?

This is the check. How do I get this to be a service?

def inventory_switch_cpu1(info):
   perc5 = info[0][0].strip().split("(")[1].split(")")[0].replace("%","")
   print perc5
   return []

def check_switch_cpu1(item, _no_params, info):
   if perc5 < 70:
      return 0, "OK - CPU% is " + perc5
   else:
      return (3, "UNKNOWN - not yet implemented")

check_info["switch_cpu1"] = {
    "check_function"        : check_switch_cpu1,
    "inventory_function"    : inventory_switch_cpu1,
    'service_description': 'CPU utilization',
    "snmp_info"             : ( ".1.3.6.1.4.1.4526.11.1.1.4", ["9"] ),
    'includes': ['cpu_util.include'],
    'group': 'cpu_utilization',
}

Look at the links from Robert. You need to write the discovery function (formerly called the inventory function).
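
As a rough sketch of what such a discovery function could look like for a check without an item: it returns one (item, parameters) pair when the device delivered data, and an empty list otherwise. The function name is hypothetical, and the (None, None) return shape follows the example posted earlier in this thread.

```python
def inventory_switch_cpu_sketch(info):
    # info is expected to be a list of rows, e.g. [["    5 Secs ( 23.2316%) ..."]]
    if info and info[0] and "Secs" in info[0][0]:
        # None as item means one service per host; None as parameters
        return [(None, None)]
    # Returning an empty list means no service is created for this host
    return []

found = inventory_switch_cpu_sketch([["    5 Secs ( 23.2316%)   60 Secs ( 32.3220%)"]])
nothing = inventory_switch_cpu_sketch([])
```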

All working now.

I just need a way to apply this to a selection of hosts so that it does not check all my SNMP devices.
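
One way to restrict the check is the snmp_scan_function entry, which is handed a callable for fetching single OIDs and decides whether the check applies to a host. A minimal sketch, assuming the switches report a sysObjectID under the Netgear enterprise prefix .1.3.6.1.4.1.4526, as in the scan function posted earlier in this thread. The names scan_netgear and fake_oid and the two sample sysObjectID values are made up for trying it locally.

```python
NETGEAR_PREFIX = ".1.3.6.1.4.1.4526."

def scan_netgear(oid):
    # .1.3.6.1.2.1.1.2.0 is sysObjectID; fall back to "" if the device gives no answer
    sys_object_id = oid(".1.3.6.1.2.1.1.2.0") or ""
    return sys_object_id.startswith(NETGEAR_PREFIX)

def fake_oid(values):
    # Stand-in for the callable that is passed to a scan function
    return lambda name: values.get(name)

# Sample values, invented for this example
is_netgear = scan_netgear(fake_oid({".1.3.6.1.2.1.1.2.0": ".1.3.6.1.4.1.4526.100.4.10"}))
is_other = scan_netgear(fake_oid({".1.3.6.1.2.1.1.2.0": ".1.3.6.1.4.1.9.1.1"}))
```

In the check itself this would go in as 'snmp_scan_function': scan_netgear instead of lambda oid: True.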

Is there a way to check multiple OIDs?
I have a couple of switches. Do I need to make a check for each one?

Yes, it is possible to check multiple OIDs. Please look at the existing checks for examples.

What might be the problem here?

def inventory_switchcpu(info):
    data = str(info).split("(")[1].split(")")[0].strip()
    tp = tuple(data)
    tp1 = tuple("1")
    return (1, tp)

def check_switchcpu(item, parms, info):
    data = str(info).split("(")[1].split(")")[0].strip()
    if data < 75.0:
        return (0, "OK - CPU @ " + data)
    elif data > 95.0:
        return (2, "CRIT - CPU @ " + data)
    else:
        return (1, "WARN - CPU @ " + data)

check_info['switchcpu'] = {
    'inventory_function'    : inventory_switchcpu,
    'check_function'        : check_switchcpu,
    'service_description'   : 'CPU utilization',
    'snmp_info': ('.1.3.6.1.4.1.4526.11.1.1.4', ["9"]),
    'snmp_scan_function'    : lambda oid: True,
    'includes': ['cpu_util.include'],
    'group': 'cpu_utilization',
}

The script has been remade.

switchcpu_default_levels = (80.0, 90.0)

def check_switchcpu(item, params, info):
     data = (info).split("(")[1].split(")")[0].strip().replace("%","")
     util = float(data)
     infotext = "CPU utilization" + str(util)
     warn, crit = params
     perfdata = [("util", util, warn, crit, 0, 100)]
     if util >= crit:
         return (2, infotext + " (critical at %d%%)" % crit, perfdata)
     elif util >= warn:
         return (1, infotext + " (warning at %d%%)" % warn, perfdata)
     else:
         return (0, infotext, perfdata)

check_info['switchcpu'] = {
    'check_function'        : check_switchcpu,
    'inventory_function'    : lambda info: [(None, "switchcpu_default_levels")],
    'service_description'   : 'CPU utilization',
    'snmp_info': ('.1.3.6.1.4.1.4526.11.1.1.4', ["9"]),
    'snmp_scan_function'    : lambda oid: True,
    "group"                 : "cpu_utilization",
    "has_perfdata"          : True,
}

With this I get an error in the web panel.

All of my problems have been resolved. Posting the final script here.

switchcpu_default_levels = (80.0000, 90.0000)

def check_switchcpu(item, params, info):
     data = ((str(info)).split("(")[1])
     data1 = (data.split(")")[0]).strip().replace("%","")
     util = float(data1)
     infotext = "CPU utilization " + str(util)
     warn, crit = params
     perfdata = [("util", util, warn, crit, 0, 100)]
     if util >= crit:
         return (2, infotext + " (critical at %d%%)" % crit, perfdata)
     elif util >= warn:
         return (1, infotext + " (warning at %d%%)" % warn, perfdata)
     else:
         return (0, infotext, perfdata)

check_info['switchcpu'] = {
    'check_function'        : check_switchcpu,
    'inventory_function'    : lambda info: [(None, "switchcpu_default_levels")],
    'service_description'   : 'CPU utilization',
    'snmp_info': ('.1.3.6.1.4.1.4526.11.1.1.4', ["9"]),
    'snmp_scan_function'    : lambda oid: True,
    "group"                 : "cpu_utilization",
    "has_perfdata"          : True,
}