Help with an SNMP check: fine on cli, no output on web

I’m struggling with a check I’ve been writing. It runs fine on the command line, but in the GUI it either runs without registering any values or fails with an error of the form:

**Service discovery failed for this host** : Got invalid data:

Invalid check parameter string 'LKD' (6, 37, u'LKD')

Here’s the check itself:

#!/usr/bin/python
# -*- encoding: utf-8; py-indent-offset: 4 -*-


def inventory_endrun_clock(info):
    #for timeFigureOfMerit, ptpAccuracy, sigState in info:
    #    yield None
    #print(info)
    realinfo = []
    for i in range(len(info[0])):
        if(info[0][i] != u''):
            realinfo.append(info[0][i])
        elif(info[1][i] != u''):
            realinfo.append(info[1][i])
        else:
            print("neither array has data for [%i]"%i)
            realinfo.append(u'')
    realinfo[0]=int(realinfo[0])
    realinfo[1]=int(realinfo[1])
    info=tuple(realinfo)
    print(info)
    yield(info)


def check_endrun_clock(item, params, info):
    print("entering check_endrun_clock")
    for timeFigureOfMerit, ptpAccuracy, sigState in info:
        if timeFigureOfMerit == item:
            status = 0
            snmp_status = ''
            if timeFigureOfMerit > 8:
                snmp_status += 'Time error is > 10ms:'
                status = 2
            if ptpAccuracy > 1000:
                snmp_status += 'PTP Clock Accuracy greater than 1 millisecond (%i):' % ptpAccuracy
                status = 2
            if sigState <> 'LKD':
                snmp_status += 'CDMA signal status not locked! Currently=%s:' % sigState
                status = 2
            if (status == 0) and (ptpAccuracy > 500):
                snmp_status += 'PTP Clock Accuracy greater than 500 microseconds (%i):' % ptpAccuracy
                status = 1
            if status == 0:
                snmp_status="CDMA signal locked: Time error of %i: PTP Accuracy of %i" % (timeFigureOfMerit, ptpAccuracy)
            return (status, "Status: %s" % snmp_status, [('timeFigureOfMerit', timeFigureOfMerit), ('ptpAccuracy', ptpAccuracy)])


check_info['endrun_clock'] = {
    'check_function':       check_endrun_clock,
    'inventory_function':   inventory_endrun_clock,
    'service_description':  "endrun clock, signal is %s",
    'snmp_scan_function':   lambda oid: 'Sonoma' in oid(".1.3.6.1.2.1.1.1.0") and oid(".1.3.6.1.2.1.1.1.0") != None,
    'has_perfdata':         True,
}

# endrun base 1.3.6.1.4.1.13827
# .11.1.11.0 Integer time figure of merit, timeFigureOfMerit
# .11.4.16.0 Integer PTP Clock Accuracy, ptpAccuracy
# .11.2.6.0 string CMD Signal State, sigState
snmp_info['endrun_clock'] = ('.1.3.6.1.4.1.13827', ['11.1.11', '11.4.16.0', '11.2.6.0'])

First, it would help to see what your info looks like (the value of the info variable in your inventory function).
Second, a “yield” at the end without a loop means only one value is returned.
It would be better to use return instead of yield there.
Also, there should be no print statements inside the inventory function.
Finally, I don’t understand what you want to achieve with all the appends.

The data at the end of the inventory function looks like:
(u'6', u'37', u'LKD')

The majority of the function (all the appends, etc) is just to flatten the array that’s returned into one line. Before building up that array and casting it to a tuple, info looks like this:

[[u'', u'37', u'LKD'], [u'6', u'', u'']]

As your first output shows, the data is invalid.

The inventory function should return only one value per item, plus thresholds if they are needed at inventory time.

But your check is only pulling three different OIDs.
The first step is to add a trailing 0 to your first OID as well. At the moment you get two lines in your result because of the missing 0 after '11.1.11' (it should be '11.1.11.0').

Now you should only get one line.
If you want to use the signal state in your service description (which I think is a bad idea, because the signal state can change), you need to return the signal state as the item from your inventory function.

Example end of your inventory function:

return [(info[0][2], {})]

Instead, I would change your service description:

'service_description':  "endrun clock",

Then return only an empty item from your inventory function. That way, if data is present at inventory time, the check should work as well.

return [(None, {})]

That’s all
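The inventory advice above can be sketched as follows. This is a minimal sketch, assuming `info` is the list of SNMP rows shown earlier in the thread (a single row such as `[['6', '37', 'LKD']]` once the first OID ends in `.0`). The guard on an empty `info` is an addition for illustration and is not part of the reply.

```python
def inventory_endrun_clock(info):
    # Create exactly one service without an item, but only when the
    # device actually returned a row (assumption: an empty list means
    # the OIDs were not answered).
    if info:
        return [(None, {})]
    return []
```

With an item of `None`, the service description needs no `%s` placeholder, which matches the suggested `"endrun clock"` description.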

Wow. Thank you! I’m going to let this bake overnight, and will post the clean/working version in the morning.

So, it’s mostly working. However, some hosts that match the scan function and respond to all the appropriate OIDs aren’t registering in the GUI. From the CLI, the check runs and finds the service:

OMD[network]:~/ks_junk$ cmk -nv --no-cache --debug --checks endrun_clock -II clock3
Discovering services on: clock3
clock3:
+ FETCHING DATA
 [snmp] Execute data source
 [piggyback] Execute data source
+ EXECUTING DISCOVERY PLUGINS (1)
  1 endrun_clock
SUCCESS - Found 1 services

The check itself

#!/usr/bin/python
# -*- encoding: utf-8; py-indent-offset: 4 -*-


def inventory_endrun_clock(info):
    return [(None, {})]


def check_endrun_clock(item, params, info):
    for timeFigureOfMerit, ptpAccuracy, sigState in info:
        status = 0
        snmp_status = ''
        if int(timeFigureOfMerit) > 8:
            snmp_status += 'Time error is > 10ms:'
            status = 2
        if int(ptpAccuracy) > 1000:
            snmp_status += 'PTP Clock Accuracy greater than 1 millisecond (%s):' % ptpAccuracy
            status = 2
        if sigState <> 'LKD':
            snmp_status += 'CDMA signal status not locked! Currently=%s:' % sigState
            status = 2
        if (status == 0) and (int(ptpAccuracy) > 500):
            snmp_status += 'PTP Clock Accuracy greater than 500 microseconds (%s):' % ptpAccuracy
            status = 1
        if status == 0:
            snmp_status="CDMA signal locked: Time error of %s: PTP Accuracy of %s" % (timeFigureOfMerit, ptpAccuracy)
        return (status, "Status: %s" % snmp_status, [('timeFigureOfMerit', timeFigureOfMerit), ('ptpAccuracy', ptpAccuracy)])


check_info['endrun_clock'] = {
    'check_function':       check_endrun_clock,
    'inventory_function':   inventory_endrun_clock,
    'service_description':  "endrun clock",
    'snmp_scan_function':   lambda oid: 'Sonoma' in oid(".1.3.6.1.2.1.1.1.0") and oid(".1.3.6.1.2.1.1.1.0") != None,
    'has_perfdata':         True,
}

# endrun base 1.3.6.1.4.1.13827
# .11.1.11.0 Integer time figure of merit, timeFigureOfMerit
# .11.4.16.0 Integer PTP Clock Accuracy, ptpAccuracy
# .11.2.6.0 string CMD Signal State, sigState
snmp_info['endrun_clock'] = ('.1.3.6.1.4.1.13827', ['11.1.11.0', '11.4.16.0', '11.2.6.0'])
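The threshold logic of this check can also be exercised outside the monitoring system. The following is a sketch only, assuming a row is a list of three strings as shown earlier in the thread. `evaluate_endrun_row` is a hypothetical helper name, not part of any Checkmk API. It converts the two numeric values to integers once, so the comparisons and the performance data both use numbers rather than strings.

```python
def evaluate_endrun_row(row):
    # row is assumed to be [timeFigureOfMerit, ptpAccuracy, sigState],
    # with every value delivered as a string.
    tfom = int(row[0])
    ptp_accuracy = int(row[1])
    sig_state = row[2]

    status = 0
    messages = []
    if tfom > 8:
        messages.append("Time error is > 10ms")
        status = 2
    if ptp_accuracy > 1000:
        messages.append("PTP Clock Accuracy greater than 1 millisecond (%d)" % ptp_accuracy)
        status = 2
    if sig_state != "LKD":
        messages.append("CDMA signal status not locked! Currently=%s" % sig_state)
        status = 2
    # Warn only when nothing critical was found, as in the posted check.
    if status == 0 and ptp_accuracy > 500:
        messages.append("PTP Clock Accuracy greater than 500 microseconds (%d)" % ptp_accuracy)
        status = 1
    if status == 0:
        messages.append("CDMA signal locked: Time error of %d: PTP Accuracy of %d"
                        % (tfom, ptp_accuracy))

    perfdata = [("timeFigureOfMerit", tfom), ("ptpAccuracy", ptp_accuracy)]
    return status, "Status: " + ": ".join(messages), perfdata
```

Calling it with the sample row from the thread, `evaluate_endrun_row(['6', '37', 'LKD'])`, exercises the all-OK path.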

If you name the check on the command line like this, you force the inventory of this service even if the scan function fails.

It would be nice to see what the first SNMP lines look like on the different hosts.
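Since forcing the check on the command line bypasses the scan function, one way to rule the scan function in or out is to test it in isolation. The sketch below assumes the `oid` callable returns `None` for an OID the device does not answer, which is what the `!= None` comparison in the posted lambda suggests. `make_fake_oid` is a hypothetical stand-in for that callable, used only for testing. Note that the posted lambda evaluates `'Sonoma' in ...` before the `None` comparison, and in Python `'Sonoma' in None` raises a `TypeError`, so this version checks for `None` first.

```python
SYS_DESCR_OID = ".1.3.6.1.2.1.1.1.0"


def scan_endrun_clock(oid):
    # Fetch sysDescr once and test for a missing value before
    # looking for the 'Sonoma' substring.
    descr = oid(SYS_DESCR_OID)
    return descr is not None and "Sonoma" in descr


def make_fake_oid(values):
    # Hypothetical test helper: returns a lookup function that yields
    # None for OIDs missing from the given dict.
    return lambda requested_oid: values.get(requested_oid)
```

For example, `scan_endrun_clock(make_fake_oid({SYS_DESCR_OID: "Linux clock3 3.2.2-Sonoma ..."}))` can be compared against a call with an empty dict to see both outcomes.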

You mean how the OID that I use in the scan function looks for each one?

OMD[network]:~/ks_junk$ for i in $(<clocks);do snmpget -v1 -c foo $i .1.3.6.1.2.1.1.1.0; done
SNMPv2-MIB::sysDescr.0 = STRING: Linux clock1 3.2.2-Sonoma 6010-0064-000_v2.01 #1 PREEMPT Sun Nov 13 21:31:23 UTC 2016 armv5tel
SNMPv2-MIB::sysDescr.0 = STRING: Linux clock3 3.2.2-Sonoma 6010-0064-000_v1.01 #1 PREEMPT Mon Apr 15 22:44:31 UTC 2013 armv5tel

Scratch that. I went back and tested on my dev instance, and the check (as posted) works perfectly. So now it’s a matter of figuring out why it doesn’t work on my real prod instance.
