Writing a plugin: generate WARN after X time in a specific state

I’m working on a plugin to check VSS Writer states (Windows) and need some guidance, since I’m only a beginner in Python and CMK plugin development.

For the state “Waiting for Completion”, I want the check to stay OK for a time X (for example, 1 hour). If the writer has been in “Waiting for Completion” for longer than X, I want the check to go to WARN so that someone goes and checks on it.

However, since I only get the writer name and status, I don’t know how to tell whether it has been in that state for longer than X.
I would like some tips or pointers in the right direction on how to generate a WARN only if the state has stayed the same for more than X.

I’m aware of the options for delayed notifications, but I don’t want to see a warning in my dashboard unless the state has been the same for more than X.

Sorry, I do not get it. You want to monitor Volume Shadow Copy (VSS) snapshots on Windows and ensure that these jobs finish at all and (even better) finish in time? What do your current scripts look like?

#!/usr/bin/env python3
# -*- encoding: utf-8; py-indent-offset: 4 -*-


from .agent_based_api.v1 import register, render, Result, Metric, State, check_levels, ServiceLabel, Service

#<<<_vss_writers>>>
#Task_Scheduler_Writer {d61d61c8-d73a-4eee-8cdd-f6f9786b7124} {1bddd48e-5052-49db-9b07-b96f96727e6b} Stable No_error
#VSS_Metadata_Store_Writer {75dfb225-e2e4-4d39-9ac9-ffaff65ddf06} {088e7a7d-09a8-4cc6-a609-ad90e75ddc93} Stable No_error
#$line.Writer_name.replace("'","") $line.Writer_Id, $line.Writer_Instance_Id, $line.State.split("_")[1], $line.Last_error

## Possible states
# Stable = OK
# Failed = CRIT
# Unstable = WARN
# In-Progress = OK
# Waiting_for_Completion = OK
# (Might add new states later if needed) # ESC-15-04-2022

def __vss_writers_name(line):
    return line[0]

def parse__vss_writers(string_table):
    section = {}
    for line in string_table:
        name = str(line[0])
        section[name] = {
            'id': str(line[1])
        }
        section[name]['writername'] = str(line[0])
        section[name]['instanceid'] = str(line[2])
        section[name]['state'] = str(line[3])
        section[name]['lasterror'] = str(line[4])

    return section

register.agent_section(
        name="_vss_writers",
        parse_function=parse__vss_writers,
)

def discover__vss_writers(section):
    for name, data in section.items():
        yield Service(item=name)

# Map each known writer state to a monitoring state.
_VSS_STATE_MAP = {
    "Stable": State.OK,
    "In-Progress": State.OK,
    "Waiting_for_Completion": State.OK,
    "Unstable": State.WARN,
    "Failed": State.CRIT,
}

def check__vss_writers(item, params, section):
    data = section.get(item)
    if data is None:
        return

    # The summary text is the same for every state, so build it once.
    summary = "%s has a %s state, Last error is: %s. Writer ID: %s. Writer Instance ID: %s." % (
        data['writername'], data['state'], data['lasterror'], data['id'], data['instanceid'])

    state = _VSS_STATE_MAP.get(data['state'])
    if state is None:
        # Unknown writer state: report UNKNOWN so the mapping gets extended.
        yield Result(state=State.UNKNOWN, summary=summary + " Please Add Current state to check file")
        return

    yield Result(state=state, summary=summary)




register.check_plugin(
        name="_vss_writers",
        service_name="VSS Writer: %s",
        discovery_function=discover__vss_writers,
        check_function=check__vss_writers,
        check_default_parameters={
            },
        check_ruleset_name="_vss_writers",
)


I added my check file above, with my agent output in the comments below <<<_vss_writers>>>.
I want to monitor the state of the VSS Writers because they sometimes end up in an error state.
However, I want to extend the check: if a state, for example “Waiting_for_Completion”, has been the current state for more than 1 hour, it should no longer be OK but WARN.
If a writer has been in “Waiting for completion” for more than 1 hour, a manual action might be needed to fix it.

(The 1 hour will be a flexible value that can be changed as a parameter.)

But since the output doesn’t give me any time value, I’m looking for a way to let CMK track that somehow.
“It’s not possible” is also an answer :sweat_smile:, but I was hoping someone knows a way.

I’d add a metric for “Age of job start”. Use “age” as name of the metric, then Checkmk expects seconds as unit. For a running job just include the age of the start, when the job is not running, return age=0. You’ll get nice spikes in the graphs then to compare the running time.
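A minimal sketch of the age calculation, assuming the start time of the job is known as a Unix timestamp (the agent output shown so far does not contain one, so it would have to come from the agent or from stored data). The function name `job_age_seconds` is made up for illustration. In the check function the result could then be yielded as `Metric("age", value)`.

```python
from typing import Optional


def job_age_seconds(started_at: Optional[float], now: float) -> float:
    """Return how long a job has been running, in seconds.

    started_at is None when no job is running; the age is then 0,
    which produces the "spikes" in the graph while a job runs.
    """
    if started_at is None:
        return 0.0
    # Guard against clock skew: never report a negative age.
    return max(0.0, now - started_at)
```

Passing `now` in as an argument instead of calling `time.time()` inside the function keeps it easy to test with fixed values.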

Your code looks like you’re quite familiar with writing check plugins, but maybe you don’t know these two functions:

def set_item_state(user_key: object, state: Any) -> None:
def get_item_state(user_key: object, default: Any = None) -> Any:

With these you can save and load arbitrary data (of reasonable size!) between check intervals. You could use them to store some current state and a timestamp and later, in subsequent check intervals, retrieve that data and compare it to the current data, or whatever. I had a similar requirement and did the same. The functions are used like so, for example:

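import time

# Load whatever was stored in an earlier check interval (empty dict on the first run).
previous_data = get_item_state("some_identifier", default={})
...
current_data = {
    "key1": 0,
    "key2": "Hello",
    "timestamp": int(time.time()),
}
# Store the current data for the next check interval.
set_item_state("some_identifier", current_data)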

You could use that to store the time at which a certain state was first seen.
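As a rough sketch of that idea, the function below keeps a "state seen since" timestamp in any dict-like store and returns how long the current state has lasted. The name `seconds_in_state` and the stored keys `state` and `since` are my own choices, not part of Checkmk. As far as I know, the `agent_based_api.v1` also offers `get_value_store()`, which returns a mutable mapping that could be passed in as `store`; here a plain dict is used so the example runs on its own.

```python
import time
from typing import Any, MutableMapping, Optional


def seconds_in_state(
    store: MutableMapping[str, Any],
    key: str,
    current_state: str,
    now: Optional[float] = None,
) -> float:
    """Return how many seconds `current_state` has been observed for `key`."""
    if now is None:
        now = time.time()

    previous = store.get(key)
    if not previous or previous["state"] != current_state:
        # First run or state changed: restart the timer.
        store[key] = {"state": current_state, "since": now}
        return 0.0

    # Same state as last time: keep the original timestamp.
    return now - previous["since"]


# Usage with a plain dict standing in for the persisted store.
store: dict = {}
seconds_in_state(store, "Some_Writer", "Waiting_for_Completion", now=1000.0)
elapsed = seconds_in_state(store, "Some_Writer", "Waiting_for_Completion", now=1090.0)
```

The timestamp is only rewritten when the state changes, so the elapsed time keeps growing for as long as the writer stays in the same state.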

I look at other plugins and use those as examples for creating my own check plugins (that’s probably why it looks like I’m familiar with it :sweat_smile:).
This might not be the best way to do it, but it has worked for me so far in the limited time available.

I will have a look into that tonight to see if I can figure this out and add it to the check :slight_smile: Thank you.
I will let you know how it goes.

@Dirk
I’ve just succeeded in implementing your suggestion.
I suddenly got a bit busy over the last couple of weeks,
but I made time for it tonight and it’s working now.

Thank you very much for your help!

For reference, this is part of the code now:

from time import time

def check_vss_writers(item, params, section):
    if item in section:
        now = time()
        data = section[item]
        key = data['writername']
        new_state = {"last_state": data['state'], "timestamp": int(now)}
        summary = "%s has a %s state, Last error is: %s. Writer ID: %s. Writer Instance ID: %s." % (
            data['writername'], data['state'], data['lasterror'], data['id'], data['instanceid'])

        # Load the stored state; on the first run, initialise it with the current state.
        previous_data = get_item_state(key)
        if not previous_data:
            previous_data = new_state
            set_item_state(key, previous_data)

        #################################
        # Check
        #################################
        if data['state'] in ("Stable", "In-Progress"):
            yield Result(state=State.OK, summary=summary)
            set_item_state(key, new_state)

        elif data['state'] == "Waiting_for_Completion":
            if data['state'] != previous_data['last_state']:
                # The writer just entered this state: start the timer.
                set_item_state(key, new_state)
                yield Result(state=State.OK, summary=summary)
            elif now - previous_data['timestamp'] < 120:
                # Same state as before, but still below the (hard-coded) 120 second limit.
                yield Result(state=State.OK, summary=summary)
            else:
                # Same state for 120 seconds or more: warn and show since when.
                yield Result(state=State.WARN, summary="%s This Writer is in this state since %s" % (
                    summary, render.datetime(int(previous_data['timestamp']))))
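The snippet hard-codes 120 seconds, while the original goal was a limit that can be changed as a parameter. A minimal sketch of that decision, assuming a hypothetical parameter key `warn_after` (in seconds, defaulting to 1 hour) that would be supplied via `check_default_parameters` or a rule. Plain strings stand in for `State.OK` and `State.WARN` so the example runs without Checkmk.

```python
from typing import Any, Mapping

# Hypothetical default: warn after one hour in the same state.
DEFAULT_WARN_AFTER = 3600


def waiting_state(seconds_waiting: float, params: Mapping[str, Any]) -> str:
    """Return "WARN" once the writer has waited at least `warn_after` seconds."""
    warn_after = params.get("warn_after", DEFAULT_WARN_AFTER)
    if seconds_waiting >= warn_after:
        return "WARN"
    return "OK"


# Usage: the same elapsed time under two different thresholds.
with_default = waiting_state(130, {})
with_short_limit = waiting_state(130, {"warn_after": 120})
```

Reading the limit from `params` keeps the check logic unchanged when someone later wants 30 minutes or 2 hours instead.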

