Custom Plugin Check Function failing

Checkmk version:
2.2.0p18

OS version of Checkmk server or monitored system:
Ubuntu 22.04.1 Azure Edition (CheckMK Cloud Appliance)

Description of the problem:
I created a custom extension/plugin to check a host’s service status as reported in our Job Management System. So far, everything works except the Service State.

The monitored service status shows the following:
check failed - please submit a crash report! (Crash-ID: 6c17fe30-ddcf-11ee-af27-95f85f287d62)

Error message and/or output:

The error shown in the Crash view is:

Exception: KeyError (state)
Traceback:

  File "/omd/sites/<removed>/lib/python3/cmk/base/agent_based/checking/_checking.py", line 413, in get_aggregated_result
    consume_check_results(
  File "/omd/sites/<removed>/lib/python3/cmk/base/api/agent_based/checking_classes.py", line 484, in consume_check_results
    for subr in subresults:
  File "/omd/sites/<removed>/lib/python3/cmk/base/api/agent_based/register/check_plugins.py", line 93, in filtered_generator
    for element in generator(*args, **kwargs):
  File "/omd/sites/<removed>/local/lib/python3/cmk/base/plugins/agent_based/indesign_instance_status.py", line 38, in check_indesign_instance_status
    for service in section:

Local Variables

{'item': 'IDS2023C_queue',
 'section': {'IDS2023A_queue': {'state': <State.OK: 0>, 'status': 'idle'},
             'IDS2023B_queue': {'state': <State.OK: 0>, 'status': 'idle'},
             'IDS2023C_queue': {'state': <State.OK: 0>, 'status': 'idle'}},
 'service_info': {'state': <State.OK: 0>, 'status': 'idle'},
 'services': {}}
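The local variables show `section` as a dict keyed by queue name, while the loop at line 38 treats each element as a service record. Iterating over a dict yields only its keys (plain strings), which is easy to miss. A small standalone illustration using the values from the crash report, with `State.OK` swapped for the string `"OK"` so it runs outside Checkmk:

```python
# Section contents as reported in the crash's local variables
# (State.OK replaced by the string "OK" so this runs without Checkmk).
section = {
    "IDS2023A_queue": {"state": "OK", "status": "idle"},
    "IDS2023B_queue": {"state": "OK", "status": "idle"},
    "IDS2023C_queue": {"state": "OK", "status": "idle"},
}

# A plain for loop over a dict yields only its keys, i.e. the item names.
for service in section:
    print(type(service).__name__, service)

# .items() yields (name, value) pairs, so each per-service dict is reachable.
for name, service in section.items():
    print(name, service["state"], service["status"])
```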

Every other portion of the extension works perfectly though…

I’ve looked through everything in the script and it all looks right.

Does anyone have any ideas of what could be causing this?

Thanks!

You will have to show the code of the check plugin and the agent data that causes the issue.

That’s a good point!

Here’s the plugin script:

#!/usr/bin/env python3

from cmk.base.plugins.agent_based.agent_based_api.v1 import register, Result, Service, State


def parse_indesign_instance_status(string_table):
    parsed_services = []
    
    for line in string_table:
        # Initialize the service dictionary with the service name
        service = {'name': line[0]}

        # The status is the last element in the line, and we know the state needs to be determined from it
        status = line[-1]  # Directly use the last element assuming it's the status
        
        # Determine the service state based on the status value
        if status.lower() in ["idle", "running", "active", "locked"]:
            service['state'] = 'OK'
        elif status.lower() == "down":
            service['state'] = 'CRIT'
        else:
            service['state'] = 'UNKNOWN'
        
        # Directly store the status
        service['status'] = status
        
        # Append the service dictionary to parsed_services
        parsed_services.append(service)
    print(f"Parsed Service Dictionary {parsed_services}")
    return parsed_services



def discover_indesign_instance_status(section):
    print("Starting discovery phase...")
    print(f"Raw section data received: {section}")
    # Assuming section data is now structured in a way that services can be directly identified
    for service in section:
        # You might need to filter out any unwanted keys, similar to the previous 'if' condition
        yield Service(item=service['name'])



def check_indesign_instance_status(item, section):
    print(f"Received Service Dictionary: {section}")
    for service in section:
        if service['name'] == item:
            # Use dict.get() to avoid KeyError if 'state' or 'status' key does not exist
            state = service.get('state')
            status = service.get('status')

            # Map the parsed state string to a Checkmk State and build the summary
            if state is None or status is None:
                result_state = State.UNKNOWN
                summary = f"{item} state or status is missing"
            elif state == 'CRIT':
                result_state = State.CRIT
                summary = f"{item} state is critical: {status}"
            elif state == 'OK':
                result_state = State.OK
                summary = f"{item} state is operational: {status}"
            else:
                result_state = State.UNKNOWN
                summary = f"{item} state is unknown: {status}"

            print(result_state)
            # Yield a result summary based on the service state.
            yield Result(state=result_state, summary=summary)


register.agent_section(
    name="indesign_instance_status",
    parse_function=parse_indesign_instance_status,
)

register.check_plugin(
    name="indesign_instance_status",
    service_name="InDesign %s",
    discovery_function=discover_indesign_instance_status,
    check_function=check_indesign_instance_status,
)
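If you would rather work with a dict section (the shape the crash-time code apparently received), a variant of the parse function can key each entry by queue name. This is a minimal sketch, assuming the same status-to-state mapping as the posted code and keeping the state as a plain string; with this shape, discovery can yield `Service(item=name)` for each key and the check can look the item up directly instead of scanning a list.

```python
# Status values treated as healthy, copied from the posted parse function.
OK_STATUSES = {"idle", "running", "active", "locked"}


def parse_indesign_instance_status(string_table):
    """Return a dict keyed by queue name instead of a list of dicts."""
    parsed = {}
    for line in string_table:
        if not line:
            continue  # ignore empty rows defensively
        name, status = line[0], line[-1]
        if status.lower() in OK_STATUSES:
            state = "OK"
        elif status.lower() == "down":
            state = "CRIT"
        else:
            state = "UNKNOWN"
        parsed[name] = {"state": state, "status": status}
    return parsed


# Example rows as they would arrive after whitespace splitting (assumed).
example = parse_indesign_instance_status([
    ["IDS2023A_queue", "-", "Status:", "idle"],
    ["IDS2023B_queue", "-", "Status:", "down"],
])
print(example)
```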

Agent Data:

<<<indesign_instance_status>>>
IDS2023A_queue - Status: idle
IDS2023C_queue - Status: idle
IDS2023B_queue - Status: idle
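The section header has no `:sep(...)` option, so each agent line reaches the parse function split on whitespace; that is why `line[0]` is the queue name and `line[-1]` the status. A quick check of that split, assuming Checkmk's default splitting behaves like Python's `str.split()` for these lines:

```python
agent_lines = [
    "IDS2023A_queue - Status: idle",
    "IDS2023C_queue - Status: idle",
    "IDS2023B_queue - Status: idle",
]

# Approximate the string_table built for a section without a separator option.
string_table = [line.split() for line in agent_lines]

for row in string_table:
    # First field is the queue name, last field is the status.
    print(row[0], "->", row[-1])
```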

Iterating over the section dict gives you only its keys, i.e. the service names. Try changing it like this:

    for name, service in section.items():
        if name == item:
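Put together, a check function built on that suggestion could look like the sketch below. It assumes the section is a dict keyed by queue name whose values hold a `State` and a status string, as in the crash report's local variables. `State` and `Result` are redefined here as simplified, hypothetical stand-ins only so the example runs on its own; in a real plugin, import them from `cmk.base.plugins.agent_based.agent_based_api.v1` as the posted script does.

```python
from enum import Enum
from typing import NamedTuple


# Hypothetical stand-ins so the sketch runs outside a Checkmk site. In the real
# plugin, Result and State come from cmk.base.plugins.agent_based.agent_based_api.v1.
class State(Enum):
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


class Result(NamedTuple):
    state: State
    summary: str


def check_indesign_instance_status(item, section):
    # section maps queue name -> {"state": State, "status": str}
    for name, service in section.items():
        if name == item:
            yield Result(state=service["state"], summary=f"{item} status: {service['status']}")
            return
    # Yielding nothing means the item was not found in the section data.


section = {"IDS2023C_queue": {"state": State.OK, "status": "idle"}}
print(list(check_indesign_instance_status("IDS2023C_queue", section)))
```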

It turns out that it was a permissions issue. I backed up the site, restored it from that backup, and now everything is working.

Thank you for the advice @thl-cmk, I’ll experiment with that, but for now I’m just happy to have this service monitored fully.