Adapting NUT plugin for CheckMK 2.4

Hi everyone,

I’m looking for some guidance on adapting the NUT (Network UPS Tools) check plugin (by Marcel Pennewiss) for CheckMK version 2.4.


[WARN] Check_man page uses an API (/legacy) which is marked as deprecated in Checkmk 2.3.0 and will be removed in Checkmk 2.4.0.

[WARN] File /local/share/check_mk/web/plugins/agent_based/deprecated.py uses an API (/legacy) which is marked as deprecated in Checkmk 2.3.0 and will be removed in Checkmk 2.4.0.

[WARN] Legacy GUI extension in /metrics uses an API (/legacy) which is marked as deprecated in Checkmk 2.3.0 and will be removed in Checkmk 2.4.0.

[WARN] File /local/share/check_mk/web/plugins/performer uses an API (/legacy) which is marked as deprecated in Checkmk 2.3.0 and will be removed in Checkmk 2.4.0.

[WARN] File /local/share/check_mk/web/plugins/wato/wato.py uses an API which is marked as deprecated and may not work anymore due to unknown imports or objects.


The client output is:

<<<nut>>>
==> UPS <==
battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 50
battery.date: 2001/09/25
battery.mfr.date: 2022/09/16
battery.runtime: 4311
battery.runtime.low: 120
battery.type: PbAc
battery.voltage: 13.5
battery.voltage.nominal: 12.0
device.mfr: American Power Conversion
device.model: Back-UPS ES 650G2
device.serial: 5B2237T68435
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.synchronous: no
driver.version: 2.7.4
driver.version.data: APC HID 0.96
driver.version.internal: 0.41
input.sensitivity: medium
input.transfer.high: 266
input.transfer.low: 180
input.transfer.reason: input voltage out of range
input.voltage: 234.0
input.voltage.nominal: 230
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.firmware: 937.a2 .I
ups.firmware.aux: a2
ups.load: 6
ups.mfr: American Power Conversion
ups.mfr.date: 2022/09/16
ups.model: Back-UPS ES 650G2
ups.productid: 0002
ups.realpower.nominal: 400
ups.serial: 5B2237T68435
ups.status: OL
ups.test.result: No test initiated
ups.timer.reboot: 0
ups.timer.shutdown: -1
ups.vendorid: 051d
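
For anyone following along: as far as I understand it, Checkmk splits each agent line on whitespace and passes the parse function a list of lists. Here is a minimal stand-alone sketch of that step, using a shortened copy of the output above. `to_string_table` and `parse_sample` are hypothetical helper names for illustration only, not part of Checkmk.

```python
# Sketch only: imitates whitespace splitting of agent lines, no Checkmk needed.
SAMPLE = """\
==> UPS <==
battery.charge: 100
device.model: Back-UPS ES 650G2
ups.status: OL
"""


def to_string_table(raw: str) -> list[list[str]]:
    """Split every non-empty line on whitespace (assumed default separator)."""
    return [line.split() for line in raw.splitlines() if line.strip()]


def parse_sample(string_table: list[list[str]]) -> dict[str, dict[str, str]]:
    """Group 'key: value' lines under the most recent '==> NAME <==' header."""
    parsed: dict[str, dict[str, str]] = {}
    current = None
    for fields in string_table:
        if len(fields) >= 3 and fields[0] == "==>" and fields[-1] == "<==":
            current = " ".join(fields[1:-1])
            parsed[current] = {}
        elif current is not None and fields[0].endswith(":"):
            # Values that contain spaces arrive as several fields; rejoin them.
            parsed[current][fields[0][:-1]] = " ".join(fields[1:])
    return parsed


section = parse_sample(to_string_table(SAMPLE))
print(section["UPS"]["device.model"])
```

Note that rejoining with a single space collapses runs of spaces inside a value, which should not matter for the values shown here.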

I created a version of nut.py compatible with Checkmk 2.4.
It’s very basic, but it seems to work.
I can’t rule out errors, especially with non-standard values.
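
To show what I mean by non-standard values, here is a small stand-alone sketch of a defensive number parser. It is not the function used in the plugin; `parse_finite_number` is a hypothetical name, and treating inf/nan as "no value" is my assumption about what is sensible for a check.

```python
import math
from typing import Optional


def parse_finite_number(text: Optional[str]) -> Optional[float]:
    """Return a finite float, or None for missing, non-numeric, inf or nan input."""
    if text is None:
        return None
    try:
        # Accept a decimal comma as well as a decimal point.
        value = float(text.strip().replace(",", "."))
    except ValueError:
        return None
    return value if math.isfinite(value) else None


# A few inputs a UPS driver might plausibly report.
for raw in ("100", " 13.5 ", "12,5", "1e999", "nan", "No test initiated", None):
    print(repr(raw), "->", parse_finite_number(raw))
```

Rejecting inf/nan at parse time means the later threshold comparisons never see them, so no extra `math.isinf` / `math.isnan` checks are needed in each section of the check.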

#!/usr/bin/env python3

# Plugin for checking UPS with NUT via checkmk 2.4
# Example output of check_mk_agent
    # <<<nut>>>
    # ==> UPS <==
    # battery.charge: 100
    # battery.charge.low: 10
    # battery.charge.warning: 50
    # battery.date: 2001/09/25
    # battery.mfr.date: 2020/10/13
    # battery.runtime: 212
    # battery.runtime.low: 120
    # battery.type: PbAc
    # battery.voltage: 27.1
    # battery.voltage.nominal: 24.0
    # device.mfr: American Power Conversion
    # device.model: Back-UPS XS 1400U 
    # device.serial: 4B2042P09573  
    # device.type: ups
    # driver.name: usbhid-ups
    # driver.parameter.bus: 003
    # driver.parameter.pollfreq: 30
    # driver.parameter.pollinterval: 2
    # driver.parameter.port: auto
    # driver.parameter.product: Back-UPS XS 1400U  FW:926.T2 .I USB FW:T2
    # driver.parameter.productid: 0002
    # driver.parameter.serial: 4B2042P09573
    # driver.parameter.synchronous: auto
    # driver.parameter.vendor: American Power Conversion
    # driver.parameter.vendorid: 051D
    # driver.version: 2.8.0
    # driver.version.data: APC HID 0.98
    # driver.version.internal: 0.47
    # driver.version.usb: libusb-1.0.26 (API: 0x1000109)
    # input.sensitivity: medium
    # input.transfer.high: 280
    # input.transfer.low: 155
    # input.voltage: 236.0
    # input.voltage.nominal: 230
    # ups.beeper.status: enabled
    # ups.delay.shutdown: 20
    # ups.firmware: 926.T2 .I
    # ups.firmware.aux: T2 
    # ups.load: 29
    # ups.mfr: American Power Conversion
    # ups.mfr.date: 2020/10/13
    # ups.model: Back-UPS XS 1400U 
    # ups.productid: 0002
    # ups.realpower.nominal: 700
    # ups.serial: 4B2042P09573  
    # ups.status: OL
    # ups.test.result: No test initiated
    # ups.timer.reboot: 0
    # ups.timer.shutdown: -1
    # ups.vendorid: 051d

from cmk.agent_based.v2 import (
    AgentSection,
    CheckPlugin,
    Result,
    Service,
    State,
    Metric,
    #check_levels
)
import math

def try_number(value_str: str | None) -> int | float | None:
    if value_str is None: return None
    try:
        if "." not in value_str and "," not in value_str and "e" not in value_str.lower():
            return int(value_str)
        else:
            return float(value_str.replace(",", "."))
    except (ValueError, TypeError): return None

def parse_nut(string_table: list[list[str]]):
    parsed = {}
    current_ups = None
    for line_fields in string_table:
        if not line_fields: continue
        if line_fields[0] == "==>" and line_fields[-1] == "<==" and len(line_fields) >= 3:
            current_ups = " ".join(line_fields[1:-1])
            if current_ups: parsed[current_ups] = {}
            else: current_ups = None
        elif current_ups and parsed.get(current_ups) is not None:
            raw_key_field = line_fields[0]; key = None; value_fields = []
            if ":" in raw_key_field:
                parts = raw_key_field.split(":", 1); key = parts[0].strip()
                if len(parts) > 1 and parts[1].strip() != "": value_fields = [parts[1].strip()] + line_fields[1:]
                else: value_fields = line_fields[1:]
            elif len(line_fields) > 1 and line_fields[1] == ':':
                key = raw_key_field.strip(); value_fields = line_fields[2:]
            else: continue
            value = " ".join(value_fields).strip()
            if key: parsed[current_ups][key] = value
    return parsed


def sanitize_charge_levels(
    params_levels: tuple | None, agent_crit_str: str | None, agent_warn_str: str | None,
    hc_crit: int, hc_warn: int
) -> tuple[int, int] | None:
    crit_val, warn_val = None, None
    def _to_int_or_none(val_in) -> int | None:
        if isinstance(val_in, (int, float)): return int(val_in)
        if isinstance(val_in, str):
            num_val = try_number(val_in)
            return int(num_val) if num_val is not None else None
        return None
    if params_levels and isinstance(params_levels, tuple) and len(params_levels) == 2:
        p_c, p_w = params_levels
        if p_c is None and p_w is None: return None
        temp_crit = _to_int_or_none(p_c); temp_warn = _to_int_or_none(p_w)
        if temp_crit is not None and temp_warn is not None: crit_val, warn_val = temp_crit, temp_warn
    if crit_val is None:
        ac = _to_int_or_none(agent_crit_str); crit_val = ac if ac is not None else hc_crit
    if warn_val is None:
        aw = _to_int_or_none(agent_warn_str); warn_val = aw if aw is not None else hc_warn
    if not (isinstance(crit_val, int) and isinstance(warn_val, int)): return None
    if crit_val > warn_val: warn_val = crit_val
    return crit_val, warn_val

def sanitize_float_levels(
    levels_from_wato: tuple | None, agent_levels: tuple[float | None, float | None],
    hardcoded_defaults: tuple[float, float], direction: str
) -> tuple[float, float] | None:
    hc_crit, hc_warn = hardcoded_defaults; ag_crit, ag_warn = agent_levels
    final_crit, final_warn = None, None
    def _to_float_or_none(val_in) -> float | None:
        if isinstance(val_in, (int, float)): return float(val_in)
        if isinstance(val_in, str):
            num_val = try_number(val_in); return float(num_val) if num_val is not None else None
        return None
    if levels_from_wato and isinstance(levels_from_wato, tuple) and len(levels_from_wato) == 2:
        w_c, w_w = levels_from_wato
        if w_c is None and w_w is None: return None
        temp_crit = _to_float_or_none(w_c); temp_warn = _to_float_or_none(w_w)
        if temp_crit is not None and temp_warn is not None:
            final_crit, final_warn = float(temp_crit), float(temp_warn)  # force float here
    if final_crit is None: final_crit = float(ag_crit) if ag_crit is not None else float(hc_crit)
    if final_warn is None: final_warn = float(ag_warn) if ag_warn is not None else float(hc_warn)
    if not (isinstance(final_crit, (int,float)) and isinstance(final_warn, (int,float))): return None
    final_crit, final_warn = float(final_crit), float(final_warn)
    if direction == "lower":
        if final_crit > final_warn: final_warn = final_crit # Ensure crit is lower or equal to warn for 'lower' direction logic
    elif direction == "upper":
        if final_warn > final_crit: final_crit = final_warn # Ensure crit is higher or equal to warn for 'upper' direction logic
    return final_crit, final_warn

def discover_nut(section):
    for ups_name in section: yield Service(item=ups_name)

def check_nut(item, params, section):
    if item not in section: yield Result(state=State.UNKNOWN, summary=f"UPS '{item}' not found"); return
    data = section[item]
    if not data: yield Result(state=State.UNKNOWN, summary=f"UPS '{item}' no data"); return

    status = data.get("ups.status")
    if status:
        state = State.OK
        summary_parts = [f"Status: {status}"]
        if status == "OL":
            summary_parts.append("(Online)")
        elif "OB" in status or "DISCHRG" in status: 
            state=State.WARN
            summary_parts.append("(On Batt)")
        if "LB" in status: # Could be OL LB, so CRIT
            state=State.CRIT
            summary_parts.append("(Low Batt!)")
        if "OVER" in status: # Could be OL OVER
            state=State.CRIT
            summary_parts.append("(Overload!)")
        if "OFF" in status:
            state=State.CRIT
            summary_parts.append("(UPS Off!)")
        if "ALARM" in status and state==State.OK: # Only if not already WARN/CRIT for other reasons
            state=State.WARN
            summary_parts.append("(Alarm)")
        yield Result(state=state, summary=", ".join(summary_parts))
    else: 
        yield Result(state=State.UNKNOWN, summary="ups.status missing")
        return # Exit if the status is missing; the other checks make no sense without it

    # Battery Charge
    battery_charge_str = data.get("battery.charge")
    battery_charge_val = try_number(battery_charge_str)

    if battery_charge_val is not None:
        if not isinstance(battery_charge_val, int):
            battery_charge_val = int(round(float(battery_charge_val)))

        charge_levels_tuple = sanitize_charge_levels(
            params.get("battery_charge_levels"),
            data.get("battery.charge.low"),
            data.get("battery.charge.warning"),
            hc_crit=10, hc_warn=50
        )
        
        yield Metric("battery_charge", battery_charge_val, boundaries=(0, 100))

        if charge_levels_tuple:
            crit_level, warn_level = charge_levels_tuple # These are (crit, warn)
            current_state = State.OK
            summary_msg = f"Charge: {battery_charge_val}%"

            if battery_charge_val < crit_level:
                current_state = State.CRIT
                summary_msg += f" (CRITICAL < {crit_level}%)"
            elif battery_charge_val < warn_level:
                current_state = State.WARN
                summary_msg += f" (WARNING < {warn_level}%)"
            
            yield Result(state=current_state, summary=summary_msg)
        else:
            yield Result(state=State.OK, summary=f"Charge: {battery_charge_val}% (thresholds disabled/invalid)")
    elif "battery.charge" in data:
        yield Result(state=State.UNKNOWN, summary="battery.charge invalid")

    # Battery Runtime
    battery_runtime_str = data.get("battery.runtime")
    battery_runtime_val = try_number(battery_runtime_str)

    if battery_runtime_val is not None:
        if isinstance(battery_runtime_val, float) and (math.isinf(battery_runtime_val) or math.isnan(battery_runtime_val)):
            yield Result(state=State.UNKNOWN, summary=f"Battery runtime is invalid (inf/nan): {battery_runtime_str}")
            return

        battery_runtime_val = float(battery_runtime_val)  # Ensure it's a float
        ag_crit, ag_warn = try_number(data.get("battery.runtime.low")), None
        hc_defaults = (120.0, 300.0) # (crit_default, warn_default)
        levels = sanitize_float_levels(params.get("battery_runtime_levels"), (ag_crit, ag_warn), hc_defaults, "lower")
        
        yield Metric("battery_runtime", battery_runtime_val)

        runtime_str_formatted = f"{int(battery_runtime_val//60)}m{int(battery_runtime_val%60)}s"
        summary_msg = f"Runtime: {runtime_str_formatted}"
        current_state = State.OK # Default state

        if levels:
            crit_level, warn_level = levels # (crit, warn)

            if battery_runtime_val < crit_level:
                current_state = State.CRIT
                crit_level_summary = f"{int(crit_level//60)}m{int(crit_level%60)}s"
                summary_msg += f" (CRITICAL < {crit_level_summary})"
            elif battery_runtime_val < warn_level:
                current_state = State.WARN
                warn_level_summary = f"{int(warn_level//60)}m{int(warn_level%60)}s"
                summary_msg += f" (WARNING < {warn_level_summary})"

        else:
            summary_msg += " (thresholds disabled/invalid)"

        yield Result(state=current_state, summary=summary_msg)
    elif "battery.runtime" in data: yield Result(state=State.UNKNOWN, summary="battery.runtime invalid")

    # UPS Load
    ups_load_str = data.get("ups.load")
    ups_load_val = try_number(ups_load_str)

    if ups_load_val is not None:
        if isinstance(ups_load_val, float) and (math.isinf(ups_load_val) or math.isnan(ups_load_val)):
            yield Result(state=State.UNKNOWN, summary=f"UPS load is invalid (inf/nan): {ups_load_str}")
            return # Exit if inf/nan to prevent further issues
        
        ups_load_val = float(ups_load_val) # Ensure it's a float
        
        # Test yielding Metric
        yield Metric("ups_load", ups_load_val, boundaries=(0.0,100.0))


        hc_defaults = (90.0, 80.0) # (crit_default, warn_default) for upper, so crit > warn
        levels = sanitize_float_levels(params.get("ups_load_levels"), (None,None), hc_defaults, "upper")

        summary_msg = f"Load: {ups_load_val:.0f}%"
        current_state = State.OK # Default state

        if levels:
            crit_level, warn_level = levels # (crit, warn)
            # For 'upper', crit_level is the highest (most critical) threshold
            if ups_load_val > crit_level:
                current_state = State.CRIT
                summary_msg += f" (CRITICAL > {crit_level:.0f}%)"
            elif ups_load_val > warn_level:
                current_state = State.WARN
                summary_msg += f" (WARNING > {warn_level:.0f}%)"
            
            yield Result(state=current_state, summary=summary_msg)
        else:
            yield Result(state=State.OK, summary=f"{summary_msg} (thresholds disabled/invalid)")
    elif "ups.load" in data: 
        yield Result(state=State.UNKNOWN, summary="ups.load invalid")

    # Input Voltage
    input_voltage_str = data.get("input.voltage")
    input_voltage_val = try_number(input_voltage_str)

    if input_voltage_val is not None:
        if isinstance(input_voltage_val, float) and (math.isinf(input_voltage_val) or math.isnan(input_voltage_val)):
            yield Result(state=State.UNKNOWN, summary=f"Input voltage is invalid (inf/nan): {input_voltage_str}")
            return  # An inf/nan voltage must not reach the metric or threshold checks below
        else:
            input_voltage_val = float(input_voltage_val) # Ensure it's a float

        # Agent-provided levels (may be None)
        cl_agent = try_number(data.get("input.transfer.low"))
        ch_agent = try_number(data.get("input.transfer.high"))
        
        # Hardcoded defaults (crit_default, warn_default)
        hc_defaults_lower = (190.0, 200.0) # Example: Crit < 190, Warn < 200
        hc_defaults_upper = (250.0, 240.0) # Example: Crit > 250, Warn > 240


        levels_lower = sanitize_float_levels(
            params.get("input_voltage_levels_lower"),
            (cl_agent, None), # agent_crit_low, agent_warn_low (agent usually only provides crit)
            hc_defaults_lower,
            "lower"
        )
        
        levels_upper = sanitize_float_levels(
            params.get("input_voltage_levels_upper"),
            (ch_agent, None), # agent_crit_high, agent_warn_high (agent usually only provides crit)
            hc_defaults_upper,
            "upper"
        )

        yield Metric("input_voltage", input_voltage_val, boundaries=(0.0, 300.0)) # Assuming 0-300V is a sane range

        final_state = State.OK
        summary_parts = [f"Input: {input_voltage_val:.1f}V"]

        # Check lower thresholds
        if levels_lower:
            crit_low, warn_low = levels_lower
            if input_voltage_val < crit_low:
                final_state = State.CRIT
                summary_parts.append(f"(CRITICAL < {crit_low:.1f}V)")
            elif input_voltage_val < warn_low:
                final_state = State.WARN
                summary_parts.append(f"(WARNING < {warn_low:.1f}V)")

        # Check upper thresholds (only if not already CRIT from lower thresholds)
        if levels_upper and final_state != State.CRIT:
            crit_high, warn_high = levels_upper # For "upper", sanitize_float_levels ensures crit_high >= warn_high
            if input_voltage_val > crit_high:
                final_state = State.CRIT # Can override a WARN from lower levels
                # Remove previous warning if it existed and add critical
                if any("(WARNING <" in part for part in summary_parts):
                    summary_parts = [s for s in summary_parts if not s.startswith("(WARNING <")]
                summary_parts.append(f"(CRITICAL > {crit_high:.1f}V)")
            elif input_voltage_val > warn_high and final_state == State.OK: # Only set to WARN if currently OK
                final_state = State.WARN
                summary_parts.append(f"(WARNING > {warn_high:.1f}V)")
        
        yield Result(state=final_state, summary=" ".join(summary_parts))

    elif "input.voltage" in data: 
        yield Result(state=State.UNKNOWN, summary="input.voltage invalid")

agent_section_nut = AgentSection(name="nut", parse_function=parse_nut)

check_plugin_nut = CheckPlugin(
    name="nut",
    service_name="UPS %s",
    discovery_function=discover_nut,
    check_function=check_nut,
    check_default_parameters={},
    check_ruleset_name="nut_status_thresholds",
)
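
One thing I am still unsure about is the status handling, since `ups.status` can carry several space-separated flags (for example `OB DISCHRG LB`). Here is a stand-alone sketch that evaluates the flags as whole tokens instead of substrings. The severity mapping simply mirrors the plugin above and is my assumption, not something NUT prescribes; `FLAG_SEVERITY` and `evaluate_status` are hypothetical names, and plain integers stand in for Checkmk's `State`.

```python
# Severity per flag: 0 = OK, 1 = WARN, 2 = CRIT (assumed mapping, see above).
FLAG_SEVERITY = {
    "OL": (0, "Online"),
    "OB": (1, "On battery"),
    "DISCHRG": (1, "Discharging"),
    "ALARM": (1, "Alarm"),
    "LB": (2, "Low battery"),
    "OVER": (2, "Overload"),
    "OFF": (2, "UPS off"),
}


def evaluate_status(status: str) -> tuple[int, str]:
    """Return the worst severity and a summary for a space-separated ups.status."""
    worst = 0
    notes = []
    for flag in status.split():
        # Flags missing from the table are reported as-is and treated as OK.
        severity, label = FLAG_SEVERITY.get(flag, (0, flag))
        worst = max(worst, severity)
        notes.append(label)
    return worst, f"Status: {status} ({', '.join(notes)})"


print(evaluate_status("OL"))
print(evaluate_status("OB DISCHRG LB"))
```

Matching whole tokens avoids accidental hits if a driver ever reports a flag that happens to contain another flag's name, and taking the maximum severity makes the order of the checks irrelevant.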

Hi,

I am using this extension as well and tried to update it. I found your thread, so here is my take on it: GitHub - virus2500/checkmk_nut: NUT extension for checkmk

Here is the mkp for it.
nut-3.0.1.mkp (7.0 KB)

Maybe you can test it on your end. It works well on my side.

The GitHub workflows aren’t working at the moment; I haven’t really looked into them yet.

It’s just a first try, so beware of bugs :wink:

br


Thank you for the work you did, it works perfectly for me as well.
:top: