Systemd Services Summary finds 3 failed services but doesn't get CRIT - stays OK

CMK version:
2.2.0p6 (enterprise)
OS version:
AlmaLinux 8.8

This still seemed to work in 2.1 but no longer works properly in 2.2.
Or did I miss something?

Situation:
Systemd Services Summary shows 3 failed services (no parsing problem, the services are correctly discovered as “failed”), but the Systemd Services Summary check still stays OK.

Output from “cmk-agent-ctl dump” on the monitored server:

× memcached@CENSORED.service - Memcached CENSORED
 Loaded: loaded (/etc/systemd/system/memcached@.service; enabled; preset: disabled)
 Active: failed (Result: exit-code) since Mon 2023-08-07 09:01:05 CEST; 8h ago
 Duration: 26ms
 Main PID: 784 (code=exited, status=71)
 CPU: 8ms

× memcached@CENSORED_1.service - Memcached CENSORED_1
 Loaded: loaded (/etc/systemd/system/memcached@.service; enabled; preset: disabled)
 Active: failed (Result: exit-code) since Mon 2023-08-07 09:01:05 CEST; 8h ago
 Duration: 23ms
 Main PID: 788 (code=exited, status=71)
 CPU: 8ms

× memcached@CENSORED_2.service - Memcached CENSORED_2
 Loaded: loaded (/etc/systemd/system/memcached@.service; enabled; preset: disabled)
 Active: failed (Result: exit-code) since Mon 2023-08-07 09:01:05 CEST; 8h ago
 Duration: 19ms
 Main PID: 798 (code=exited, status=71)
 CPU: 6ms

and

memcached@CENSORED.service loaded failed failed Memcached CENSORED
memcached@CENSORED_1.service loaded failed failed Memcached CENSORED_1
memcached@CENSORED_2.service loaded failed failed Memcached CENSORED_2
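
For reference, the three lines above appear to follow the column layout of `systemctl list-units` (unit, load state, active state, sub state, description). A minimal Python sketch of how such lines can be split into fields and checked for the "failed" state is shown here. It is not Checkmk's actual parser; `UnitLine` and `parse_unit_line` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class UnitLine(NamedTuple):
    """One row of the unit list (hypothetical helper type, not a Checkmk class)."""

    name: str
    load: str
    active: str
    sub: str
    description: str


def parse_unit_line(line: str) -> UnitLine:
    # Assumption: the first four columns contain no spaces and a description
    # is always present; everything after the fourth column is the description.
    name, load, active, sub, description = line.split(None, 4)
    return UnitLine(name, load, active, sub, description)


lines = [
    "memcached@CENSORED.service loaded failed failed Memcached CENSORED",
    "memcached@CENSORED_1.service loaded failed failed Memcached CENSORED_1",
    "memcached@CENSORED_2.service loaded failed failed Memcached CENSORED_2",
]
units = [parse_unit_line(line) for line in lines]
failed = [unit.name for unit in units if unit.active == "failed"]
print(len(failed))  # 3
```

All three units are in the active state "failed", which matches the count reported by the summary check.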

Here are some details on my configured WATO rules for systemd services:
[Screenshot 2023-08-07 at 17.40.21]

I applied a quick, “hacky” hotfix because I need this to work correctly again for us:

--- ./modified/lib/python3/cmk/base/plugins/agent_based/systemd_units.py	2023-08-07 18:22:46.634267603 +0200
+++ ./lib/python3/cmk/base/plugins/agent_based/systemd_units.py	2023-08-07 19:47:34.779297890 +0200
@@ -7,6 +7,7 @@
 from datetime import timedelta
 from enum import Enum
 from typing import Any, Iterable, Iterator, Mapping, NamedTuple, Optional, Sequence
+import re

 from .agent_based_api.v1 import check_levels, regex, register, render, Result, Service, State
 from .agent_based_api.v1.type_defs import CheckResult, DiscoveryResult, StringTable
@@ -549,6 +550,21 @@
     services_organised = _services_split(units, blacklist)
     yield Result(state=State.OK, summary=f"Disabled: {len(services_organised['disabled']):d}")
     # some of the failed ones might be ignored, so this is OK:
+
+    failed_services = 0
+    for i in units:
+        if i.active_status == 'failed':
+            failed_services += 1
+            for a in blacklist:
+                if re.search(a, i.name):
+                    failed_services -= 1
+                    break
+
+    if failed_services > 0:
+        yield Result(
+            state=State.CRIT, summary=f"Failed: {failed_services:d}"
+        )
+    else:
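-    yield Result(
-        state=State.OK, summary=f"Failed: {sum(s.active_status == 'failed' for s in units):d}"
-    )
+        yield Result(
+            state=State.OK, summary=f"Failed: {sum(s.active_status == 'failed' for s in units):d}"
+        )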

I know that the fix is far from ideal, but my Python skills are limited and I needed a quick solution.
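
The counting logic of the patch above can also be written more compactly. The following is only a standalone sketch, not the official fix: `Unit` is a hypothetical stand-in for the plugin's unit objects (modelling just `name` and `active_status`), `count_relevant_failed` is an invented helper name, and matching ignore patterns with `re.search` is an assumption carried over from the patch.

```python
import re
from typing import Iterable, NamedTuple, Sequence


class Unit(NamedTuple):
    # Hypothetical stand-in for the plugin's unit objects; only the two
    # attributes used in the patch are modelled here.
    name: str
    active_status: str


def count_relevant_failed(units: Iterable[Unit], blacklist: Sequence[str]) -> int:
    """Count failed units whose name matches none of the ignore patterns."""
    return sum(
        1
        for unit in units
        if unit.active_status == "failed"
        and not any(re.search(pattern, unit.name) for pattern in blacklist)
    )


units = [
    Unit("memcached@CENSORED.service", "failed"),
    Unit("memcached@CENSORED_1.service", "failed"),
    Unit("sshd.service", "active"),
]
print(count_relevant_failed(units, []))               # 2
print(count_relevant_failed(units, [r"CENSORED_1"]))  # 1
```

Using `any()` means each unit is counted at most once, no matter how many ignore patterns match its name. A result greater than zero would then map to CRIT, as in the patch.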

Same situation on my site, currently running 2.2.0p30. I’m about to upgrade to 2.3 to see if it’s still there. Did you ever get it fixed apart from your hack? TIA!
