pdunkler
(Paul Dunkler)
August 7, 2023, 3:56pm
1
CMK version:
2.2.0p6 (enterprise)
OS version:
AlmaLinux 8.8
This seems to have worked in 2.1 but no longer works properly in 2.2.
Or did I miss something?
Situation:
The Systemd Services Summary shows 3 failed services (no parsing problem; the services are correctly discovered as “failed”), but the Systemd Services Summary check still stays OK.
Output from “cmk-agent-ctl dump” on the monitored server:
× memcached@CENSORED.service - Memcached CENSORED
Loaded: loaded (/etc/systemd/system/memcached@.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Mon 2023-08-07 09:01:05 CEST; 8h ago
Duration: 26ms
Main PID: 784 (code=exited, status=71)
CPU: 8ms
× memcached@CENSORED_1.service - Memcached CENSORED_1
Loaded: loaded (/etc/systemd/system/memcached@.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Mon 2023-08-07 09:01:05 CEST; 8h ago
Duration: 23ms
Main PID: 788 (code=exited, status=71)
CPU: 8ms
× memcached@CENSORED_2.service - Memcached CENSORED_2
Loaded: loaded (/etc/systemd/system/memcached@.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Mon 2023-08-07 09:01:05 CEST; 8h ago
Duration: 19ms
Main PID: 798 (code=exited, status=71)
CPU: 6ms
and
memcached@CENSORED.service loaded failed failed Memcached CENSORED
memcached@CENSORED_1.service loaded failed failed Memcached CENSORED_1
memcached@CENSORED_2.service loaded failed failed Memcached CENSORED_2
Here are some details on my configured WATO rules for systemd services:
pdunkler
(Paul Dunkler)
August 7, 2023, 4:25pm
2
I did a short and “hacky” hotfix because I need this to work correctly again for us:
--- ./modified/lib/python3/cmk/base/plugins/agent_based/systemd_units.py 2023-08-07 18:22:46.634267603 +0200
+++ ./lib/python3/cmk/base/plugins/agent_based/systemd_units.py 2023-08-07 19:47:34.779297890 +0200
@@ -7,6 +7,7 @@
 from datetime import timedelta
 from enum import Enum
 from typing import Any, Iterable, Iterator, Mapping, NamedTuple, Optional, Sequence
+import re
 
 from .agent_based_api.v1 import check_levels, regex, register, render, Result, Service, State
 from .agent_based_api.v1.type_defs import CheckResult, DiscoveryResult, StringTable
@@ -549,6 +550,21 @@
     services_organised = _services_split(units, blacklist)
     yield Result(state=State.OK, summary=f"Disabled: {len(services_organised['disabled']):d}")
     # some of the failed ones might be ignored, so this is OK:
+
+    failed_services = 0
+    for i in units:
+        if i.active_status == 'failed':
+            failed_services += 1
+            for a in blacklist:
+                if re.search(a, i.name):
+                    print("failed but blacklisted: " + i.name)
+                    failed_services -= 1
+
+    if failed_services > 0:
+        yield Result(
+            state=State.CRIT, summary=f"Failed: {failed_services:d}"
+        )
+    else:
         yield Result(
             state=State.OK, summary=f"Failed: {sum(s.active_status == 'failed' for s in units):d}"
         )
I know that the fix is far from ideal, but my Python skills are limited and I needed a quick solution.
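The counting step in the hotfix can also be sketched on its own, outside the plugin. This is only a minimal illustration, not the Checkmk implementation: `Unit` and `count_unignored_failed` are hypothetical names, and the ignore patterns are assumed to be regular expressions applied with `re.search`, as in the hotfix. Unlike the loop above, a failed unit that matches several ignore patterns is subtracted only once, and nothing is printed.

```python
import re
from typing import Iterable, NamedTuple, Sequence


class Unit(NamedTuple):
    """Hypothetical stand-in for the plugin's unit objects."""

    name: str
    active_status: str


def count_unignored_failed(units: Iterable[Unit], ignore_patterns: Sequence[str]) -> int:
    """Count failed units whose name matches none of the ignore patterns."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    return sum(
        1
        for unit in units
        if unit.active_status == "failed"
        and not any(pattern.search(unit.name) for pattern in compiled)
    )


# Made-up example data, not taken from the agent output above.
units = [
    Unit("memcached@a.service", "failed"),
    Unit("memcached@b.service", "failed"),
    Unit("sshd.service", "active"),
]

print(count_unignored_failed(units, []))  # 2
# Both patterns match the same unit; it is still ignored only once.
print(count_unignored_failed(units, [r"^memcached@a", r"a\.service$"]))  # 1
```

In the check function, the returned count would then decide between State.CRIT (count above zero) and State.OK, as the hotfix does.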
nilasae
(Nilasae)
July 16, 2024, 2:01pm
3
Same situation on my site, currently running 2.2.0p30. I’m about to upgrade to 2.3 to see if it’s still there. Did you ever get it fixed apart from your hack? TIA!