Systemd Service Summary not recognizing failed services

Hi community

I need help with some behaviour I do not understand. I have a FreeIPA server that I am monitoring with Check_MK. I tried to verify whether the monitoring alerts when the service ipa.service fails. Long story short: it does not.

My suspicion is that the “Systemd Service Summary” check does not take this unit into account. The raw client output shows 278 units; ipa.service is one of them and its state is failed. But the check in Check_MK only shows a total of 100 units.

So my questions are:

  • How does Check_MK decide which units it monitors within the “Systemd Service Summary” check?
  • How can I see which units are currently being monitored?
  • What options do I have to influence this list of units?

What I already tried is creating a “Systemd single services” rule. The rule appears in the list of host rules, so it should apply, but nothing else happens…

Thanks in advance, Michael

Hi @Mambo,

the rule you are showing does not do what you expect. This rule maps the systemd unit state to a specific state in checkmk. The filled-in field “Name of the service” is misleading in this particular case, because it refers to the service name in checkmk (like Memory, CPU utilization, Interface %s, and so on).

The Systemd Service Summary should tell you if there are any units in a not-OK state, as far as I can tell from the source code. So I would expect your Systemd Service Summary to be CRIT if the unit ipa.service is failed.

Can you share a screenshot of the service while ipa.service is in the failed state, and also provide your exact checkmk version?

Thank you @tosch

I see your point and understand why the mentioned rule is not working. Here is the requested information:

checkmk version: 2.0.0p9 (CRE)

Here is a screenshot of the service. What I notice now is that the time of the next scheduled service check is in the past…?

I thought the rules might be interesting too:

Can you check if your unit ipa.service looks like the following if you run systemctl --all on your server?

  UNIT                                   LOAD   ACTIVE SUB    DESCRIPTION
● ipa.service                            loaded failed failed ipa.service description

In this case your systemd service summary should turn red and report at least one failed unit.
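
To cross-check what the server itself reports, independently of the monitoring, the failed units can also be listed straight from systemctl. The following is only a minimal sketch in Python, assuming the systemctl options `--plain` and `--no-legend` are available on the host (they omit the ● bullet and the header/footer lines). The helper names `failed_units` and `list_failed_on_host` and the sample text are made up for this example and are not part of checkmk.

```python
import subprocess

# Illustrative sample of `systemctl list-units --all --plain --no-legend`
# output (columns: UNIT LOAD ACTIVE SUB DESCRIPTION). Not real host data.
SAMPLE_PLAIN = """\
ipa.service        loaded failed   failed  ipa.service description
emergency.service  loaded inactive dead    Emergency Shell
"""


def failed_units(plain_output: str) -> list[str]:
    """Return the names of all units whose ACTIVE column is 'failed'."""
    failed = []
    for line in plain_output.splitlines():
        # Split into at most 5 fields so the description stays in one piece.
        fields = line.split(None, 4)
        if len(fields) >= 4 and fields[2] == "failed":
            failed.append(fields[0])
    return failed


def list_failed_on_host() -> list[str]:
    """Run systemctl on the local host and return its failed service units."""
    result = subprocess.run(
        ["systemctl", "list-units", "--all", "--plain", "--no-legend",
         "--type=service"],
        capture_output=True,
        text=True,
        check=True,
    )
    return failed_units(result.stdout)
```

Calling `list_failed_on_host()` on the monitored server gives a list that can be compared with the “Failed” count shown by the Systemd Service Summary.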

Yes, this is my problem. It does not turn red. I intentionally introduced a config error to put this service into a failed state. The CheckMK service state I posted before was taken with exactly this systemd service state on the server:

I have the same problem: two services are in state failed, but are not recognized as such.

  dm-event.service              loaded inactive dead    Device-mapper event daemon
● elasticsearch.service         loaded failed   failed  Elasticsearch
  emergency.service             loaded inactive dead    Emergency Shell
...
  kmod-static-nodes.service     loaded active   exited  Create list of required static device nodes for the current kernel
● logrotate.service             loaded failed   failed  Rotate log files
  lvm2-lvmpolld.service         loaded inactive dead    LVM2 poll daemon

Systemd Service Summary reports: Total: 81, Disabled: 4, Failed: 0

Then I started the elasticsearch service manually

  elasticsearch.service         loaded active   running Elasticsearch

Now, Systemd Service Summary reports: Total: 82, Disabled: 4, Failed: 0

It looks like the lines with failed services are ignored/skipped completely. Maybe a parsing issue with the ● at the beginning of the line?

Edit: We use checkmk 2.0.0p12 (CRE)
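
The parsing suspicion above can be illustrated with a small Python sketch. This is not checkmk's actual parser, only a hypothetical one (`parse_naive`) that assumes the unit name is always the first column, next to a variant (`parse_tolerant`) that drops a leading ● or * marker first. Both function names are invented for this example; the sample lines are taken from the systemctl output quoted above.

```python
SAMPLE = """\
  dm-event.service              loaded inactive dead    Device-mapper event daemon
● elasticsearch.service         loaded failed   failed  Elasticsearch
  emergency.service             loaded inactive dead    Emergency Shell
● logrotate.service             loaded failed   failed  Rotate log files
"""


def parse_naive(text: str) -> dict[str, str]:
    """Hypothetical parser: expects the unit name in the first column."""
    units = {}
    for line in text.splitlines():
        fields = line.split(None, 4)
        # A line starting with the marker has the marker as its first field,
        # so it does not look like a unit and is silently skipped.
        if len(fields) < 4 or not fields[0].endswith(".service"):
            continue
        units[fields[0]] = fields[2]  # unit name -> ACTIVE state
    return units


def parse_tolerant(text: str, markers=("●", "*")) -> dict[str, str]:
    """Same parser, but a leading status marker is removed first."""
    units = {}
    for line in text.splitlines():
        fields = line.split(None, 5)
        if fields and fields[0] in markers:
            fields = fields[1:]
        if len(fields) < 4 or not fields[0].endswith(".service"):
            continue
        units[fields[0]] = fields[2]
    return units


def count_failed(units: dict[str, str]) -> int:
    """Count the units whose ACTIVE state is 'failed'."""
    return sum(1 for state in units.values() if state == "failed")
```

With this sample, the naive variant never sees the two marked lines, so they are missing from both the total and the failed count. That would match the symptom that the total only increased once elasticsearch.service was running again.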

Indeed it seems like it has something to do with parsing.

Werk: systemd_units: Handle "●" as marker for broken units correctly
GitHub: systemctl prints ● instead of * in C.UTF-8 locale by elcamlost · Pull Request #371 · tribe29/checkmk

According to the Werk, it's going to be fixed in version 2.1.0i1.

Ah! That bug bit us too - could the fix please be included in the next patch release?
