How to configure Checkmk to change status if a certain number of Docker containers are down

Hello,

we’re about to use Checkmk for checking internal systems, but I am struggling to get a service check for Docker working.

What I want to achieve is that the service check for “Docker containers” switches to WARN
if the number of running containers is 4 or below,
and goes to CRIT if the number of running containers drops to 3 or below.
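To make the goal unambiguous, here is a minimal Python sketch of the mapping I have in mind. The function name `desired_state` is made up for illustration and is not part of Checkmk; both thresholds are meant inclusively ("or below").

```python
def desired_state(running: int, warn_at: int = 4, crit_at: int = 3) -> str:
    """Return the state I want for a given number of running containers.

    Both thresholds are inclusive: 4 or below is WARN, 3 or below is CRIT.
    """
    if running <= crit_at:
        return "CRIT"
    if running <= warn_at:
        return "WARN"
    return "OK"


# Walk through the three interesting cases on a host with 5 containers.
for running in (5, 4, 3):
    print(running, desired_state(running))
```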

Should be easy, I thought. Well, it doesn’t seem so.

Here’s what I did to achieve my goal:

  1. Installed the Checkmk agent on my host
  2. Placed the mk_docker.py plugin in the correct folder (/usr/lib/check_mk_agent/plugins)
  3. Changed the permissions to make the file executable
  4. Placed the docker.cfg in /etc/check_mk/
  5. Added the host in Checkmk and waited for all services to appear
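Steps 2 to 4 can be sketched roughly as follows. This is only an illustration under some assumptions: `install_file` is a hypothetical helper, the two directories are the ones named in the steps above, and the script would have to run as root with mk_docker.py and docker.cfg in the current directory.

```python
import shutil
import stat
from pathlib import Path

# Directories taken from the steps above; adjust them if your agent lives elsewhere.
PLUGIN_DIR = Path("/usr/lib/check_mk_agent/plugins")
CONFIG_DIR = Path("/etc/check_mk")


def install_file(src: Path, dest_dir: Path, executable: bool = False) -> Path:
    """Copy src into dest_dir and optionally add execute permission for everyone."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = Path(shutil.copy2(src, dest_dir))
    if executable:
        # Equivalent to "chmod a+x": keep the existing mode and add the x bits.
        dest.chmod(dest.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest


# Example calls (commented out because they need root and the real files):
# install_file(Path("mk_docker.py"), PLUGIN_DIR, executable=True)
# install_file(Path("docker.cfg"), CONFIG_DIR)
```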

All services appeared, and the Docker container status could be read:
Containers: 5, Running: 4, Paused: 0, Stopped: 1
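For anyone who wants to work with these numbers outside of Checkmk, the summary line can be split into counters with a few lines of Python. This is just a sketch that parses the text exactly as shown above; `parse_summary` is a made-up name and not something the plugin provides.

```python
import re


def parse_summary(summary: str) -> dict[str, int]:
    """Turn 'Containers: 5, Running: 4, ...' into a dict of lower-case names to counts."""
    return {
        name.lower(): int(value)
        for name, value in re.findall(r"(\w+):\s*(\d+)", summary)
    }


counts = parse_summary("Containers: 5, Running: 4, Paused: 0, Stopped: 1")
print(counts)
```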

This is what I did next:

  1. Created a new Enforced Services rule for Docker node container levels
  2. Set the Running containers lower levels to Warning at 4, Critical at 3
  3. Adjusted the conditions to my needs
  4. Saved and activated the changes

The rule is being applied for the host, I checked that. But the status does not change.
There is currently one container down. So it should switch to WARN.

Or did I miss anything?

I’m on Checkmk Raw 2.0.0p22, installed on Ubuntu 20.04.

Best regards,

Niklas

Hello,
please post some screenshots of your rules.
Ralf

Hi Ralf,

you can find some more information in the screenshots below.

Screenshot

(Sorry, have to upload them separately.)

Best,

Niklas

Here’s another screenshot:

Screenshot

And another:

Screenshot

With these settings it should go to WARN if there are fewer than 4 running containers and to CRIT if there are fewer than 3 running containers.
Just play with the numbers and look at the result. Or stop another container.
The way you set it up is correct.

Hi Man-in-Black,

I was thinking about this, too.
I was already playing around with the numbers, lowering them in steps down to 2 (WARN) and 1 (CRIT).
No changes.
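One possible reading, which nothing in this thread confirms: if the lower levels are compared strictly ("below" rather than "at or below"), then 4 running containers never fall below a WARN level of 4, and lowering the numbers only moves further away from an alert. The sketch below illustrates that assumption; `lower_level_state` is a hypothetical function, not Checkmk code.

```python
def lower_level_state(value: int, warn_below: int, crit_below: int) -> str:
    """Assumed semantics: alert only when the value is strictly below a threshold."""
    if value < crit_below:
        return "CRIT"
    if value < warn_below:
        return "WARN"
    return "OK"


running = 4  # current number of running containers on the host
for warn, crit in ((4, 3), (3, 2), (2, 1), (5, 4)):
    print(f"levels {warn}/{crit}: {lower_level_state(running, warn, crit)}")
```

Under this assumption only the last pair (5/4) would produce a WARN with 4 running containers.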

I also switched the conditions around and used “Stopped containers upper level” with WARN at 1 and CRIT at 2. This at least triggered a CRIT.
But even at 1 stopped container the status changed to CRIT, skipping WARN completely.
I’ve played with the numbers here as well, without any good result:

Screenshot

Any ideas?

Best,

Niklas

Okay, never mind. I got it to work!

The service details in my last screenshot put me on the right path. They say “warn/crit below 1/2”.
I’ve changed everything to “Stopped containers upper levels: 1, 2” and it’s working correctly now.
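For a host with a fixed number of containers, watching the stopped count gives the same result as my original goal. Here is a small sketch of that equivalence, assuming upper levels trigger once the value reaches the threshold and that no containers are paused; `upper_level_state` is a made-up helper, not the Checkmk implementation.

```python
def upper_level_state(value: int, warn_at: int, crit_at: int) -> str:
    """Assumed semantics: alert once the value reaches a threshold (inclusive)."""
    if value >= crit_at:
        return "CRIT"
    if value >= warn_at:
        return "WARN"
    return "OK"


TOTAL = 5  # containers on the host, from the summary earlier in the thread
for running in (5, 4, 3):
    stopped = TOTAL - running  # assumes no paused containers
    print(running, stopped, upper_level_state(stopped, warn_at=1, crit_at=2))
```

The drawback of this approach is that the levels depend on the total: if containers are added or removed permanently, the numbers have to be adjusted.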

Guess I needed some kind of rubber-ducking here. :)

Thanks for your input!

Best,

Niklas

It seems that the wording here is wrong, maybe a bug?

If I enter the values under the upper levels, they behave as I would have expected from the lower levels: if fewer than x containers are running, a warning is generated.

In my tests, the two settings, lower levels and upper levels, were swapped.

Hello,
good to read that you solved the problem.
It is a typical mistake, so it is really important to check that every service works as expected.

Ralf
