Not able to monitor a Docker Cluster

Hi there,

I hope this message finds you all well. I am writing this topic to ask for assistance with monitoring a Docker Swarm cluster using Checkmk. I believe this should be a trivial issue, but I have not been able to make it work. Whenever I test the connection in Checkmk, it gives me the error “506 Cannot talk to daemon”, and I can see a failed service in the service list: NTP Time 506 Cannot talk to daemon.

Here are the steps I have taken so far. I am running the Raw Edition, BTW.

Docker Manager Node Server:

  • I installed the Linux agent. I can see from the Checkmk server that it is pulling info: RAM, CPU, etc.
  • I copied the mk_docker.py plugin to /usr/lib/check_mk_agent/plugins and I was able to call the Python script correctly with no issues. It was missing a couple of libraries, but in the end I downloaded them and it is working.
  • In Docker, I enabled the daemon to accept remote connections over TCP on 0.0.0.0:2375 (one common way to do this is sketched below).
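
For illustration only, not necessarily my exact configuration: a common way to expose the daemon over TCP is a “hosts” entry in /etc/docker/daemon.json, roughly like this (on systemd-based distributions the -H fd:// flag in the docker.service unit has to be removed or overridden first, otherwise the daemon will not start with both set):

{
  "hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2375"]
}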

Checkmk Server:

  • I was able to telnet to the IP and port, and it connects.
  • I added the docker.cfg file and edited base_url to tcp://:2375 (a rough sketch of the file is below this list).
  • I restarted the server with omd restart.
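
For illustration only, the kind of docker.cfg entry I was experimenting with looks roughly like this (section and key names follow the sample file shipped with the plugin; the host value is just a placeholder):

[DOCKER]
base_url: tcp://<docker-host>:2375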

Despite all this I still have the issue and am not able to pull any Docker-related services. What could the issue be? I also tried installing the mk_docker.py plugin on the Checkmk server itself and resolved the missing libraries there, but was not able to make that work either.

I do appreciate your help, as I am in the middle of preparing a POC and I am kind of stuck.
Thanks

You need the “docker.cfg” file on your monitored Docker host, not on the CMK server.
The file must be placed inside “/etc/check_mk/” so that the mk_docker.py plugin can find it.

The “mk_docker.py” plugin has no Swarm-specific checks, so with this in mind I think you need to run “mk_docker.py” on every Swarm node to gather all the running containers. Later, inside CMK, you can build virtual cluster objects and group the services from the Swarm nodes there.
But the first step should be a working “mk_docker.py” on one node with valid output.

Thank you, Andreas, for jumping in. So this means that all the steps related to enabling the TCP protocol on the Docker host were not necessary, correct?

I placed it in the /etc/check_mk folder on the Docker host. In this file I have an exact copy of that example; I tried both leaving the URL as is and changing it to tcp://0.0.0.0:2375.

On the Docker server we have now installed:

  • The Linux agent
  • The mk_docker.py plugin, placed in /usr/lib/check_mk_agent/plugins/
  • And docker.cfg in /etc/check_mk

When I run the tests from the Checkmk server, it gives me the same error: 506 Cannot talk to daemon.

I also ran the Python plugin directly; the output indicates that it is working, and it pulls the running containers and Docker-related info. Is there anything specific that I should look into to make sure it is working as it should?

Thank you again for your help.

I would use the Docker socket inside the config file, as it is in the example. Normally you don’t need to expose the TCP port, and doing so is also a security risk.
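
A minimal docker.cfg for the socket looks roughly like this (a sketch only; section and key names follow the sample file shipped with the agent plugin):

[DOCKER]
base_url: unix://var/run/docker.sock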

Normally I test the whole setup on the Docker server with a call of check_mk_agent.
The output should contain the docker sections near the end.
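
For example, something like this to filter out the relevant part (the exact command line is just a suggestion):

/usr/bin/check_mk_agent | grep -A 1 '<<<docker'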

Another test option is to run the “mk_docker.py” standalone.
For this you have to export the variable “MK_CONFDIR” with the folder where the docker.cfg can be found. Then you can run mk_docker.py and inspect what happens.
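
Roughly like this, assuming the paths from above:

export MK_CONFDIR=/etc/check_mk
/usr/lib/check_mk_agent/plugins/mk_docker.py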

That is strange. This is what I got when I ran check_mk_agent:

<<<chrony:cached(1620765213,30)>>>
506 Cannot talk to daemon
<<<>>>
<<<local:sep(0)>>>

Is the “mk_docker.py” executable?
At a minimum there should be an error message if something bad happens 🙂
The “Cannot talk to daemon” comes from chrony, I think.
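
You can check and, if needed, fix the executable bit like this (paths as above):

ls -l /usr/lib/check_mk_agent/plugins/mk_docker.py
chmod +x /usr/lib/check_mk_agent/plugins/mk_docker.py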

The Python script runs with no issues, and I can see my container information there. That is very strange. I mean, if something were not configured properly, mk_docker.py would fail to connect to the Docker instance and would not pull the running container info. Do you know what else I can do?

The question is only why mk_docker.py is not executed by the agent.
How did you test mk_docker.py?

If your Docker socket is reachable under the default path, it should also work without a docker.cfg file.
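
A quick way to verify that the socket itself is reachable (assuming curl is available and the default socket path) is:

curl --unix-socket /var/run/docker.sock http://localhost/version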

The output on my system looks like this.

mk_docker.py standalone

/usr/lib/check_mk_agent/plugins/mk_docker.py
<<<docker_node_info:sep(124)>>>
@docker_version_info|{"PluginVersion": "0.1", "DockerPyVersion": "5.0.0", "ApiVersion": "1.41"}
<<<docker_node_info:sep(0)>>>
{"ID": "Y6WH:E3XY:SK7J:6RFF:Y4MK:SRAR:UVEN:IL3K:FL63:ICSS:6KOI:W67A", "Containers": 6, "ContainersRunning": 2, "ContainersPaused": 0, "ContainersStopped": 4, "Images": 9, "Driver": "overlay2", "DriverStatus"

running check_mk_agent on the same machine

/usr/bin/check_mk_agent

...
softirq 1497598 0 417300 25 19845 5429 0 201906 411382 36 441675
<<<md>>>
<<<vbox_guest>>>
<<<local:sep(0)>>>
<<<docker_node_info:sep(124)>>>
@docker_version_info|{"PluginVersion": "0.1", "DockerPyVersion": "5.0.0", "ApiVersion": "1.41"}
<<<docker_node_info:sep(0)>>>
{"ID": "Y6WH:E3XY:SK7J:6RFF:Y4MK:SRAR:UVEN:IL3K:FL63:ICSS:6KOI:W67A",

You see the same output, just embedded in the rest of the agent output.

Hi Andreas, thank you again for your help. I am testing the plugin by executing it directly:
python /usr/lib/check_mk_agent/plugins/mk_docker.py
I have no idea, to be honest, why it is not being called by the agent. That is strange.

Just for your information, I am running version 2.0.0. Are you aware of any issues with this version?

That’s the wrong way to test it. You need to call it without python at the start.
The script must be set as executable, and it must be possible to start it directly.

Thank you for pointing this out. Yes, now it gives this error:
/usr/bin/env: ‘python3’: Not a directory

Now it all works perfectly. I can see the Docker-related services and it looks good. I appreciate your help, Andreas. It shows the overall container count, image sizes, etc. But what about per-container stats? I am interested in getting the list of all containers and their corresponding CPU, disk, I/O, etc. How can I achieve that?

The container data is transferred as piggyback data. You can decide whether you want the ID of your container or the container name as the piggyback host name.
This configuration can be done inside the docker.cfg.
Then you need to create host objects corresponding to the names from the piggyback data.

If you have not configured anything, then the piggyback header contains the short container ID and looks like this.

<<<<74057434cea0>>>>

With the setting container_id: name, the same container gets a more readable name 🙂

<<<<portainer>>>>
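
That setting goes into the same docker.cfg as before, roughly like this (sketch only; key name as mentioned above):

[DOCKER]
container_id: name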
