Hi friends, I have 10 machines, each running Docker, and each host has 5 containers. The plugin is working, but now I need to monitor when a container stops or shuts down. I am creating the rule, but whenever I stop a container on one host, I get the alert on all of them.
I don’t know if I’m doing it correctly or I’m missing something.
If you want a warning whenever any of your Docker containers goes into the “stopped” state, that is what the rule in the screenshot does. You will get a WARN state if one container stops and a CRIT state if two or more stop.
Note that this rule won’t help if the container gets deleted (docker rm). You need to monitor the container itself (by using the piggyback data from the host) to get informed when a specific container disappears.
Thanks for answering. That is exactly what I want: a WARN when one container is stopped and a CRIT when two are. But I see two alternatives, upper and lower levels, and I do not know what the difference is. When I set values in one of the options and then stop a container on one of my 10 servers, the alert fires on all 10 servers.
That’s why I was asking if there was something wrong with what I was configuring.
With Docker containers, “stopped” and “paused” are two different states. A paused container has all of its processes frozen, including their memory and processor state. It can be resumed with “docker unpause” and continues to work at exactly the point where it was paused. In contrast, with a stopped container, all processes are terminated. Only the container’s filesystem continues to exist, but no processes run any more. When restarting with “docker start”, the processes are started again.
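To make the distinction concrete, here is a minimal Python sketch that counts stopped and paused containers from a list of status strings. It assumes the statuses come from something like `docker inspect --format '{{.State.Status}}' <container>`, where a stopped container reports "exited" and a paused one reports "paused". The function name and the sample data are made up for illustration and are not part of the monitoring plugin.

```python
from collections import Counter


def count_stopped_and_paused(statuses):
    """Count containers that are stopped ("exited") or paused.

    `statuses` is a list of Docker status strings, for example as
    printed by `docker inspect --format '{{.State.Status}}' <name>`.
    """
    counts = Counter(statuses)
    # Docker reports a stopped container as "exited", not "stopped".
    return {"stopped": counts["exited"], "paused": counts["paused"]}


# Hypothetical host with five containers.
example = ["running", "running", "paused", "exited", "running"]
print(count_stopped_and_paused(example))
```

The two counts are kept separate here because the rule discussed in this thread compares stopped and paused containers against the thresholds independently.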
In your example, all checks enter the red state because you configured the CRITICAL thresholds to zero. When configuring UPPER levels, the CRITICAL threshold should always be higher than the WARNING threshold. For example, you can set the WARNING threshold to 1 and the CRITICAL threshold to 5. The check is then WARN if at least one container is stopped or paused, and CRIT if 5 or more containers are stopped or 5 or more containers are paused.
I am still failing, and I have a question: what is the difference between upper and lower levels? I understand the part about stopped and paused, but I don’t know where I’m going wrong. I am attaching a screenshot. As you can see, I put the values 1, 3 in the thresholds and stopped a container on one AP. But several APs are still red, not only the one with the stopped container.
Upper level means that the check is going to be WARN/CRIT if the measured number is equal to or higher than the threshold. If you set the upper levels to 1/5, then the check gets WARN if 1 to 4 containers are stopped, and CRIT if 5 or more containers are stopped.
Lower level is the opposite: the check gets WARN/CRIT if the measured number is equal to or lower than the configured threshold. So, in your example, all checks are CRIT because on every host there are fewer than 3 stopped containers.
If I understood your use case correctly, you should only set upper levels.
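As a rough illustration of the difference, here is a small Python sketch of how upper and lower levels could be evaluated. The function names are hypothetical, and the handling of values exactly equal to a threshold simply follows the description in this thread; the real check may treat the boundary differently.

```python
def check_upper(value, warn, crit):
    """Upper levels: state gets worse as the value rises."""
    if value >= crit:
        return "CRIT"
    if value >= warn:
        return "WARN"
    return "OK"


def check_lower(value, warn, crit):
    """Lower levels: state gets worse as the value falls.

    Boundary handling (<=) is an assumption taken from the thread.
    """
    if value <= crit:
        return "CRIT"
    if value <= warn:
        return "WARN"
    return "OK"


# Upper levels 1/5: only hosts with stopped containers leave OK.
for stopped in (0, 1, 4, 5):
    print("upper 1/5:", stopped, check_upper(stopped, warn=1, crit=5))

# Lower levels 1/3: a healthy host with 0 stopped containers is
# already at or below the CRIT threshold, so every host turns red.
for stopped in (0, 1, 4):
    print("lower 1/3:", stopped, check_lower(stopped, warn=1, crit=3))
```

With lower levels of 1/3, a host with zero stopped containers evaluates to CRIT, which matches the screenshot where hosts without any stopped container are red. With upper levels, the same host stays OK.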