CMK version: 2.0.0p16 (CRE)
OS version: Debian GNU/Linux 10 (buster)
Error message: N/A
Output of “cmk --debug -vvn hostname”: N/A
Hi,
I have a use case where I need to monitor K8s nodes and pods and trigger notifications when certain values fall outside their thresholds.
I already have a Prometheus server that provides the metrics I need and a Prometheus Alertmanager that triggers some alerts. I would like to use Checkmk to produce the same alerts and then remove the Alertmanager, so that I have a single, centralized alerting system.
Here are some examples of notifications that Checkmk would need to generate:
- Node disk usage above 80%.
- Pod memory usage above 90% for the last 5 minutes.
- Pod not ready for the last 15 minutes.
- A Deployment's available replicas not matching the replicas in its spec for more than 15 minutes.
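Most of these conditions combine a threshold with a duration, similar in spirit to the `for:` clause in Prometheus alerting rules. The following is a minimal Python sketch of that "above the limit continuously for N seconds" logic. The class `SustainedThreshold` and its parameters are hypothetical and only illustrate the requirement; this is not a Checkmk or Prometheus API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedThreshold:
    """Hypothetical helper: the condition holds only when a metric stays
    above `limit` for at least `duration_s` seconds without interruption."""
    limit: float
    duration_s: float
    breach_start: Optional[float] = None  # timestamp when the current breach began

    def update(self, timestamp: float, value: float) -> bool:
        """Feed one sample; return True if the alert condition currently holds."""
        if value <= self.limit:
            self.breach_start = None  # back within threshold: reset the timer
            return False
        if self.breach_start is None:
            self.breach_start = timestamp  # first sample of a new breach
        return timestamp - self.breach_start >= self.duration_s


# "Pod memory usage above 90% for the last 5 minutes" (timestamps in seconds)
rule = SustainedThreshold(limit=90.0, duration_s=300)
samples = [(0, 95.0), (120, 93.0), (240, 91.0), (300, 92.0)]
print([rule.update(t, v) for t, v in samples])  # → [False, False, False, True]
```

With `duration_s=0` the same check covers the plain disk-usage case. The "not ready" and replica-mismatch cases could be expressed the same way by feeding a derived value, such as 1 for not ready or the difference between spec and available replicas, with `limit=0` and `duration_s=900`.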
While reading the Checkmk documentation, I found two possible options:
- Use Prometheus Special Agent and PromQL
At first this seemed like a good option, but then I noticed that it only supports one value per query.
That doesn't seem maintainable, since Checkmk needs to detect and monitor new nodes, deployments, or pods as they are created dynamically.
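To illustrate the maintenance problem: if each query returns only one value, every newly created pod would need its own query, so something has to compare the objects seen on each poll against those already being monitored. Below is a minimal Python sketch of that comparison. The function `diff_inventory` and the pod names are hypothetical, and this is not how Checkmk's service discovery is implemented.

```python
from typing import Set, Tuple


def diff_inventory(known: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: compare object names between two polls and
    return (added, removed), i.e. what would need new or retired checks."""
    added = current - known    # objects that appeared since the last poll
    removed = known - current  # objects that disappeared since the last poll
    return added, removed


added, removed = diff_inventory({"pod-a", "pod-b"}, {"pod-b", "pod-c"})
print(added, removed)  # → {'pod-c'} {'pod-a'}
```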
- Use Kubernetes Special Agent (KSA) and Dynamic Host Configuration (DHC)
This would monitor what I need, but it seems that only the paid Enterprise Edition supports DHC.
So my questions are:
1 - Can all of the alerts above be generated in Checkmk?
2 - Is this possible with the free Raw Edition (CRE)?
3 - Should this be done with KSA and DHC, or in some other way?
Any help would be much appreciated.