Questions on Prometheus integration and kube-state-metrics

I am currently testing the Prometheus integration in order to automate Kubernetes cluster monitoring. I’ve set up a Prometheus datasource to receive piggyback data and configured the dynamic host creation.

So far, so good. A lot of objects are created, and many of them have active warnings or errors.

Services

I chose to also import services, which worked fine, but they all have a PING check in the state check_icmp: No hosts to check.

That is what I would expect, since Kubernetes services are normally not reachable from outside the cluster. So the question is: how is this intended to work?

Questions

  • If the Checkmk instance doing the checks ran inside the k8s cluster, the checks would succeed. Are there any EE Docker images for this purpose?
  • If that is not the case/intention, is there any other way to check these services? Otherwise, importing them would not make much sense.
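For context on why the external PING fails: a ClusterIP service is only resolvable and reachable via the cluster-internal DNS, so an ICMP check from outside finds no address to ping. The in-cluster name follows a fixed scheme (a minimal sketch; `my-service`/`default` are illustrative names, and the cluster domain is usually `cluster.local`):

```python
def service_fqdn(name: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Cluster-internal DNS name of a Service; only resolvable inside the cluster."""
    return f"{name}.{namespace}.svc.{cluster_domain}"

print(service_fqdn("my-service", "default"))
# → my-service.default.svc.cluster.local
```

A check running in a pod inside the cluster could reach this name; the Checkmk site outside the cluster cannot, which matches the check_icmp result above.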

Jobs

We make intensive use of Jobs. These are also identified as normal pods and labeled with cmk/kubernetes_object:pod. Unfortunately, the lifecycle of a pod created by a Job differs from that of, e.g., a Deployment: Job pods are intended to terminate, so any Job (successful or not) leads to at least three critical states:

  • Condition ContainersReady: False
  • Condition Ready: False
  • Container Ready: 0/1, Running: 0, Waiting: 0, Terminated: 1

Questions

  • Is there any method of monitoring k8s Jobs with this pipeline?
  • If not, how can I get rid of these false alarms, or prevent hosts from being created for the pods spawned by Jobs?
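As a stopgap, Job-spawned pods can in principle be recognized before host creation: every pod created by a Job carries an `ownerReferences` entry with `kind: Job` in its metadata. A minimal sketch of such a filter (plain Python over pod metadata dicts; the pod names are made up):

```python
def is_job_pod(pod: dict) -> bool:
    """True if the pod was spawned by a Job, judging by its ownerReferences."""
    owners = pod.get("metadata", {}).get("ownerReferences", [])
    return any(owner.get("kind") == "Job" for owner in owners)

def pods_to_monitor(pods: list) -> list:
    """Drop Job-owned pods so no piggyback hosts are created for them."""
    return [pod for pod in pods if not is_job_pod(pod)]

# Example pod metadata, shaped like the Kubernetes API output:
job_pod = {"metadata": {"name": "backup-x7k2p",
                        "ownerReferences": [{"kind": "Job", "name": "backup"}]}}
deploy_pod = {"metadata": {"name": "web-5d9f",
                           "ownerReferences": [{"kind": "ReplicaSet", "name": "web-5d9f"}]}}

print([p["metadata"]["name"] for p in pods_to_monitor([job_pod, deploy_pod])])
# → ['web-5d9f']
```

Whether this filtering can be hooked into the datasource itself is a separate question; the sketch only shows that the distinction is cheap to make.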

@wontekh this seems to be your area of expertise

I was just wondering how this is intended to be used, since one normally defines rules for the Prometheus Alertmanager to monitor the k8s components.

To solve this, one could import all the PromQL queries defined for the Alertmanager into Checkmk; this would also avoid needlessly blowing up the number of hosts and services :thinking:
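For illustration, such a Prometheus alerting rule on kube-state-metrics data looks roughly like this (a sketch: `kube_job_status_failed` is a real kube-state-metrics metric, but the group name, alert name, and thresholds are made-up examples to adjust to your setup):

```yaml
groups:
  - name: kubernetes-jobs        # hypothetical rule group
    rules:
      - alert: KubeJobFailed
        # Fires when a Job reports failed pods, instead of alerting
        # on every terminated Job pod.
        expr: kube_job_status_failed > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Job {{ $labels.namespace }}/{{ $labels.job_name }} has failed pods"
```

This alerts on the Job resource itself rather than on its short-lived pods, which is exactly the distinction the piggyback hosts currently miss.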

Hey @KartoffelSalat,
the PING check is generic and comes by default with every Checkmk host (including piggyback hosts). There is a good chance that this default behavior will change in the near future.

Regarding your Jobs question, you can also try out the Kubernetes Datasource if your version allows it. Otherwise, I would suggest using a Prometheus custom query (as detailed in your other post) for more refined information. That said, we are currently reviewing our k8s monitoring process to adapt the solution for such short-lived instances, which are not natively supported. There should be Kubernetes-related posts in the near future.

Thanks for your reply.

We use an up-to-date version of k8s which is not supported by the Python client … and, given their development speed, I’d say it never will be.

We make extensive use of Prometheus for monitoring our applications. It makes no sense to me to integrate those queries into Checkmk when there is no real benefit and the Prometheus Alertmanager already does the job without impacting my licence :wink:

But we are always curious about ways to improve our monitoring, so I’m looking forward to your developments in the k8s direction.
