Kubernetes monitoring on CRE 2.2.0p7: Kernel Performance service reports "UNKN" because of "not unique" values

openmindz · August 23, 2023, 11:41am

Dear Forum,

I have an RKE1 cluster on SUSE Linux Enterprise VMs running on VMware. The Checkmk agent (RPM) was installed on all machines that form the RKE cluster.

I configured “Kubernetes monitoring” as per the documentation and set it up to get only “Nodes” from Kubernetes for the time being.

I also configured a piggyback rule, to get metrics from the special agent “on” my nodes.
Since then, I get this alert on every node:

I assume that the metrics for this check “come” both from the agent on the host and from the special agent, which is probably why it says “found 2 times”.
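One way to check this assumption is to count how often the kernel section header appears in the raw data Checkmk collects for a node. The sketch below assumes the usual agent section header format (`<<<kernel>>>`); the function name `count_kernel_sections` is made up for illustration, and whether your dump already contains the piggybacked sections is something to verify on your own site.

```shell
#!/bin/sh
# Hypothetical helper: reads agent output on stdin and prints how many
# kernel section headers it contains.
count_kernel_sections() {
    # Matches "<<<kernel>>>" and "<<<kernel:...>>>" (header with options),
    # but not other sections whose names merely start with "kernel".
    grep -c '^<<<kernel[:>]'
}
```

As the site user, something like `cmk -d mynode | count_kernel_sections` (with `mynode` as a placeholder host name) should print 2 if both sources really deliver the section.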

How do I “solve” this problem without simply disabling the “Kernel Performance” service and/or check? Can I somehow set a “precedence” for which metrics to prefer (e.g. “prefer agent metrics”)?

Thanks,
Thomas

openmindz · August 24, 2023, 11:46am

Dear Forum,

A workaround I applied is to skip the kernel section of the agent I installed on the hosts:
in /etc/check_mk/exclude_sections.cfg I added MK_SKIP_KERNEL="yes" on all
affected hosts.
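For anyone who wants to script this on several nodes, here is a minimal sketch. The file path and the variable are the ones mentioned above; the function name `skip_kernel_section` is hypothetical, and the function assumes that an existing file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: make sure MK_SKIP_KERNEL="yes" is present exactly once
# in the agent's exclude_sections.cfg (path can be overridden for testing).
skip_kernel_section() {
    cfg="${1:-/etc/check_mk/exclude_sections.cfg}"
    line='MK_SKIP_KERNEL="yes"'
    # Create the file if it does not exist yet.
    touch "$cfg" || return 1
    # Append the line only if no identical line is already there.
    grep -qxF "$line" "$cfg" || printf '%s\n' "$line" >> "$cfg"
}
```

Run as root on each affected host, calling `skip_kernel_section` without arguments edits the default path; running it a second time changes nothing.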

This of course means that I now have to rely on the special agent for this check.
I’d prefer to be able to keep the metrics from both the “normal” and the special agent.

It would be great if Checkmk used whatever information is available in such a scenario, or
if this could be configured somehow. For the time being, I’m OK again:

Hope this information helps someone.

Regards,
Thomas

martin.hirschvogel · August 24, 2023, 4:54pm

Out of curiousity, why use the checkmk agent in that setup?
Otherwise, just adapt the helm chart to not roll out the machine sections daemonset

openmindz · August 25, 2023, 8:43am

Hi Martin,

Force of habit: I had a bunch of hosts, I could access them, I installed agents on them…
I thought it would be nice to have a “fallback” in case metrics can no longer be collected through Kubernetes for any reason.

Thanks for pointing out that I could disable the “machine sections” part in such a scenario. Perhaps this should be mentioned somehow in the official documentation.

Regards,
Thomas

martin.hirschvogel · August 25, 2023, 9:08am

Hey Thomas,

the recommended solution would be to not install agents on these hosts.
The Kubernetes Collector automatically collects machine metrics (e.g. Filesystem, Network, CPU, Memory, Kernel) for each node in a cluster. If you e.g. add another node to a cluster, it is automatically monitored. If your cluster has auto-scaling functionality, then this becomes very important.
Thus, we do not recommend disabling the machine sections. Therefore, we also do not document it. Cheers!

openmindz · August 25, 2023, 9:21am

OK Martin, thanks for the explanation!

Thomas
