The details on the service page give me a bit more info:
Not healthy (CRIT)
Verbose response:
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes \"my-node-name\" is forbidden: User \"system:serviceaccount:checkmk-monitoring:checkmk-agent\" cannot get resource \"nodes/proxy\" in API group \"\" at the cluster scope: GKE Warden authz [denied by managed-namespaces-limitation]: cluster scoped resource \"nodes/proxy\" is managed and access is denied","reason":"Forbidden","details":{"name":"my-node-name","kind":"nodes"},"code":403}
Version v1.31.8-gke.1045000
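For anyone who wants to pull the relevant fields out of a response like this, here is a minimal Python sketch. It assumes the verbose response is a standard Kubernetes Status object with straight quotes; the function name summarize_status and the regular expressions are illustrative only and not part of Checkmk or kubectl.

```python
import json
import re

# Copy of the verbose response quoted above, with straight quotes.
RAW_RESPONSE = r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes \"my-node-name\" is forbidden: User \"system:serviceaccount:checkmk-monitoring:checkmk-agent\" cannot get resource \"nodes/proxy\" in API group \"\" at the cluster scope: GKE Warden authz [denied by managed-namespaces-limitation]: cluster scoped resource \"nodes/proxy\" is managed and access is denied","reason":"Forbidden","details":{"name":"my-node-name","kind":"nodes"},"code":403}'


def summarize_status(raw):
    """Extract who was denied what, and by which policy (hypothetical helper)."""
    status = json.loads(raw)
    message = status.get("message", "")

    # These patterns only match the message wording seen in this thread.
    user = re.search(r'User "([^"]+)"', message)
    access = re.search(r'cannot (\w+) resource "([^"]+)"', message)
    policy = re.search(r"\[denied by ([^\]]+)\]", message)

    return {
        "code": status.get("code"),
        "reason": status.get("reason"),
        "user": user.group(1) if user else None,
        "verb": access.group(1) if access else None,
        "resource": access.group(2) if access else None,
        "denied_by": policy.group(1) if policy else None,
    }


summary = summarize_status(RAW_RESPONSE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The point of the summary is that the denial comes from the GKE Warden policy named in the message rather than from a missing RBAC rule in the Helm chart.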
According to the documentation we need to ensure the following:
Cluster Collector and GKE Autopilot Version:
Confirm that you’re using Cluster Collector version 1.5.1 or higher and GKE Autopilot version 1.27 or higher.
Configuration in values.yaml:
Ensure that you have set var_run to readOnly in the values.yaml file to satisfy the read-only permission required in GKE Autopilot:
volumeMountPermissions:
  var_run:
    readOnly: true
We have already confirmed that all of this is configured. Cluster Collector is version 1.7 (newest Checkmk agent Helm deployment) and Autopilot is version 1.31.8.
Nevertheless, we still see the issue above. Can you give some advice on how to proceed to resolve it? Thank you very much in advance!
Hey Sven,
you can ignore this error for now and remove the Kubelet service from monitoring.
GKE Autopilot has changed the rights which applications have for requesting information from nodes.
Given that GKE Autopilot is hardcore managed and a Kubelet issue will likely be handled very quickly by Google Cloud themselves, we will eventually remove that service for GKE in general. But we will be looking into other options regarding permissions.
Martin
Hello, I tried a workaround by adding the following into the values.yaml of the checkmk agent helm deployment to keep the cadvisor from trying to collect smaps.
After trying to upgrade the Helm deployment with the new values.yaml file, I got the following error:
Error: UPGRADE FAILED: cannot patch "mydeployment-checkmk-node-collector-container-metrics" with kind DaemonSet: admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-no-write-mode-hostpath]":["hostPath volume var-run in container cadvisor is accessed in write mode; disallowed in Autopilot.","hostPath volume sys used in container cadvisor uses path /sys which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].","hostPath volume docker used in container cadvisor uses path /var/lib/docker which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/]."]}
Requested by user: 'myuser@email.com', groups: 'system:authenticated'.
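To make the rejection easier to read, the violation details can be split per hostPath volume. This is only a rough Python sketch: it assumes the "Violations details" payload is valid JSON as pasted above, and the helper name list_violations is made up for illustration.

```python
import json
import re

# The "Violations details" payload from the error above.
VIOLATIONS = r'''{"[denied by autogke-no-write-mode-hostpath]":["hostPath volume var-run in container cadvisor is accessed in write mode; disallowed in Autopilot.","hostPath volume sys used in container cadvisor uses path /sys which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].","hostPath volume docker used in container cadvisor uses path /var/lib/docker which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/]."]}'''


def list_violations(raw):
    """Return (constraint, volume, message) tuples (hypothetical helper)."""
    rows = []
    for key, messages in json.loads(raw).items():
        # Keys look like "[denied by <constraint>]"; keep only the constraint name.
        constraint = re.sub(r"^\[denied by (.+)\]$", r"\1", key)
        for message in messages:
            match = re.search(r"hostPath volume (\S+)", message)
            volume = match.group(1) if match else None
            rows.append((constraint, volume, message))
    return rows


rows = list_violations(VIOLATIONS)
for constraint, volume, message in rows:
    print(f"{constraint} | {volume} | {message}")
```

Listed this way, the rejection covers three separate hostPath volumes of the cadvisor container (var-run, sys and docker), not only the write-mode mount.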
Is there any other way to try and get rid of the error message?
Cannot read smaps files for any PID from CONTAINER
That shouldn’t happen. Can you share the error messages you get? Ideally via a support ticket, then our devs can work directly in there.
GKE Autopilot is supported, but they recently made some changes to the whitelisting policy. Our tests showed no issues, but it may be that our cluster still had partner privileges; that is not always clear.