Problems with the Kubernetes special agent

CMK version:
2.0.0p21

OS version:
Debian 11

Error message:
[agent] Version: 2.0.0p21, OS: linux, [special_kubernetes] Agent exited with code 1: (404)

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

[ProgramFetcher] Fetch with cache settings: DefaultAgentFileCache(base_path=PosixPath('/omd/sites/mhs/tmp/check_mk/data_source_cache/special_kubernetes/MH-K8S-M1'), max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=False, use_outdated=False, simulation=False)
Not using cache (Does not exist)
[ProgramFetcher] Execute data source
Agent exited with code 1: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': '412628ce-b951-4a8d-9d5f-afb7ae4b21a4', 'Cache-Control': 'no-cache, private', 'Content-Type': 'text/plain; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'a675563a-23a7-4108-8990-ccfa2a5c51cb', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7458550b-fa57-48c4-b9a5-365d3b25f1de', 'Date': 'Mon, 02 May 2022 06:33:22 GMT', 'Content-Length': '19'})
HTTP response body: 404 page not found

How I tried to install the agent:

  1. Creating the service account with share/doc/check_mk/treasures/kubernetes/check_mk_rbac.yaml
  2. Importing certificate
  3. Storing the password (token) in Checkmk
  4. Adding a Kubernetes cluster to the monitoring

(so all as shown in the docs)

What can’t be wrong:

  • Port
  • IP address
  • Token
  • Cert

HTTP response body: 404 page not found

Looks like the cluster IP from “kubectl config view” is not reachable. Have you tried a manual telnet to this IP on the port that you specified?

Apart from this, can you try a manual curl?

Output Telnet

telnet 192.168.x.x 6443
Trying 192.168.x.x...
Connected to 192.168.x.x.
Escape character is '^]'.
Connection closed by foreign host.

curl with https:

curl https://192.168.x.x:6443/api/
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

curl with http:
Client sent an HTTP request to an HTTPS server.
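
For comparison, here is a rough sketch of the same request made with the service account token and the cluster CA instead of an anonymous curl. It uses only the Python standard library. The function names are made up for illustration, and the host, token, and CA path have to be supplied by you; none of this is Checkmk's own code.

```python
import ssl
import urllib.request
from typing import Optional


def build_api_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for the API server."""
    # Join base URL and path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/json")
    return request


def build_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Verify the server against ca_file if given, else the system trust store."""
    return ssl.create_default_context(cafile=ca_file)


def send(request: urllib.request.Request, context: ssl.SSLContext) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling send(build_api_request(base_url, "/api/", token), build_tls_context(ca_file)) with the CA that signed the API server certificate should avoid the curl (60) error above, since that error only says that curl does not trust the issuer.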

In terms of the network, I should be able to reach the host, because the machine has a network card in that network, so both are on the same network. Just to be sure, I also checked the firewall, and none of these packets pass through it.

I can also contact the API with the admin.conf file from another host in another network and that works…

I also tried it directly against one of the master nodes, because I have an HA installation with a virtual IP from a load balancer in front of it, but I get exactly the same problem there. So the load balancer cannot be the cause.

The master nodes are also all up and ready.

k8s-m1   Ready    control-plane,master   4d17h   v1.23.6
k8s-m2   Ready    control-plane,master   4d17h   v1.23.6
k8s-m3   Ready    control-plane,master   4d17h   v1.23.6
k8s-w1   Ready    <none>                 4d17h   v1.23.6
k8s-w2   Ready    <none>                 4d17h   v1.23.6
k8s-w3   Ready    <none>                 4d17h   v1.23.6

Thanks so far for your help!

Furthermore, it is odd that when I enter a static URL in Checkmk (with HTTP) instead of the host, I get the error message:

[agent] Version: 2.0.0p21, OS: linux, [special_kubernetes] Agent exited with code 1: (400)

With HTTPS I get:
[agent] Version: 2.0.0p21, OS: linux, [special_kubernetes] Agent exited with code 1: (404)

Edit:

When I change the URL to “https://192.168.x.x:6443/api/” in the Checkmk config, I get a 403 error:

Agent exited with code 1: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'f78941a3-b039-4036-b181-db0599c824e4', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'a675563a-23a7-4108-8990-ccfa2a5c51cb', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7458550b-fa57-48c4-b9a5-365d3b25f1de', 'Date': 'Mon, 02 May 2022 07:54:45 GMT', 'Content-Length': '321'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"apis \"storage.k8s.io\" is forbidden: User \"system:serviceaccount:check-mk:check-mk\" cannot get resource \"apis/v1\" in API group \"\" at the cluster scope","reason":"Forbidden","details":{"name":"storage.k8s.io","kind":"apis"},"code":403}
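
As a side note, the response bodies in this thread come in two shapes: a plain-text "404 page not found" and a JSON Status object. The following is a minimal sketch for pulling the useful fields out of the JSON variant. The helper name summarize_status is hypothetical, not part of Checkmk or the Kubernetes client.

```python
import json
from typing import Optional


def summarize_status(body: str) -> Optional[dict]:
    """Return the key fields of a Kubernetes Status object, or None for other bodies."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None  # e.g. the plain-text "404 page not found"
    if not isinstance(data, dict) or data.get("kind") != "Status":
        return None
    return {
        "code": data.get("code"),
        "reason": data.get("reason"),
        "message": data.get("message"),
        "details": data.get("details", {}),
    }
```

For the 403 body above, the message field names the user system:serviceaccount:check-mk:check-mk, so the token itself was accepted and the request was rejected at the authorization step.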

And when I try the curl command with the parameter --insecure I get the output:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/api/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

Furthermore, I see I haven’t posted my config here yet. Here is a screenshot of it:

Edit: Custom URL is now https://192.168.x.x:6443/api/

How are you generating the token?

kubectl get secrets check-mk-token-xxxxx -n check-mk -o jsonpath='{.data.token}' | base64 --decode
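
The same decoding step can be sketched in Python, assuming the secret was exported as JSON first (for example with kubectl get secret ... -o json). The function name token_from_secret is made up for this example.

```python
import base64
import json


def token_from_secret(secret_json: str) -> str:
    """Decode .data.token from the JSON of a service account token secret."""
    secret = json.loads(secret_json)
    encoded = secret["data"]["token"]
    # Secret values are stored base64-encoded; strip stray whitespace
    # so that only the bare token is pasted into Checkmk.
    return base64.b64decode(encoded).decode("utf-8").strip()
```

The decoded value is what goes into the Checkmk password store.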

Now, after changing the ClusterRole to:

...
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["get", "watch", "list"]

I get the following error when running cmk --debug -vvn K8S-M1 (K8S-M1 is the master node with which I am currently testing all this):

Agent exited with code 1: (404)
Reason: Not Found

HTTP response headers: HTTPHeaderDict({'Audit-Id': 'e9826b36-a092-4b98-b0a5-206a4c56288c', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'a675563a-23a7-4108-8990-ccfa2a5c51cb', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7458550b-fa57-48c4-b9a5-365d3b25f1de', 'Date': 'Mon, 02 May 2022 09:03:11 GMT', 'Content-Length': '174'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server could not find the requested resource","reason":"NotFound","details":{},"code":404}

The Checkmk config is:

And when I try:

kubectl auth can-i get deployment --as=system:serviceaccount:check-mk:check-mk

I get:

yes

I can use any Kubernetes verb with “all” and still get yes.

So with the service account everything should be fine… (Regarding rbac and so on).
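
As far as I understand it, RBAC treats resource rules (such as deployments) separately from non-resource URLs (such as /api or /apis), so a "yes" for a resource does not say anything about a URL path. Below is a hedged sketch of a SubjectAccessReview body that covers both cases. The helper name access_review is hypothetical; the field names follow the authorization.k8s.io/v1 API.

```python
import json


def access_review(user: str, verb: str, *, group: str = "",
                  resource: str = "", path: str = "") -> dict:
    """Build a SubjectAccessReview body for a resource or a non-resource path."""
    spec: dict = {"user": user}
    if path:
        # Non-resource request, e.g. the discovery endpoint /apis.
        spec["nonResourceAttributes"] = {"path": path, "verb": verb}
    else:
        # Resource request, e.g. group "apps", resource "deployments".
        spec["resourceAttributes"] = {"group": group, "resource": resource, "verb": verb}
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": spec,
    }


def to_manifest(review: dict) -> str:
    """Serialize the review so it can be submitted to the API server."""
    return json.dumps(review, indent=2)
```

Submitting the serialized review to the cluster should return a status.allowed field in the response. Note that this only covers authorization; it cannot explain a 404.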

Can you apply the file as it is, then fetch the token and place it in Checkmk?
Apart from that, please try disabling SSL cert check as a test.

1 Like

I have applied it as it is in the docs. By applying it like this, I get the error:

[agent] Version: 2.0.0p21, OS: linux, [special_kubernetes] Agent exited with code 1: (403)

Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '41c24c2e-c0fb-4b70-93bb-a612af3e4d02', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'a675563a-23a7-4108-8990-ccfa2a5c51cb', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7458550b-fa57-48c4-b9a5-365d3b25f1de', 'Date': 'Mon, 02 May 2022 09:48:29 GMT', 'Content-Length': '321'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"apis \"storage.k8s.io\" is forbidden: User \"system:serviceaccount:check-mk:check-mk\" cannot get resource \"apis/v1\" in API group \"\" at the cluster scope","reason":"Forbidden","details":{"name":"storage.k8s.io","kind":"apis"},"code":403}
CRIT, execution time 1.4 sec

With this config applied, kubectl auth can-i get all --as=system:serviceaccount:check-mk:check-mk also always returns yes. (I can use any Kubernetes verb with “all”.)

After changing the ClusterRole (apiGroups: ["*"], resources: ["*"]), I get the error:

[agent] Version: 2.0.0p21, OS: linux, [special_kubernetes] Agent exited with code 1: (404)

Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b6d85c96-1c11-48ba-bb47-46c8cc203abe', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'a675563a-23a7-4108-8990-ccfa2a5c51cb', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7458550b-fa57-48c4-b9a5-365d3b25f1de', 'Date': 'Mon, 02 May 2022 09:50:22 GMT', 'Content-Length': '174'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server could not find the requested resource","reason":"NotFound","details":{},"code":404}
CRIT, execution time 0.8 sec

Disabling the SSL certificate check doesn’t change anything.

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"apis \"storage.k8s.io\" is forbidden: User \"system:serviceaccount:check-mk:check-mk\" cannot get resource \"apis/v1\" in API group \"\" at the cluster scope","reason":"Forbidden","details":{"name":"storage.k8s.io","kind":"apis"},"code":403}
CRIT, execution time 1.4 sec

Which Kubernetes version are you using? Is it an on-premise cluster, or does it come from a managed provider?
I have a Kubernetes 1.20 cluster being monitored via this special agent and it works without any problems. I followed exactly the steps mentioned in the documentation. In my case, we have an on-prem cluster.

Hello,

We also have an on-prem Kubernetes cluster, version 1.23.6. I also followed exactly the steps in the docs, and in my case it doesn’t work. Maybe it is a version-dependent problem? v1.23.6 is the newest release available…

But according to the Kubernetes Python client on GitHub, Kubernetes 1.23.x should be supported…

It could be. I still use Docker as the container runtime but am planning to switch to CRI-O. Do you also use Docker? I recently tried the new Kubernetes monitoring (which is only available in 2.1), and with my current setup the Helm-based steps work for me as well.

Yes, we use Docker as our container runtime. All right, I’ll try upgrading to version 2.1 and check whether that works. If it doesn’t, I’ll give you an update. Thank you so far for all your help and patience.

Try the 2.1 one to see if you still face issues.
We tried it with 1.23 and it worked there without issues.
If you run into issues, we can do a debug session together. Cheers