CMK version: 2.2.0p7
OS version: CMK server: RHEL 7.9
This is the documentation page that was used to set up the Kubernetes monitoring:
Error message:
Both the Check_MK and Check_MK Discovery services show the following error:
[special_kube] Agent exited with code 1: Failed to establish a connection to xx.xx.xx.xx:6443 at URL /version
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
+ PARSE FETCHER RESULTS
No persisted sections
HostKey(hostname='<hostname>', source_type=<SourceType.HOST: 1>) -> Not adding sections: Agent exited with code 1: Failed to establish a connection to xx.xx.xx.xx:6443 at URL /version
HostKey(hostname='<hostname>', source_type=<SourceType.HOST: 1>) -> Add sections: []
Received no piggyback data
[cpu_tracking] Start [7fd4acc201d0]
value store: synchronizing
Trying to acquire lock on /omd/sites/<site name>/tmp/check_mk/counters/<hostname>
Got lock on /omd/sites/<site name>/tmp/check_mk/counters/<hostname>
value store: loading from disk
Releasing lock on /omd/sites/<site name>/tmp/check_mk/counters/<hostname>
Released lock on /omd/sites/<site name>/tmp/check_mk/counters/<hostname>
No piggyback files for '<hostname>'. Skip processing.
[cpu_tracking] Stop [7fd4acc201d0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.009999999776482582))]
[special_kube] Agent exited with code 1: Failed to establish a connection to xx.xx.xx.xx:6443 at URL /version(!!), [piggyback] Success (but no data found for this host), execution time 0.8 sec | execution_time=0.780 user_time=0.000 system_time=0.000 children_user_time=0.670 children_system_time=0.070 cmk_time_ds=0.030 cmk_time_agent=0.000
I only added the sections where the error is shown to save space.
Does anyone have an idea why this error is occurring?
Does anyone have an idea how to solve this error?
Yes, that part is clear, but the reason is unclear at this point, as the k8s cluster is otherwise accessible to the checkmk server. And since it’s not clear why it can’t reach this part of the k8s cluster, we also don’t know what needs fixing so that the check stops giving the current error.
Check from the monitoring server if the Kubernetes API server is reachable.
E.g. run a curl … against it.
If it returns a 403, the API server is reachable and is only rejecting the unauthenticated request, so the rest should be straightforward. Anything besides this is a network issue.
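The same check can be scripted if curl output is hard to interpret. The following is a minimal sketch, not part of Checkmk: the function names `classify_probe` and `probe` are made up for illustration, and the extra "tls" category is an assumption that goes beyond the 403-versus-network split described above. It only uses the Python standard library.

```python
import ssl
import urllib.error
import urllib.request


def classify_probe(status=None, error=None):
    """Map the outcome of a request to the API server onto a rough diagnosis.

    Hypothetical helper: the categories are an illustration, not Checkmk logic.
    """
    # Certificate errors are a subclass of OSError, so test for them first.
    if isinstance(error, ssl.SSLCertVerificationError):
        return "tls: server certificate is not trusted by this host"
    if isinstance(error, OSError):
        return "network: cannot reach the API server"
    if status in (401, 403):
        return "reachable: API server answered, request not authorized"
    if status == 200:
        return "reachable: API server answered"
    return "unknown: inspect the response manually"


def probe(url, timeout=5.0):
    """Request `url` once and return the diagnosis string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_probe(status=resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered with an HTTP error status such as 403.
        return classify_probe(status=exc.code)
    except urllib.error.URLError as exc:
        # urllib wraps the underlying socket or TLS error in `reason`.
        reason = exc.reason
        return classify_probe(error=reason if isinstance(reason, Exception) else exc)
    except OSError as exc:
        return classify_probe(error=exc)


# Example call, using the placeholder address from this thread:
# print(probe("https://xx.xx.xx.xx:6443/version"))
```

Run it as the site user on the monitoring server so that it sees the same network path and trust store as the special agent would.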
I’ll run a curl tomorrow and see what it shows. Just not sure whether to hope for a 403 or something else as I’m not sure at this point which will lead to a quicker fix overall!
Hi Martin, I ran the curl with -v from the checkMK server and this is the output it generated:
curl -v https://xx.xx.xx.xx:6443/
* About to connect() to xx.xx.xx.xx port 6443 (#0)
* Trying xx.xx.xx.xx...
* Connected to xx.xx.xx.xx (xx.xx.xx.xx) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* Server certificate:
* subject: CN=kube-apiserver
* start date: Nov 09 13:00:54 2023 GMT
* expire date: Nov 08 13:00:54 2024 GMT
* common name: kube-apiserver
* issuer: CN=kubernetes
* NSS error -8179 (SEC_ERROR_UNKNOWN_ISSUER)
* Peer's Certificate issuer is not recognized.
* Closing connection 0
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html
Would this be the same reason the checkMK services are being tripped up, or is this purely a curl issue with the certificate issuer?
We use self-signed certificates issued by the company’s certificate desk. Does this have to be indicated or something to checkMK in order for it to be OK with the certificates?
Could this also be an issue for checkMK when registering the agents for our non-kubernetes Linux / Windows servers?
Is this also something that would hinder setting up TLS for the checkmk-cluster-collector?
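One way to narrow down whether the unknown issuer is the whole story is to repeat the TLS handshake with the issuing CA certificate supplied explicitly. The sketch below is an illustration under assumptions: `make_context` and `issuer_trusted` are hypothetical names, the CA file path is something you would have to supply, and hostname matching is switched off on purpose so that only the certificate chain is tested. It does not show how the Checkmk special agent validates certificates.

```python
import socket
import ssl


def make_context(ca_file=None):
    """Build a context that verifies the chain but skips hostname matching.

    With ca_file=None the system trust store is used; otherwise only the
    certificates in ca_file are trusted.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # Isolate the issuer question: the chain is still verified
    # (verify_mode stays CERT_REQUIRED), only the name check is skipped.
    ctx.check_hostname = False
    return ctx


def issuer_trusted(host, port, ca_file=None, timeout=5.0):
    """Return (True, tls_version) if the chain verifies, else (False, reason).

    Plain connection failures (refused, timeout) are raised as OSError.
    """
    ctx = make_context(ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        return False, exc.verify_message


# Example calls; the CA path is a placeholder you would need to replace:
# print(issuer_trusted("xx.xx.xx.xx", 6443))
# print(issuer_trusted("xx.xx.xx.xx", 6443, ca_file="/path/to/cluster-ca.crt"))
```

If the first call fails and the second succeeds, the certificate itself is fine and the host simply does not trust the issuer `CN=kubernetes` shown in the curl output.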
Thanks for the additional suggestions & information. I haven’t had a chance to check it out or test any further yet, but will hopefully get some time to debug further by Friday.