CMK version:
2.1.0.p9
OS version:
Ubuntu 22.04.1 LTS
Error message:
hluerssen@bpsl086199:~$ kubectl get pods
NAME                                             READY   STATUS             RESTARTS        AGE
checkmk-cluster-collector-59766d5445-b654z       1/1     Running            0               52m
checkmk-node-collector-container-metrics-7qs7k   2/2     Running            0               51m
checkmk-node-collector-container-metrics-99w6d   2/2     Running            0               50m
checkmk-node-collector-container-metrics-9s4gv   2/2     Running            0               48m
checkmk-node-collector-container-metrics-c68qv   2/2     Running            0               49m
checkmk-node-collector-container-metrics-h7q4j   2/2     Running            0               50m
checkmk-node-collector-container-metrics-k9wsk   2/2     Running            0               51m
checkmk-node-collector-container-metrics-m99qm   2/2     Running            0               49m
checkmk-node-collector-container-metrics-ndf5p   2/2     Running            0               52m
checkmk-node-collector-container-metrics-zw7fw   2/2     Running            0               51m
checkmk-node-collector-machine-sections-6vgkm    1/1     Running            0               4d1h
checkmk-node-collector-machine-sections-8xh6g    1/1     Running            0               4d1h
checkmk-node-collector-machine-sections-c8lbg    0/1     CrashLoopBackOff   14 (4m4s ago)   52m
checkmk-node-collector-machine-sections-gsq46    1/1     Running            0               4d1h
checkmk-node-collector-machine-sections-ngst7    1/1     Running            0               4d1h
checkmk-node-collector-machine-sections-pc6gt    1/1     Running            0               52m
checkmk-node-collector-machine-sections-sl67h    1/1     Running            2 (3d ago)      4d1h
checkmk-node-collector-machine-sections-vk7wx    1/1     Running            1 (3d ago)      4d1h
checkmk-node-collector-machine-sections-xshxv    1/1     Running            0               3d
hluerssen@bpsl086199:~$ kubectl logs checkmk-node-collector-machine-sections-c8lbg
DEBUG: 2022-12-02 08:18:42,906 - Parsed arguments: Namespace(host='checkmk-cluster-collector.checkmk-monitoring', port=8080, secure_protocol=True, max_retries=10, connect_timeout=10, read_timeout=12, polling_interval=60, verify_ssl=True, ca_cert='/etc/ca-certificates/checkmk-ca-cert.pem', log_level='debug')
DEBUG: 2022-12-02 08:18:42,906 - Cluster collector base url: https://checkmk-cluster-collector.checkmk-monitoring:8080
INFO: 2022-12-02 08:18:42,906 - Querying Checkmk Agent for node data
Traceback (most recent call last):
  File "/usr/local/bin/checkmk-machine-sections-collector", line 8, in <module>
    sys.exit(main_machine_sections())
  File "/usr/local/lib/python3.10/site-packages/checkmk_kube_agent/send_metrics.py", line 471, in _main
    worker(session, cluster_collector_base_url, headers, verify)
  File "/usr/local/lib/python3.10/site-packages/checkmk_kube_agent/send_metrics.py", line 376, in machine_sections_worker
    returncode = process.wait(5)
  File "/usr/local/lib/python3.10/subprocess.py", line 1207, in wait
    return self._wait(timeout=timeout)
  File "/usr/local/lib/python3.10/subprocess.py", line 1933, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['/usr/local/bin/check_mk_agent']' timed out after 5 seconds
Hello everyone,
Similar to the issue described here, one of my machine-sections collector pods remains in CrashLoopBackOff.
We already did the installation with Helm, so the video mentioned in the other post did not provide any additional insight.
I suspect that the timeout in the error message (check_mk_agent not finishing within 5 seconds) is caused by the size of the node (48 vCPUs, 512 GB RAM). That is also the only difference I can make out between this node and all the other ones, where the collector runs perfectly fine.
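One way to test the node-size suspicion is to time the agent directly and compare the result with the 5-second limit visible in the traceback (process.wait(5) in send_metrics.py). The sketch below is not part of Checkmk: the helper names time_command and describe, and the 60-second debugging limit, are hypothetical choices for illustration, and the agent path is simply the one from the traceback. It assumes Python 3 is available wherever check_mk_agent can be run against the affected node (the collector image ships Python 3.10, per the traceback).

```python
import subprocess
import time
from typing import Optional, Sequence, Tuple

# Agent path taken from the traceback; adjust if it lives elsewhere.
AGENT_CMD = ["/usr/local/bin/check_mk_agent"]

# The collector calls process.wait(5), i.e. it allows 5 seconds.
COLLECTOR_TIMEOUT_S = 5.0


def time_command(cmd: Sequence[str], timeout: float) -> Optional[Tuple[float, int]]:
    """Run cmd and return (runtime in seconds, bytes written to stdout).

    Output is read while the command runs (communicate), so a large output
    cannot stall the child on a full pipe buffer. Returns None if the command
    is still running after `timeout` seconds; the child is then killed.
    """
    start = time.monotonic()
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL) as proc:
        try:
            out, _ = proc.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.communicate()  # reap the killed child and drain its pipe
            return None
    return time.monotonic() - start, len(out)


def describe(cmd: Sequence[str] = AGENT_CMD, timeout: float = 60.0) -> str:
    """Summarise how the command's runtime compares with the collector's limit."""
    result = time_command(cmd, timeout)
    if result is None:
        return f"did not finish within {timeout:.0f} s"
    elapsed, size = result
    verdict = "exceeds" if elapsed > COLLECTOR_TIMEOUT_S else "is within"
    return (f"finished in {elapsed:.1f} s with {size} bytes of output, "
            f"which {verdict} the {COLLECTOR_TIMEOUT_S:.0f} s collector limit")
```

Calling print(describe()) on the large node and on one of the healthy nodes would show whether the runtime (and output size) really differs. Using a generous debugging limit rather than the collector's 5 seconds means the sketch reports how long the agent actually needs instead of just failing the same way.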
Is there a way to adjust the timeout, at least for debugging purposes?
Best Regards
Hendrik