Kubernetes checkmk-node-collector-container-metrics

CMK version:
CheckMK: 2.2.0p8
Node/Cluster Collector: 1.4.1

OS version:
Kubernetes: Ubuntu 22.04
CheckMK: CentOS 8

** Installation on K8S Agent **

helm upgrade --install --create-namespace -n checkmk-monitoring myrelease checkmk-chart/checkmk -f values.yaml

We are using the example NodePort configuration: clusterCollector: {service: {type: NodePort, nodePort: 30035}}

Error message:

2023-09-13T09:13:09.021455664Z stderr F Traceback (most recent call last):
2023-09-13T09:13:09.021521298Z stderr F   File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
2023-09-13T09:13:09.02185322Z stderr F     httplib_response = self._make_request(
2023-09-13T09:13:09.021864063Z stderr F   File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
2023-09-13T09:13:09.022136775Z stderr F     six.raise_from(e, None)
2023-09-13T09:13:09.022171604Z stderr F   File "<string>", line 3, in raise_from
2023-09-13T09:13:09.022302999Z stderr F   File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
2023-09-13T09:13:09.022563941Z stderr F     httplib_response = conn.getresponse()
2023-09-13T09:13:09.022575242Z stderr F   File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
2023-09-13T09:13:09.023248129Z stderr F     response.begin()
2023-09-13T09:13:09.023257672Z stderr F   File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
2023-09-13T09:13:09.023514938Z stderr F     version, status, reason = self._read_status()
2023-09-13T09:13:09.023524526Z stderr F   File "/usr/local/lib/python3.10/http/client.py", line 287, in _read_status
2023-09-13T09:13:09.023754355Z stderr F     raise RemoteDisconnected("Remote end closed connection without"
2023-09-13T09:13:09.023766679Z stderr F http.client.RemoteDisconnected: Remote end closed connection without response
2023-09-13T09:13:09.02377176Z stderr F 
2023-09-13T09:13:09.023777189Z stderr F During handling of the above exception, another exception occurred:
2023-09-13T09:13:09.023784393Z stderr F 
2023-09-13T09:13:09.023789568Z stderr F Traceback (most recent call last):
2023-09-13T09:13:09.023800358Z stderr F   File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
2023-09-13T09:13:09.024045195Z stderr F     resp = conn.urlopen(
2023-09-13T09:13:09.024054937Z stderr F   File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 785, in urlopen
2023-09-13T09:13:09.024388657Z stderr F     retries = retries.increment(
2023-09-13T09:13:09.024398853Z stderr F   File "/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
2023-09-13T09:13:09.024673821Z stderr F     raise six.reraise(type(error), error, _stacktrace)
2023-09-13T09:13:09.024684319Z stderr F   File "/usr/local/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
2023-09-13T09:13:09.025025039Z stderr F     raise value.with_traceback(tb)
2023-09-13T09:13:09.025036252Z stderr F   File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
2023-09-13T09:13:09.025340043Z stderr F     httplib_response = self._make_request(
2023-09-13T09:13:09.025381349Z stderr F   File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
2023-09-13T09:13:09.025527387Z stderr F     six.raise_from(e, None)
2023-09-13T09:13:09.02554158Z stderr F   File "<string>", line 3, in raise_from
2023-09-13T09:13:09.025586726Z stderr F   File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
2023-09-13T09:13:09.025799665Z stderr F     httplib_response = conn.getresponse()
2023-09-13T09:13:09.025837909Z stderr F   File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
2023-09-13T09:13:09.026457413Z stderr F     response.begin()
2023-09-13T09:13:09.026474673Z stderr F   File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
2023-09-13T09:13:09.026720667Z stderr F     version, status, reason = self._read_status()
2023-09-13T09:13:09.026740112Z stderr F   File "/usr/local/lib/python3.10/http/client.py", line 287, in _read_status
2023-09-13T09:13:09.026957131Z stderr F     raise RemoteDisconnected("Remote end closed connection without"
2023-09-13T09:13:09.026980682Z stderr F urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2023-09-13T09:13:09.027017584Z stderr F 
2023-09-13T09:13:09.027026246Z stderr F During handling of the above exception, another exception occurred:
2023-09-13T09:13:09.027033063Z stderr F 
2023-09-13T09:13:09.027040636Z stderr F Traceback (most recent call last):
2023-09-13T09:13:09.027055094Z stderr F   File "/usr/local/bin/checkmk-container-metrics-collector", line 8, in <module>
2023-09-13T09:13:09.027127251Z stderr F     sys.exit(main_container_metrics())
2023-09-13T09:13:09.02714198Z stderr F   File "/usr/local/lib/python3.10/site-packages/checkmk_kube_agent/send_metrics.py", line 466, in _main
2023-09-13T09:13:09.027397078Z stderr F     worker(session, cluster_collector_base_url, headers, verify)
2023-09-13T09:13:09.027421391Z stderr F   File "/usr/local/lib/python3.10/site-packages/checkmk_kube_agent/send_metrics.py", line 336, in container_metrics_worker
2023-09-13T09:13:09.027562438Z stderr F     cluster_collector_response = session.post(
2023-09-13T09:13:09.027593465Z stderr F   File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 637, in post
2023-09-13T09:13:09.027840302Z stderr F     return self.request("POST", url, data=data, json=json, **kwargs)
2023-09-13T09:13:09.027850875Z stderr F   File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
2023-09-13T09:13:09.028107167Z stderr F     resp = self.send(prep, **send_kwargs)
2023-09-13T09:13:09.028123445Z stderr F   File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
2023-09-13T09:13:09.02844691Z stderr F     r = adapter.send(request, **kwargs)
2023-09-13T09:13:09.028466079Z stderr F   File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
2023-09-13T09:13:09.028697261Z stderr F     raise ConnectionError(err, request=request)
2023-09-13T09:13:09.028708164Z stderr F requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

** kubectl get pods **

❯ kgp
NAME                                                       READY   STATUS    RESTARTS        AGE
myrelease-checkmk-cluster-collector-6fc8fbb858-7xlz6       1/1     Running   72 (49m ago)    7d
myrelease-checkmk-node-collector-container-metrics-2vgqs   2/2     Running   74 (69m ago)    7d
myrelease-checkmk-node-collector-container-metrics-8s8t2   2/2     Running   69 (69m ago)    7d
myrelease-checkmk-node-collector-container-metrics-c9vsc   2/2     Running   68 (13h ago)    7d
myrelease-checkmk-node-collector-container-metrics-ct9sf   2/2     Running   72 (9h ago)     7d
myrelease-checkmk-node-collector-container-metrics-knr6x   2/2     Running   67 (20m ago)    7d
myrelease-checkmk-node-collector-container-metrics-m8vv6   2/2     Running   158 (60m ago)   7d
myrelease-checkmk-node-collector-container-metrics-pgrd9   2/2     Running   424 (65m ago)   7d
myrelease-checkmk-node-collector-container-metrics-tmh2b   2/2     Running   192 (60m ago)   6d20h
myrelease-checkmk-node-collector-machine-sections-5pvmh    1/1     Running   4 (3d15h ago)   7d
myrelease-checkmk-node-collector-machine-sections-5zdqt    1/1     Running   9 (6d ago)      7d
myrelease-checkmk-node-collector-machine-sections-6lql6    1/1     Running   5 (32h ago)     7d
myrelease-checkmk-node-collector-machine-sections-8dqr7    1/1     Running   9 (32h ago)     7d
myrelease-checkmk-node-collector-machine-sections-dm7b2    1/1     Running   6 (32h ago)     7d
myrelease-checkmk-node-collector-machine-sections-rmpwd    1/1     Running   6 (10h ago)     7d
myrelease-checkmk-node-collector-machine-sections-v45t6    1/1     Running   8 (26h ago)     7d
myrelease-checkmk-node-collector-machine-sections-x8mvq    1/1     Running   8 (26h ago)     7d
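
To see at a glance which pods restart most often, the listing above can be sorted by the RESTARTS column. This is only a minimal Python sketch, assuming the default `kubectl get pods` column layout shown here; the `restart_counts` helper and the `ROW` pattern are illustrative names and not part of Checkmk or kubectl.

```python
import re

# Matches one data row of `kubectl get pods`:
# NAME  READY  STATUS  RESTARTS [(time since last restart)]  AGE
# Assumption: default column layout, as in the listing above.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*)\))?\s+(?P<age>\S+)\s*$"
)


def restart_counts(kubectl_output: str) -> list[tuple[str, int]]:
    """Return (pod name, restart count) pairs, highest count first."""
    rows = []
    for line in kubectl_output.splitlines():
        match = ROW.match(line.strip())
        if match:  # the header and blank lines do not match and are skipped
            rows.append((match["name"], int(match["restarts"])))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Three rows copied from the listing above, plus the header.
SAMPLE = """\
NAME                                                       READY   STATUS    RESTARTS        AGE
myrelease-checkmk-cluster-collector-6fc8fbb858-7xlz6       1/1     Running   72 (49m ago)    7d
myrelease-checkmk-node-collector-container-metrics-pgrd9   2/2     Running   424 (65m ago)   7d
myrelease-checkmk-node-collector-machine-sections-5pvmh    1/1     Running   4 (3d15h ago)   7d
"""

if __name__ == "__main__":
    for name, restarts in restart_counts(SAMPLE):
        print(f"{restarts:>5}  {name}")
```

The same function should work on live output, for example the text returned by `kubectl get pods -n checkmk-monitoring`.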

** Problem **

The node-collector container-metrics pods fail every couple of hours, at intervals that are not really predictable. This can even put a node host into a flapping state in Checkmk, which is annoying. After a restart, a pod works again for anything from a few minutes to a couple of hours or even days.

bye
David
