Checkmk versions:
Checkmk: 2.2.0p8
Node/Cluster Collector: 1.4.1
OS versions:
Kubernetes nodes: Ubuntu 22.04
Checkmk server: CentOS 8
** Installation of the Kubernetes agent **
helm upgrade --install --create-namespace -n checkmk-monitoring myrelease checkmk-chart/checkmk -f values.yaml
We use the example NodePort configuration for the cluster collector: clusterCollector: {service: {type: NodePort, nodePort: 30035}}
Error message (log of a node collector container-metrics pod):
2023-09-13T09:13:09.021455664Z stderr F Traceback (most recent call last):
2023-09-13T09:13:09.021521298Z stderr F File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
2023-09-13T09:13:09.02185322Z stderr F httplib_response = self._make_request(
2023-09-13T09:13:09.021864063Z stderr F File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
2023-09-13T09:13:09.022136775Z stderr F six.raise_from(e, None)
2023-09-13T09:13:09.022171604Z stderr F File "<string>", line 3, in raise_from
2023-09-13T09:13:09.022302999Z stderr F File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
2023-09-13T09:13:09.022563941Z stderr F httplib_response = conn.getresponse()
2023-09-13T09:13:09.022575242Z stderr F File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
2023-09-13T09:13:09.023248129Z stderr F response.begin()
2023-09-13T09:13:09.023257672Z stderr F File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
2023-09-13T09:13:09.023514938Z stderr F version, status, reason = self._read_status()
2023-09-13T09:13:09.023524526Z stderr F File "/usr/local/lib/python3.10/http/client.py", line 287, in _read_status
2023-09-13T09:13:09.023754355Z stderr F raise RemoteDisconnected("Remote end closed connection without"
2023-09-13T09:13:09.023766679Z stderr F http.client.RemoteDisconnected: Remote end closed connection without response
2023-09-13T09:13:09.02377176Z stderr F
2023-09-13T09:13:09.023777189Z stderr F During handling of the above exception, another exception occurred:
2023-09-13T09:13:09.023784393Z stderr F
2023-09-13T09:13:09.023789568Z stderr F Traceback (most recent call last):
2023-09-13T09:13:09.023800358Z stderr F File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
2023-09-13T09:13:09.024045195Z stderr F resp = conn.urlopen(
2023-09-13T09:13:09.024054937Z stderr F File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 785, in urlopen
2023-09-13T09:13:09.024388657Z stderr F retries = retries.increment(
2023-09-13T09:13:09.024398853Z stderr F File "/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
2023-09-13T09:13:09.024673821Z stderr F raise six.reraise(type(error), error, _stacktrace)
2023-09-13T09:13:09.024684319Z stderr F File "/usr/local/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
2023-09-13T09:13:09.025025039Z stderr F raise value.with_traceback(tb)
2023-09-13T09:13:09.025036252Z stderr F File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
2023-09-13T09:13:09.025340043Z stderr F httplib_response = self._make_request(
2023-09-13T09:13:09.025381349Z stderr F File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
2023-09-13T09:13:09.025527387Z stderr F six.raise_from(e, None)
2023-09-13T09:13:09.02554158Z stderr F File "<string>", line 3, in raise_from
2023-09-13T09:13:09.025586726Z stderr F File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
2023-09-13T09:13:09.025799665Z stderr F httplib_response = conn.getresponse()
2023-09-13T09:13:09.025837909Z stderr F File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
2023-09-13T09:13:09.026457413Z stderr F response.begin()
2023-09-13T09:13:09.026474673Z stderr F File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
2023-09-13T09:13:09.026720667Z stderr F version, status, reason = self._read_status()
2023-09-13T09:13:09.026740112Z stderr F File "/usr/local/lib/python3.10/http/client.py", line 287, in _read_status
2023-09-13T09:13:09.026957131Z stderr F raise RemoteDisconnected("Remote end closed connection without"
2023-09-13T09:13:09.026980682Z stderr F urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2023-09-13T09:13:09.027017584Z stderr F
2023-09-13T09:13:09.027026246Z stderr F During handling of the above exception, another exception occurred:
2023-09-13T09:13:09.027033063Z stderr F
2023-09-13T09:13:09.027040636Z stderr F Traceback (most recent call last):
2023-09-13T09:13:09.027055094Z stderr F File "/usr/local/bin/checkmk-container-metrics-collector", line 8, in <module>
2023-09-13T09:13:09.027127251Z stderr F sys.exit(main_container_metrics())
2023-09-13T09:13:09.02714198Z stderr F File "/usr/local/lib/python3.10/site-packages/checkmk_kube_agent/send_metrics.py", line 466, in _main
2023-09-13T09:13:09.027397078Z stderr F worker(session, cluster_collector_base_url, headers, verify)
2023-09-13T09:13:09.027421391Z stderr F File "/usr/local/lib/python3.10/site-packages/checkmk_kube_agent/send_metrics.py", line 336, in container_metrics_worker
2023-09-13T09:13:09.027562438Z stderr F cluster_collector_response = session.post(
2023-09-13T09:13:09.027593465Z stderr F File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 637, in post
2023-09-13T09:13:09.027840302Z stderr F return self.request("POST", url, data=data, json=json, **kwargs)
2023-09-13T09:13:09.027850875Z stderr F File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
2023-09-13T09:13:09.028107167Z stderr F resp = self.send(prep, **send_kwargs)
2023-09-13T09:13:09.028123445Z stderr F File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
2023-09-13T09:13:09.02844691Z stderr F r = adapter.send(request, **kwargs)
2023-09-13T09:13:09.028466079Z stderr F File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
2023-09-13T09:13:09.028697261Z stderr F raise ConnectionError(err, request=request)
2023-09-13T09:13:09.028708164Z stderr F requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
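The innermost exception in the log comes from Python's `http.client`: `RemoteDisconnected` is raised when the server closes the TCP connection before sending an HTTP status line. urllib3 wraps it as `ProtocolError('Connection aborted.', ...)`, requests wraps that as `requests.exceptions.ConnectionError`, and since nothing in `container_metrics_worker` or `_main` catches it, it propagates to the top level of `checkmk-container-metrics-collector`. The sketch below reproduces only the bottom of that chain against a throwaway socket server on 127.0.0.1 (an assumption for illustration, not the cluster collector); it sends a GET for simplicity, whereas the agent in the log sends a POST.

```python
import http.client
import socket
import threading


def close_without_response(listener: socket.socket) -> None:
    """Accept one connection, read the request headers, then close without replying."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        # Read until the end of the HTTP headers so the close sends FIN rather than RST.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    # Leaving the 'with' block closes the socket before any status line is sent.


def reproduce() -> http.client.RemoteDisconnected:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)  # never block forever in accept()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=close_without_response, args=(listener,))
    server.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()  # raises: the server sent no status line
    except http.client.RemoteDisconnected as exc:
        return exc
    finally:
        client.close()
        server.join()
        listener.close()
    raise AssertionError("the server unexpectedly answered")


if __name__ == "__main__":
    error = reproduce()
    print(f"{type(error).__name__}: {error}")
```

`RemoteDisconnected` subclasses both `ConnectionResetError` and `http.client.BadStatusLine`, which is why it surfaces as a connection error rather than as an HTTP error status.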
** kubectl get pods **
❯ kgp
NAME                                                       READY   STATUS    RESTARTS        AGE
myrelease-checkmk-cluster-collector-6fc8fbb858-7xlz6       1/1     Running   72 (49m ago)    7d
myrelease-checkmk-node-collector-container-metrics-2vgqs   2/2     Running   74 (69m ago)    7d
myrelease-checkmk-node-collector-container-metrics-8s8t2   2/2     Running   69 (69m ago)    7d
myrelease-checkmk-node-collector-container-metrics-c9vsc   2/2     Running   68 (13h ago)    7d
myrelease-checkmk-node-collector-container-metrics-ct9sf   2/2     Running   72 (9h ago)     7d
myrelease-checkmk-node-collector-container-metrics-knr6x   2/2     Running   67 (20m ago)    7d
myrelease-checkmk-node-collector-container-metrics-m8vv6   2/2     Running   158 (60m ago)   7d
myrelease-checkmk-node-collector-container-metrics-pgrd9   2/2     Running   424 (65m ago)   7d
myrelease-checkmk-node-collector-container-metrics-tmh2b   2/2     Running   192 (60m ago)   6d20h
myrelease-checkmk-node-collector-machine-sections-5pvmh    1/1     Running   4 (3d15h ago)   7d
myrelease-checkmk-node-collector-machine-sections-5zdqt    1/1     Running   9 (6d ago)      7d
myrelease-checkmk-node-collector-machine-sections-6lql6    1/1     Running   5 (32h ago)     7d
myrelease-checkmk-node-collector-machine-sections-8dqr7    1/1     Running   9 (32h ago)     7d
myrelease-checkmk-node-collector-machine-sections-dm7b2    1/1     Running   6 (32h ago)     7d
myrelease-checkmk-node-collector-machine-sections-rmpwd    1/1     Running   6 (10h ago)     7d
myrelease-checkmk-node-collector-machine-sections-v45t6    1/1     Running   8 (26h ago)     7d
myrelease-checkmk-node-collector-machine-sections-x8mvq    1/1     Running   8 (26h ago)     7d
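The RESTARTS column is a per-pod total, and each container-metrics pod runs two containers (READY 2/2), so it does not show which container keeps restarting or why. A minimal sketch, assuming the output of `kubectl get pods -n checkmk-monitoring -o json` was saved to a file (the file name `pods.json` and script name `restarts.py` are hypothetical), lists `restartCount` plus the last termination reason and exit code per container, using the standard Pod `status.containerStatuses` fields:

```python
import json
import sys


def container_restarts(pod_list: dict) -> list[tuple[str, str, int, str]]:
    """Return (pod, container, restartCount, last termination) rows, most restarts first.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    rows = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            # lastState.terminated only exists once the container has terminated at least once.
            terminated = status.get("lastState", {}).get("terminated", {})
            last = terminated.get("reason", "-")
            if "exitCode" in terminated:
                last = f"{last} (exit {terminated['exitCode']})"
            rows.append((pod_name, status["name"], status.get("restartCount", 0), last))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 restarts.py pods.json  (file produced by kubectl ... -o json)
    with open(sys.argv[1], encoding="utf-8") as handle:
        for pod, container, restarts, last in container_restarts(json.load(handle)):
            print(f"{restarts:>5}  {pod}  {container}  {last}")
```

Usage: `kubectl get pods -n checkmk-monitoring -o json > pods.json && python3 restarts.py pods.json`. A last state of `Error (exit 1)` would be consistent with the uncaught Python exception in the log, whereas `OOMKilled (exit 137)` would point to a memory limit instead.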
** Problem **
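The node collector's container-metrics pods fail every couple of hours, and not predictably: within roughly 7 days they have restarted between 67 and 424 times each, compared with 4 to 9 restarts for the machine-sections pods and 72 for the cluster collector. This can even make a node host flap in Checkmk, which is annoying. After a restart, the collector works again for a few minutes, a few hours, or sometimes even days.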
bye
David