Kubernetes plugins upgrade for kubernetes > 1.18?

Follow up of Kubernetes plugins fails with kubernetes > 1.18

Yesterday python-kubernetes released https://github.com/kubernetes-client/python/releases/tag/v12.0.0 which (if I understand correctly) can talk to more recent versions of Kubernetes.

Any plans to upgrade the plugin so we can re-enable monitoring Kubernetes with Checkmk?

Ping @KartoffelSalat @andreas-doehler who participated in the original thread

ping @StodaraHodan @martin.hirschvogel who also participated in the original thread.

Cool, I will let our developer know.

While they updated this, they haven’t yet achieved support for 1.18 etc.
See here:


It seems they plan stable support for 1.18 for 13 December.

@arthurlogilab Thnx for the reminder

Since v12.0.1 has been released, I tried to update the client but failed. When I try pip install kubernetes==12.0.1 I get: No matching distribution found for kubernetes==12.0.1

The update for 12.0.0 works, but has the same error as before:

Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Date': 'Thu, 12 Nov 2020 11:57:47 GMT', 'Content-Length': '19', 'Content-Type': 'text/plain; charset=utf-8', 'Cache-Control': 'no-cache, private'})
HTTP response body: 404: Page Not Found

The reason seems to be what @martin.hirschvogel mentioned before.

We have a similar situation in a Java project where we use the java-client library, which also lags behind the API release schedule.

What I’m learning from this is not to upgrade kubernetes too often and to have an eye on the toolchain.

A new Python client has been released. As one can see in the changelog, they are not only changing the naming scheme but also announce that support for Python 2 will be dropped from the 1.18 release on.

Since Checkmk comes with Python 2.7.17, I ask myself whether the Kubernetes plugin has any future.

CMK 2 also has Python 3, so I think this is no problem here.

I’m using OMD - Open Monitoring Distribution Version 1.6.0p17.cre and find tells me:

$ find -L . -type f -name "python"
./version/bin/python
./bin/python

which is version 2.7.17. Did I miss something during setup?

No. Andreas was referring to the upcoming release 2.0 which uses python3. You’re on 1.6 (the latest stable) which still has python2.
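
A quick way to settle which interpreter a site actually runs is to ask it directly instead of searching for binaries. This is only a minimal sketch; the helper name `is_python3` is made up for illustration and is not part of Checkmk. Run it with the site's own `python`:

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple belongs to Python 3 or later."""
    return version_info[0] >= 3


# Report the interpreter that is executing this script.
print("Python {}.{}.{}".format(*sys.version_info[:3]))
print("Python 3: {}".format(is_python3()))
```

On a 1.6 site as described above this should report a 2.7.x interpreter.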

And we are thinking about a different solution for Kubernetes monitoring in the meantime :slight_smile:
There is of course our Prometheus integration as an alternative.

Same here with a freshly installed K8s:

Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Date': 'Sun, 29 Nov 2020 23:03:19 GMT', 'Content-Length': '19', 'Content-Type': 'text/plain; charset=utf-8', 'Cache-Control': 'no-cache, private'})
HTTP response body: 404: Page Not Found

I ran into the same issue. After some digging through the code, I found the following by adding some debug output to /omd/sites/twa/lib/python/kubernetes/client/rest.py:231:

OMD[twa]:~$ /omd/sites/twa/share/check_mk/agents/special/agent_kubernetes '--token' '<token>' '--infos' 'services,deployments,pods,daemon_sets,stateful_sets' '--port' '443' '--no-cert-check' '--url-prefix' 'https://<URL>' '--path-prefix' '/k8s/clusters/c-c7fcd' --debug '<HOST>'
('URL: ', 'https://<URL>:443/k8s/clusters/c-c7fcd/apis/storage.k8s.io/v1/storageclasses')
('URL: ', 'https://<URL>:443/k8s/clusters/c-c7fcd/api/v1/namespaces')
('URL: ', 'https://<URL>:443/k8s/clusters/c-c7fcd/apis/rbac.authorization.k8s.io/v1/roles')
('URL: ', 'https://<URL>:443/k8s/clusters/c-c7fcd/apis/rbac.authorization.k8s.io/v1/clusterroles')
('URL: ', 'https://<URL>:443/k8s/clusters/c-c7fcd/api/v1/componentstatuses')
('URL: ', 'https://<URL>:443/k8s/clusters/c-c7fcd/api/v1/nodes')
('URL: ', 'https://<URL>:443/k8s/clusters/c-c7fcd/api/v1/nodes/shared-cluster-node-1/proxy/stats')
kubernetes.client.rest.ApiException: (404)
...

So it’s actually the request for the node stats that leads to the 404.
I can’t find anything in the k8s API docs that describes this endpoint, and I also have no clue what might be missing in my cluster to serve it.

Update:
found this issue here in the forum, which provides a workaround:
Deprecated cadvisor stats with Kubernetes 1.18 and check-mk 1.6.0p18
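
For anyone who wants to reproduce the failing request outside the agent (for example with a browser or curl), the URL can be rebuilt from the agent's arguments. This is a minimal sketch that only assumes the URL layout visible in the debug output above; the helper name `node_stats_url` is hypothetical and not part of the agent:

```python
def node_stats_url(url_prefix, port, path_prefix, node_name):
    """Rebuild the node stats URL in the shape printed by the debug output."""
    base = "{}:{}{}".format(url_prefix.rstrip("/"), port, path_prefix.rstrip("/"))
    return "{}/api/v1/nodes/{}/proxy/stats".format(base, node_name)


# Values taken from the agent call shown above; <URL> is left as a placeholder.
print(node_stats_url("https://<URL>", 443, "/k8s/clusters/c-c7fcd",
                     "shared-cluster-node-1"))
```

The printed value should match the last URL line before the ApiException in the debug output.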

A fix is on the way and will be available in p20

@martin.hirschvogel good news! Can we test with version 2.0.0b2 (beta) of Checkmk? Is it possible to test with version 1.6.0 (stable)?

Both 2.0.0b2 and 1.6.0p20 have it, but 1.6.0p20 is not yet released. So why not try out Checkmk 2.0.0b2?

Well, thnx for the tip, but in my case it did not work out.

The agent fetches a lot of stuff but then fails with:

Traceback (most recent call last):
  File "./agent_kubernetes", line 9, in <module>
    sys.exit(main())
  File "/omd/sites/ote/lib/python/cmk/special_agents/agent_kubernetes.py", line 1215, in main
    api_data = ApiData(api_client)
  File "/omd/sites/ote/lib/python/cmk/special_agents/agent_kubernetes.py", line 1007, in __init__
    for node in nodes.items
  File "/omd/sites/ote/local/lib/python/kubernetes/client/api/core_v1_api.py", line 1919, in connect_get_node_proxy_with_path
    return self.connect_get_node_proxy_with_path_with_http_info(name, path, **kwargs)  # noqa: E501
  File "/omd/sites/ote/local/lib/python/kubernetes/client/api/core_v1_api.py", line 2020, in connect_get_node_proxy_with_path_with_http_info
    collection_formats=collection_formats)
  File "/omd/sites/ote/local/lib/python/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/omd/sites/ote/local/lib/python/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/omd/sites/ote/local/lib/python/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/omd/sites/ote/local/lib/python/kubernetes/client/rest.py", line 243, in GET
    query_params=query_params)
  File "/omd/sites/ote/local/lib/python/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)

I’m running kubernetes 1.19.

Update:

I forgot to add the --enable-cadvisor-json-endpoints flag on all nodes; I had only set it on the control-plane node. After I fixed it, it worked :pray: thnx a lot :+1:

Update 2:

Now that the check retrieves information, check_mk is not very happy with it, since it reports CRIT - [special_kubernetes] Version: unknown, OS: unknown, Got no information from host, execution time 28.5 sec
which also renders this agent unusable.
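
Since the 404 in the first update came from nodes that were missing the flag, it can help to find out which nodes fail instead of stopping at the first one. The following is only a sketch under assumptions: `FakeApiException` is a stand-in for the client's ApiException (only its `status` attribute is used), and `fetch_stats` is a hypothetical callable supplied by the caller; neither is taken from the agent's code.

```python
class FakeApiException(Exception):
    """Stand-in for the client's ApiException; only .status is used here."""

    def __init__(self, status):
        super(FakeApiException, self).__init__("({})".format(status))
        self.status = status


def nodes_without_stats(node_names, fetch_stats):
    """Return the nodes whose stats request is answered with a 404."""
    missing = []
    for name in node_names:
        try:
            fetch_stats(name)
        except Exception as exc:
            # Collect 404s, but let every other error surface unchanged.
            if getattr(exc, "status", None) == 404:
                missing.append(name)
            else:
                raise
    return missing


def demo_fetch(name):
    """Simulate a cluster where only the control-plane node serves stats."""
    if name != "control-plane":
        raise FakeApiException(404)
    return "{}"


print(nodes_without_stats(["control-plane", "worker-1", "worker-2"], demo_fetch))
```

With the real client, `fetch_stats` could presumably wrap the `connect_get_node_proxy_with_path` call that appears in the traceback above, but that wiring is left out here because it depends on the installed client version.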

Hi,

similar error here with

  • Kubernetes v1.19.10
  • Checkmk-Raw v2.0.0p9.cre
[special_kubernetes] Version: unknown, OS: unknown, Missing monitoring data for check plugins: logwatch_ec  WARN, execution time 1.8 sec

and service “Log Forwarding” is pending. All other services are fine.

Any solution for that?

Is there any update on this? I’m using Enterprise 2.0.0p15 with Kubernetes 1.22 and am getting similar errors (401, 403, etc.).

We have rewritten the entire Kubernetes monitoring for various reasons. One of them is that the official Kubernetes Python client is not reliable enough.
The beta of Checkmk 2.1 is coming out soon, where you can then already try out the new Kubernetes monitoring.

I also got the same status code 404 when using the K8s Agent with the following versions:

docker image = checkmk/check-mk-raw:2.0.0p17

kubernetes = 1.22.5

The error on the UI is the following:

Agent exited with code 1: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': '745831d9-0307-448c-9a94-cf4b31dca388', 'Cache-Control': 'no-cache, private', 'Content-Type': 'text/plain; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'a5702eaf-86a4-44e6-9156-623ccd9297b1', 'X-Kubernetes-Pf-Prioritylevel-Uid': '3c611a5c-dfc2-4112-94d1-295497ebe9de', 'Date': 'Wed, 16 Mar 2022 14:18:38 GMT', 'Content-Length': '19'})
HTTP response body: 404 page not found

When I redeployed the same K8s cluster with version 1.21.5, it worked fine.

I also tested with different versions of Checkmk and got the following results:

  • 2.0.0p21 - same error

  • 2.1.0b2 - error “Agent exited with code 1: 'NoneType' object has no attribute 'partition'”

Kubernetes only maintains release branches for the most recent three minor releases which currently are 1.23, 1.22 and 1.21.

I’m wondering if Checkmk can keep pace and always be compatible with at least one of the last three K8s minor versions (Releases | Kubernetes), because this is a must for a production environment. Is this something that can be guaranteed by Checkmk?
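
The three-release window mentioned above can be written down as a small check. This is a minimal sketch, assuming plain "major.minor[.patch]" version strings and a window that never crosses a major version boundary; the function names are made up for illustration:

```python
def maintained_minors(latest, window=3):
    """List the most recent `window` minor releases, newest first."""
    major, minor = (int(part) for part in latest.split(".")[:2])
    lowest = max(minor - window, -1)  # do not go below minor 0
    return ["{}.{}".format(major, m) for m in range(minor, lowest, -1)]


def is_maintained(version, latest, window=3):
    """Check whether `version` falls inside the maintained window."""
    minor_release = ".".join(version.split(".")[:2])
    return minor_release in maintained_minors(latest, window)


print(maintained_minors("1.23"))
print(is_maintained("1.22.5", "1.23"))
print(is_maintained("1.19.10", "1.23"))
```

With 1.23 as the latest release this yields 1.23, 1.22 and 1.21, so the 1.22.5 cluster above is inside the window while 1.19.10 is not.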