[special_kube] Agent exited with code 1: could not convert string to float

CMK version:

Checkmk version 2.1.0p15

OS version:

  • OS: Debian GNU/Linux 11 (bullseye)
  • Kubernetes: v1.24.7+k3s1

Error message:

[special_kube] Agent exited with code 1: could not convert string to float: '50M'(!!)
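
The wording of this message matches what Python's built-in `float()` raises when it is given a string with a unit suffix. A minimal reproduction, assuming (not confirmed from the agent source) that the special agent passes the raw quantity string to `float()` somewhere:

```python
# Hypothetical reproduction; this is not code from agent_kube.
try:
    float("50M")  # a Kubernetes-style quantity string, not a plain number
except ValueError as exc:
    message = str(exc)
    print(message)
```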

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

OMD[mwscan_site]:~$ cmk --debug -vvn k3sdev
Checkmk version 2.1.0p15
Try license usage history update.
Trying to acquire lock on /omd/sites/mwscan_site/var/check_mk/license_usage/next_run
Got lock on /omd/sites/mwscan_site/var/check_mk/license_usage/next_run
Trying to acquire lock on /omd/sites/mwscan_site/var/check_mk/license_usage/history.json
Got lock on /omd/sites/mwscan_site/var/check_mk/license_usage/history.json
Next run time has not been reached yet. Abort.
Releasing lock on /omd/sites/mwscan_site/var/check_mk/license_usage/history.json
Released lock on /omd/sites/mwscan_site/var/check_mk/license_usage/history.json
Releasing lock on /omd/sites/mwscan_site/var/check_mk/license_usage/next_run
Released lock on /omd/sites/mwscan_site/var/check_mk/license_usage/next_run
+ FETCHING DATA
  Source: SourceType.HOST/FetcherType.PROGRAM
[cpu_tracking] Start [7fc2eb09c310]
[ProgramFetcher] Fetch with cache settings: DefaultAgentFileCache(k3sdev, base_path=/omd/sites/mwscan_site/tmp/check_mk/data_source_cache/special_kube, max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=False, use_outdated=False, simulation=False)
Not using cache (Does not exist)
[ProgramFetcher] Execute data source
Calling: /omd/sites/mwscan_site/share/check_mk/agents/special/agent_kube --pwstore=4@0@kubernetes_dev '--cluster' 'devcluster' '--token' '**********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************' '--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'namespaces' 'nodes' 'pods' '--cluster-aggregation-exclude-node-roles' 'control-plane' 'infra' '--api-server-endpoint' 'https://kube-dev.iznet:6443' '--verify-cert-api' '--api-server-proxy' 'FROM_ENVIRONMENT' '--k8s-api-connect-timeout' '20' '--cluster-collector-endpoint' 'https://kube-dev.iznet/checkmk-collector/' '--verify-cert-collector' '--cluster-collector-proxy' 'FROM_ENVIRONMENT'
[cpu_tracking] Stop [7fc2eb09c310 - Snapshot(process=posix.times_result(user=0.0, system=0.010000000000000009, children_user=1.41, children_system=0.13, elapsed=1.8599999994039536))]
  Source: SourceType.HOST/FetcherType.PIGGYBACK
[cpu_tracking] Start [7fc2eafbcc70]
[PiggybackFetcher] Fetch with cache settings: NoCache(k3sdev, base_path=/omd/sites/mwscan_site/tmp/check_mk/data_source_cache/piggyback, max_age=MaxAge(checking=0, discovery=120, inventory=120), disabled=True, use_outdated=False, simulation=False)
Not using cache (Cache usage disabled)
[PiggybackFetcher] Execute data source
No piggyback files for 'k3sdev'. Skip processing.
Not using cache (Cache usage disabled)
[cpu_tracking] Stop [7fc2eafbcc70 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
+ PARSE FETCHER RESULTS
  Source: SourceType.HOST/FetcherType.PROGRAM
  -> Not adding sections: Agent exited with code 1: could not convert string to float: '50M'
  Source: SourceType.HOST/FetcherType.PIGGYBACK
No persisted sections
  -> Add sections: []
Received no piggyback data
Received no piggyback data
[cpu_tracking] Start [7fc2eafbc4c0]
value store: synchronizing
Trying to acquire lock on /omd/sites/mwscan_site/tmp/check_mk/counters/k3sdev
Got lock on /omd/sites/mwscan_site/tmp/check_mk/counters/k3sdev
value store: loading from disk
Releasing lock on /omd/sites/mwscan_site/tmp/check_mk/counters/k3sdev
Released lock on /omd/sites/mwscan_site/tmp/check_mk/counters/k3sdev
No piggyback files for 'k3sdev'. Skip processing.
[cpu_tracking] Stop [7fc2eafbc4c0 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.0, children_system=0.0, elapsed=0.0))]
[special_kube] Agent exited with code 1: could not convert string to float: '50M'(!!), [piggyback] Missing data(!), execution time 1.9 sec | execution_time=1.860 user_time=0.000 system_time=0.010 children_user_time=1.410 children_system_time=0.130 cmk_time_ds=0.310 cmk_time_agent=0.000

This error occurs even when I disable "Enrich with usage data from Checkmk Cluster Collector".

2022-11-07 10:54:55,704 WARNING Unsupported Kubernetes version 'v1.24.7+k3s1'. Supported versions are v1.21, v1.22, v1.23.

hmpf

1.24 is supported. We tested it and made some adjustments for 1.24.
Seems like we missed something. I have come across this problem before, and it is very likely caused by something the Kubernetes API returns.

Could you do us a favor and run the special agent command with debug and vcrtrace enabled? See here: Debugging the kubernetes - k8s special agent - Checkmk Knowledge Base
Then we can find out where exactly it goes wrong and what the API returns.

Output is too long for this board

The link has already expired. Can you please reshare?

You must have a hell of a cluster if it can handle a CPU request of 50M.
Because that’s what one of your pods asks for. If your cluster can’t actually satisfy it, you should see that pod stuck in Pending.
0.000001M equals 1000m, which is one core. So I think 50M means 50 million cores. :slight_smile:
I assume the developer actually wanted 50m and mistyped.
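
To make the arithmetic concrete, here is a rough sketch of how Kubernetes quantity suffixes map to plain numbers. `parse_quantity` and `_SUFFIXES` are hypothetical names for illustration, not the agent's actual code, and the suffix table only covers the common decimal and binary suffixes:

```python
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (illustrative subset).
_SUFFIXES = {
    "m": Decimal("1e-3"),  # milli: 1000m == 1 core
    "k": Decimal("1e3"),
    "M": Decimal("1e6"),   # mega: 50M == 50 million
    "G": Decimal("1e9"),
    "T": Decimal("1e12"),
    "P": Decimal("1e15"),
    "E": Decimal("1e18"),
    "Ki": Decimal(2**10),
    "Mi": Decimal(2**20),
    "Gi": Decimal(2**30),
    "Ti": Decimal(2**40),
    "Pi": Decimal(2**50),
    "Ei": Decimal(2**60),
}


def parse_quantity(value: str) -> Decimal:
    """Convert a quantity string such as '50m' or '50M' to a plain number."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * _SUFFIXES[suffix]
    # No known suffix: treat the value as a plain number.
    return Decimal(value)


print(parse_quantity("50m"))  # cores requested if the manifest meant milli
print(parse_quantity("50M"))  # cores requested as actually written
```

The only difference between the two requests is the case of one letter, which is why a typo like this is easy to make and hard to spot.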

I have forwarded this to the team so that they can consider handling such values as well. Even though I believe them to be very unrealistic, we have now seen that they occur in reality :slight_smile:

Since it is also a bug that we can’t handle these crazy values, we have fixed this as well:
