Help needed with Kubernetes monitoring

Hi all,
I need some help getting Kubernetes monitoring to work.
I have the certificate, the token, and the endpoint IP, which I assume is the IP to run the special agent against (the hostname is xx.xx.xx.xx.bc.googleusercontent.com).
But I cannot get it to work.
I am totally new to this, and I think the documentation at https://checkmk.com/cms_monitoring_kubernetes.html could use some more screenshots :slight_smile:
Is anyone willing to give me a hint? :slight_smile:

BR Thomas

What version of Kubernetes do you use?
If it is 1.16 or newer, the Python bindings will not work, as they only support Kubernetes up to 1.15.
After a look at the GitHub repo, I would say only the Go client and the JavaScript client are up to date for 1.18.
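
For anyone who wants to check this quickly, here is a minimal sketch that compares a master version string against the 1.15 limit mentioned above. The limit is taken from this post and is not verified against the client library; the function names are made up for illustration.

```python
import re

# Assumption taken from the post above: the Python bindings of that time
# supported Kubernetes up to and including 1.15.
MAX_SUPPORTED = (1, 15)


def parse_master_version(version):
    """Return (major, minor) from a string such as '1.14.10-gke.36'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def is_supported(version, max_supported=MAX_SUPPORTED):
    """True if the cluster's major.minor is not newer than max_supported."""
    return parse_master_version(version) <= max_supported


if __name__ == "__main__":
    for candidate in ("1.14.10-gke.36", "1.16.3", "v1.18.0"):
        print(candidate, is_supported(candidate))
```

Only major and minor are compared, so vendor suffixes such as "-gke.36" are ignored.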

Hi Andreas
it says: Master-Version 1.14.10-gke.36

BR

OK, 1.14 should work.
What happens if you run the special agent manually on the command line?

In the document you posted you can find instructions for trying on the command line, as Andreas is suggesting.
Running on the command line should give a clear error message.
…but you may already have an error in your dashboard if it is failing…?

What have you done so far?
Some hints at what is failing or where you are struggling would be great :slight_smile:

Hi, thank you for the quick replies.
Sorry, yes, I left some things out. Basically I am stuck at point 2.4, where I add the host.
Everything up to 2.4 worked as described, i.e. cert & token … . At first I thought our company's proxy might not allow this, but all other checks that must go through the proxy work, and curl -k https://xx.xx.xx.xx/ returns:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}

This indicates that the cluster can at least be reached.
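
The "system:anonymous" in that message means the request carried no credentials. As a rough sketch (Python 3, standard library only), this is how the same request could be sent with the service account token as a bearer token. The host 192.0.2.10, the token value, and the function names are placeholders, and /version is just one small endpoint to try.

```python
import ssl
import urllib.request


def build_request(host, token, path="/version", port=443):
    """Build (but do not send) a GET request carrying the service account token."""
    url = "https://%s:%d%s" % (host, port, path)
    return urllib.request.Request(url, headers={"Authorization": "Bearer %s" % token})


def fetch(request, timeout=10):
    """Send the request without certificate verification, similar to curl -k."""
    context = ssl.create_default_context()
    # check_hostname has to be disabled before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder values; replace them with the real endpoint and token.
    request = build_request("192.0.2.10", "PLACEHOLDER_TOKEN")
    print(request.full_url)
    # print(fetch(request))  # uncomment to actually contact the cluster
```

If the token is accepted, the answer should no longer name "system:anonymous"; a 403 for the service account itself would point at missing RBAC permissions instead.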
Which commands do you mean? I went through the documentation but I can't find them.
And I think I found a mistake: it should be *Custom path suffix* instead of *Custom path prefix*, because it is appended to the URL, isn't it?

BR
Thomas

Run the special agent manually.
You get the complete command with "cmk -D hostname" for your Kubernetes host.
You can also add the options "-v" and "--debug" to get a little more output.

Hi Andreas, this is what i get:
OMD[]:~$ cmk -D -v -debug

<hostname>
Addresses:              xx.xx.xx.xx
Tags:                   [APP_Type:no_app], [Cluster:no_cls], [Country:de], [DB_Type:db_none], [Dev_Mount:no_mnt_plc], [Device_Group:no_dev_grp], [Device_Type:no_dev_tp], [Domain:no_dom], [Hardware_Manufacturer:no_hw], [Hypervisor:no_hv], [Interface_Count:no_int], [Location:hq_ing], [Lot:no_lot], [NGN_Store:no_ngn], [Operating_System:no_os], [Server_Type:no_srv_type], [Team:no_team], [Units:no_unit], [address_family:ip-v4-only], [agent:special-agents], [criticality:prod], [ip-v4:ip-v4], [mnt_infra:no_mnt], [networking:lan], [no_local_checks:no_local_checks], [piggyback:auto-piggyback], [site:INFMON01], [snmp_ds:no-snmp], [tcp:tcp]
Labels:
Host groups:            check_mk
Contact groups:         Administrators, cg_germany, all
Agent mode:             No Checkmk agent, all configured special agents
Type of agent:
  Program: /omd/sites/<sitename>/local/share/check_mk/agents/special/agent_kubernetes --pwstore=2@0@kubernetes_test_mf '--token' '****' '--infos' 'nodes,services,deployments,pods,daemon_sets,stateful_sets' '--port' '443' '--no-cert-check' 'xx.xx.xx.xx'
  Process piggyback data from /omd/sites/INFMON01/tmp/check_mk/piggyback/xx.xx.xx.xx.cloud.google.com
Services:
  checktype item params description groups
  --------- ---- ------ ----------- ------

This needs to be tested manually with the mentioned verbose and debug switches.

Hi Andreas, this is what i get:

OMD[]:~/local/share/check_mk/agents/special$ ./agent_kubernetes '--token' '' '--infos' 'nodes,services,deployments,pods,daemon_sets,stateful_sets' '--port' '443' '--no-cert-check' '' -v --debug
2020-07-15 14:38:20,276 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2688a78750>: Failed to establish a new connection: [Errno 110] Connection timed out',)': /apis/storage.k8s.io/v1/storageclasses
2020-07-15 14:38:20,276 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2688a78750>: Failed to establish a new connection: [Errno 110] Connection timed out',)': /apis/storage.k8s.io/v1/storageclasses
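
"[Errno 110] Connection timed out" means the TCP connection itself never completed, so the agent did not get as far as TLS or the token. A small sketch to test direct reachability from the site user's shell follows; the function name and the address 192.0.2.10 are placeholders. If this fails while curl succeeds, one possible explanation is that curl went through a proxy configured in the environment.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


if __name__ == "__main__":
    # Placeholder address; use the cluster endpoint IP here.
    print(can_connect("192.0.2.10", 443))
```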

this leaves me puzzled :slight_smile:

Thomas

First, the "-v" and "--debug" options must come before the IP/hostname of the queried system.
Your agent is not running with extra output at the moment.
If the options are set correctly, you should see something like "parsed arguments: …".

Hi Andreas
the result is the same whether i put

./agent_kubernetes '-v' '--debug' '--token' 'xxx' '--infos' 'nodes,services,deployments,pods,daemon_sets,stateful_sets' '--port' '443' '--no-cert-check' 'host'

or

./agent_kubernetes '--token' 'xxx' '--infos' 'nodes,services,deployments,pods,daemon_sets,stateful_sets' '--port' '443' '--no-cert-check' '-v' '--debug' 'host'

In WATO -> Diagnostic, the agent reports: API Error: Your request timed out after 110 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

I am sorry… i feel a little dumb now… am i missing something?

Thomas

OK, news on this:
We examined the special agent (~/local/share/check_mk/agents/special/agent_kubernetes) and it seems there is no option for passing a proxy as an argument.
Is this correct?
We modified the special agent to read the system's proxy settings. The proxy options in the global settings do not work.
Old (line 1416ff):

def get_api_client(arguments):
    # type: (argparse.Namespace) -> client.ApiClient
    logging.info('Constructing API client')

    config = client.Configuration()

    if arguments.url_prefix:
        config.host = '%s:%s%s' % (arguments.url_prefix.rstrip("/"), arguments.port,
                                   arguments.path_prefix)

New (line 1416ff):

def get_api_client(arguments):
    # type: (argparse.Namespace) -> client.ApiClient
    logging.info('Constructing API client')

    config = client.Configuration()

    proxy_url = os.getenv('http_proxy', None)
    logging.info("Setting proxy: {}".format(proxy_url))
    config.proxy = proxy_url

    if arguments.url_prefix:
        config.host = '%s:%s%s' % (arguments.url_prefix.rstrip("/"), arguments.port,
                                   arguments.path_prefix)

Now the host is discovered fine and the check from the command line also works (i.e. cmk -D -v).


But in the web interface, Check_MK reports a timeout and all discovered services remain on PEND.
Dynamic configuration is set up according to "Dynamic host configuration - Managing dynamic infrastructures".

I think the problem is that the environment in which the "Check_MK" service runs does not see the proxy environment variable that your user had on the command line.
The change may work in your environment, but it will fail in all other environments where a proxy is set but Kubernetes is only reachable directly.

Better would be to extend the agent with the proxy argument and that’s it.
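
As a rough sketch of what such an argument could look like: the --proxy flag below is hypothetical (the stock agent discussed here does not have it), and a SimpleNamespace stands in for client.Configuration() so the example runs without the kubernetes package. In the agent itself, the value would be assigned to config.proxy in the same place where the thread's patch does it.

```python
import argparse
from types import SimpleNamespace


def parse_arguments(argv):
    """Parse a reduced argument set with a hypothetical --proxy option."""
    parser = argparse.ArgumentParser(description="proxy argument sketch")
    # Hypothetical option; not part of the agent as shipped.
    parser.add_argument("--proxy", default=None,
                        help="proxy URL, e.g. http://proxy.example:3128")
    parser.add_argument("host", help="cluster endpoint")
    return parser.parse_args(argv)


def apply_proxy(config, arguments):
    """Set config.proxy only when --proxy was given explicitly."""
    if arguments.proxy:
        config.proxy = arguments.proxy
    return config


if __name__ == "__main__":
    # SimpleNamespace stands in for kubernetes.client.Configuration() here.
    config = SimpleNamespace(proxy=None)
    args = parse_arguments(["--proxy", "http://proxy.example:3128", "192.0.2.10"])
    print(apply_proxy(config, args).proxy)
```

Because the proxy is only set when the option is passed, hosts whose cluster is reachable directly are not affected by a proxy that happens to be in the environment.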

Hi Andreas

> Better would be to extend the agent with the proxy argument and that’s it.

Yes, this is exactly what we did by modifying the agent itself.
The command line output you see above was produced as the site user.

BR Thomas

That is not exactly the same as being executed by the monitoring core. Is this the Nagios or the CMC core?

Hi Andreas
It's CMC.

:slight_smile:

I think the CMC does not see your environment variable for the proxy. That is why I said it would be good if this could be done with a normal argument for the special agent.

Your change only reads os.getenv('http_proxy', None), and I think this is empty for the special agent.
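
To illustrate the point, here is a small sketch (Python 3) that starts the same child process twice, once with http_proxy in its environment and once without. It only demonstrates that os.getenv depends on the environment of whoever starts the process; it makes no claim about how the CMC actually builds the environment for special agents. The proxy URL and the function name are placeholders.

```python
import os
import subprocess
import sys

# The child simply prints what os.getenv returns for http_proxy.
CHILD_CODE = "import os; print(os.getenv('http_proxy'))"


def proxy_seen_by_child(env):
    """Run a child Python process with the given environment and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Environment without any http_proxy entry, and one with a placeholder proxy.
    base = {k: v for k, v in os.environ.items() if k.lower() != "http_proxy"}
    with_proxy = dict(base, http_proxy="http://proxy.example:3128")
    print(proxy_seen_by_child(with_proxy))
    print(proxy_seen_by_child(base))
```

The first call reports the proxy URL and the second reports None, which matches the difference between an interactive shell and a process started without that variable.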

Ah you’re right.
replacing it with the proxy URL works.
Erm… is there a way to modify the agent to accept the proxy as argument?

BR Thomas

Hi Andreas et al…
Today we upgraded to 1.6p14CEE and modified the new k8s_extensions package to have an extra argument for the proxy:
[screenshot]

Since the package itself is modified, i think it makes more sense to integrate this in the next release rather than publish it via exchange, doesn’t it?

BR Thomas