Check issues after dist upgrade to Ubuntu 20.04 (python3 SSL certificate validation)

We’ve been running CMK 2.0.0p3 on Ubuntu 16.04 for a few weeks.
Yesterday I upgraded it in place (via do-release-upgrade) to 18.04 LTS and then to 20.04 LTS.
As recommended in the CMK 2.0 upgrade guide, we kept the /opt/omd directory renamed during the upgrade to prevent any issues caused by packages being uninstalled and reinstalled.

Everything went fine, except for a few issues:

  1. check_http now fails for old hosts that only support TLSv1.0, not TLSv1.2. Lowering the OpenSSL requirements for remote endpoints in openssl.cnf didn’t help, but only 2 hosts are affected, so this is not the main issue.

  2. The Graylog special agent no longer works:

```
Error: HTTPSConnectionPool(host='monitoring.host', port=443): Max retries exceeded with url: /api/system/cluster/stats (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)')))
```

This is a significant issue: all hosts that have piggyback services from the Graylog agent now have their Checkmk service in yellow, and although we’ve acknowledged them all, we’ll be blind to any further sections going missing.

  3. In a very similar vein, we’re seeing the same kind of issue with the third-party plugin
    https://www.claudiokuenzler.com/monitoring-plugins/check_esxi_hardware.php
    Here the call stack looks like this:
```
Traceback (most recent call last):
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/util/ssl_.py", line 453, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/util/ssl_.py", line 495, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock)
  File "/omd/sites/INSTANCE/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/omd/sites/INSTANCE/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/omd/sites/INSTANCE/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1125)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/INSTANCE/local/lib/python3/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/connectionpool.py", line 783, in urlopen
    return self.urlopen(
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/connectionpool.py", line 783, in urlopen
    return self.urlopen(
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/omd/sites/INSTANCE/local/lib/python3/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.30.11.11', port=5989): Max retries exceeded with url: /cimom (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1125)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/INSTANCE/local/lib/python3/pywbem/_cim_http.py", line 315, in wbem_request
    resp = conn.session.post(
  File "/omd/sites/INSTANCE/local/lib/python3/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/omd/sites/INSTANCE/local/lib/python3/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/omd/sites/INSTANCE/local/lib/python3/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/omd/sites/INSTANCE/local/lib/python3/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='172.30.11.11', port=5989): Max retries exceeded with url: /cimom (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1125)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./check_esxi_hardware.py", line 720, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/omd/sites/INSTANCE/local/lib/python3/pywbem/_cim_operations.py", line 2775, in EnumerateInstances
    result = self._imethodcall(
  File "/omd/sites/INSTANCE/local/lib/python3/pywbem/_cim_operations.py", line 1823, in _imethodcall
    reply_data, self._last_server_response_time = wbem_request(
  File "/omd/sites/INSTANCE/local/lib/python3/pywbem/_cim_http.py", line 320, in wbem_request
    raise ConnectionError(msg, conn_id=conn.conn_id)
pywbem._exceptions.ConnectionError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1125); OpenSSL version used: OpenSSL 1.1.1k  25 Mar 2021
```

I thought I might need the Global settings rule “Trusted certificate authorities for SSL”, only to find that at least the Graylog SSL root certificates (both internal CAs) had already been added there in the past.
I have also verified that they are included in ~/var/ssl/ca-certificates.crt.
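To see which CA locations the site’s Python falls back to when a script passes no explicit bundle, you can ask the `ssl` module directly. This is a minimal, standard-library-only sketch; it assumes it runs as the site user so that `~` expands to the site directory. Note that `requests` normally uses the bundle from its `certifi` dependency (or the file named in the `REQUESTS_CA_BUNDLE` environment variable) rather than any of these, so treat the output as a starting point for diagnosis, not the full picture.

```python
import os
import ssl
from pathlib import Path

# Site-level bundle that Checkmk builds from "Trusted certificate authorities for SSL".
# Assumption: run as the site user, so "~" expands to /omd/sites/<site>.
SITE_BUNDLE = Path("~/var/ssl/ca-certificates.crt").expanduser()


def describe_ca_sources(site_bundle: Path = SITE_BUNDLE) -> dict:
    """Collect the CA locations a Python TLS client might consult."""
    paths = ssl.get_default_verify_paths()
    return {
        # Compiled-in OpenSSL defaults (these files may not exist on disk).
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Resolved defaults, or None if the file/directory does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Environment overrides: OpenSSL's (usually SSL_CERT_FILE) and requests'.
        "cafile_env_override": os.environ.get(paths.openssl_cafile_env),
        "REQUESTS_CA_BUNDLE": os.environ.get("REQUESTS_CA_BUNDLE"),
        "site_bundle": str(site_bundle),
        "site_bundle_exists": site_bundle.is_file(),
    }


if __name__ == "__main__":
    for key, value in describe_ca_sources().items():
        print(f"{key:20} {value}")
```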

What am I doing wrong? Is there more to do after such an upgrade? And why does urllib3 check the validity of the certificate at all, when agent_graylog.py actively tries to disable the check with the following line?

```python
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
```

I think the cert check is not properly disabled inside the special agent.
The line urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning) only suppresses the warning; it does not disable certificate verification.
To actually skip verification you also need to pass “verify=False” in the request options.
The ESXi hardware check could have the same problem.
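The difference between silencing the warning and turning off verification can be shown with the standard library alone. `urllib3.disable_warnings()` essentially installs an "ignore" filter in the `warnings` module, whereas verification is a property of the TLS context. In this minimal sketch the stdlib context stands in for what `verify=False` achieves in `requests`; it is an illustration, not the code path `requests` takes internally.

```python
import ssl
import warnings


def make_context(verify: bool = True) -> ssl.SSLContext:
    """Return a client TLS context, optionally with verification disabled."""
    ctx = ssl.create_default_context()
    if not verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


# Suppressing warnings (roughly what urllib3.disable_warnings() does) ...
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    quiet_ctx = make_context()

# ... leaves verification fully enabled:
print(quiet_ctx.verify_mode == ssl.CERT_REQUIRED, quiet_ctx.check_hostname)  # True True

# Only an explicit opt-out changes the behavior, comparable to
# requests.get(url, auth=(user, password), verify=False):
insecure_ctx = make_context(verify=False)
print(insecure_ctx.verify_mode == ssl.CERT_NONE, insecure_ctx.check_hostname)  # True False
```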

Try adding these root CA certificates to the system certificate store as well.
If that is configured correctly, a test with “openssl s_client -connect host:443” should also complete without error messages. This test must also succeed for your hosts that still need the older TLS versions.
Your own scripts and special agents don’t use ~/var/ssl/ca-certificates.crt; that file is mostly used for Checkmk’s internal functions.
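A rough Python counterpart to the `openssl s_client -connect host:443` test is sketched below. It lets you point at an explicit CA file (for example a file holding just the internal root CAs) instead of the default store. The function names and the `legacy_tls` switch are hypothetical; lowering the OpenSSL security level through the cipher string is an assumption about what an OpenSSL 1.1.1 build needs in order to talk to TLSv1.0-only hosts, and it may be unnecessary with some builds.

```python
import socket
import ssl
from typing import Optional


def build_context(cafile: Optional[str] = None, legacy_tls: bool = False) -> ssl.SSLContext:
    """Client context that trusts only `cafile` if given, else the default store."""
    ctx = ssl.create_default_context(cafile=cafile)
    if legacy_tls:
        # Assumption: the OpenSSL build still supports TLSv1.0 but refuses it
        # at security level 2, so allow TLSv1.0 and drop to level 1.
        ctx.minimum_version = ssl.TLSVersion.TLSv1
        ctx.set_ciphers("DEFAULT@SECLEVEL=1")
    return ctx


def check_tls(host: str, port: int = 443, cafile: Optional[str] = None,
              legacy_tls: bool = False, timeout: float = 5.0) -> str:
    """Handshake with host:port and return the negotiated protocol version.

    Raises ssl.SSLCertVerificationError if the chain cannot be verified,
    i.e. the same CERTIFICATE_VERIFY_FAILED condition as in the errors above.
    """
    ctx = build_context(cafile, legacy_tls)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


if __name__ == "__main__":
    import sys

    target = sys.argv[1]
    bundle = sys.argv[2] if len(sys.argv) > 2 else None
    print(check_tls(target, cafile=bundle))
```

Called with a hostname and an optional CA file (both placeholders), it should print a version string such as `TLSv1.2` once the chain verifies, and raise the verification error otherwise.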

  1. Re agent_graylog.py: you are indeed correct. I had apparently modified the built-in agent_graylog.py, and reinstalling the same CEE DEB for focal overwrote that change:
```diff
--- /omd/versions/2.0.0p3.cee/lib/check_mk/special_agents/agent_graylog.py      2021-04-17 13:59:34.000000000 +0200
+++ /omd/versions/PW2.0.0p3.cee/lib/check_mk/special_agents/agent_graylog.py    2021-05-02 21:31:16.781934033 +0200
@@ -145,7 +145,7 @@
 
 def handle_response(url, args):
     try:
-        response = requests.get(url, auth=(args.user, args.password))
+        response = requests.get(url, auth=(args.user, args.password),verify=False)
     except requests.exceptions.RequestException as e:
         sys.stderr.write("Error: %s\n" % e)
         if args.debug:
```

→ I re-applied the diff, and the check is working again. The agent apparently lacks a WATO option to ignore certificate issues, like the one other checks and special agents offer.
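Such an option could look roughly like the sketch below: a command-line switch that the special agent translates into the `verify` argument of `requests.get()`, which accepts either a boolean or a path to a CA bundle. The flags `--no-cert-check` and `--ca-bundle` and the helper names are hypothetical, not options agent_graylog.py actually has. `requests` is imported inside `fetch()` only so that the argument handling can be exercised without it.

```python
import argparse
import sys
from typing import Optional, Sequence, Union


def parse_arguments(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Graylog special agent (sketch)")
    parser.add_argument("--user", required=True)
    parser.add_argument("--password", required=True)
    # Hypothetical options; a WATO rule would emit one of them when configured.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--no-cert-check", action="store_true",
                       help="Do not verify the server certificate")
    group.add_argument("--ca-bundle", metavar="PATH",
                       help="Verify against this CA bundle instead of the default")
    return parser.parse_args(argv)


def verify_setting(args: argparse.Namespace) -> Union[bool, str]:
    """Map the options onto the `verify` argument of requests."""
    if args.no_cert_check:
        return False
    if args.ca_bundle:
        return args.ca_bundle
    return True


def fetch(url: str, args: argparse.Namespace):
    import requests  # imported here so parsing can be tested without requests

    try:
        return requests.get(url, auth=(args.user, args.password),
                            verify=verify_setting(args), timeout=30)
    except requests.exceptions.RequestException as e:
        sys.stderr.write("Error: %s\n" % e)
        return None
```

Keeping verification on by default and making the opt-out explicit means a forgotten option fails loudly instead of silently trusting any certificate.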

  2. This reminded me, however: I was unable to make those changes stick by copying the file into the ~/local/ site hierarchy, as one usually does. Checkmk would not use the “local” version, only the built-in one. The same happened with agent_vsphere, actually. (In that case, though, I did remember making a change and therefore redid it after the update to Ubuntu 20.04.)
    Do you know of a way to override the built-in special agents with local ones?

  3. When I hover over the Trusted roots setting in Global settings, the help text does in fact include “or when special agents communicate via HTTPS”. But that’s clearly incorrect (or agent_graylog is not using the correct function or invocation?), since I had the certificates included in the setting and still needed to add “verify=False”.

The problem is that in CMK 2.0 every special agent consists of two parts. The first part lives in the folder “~/share/check_mk/agents/special/”. This file is only a “dummy” that loads the real agent from “~/lib/check_mk/special_agents/”. If you want to make a durable change, you first need to copy the dummy file to the local structure and have it load your modified real agent from somewhere inside the local folder structure.
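A minimal sketch of such a local wrapper, assuming the modified agent was saved as `~/local/lib/check_mk/special_agents/agent_graylog.py` and exposes a `main()` function that returns an exit code. Both the location and the `main()` entry point are assumptions; check the built-in dummy in `~/share/check_mk/agents/special/` for the entry point it actually calls. The file is loaded by path with `importlib`, so it does not need to be on the Python path.

```python
#!/usr/bin/env python3
"""Local wrapper, e.g. ~/local/share/check_mk/agents/special/agent_graylog (sketch)."""
import importlib.util
import sys
from pathlib import Path
from types import ModuleType

# Assumed location of the modified agent inside the site's local hierarchy.
AGENT_FILE = Path("~/local/lib/check_mk/special_agents/agent_graylog.py").expanduser()


def load_agent(path: Path, name: str = "local_agent_graylog") -> ModuleType:
    """Import a Python file by its path and return the module object."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before executing, as the importlib docs suggest
    spec.loader.exec_module(module)
    return module


if __name__ == "__main__":
    agent = load_agent(AGENT_FILE)
    sys.exit(agent.main())  # assumption: main() reads sys.argv and returns an exit code
```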


I don’t really understand the point of having a file in the first folder that simply invokes, without any functional changes, another file in a different hierarchy altogether.
Also, given that the Python root is ~/lib, I’m wondering how I could create a Python file in the local structure that imports a file located in ~/local/lib.

Given that these wrapper scripts are functional no-ops anyway, the following incredibly backwards solution seems to work for me:

```
cp ~/lib/check_mk/special_agents/agent_graylog.py ~/local/share/check_mk/agents/special/agent_graylog
```
(plus a chmod +x, because the former folder contains non-executable Python scripts, while the latter expects executables only)
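The same copy-and-chmod workaround can be sketched in Python. This version additionally reports whether the copied module starts with a `#!` line, since without one the file cannot be run directly as a program. The helper name is hypothetical; the paths are the ones from the post, resolved for the site user.

```python
import os
import shutil
import stat
from pathlib import Path


def install_local_agent(src: Path, dst: Path) -> bool:
    """Copy a special agent module to an executable local path.

    Returns True if the copied file starts with a shebang line.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copy contents and metadata, like `cp -p`
    # Add execute permission for user, group and others (roughly `chmod +x`).
    mode = dst.stat().st_mode
    os.chmod(dst, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    with dst.open("rb") as fh:
        return fh.read(2) == b"#!"


if __name__ == "__main__":
    source = Path("~/lib/check_mk/special_agents/agent_graylog.py").expanduser()
    target = Path("~/local/share/check_mk/agents/special/agent_graylog").expanduser()
    if not install_local_agent(source, target):
        print(f"warning: {target} has no shebang line and may not run as a program")
```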

Yes, your solution works the way you described. As for why two different folders are involved: it is for compatibility with older special agents 🙂
If that were no longer needed, the special agent would live only inside “~/lib/…”.

If your Python file is inside the local structure, you can do a relative import from a file that is also inside the local folders. The same works, for instance, for your own check plugins that share the same helper functions.
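When a script is run directly, Python puts the script’s own directory first on `sys.path`, so a helper module placed next to it in the local structure can be imported by its plain name. For code that does not live in the same directory as the helpers, a minimal sketch is to prepend the local directory explicitly. `LOCAL_HELPERS` and the helper module name are hypothetical placeholders.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType

# Hypothetical directory for shared helper modules in the site's local tree.
LOCAL_HELPERS = Path("~/local/lib/my_helpers").expanduser()


def import_from(directory: Path, module_name: str) -> ModuleType:
    """Import `module_name`, looking in `directory` before the rest of sys.path."""
    path = str(directory)
    if path not in sys.path:
        sys.path.insert(0, path)
    # Drop stale directory listings in case the file was created just now.
    importlib.invalidate_caches()
    # Note: a module that is already imported is returned from the cache.
    return importlib.import_module(module_name)
```

For example, `helpers = import_from(LOCAL_HELPERS, "graylog_helpers")` would make the hypothetical `graylog_helpers.py` from that folder available to both the wrapper and your own plugins.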
