Massive Livestatus errors/timeouts after updating Ubuntu 18.04 to the latest version

Hi list,
after installing the latest updates on my Ubuntu VMs today, I am getting massive Livestatus timeout errors.
At first I thought the Checkmk update from p16 to p17 might be the cause, but that does not seem to be the case.
I run a distributed setup of 4 VMs, all on Ubuntu 18.04 LTS. Since today's update to 4.15.0-115-generic #116-Ubuntu SMP Wed Aug 26 14:04:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux (that is what uname -a says), the two biggest slaves run into a lot of Livestatus timeouts. The livestatus.log on the master is full of entries like
[cmk.liveproxyd.(14909).Site(<slavename>).Client(17)] Cannot send error message to client: [Errno 9] Bad file descriptor
and often also
[Livestatus error: ('_ssl.c:711: The handshake operation timed out',).]
or the similar
[Livestatus error: ('_ssl.c:711: The handshake operation timed out',). The encryption settings are probably wrong."]
Also, when I log on to the slaves directly, I get empty dashboards with
Cannot connect to 'unix:/omd/sites/INFMON01_2/tmp/run/live': [Errno 11] Resource temporarily unavailable
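
To check the local socket without going through the GUI, something like the following Python sketch could be used. It sends a single Livestatus query to the UNIX socket and measures how long the answer takes. The function name `query_livestatus`, the default query, and the 5 second timeout are my own assumptions for illustration, not anything taken from Checkmk itself.

```python
import socket
import time


def query_livestatus(socket_path,
                     query="GET status\nColumns: program_version\n",
                     timeout=5.0):
    """Send one Livestatus query and return (reply_text, elapsed_seconds).

    Hypothetical helper: raises OSError (including socket.timeout) when the
    socket cannot be reached or does not answer within `timeout` seconds.
    """
    start = time.monotonic()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        # Closing the write side signals that the query is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection, reply is complete
                break
            chunks.append(data)
    elapsed = time.monotonic() - start
    return b"".join(chunks).decode("utf-8"), elapsed
```

Called as `query_livestatus("/omd/sites/INFMON01_2/tmp/run/live")` as the site user, a healthy site should answer quickly, while an overloaded one would be expected to raise an `OSError` such as the Errno 11 shown above or a timeout.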
After fiddling around with downgrading to p16 and updating to p17 again, I ended up disabling TLS encryption for the connections to the slaves, changing the Livestatus proxy settings a little, and enabling the setting “Use persistent connection” in the slaves’ connection setup.
I still get a lot of these on the master:

2020-09-02 10:24:05,932 [40] [cmk.liveproxyd.(544).Site(<sitename>).Client(22)] Cannot send error message to client: [Errno 32] Broken pipe
2020-09-02 10:24:37,675 [40] [cmk.liveproxyd.(544).Site(<sitename>).Client(27)] Cannot send error message to client: [Errno 32] Broken pipe
2020-09-02 10:24:37,786 [40] [cmk.liveproxyd.(544).Site(<sitename>).Thread(Thread-12).Channel(12)] Channel failed
Traceback (most recent call last):
  File "/omd/sites/INFMON01/lib/python/cmk/cee/liveproxy/Channel.py", line 174, in _execute
    answer = self._get_livestatus_response()
  File "/omd/sites/INFMON01/lib/python/cmk/cee/liveproxy/Channel.py", line 337, in _get_livestatus_response
    header = self._receive_data(16, self._site.query_timeout())
  File "/omd/sites/INFMON01/lib/python/cmk/cee/liveproxy/Channel.py", line 385, in _receive_data
    raise Exception("Remote Site Query timeout")
Exception: Remote Site Query timeout
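
To see which site produces most of these messages, the log lines can be tallied per site and message. The following is a minimal Python sketch that assumes every entry has the same shape as the lines quoted above (timestamp, level in brackets, logger name containing `Site(...)`, then the message). `LINE_RE` and `tally_liveproxyd_errors` are names I made up for this example.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
# 2020-09-02 10:24:05,932 [40] [cmk.liveproxyd.(544).Site(<sitename>).Client(22)] message
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"\[\d+\] "
    r"\[cmk\.liveproxyd\.\(\d+\)\.Site\((?P<site>[^)]*)\)[^\]]*\] "
    r"(?P<msg>.*)$"
)


def tally_liveproxyd_errors(lines):
    """Count log entries per (site, message); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # traceback lines and blank lines do not match
            counts[(match.group("site"), match.group("msg"))] += 1
    return counts
```

Passing an open log file to `tally_liveproxyd_errors` and printing `counts.most_common(10)` gives a quick overview of which slave and which error dominate.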

BR Thomas

OK, something is wrong with one of the slave sites.
When I log on to the site directly after omd starts, everything is fine, but after a few minutes all services and hosts disappear.

BR

I downgraded the kernel to 4.15.0-111, and this seems to calm the problems down a little.

BR
