Distributed monitoring 2.1 to 2.2 upgrade

CMK version: 2.2.0p23 enterprise
OS version: RHEL 8.9

We are currently using distributed monitoring with Checkmk version 2.1.0p41.
I am trying to upgrade the sites one by one to 2.2.0p23, starting with the remote servers.
However, after a successful upgrade (no errors), I cannot activate changes from the master (central) site to the remote site. No errors are generated; it just waits in "Activating" forever (I waited at most one hour before reverting to the old version).

I tried checking all the logs but couldn't find a meaningful error. Any suggestions for troubleshooting?

Thanks in advance.

Which edition are you using?

2.2.0p23.cee
enterprise edition

Thanks, I will check tomorrow, I have an idea which change could have introduced a regression.


While the activation is stuck in "Activating", the only thing I see is the following entries being logged continuously in the remote site's Checkmk apache/access-log:

10.10.118.252 - - [09/Apr/2024:19:46:10 +0200] "POST /slavesite1/check_mk/automation.py?command=checkmk-remote-automation-get-status HTTP/1.1" 200 513 "-" "python-requests/2.31.0"
10.10.118.252 - - [09/Apr/2024:19:46:11 +0200] "POST /slavesite1/check_mk/automation.py?command=checkmk-remote-automation-get-status HTTP/1.1" 200 512 "-" "python-requests/2.31.0"
10.10.118.252 - - [09/Apr/2024:19:46:11 +0200] "POST /slavesite1/check_mk/automation.py?command=checkmk-remote-automation-get-status HTTP/1.1" 200 513 "-" "python-requests/2.31.0"
10.10.118.252 - - [09/Apr/2024:19:46:12 +0200] "POST /slavesite1/check_mk/automation.py?command=checkmk-remote-automation-get-status HTTP/1.1" 200 513 "-" "python-requests/2.31.0"
10.10.118.252 - - [09/Apr/2024:19:46:12 +0200] "POST /slavesite1/check_mk/automation.py?command=checkmk-remote-automation-get-status HTTP/1.1" 200 513 "-" "python-requests/2.31.0"
10.10.118.252 - - [09/Apr/2024:19:46:13 +0200] "POST /slavesite1/check_mk/automation.py?command=checkmk-remote-automation-get-status HTTP/1.1" 200 513 "-" "python-requests/2.31.0"
10.10.118.252 - - [09/Apr/2024:19:46:13 +0200] "POST /slavesite1/check_mk/automation.py?command=checkmk-remote-automation-get-status HTTP/1.1" 200 513 "-" "python-requests/2.31.0"
10.10.118.252 - - [09/Apr/2024:19:46:14 +0200] "POST /slavesite1/check_mk/automation.py?command=checkmk-remote-automation-get-status HTTP/1.1" 200 513 "-" "python-requests/2.31.0"

Thanks. If you have a specific 2.2 patch level in mind that I should try, please feel free to share.

I tried 2.2 p14 / p17 / p20 / p23 / p24 and ended up with the same result.

First finding: this does not happen with the Raw Edition, so it must have to do with the changes to the licensing system that were introduced around 2.2.0p13.

I noticed something similar: the license was not visible after the upgrade. Even if I reinstall the license after the upgrade, it does not work. Maybe we could try a version lower than 2.2.0p13 and wait for a future fix?

I cannot reproduce the problem with a fresh 2.1.0p41 CEE setup with two remote sites. No problems updating the remote sites to either 2.2.0p16 or p24.

Could you test with 2.2.0p16? It was actually in 2.2.0p17 that changes were introduced in the licensing component, targeted at the 2.2 → 2.3 update.

EDIT: You already tried 2.2.0p14… I will ask the devs what information they need to dig further.

After upgrading to 2.2.0p12, the activation status shows:

() got an unexpected keyword argument 'license_state'
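This is a Python `TypeError`, which typically appears when code from one release calls a function from another release whose signature does not yet (or no longer) accept a given keyword argument. The thread does not say which side of the mixed 2.1/2.2 setup raises it. The sketch below only illustrates the general mechanism; both function names are hypothetical stand-ins, not Checkmk's actual code.

```python
# Hypothetical stand-ins: neither function name is taken from Checkmk's source.


def get_status_old(site_id):
    """Callee as shipped in an older release: it predates `license_state`."""
    return {"site": site_id, "state": "ok"}


def call_from_newer_release(func, site_id):
    """Caller as shipped in a newer release: it always passes the new keyword."""
    return func(site_id, license_state="licensed")


error_message = ""
try:
    call_from_newer_release(get_status_old, "slavesite1")
except TypeError as exc:
    # The interpreter names the callee and the rejected keyword argument.
    error_message = str(exc)

print(error_message)
```

This kind of signature mismatch cannot be fixed by configuration on either side, which is consistent with the resolution reported next: bringing all sites to the same version.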

Updating all sites at once solved the issue.

That would have been my next suggestion:

The preferred, safest procedure is to update in one go, in which you perform the following steps:

  1. First, stop all sites

  2. Then perform the update for all sites

  3. Restart the updated sites
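The key property of the one-go procedure is ordering: no site is updated while another is still running, and no site is started before every site has been updated. A minimal dry-run sketch of that ordering follows. It assumes the OMD commands `omd stop`, `omd update` and `omd start` are run as root with the site name as argument (check `omd help` on your system), and it only prints commands; for remote sites on other hosts, each command would have to be run on the host that carries the site. The site names are hypothetical, apart from `slavesite1` from the log above.

```python
def one_go_update_commands(sites):
    """Return OMD commands for a one-go update: stop all, update all, start all."""
    commands = []
    # Each phase completes for every site before the next phase begins.
    for phase in ("stop", "update", "start"):
        commands.extend(f"omd {phase} {site}" for site in sites)
    return commands


# Hypothetical site list; this is a dry run and nothing is executed.
SITES = ["central", "slavesite1"]
for command in one_go_update_commands(SITES):
    print(command)
```

Generating the plan instead of executing it keeps the sketch safe to run and makes the phase ordering easy to review before anything is touched.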

If this is not possible (for example, because the environment is distributed across sites in different time zones with different supporting teams), a temporary mixed operation can be implemented under strict conditions: for major updates, the versions may differ by no more than one version, and this always assumes a specific patch level for the current (existing) version.

Taken from: Updates and Upgrades
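The mixed-operation precondition quoted above can be expressed as a simple version check. This is a sketch under stated assumptions: it parses version strings in the form used in this thread (for example `2.1.0p41` or `2.2.0p23.cee`), and because the quoted text does not name the required patch level, `required_patch` is a hypothetical parameter you would take from the release notes. Passing this check is only the documented precondition; as this thread shows, it does not guarantee that a particular mixed pair works.

```python
import re

# Release number plus optional patch level; a suffix such as ".cee" is ignored.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:p(\d+))?")


def parse_version(text):
    """Split e.g. '2.2.0p23.cee' into ((2, 2, 0), 23)."""
    match = VERSION_RE.match(text)
    if not match:
        raise ValueError(f"unrecognized Checkmk version: {text!r}")
    major, minor, micro, patch = match.groups()
    return (int(major), int(minor), int(micro)), int(patch or 0)


def mixed_operation_allowed(current, target, required_patch):
    """Check the quoted precondition; `required_patch` is a hypothetical input."""
    current_release, current_patch = parse_version(current)
    target_release, _ = parse_version(target)
    # At most one major version apart, e.g. 2.1 -> 2.2 but not 2.0 -> 2.2.
    one_step = (
        current_release[0] == target_release[0]
        and 0 <= target_release[1] - current_release[1] <= 1
    )
    return one_step and current_patch >= required_patch
```

For example, `mixed_operation_allowed("2.1.0p41", "2.2.0p23.cee", required_patch=41)` returns `True`, while a jump from 2.0 to 2.2 returns `False` regardless of patch level.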

Thanks Mattias, all good now
