Error message: Timeout during configuration updates in web interface
Output of “cmk-update-config --site-may-run”: The command runs successfully on an upgraded slave site, but the same update does not complete through the web interface. Timing the command with time shows it taking 6.5 minutes, much longer than the 110-second timeout. Is this normal? Can configuration updates not be pushed to slaves during the upgrades? The step that takes a very long time is “Create precompiled host and folder files”, running on the slave site.
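To make the comparison against the timeout repeatable, the timing can be wrapped in a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the 110-second value is simply the figure quoted in the post above, not something read from Checkmk itself.

```python
import subprocess
import sys
import time

# Timeout figure quoted in the post above; treated here as an assumption.
GUI_TIMEOUT_SECONDS = 110.0


def time_command(cmd: list[str]) -> tuple[int, float]:
    """Run cmd, discard its output, and return (exit code, elapsed seconds)."""
    start = time.monotonic()
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode, time.monotonic() - start


def exceeds_timeout(elapsed: float, timeout: float = GUI_TIMEOUT_SECONDS) -> bool:
    """Return True if a run of this length would not fit within the timeout."""
    return elapsed > timeout
```

Run as the site user, `time_command(["cmk-update-config", "--site-may-run"])` would return the exit code and the runtime, which can then be passed to `exceeds_timeout`. A runtime of 6.5 minutes is 390 seconds, well over the limit.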
Can you please explain in more detail which steps you executed, and in what order?
The current 2.3 Checkmk release is 2.3.0p42. Why did you update to p12?
I was under the impression I needed to update to the same “p” version. It sounds like that is incorrect, which is great.
The steps taken for testing were to update a remote site and test a config change from the master through the web interface. We have quite a few remote sites and were trying to see how the main site would operate while we worked through the upgrade. The issue is that new configurations take place somewhat regularly and there was concern about remote sites being unable to receive configuration updates during the upgrade. It sounds like we need to pause those until all sites are updated.
The correct upgrade path is 2.2x → 2.2 latest → 2.3 latest → 2.4 latest.
I would upgrade all satellites first and the main site last in one step, one major version after the other.
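As a rough illustration of that ordering, the sketch below expands the path into individual update steps. It is one reading of the advice above (every site moves to the latest patch of one major version, satellites first and the main site last, before anyone starts the next major version). The function name and the site names in the usage note are hypothetical.

```python
# Major versions named in this thread, oldest first.
MAJORS = ["2.2", "2.3", "2.4"]


def upgrade_plan(
    current_major: str,
    target_major: str,
    remote_sites: list[str],
    central_site: str,
) -> list[tuple[str, str]]:
    """Return ordered (site, target release) steps for a staged upgrade."""
    start = MAJORS.index(current_major)
    end = MAJORS.index(target_major)
    if end < start:
        raise ValueError("downgrades are not covered by this sketch")

    steps = []
    # The first pass brings everything to the latest patch of the major
    # version already installed; later passes move up one major at a time.
    for major in MAJORS[start:end + 1]:
        # Satellites first, main site last, within each major version.
        for site in remote_sites + [central_site]:
            steps.append((site, f"{major} latest"))
    return steps
```

For example, `upgrade_plan("2.2", "2.3", ["remote1", "remote2"], "central")` lists both remote sites and then the central site for “2.2 latest”, followed by the same order for “2.3 latest”.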
I have seen several occasions where config changes did not work during updates, so I would freeze the environment during major update steps.
Depending on size and requirements, other approaches may be necessary and possible, but these can involve more preparation and testing.
Thank you for clarifying. I must have missed the upgrade to the latest 2.2 as the first step. I will go with a freeze for the period of the update. The last question I still have is about the “cmk-update-config --site-may-run” command taking such a long time (6.5+ minutes) to compile the list of hosts and folders. Is that expected?
We are currently evaluating an upgrade from 2.2.0p39 to the latest 2.3.0 patch release.
At least during past upgrades we never had to update to the latest patch version before upgrading to the next major version, and we have done this since 1.2. With ~280 remote sites it would become a long journey.
We never had major issues running remote sites one major version above the master. It's fully supported, and without it we would not be able to upgrade, because the update phase lasts several weeks.
In general we have a test, preprod and prod environment.
To test, we take a backup of a remote site from the test environment and restore it on an isolated VM. While updating this isolated site we already see any issues that need correcting. We also run cmk-update-config -vvv --debug to identify any issues during mixed-version operation. The timing of cmk-update-config is very important, because during mixed operation the runtime of cmk-update-config is added to the activate changes process.
The step that appears to take the longest is “Create precompiled host and folder files”, as stated above. Even with -vvv and --debug, the output on the CLI just shows the step in progress, without any additional output while it runs.
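One way to put numbers on which step is slow is to timestamp each output line and measure the gap until the next line appears. The following is a sketch with hypothetical helper names; it assumes the command prints each step as it starts and does not buffer its output when writing to a pipe, in which case the gaps would be misleading.

```python
import subprocess
import sys
import time


def time_output_lines(cmd: list[str]) -> list[tuple[float, str]]:
    """Run cmd and return (seconds until the next line, line) pairs.

    The duration attached to a line is the time between that line arriving
    and the next line (or process exit), i.e. roughly how long the step
    announced by that line took.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    stamped = []
    for line in proc.stdout:
        stamped.append((time.monotonic(), line.rstrip("\n")))
    proc.wait()
    end = time.monotonic()

    # Append the exit time so the last line also gets a duration.
    timestamps = [stamp for stamp, _ in stamped] + [end]
    return [
        (timestamps[i + 1] - timestamps[i], text)
        for i, (_, text) in enumerate(stamped)
    ]


def slowest(lines: list[tuple[float, str]], count: int = 3) -> list[tuple[float, str]]:
    """Return the count lines that were followed by the longest gaps."""
    return sorted(lines, key=lambda pair: pair[0], reverse=True)[:count]
```

Called with `["cmk-update-config", "-vvv", "--debug"]` as the site user, `slowest(time_output_lines(...))` should show the precompile step at the top if it really accounts for most of the runtime.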
This is from the checkmk documentation, so this is the “official” statement.
There were several occasions in the past where you had to be at least at a certain patch level to update without problems, so the latest will probably be safest, e.g. Werk 15693, where you had to be on at least p8.
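A minimum patch level like this can be checked in a script before starting an update. The sketch below is a simplified parser that only understands stable versions of the form 2.2.0 or 2.2.0p39 (no beta, innovation, or daily builds). The post above does not say which major release the p8 threshold belongs to, so the full version strings in the usage note are placeholders.

```python
import re

# Matches stable versions such as "2.2.0" or "2.2.0p39" only.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:p(\d+))?$")


def parse_version(version: str) -> tuple[int, int, int, int]:
    """Split a version string into (major, minor, sub, patch) integers."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, sub, patch = match.groups()
    # A version without a "p" suffix is treated as patch level 0.
    return int(major), int(minor), int(sub), int(patch or 0)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if installed is at or above the required minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

Comparing parsed integers matters here: as plain strings, "2.2.0p39" sorts before "2.2.0p8", while `meets_minimum("2.2.0p39", "2.2.0p8")` correctly returns True.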
I agree that this does not have to be enforced in every environment and patchlevel.
I have seen this in several environments and with different Checkmk versions: activate changes did not work during the upgrade phase while satellites were one level above the master. I have not tried it with 2.3 and above, though, so maybe things have gotten better.
What @aeckstine wrote is absolutely correct, and I fully support that approach. That said, in some environments you may have constraints that make it impossible to follow the “silver bullet” solution in every case.
In our setup, we also deliberately wait until a release has reached a reasonable level of maturity before we begin testing updates. At the moment, we are testing an upgrade from 2.2.0p39 to 2.3.0p42.
I have to admit that we have encountered issues during upgrades as well. However, these occurred during the testing phase in our test environment, where we were able to resolve them with the help of Checkmk’s excellent support team.
Regardless of which path you choose, I strongly recommend carefully reading all relevant Werks first to fully understand what has changed, and then performing the upgrade in an isolated test environment. Checkmk provides useful simulation features for offline testing of agent output, but in my experience you still need some real hosts available for proper testing — especially when validating agent updates or new agent plugins.
I will check the disk I/O during the next run, since that is one thing I didn’t look into. The machine itself is a VM with 8 cores and 16 GB RAM. One thing I noticed is that during the update, that process appears to use a single CPU. Is that configurable?
I am now working with hosts on 2.2.0p47 working towards the upgrade of the latest 2.3 release. I am seeing a similar issue where configuration updates cannot be pushed successfully from the master through the web interface once a remote site is updated. Should I start a new topic or continue in this thread?