Bug: activation often hangs since 2.3.0

CMK version: 2.3.0p1.cee
OS version: Ubuntu 22.04

Since updating to 2.3.0, the activation process often hangs on one or more sites. This is especially likely when all sites have updates to process, e.g. after updating an extension package. To note: we have more than 50 sites in total.

What happens is:

  • Activation starts
  • Progress for some gets stuck at “activating”, for others at “synchronizing”
  • Some sites get to “failed” with error messages such as [Error 9] bad file descriptor or [SSL: WRONG_VERSION_NUMBER]
  • After waiting for maybe 2 minutes an error message is shown at the top: Unknown activation process

Here’s a screenshot of all of those occurring simultaneously:

In such a case I have to reload the activation page & try activation again. It often takes three, sometimes even four tries until all sites are successfully activated.

Which sites get stuck in which state is not deterministic, neither is which one ends in a “failed” state, and if so, with which of the two aforementioned error messages.

The “failed” state with one of those error messages also happens seemingly randomly during activation jobs when only a handful of sites have to be updated. In that case I haven’t seen sites being stuck in “activating” or “synchronizing” yet.

None of this has happened prior to 2.3.0.

Hi,
We had the same problem that changes could not be activated. The problem for us was incorrect file permissions.

The file netstat.save is located in the web folder of the site. Checkmk no longer had access to this file because it suddenly belonged to the root user. A simple chown on netstat.save solved the problem.

In our case, the file was located under /omd/sites//var/check_mk/web

Thanks for the feedback. That’s interesting. Unfortunately it doesn’t apply to our situation. I’ve checked with the following shell snippet:

cd /opt/omd/sites
for site in * ; do
  # list everything below the site directory that is not owned by the site user
  find "$site" ! -user "$site"
done

Which is basically “for each site find every item not owned by that site’s user”. It turned up no hits on any of our sites. Dang, that would have been an easy fix :grin:
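For anyone who wants to see who actually owns the offending files, here is a slightly more verbose variant of the same check. It is only a sketch: it assumes GNU find (for -printf), that the sites live under /opt/omd/sites as above, and that each site user is named like its site directory. The function names are made up for illustration.

```shell
# List everything below a directory that is NOT owned by the given user.
# Usage: list_foreign_owned <directory> <user>
list_foreign_owned() {
    dir=$1
    user=$2
    # -printf is a GNU find extension: prints "owner:group<TAB>path" per hit.
    find "$dir" ! -user "$user" -printf '%u:%g\t%p\n'
}

# Run the check for every site directory below the given root.
# Assumption: the site user has the same name as the site directory.
check_all_sites() {
    sites_root=${1:-/opt/omd/sites}
    for path in "$sites_root"/*; do
        [ -d "$path" ] || continue
        site=$(basename "$path")
        list_foreign_owned "$path" "$site"
    done
}
```

Calling check_all_sites without arguments scans /opt/omd/sites; no output means every file belongs to its site user.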

We also get the same thing on 6 sites. The only non-standard thing we have is a custom special agent.

Reactivating clears the problem…but it is odd!

This means someone modified the file as root, which changed its ownership.
If you interact with a Checkmk site, always become the site user (omd su $SITE).
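If you have your own maintenance scripts that touch site files, a small guard at the top can help avoid this. A minimal sketch, assuming a POSIX shell; refuse_root is a hypothetical helper name, not something Checkmk ships:

```shell
# Hypothetical helper: fail when the effective UID is 0 (root).
# The UID may be passed explicitly, which makes the check easy to test.
refuse_root() {
    uid=${1:-$(id -u)}
    if [ "$uid" -eq 0 ]; then
        echo "Refusing to run as root; become the site user first (omd su SITE)." >&2
        return 1
    fi
    return 0
}
```

A script would then start with something like `refuse_root || exit 1` before it writes anything below the site directory.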

I have had the same problem since 2.3.0, but I get this error. Does anyone have an idea where to start troubleshooting?
image