Activate pending changes gets stuck after editing rules.mk directly; configuration is corrupted after a restart

Problem: Checkmk 1.6.0p18 gets stuck when activating changes. I cannot discard the changes, and cmk -O and cmk -R don’t work (“Other restart currently in progress. Aborting.”). If I stop omd, I cannot start it again and have to restore from backup (this has happened before).

This has happened before, so I will describe the scenario in which it occurs, in the hope that someone can help me understand why.

Scenario: I have a lot of active checks configured, and sometimes we have to add hundreds more. For that I edit the rules.mk file directly and “force a reload” using the following method:

  1. Edit the rules.mk file, making sure the file formatting is kept.
  2. Go to “Host Services and Parameters”, then “Active checks”, then “Check http service” to see whether any rules.mk parsing errors show up. If there are none, I can see the new http checks that I added to rules.mk, but they are not yet “loaded”. If I do an omd stop or reload at this point, the new http checks don’t show up, which is why I do step 3.
  3. To “force” the reload of the http checks, go to “hosts”, click on any host, then just click “save and finish”.
  4. Go to the changes menu and apply the changes.

Most of the time the changes apply successfully and Checkmk is reloaded with the new http checks configured, but a few times the following has happened:

  • Changes don’t finish applying
  • Checkmk is still working, responsive, doing everything and working fine, but without the new checks.
  • No subsequent changes can be applied because they are queued behind the first activation, which never finishes
  • Cannot discard changes
  • If I stop and start omd, the configuration generation crashes (log below).

I don’t know where to look for the cause of the crash. It doesn’t seem to be a rules.mk problem, because after restoring from backup I make the same edit and it works fine.

OMD[supcdteste]:~/var/check_mk/wato/log$ omd start
Starting mkeventd...OK
Starting liveproxyd...OK
Starting mknotifyd...OK
Starting rrdcached...OK
Starting cmc...Failed (Config /omd/sites/supcdteste/var/check_mk/core/config missing, run "cmk -U" and try again)
Starting apache...OK
Starting dcd...OK
Initializing Crontab...OK
OMD[supcdteste]:~/var/check_mk/wato/log$ cmk -U
Generating configuration for core (type cmc)...Process Process-2:
Traceback (most recent call last):
  File "/omd/sites/supcdteste/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/omd/sites/supcdteste/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 635, in wrapper
    return func(*args, **kwargs)
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 647, in get_host_configurations
    result = [host_class(hostname).get_serialized_data() for hostname in hostlist]
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 931, in __init__
    host_macros={})
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 711, in __init__
    self._compute()
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 949, in _compute
    self._cmc_services()
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 1101, in _cmc_services
    active_check_name, params)
  File "/omd/sites/supcdteste/lib/python/cmk_base/config.py", line 905, in active_check_service_description
    description = act_info["service_description"](params)
  File "/omd/sites/supcd/share/check_mk/checks/check_http", line 266, in check_http_description
    description = params["name"]
Exception: 'name'
Original Traceback (most recent call last):
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 635, in wrapper
    return func(*args, **kwargs)
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 647, in get_host_configurations
    result = [host_class(hostname).get_serialized_data() for hostname in hostlist]
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 931, in __init__
    host_macros={})
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 711, in __init__
    self._compute()
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 949, in _compute
    self._cmc_services()
  File "/omd/sites/supcdteste/lib/python/cmk_base/cee/core_cmc.py", line 1101, in _cmc_services
    active_check_name, params)
  File "/omd/sites/supcdteste/lib/python/cmk_base/config.py", line 905, in active_check_service_description
    description = act_info["service_description"](params)
  File "/omd/sites/supcd/share/check_mk/checks/check_http", line 266, in check_http_description
    description = params["name"]
KeyError: 'name'

OK, what you can do is run “cmk --debug -vvU”, or replace the U with C to only compile the checks. But you can already see from the traceback that the problem is with the service description of one of the http checks.
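The KeyError in the traceback suggests that at least one check_http parameter set has no "name" entry. Below is a minimal sketch of a pre-check, assuming the edited rules have already been loaded into a list of dicts whose parameters sit under a "value" key. That layout, the helper name find_rules_missing_key and the sample data are assumptions for illustration, not Checkmk's own API:

```python
def find_rules_missing_key(rules, key="name"):
    """Return (index, rule) pairs whose parameter dict lacks `key`."""
    broken = []
    for index, rule in enumerate(rules):
        # Assumed layout: the check parameters are stored under rule["value"].
        params = rule.get("value", {}) if isinstance(rule, dict) else {}
        if not isinstance(params, dict) or key not in params:
            broken.append((index, rule))
    return broken


# Hypothetical sample data, not taken from a real rules.mk.
sample_rules = [
    {"value": {"name": "Intranet", "host": {"address": "intranet.example"}}},
    {"value": {"host": {"address": "wiki.example"}}},  # "name" is missing
]

for index, rule in find_rules_missing_key(sample_rules):
    print("rule #%d has no 'name': %r" % (index, rule))
```

With hundreds of hand-added entries, a check like this may narrow down the offending rule faster than reading the file by eye.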

If you really need to change the .mk files manually, I would run “cmk --debug -vvC” on the command line after the changes.
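If the manual edits are scripted, the validation step can be made a hard gate. This is a rough sketch, assuming the validation command exits with a non-zero status when it fails (worth verifying on your version); the helper name config_compiles is made up for this example:

```python
from __future__ import print_function

import subprocess


def config_compiles(command):
    """Run `command` (a list of arguments) and return True only on exit status 0."""
    try:
        returncode = subprocess.call(command)
    except OSError as exc:
        # Raised, for example, when the executable cannot be found.
        print("could not run %r: %s" % (command, exc))
        return False
    return returncode == 0
```

As the site user, something like config_compiles(["cmk", "--debug", "-vvU"]) could then be called right after editing rules.mk, and the changes activated only if it returns True.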

You can also activate the git feature inside WATO to make changes reversible from the command line.


Thanks a lot, this helped. After I restored rules.mk, “cmk --debug -vvU” and even “cmk -U” passed, so there actually is a problem in the edited rules.mk file. This is strange, because when I edit it I can usually see any parsing errors in the GUI, as I said in my post, so it probably has something to do with a strange character in the service description field.
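One possible explanation for the GUI showing no error: a file can be syntactically valid and still be missing a key that is only looked up later, when the core configuration is generated. The sketch below illustrates that difference on a made-up fragment; the variable name rules and the "value" layout are assumptions for illustration, not the real rules.mk format:

```python
import ast

# Hypothetical fragment: valid Python syntax, but the parameter dict has no "name".
FRAGMENT = """
rules = [
    {"value": {"host": {"address": "wiki.example"}}},
]
"""


def parses_cleanly(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def all_rules_named(source):
    """Return True if every rule's parameter dict contains a "name" key."""
    tree = ast.parse(source)
    # The fragment holds a single assignment; evaluate its literal right-hand side.
    rules = ast.literal_eval(tree.body[0].value)
    return all("name" in rule.get("value", {}) for rule in rules)


print("syntax ok: %s" % parses_cleanly(FRAGMENT))
print("all rules named: %s" % all_rules_named(FRAGMENT))
```

A syntax-only check accepts this fragment, while the content check flags it, which matches the pattern of a clean GUI followed by a KeyError in “cmk -U”.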

Also, running “cmk --debug -vvU” before trying to apply the changes was useful, because it pointed out the error.

What is the most appropriate command to “reload” the configuration from the command line when I have to make these edits?

I have now found that cmk -O (or cmk --debug -vvO) reloads the configuration. Thanks.