Nagios service stops after upgrade to 2.4

CMK version: 2.4
OS version: RHEL 8

Error message:

No error message, but Nagios is in a stopped state.

docker container exec -it -u cmk monitoring omd status
agent-receiver: running
mkeventd: running
rrdcached: running
redis: running
npcd: running
automation-helper: running
ui-job-scheduler: running
nagios: stopped
apache: running
crontab: running

Overall state: partially running

Running omd start starts the service, but it runs for ~60 seconds and then stops again. There are no error messages in nagios.log or anywhere else we can find.
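To catch the moment Nagios drops out, the state can be read from omd status programmatically. The Python sketch below parses the output format shown above; it assumes plain, uncolored output with one "service: state" line per service, and the helper names (parse_omd_status, not_running) are hypothetical.

```python
import re
import shutil
import subprocess

# Matches lines such as "nagios: stopped" or "agent-receiver:   running".
# "Overall state: partially running" does not match (space before the colon).
STATUS_LINE = re.compile(r"^([\w-]+):\s+(\w+)$")


def parse_omd_status(text: str) -> dict[str, str]:
    """Map each service name in 'omd status' output to its state."""
    states = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            states[match.group(1)] = match.group(2)
    return states


def not_running(text: str) -> list[str]:
    """Names of services whose state is anything other than 'running'."""
    return [name for name, state in parse_omd_status(text).items() if state != "running"]


if __name__ == "__main__":
    # Assumption: executed as the site user (e.g. via "docker container exec -u cmk").
    if shutil.which("omd") is None:
        print("omd not found on PATH; run this as the site user")
    else:
        result = subprocess.run(["omd", "status"], capture_output=True, text=True)
        print("Not running:", not_running(result.stdout) or "none")
```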

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

cmk --debug -vvn hostname
value store: loading from disk
Checkmk version 2.4.0
Failed to lookup IPv4 address of hostname via DNS: [Errno -2] Name or service not known(!!)

This service was upgraded from 2.3.0 latest to 2.4.0 latest and has been failing since.

Setting the debug logging in etc/nagios/nagios.d/logging.cfg to level -1 with verbose output produces no errors in debug.log that we could use to troubleshoot.
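To confirm which debug settings Nagios actually picks up (and which file debug.log really is), a small sketch can read the key=value directives from etc/nagios/nagios.d. It assumes the relevant settings live in those *.cfg files, as described above. debug_level, debug_verbosity, debug_file and max_debug_file_size are standard Nagios main-config directives; the helper functions are hypothetical, and "last value wins" is an assumption of this sketch.

```python
import os
from pathlib import Path
from typing import Optional

# Nagios main-config directives that control the debug log.
DEBUG_KEYS = ("debug_level", "debug_verbosity", "debug_file", "max_debug_file_size")


def read_cfg(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blanks and # or ; comments.

    If a key appears more than once, the last value wins (an assumption
    made for this sketch, not a documented Nagios rule).
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def debug_settings(settings: dict[str, str]) -> dict[str, Optional[str]]:
    """Pick out the debug directives; None means 'not set in these files'."""
    return {key: settings.get(key) for key in DEBUG_KEYS}


if __name__ == "__main__":
    omd_root = os.environ.get("OMD_ROOT")
    if not omd_root:
        print("OMD_ROOT is not set; run this as the site user")
    else:
        merged = {}
        for cfg in sorted(Path(omd_root, "etc/nagios/nagios.d").glob("*.cfg")):
            merged.update(read_cfg(cfg.read_text(errors="replace")))
        for key, value in debug_settings(merged).items():
            print(f"{key} = {value}")
```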


Some notes from the 2.4 upgrade

OMD[cmk]:~$ find ~/local/lib/python3/ -type d -name '*.*-info'
OMD[cmk]:~$ mkp list
Name Version Title Author Req. Version Until Version Files State
---- ------- ----- ------ ------------ ------------- ----- -----
OMD[cmk]:~$ 

No custom modules or code.

I seem to have gotten Nagios stable by disabling a number of rules, notifications, and the RabbitMQ monitoring.

Next up are crashes during service discovery on some agent hosts.

Error running automation call service-discovery-preview (exit code 2), error:
Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

This looks like an issue with the host data after the migration. Four of the 12 hosts were crashing, each producing an agent crash log. Simply deleting and recreating each affected host made host and service discovery work again.
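The error text in the discovery crash is exactly what Python's json module reports when it is handed a Python-style literal with single quotes instead of JSON. The sketch below reproduces that message; it only illustrates one possible cause, since the thread does not show which data the automation call was actually parsing. The diagnose helper is hypothetical.

```python
import ast
import json


def diagnose(raw: str) -> str:
    """Classify a string as JSON, a Python literal, or neither."""
    try:
        json.loads(raw)
        return "valid JSON"
    except json.JSONDecodeError as err:
        json_error = str(err)
    try:
        ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return f"not JSON: {json_error}"
    return f"Python literal, not JSON: {json_error}"


if __name__ == "__main__":
    # A Python dict repr (single quotes) reproduces the message from the GUI.
    print(diagnose("{'site': 'cmk'}"))
```

Running it prints "Python literal, not JSON: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)", matching the automation error above character for character.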

Before deleting the host, I had tested it with cmk -vv --debug -I hostname and got no errors. Now the GUI and the console both work fine again.

Moved to the beta category for tracking and asked @chauhan_sudhir internally to reproduce the problem.

Thanks for reporting the problem. I have some questions:

This service was upgraded from 2.3.0 latest to 2.4.0 latest and has been failing since.

Which version of 2.3 exactly did you have before?

Can you also check whether there are any custom files under ~/local, using the command below?
find -L ~/local
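find -L prints every directory, including the many empty skeleton directories under ~/local, which makes leftover files easy to miss. A Python sketch that lists only regular files and flags bytecode compiled for a different Python version (such as the yum.cpython-38.pyc files in the listing further down) could look like this; the function names are hypothetical, and like find -L it follows symlinks.

```python
import os
import sys
from pathlib import Path
from typing import Iterator

# Bytecode tag of the running interpreter, e.g. "cpython-312" on Checkmk 2.4.
CURRENT_TAG = sys.implementation.cache_tag


def local_files(root) -> Iterator[Path]:
    """Yield regular files below root, following symlinks like 'find -L'.

    Empty directories are skipped. Note: os.walk does not detect symlink
    loops, so a link pointing to a parent directory would recurse endlessly.
    """
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            yield Path(dirpath) / name


def is_stale_bytecode(path: Path) -> bool:
    """True for .pyc files compiled by a different Python version."""
    return path.suffix == ".pyc" and f".{CURRENT_TAG}." not in path.name


if __name__ == "__main__":
    root = Path.home() / "local"  # the site user's home is the site directory
    if not root.is_dir():
        print(f"{root} not found; run this as the site user")
    else:
        for path in local_files(root):
            marker = "  <- stale bytecode" if is_stale_bytecode(path) else ""
            print(f"{path}{marker}")
```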

I updated a distributed setup from 2.3.0p26.cre to 2.4.0.cre and do not see any problems with the Nagios core stopping.

Please also check the following:

  • Any OOM errors in the system syslog?
  • Any errors in $OMD_ROOT/var/log/ui-job-scheduler/ui-job-scheduler.log?
  • Any errors in $OMD_ROOT/var/log/nagios.log?
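These checks can be scripted. The sketch below greps a few log files for OOM-killer messages and generic error markers; the pattern list, the limit and the file paths are assumptions (on RHEL 8 the system log is usually /var/log/messages, which may not exist inside a container). OOM kills are logged by the host kernel, so on a Docker setup the host's log is the one that matters.

```python
import os
import re
from pathlib import Path

# Hypothetical pattern list: OOM-killer messages plus generic error markers.
SUSPICIOUS = re.compile(
    r"out of memory|oom-kill|killed process|\berror\b|\bcritical\b|traceback",
    re.IGNORECASE,
)


def scan(path, pattern=SUSPICIOUS, limit=50):
    """Return up to `limit` (line_number, line) pairs that match pattern."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
                if len(hits) >= limit:
                    break
    return hits


if __name__ == "__main__":
    candidates = [Path("/var/log/messages")]  # RHEL syslog; may be absent in a container
    omd_root = os.environ.get("OMD_ROOT")
    if omd_root:
        candidates += [
            Path(omd_root, "var/log/ui-job-scheduler/ui-job-scheduler.log"),
            Path(omd_root, "var/log/nagios.log"),
        ]
    for path in candidates:
        if path.is_file() and os.access(path, os.R_OK):
            for number, line in scan(path):
                print(f"{path}:{number}: {line}")
        else:
            print(f"{path}: missing or not readable, skipped")
```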

I seem to have gotten Nagios stable by disabling a number of rules, notifications, and the RabbitMQ monitoring.

  • Which rules in particular were disabled?
  • By RabbitMQ monitoring, do you mean the OMD checks or process monitoring?

Hmm, the process is stopping again this morning, and it keeps stopping repeatedly even though nothing changed overnight.

Output of find -L ~/local:

/omd/sites/cmk/local/
/omd/sites/cmk/local/bin
/omd/sites/cmk/local/share
/omd/sites/cmk/local/share/nagios
/omd/sites/cmk/local/share/nagios/htdocs
/omd/sites/cmk/local/share/nagios/htdocs/theme
/omd/sites/cmk/local/share/nagios/htdocs/theme/images
/omd/sites/cmk/local/share/nagios/htdocs/theme/stylesheets
/omd/sites/cmk/local/share/check_mk
/omd/sites/cmk/local/share/check_mk/reporting
/omd/sites/cmk/local/share/check_mk/reporting/images
/omd/sites/cmk/local/share/check_mk/mibs
/omd/sites/cmk/local/share/check_mk/checkman
/omd/sites/cmk/local/share/check_mk/pnp-templates
/omd/sites/cmk/local/share/check_mk/locale
/omd/sites/cmk/local/share/check_mk/alert_handlers
/omd/sites/cmk/local/share/check_mk/web
/omd/sites/cmk/local/share/check_mk/web/htdocs
/omd/sites/cmk/local/share/check_mk/web/htdocs/images
/omd/sites/cmk/local/share/check_mk/web/htdocs/themes
/omd/sites/cmk/local/share/check_mk/web/plugins
/omd/sites/cmk/local/share/check_mk/web/plugins/wato
/omd/sites/cmk/local/share/check_mk/web/plugins/perfometer
/omd/sites/cmk/local/share/check_mk/web/plugins/dashboard
/omd/sites/cmk/local/share/check_mk/web/plugins/visuals
/omd/sites/cmk/local/share/check_mk/web/plugins/pages
/omd/sites/cmk/local/share/check_mk/web/plugins/config
/omd/sites/cmk/local/share/check_mk/web/plugins/views
/omd/sites/cmk/local/share/check_mk/web/plugins/metrics
/omd/sites/cmk/local/share/check_mk/web/plugins/sidebar
/omd/sites/cmk/local/share/check_mk/web/plugins/icons
/omd/sites/cmk/local/share/check_mk/pnp-rraconf
/omd/sites/cmk/local/share/check_mk/inventory
/omd/sites/cmk/local/share/check_mk/notifications
/omd/sites/cmk/local/share/check_mk/checks
/omd/sites/cmk/local/share/check_mk/agents
/omd/sites/cmk/local/share/check_mk/agents/linux
/omd/sites/cmk/local/share/check_mk/agents/linux/alert_handlers
/omd/sites/cmk/local/share/check_mk/agents/special
/omd/sites/cmk/local/share/check_mk/agents/bakery
/omd/sites/cmk/local/share/check_mk/agents/plugins
/omd/sites/cmk/local/share/check_mk/enabled_packages
/omd/sites/cmk/local/share/nagvis
/omd/sites/cmk/local/share/nagvis/htdocs
/omd/sites/cmk/local/share/nagvis/htdocs/userfiles
/omd/sites/cmk/local/share/nagvis/htdocs/userfiles/styles
/omd/sites/cmk/local/share/nagvis/htdocs/userfiles/scripts
/omd/sites/cmk/local/share/nagvis/htdocs/userfiles/images
/omd/sites/cmk/local/share/nagvis/htdocs/userfiles/images/iconsets
/omd/sites/cmk/local/share/nagvis/htdocs/userfiles/images/shapes
/omd/sites/cmk/local/share/nagvis/htdocs/userfiles/images/maps
/omd/sites/cmk/local/share/nagvis/htdocs/userfiles/gadgets
/omd/sites/cmk/local/share/nagvis/htdocs/userfiles/templates
/omd/sites/cmk/local/share/nagvis/htdocs/server
/omd/sites/cmk/local/share/nagvis/htdocs/server/core
/omd/sites/cmk/local/share/nagvis/htdocs/server/core/classes
/omd/sites/cmk/local/share/nagvis/htdocs/server/core/classes/objects
/omd/sites/cmk/local/share/snmp
/omd/sites/cmk/local/share/snmp/mibs
/omd/sites/cmk/local/share/doc
/omd/sites/cmk/local/share/doc/check_mk
/omd/sites/cmk/local/share/diskspace
/omd/sites/cmk/local/lib
/omd/sites/cmk/local/lib/check_mk
/omd/sites/cmk/local/lib/check_mk/special_agents
/omd/sites/cmk/local/lib/check_mk/base
/omd/sites/cmk/local/lib/check_mk/base/cee
/omd/sites/cmk/local/lib/check_mk/base/cee/plugins
/omd/sites/cmk/local/lib/check_mk/base/cee/plugins/bakery
/omd/sites/cmk/local/lib/check_mk/base/cee/plugins/bakery/__pycache__
/omd/sites/cmk/local/lib/check_mk/base/cee/plugins/bakery/__pycache__/yum.cpython-38.pyc
/omd/sites/cmk/local/lib/check_mk/base/cee/plugins/bakery/__pycache__/yum.cpython-39.pyc
/omd/sites/cmk/local/lib/check_mk/base/plugins
/omd/sites/cmk/local/lib/check_mk/base/plugins/agent_based
/omd/sites/cmk/local/lib/check_mk/gui
/omd/sites/cmk/local/lib/check_mk/gui/plugins
/omd/sites/cmk/local/lib/check_mk/gui/plugins/dashboard
/omd/sites/cmk/local/lib/check_mk/gui/plugins/views
/omd/sites/cmk/local/lib/check_mk/plugins
/omd/sites/cmk/local/lib/nagios
/omd/sites/cmk/local/lib/nagios/plugins
/omd/sites/cmk/local/lib/python3
/omd/sites/cmk/local/lib/python3/cmk
/omd/sites/cmk/local/lib/python3/cmk/special_agents
/omd/sites/cmk/local/lib/python3/cmk/base
/omd/sites/cmk/local/lib/python3/cmk/base/cee
/omd/sites/cmk/local/lib/python3/cmk/base/cee/plugins
/omd/sites/cmk/local/lib/python3/cmk/base/cee/plugins/bakery
/omd/sites/cmk/local/lib/python3/cmk/base/cee/plugins/bakery/__pycache__
/omd/sites/cmk/local/lib/python3/cmk/base/cee/plugins/bakery/__pycache__/yum.cpython-38.pyc
/omd/sites/cmk/local/lib/python3/cmk/base/cee/plugins/bakery/__pycache__/yum.cpython-39.pyc
/omd/sites/cmk/local/lib/python3/cmk/base/plugins
/omd/sites/cmk/local/lib/python3/cmk/base/plugins/agent_based
/omd/sites/cmk/local/lib/python3/cmk/gui
/omd/sites/cmk/local/lib/python3/cmk/gui/plugins
/omd/sites/cmk/local/lib/python3/cmk/gui/plugins/dashboard
/omd/sites/cmk/local/lib/python3/cmk/gui/plugins/views
/omd/sites/cmk/local/lib/python3/cmk/plugins
/omd/sites/cmk/local/lib/python3/cmk_addons
/omd/sites/cmk/local/lib/python3/cmk_addons/plugins
/omd/sites/cmk/local/lib/python3/six.py
/omd/sites/cmk/local/lib/python3/six-1.17.0.dist-info
/omd/sites/cmk/local/lib/python3/six-1.17.0.dist-info/LICENSE
/omd/sites/cmk/local/lib/python3/six-1.17.0.dist-info/METADATA
/omd/sites/cmk/local/lib/python3/six-1.17.0.dist-info/WHEEL
/omd/sites/cmk/local/lib/python3/six-1.17.0.dist-info/top_level.txt
/omd/sites/cmk/local/lib/python3/six-1.17.0.dist-info/RECORD
/omd/sites/cmk/local/lib/python3/six-1.17.0.dist-info/INSTALLER
/omd/sites/cmk/local/lib/python3/__pycache__
/omd/sites/cmk/local/lib/python3/__pycache__/six.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa
/omd/sites/cmk/local/lib/python3/ecdsa/__init__.py
/omd/sites/cmk/local/lib/python3/ecdsa/_compat.py
/omd/sites/cmk/local/lib/python3/ecdsa/_rwlock.py
/omd/sites/cmk/local/lib/python3/ecdsa/_sha3.py
/omd/sites/cmk/local/lib/python3/ecdsa/_version.py
/omd/sites/cmk/local/lib/python3/ecdsa/curves.py
/omd/sites/cmk/local/lib/python3/ecdsa/der.py
/omd/sites/cmk/local/lib/python3/ecdsa/ecdh.py
/omd/sites/cmk/local/lib/python3/ecdsa/ecdsa.py
/omd/sites/cmk/local/lib/python3/ecdsa/eddsa.py
/omd/sites/cmk/local/lib/python3/ecdsa/ellipticcurve.py
/omd/sites/cmk/local/lib/python3/ecdsa/errors.py
/omd/sites/cmk/local/lib/python3/ecdsa/keys.py
/omd/sites/cmk/local/lib/python3/ecdsa/numbertheory.py
/omd/sites/cmk/local/lib/python3/ecdsa/rfc6979.py
/omd/sites/cmk/local/lib/python3/ecdsa/ssh.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_curves.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_der.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_ecdh.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_ecdsa.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_eddsa.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_ellipticcurve.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_jacobi.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_keys.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_malformed_sigs.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_numbertheory.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_pyecdsa.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_rw_lock.py
/omd/sites/cmk/local/lib/python3/ecdsa/test_sha3.py
/omd/sites/cmk/local/lib/python3/ecdsa/util.py
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/__init__.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/_compat.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/_rwlock.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/_sha3.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/_version.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/curves.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/der.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/ecdh.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/ecdsa.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/eddsa.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/ellipticcurve.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/errors.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/keys.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/numbertheory.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/rfc6979.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/ssh.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_curves.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_der.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_ecdh.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_ecdsa.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_eddsa.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_ellipticcurve.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_jacobi.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_keys.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_malformed_sigs.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_numbertheory.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_pyecdsa.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_rw_lock.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/test_sha3.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa/__pycache__/util.cpython-312.pyc
/omd/sites/cmk/local/lib/python3/ecdsa-0.19.1.dist-info
/omd/sites/cmk/local/lib/python3/ecdsa-0.19.1.dist-info/LICENSE
/omd/sites/cmk/local/lib/python3/ecdsa-0.19.1.dist-info/METADATA
/omd/sites/cmk/local/lib/python3/ecdsa-0.19.1.dist-info/WHEEL
/omd/sites/cmk/local/lib/python3/ecdsa-0.19.1.dist-info/top_level.txt
/omd/sites/cmk/local/lib/python3/ecdsa-0.19.1.dist-info/RECORD
/omd/sites/cmk/local/lib/python3/ecdsa-0.19.1.dist-info/INSTALLER
/omd/sites/cmk/local/lib/python3/ecdsa-0.19.1.dist-info/REQUESTED
/omd/sites/cmk/local/lib/apache
/omd/sites/cmk/local/lib/python

The version went from 2.3.0p31 → 2.4.0. Our container follows the 2.3.0-latest tag, and Watchtower restarts it whenever a new tag is released. We moved to 2.4.0-latest yesterday as part of the update. At one point we tested the Cloud edition and then downgraded to Raw.

No errors in syslog (container and host).
ui-job-scheduler.log: nothing, just normal events.
nagios.log: no errors, just a lot of INITIAL SERVICE STATE entries.

The rules I disabled were related to ignoring selected filesystems on hosts.
By RabbitMQ monitoring I mean the rule “Request data from a RabbitMQ instance” applied to a host.

It looks like some local files are still there. Did you use the Yum MKP at some point and then remove it because the update complained about it?

At one point we tested the Cloud edition and then downgraded to Raw.

Was the CCE downgraded to CRE as per this?

By RabbitMQ monitoring I mean the rule “Request data from a RabbitMQ instance” applied to a host.

You mean this one?

I created a fresh 2.3.0p31 CRE site with those two Python packages and a RabbitMQ rule, and updating it to 2.4.0.cre works fine: Nagios Core still runs after the update.

That’s correct. I think it broke during the 2.2 → 2.3 update at one point, so we removed it.

Yes, we did this.

That’s correct. Note that the failures (Nagios stopping) are still ongoing today; it seems to happen every 30 to 60 seconds.
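To see whether the stops really follow a regular 30 to 60 second rhythm, the Nagios state can be polled and each change timestamped, so the stop times can be lined up against nagios.log or the container logs. This is a sketch: it assumes omd status nagios prints a plain line like "nagios: running", and the probe, sleep and clock parameters are injectable only to make it testable.

```python
import re
import shutil
import subprocess
import time
from datetime import datetime

RUNNING = re.compile(r"^nagios:\s+running\s*$", re.MULTILINE)


def nagios_running() -> bool:
    """Ask omd for the nagios state (assumes plain 'nagios: running' output)."""
    result = subprocess.run(["omd", "status", "nagios"], capture_output=True, text=True)
    return bool(RUNNING.search(result.stdout))


def watch(probe, samples, interval=5.0, sleep=time.sleep, clock=time.time):
    """Poll probe() `samples` times; return (timestamp, state) on each change."""
    transitions = []
    last = None
    for _ in range(samples):
        now = clock()
        state = probe()
        if state != last:
            transitions.append((now, state))
            last = state
        sleep(interval)
    return transitions


def uptimes(transitions):
    """Seconds between each 'running' and the following 'stopped' transition."""
    durations = []
    started = None
    for timestamp, running in transitions:
        if running:
            started = timestamp
        elif started is not None:
            durations.append(timestamp - started)
            started = None
    return durations


if __name__ == "__main__":
    if shutil.which("omd") is None:
        print("omd not found on PATH; run this as the site user")
    else:
        changes = watch(nagios_running, samples=720)  # one hour at 5 s resolution
        for timestamp, running in changes:
            state = "running" if running else "stopped"
            print(datetime.fromtimestamp(timestamp).isoformat(timespec="seconds"), state)
        print("Uptimes (s):", uptimes(changes))
```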

Can you please share $OMD_ROOT/var/log/update.log and the output of “cmk -U -vvv”, run as the site user?

It took a while to sanitize the data.

Trying to acquire lock on /omd/sites/cmk/etc/check_mk/main.mk
Got lock on /omd/sites/cmk/etc/check_mk/main.mk
Generating configuration for core (type nagios)...
Trying to acquire lock on /omd/sites/cmk/var/check_mk/passwords_merged
Got lock on /omd/sites/cmk/var/check_mk/passwords_merged
Releasing lock on /omd/sites/cmk/var/check_mk/passwords_merged
Released lock on /omd/sites/cmk/var/check_mk/passwords_merged
Trying to acquire lock on /omd/sites/cmk/var/check_mk/core/helper_config/serial.mk
Got lock on /omd/sites/cmk/var/check_mk/core/helper_config/serial.mk
Releasing lock on /omd/sites/cmk/var/check_mk/core/helper_config/serial.mk
Released lock on /omd/sites/cmk/var/check_mk/core/helper_config/serial.mk
Trying to acquire lock on /omd/sites/cmk/var/check_mk/licensing/licensed_state
Got lock on /omd/sites/cmk/var/check_mk/licensing/licensed_state
Releasing lock on /omd/sites/cmk/var/check_mk/licensing/licensed_state
Released lock on /omd/sites/cmk/var/check_mk/licensing/licensed_state
0 piggyback files for 'api.host1'.
0 piggyback files for 'api.host2'.
Trying to acquire lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/notify/host_config/api.host1
Got lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/notify/host_config/api.host1
Releasing lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/notify/host_config/api.host1
Released lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/notify/host_config/api.host1
Trying to acquire lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/notify/host_config/api.host2
Got lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/notify/host_config/api.host2
Releasing lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/notify/host_config/api.host2
Released lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/notify/host_config/api.host2
(snipped out some hosts)
Trying to acquire lock on /omd/sites/cmk/etc/nagios/conf.d/check_mk_objects.cfg
Got lock on /omd/sites/cmk/etc/nagios/conf.d/check_mk_objects.cfg
Releasing lock on /omd/sites/cmk/etc/nagios/conf.d/check_mk_objects.cfg
Released lock on /omd/sites/cmk/etc/nagios/conf.d/check_mk_objects.cfg
Trying to acquire lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/inventory_plugins_index.json
Got lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/inventory_plugins_index.json
Releasing lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/inventory_plugins_index.json
Released lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/inventory_plugins_index.json
Precompiling host checks...Creating precompiled host check config...
Precompiling host checks...
(snipped some hosts, all without errors)
api.host1:Trying to acquire lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host1.py
Got lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host1.py
Releasing lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host1.py
Released lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host1.py
 ==> /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host1.
api.host2:Trying to acquire lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host2.py
Got lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host2.py
Releasing lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host2.py
Released lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host2.py
 ==> /omd/sites/cmk/var/check_mk/core/helper_config/907/host_checks/api.host2
OK
Running '/omd/sites/cmk/bin/nagios -vp /omd/sites/cmk/tmp/nagios/nagios.cfg'
Validating Nagios configuration...OK
Trying to acquire lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/stored_passwords
Got lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/stored_passwords
Releasing lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/stored_passwords
Released lock on /omd/sites/cmk/var/check_mk/core/helper_config/907/stored_passwords
Releasing lock on /omd/sites/cmk/etc/check_mk/main.mk
Released lock on /omd/sites/cmk/etc/check_mk/main.mk
2025-04-29 09:35:30 - Updating site 'cmk' from version 2.3.0p30.cre to 2.3.0p31.cre...

 * Updated        etc/nagvis/conf.d/omd.ini.php
Temporary filesystem already mounted

-| ATTENTION
-|   Some steps may take a long time depending on your installation.
-|   Please be patient.
-| 
-| Cleanup precompiled host and folder files
-| Verifying Checkmk configuration...
-|  01/07 Legacy check plug-ins...
-|  02/07 Rulesets...
-|  03/07 UI extensions...
-|  04/07 Agent based plugins...
-|  05/07 Autochecks...
-|  06/07 Invalid hosts labels...
-|  07/07 Deprecated .mk configuration of plugins...
-| Done (success)
-| 

Completed verifying site configuration. Your site now has version 2.3.0p31.cre.
Executing update-pre-hooks script "01_mkp-disable-outdated"...OK
Executing update-pre-hooks script "02_cmk-update-config"...
-| ATTENTION
-|   Some steps may take a long time depending on your installation.
-|   Please be patient.
-| 
-| Cleanup precompiled host and folder files
-| Verifying Checkmk configuration...
-|  01/07 Legacy check plug-ins...
-|  02/07 Rulesets...
-|  03/07 UI extensions...
-|  04/07 Agent based plugins...
-|  05/07 Autochecks...
-|  06/07 Invalid hosts labels...
-|  07/07 Deprecated .mk configuration of plugins...
-| Done (success)
-| 
-| Updating Checkmk configuration...
-|  01/27 Remove invalid user profiles from disk...
-|  02/27 Create precompiled host and folder files...
-|  03/27 Validate user IDs...
-|  04/27 Convert WATO audit log to be newline separated...
-|  05/27 Update views...
-|  06/27 Update dashboards...
-|  07/27 User attributes...
-|  08/27 Global settings...
-|  09/27 Rulesets...
-|  10/27 Autochecks...
-|  11/27 Hosts and folders...
-|  12/27 Migrate CLI parent scan config...
-|  13/27 Cleanup version specific caches...
-|  14/27 Delete old dedicated agent receiver cert...
-|  15/27 Synchronize automationuser secrets...
-| Automation user 'bcg-msteams' is locked!
-|  16/27 Check for incompatible password hashes...
-|  Users with outdated, no longer supported password hashes have been found. These users will be unable to log in.
-| Please manually reset these users' passwords either in Setup > Users or on the commandline using the cmk-passwd command.
-| The following users are affected:
-| cnsadmin
-|  17/27 Remove unreadable prediction files...
-|  18/27 Update existing two factor...
-|  19/27 Update pagetypes...
-|  20/27 Split large audit logs...
-|  21/27 Event Console: Rewrite active config...
-|  22/27 Sanitize audit log...
-|  23/27 Remove invalid hosts labels...
-|  24/27 Remove persisted graph options...
-|  25/27 BI config...
-|  26/27 Reset deprecations scheduling...
-|  27/27 Update core config...
-| Generating configuration for core (type nagios)...
-| Precompiling host checks...OK
-| Done (success)
OK
Finished update.

2025-05-06 14:26:20 - Updating site 'cmk' from version 2.3.0p31.cre to 2.4.0.cre...

 * Updated        .profile
 * Installed dir  etc/jaeger
 * Installed dir  etc/rabbitmq
 * Installed link etc/bash_completion.d/bash_completion
 * Updated        etc/check_mk/apache.conf
 * Updated        etc/init.d/agent-receiver
 * Updated        etc/init.d/apache
 * Installed file etc/init.d/automation-helper
 * Updated        etc/init.d/crontab
 * Installed file etc/init.d/jaeger
 * Updated        etc/init.d/mkeventd
 * Updated        etc/init.d/nagios
 * Installed file etc/init.d/piggyback-hub
 * Updated        etc/init.d/pnp_gearman_worker
 * Installed file etc/init.d/rabbitmq
 * Updated        etc/init.d/redis
 * Updated        etc/init.d/rrdcached
 * Updated        etc/init.d/stunnel
 * Installed file etc/init.d/ui-job-scheduler
 * Installed file etc/jaeger/config.yaml
 * Installed file etc/logrotate.d/automation-helper
 * Installed file etc/logrotate.d/jaeger
 * Installed file etc/logrotate.d/piggyback-hub
 * Installed file etc/logrotate.d/rabbitmq
 * Installed file etc/logrotate.d/ui-job-scheduler
 * Updated        etc/mk-livestatus/nagios.cfg
 * Updated        etc/nagios/conf.d/check_mk_templates.cfg
 * Installed dir  etc/rabbitmq/advanced_conf.d
 * Installed dir  etc/rabbitmq/conf.d
 * Installed dir  etc/rabbitmq/definitions.d
 * Installed file etc/rabbitmq/enabled_plugins
 * Installed file etc/rabbitmq/advanced_conf.d/00-advanced.conf
 * Installed file etc/rabbitmq/conf.d/00-default.conf
 * Installed file etc/rabbitmq/conf.d/03-tracing.conf
 * Installed file etc/rabbitmq/definitions.d/00-default.json
 * Installed link etc/rc.d/08-jaeger
 * Installed link etc/rc.d/40-redis
 * Installed link etc/rc.d/55-automation-helper
 * Installed link etc/rc.d/60-ui-job-scheduler
 * Installed link etc/rc.d/85-rabbitmq
 * Installed link etc/rc.d/90-piggyback-hub
 * Installed dir  etc/ssl/certs
 * Installed dir  etc/ssl/private
 * Permissions    0750 -> 0640 etc/ssl/misc/CA.pl
 * Permissions    0750 -> 0640 etc/ssl/misc/tsget.pl
 * Updated        etc/stunnel/conf.d/01-livestatus.conf
 * Identical new  var/check_mk/discovered_host_labels
 * Permissions    0777 -> 0750 var/check_mk/discovered_host_labels
 * Identical new  var/check_mk/packages_local
 * Permissions    0755 -> 0750 var/check_mk/packages_local
 * Vanished       etc/rc.d/85-redis
 * Vanished       etc/pnp4nagios/config.php
 * Vanished       etc/nagios/cgi.cfg
 * Vanished       etc/nagios/config.inc.php
 * Vanished       etc/cron.d/cmk_multisite
 * Vanished       etc/apache/conf.d/omd.conf
 * Vanished       .modulebuildrc
Executing 'cmk-update-config --conflict install --dry-run'
-| ATTENTION
-|   Some steps may take a long time depending on your installation.
-|   Please be patient.
-| 
-| Cleanup precompiled host and folder files
-| Verifying Checkmk configuration...
-|  01/08 Legacy check plug-ins...
-|  02/08 Rulesets...
-|  03/08 UI extensions...
-|  04/08 Migrate Azure Databases...
-|  05/08 Agent based plugins...
-|  06/08 Autochecks...
-|  07/08 Invalid hosts labels...
-|  08/08 Deprecated .mk configuration of plugins...
-| Done (success)
-| 

Completed verifying site configuration. Your site now has version 2.4.0.cre.
Executing update-pre-hooks script "01_mkp-disable-outdated"...OK
Executing update-pre-hooks script "02_cmk-update-config"...
-| ATTENTION
-|   Some steps may take a long time depending on your installation.
-|   Please be patient.
-| 
-| Cleanup precompiled host and folder files
-| Verifying Checkmk configuration...
-|  01/08 Legacy check plug-ins...
-|  02/08 Rulesets...
-|  03/08 UI extensions...
-|  04/08 Migrate Azure Databases...
-|  05/08 Agent based plugins...
-|  06/08 Autochecks...
-|  07/08 Invalid hosts labels...
-|  08/08 Deprecated .mk configuration of plugins...
-| Done (success)
-| 
-| Updating Checkmk configuration...
-|  01/29 User attributes...
-|  02/29 Create precompiled host and folder files...
-|  03/29 Update dashboards...
-|  04/29 Global settings...
-|  05/29 Convert counter files...
-|  06/29 Migrate Azure Databases...
-|  07/29 Message broker port of site connections...
-|  08/29 Rulesets...
-|  09/29 Notification host tag conditions...
-|  10/29 Autochecks...
-|  11/29 Migrate CLI parent scan config...
-|  12/29 Process discovery for self monitoring...
-| Adding: Shipped rule to monitor sites rrdcached
-| Adding: Shipped rule to monitor sites rrd helper
-| Adding: Shipped rule to monitor sites redis-server
-| Adding: Shipped rule to monitor sites real-time helper
-| Adding: Shipped rule to monitor sites rabbitmq
-| Adding: Shipped rule to monitor sites piggyback hub
-| Adding: Shipped rule to monitor sites notify helper
-| Adding: Shipped rule to monitor sites notification spooler
-| Adding: Shipped rule to monitor sites livestatus proxy
-| Adding: Shipped rule to monitor sites jaeger
-| Adding: Shipped rule to monitor sites fetcher helpers
-| Adding: Shipped rule to monitor sites event console
-| Adding: Shipped rule to monitor sites dcd
-| Adding: Shipped rule to monitor sites cmc
-| Adding: Shipped rule to monitor sites checker helpers
-| Adding: Shipped rule to monitor sites automation helpers
-| Adding: Shipped rule to monitor sites apache
-| Adding: Shipped rule to monitor sites alert helper
-| Adding: Shipped rule to monitor sites agent receiver
-| Adding: Shipped rule to monitor sites active check helpers
-|  13/29 Update background jobs...
-|  14/29 Remove leftovers of user profile cleanup background job...
-|  15/29 Migrate notifications...
-|        Wrote notification configuration backup to
-|        /omd/sites/cmk/notifications_backup.mk.
-| 
-|        Please check if the notification pages in the GUI work as expected.
-|        In case of problems you can copy the backup files back to 
-|        /omd/sites/cmk/etc/check_mk/conf.d/wato/notifications.mk.
-|        If everything works as expected you can remove the backup.
-| 
-|  16/29 Cleanup version specific caches...
-|  17/29 Host attribute topics...
-|  18/29 Terminating all existing user sessions...
-|  19/29 Update LDAP connections...
-|  20/29 Remove unreadable prediction files...
-|  21/29 Topics...
-|  22/29 Event Console: Migrate history files to sqlite...
-|  23/29 Remove invalid hosts labels...
-|  24/29 Migrate etc/diskspace.conf...
-|  25/29 Reset deprecations scheduling...
-|  26/29 Ensure message broker certs are ready...
-|  27/29 Check for deprecated check_http plug-in rules...
-| WARNING: You have 136 rules using the ruleset Check HTTP service deprecated.
-| This ruleset will be deprecated along with the old HTTP monitoring plug-in in the next version(s) of Checkmk.
-| Rules must therefore be migrated to the new ruleset which is used by the httpv2 plugin.
-| Rule migration can be done manually or by calling cmk-migrate-http as site user. See cmk-migrate-http --help for more information on this helper script.
-| For additional information on the deprecation of the HTTP plug-in see the werk #17665.
-|  28/29 Validating configuration files...
-|  29/29 Update core config...
-| Generating configuration for core (type nagios)...
-| Precompiling host checks...OK
-| Done (success)
OK
Finished update.

Your update.log doesn’t say anything about the downgrade from CCE to CRE.
Anyway:

  1. Does the problem still happen?
  2. When you start the core, does it still get stopped?
  3. Can you also share $OMD_ROOT/var/log/web.log?
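The update history, including any edition change, can be pulled out of update.log by matching the "Updating site ... from version ... to ..." lines shown in the excerpt above. This sketch assumes every update, including an edition change such as .cce to .cre, is logged in that same format; if older entries are no longer in the file, they will not show up. The helper names are hypothetical.

```python
import os
import re
from pathlib import Path

# Format taken from the update.log excerpt:
# "2025-05-06 14:26:20 - Updating site 'cmk' from version 2.3.0p31.cre to 2.4.0.cre..."
UPDATE_LINE = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - Updating site '(?P<site>[^']+)' "
    r"from version (?P<old>\S+) to (?P<new>\S+)\.\.\.$"
)


def edition(version: str) -> str:
    """Edition suffix of a version string, e.g. '2.3.0p31.cre' -> 'cre'."""
    return version.rsplit(".", 1)[-1]


def update_history(text: str) -> list[dict]:
    """One entry per update, flagging updates that changed the edition."""
    history = []
    for line in text.splitlines():
        match = UPDATE_LINE.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["edition_changed"] = edition(entry["old"]) != edition(entry["new"])
            history.append(entry)
    return history


if __name__ == "__main__":
    omd_root = os.environ.get("OMD_ROOT")
    log = Path(omd_root, "var/log/update.log") if omd_root else None
    if log is None or not log.is_file():
        print("update.log not found; run this as the site user")
    else:
        for entry in update_history(log.read_text(errors="replace")):
            flag = "  <- edition change" if entry["edition_changed"] else ""
            print(f"{entry['when']}  {entry['old']} -> {entry['new']}{flag}")
```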

The downgrade happened a long time ago; I just wanted to mention it.

  1. Yes, it’s still happening.
  2. Yes, the Nagios service keeps stopping. Note that yesterday morning it was stopping every few minutes, then it ran for hours without issues, only to start stopping again overnight.
  3. Sure, this is the most recent crash:
2025-05-07 15:37:21,514 [40] [cmk.web 37546] Unhandled exception (Crash ID: 4fbc966e-2b72-11f0-b1e9-06f90f2b9d11)
Traceback (most recent call last):
  File "/omd/sites/cmk/lib/python3.12/site-packages/cmk/livestatus_client/__init__.py", line 327, in query_row
    return result[0]
           ~~~~~~^^^
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/cmk/lib/python3/cmk/gui/pages.py", line 102, in handle_page
    action_response = self.page()
                      ^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/gui/views/page_ajax_reschedule.py", line 30, in page
    return self._do_reschedule(api_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/gui/views/page_ajax_reschedule.py", line 114, in _do_reschedule
    row = self._wait_for(site, host, what, wait_spec, now, add_filter)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3/cmk/gui/views/page_ajax_reschedule.py", line 43, in _wait_for
    return sites.live().query_row(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3.12/site-packages/cmk/livestatus_client/__init__.py", line 329, in query_row
    raise MKLivestatusNotFoundError(
cmk.livestatus_client.MKLivestatusNotFoundError: No matching entries found for query: GET services
WaitObject: ahost1;Check_MK
WaitCondition: last_check >= 1746643034
WaitTimeout: 10000
WaitTrigger: check
Columns: last_check state plugin_output
Filter: host_name = ahost1
Filter: service_description = Check_MK

This is one of the hosts that showed crash events for the Check_MK agent service on the dashboard. We deleted this host and three others and recreated them, which got rid of these errors.
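The crash is the GUI's reschedule request waiting for the host's Check_MK service via Livestatus and getting no matching row. Whether that service exists can be checked directly by sending a plain Livestatus query (without the Wait headers) to the site's socket. The sketch assumes the default OMD socket path $OMD_ROOT/tmp/run/live and uses the services table column description; with the Nagios core, Livestatus runs inside the core, so the query can only be answered while Nagios is running. Function names are hypothetical.

```python
import json
import os
import socket
import sys
from pathlib import Path


def build_query(host: str, service: str = "Check_MK") -> str:
    """Plain Livestatus query for one service, answered as JSON."""
    return (
        "GET services\n"
        "Columns: host_name description state last_check plugin_output\n"
        f"Filter: host_name = {host}\n"
        f"Filter: description = {service}\n"
        "OutputFormat: json\n"
        "\n"
    )


def run_query(socket_path: Path, query: str) -> list:
    """Send the query over the UNIX socket and decode the JSON rows."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(str(socket_path))
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal the end of the request
        chunks = []
        while chunk := sock.recv(65536):
            chunks.append(chunk)
    return json.loads(b"".join(chunks) or b"[]")


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "ahost1"
    omd_root = os.environ.get("OMD_ROOT")
    live = Path(omd_root, "tmp/run/live") if omd_root else None
    if live is None or not live.exists():
        print("Livestatus socket not found; run this as the site user")
    else:
        try:
            rows = run_query(live, build_query(host))
        except OSError as err:
            print(f"Livestatus not answering ({err}); is the core running?")
        else:
            print(rows if rows else f"No Check_MK service found for {host}")
```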

Can you submit the crash report?
Does the host “ahost1” still exist in monitoring?

No, we removed it and re-added it, and then service discovery worked. The crashes stopped after that.

Good Morning @UnderTheSea1,

I am happy to hear that your Checkmk installation works again after re-adding the host.

If that solves your problem, would you be so kind as to mark the topic as solved?

Best regards and thank you
Hartmut

No, I’m afraid that wasn’t the solution.

That only solved the discovery crashes on individual hosts. The Nagios service itself kept stopping and still does. We started over from scratch and rebuilt the monitoring on 2.4.0, keeping the old volume data in a separate local Docker instance for reference.

We can close this thread, as rebuilding fixed the problem.

Good Morning @UnderTheSea1,

sorry to hear that you had to rebuild the monitoring.
But I am glad that it is working again now. I marked your post as the solution.

Sunny Greetings
Hartmut

Edit: Post deleted and moved to a much bigger thread on the same issue.

2.5.0 is still in the pre-alpha stage and is only suitable for testing and debugging new features. It is absolutely not suited for production use. If it is the same bug, it will eventually be fixed, but production versions (that is, current stable 2.4.0 and old stable 2.3.0) have priority.
