Update to 2.4 fails with ui-job-scheduler errors during change activation

CMK version: 2.3p31 → 2.4
OS version: Rocky9 current

Error message:
Could not connect to ui-job-scheduler. Possibly the service ui-job-scheduler is not started, please make sure that all site services are all started. Tried to connect via /omd/sites/master/tmp/run/ui-job-scheduler.sock. Reported error was: HTTPConnectionPool(host=‘local-ui-job-scheduler’, port=80): Max retries exceeded with url: /start (Caused by NewConnectionError(‘: Failed to establish a new connection: [Errno -2] Name or service not known’)).

Can you share the following?

  • Output of “omd status”
  • Any Proxy config defined in the $OMD_ROOT/.bashrc or .bash_profile. You can also run the following as site user:
    env|grep -i proxy
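
For anyone who prefers to check this from Python (for example from within the site's own interpreter), here is a minimal sketch that mirrors the `env|grep -i proxy` command above. The function name `proxy_settings` is made up for this illustration.

```python
import os


def proxy_settings(environ=None):
    """Return environment entries whose NAME=value line contains 'proxy'.

    Like `grep -i proxy`, the match is case-insensitive and applies to the
    whole line, so a value that mentions a proxy is reported as well.
    """
    env = os.environ if environ is None else environ
    return {
        name: value
        for name, value in env.items()
        if "proxy" in f"{name}={value}".lower()
    }


if __name__ == "__main__":
    # Run this as the site user so the site's environment is inspected.
    for name, value in sorted(proxy_settings().items()):
        print(f"{name}={value}")
```

An empty result means no proxy-related variable is set in that environment.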

Doing ‘status’ on site master:
agent-receiver: running
mkeventd: running
liveproxyd: running
mknotifyd: running
rrdcached: running
redis: running
automation-helper: running
ui-job-scheduler: running
cmc: running
apache: running
dcd: running
crontab: running

Overall state: running

All services are running.

And there is no proxy configured.

We have a second instance where the update ran fine.
Not sure whether the Apache configuration is the culprit; it is old and has been carried over for years :wink:

Yeah, it could be the Apache config as well. Looking into the following log files may help:

  • $OMD_ROOT/var/log/web.log
  • $OMD_ROOT/var/log/apache/error_log
  • $OMD_ROOT/var/log/ui-job-scheduler/ui-job-scheduler.log and also the error.log
  • Finally, the system Apache log file
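
To look at the end of the site logs in one go, something like the following sketch could be used. It assumes it is run as the site user with `OMD_ROOT` set, and that the ui-job-scheduler log lives under `var/log/ui-job-scheduler/` relative to the site root. The helper name `tail` is chosen just for this example.

```python
import os
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file, or [] if the file is missing."""
    p = Path(path)
    if not p.is_file():
        return []
    with p.open(errors="replace") as fh:
        # deque with maxlen keeps only the final n lines while reading.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


if __name__ == "__main__":
    root = Path(os.environ.get("OMD_ROOT", "."))
    # Site-relative paths taken from the list above (assumed locations).
    for rel in (
        "var/log/web.log",
        "var/log/apache/error_log",
        "var/log/ui-job-scheduler/ui-job-scheduler.log",
    ):
        print(f"==> {rel} <==")
        for line in tail(root / rel):
            print(line)
```

The system Apache log is outside the site directory and is not covered by this sketch.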

I get this as a user in “User → received messages”

Failed to execute the test ACTestUnknownCheckParameterRuleSets: Traceback (most recent call last):
File “/omd/sites/master/local/lib/python3/urllib3/connection.py”, line 174, in _new_conn
conn = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/omd/sites/master/local/lib/python3/urllib3/util/connection.py”, line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/omd/sites/master/lib/python3.12/socket.py”, line 978, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/omd/sites/master/local/lib/python3/urllib3/connectionpool.py”, line 714, in urlopen
httplib_response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File “/omd/sites/master/local/lib/python3/urllib3/connectionpool.py”, line 415, in _make_request
conn.request(method, url, **httplib_request_kw)
File “/omd/sites/master/local/lib/python3/urllib3/connection.py”, line 244, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File “/omd/sites/master/lib/python3.12/http/client.py”, line 1338, in request
self._send_request(method, url, body, headers, encode_chunked)
File “/omd/sites/master/lib/python3.12/http/client.py”, line 1384, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File “/omd/sites/master/lib/python3.12/http/client.py”, line 1333, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File “/omd/sites/master/lib/python3.12/http/client.py”, line 1093, in _send_output
self.send(msg)
File “/omd/sites/master/lib/python3.12/http/client.py”, line 1037, in send
self.connect()
File “/omd/sites/master/local/lib/python3/urllib3/connection.py”, line 205, in connect
conn = self._new_conn()
^^^^^^^^^^^^^^^^
File “/omd/sites/master/local/lib/python3/urllib3/connection.py”, line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f4e305fb440>: Failed to establish a new connection: [Errno -2] Name or service not known

and the web.log:

2025-05-08 13:34:50,418 [40] [cmk.web 97814] error calling AJAX page handler
Traceback (most recent call last):
File “/omd/sites/master/lib/python3/cmk/gui/pages.py”, line 102, in handle_page
action_response = self.page()
^^^^^^^^^^^
File “/omd/sites/master/lib/python3/cmk/gui/wato/pages/activate_changes.py”, line 1013, in page
activation_id = manager.start(
^^^^^^^^^^^^^^
File “/omd/sites/master/lib/python3.12/contextlib.py”, line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File “/omd/sites/master/lib/python3/cmk/gui/watolib/activate_changes.py”, line 1480, in start
self._start_activation()
File “/omd/sites/master/lib/python3.12/contextlib.py”, line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File “/omd/sites/master/lib/python3/cmk/gui/watolib/activate_changes.py”, line 1784, in _start_activation
raise result.error
cmk.gui.job_scheduler_client.StartupError: Could not connect to ui-job-scheduler. Possibly the service ui-job-scheduler is not started, please make sure that all site services are all started. Tried to connect via /omd/sites/master/tmp/run/ui-job-scheduler.sock. Reported error was: HTTPConnectionPool(host=‘local-ui-job-scheduler’, port=80): Max retries exceeded with url: /start (Caused by NewConnectionError(‘<urllib3.connection.HTTPConnection object at 0x7f6c2c40a330>: Failed to establish a new connection: [Errno -2] Name or service not known’)).
2025-05-08 13:34:50,756 [40] [cmk.web 97814] Unhandled exception (Crash ID: 73e07bf8-2c00-11f0-9dc5-005056941c49)
Traceback (most recent call last):
File “/omd/sites/master/lib/python3/cmk/gui/pages.py”, line 102, in handle_page
action_response = self.page()
^^^^^^^^^^^
File “/omd/sites/master/lib/python3/cmk/gui/wato/pages/activate_changes.py”, line 1013, in page
activation_id = manager.start(
^^^^^^^^^^^^^^
File “/omd/sites/master/lib/python3.12/contextlib.py”, line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File “/omd/sites/master/lib/python3/cmk/gui/watolib/activate_changes.py”, line 1480, in start
self._start_activation()
File “/omd/sites/master/lib/python3.12/contextlib.py”, line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File “/omd/sites/master/lib/python3/cmk/gui/watolib/activate_changes.py”, line 1784, in _start_activation
raise result.error
cmk.gui.job_scheduler_client.StartupError: Could not connect to ui-job-scheduler. Possibly the service ui-job-scheduler is not started, please make sure that all site services are all started. Tried to connect via /omd/sites/master/tmp/run/ui-job-scheduler.sock. Reported error was: HTTPConnectionPool(host=‘local-ui-job-scheduler’, port=80): Max retries exceeded with url: /start (Caused by NewConnectionError(‘<urllib3.connection.HTTPConnection object at 0x7f6c2c40a330>: Failed to establish a new connection: [Errno -2] Name or service not known’)).

Was $OMD_ROOT/var/log/update.log error-free?
Also, no proxy config defined for this site?

It’s solved

It was a very special and tricky combination of older libraries and MKPs.
In the end, an old urllib3 version that was in the instance’s path was being loaded,
so the call to the ui-job-scheduler was not possible.
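
A quick way to see whether this applies to a site is to ask Python where it would load urllib3 from. The tracebacks above show it coming from `/omd/sites/master/local/lib/python3/urllib3/`, i.e. from the site's `local` tree rather than from the version shipped with Checkmk. The following is only a sketch: the helper names `module_origin` and `is_under` are invented here, and `Path.is_relative_to` requires Python 3.9 or newer.

```python
import importlib.util
import os
from pathlib import Path


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin in (None, "built-in", "frozen"):
        return None
    return spec.origin


def is_under(path, root):
    """True if path lies inside root (symlinks resolved on both sides)."""
    return Path(path).resolve().is_relative_to(Path(root).resolve())


if __name__ == "__main__":
    # Assumes this runs as the site user, where OMD_ROOT is set.
    local_tree = Path(os.environ.get("OMD_ROOT", ".")) / "local"
    origin = module_origin("urllib3")
    if origin is None:
        print("urllib3 not found on this interpreter's path")
    elif is_under(origin, local_tree):
        print(f"urllib3 is shadowed by a local copy: {origin}")
    else:
        print(f"urllib3 comes from: {origin}")
```

If the reported path is inside the site's `local` directory, a manually installed copy takes precedence over the bundled one.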

We had to roll back and remove all the old packages and libraries with ‘pip’ etc.

now it’s fixed.

Thanks for your help

same error here. :confused:

Could you explain your solution for someone who expects an upgrade to work out-of-the-box?

edit: removing all the Python packages from ~/local/ solved the problem. They had been installed for redfish but are no longer needed in 2.4. This is how they had been installed:

pip3 install ‘urllib3<2’ redfish
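
To find out which manually installed packages are sitting in the site's local tree before an update, a sketch like this may help. It assumes the packages live in `$OMD_ROOT/local/lib/python3`, which is the directory seen in the tracebacks earlier in this thread. The function name `local_distributions` is hypothetical.

```python
import os
from importlib import metadata
from pathlib import Path


def local_distributions(site_packages):
    """Return sorted (name, version) pairs for packages installed in one directory."""
    found = []
    # Restrict the search to the given directory instead of the full sys.path.
    for dist in metadata.distributions(path=[str(site_packages)]):
        name = dist.metadata.get("Name") or "?"
        found.append((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    # Assumed location of pip-installed packages inside the site.
    local_lib = Path(os.environ.get("OMD_ROOT", ".")) / "local" / "lib" / "python3"
    for name, version in local_distributions(local_lib):
        print(f"{name}=={version}")
```

Anything listed here was installed into the site by hand (or by an extension) and is a candidate for the kind of conflict described above.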

You did it yourself :slight_smile::

removing all the python packages from ~/local/ solved the problem.

Or more generically: Follow the major upgrade article of the official user guide, and you should be all set.

I know. And unlike some, I did share the answer to help others. Hence the

Hello there!

I ran into the same problem: the upgrade seemingly works fine at first, but after some time, the ui-job-scheduler service stops and the “Could not connect to ui-job-scheduler.” error appears.
I tried the solution proposed here and uninstalled all Python packages I had installed for some custom special agents before updating to 2.4 following the update guide. I did not uninstall packages that listed a version under “/opt/omd/versions/…” in the uninstall prompt, like pip or charset_normalizer.
After the update completed, I reinstalled the missing packages (pyvmomi and vsphere-automation-sdk-python). This seemed to work for our test instance, which ran for a few days without the error. But when I also upgraded the prod instance, the problem came back (for both test and prod).
Both instances are running as separate Docker containers (raw edition).

Also, both times I attempted to downgrade the version after the failed update, it did not work: the container got stuck in a restart loop (without showing any errors). Both times I ultimately had to restore from backup.

Any further ideas why this could still happen?

I also wondered why the system Python is used for everything instead of venvs. For example, a separate Python venv for special agent execution and one for internal components like the ui-job-scheduler? Shouldn’t that help against problems like this?