Internal automation error: Your request timed out after 110 seconds

CMK version: 2.1.0p24
OS version: CentOS 7

Error message: Internal automation error: Your request timed out after 110 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

I haven’t included further logs yet, but I’m happy to do so later. I wanted to start a discussion about the best next steps for one of my remote sites in a distributed monitoring setup that is beginning to have major activation issues. The issue is pretty simple: no matter what I do, activation always fails with this error and never completes within 110 seconds, which means I cannot push distributed monitoring changes to the site.

It’s a big site: ~2000 hosts and ~176,000 services. That much I know. I’m already looking for ways to pare it down a bit, either by introducing new hardware to split some of the hosts off to, or maybe even by creating a second site on the same hardware, so that functions such as activation run in parallel on fewer hosts at a time (the hardware is beefy enough to spare a second thread for things like activation).

But, my concern is that due to the issues I’m running up against, I can’t easily move a host at the master level from one monitoring site to another, and as a result, I’m worried about losing data and getting things out of sync.

So I guess there are multiple questions here:

  1. What are the best things I can do to cut down activation time and give this thing a fighting chance of activating, even if only occasionally? When I do local activations (cmk -O and cmk -R), the activation works, but of course the local site then thinks there are zero pending changes, while the distributed monitoring is still trying to push changes there.

  2. Would having multiple sites on the same hardware be a potentially good idea to fix this issue?

  3. If I am not able to fix the activation issue, would the best solution be to manually move performance data to the second site, and then delete/add hosts at a local level, and when I’ve done so to my satisfaction, then resynchronize to the master instance?

All ideas appreciated here. Thanks!

One added quirk. There are 2 “changes” which aren’t activating according to the master server. However, when I look at the local server, the requested changes are there.

The full error:

Internal automation error: Error running automation call restart: Your request timed out after 110 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.
Traceback (most recent call last):
  File "/omd/sites/lsvmaster/lib/python3/cmk/gui/watolib/automations.py", line 103, in check_mk_local_automation_serialized
    completed_process = subprocess.run(
  File "/omd/sites/lsvmaster/lib/python3.9/subprocess.py", line 507, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/omd/sites/lsvmaster/lib/python3.9/subprocess.py", line 1134, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/omd/sites/lsvmaster/lib/python3.9/subprocess.py", line 1979, in _communicate
    ready = selector.select(timeout)
  File "/omd/sites/lsvmaster/lib/python3.9/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/omd/sites/lsvmaster/lib/python3/cmk/gui/utils/timeout_manager.py", line 37, in handle_request_timeout
    raise RequestTimeout(
cmk.gui.exceptions.RequestTimeout: Your request timed out after 110 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/lsvmaster/lib/python3/cmk/gui/wato/pages/automation.py", line 266, in _execute_automation_command
    response.set_data(repr(automation.execute(automation.get_request())))
  File "/omd/sites/lsvmaster/lib/python3/cmk/gui/wato/pages/activate_changes.py", line 761, in execute
    return activate_changes.execute_activate_changes(
  File "/omd/sites/lsvmaster/lib/python3/cmk/gui/watolib/activate_changes.py", line 2091, in execute_activate_changes
    warnings = get_config_domain(domain_request.name)().activate(domain_request.settings)
  File "/omd/sites/lsvmaster/lib/python3/cmk/gui/watolib/config_domains.py", line 74, in activate
    return {"restart": restart, "reload": reload,}[
  File "/omd/sites/lsvmaster/lib/python3/cmk/gui/watolib/check_mk_automations.py", line 277, in restart
    _automation_serialized("restart", args=hosts_to_update),
  File "/omd/sites/lsvmaster/lib/python3/cmk/gui/watolib/check_mk_automations.py", line 67, in _automation_serialized
    cmdline, serialized_result = check_mk_local_automation_serialized(
  File "/omd/sites/lsvmaster/lib/python3/cmk/gui/watolib/automations.py", line 113, in check_mk_local_automation_serialized
    raise local_automation_failure(command=command, cmdline=cmd, exc=e)
cmk.utils.exceptions.MKGeneralException: Error running automation call restart: Your request timed out after 110 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.
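
Reading the traceback, the timeout is not raised by `subprocess.run` itself: `RequestTimeout` comes out of `handle_request_timeout` while the GUI process is still blocked waiting for the `restart` automation to finish. That looks like a timer interrupting a blocking call. A minimal Python sketch of that general pattern, assuming a signal-based timer (the `RequestTimeout` class and `run_with_request_timeout` below are illustrative stand-ins, not Checkmk's actual code):

```python
import signal
import subprocess
import sys


class RequestTimeout(Exception):
    """Illustrative stand-in; not the class from cmk.gui.exceptions."""


def _handle_timeout(signum, frame):
    # Runs in the main thread when the timer fires, even while the
    # process is blocked waiting on a child process.
    raise RequestTimeout("request timed out")


def run_with_request_timeout(cmd, seconds):
    """Run cmd; raise RequestTimeout if it is still running after `seconds`.

    Assumption: Unix only, and called from the main thread, because it
    relies on SIGALRM.
    """
    old_handler = signal.signal(signal.SIGALRM, _handle_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return subprocess.run(cmd, capture_output=True, check=False)
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
```

In this sketch the child process is killed as soon as the timer fires, so a core restart that needs longer than the limit never gets the chance to finish, no matter how healthy the site is otherwise.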

This is quite common on larger sites, and with multiple sites in distributed monitoring. We have some sites with over 5000 hosts.
Checkmk was not built for this scenario.

The activation is initially performed over HTTP(S) by contacting the remote site, so the request passes through the Apache reverse proxy and then the site's own Apache config.

These values (110) are timeout values that are hard-coded in Checkmk and Apache. They can be changed, of course.

Just search for checkmk 110 timeout and you will get a lot of hints on how to change this!
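
Before raising any limit, it can help to know how long the core restart actually takes on the remote site, so you know how far over 110 seconds you are. A rough sketch to run as the site user on the remote site; `cmk -R` is the command mentioned above, while `time_command`, `exceeds_limit`, and `TIMEOUT_SECONDS` are just illustrative names:

```python
import subprocess
import sys
import time

TIMEOUT_SECONDS = 110  # the limit quoted in the error message


def time_command(cmd):
    """Run cmd to completion and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(cmd, check=False)
    return completed.returncode, time.monotonic() - start


def exceeds_limit(elapsed, limit=TIMEOUT_SECONDS):
    """True if a run of `elapsed` seconds would not fit within the limit."""
    return elapsed > limit
```

For example, `code, elapsed = time_command(["cmk", "-R"])` followed by `exceeds_limit(elapsed)` tells you whether a local restart alone already takes longer than the limit. Keep in mind that this really restarts the monitoring core, and that the remote activation also includes the HTTP round trip and config sync on top of that.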
