Activation fails all of a sudden in distributed setup

Hi forum,
all of a sudden I am having trouble activating changes in my distributed setup.
One of my sites cannot activate the changes. The sync runs, and then after a while the infamous

RequestTimeout: Your request timed out after 110 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

appears in all its glory.
cmk --debug -vvR shows no errors, and no changes have been made to the Apache configuration.
Several reboots of the affected VM did not help.
By clicking on “Disregard changes!” I accidentally reset all the settings the users had made (theme and start URL); the LDAP<->contact group and role settings were reset as well. :frowning:
This all happened today, out of nowhere. All other sites sync as they should.
This site only holds about 250 of our 22500 hosts, so there is not much load on it either.

Maybe someone can give me a hint as to what else I could try.
1.6.0p14 CEE on Ubuntu 18.04

Full error:

Internal automation error: Failed to deploy configuration: "Traceback (most recent call last):
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/watolib/sites.py", line 732, in execute
    self._save_site_globals_on_slave_site(request.tar_content)
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/watolib/sites.py", line 759, in _save_site_globals_on_slave_site
    multitar.extract_from_buffer(tarcontent, [("dir", "sitespecific", tmp_dir)])
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/multitar.py", line 446, in extract_from_buffer
    extract(tarfile.open(None, "r", stream), elements)
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/multitar.py", line 670, in extract
    (name, traceback.format_exc()))
MKGeneralException: Failed to extract subtar sitespecific: Traceback (most recent call last):
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/multitar.py", line 666, in extract
    subtar = tarfile.open(fileobj=subtarstream)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1675, in open
    return func(name, "r", fileobj, **kwargs)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1778, in bz2open
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1723, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1587, in __init__
    self.firstmember = self.next()
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 2358, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1251, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 676, in read
    raw = self.fileobj.read(self.blocksize)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 831, in read
    buf += self.fileobj.read(size - len(buf))
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 743, in read
    return self.readnormal(size)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 756, in readnormal
    self.fileobj.seek(self.offset + self.position)
  File "/omd/sites/INFMON01_3/lib/python2.7/gzip.py", line 442, in seek
    self.read(1024)
  File "/omd/sites/INFMON01_3/lib/python2.7/gzip.py", line 267, in read
    self._read(readsize)
  File "/omd/sites/INFMON01_3/lib/python2.7/gzip.py", line 319, in _read
    self._add_read_data( uncompress )
  File "/omd/sites/INFMON01_3/lib/python2.7/gzip.py", line 335, in _add_read_data
    self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/htmllib.py", line 786, in handle_request_timeout
    "issue is a bug, please send a crash report.") % duration)
RequestTimeout: Your request timed out after 110 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

". Please note that the site configuration has been synchronized partially.
Traceback (most recent call last):
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/wato/pages/automation.py", line 186, in _execute_automation_command
    html.write(repr(automation.execute(automation.get_request())))
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/watolib/sites.py", line 745, in execute
    "partially.") % traceback.format_exc())
MKGeneralException: Failed to deploy configuration: "Traceback (most recent call last):
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/watolib/sites.py", line 732, in execute
    self._save_site_globals_on_slave_site(request.tar_content)
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/watolib/sites.py", line 759, in _save_site_globals_on_slave_site
    multitar.extract_from_buffer(tarcontent, [("dir", "sitespecific", tmp_dir)])
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/multitar.py", line 446, in extract_from_buffer
    extract(tarfile.open(None, "r", stream), elements)
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/multitar.py", line 670, in extract
    (name, traceback.format_exc()))
MKGeneralException: Failed to extract subtar sitespecific: Traceback (most recent call last):
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/multitar.py", line 666, in extract
    subtar = tarfile.open(fileobj=subtarstream)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1675, in open
    return func(name, "r", fileobj, **kwargs)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1778, in bz2open
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1723, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1587, in __init__
    self.firstmember = self.next()
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 2358, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 1251, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 676, in read
    raw = self.fileobj.read(self.blocksize)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 831, in read
    buf += self.fileobj.read(size - len(buf))
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 743, in read
    return self.readnormal(size)
  File "/omd/sites/INFMON01_3/lib/python2.7/tarfile.py", line 756, in readnormal
    self.fileobj.seek(self.offset + self.position)
  File "/omd/sites/INFMON01_3/lib/python2.7/gzip.py", line 442, in seek
    self.read(1024)
  File "/omd/sites/INFMON01_3/lib/python2.7/gzip.py", line 267, in read
    self._read(readsize)
  File "/omd/sites/INFMON01_3/lib/python2.7/gzip.py", line 319, in _read
    self._add_read_data( uncompress )
  File "/omd/sites/INFMON01_3/lib/python2.7/gzip.py", line 335, in _add_read_data
    self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
  File "/omd/sites/INFMON01_3/lib/python/cmk/gui/htmllib.py", line 786, in handle_request_timeout
    "issue is a bug, please send a crash report.") % duration)
RequestTimeout: Your request timed out after 110 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.

". Please note that the site configuration has been synchronized partially.

That was run on the slave and went through without problems, right?
The only thing I can see in the error message is that it cannot extract all of the transferred tar files.
All you can do is look inside “~/var/check_mk/wato” to see whether any defective files are left over from the first failed sync.
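
If any archives are left lying around there, a rough sketch like the following could help to tell whether they can still be read completely. This is not part of Checkmk: the function names `check_tar` and `scan` are made up for illustration, it is written for Python 3 rather than the site's bundled Python 2.7, and whether the failed sync leaves `*.tar*` files in that directory at all is an assumption.

```python
import tarfile
import time
import zlib
from pathlib import Path


def check_tar(path):
    """Read every member of a tar archive; return (ok, seconds, detail)."""
    started = time.monotonic()
    try:
        # "r:*" lets tarfile auto-detect plain, gzip, bz2 or xz archives
        with tarfile.open(path, "r:*") as archive:
            count = 0
            for member in archive:
                if member.isfile():
                    # Reading the payload forces decompression and CRC checks
                    payload = archive.extractfile(member)
                    while payload.read(1024 * 1024):
                        pass
                count += 1
        return True, time.monotonic() - started, "members=%d" % count
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        return False, time.monotonic() - started, repr(exc)


def scan(directory):
    """Check every *.tar* file below directory (recursively)."""
    results = {}
    for path in sorted(Path(directory).rglob("*.tar*")):
        if path.is_file():
            results[str(path)] = check_tar(path)
    return results
```

Run as the site user, `scan(Path.home() / "var/check_mk/wato")` returns one entry per archive. A file that reports `False`, or one that takes very long to read, would be the first candidate to look at.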

Yes, this was on the slave site that fails.
Could this be connected to the following thread? “Bug version 1.6.0p14?”

I don’t think so as there is no active livestatus-proxy communication involved.

Hi Andreas,
thank you for your really fast responses, but unfortunately the problem is not solved.
Now our biggest slave site, which holds 11k hosts and ~120k services, is also affected on every 2nd to 3rd activation. But there I can solve the problem with an “omd restart”.
The master site and the second biggest slave site are not affected.
I had a look in the directory you mentioned.
The slave sites look like this:

drwxrwxr-x 2 INFMON01_3 INFMON01_3 4096 Jul 30 07:26 auth/
-rw-rw---- 1 INFMON01_3 INFMON01_3 35 Jan 27 2020 automation_secret.mk
-rw-rw---- 1 INFMON01_3 INFMON01_3 8 Jan 27 2020 last_bake.mk
drwxrwxr-x 2 INFMON01_3 INFMON01_3 4096 Jul 30 05:24 log/
drwxrwx--- 2 INFMON01_3 INFMON01_3 4096 Jul 22 05:44 php-api/
drwxrwxr-x 2 INFMON01_3 INFMON01_3 4096 Jan 27 2020 snapshots/
The master site looks similar but has some additional files in it:
-rw-rw---- 1 INFMON01 INFMON01 0 Jul 30 09:53 replication_changes_INFMON01.mk
-rw-rw---- 1 INFMON01 INFMON01 632 Jul 30 09:51 replication_changes_INFMON01_1.mk
-rw-rw---- 1 INFMON01 INFMON01 632 Jul 30 09:51 replication_changes_INFMON01_2.mk
-rw-rw---- 1 INFMON01 INFMON01 2978 Jul 30 09:51 replication_changes_INFMON01_3.mk
-rw-rw---- 1 INFMON01 INFMON01 129 Jul 30 09:53 replication_status_INFMON01.mk
-rw-rw---- 1 INFMON01 INFMON01 193 Jul 30 09:58 replication_status_INFMON01_1.mk
-rw-rw---- 1 INFMON01 INFMON01 227 Jul 30 10:01 replication_status_INFMON01_2.mk
-rw-rw---- 1 INFMON01 INFMON01 227 Jul 30 10:01 replication_status_INFMON01_3.mk

I just noticed that the “Discard changes” action also reset the notifications.
I just got them back from a backup.

BR

In my opinion this is the worst option, and it should be marked as very, very dangerous in distributed setups.

On my system here there is one extra folder, “/activation”.
The replication changes and status files are OK. From these files your master knows which sites need a config pushed and activated.

The only thing you can do now is to watch your web server log at the time a single site is activated. Please try the sites one by one, not all together, to get a better picture of what happens.
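
To narrow the log down to the moment of one activation, a small filter like this might be enough. It is only a sketch: the helper names are invented, it assumes the Apache 2.4 error log timestamp style (`[Thu Jul 30 09:51:02.123456 2020]`) with English day and month names, and the log location (probably `~/var/log/apache/error_log` inside the site) is an assumption you should verify on your system.

```python
import re
from datetime import datetime, timedelta

# Assumed line start: [Thu Jul 30 09:51:02.123456 2020] [module:level] ...
STAMP = re.compile(
    r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2})(?:\.\d+)? (\d{4})\]"
)


def parse_stamp(line):
    """Return the timestamp at the start of a log line, or None."""
    match = STAMP.match(line)
    if not match:
        return None
    # %a and %b are locale dependent; this expects English names
    return datetime.strptime("%s %s" % match.groups(), "%a %b %d %H:%M:%S %Y")


def lines_around(lines, moment, seconds=120):
    """Yield lines stamped within +/- seconds of moment.

    Lines without a timestamp (e.g. traceback continuations) follow
    the decision made for the last stamped line before them.
    """
    window = timedelta(seconds=seconds)
    keep = False
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None:
            keep = abs(stamp - moment) <= window
        if keep:
            yield line
```

Open the log of the site you are testing, pass the file object as `lines` together with the time you clicked “Activate” as a `datetime`, and print what comes back. Doing this for one site per activation keeps the output of the sites apart.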

Hi Andreas,
the situation was: notifications lost, synced AD users lost, assignment of local users to groups lost. All with a single click :wink:
I restored all involved VMs entirely from a backup and everything is working.

BR Thomas