Distributed Monitoring - Activation of remote sites failed

Hello,

  • 1 central instance
  • 2 fresh installs on 2 servers for distributed monitoring
    • instances are configured for DM
    • can be reached, login was configured successfully
  • activation of config fails with attached error messages

Any ideas?

Kind regards,

Sebastian

CMK version: 2.0.0p21 Raw Edition
OS version: Ubuntu 21.10 Server

Error message:

Started at: 12:17:54. Finished at: 12:17:55.
Got invalid data:

Internal automation error: Error running automation call restart (exit code 1), error: 

Nagios Core 3.5.1
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-30-2013
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config directory '/omd/sites/it_monitor_01/etc/nagios/conf.d'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/check_mk_templates.cfg'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/jmx4perl_nagios.cfg'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/templates.cfg'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/mkeventd_notifications.cfg'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/check_mk_objects.cfg'...
<div class=err>Error: Could not find any contactgroup matching 'all' (config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/mkeventd_notifications.cfg', starting on line 5)</div>
   Error processing object config files!


***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

An error occurred: Error creating configuration: Configuration for monitoring core is invalid. Rolling back. The broken file has been copied to "/omd/sites/it_monitor_01/tmp/check_mk/check_mk_objects.cfg.broken" for analysis.


Traceback (most recent call last):
  File "/omd/sites/it_monitor_01/lib/python3/cmk/gui/wato/pages/automation.py", line 170, in _execute_automation_command
    html.write(repr(automation.execute(automation.get_request())))
  File "/omd/sites/it_monitor_01/lib/python3/cmk/gui/wato/pages/activate_changes.py", line 670, in execute
    return cmk.gui.watolib.activate_changes.execute_activate_changes(request.domains)
  File "/omd/sites/it_monitor_01/lib/python3/cmk/gui/watolib/activate_changes.py", line 1792, in execute_activate_changes
    warnings = domain_class().activate()
  File "/omd/sites/it_monitor_01/lib/python3/cmk/gui/watolib/config_domains.py", line 60, in activate
    return check_mk_local_automation(config.wato_activation_method)
  File "/omd/sites/it_monitor_01/lib/python3/cmk/gui/watolib/automations.py", line 143, in check_mk_local_automation
    raise _local_automation_failure(command=command,
cmk.utils.exceptions.MKGeneralException: Error running automation call restart (exit code 1), error: 

Nagios Core 3.5.1
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-30-2013
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config directory '/omd/sites/it_monitor_01/etc/nagios/conf.d'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/check_mk_templates.cfg'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/jmx4perl_nagios.cfg'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/templates.cfg'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/mkeventd_notifications.cfg'...
Processing object config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/check_mk_objects.cfg'...
<div class=err>Error: Could not find any contactgroup matching 'all' (config file '/omd/sites/it_monitor_01/etc/nagios/conf.d/mkeventd_notifications.cfg', starting on line 5)</div>
   Error processing object config files!


***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

An error occurred: Error creating configuration: Configuration for monitoring core is invalid. Rolling back. The broken file has been copied to "/omd/sites/it_monitor_01/tmp/check_mk/check_mk_objects.cfg.broken" for analysis.

Is it possible that you removed the default contact group “all” from your central instance?

Hello,

I did not actively remove the contact group “all”.
On the central instance, under "Setup > Users > Contact groups", the default contact group “all” is present.

On the remote instances I never touched the configuration of the created sites.
Additional info: I did configure distributed monitoring, as mentioned in my initial post.

Kind regards,

Sebastian

BTW:
I intentionally created this topic in English because I thought this might enable more users to answer.
German is possible too.

As I understand it, the response from the remote site is not as expected because the Nagios configs are invalid. Am I right?

    try:
        response = ast.literal_eval(response)
    except SyntaxError:
        # The remote site will send non-Python data in case of an error.
        raise MKAutomationException("%s: <pre>%s</pre>" % (_("Got invalid data"), response))

Excerpt from weblog file:

2022-03-22 18:09:09,019 [40] [cmk.web.site[it_monitor_01] 3463560] error activating changes
Traceback (most recent call last):
  File "/omd/sites/it_monitor/lib/python3/cmk/gui/watolib/automations.py", line 304, in do_remote_automation
    response = ast.literal_eval(response)
  File "/omd/sites/it_monitor/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/omd/sites/it_monitor/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    Internal automation error: Error running automation call <tt>restart</tt> (exit code 1), error: <pre>
             ^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/it_monitor/lib/python3/cmk/gui/watolib/activate_changes.py", line 1388, in _do_run
    configuration_warnings = self._do_activate()
  File "/omd/sites/it_monitor/lib/python3/cmk/gui/watolib/activate_changes.py", line 1651, in _do_activate
    configuration_warnings = self._call_activate_changes_automation()
  File "/omd/sites/it_monitor/lib/python3/cmk/gui/watolib/activate_changes.py", line 1664, in _call_activate_changes_automation
    response = cmk.gui.watolib.automations.do_remote_automation(
  File "/omd/sites/it_monitor/lib/python3/cmk/gui/watolib/automations.py", line 307, in do_remote_automation
    raise MKAutomationException("%s: <pre>%s</pre>" % (_("Got invalid data"), response))
cmk.gui.watolib.automations.MKAutomationException: Got invalid data: <pre>Internal automation error: Error running automation call <tt>restart</tt> (exit code 1), error: <pre>
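
The failure mode in the traceback above can be reproduced in isolation. The following is a minimal sketch, not Checkmk's actual code: `RemoteAutomationError` and `parse_remote_response` are hypothetical names standing in for `MKAutomationException` and the parsing step in `do_remote_automation`, and catching `ValueError` in addition to `SyntaxError` is an assumption added here. It shows that a plain-text error message from the remote site cannot be parsed as a Python literal, which is why the central site reports "Got invalid data".

```python
import ast


class RemoteAutomationError(Exception):
    """Hypothetical stand-in for Checkmk's MKAutomationException."""


def parse_remote_response(response: str):
    """Parse a remote site's reply, which is expected to be a Python literal."""
    try:
        return ast.literal_eval(response)
    except (SyntaxError, ValueError) as exc:
        # Plain-text error output from the remote site is not a Python
        # literal, so it ends up here instead of being returned as data.
        raise RemoteAutomationError(f"Got invalid data: {response}") from exc


if __name__ == "__main__":
    # A well-formed reply (a list of warnings) parses fine.
    print(parse_remote_response("['warning 1', 'warning 2']"))

    # The error text sent by the remote site does not.
    try:
        parse_remote_response(
            "Internal automation error: Error running automation call "
            "restart (exit code 1)"
        )
    except RemoteAutomationError as err:
        print(err)
```

So the "Got invalid data" message on the central site is only a wrapper; the real cause is the Nagios configuration error reported by the remote site.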

I found the solution:

  1. Add contact group “all” to remote sites in
    • /omd/sites/it_monitor_01/etc/nagios/conf.d/check_mk_templates.cfg
    • /omd/sites/it_monitor_02/etc/nagios/conf.d/check_mk_templates.cfg
  2. Activate changes on remote sites
    • Status becomes green
  3. Test: change a configured host to be monitored on a remote site
    • Error: contact group “all” is duplicate
  4. Remove contact group “all” from check_mk_templates.cfg on both remote sites
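
Both errors in this thread (contact group “all” missing, then duplicate) come down to how often the group is defined in the site's Nagios object files. The sketch below is one way to check that before activating changes. It assumes the standard Nagios object syntax (`define contactgroup { contactgroup_name ... }`); the function names are hypothetical, and the parsing is deliberately simple (for example, it ignores braces inside comments).

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches "define contactgroup { ... }" blocks in Nagios object config text.
_BLOCK_RE = re.compile(r"define\s+contactgroup\s*\{(.*?)\}", re.DOTALL)
# Captures the value of contactgroup_name up to a ';' comment or end of line.
_NAME_RE = re.compile(r"^\s*contactgroup_name\s+([^;\n]+)", re.MULTILINE)


def contactgroups_in_text(text: str) -> List[str]:
    """Return the contact group names defined in one config text, in order."""
    names = []
    for block in _BLOCK_RE.finditer(text):
        match = _NAME_RE.search(block.group(1))
        if match:
            names.append(match.group(1).strip())
    return names


def contactgroups_in_dir(conf_dir: str) -> Dict[str, List[str]]:
    """Map each contact group name to the .cfg files that define it."""
    found: Dict[str, List[str]] = {}
    for cfg in sorted(Path(conf_dir).glob("*.cfg")):
        for name in contactgroups_in_text(cfg.read_text(errors="replace")):
            found.setdefault(name, []).append(cfg.name)
    return found


if __name__ == "__main__":
    # Path taken from the error message in this thread.
    groups = contactgroups_in_dir("/omd/sites/it_monitor_01/etc/nagios/conf.d")
    files = groups.get("all", [])
    if not files:
        print("contact group 'all' is not defined")
    elif len(files) > 1:
        print("contact group 'all' is defined more than once:", files)
    else:
        print("contact group 'all' is defined in", files[0])
```

An empty result corresponds to the original "Could not find any contactgroup matching 'all'" error, and more than one entry corresponds to the duplicate error seen in step 3.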

I tested the monitoring for the host which I “moved” to the remote sites and it does indeed work.

Kind regards,

Sebastian

Hello,

One more question: Is the historic data for a host and its services lost when the host is moved to a new site?

Kind regards,

Sebastian

Hi Sebastian,
Yes, that's correct.
One possible solution is to copy the historic data from the old site to the new site. This is not very practical in general, but it may work for your scenario.

Regards
Christian

Hello Christian,

I would like to give it a try and copy the data.
Currently I am wondering where the relevant data is stored.

Which folders should I copy for this?
I guess:

├── pnp4nagios
│   ├── perfdata
├── rrdcached

Any more?

:~# tree -L 2 -d /omd/sites/it_monitor/var
/omd/sites/it_monitor/var
├── check_mk
│   ├── autochecks
│   ├── background_jobs
│   ├── backup
│   ├── core
│   ├── crashes
│   ├── discovered_host_labels
│   ├── inventory
│   ├── inventory_archive
│   ├── inventory_delta_cache
│   ├── license_usage
│   ├── logwatch
│   ├── notify
│   ├── packages
│   ├── persisted
│   ├── precompiled
│   ├── precompiled_checks
│   ├── site_configs
│   ├── snmp_cache
│   ├── snmpwalks
│   ├── update_config
│   ├── wato
│   └── web
├── log
│   ├── apache
│   └── mkeventd
├── mkeventd
│   └── history
├── monitoring-plugins
├── nagios
│   └── archive
├── nagvis
│   └── profiles
├── omd
├── pnp4nagios
│   ├── log
│   ├── perfdata
│   ├── spool
│   └── stats
├── redis
├── rrdcached
├── ssl
├── tmp
└── www

44 directories

Kind regards,

Sebastian

Hi @sefr,
as far as I know, you'll find the RRD data for your systems in this folder:
~/var/check_mk/rrd

You'll need to copy the related folder (the one named after the host) from site A to site B.

I never did this, so maybe someone else should confirm this or you could test this first in a test environment.

regards

Hello,

Folders containing "rrd" in their name:

:~# find  /omd/sites/it_monitor/ -type d -name '*rrd*'           
/omd/sites/it_monitor/etc/check_mk/rrdcached.d
/omd/sites/it_monitor/etc/rrdcached.d
/omd/sites/it_monitor/tmp/rrdcached
/omd/sites/it_monitor/var/rrdcached
/omd/sites/it_monitor/.version_meta/skel/etc/rrdcached.d
/omd/sites/it_monitor/.version_meta/skel/tmp/rrdcached
/omd/sites/it_monitor/.version_meta/skel/var/rrdcached

I will try. Thanks for your advice. I think this topic can be considered “closed” now.
For any further questions I will create a new topic.

Best regards,

Sebastian

I think you do not need to copy every RRD folder to the other site.
Just the folder named after the host.
E.g.:

OMD[slave1]:~/var/check_mk/rrd$ ls -al
drwxr-xr-x. 77  slave1 slave1  4096 Mar 10 14:44 ./
drwxr-xr-x. 27  slave1 slave1  4096 Mar 23 10:33 ../
drwxrwx---.  2  slave1 slave1    39 Mar 11  2020 host1.domain.com/
drwxrwx---.  2  slave1 slave1    39 Mar 11  2020 host2.domain.com/
drwxrwx---.  2  slave1 slave1    39 Mar 11  2020 host3.domain.com/
...

OMD[slave1]:~/var/check_mk/rrd/host1.domain.com$ ls -al
drwxrwx---.  2 slave1 slave1     4096 Jan  7 11:10 ./
drwxr-xr-x. 77 slave1 slave1     4096 Mar 10 14:44 ../
-rw-rw----.  1 slave1 slave1      144 Jan  7 10:54 Check_MK.info
-rw-r--r--.  1 slave1 slave1  2301312 Mar 23 11:06 Check_MK.rrd
-rw-rw----.  1 slave1 slave1       71 Jan  7 10:54 CPU_load.info
...

Search for the folder of the host you moved to the other site and copy the folder e.g. host1.domain.com from old site to new site.

Just to clarify: the folder you describe does not exist on my machine.

  • checkmk raw edition
  • installed as .deb-file

It should be there as described in the docs: Performance data and graphing - Evaluating measured values in Checkmk quickly and easily

OMD[it_monitor]:~$ ls -la var/check_mk/
total 16
drwxr-xr-x 1 it_monitor it_monitor  614 Mar 23 00:05 ./
drwxr-xr-x 1 it_monitor it_monitor  170 Mar 15 15:27 ../
-rw-rw---- 1 it_monitor it_monitor 2303 May 28  2021 acknowledged_werks.mk
drwxr-xr-x 1 it_monitor it_monitor 1314 Mar 16 18:13 autochecks/
drwxr-x--- 1 it_monitor it_monitor 1582 Mar 23 12:27 background_jobs/
drwxrwxr-x 1 it_monitor it_monitor  190 Mar 23 12:01 backup/
drwxr-xr-x 1 it_monitor it_monitor   26 Mar 15 15:27 core/
drwxrwxr-x 1 it_monitor it_monitor   38 Mar 15 15:27 crashes/
drwxrwx--- 1 it_monitor it_monitor  734 Mar 16 18:13 discovered_host_labels/
drwxrwx--- 1 it_monitor it_monitor 1782 Mar 22 23:42 inventory/
drwxrwxr-x 1 it_monitor it_monitor  494 Mar 17 18:16 inventory_archive/
drwxrwx--- 1 it_monitor it_monitor  104 Mar 15 18:14 inventory_delta_cache/
-rw-rw---- 1 it_monitor it_monitor 1416 Mar 23 00:05 ipaddresses.cache
drwxrwx--- 1 it_monitor it_monitor   70 Mar 23 10:30 license_usage/
drwxrwxr-x 1 it_monitor it_monitor   16 Mar 15 15:27 logwatch/
drwxrwxr-x 1 it_monitor it_monitor   64 Mar 23 12:19 notify/
drwxr-xr-x 1 it_monitor it_monitor   14 Mar 15 15:27 packages/
drwxr-xr-x 1 it_monitor it_monitor  424 Mar 23 12:27 persisted/
drwxr-xr-x 1 it_monitor it_monitor    0 May 28  2021 precompiled/
drwxr-xr-x 1 it_monitor it_monitor   24 Mar 15 15:27 precompiled_checks/
-rw-rw---- 1 it_monitor it_monitor    3 Dec  9 12:34 report_schedule.py
drwxrwx--- 1 it_monitor it_monitor    0 Mar 22 19:34 site_configs/
drwxrwx--- 1 it_monitor it_monitor  396 Mar 15 15:27 snmp_cache/
drwxr-xr-x 1 it_monitor it_monitor   94 Mar 15 15:27 snmpwalks/
-rw-rw---- 1 it_monitor it_monitor   34 Mar 22 18:27 stored_passwords
drwxrwx--- 1 it_monitor it_monitor   36 Mar 15 15:27 update_config/
drwxrwxr-x 1 it_monitor it_monitor  624 Mar 23 11:07 wato/
drwxrwxr-x 1 it_monitor it_monitor  304 Mar 15 15:28 web/
OMD[it_monitor]:~$ ls -la var/check_mk/rrd
ls: cannot access 'var/check_mk/rrd': No such file or directory

Then the RRD folder is “~/var/pnp4nagios/perfdata/”.

~/var/check_mk/rrd/ only exists on Enterprise installations.
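
Putting the two answers together, copying a moved host's history could be sketched as below. This is an untested outline under several assumptions: both sites run the same edition (so the RRD layout matches), the site home directories are reachable from one machine, and the copy is run as a user who may read the source and write the destination. The function names are hypothetical. It does not stop the sites, flush rrdcached, or fix file ownership, all of which may be needed in practice, so try it in a test environment first.

```python
import shutil
from pathlib import Path

# Candidate locations relative to a site's home directory, per this thread:
# the first exists on Enterprise installations, the second on the Raw Edition.
CANDIDATE_DIRS = ("var/check_mk/rrd", "var/pnp4nagios/perfdata")


def find_rrd_dir(site_home: str) -> Path:
    """Return the first candidate RRD directory that exists in the site."""
    for rel in CANDIDATE_DIRS:
        candidate = Path(site_home) / rel
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no RRD directory found under {site_home}")


def copy_host_history(src_site_home: str, dst_site_home: str, hostname: str) -> Path:
    """Copy one host's RRD folder from the old site to the new site."""
    src = find_rrd_dir(src_site_home) / hostname
    if not src.is_dir():
        raise FileNotFoundError(f"no history folder for {hostname} in {src.parent}")
    dst = find_rrd_dir(dst_site_home) / hostname
    # dirs_exist_ok (Python 3.8+) lets the copy merge into a folder the new
    # site may already have created for the host; same-named files are
    # overwritten.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


if __name__ == "__main__":
    # Site names from this thread; the host name is a placeholder.
    print(copy_host_history("/omd/sites/it_monitor",
                            "/omd/sites/it_monitor_01",
                            "host1.domain.com"))
```

Only the folder of the moved host is copied, as suggested above, rather than the whole RRD directory.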

Thank you, I had an idea that this might be the corresponding folder.