Distributed monitoring - no notifications after upgrade to 2.0

CMK version:
Distributed monitoring Central Site: 1.6.0p24
Distributed monitoring working Remote Site: 1.6.0p24
Distributed monitoring not working Remote Site: 2.0.0p19
OS version:
CentOS 7

Error message:
Hello,

We have a big problem in our company. We have had a CMK site for about 4 years and have been upgrading it constantly. We want to move forward to 2.0, but we have spotted numerous problems. The biggest one is that the remote site sends no notifications after the update.

  • after the upgrade to 2.0, port 6555 is closed, so no notifications can be pushed to the master site,
  • the OS firewall and the hardware firewall are fine, because everything worked before the upgrade,
  • the file “omd/sites/sitename/etc/check_mk/mknotifyd.d/wato/sitespecific.mk” is empty, whereas in 1.6 the configuration was there and working fine; we tried to copy it over, but that doesn’t work,
  • the command “netstat -tulpn | grep :65” does not show 6555 on 2.0, but on 1.6 the same command shows “LISTEN 3982/python”,
  • the file “/omd/sites/“site name”/var/log/notify.log” looks different in the two versions:
    2.0:
    2022-02-22 00:08:32,944 [20] [cmk.base.notify] Got spool file 616d186f (“hostname”;Service “service name”) for local delivery via mail
    2022-02-22 00:08:32,944 [20] [cmk.base.notify] executing /omd/sites/“site name”/share/check_mk/notifications/mail
    1.6:
    2022-02-19 00:21:26 * notifying helpdesk via mail, parameters: smtp, host_subject, from, service_subject, elements, bulk: no
    2022-02-19 00:21:26 executing /omd/sites/“site name”/share/check_mk/notifications/mail
    2022-02-19 00:22:27 ----------------------------------------------------------------------

For now we are not able to update all of our clients because of this error. We updated one site, and after one week we noticed that there were no notifications. We need to repair this first and then move forward.
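
The port check from the list above can also be scripted, so that it can be run from the master against each remote site. The following is only a minimal sketch: `is_port_open` is a made-up helper name, the host in the usage part is a placeholder, and 6555 is simply the port mentioned in this post (yours may be configured differently). It tests whether a TCP connection can be established, nothing more.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, timeouts and name resolution errors
        return False


if __name__ == "__main__":
    # Assumption: 6555 is the notification spooler port from this post.
    # Replace 127.0.0.1 with the address of the remote site to test.
    host, port = "127.0.0.1", 6555
    state = "open" if is_port_open(host, port) else "closed or unreachable"
    print(f"{host}:{port} is {state}")
```

A closed result only says that nothing accepted the connection. It does not tell you whether the listener is missing or a firewall is in the way, so it complements the netstat output on the site itself rather than replacing it.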

It’s my first case. I will provide any necessary information, but I don’t know where to look for it.

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

AFAIK the versions need to be the same, as mentioned here: Distributed monitoring - Scaling and distributing Checkmk

The Checkmk-version (e.g. 2.0.0) of the remote and the central site is the same — mixed versions are not supported. The patch level (e.g. p11) may be different in the most cases. But in rare cases there will be an incompatibility even here. You are then obligated to stick to the exact version (e.g. 2.0.0p11) on all sites to allow both sites to work with each other without any problems. Please notice because of that always the incompatible changes at each patch version in the Werk list.

Hi,

But this documentation is from “Last modified on 16-Sep-2019”. For a year and a half we have had mixed versions, all within the 1.6 range, and everything worked fine.

I couldn’t find anyone on the forum who had the same problem. I wanted to update everything, but I’m afraid there will be no turning back and I will have to deploy the whole environment from the beginning.

When I upgraded the master to 2.0 yesterday, I couldn’t perform any changes in WATO/System configuration because of “Failed to create site config directory”, so I first wanted to fix notifications, as a core functionality.

Sincerely,

Unfortunately, I don’t have the same environment, but if everything is to be managed centrally and a configuration already exists on the other instance, it probably has to be overwritten.

I would create an update beforehand and import it to the master instance.

Hi @blazej.s

But this documentation is from “Last modified on 16-Sep-2019”.

Just because the documentation is from 2019 doesn’t make it wrong. It’s still true.

For a year and a half we have had mixed versions, all within the 1.6 range, and everything worked fine.

If I understand you correctly, you were previously using different patch levels (p…) of 1.6. That (normally) works, as @Man-in-Black already quoted. So mixing 1.6p17 and 1.6p12 would normally work.

But mixing two different major versions of Checkmk (so a 1.6 and a 2.0) will not work.
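
If you want to check this across many sites, the documentation rule quoted above (same version such as 2.0.0, patch level may differ) can be sketched in a few lines of Python. This is only an illustration: `parse_version` and `same_release` are made-up names, not Checkmk APIs, and the pattern only understands stable version strings of the form 1.6.0p24.

```python
import re
from typing import NamedTuple


class CmkVersion(NamedTuple):
    major: int
    minor: int
    sub: int
    patch: int


# Assumption: only stable versions like "2.0.0" or "2.0.0p19" are handled.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:p(\d+))?$")


def parse_version(text: str) -> CmkVersion:
    """Split a version string such as '1.6.0p24' into its numeric parts."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, sub, patch = match.groups()
    # A missing patch level (plain "2.0.0") is treated as patch 0
    return CmkVersion(int(major), int(minor), int(sub), int(patch or 0))


def same_release(central: str, remote: str) -> bool:
    """True if both sites run the same version, ignoring the patch level."""
    return parse_version(central)[:3] == parse_version(remote)[:3]


print(same_release("1.6.0p17", "1.6.0p12"))  # True: only the patch level differs
print(same_release("1.6.0p24", "2.0.0p19"))  # False: 1.6.0 vs. 2.0.0
```

The check is deliberately strict and encodes only the rule from the documentation. It says nothing about patch levels that are incompatible with each other, which the documentation also warns about.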

@elias.voelker ,

There is a werk from Lars clarifying this. You could run e.g. 1.6 on master and 2.0 on the remote site. That setup is supported: Distributed monitoring: Improve version compatibility validation

regards

Michael

Hi Michael,

I wasn’t aware of this! Thanks!

Cheers
Elias

Are you sure this applies here? The affected Checkmk versions in the werk are listed as 2.1, so it wouldn’t help on 2.0 (yet).

As far as I understand the werk, it only adds checking for compatible versions. The compatibility itself should already be given in older versions as well.

The config master pushes its configuration out to the config slaves, so it makes sense to me to assume that a 2.0 master’s config cannot be understood by a 1.6 slave. But judging from the werk’s notes, a 2.0 slave can still handle the config sent by a 1.6 master.

I clarified this with Tribe29 because we have an environment with ~300 sites on 1.6 that we need to upgrade to 2.0, and we cannot do it in one big step because the risk of stopping monitoring globally for a longer time is too high.

The werks are not always clear about which versions they really apply to. I have already opened a bunch of tickets to clarify this because you cannot be sure.

regards

Michael

Just to be complete:
What is not working currently is the compatibility between 1.6 and 2.0 for custom plugins. In particular, code for the Agent bakery, clustered checks and some SNMP checks is not compatible. Tribe29 is currently investigating a solution that allows installing code for 1.6 and 2.0 in parallel and deploying only the code that fits the respective version. In parallel, I have an unofficial and not very well tested solution as plan B for this limitation.

regards

Michael

Although it is supported, it is still not a good idea to run a mixed setup over longer periods of time!
Michael’s environment is quite peculiar. Chances are, for most people a big bang upgrade is the cleanest, easiest and safest way to go.

Hello,

I’m sorry for my late response, but I can confirm that upgrading all of the sites to version 2.0 was the only solution to get notifications working on all of our sites. We were just unsure after we had updated only two of them and everything collapsed.

Thank you all for your help. After all the answers, we made a few snapshots and backups of our distributed machines and performed a successful upgrade of our environment about 3 months ago.

Thread can be closed.

Sincerely,

Can you please mark the answer as the solution if it solved your problem?

Thanks, I’ve marked the correct answer!
