BUG: Replication of ~/local to slave sites broken after upgrade to Checkmk 2.0

So today I upgraded our distributed environment from CME 1.6.0p22 to CME 2.0.0p3.
Worked like a charm, expected nothing less.

BUT: Now the replication from master to slaves does not replicate the ~/local structure anymore.
Before it worked on all connected sites. And the settings are still correct, I verified that.
There are no errors shown in the UI and the configuration synchronization itself works.
The permissions in the filesystem look about right, too.

Does anyone have a similar issue or an idea what could be causing this behavior?

As I understand it, local was synced to the slaves whenever changes were activated. Have you tested the sync with a new file, and with a new test Checkmk site? That would show whether the problem lies in your site or in the Checkmk version.
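One way to check whether a sync really happened is to compare the file trees on both sides. The following is only a minimal sketch, not a Checkmk tool: `build_manifest` and `diff_manifests` are hypothetical helper names, and the assumption is that you can run the script against the `local` directory of the central site and of the remote site and compare the two results.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 of its content."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(central, remote):
    """Return (missing, changed): files absent on the remote side and
    files present on both sides whose content differs."""
    missing = sorted(set(central) - set(remote))
    changed = sorted(
        name for name in central.keys() & remote.keys()
        if central[name] != remote[name]
    )
    return missing, changed
```

If the manifests are built on different hosts, they can be dumped with `json.dump` on each side and compared afterwards. A non-empty `missing` list right after an activation would confirm that the remote `local` tree was not (re)filled.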

This affects the Managed Edition, correct?
We're on it!


Is it possible that this also affects the Enterprise Edition?
After upgrading from 1.6.0p22 to 2.0.0p1, we had the problem that after activating changes, all files under “local” on the slave sites were often deleted and not synced again. After activating another change, the files were then synced correctly again. We then temporarily disabled the sync in the settings for our slave sites. However, we only tested with 2.0.0p1; we are now already on 2.0.0p3.

Correct, although I have not tested other editions.

Great, glad to hear that, because it is breaking a few of my checks. Thanks!

Greetings from Cologne to Cologne. :wink:

For me, in the CME, the problem is persistent, no matter how often I activate changes.
Annoyingly, ~/local is actually emptied completely on the satellite, so I cannot even place anything there manually.

@keylane_sbaas: Yes, I modified some files, but to no avail.

I can understand how difficult it is with no clear message in the logs. If you want, you can increase the logging level to the maximum (debug). Maybe that will produce a message in the logs.

So did you create a second slave site and check whether replication from your master works? I was not sure from your answer whether you did.

The last resort I can think of: create a Checkmk and OMD backup, set up a clean new site, restore your backup, and check whether the problem persists.

Thanks for your input, I appreciate that!
I did not test as thoroughly as you suggest though.

But @_rb stated above in German that they seem to be aware of the issue and are already working on it:

This affects the Managed Edition, correct?
We're on it!

So I think I will wait for the next patch release or until he gives an update here.

Sorry for switching to German. We are aware of the problem in CME. We will check whether CEE is also affected.


Looks good for CEE, but it would be interesting to know whether you still have such problems with the latest release.

I re-enabled it today and tested it several times. The problem still exists in 2.0.0p3, but it happens only in about 1 out of 4 cases. Sometimes the files in the “local” folder just disappeared on the slave site.
We definitely didn’t have the problem before version 2.0.

I am not able to reproduce this. Do you have any entries in /var/log/web.log regarding the sync? (central and remote site)
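To make that check repeatable on both sites, a small filter over the log file can help. This is a rough sketch under assumptions: the search pattern (`sync`, `replicat`) is a guess at what relevant entries might contain, not a documented Checkmk log format, and `find_sync_lines` is a hypothetical helper. Pass it the path of the web.log mentioned above.

```python
import re
from pathlib import Path

# Assumed keywords; adjust to whatever the log entries actually contain.
SYNC_PATTERN = re.compile(r"sync|replicat", re.IGNORECASE)


def find_sync_lines(log_path, pattern=SYNC_PATTERN):
    """Return (line_number, text) pairs for log lines matching the pattern."""
    matches = []
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                matches.append((number, line.rstrip("\n")))
    return matches
```

Running it on the central and the remote site around the time of an activation gives two short lists that are easier to compare than the full logs.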

Unfortunately nothing in the web.log.

We currently have the following setup:

Server 1:

  • Local site (master)
  • Additional slave site (Connect via TCP/unencrypted - “Replicate extensions” normally enabled, but problems since version 2.0)

Server 2:

  • Additional slave site (Connect via TCP/encrypted - “Replicate extensions” generally disabled)

Please let me know if you need further information.

So you have a distributed setup with 3 sites:

central site
remote site 1 - Replicate extensions enabled - unencrypted
remote site 2 - Replicate extensions disabled - encrypted

correct?

Do you have a special change in configuration where the local path gets cleared on remote site 1 or is that happening on all changes?

Correct.

No, it doesn’t matter which change it is. Yesterday during my tests I edited and saved some folders (without a real change) and it also happened.

Any news on this topic?

CME 2.0.0p4 fixed the issue for me! :tada:
For reference: Werk 12461


Our problem still exists in version 2.0.0p7 CEE. We recently set up another slave site that keeps losing all files in /local from time to time after config activations.

I must also resurrect this thread. The problem also exists with p8 and p9.
The situation is:

  • CEE
  • 8 sites with mkp sync enabled
  • connection between all sites is encrypted

Problem happens from time to time without any “warning”.
Would be good to have some information if this is also known internally at tribe29. @_rb
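Since the problem shows up only occasionally, it may help to record when exactly the `local` tree on a remote site becomes empty, so the time can be matched against activations. The following is a sketch only: `count_files`, `find_drops` and `watch` are hypothetical helpers, and the polling interval is an arbitrary assumption.

```python
import time
from pathlib import Path


def count_files(root):
    """Count regular files below root (0 if the directory is empty or missing)."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(1 for path in root.rglob("*") if path.is_file())


def find_drops(samples):
    """Given (timestamp, file_count) samples in time order, return the
    timestamps at which the count fell from non-zero to zero."""
    drops = []
    for (_, previous), (stamp, current) in zip(samples, samples[1:]):
        if previous > 0 and current == 0:
            drops.append(stamp)
    return drops


def watch(root, interval_seconds=60, rounds=10):
    """Sample the file count a fixed number of times and report drops."""
    samples = []
    for _ in range(rounds):
        samples.append((time.strftime("%Y-%m-%d %H:%M:%S"), count_files(root)))
        time.sleep(interval_seconds)
    return find_drops(samples)
```

The timestamps returned by `watch` could then be compared with the activation times on the central site to see whether every drop follows an activation.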

Any news @_rb ? It’s a major problem for us since version 2.0.
