1.6 to 2.0 upgrade

Update: solved (feedback to devs)
I am testing a restore of a backup onto another virtual appliance. The firmware and 1.6 versions are up to date, and everything works in prod today. The restore works fine, but when I then upgrade, it fails. It seems to get a long way, but at the end it just hangs or stalls with a traceback (somewhere a management board address is missing), and I cannot find any clue in the text or the log files so far. The important progress I made was switching the core from CMC to Nagios; it then complained about a service group. I deleted that group and the rules that used it, and then I could run version 2.0 with Nagios as the core. Switching to CMC just makes it hang with the same traceback as when I first tried to update. Before the traceback there is a warning line that just says it failed to look up the IPv4 address of a host via DNS and that the host will not be monitored correctly.
I have now tried several times, including after each patch update.

Update: as I write this, I made a change in the test site with Nagios as the core, and when I applied the changes it failed and printed the name of the culprit. So I deleted the named host and the changes applied OK. After that I could actually switch to the CMC core, and it seems to run normally. So CMC worked fine with that service group configuration and Nagios did not; there is a difference, but I don't think it was really important. The upgrade failed on one of my hosts with “Management board address is not configured”, and there it hung. After deleting that host it worked. I have searched the logs but did not find that host anywhere until after the Nagios core failed to apply the changes. Maybe the logging in the update or the CMC startup could be improved; I don't know how I could have found this without the Nagios core working temporarily.

cheers

Without seeing the actual error messages at the time of the update, it is difficult to say what really happened.
Your description sounds like it hangs only at the last step, where it compiles a new configuration for the configured core. This can also explain why you get different errors depending on which core is configured in the system. If the system complains about a problem at config generation time, as in your case, a “cmk --debug -vvU” can help to find the real reason for the problem.

You should also see the missing management IP of that host in the old 1.6 if you run the same command on the shell.
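
To make that output easier to go through, here is a minimal Python sketch that runs the command mentioned above and keeps only the lines that mention a warning or start a traceback. The function names `run_config_check` and `interesting_lines` are made up for this example, and the keyword matching is an assumption, since the exact output format is not shown in this thread. It has to be run as the site user, where `cmk` is available.

```python
import subprocess


def run_config_check(command=("cmk", "--debug", "-vvU")):
    """Run the config generation command and return stdout and stderr as one string.

    The default command is the one quoted in this thread; nothing else
    about the cmk command line is assumed here.
    """
    result = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so warnings keep their position
        text=True,
        check=False,  # a failing run is exactly what we want to inspect
    )
    return result.stdout


def interesting_lines(output):
    """Return (line_number, line) pairs that mention a warning or a traceback.

    Assumption: problems are announced with the words "warning" or
    "traceback" somewhere in the line (case-insensitive).
    """
    hits = []
    for number, line in enumerate(output.splitlines(), start=1):
        lowered = line.lower()
        if "warning" in lowered or "traceback" in lowered:
            hits.append((number, line.strip()))
    return hits
```

Calling `interesting_lines(run_config_check())` gives a short list of line numbers to look at first; the full output is still needed to read the traceback itself.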


Yes, information like that --debug option is something I could not find, and whether it would have helped we will not know now.

The host that caused the error was set to be monitored by SNMP, with no agent. The SNMP settings were as expected, but in the management board section a username and other details had also been entered while the address field was unchecked. So the address was unchecked but the other fields were filled in. That is a logical error: with no address, I think the GUI should maybe have discarded the remaining values. What good are a username and password if the address is empty?
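
The inconsistency described above can be sketched as a small check in Python. This is only an illustration under assumptions: it works on a plain dictionary of host attributes, the key names `management_protocol` and `management_address` are assumed here to mirror the GUI fields, the host names and addresses are invented, and the function does not read or parse any Checkmk file.

```python
def hosts_missing_management_address(host_attributes):
    """Return the names of hosts that have a management board protocol set
    but no management board address.

    host_attributes: dict mapping host name -> dict of attributes.
    The attribute key names are assumptions for this sketch.
    """
    problems = []
    for host_name, attrs in sorted(host_attributes.items()):
        protocol = attrs.get("management_protocol")
        address = attrs.get("management_address")
        # A protocol (and possibly credentials) without an address is the
        # inconsistent state described in the post above.
        if protocol and not address:
            problems.append(host_name)
    return problems


# Invented example data, not taken from a real site.
example = {
    "switch01": {"management_protocol": "snmp", "management_address": "192.0.2.10"},
    "switch02": {"management_protocol": "snmp"},
    "server01": {},
}

print(hosts_missing_management_address(example))
```

With the example data, only `switch02` is reported, because it has a protocol but no address.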

Image of last screen that just stalls and hangs there…

I faced this issue a few times during the update to Checkmk 2.0.
I will share my notes on solving the problem tomorrow.
If I remember correctly, this happens, for example, if you activate the management board for a host but do not configure one. It mostly happens if you enable the management board at the folder level and the folder contains hosts that have no management board. In this case you need to modify hosts.mk to solve the problem.

BR
Anastasios

Looking forward to your notes. We will be upgrading to Checkmk 2.0 ASAP, so all notes and tips are welcome.

These are my German notes. As I already wrote, you need to modify hosts.mk and remove this section.

The problem is: if you activate the management board for a host but do not configure one. This mostly happens if you enable the management board at the folder level and the folder contains hosts that have no management board.