Built-in tag group update problem 1.5.0p25 -> 1.6.0p27

Dear folks,

I am currently preparing the upgrade of our landscape from 1.5.0p25 to 1.6.0p27 and have run into a problem in the automatic configuration update process related to the change of tag groups. We are running the CEE on SLES 12 SP3.
In preparation I changed all legacy host tags in use from the tag group agent to the new ones, deleted the old legacy tags from the tag group, and cleaned up all rules and hosts accordingly:

From my understanding, deleting the custom tag group agent should bring up the built-in tag group agent, and all hosts should be assigned to it just as they were to the custom tag group.

This is not the case. After confirming the dialog for deleting the custom tag group, all hosts are changed back to the default value “Contact either Check_MK Agent or use datasource program”. This also affects hosts which were formerly set to no-agent. Changing around 3500 hosts back to their original value manually is not feasible.

Hosts before deletion:


Hosts after deletion:

(no-agent missing, cmk-agent added)

Besides the described issue, we have also encountered another problem in the update process which looks suspiciously similar. The omd update <site> command ends in a stack trace:

Updating Checkmk configuration...
 + Rewriting WATO tags...
 + Rewriting WATO hosts and folders...
 ERROR: Please repair this and run "cmk-update-config -v" BEFORE starting the site again.
Traceback (most recent call last):
  File "/omd/sites/hr_rz/lib/python/cmk/update_config.py", line 185, in main
    UpdateConfig(logger, arguments).run()
  File "/omd/sites/hr_rz/lib/python/cmk/update_config.py", line 90, in run
    step_func()
  File "/omd/sites/hr_rz/lib/python/cmk/update_config.py", line 112, in _rewrite_wato_host_and_folder_config
    root_folder.rewrite_hosts_files()
  File "/omd/sites/hr_rz/lib/python/cmk/gui/watolib/hosts_and_folders.py", line 1732, in rewrite_hosts_files
    subfolder.rewrite_hosts_files()
  File "/omd/sites/hr_rz/lib/python/cmk/gui/watolib/hosts_and_folders.py", line 1732, in rewrite_hosts_files
    subfolder.rewrite_hosts_files()
  File "/omd/sites/hr_rz/lib/python/cmk/gui/watolib/hosts_and_folders.py", line 1732, in rewrite_hosts_files
    subfolder.rewrite_hosts_files()
  File "/omd/sites/hr_rz/lib/python/cmk/gui/watolib/hosts_and_folders.py", line 1730, in rewrite_hosts_files
    self._rewrite_hosts_file()
  File "/omd/sites/hr_rz/lib/python/cmk/gui/watolib/hosts_and_folders.py", line 1758, in _rewrite_hosts_file
    self.save_hosts()
  File "/omd/sites/hr_rz/lib/python/cmk/gui/watolib/hosts_and_folders.py", line 677, in save_hosts
    self._save_hosts_file()
  File "/omd/sites/hr_rz/lib/python/cmk/gui/watolib/hosts_and_folders.py", line 717, in _save_hosts_file
    tag_groups = host.tag_groups()
  File "/omd/sites/hr_rz/lib/python/cmk/gui/watolib/hosts_and_folders.py", line 2087, in tag_groups
    and tag_groups["agent"] == "no-agent" \
KeyError: 'agent'

Aborted.

This hangs until I abort the process manually via ^C. As far as I know, the old tag group should also work with 1.6.0 and only needs to be cleaned up before upgrading to versions above 1.6.0.

Does anybody have a clue what’s going wrong, or any advice on how to fix it?
I already thought about just deleting the tag group from hosttags.mk but haven’t validated it yet.
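The KeyError at the end of the traceback above can be illustrated with a minimal sketch. It is not the Checkmk code itself: the dictionaries and both helper functions are hypothetical and only mimic the access pattern `tag_groups["agent"]` shown in the traceback, assuming a host's tags are held as a plain dict keyed by tag group ID.

```python
# Hypothetical per-host tag data; NOT the real Checkmk data structures.
host_with_agent_tag = {"agent": "no-agent", "snmp_ds": "snmp-v2"}
host_without_agent_tag = {"snmp_ds": "no-snmp"}


def is_no_agent_strict(tag_groups):
    # Same access pattern as the traceback line: raises KeyError when the
    # host has no value for the "agent" tag group at all.
    return tag_groups["agent"] == "no-agent"


def is_no_agent_tolerant(tag_groups):
    # dict.get returns None for a missing key instead of raising.
    return tag_groups.get("agent") == "no-agent"


if __name__ == "__main__":
    print(is_no_agent_strict(host_with_agent_tag))
    print(is_no_agent_tolerant(host_without_agent_tag))
    try:
        is_no_agent_strict(host_without_agent_tag)
    except KeyError as exc:
        print("KeyError:", exc)
```

If this reading is right, a single host without any value for the agent tag group would be enough to abort the whole rewrite step.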

Hello,

We upgraded from 1.5 to 1.6.0p27 with a customized tag group agent in place and had no issues with that. Did you try to apply the upgrade first and fix the tag group later?
As far as I remember, as soon as all legacy tags are removed from the tag group, it switches automatically to the built-in one in 1.6.
Deleting the tag group agent probably causes the stack trace.

and tag_groups["agent"] == "no-agent" \
KeyError: 'agent'

regards

Michael

A short note on this problem: keeping the “modified” tag group is not a real problem with 1.6.
It only becomes a problem later, with the 2.0 upgrade.

My procedure to clean up this problem was the following.
First, remove all modifications, as shown in the first picture from @tosch.
The second step is the same as shown in the picture “Hosts before deletion”: inspect the tags on the normal hosts.

SNMP hosts should have the tags snmp:snmp, snmp_ds:snmp-v2 (or v1) and agent:no-agent.
Normal agent hosts should have agent:cmk-agent, tcp:tcp and snmp_ds:no-snmp.
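The two expected tag combinations above can be turned into a quick check. This is only a sketch: the function name is hypothetical, a host's tags are assumed to be available as a dict of tag group ID to tag ID, and the exact ID `snmp-v1` for the "v1" case is an assumption.

```python
def classify_host_tags(tag_groups):
    """Return "snmp", "agent" or "other" for one host's tag dict.

    "other" only means the host matches neither pattern listed above;
    it is not necessarily a misconfiguration.
    """
    agent = tag_groups.get("agent")
    snmp_ds = tag_groups.get("snmp_ds")

    # SNMP host: snmp:snmp, snmp_ds:snmp-v2 (or v1; ID assumed), agent:no-agent
    if (agent == "no-agent"
            and snmp_ds in ("snmp-v1", "snmp-v2")
            and tag_groups.get("snmp") == "snmp"):
        return "snmp"

    # Normal agent host: agent:cmk-agent, tcp:tcp, snmp_ds:no-snmp
    if (agent == "cmk-agent"
            and snmp_ds == "no-snmp"
            and tag_groups.get("tcp") == "tcp"):
        return "agent"

    return "other"


if __name__ == "__main__":
    print(classify_host_tags(
        {"agent": "cmk-agent", "tcp": "tcp", "snmp_ds": "no-snmp"}))
```

Hosts reported as "other" are the ones worth inspecting by hand before retrying the update.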

All this is done after the upgrade to 1.6.

I don’t try to delete the tag groups in WATO; I only had problems doing it that way.
Instead I edit the file ~/etc/check_mk/multisite.d/wato/tags.mk.
The only solution there is to remove the tag group from the file. It’s a nice dictionary.
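Removing one group from such a dictionary could look roughly like the sketch below. The layout (a top-level `tag_groups` list whose entries carry an `id`) is an assumption about the file's content, the function name is hypothetical, and the sketch works on an in-memory dict only; it does not read or write tags.mk. Back up the real file before editing it.

```python
def remove_tag_group(tag_config, group_id):
    """Return a copy of tag_config without the tag group named group_id."""
    cleaned = dict(tag_config)
    cleaned["tag_groups"] = [
        group for group in tag_config.get("tag_groups", [])
        if group.get("id") != group_id
    ]
    return cleaned


if __name__ == "__main__":
    # Made-up example data in the assumed layout.
    example = {
        "aux_tags": [],
        "tag_groups": [
            {"id": "agent", "title": "Agent type", "tags": []},
            {"id": "criticality", "title": "Criticality", "tags": []},
        ],
    }
    cleaned = remove_tag_group(example, "agent")
    print([group["id"] for group in cleaned["tag_groups"]])
```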

Hi @mike1098,

I tried to just run the update, but it doesn’t work. The update gets stuck with the stack trace, and the repair process cmk-update-config -v doesn’t work either. So there is no option to clean this up afterwards.
I tried this both before and after cleaning up the legacy tags; both attempts result in the stack trace at omd update <site>, which then gets stuck.

From my understanding, the stack trace complains because the tag group agent doesn’t exist:

and tag_groups["agent"] == "no-agent" \
KeyError: 'agent'

Are you sure that this tag group exists in 1.5 before you do the upgrade to 1.6?

regards

Michael

Look at the first picture, @mike1098; the tag group is clearly there :)
The whole problem never appeared when updating any of our many other sites; only this one is kinda stubborn.

From what I can tell, deleting the tag group directly in ~siteuser/etc/check_mk/multisite.d/wato/hosttags.mk works so far, as @andreas-doehler suggested. I just need to double-check the tags on the hosts now and retry the update.

I’ll keep you guys informed how it goes.

OK, wish you good luck.

Michael

I removed the tag group from ~siteuser/etc/check_mk/multisite.d/wato/hosttags.mk, regenerated the configuration and restarted the core. I got the built-in tag group agent and everything is fine with monitoring. All hosts still have the correct tags.
Unfortunately the update process still throws the same stack trace.

EDIT:
I found two hosts that don’t have any of the agent tags set. I will fix this and then try the update and cleanup again. This may be causing the problem.

I found that some hosts have both no-agent and no-snmp set. Could this cause the problem?
And I don’t have the tags with “:” included on 1.5; only the site tag uses this format.

The “:” is only visible in 1.6, I think.


This fixed the stack trace problem in the update process. No clue how the hosts ended up without any agent tag.
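For anyone facing the same situation on a large site, hosts without an agent tag could be hunted down with a sketch like the one below. It rests on assumptions: host entries are taken to be 1.5-style strings of the form "hostname|tag1|tag2|...", the set of agent tag IDs is assumed, and the function name and sample data are made up.

```python
# Assumed tag IDs of the agent tag group; adjust to your own setup.
AGENT_TAGS = {"cmk-agent", "all-agents", "special-agents", "no-agent"}


def hosts_without_agent_tag(all_hosts):
    """Return the names of hosts that carry none of the agent tags."""
    missing = []
    for entry in all_hosts:
        parts = entry.split("|")
        name, tags = parts[0], parts[1:]
        if not AGENT_TAGS.intersection(tags):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Made-up sample entries in the assumed format.
    sample = [
        "web01|cmk-agent|tcp|prod",
        "sw01|no-agent|snmp|snmp-v2",
        "orphan01|prod|lan",
    ]
    print(hosts_without_agent_tag(sample))
```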
