Check_MK upgrading (1.5.0p7.cee -> current stable 1.6.0p*)

Hi all,

The company I work for is using Check_MK for monitoring (the paid version). Everything is working fine, but it is time for an upgrade. I have read the documentation and the upgrade process is pretty straightforward.
However, I got the feeling that it is way too easy - OK some custom plugins/edited files may need to be taken care of, but in general, the procedure is no big deal. I have read the following: “https://checkmk.com/cms_update.html”.

Is the above all? Any advice on what else I can read up on or get familiar with?
One major question comes to mind - do the Check_MK agents that are installed on the monitored hosts have to be updated as well?

Thank you all in advance.

some quick notes:

  • in a distributed setup (with config push), first update all slaves, then the master
  • after upgrading, check the list of “unacknowledged incompatible werks” in the release notes (red numbered button beside the version number at the top of the sidebar)
  • agents can be older but should not be newer than the monitoring server. you might miss some features when using an older agent though, so upgrading agents afterwards is highly recommended.
  • always have a working backup :slight_smile:
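
To illustrate the "agents can be older but should not be newer" rule, here is a minimal Python sketch that compares an agent version string against the server version. The function names are my own and it assumes only stable version strings such as "1.6.0" or "1.6.0p15" (optionally with an edition suffix like ".cee"); beta and innovation releases are not handled, and this is not how Check_MK itself compares versions.

```python
import re

# Assumption: only stable version strings of the form "1.6.0" or "1.6.0p15",
# optionally followed by an edition suffix such as ".cee", are handled.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:p(\d+))?(?:\.\w+)?$")


def parse_version(text):
    """Turn a string like '1.5.0p7.cee' into a comparable tuple of integers."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError("unsupported version string: %r" % text)
    major, minor, sub, patch = match.groups()
    # A missing patch level ("1.6.0") is treated as patch 0.
    return (int(major), int(minor), int(sub), int(patch or 0))


def agent_status(agent_version, server_version):
    """Classify an agent relative to the monitoring server (hypothetical helper)."""
    agent = parse_version(agent_version)
    server = parse_version(server_version)
    if agent > server:
        return "newer than server (avoid)"
    if agent < server:
        return "older than server (works, upgrade recommended)"
    return "same as server"
```

Calling `agent_status("1.5.0p7", "1.6.0p15")` would classify the agent as older than the server, which is the acceptable (if not ideal) case described above.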

Thank you for the fast response! Not having to upgrade 200+ agents is a very nice thing. About your other points - yeah - they will be taken care of (especially the backup).
BR.

If you’re worried about having to upgrade 200+ agents manually (yes, you really don’t want to do that; I agree), consider letting a tool like puppet handle the upgrade of the agents. Yes, setting that up will be a one-time annoying job, but once it’s done, upgrading the agent on however many servers will be a piece of cake.

Louis

Hi again,
Something else came to mind while I was re-reading the upgrade guide. Nothing is mentioned about the Check_MK config (I mean the config we have done via the web - WATO - adding checks, hosts, devices). After the upgrade, we will not have to add the hosts and the devices again, right? I do understand that there may be some “incompatible werks”, that we will have to work on. But the expectation is that the checks will be there, the users/mail groups, etc will be there also? The main functionality will stay mainly unaffected and functional? And then it is up to us, to start building up and using the new functionalities and optimizations that the new version is providing.

Thanks.

P.S.
@louis
Yeah - already started reading on how to do it with puppet.

Well, it should all work out of the box. Now I must admit I’ve never done an upgrade for the commercial version, only the raw edition, but for me that worked flawlessly.

Maybe, if your CheckMK server is running as a virtual machine, you could clone it into an isolated environment and try the upgrade there, before going into production?

Hi again,

Following up.
I found a server in our environment that was installed back in the day together with the production one - using the same Check_MK version and having only one host to monitor (it also has the mail groups and so on, and is connected to our AD). So I updated this VM first.
The update went pretty fast and easy. Came out with 127 “incompatible werks”. I am reading them up currently, but wanted to ask something in general first.

  1. Am I supposed to expect the same number of werks on my production server? Keep in mind that I have a lot more hosts there.
  2. Is it safe to state the following: “Bug fixes that are compatible may be ignored.” Example:
    1.6.0p15 2020-07-17 11:40:46 Bug fix Trivial change replace error message with no services discovered when licensing information is not found
    I do not see what I can do about it, really. My guess is that these kinds of werks are informational, kind of like a Readme.
  • The “Incompatible - TODO” werks are, of course, a whole different story and they should be addressed.

Thank you in advance.

You might want to read https://checkmk.com/check_mk-werks.php about what werks are and how they are organized/classified.

ad 1: werks are just the individual changes that make up the new version. So they are the same no matter how many or what kind of hosts you monitor. That’s why you as the monitoring admin should inspect the incompatible werks and check if they affect your environment. For example, if a werk changed the naming of a service on, say, Fortigate firewalls, then you should do a service rediscovery on your Fortigate systems, and perhaps adjust rulesets etc. If you are using only Checkpoint, then this does not affect you. In both cases, you can then acknowledge the incompatible werk in question to indicate you have checked it.
ad 2: yes, if the werk is compatible, then no action is required. You still might want to have a quick glance at the werk subjects so you know which bugs are fixed etc.
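
As a rough sketch of that triage logic in Python: only incompatible werks that have not been acknowledged yet need a closer look, everything else is informational. The record layout (`id`, `compatible`, `title`) and the sample entries are made up for illustration; they are not Check_MK's real werk format or real werk numbers.

```python
def werks_needing_review(werks, acknowledged_ids=()):
    """Return the incompatible werks that have not been acknowledged yet.

    `werks` is a list of dicts with the hypothetical keys
    'id', 'compatible' and 'title'.
    """
    acknowledged = set(acknowledged_ids)
    return [
        werk for werk in werks
        if not werk["compatible"] and werk["id"] not in acknowledged
    ]


# Placeholder data, invented for this example.
sample_werks = [
    {"id": 1, "compatible": True, "title": "bug fix, no action required"},
    {"id": 2, "compatible": False, "title": "service renamed, rediscovery needed"},
    {"id": 3, "compatible": False, "title": "ruleset changed"},
]

# Werk 3 has already been checked and acknowledged, so only werk 2 remains.
todo = werks_needing_review(sample_werks, acknowledged_ids=[3])
```

The number of entries in such a list depends only on the versions involved, not on how many hosts are monitored, which is the point of "ad 1".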

Thank you for the fast response - covers my thoughts completely. Already glancing over the bug fixes - just to make sure that I understand them.
Then will dive deep into the TODO things.


Following up - I guess you guys may be interested.
So I went to upgrade our production server last night (reverted it back to the old version, but will do the procedure again soon enough).
So pretty much everything worked out of the box, apart from two major things.

  1. ESXi hosts multipath is renamed (not gone or anything - renamed). If the monitoring service with 1.5 is called something like “Multipath L20 physical”, with Check_MK 1.6, the same thing is discovered with a WWN number. So this is some manual work (the ESXi hosts are 41), but it is OK.
  2. A lot (and I mean a lot) of the Oracle checks are gone… the discovery cannot find them. Checks like ASM groups are getting UNKNOWN status, Inventory jobs are getting CRIT status…
  • So what I suggested is to add an Oracle host (srv1, let’s say) to the already updated test Check_MK server. Then ask the Oracle people to check whether what they “see” there is enough for them.
  • Then, as a follow-up, I am thinking of updating the Check_MK agent on srv1, to see what checks will appear/disappear…
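
A small Python sketch of how the "what appears/disappears" comparison could be done once the service names from before and after are exported to two lists. How you export them is up to you; the function name and the sample names below are placeholders (only “Multipath L20 physical” is taken from the case above, and the WWN-based name is written as a placeholder, not a real value).

```python
def compare_services(before, after):
    """Diff two collections of service names (hypothetical helper)."""
    before, after = set(before), set(after)
    return {
        "missing": sorted(before - after),    # present in 1.5, gone after the upgrade
        "new": sorted(after - before),        # only discovered after the upgrade
        "unchanged": sorted(before & after),  # same name on both sides
    }


# Placeholder service lists for one host.
services_before = ["CPU load", "Multipath L20 physical"]
services_after = ["CPU load", "Multipath <WWN placeholder>"]

diff = compare_services(services_before, services_after)
```

A renamed service shows up as one entry under "missing" and one under "new", so renames such as the multipath case still have to be matched up by hand.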

Basically - Oracle is fancy and makes problems :slight_smile:

P.S. I guess there is no problem with a given VM being monitored by two Check_MK servers? I understand that when/if I update the agent of the srv1.


Hi again,

So the Check_MK discovery service goes into UNKNOWN state every time it is run and tells me to “submit a crash report”. What I see in the logs is the following:

ValueError (invalid literal for int() with base 10: 'ST_RAC_CIMB/')

Traceback
  File "/omd/sites/NWTSBCK/lib/python/cmk_base/decorator.py", line 58, in wrapped_check_func
    status, infotexts, long_infotexts, perfdata = check_func(hostname, *args, **kwargs)
  File "/omd/sites/NWTSBCK/lib/python/cmk_base/discovery.py", line 422, in check_discovery
    on_error="raise")
  File "/omd/sites/NWTSBCK/lib/python/cmk_base/discovery.py", line 1057, in _get_host_services
    return _get_node_services(host_config, ipaddress, sources, multi_host_sections, on_error)
  File "/omd/sites/NWTSBCK/lib/python/cmk_base/discovery.py", line 1065, in _get_node_services
    multi_host_sections, on_error)
  File "/omd/sites/NWTSBCK/lib/python/cmk_base/discovery.py", line 1098, in _get_discovered_services
    multi_host_sections, on_error)
  File "/omd/sites/NWTSBCK/lib/python/cmk_base/discovery.py", line 834, in _discover_services
    check_plugin_name, on_error):
  File "/omd/sites/NWTSBCK/lib/python/cmk_base/data_sources/host_sections.py", line 299, in _update_with_parse_function
    return parse_function(section_content)
  File "/omd/sites/NWTSBCK/share/check_mk/checks/oracle_asm_diskgroup", line 157, in parse_oracle_asm_diskgroup
    "fg_disks": int(fg_disks),

As far as I understand this is the reason why I cannot see the Oracle ASM disks in the monitoring.
Any idea how to fix this?
Thank you in advance.

P.S. A little bit below in the logs, I see that it finds the ASM disks (and their actual size, etc) … but something’s wrong and it cannot show them:
" [None,
   u'MOUNTED',
   u'EXTERN',
   u'N',
   u'512',
   u'4096',
   u'4194304',
   u'2289640',
   u'162292',
   u'0',
   u'162292',
   u'0',
   u'N',
   u'ST_RAC_DATA/'],"

Is the Oracle plugin also the newer version from 1.6?

Nope… ohhh. I understand - OK. Thanks :slight_smile:
Actually by Oracle plugin - you mean that I should update the Check_mk-agent version, right?

Both the agent and the agent plugins :slight_smile:
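
For anyone finding this later, a toy Python sketch of why an old agent plugin combined with a new server-side check can end in exactly this kind of ValueError: the check reads the fields by position, and if the older plugin sends fewer columns, a name ends up where a number is expected. The column layouts and the parser below are invented for illustration; they are not the real oracle_asm_diskgroup section format or code.

```python
# Hypothetical column layouts, NOT the real oracle_asm_diskgroup format.
OLD_COLUMNS = ["state", "total_mb", "free_mb", "name"]
NEW_COLUMNS = ["state", "total_mb", "free_mb", "fg_disks", "name"]


def parse_new_format(row):
    """Toy parser that reads fields by position, assuming NEW_COLUMNS."""
    return {
        "state": row[0],
        "total_mb": int(row[1]),
        "free_mb": int(row[2]),
        "fg_disks": int(row[3]),  # an old-format row has the name here
        "name": row[4],
    }


# Row as a newer plugin would send it in this toy layout: parses fine.
new_row = ["MOUNTED", "2289640", "162292", "1", "ST_RAC_DATA/"]
parsed = parse_new_format(new_row)

# Row as an older plugin would send it: one column short, so int() gets the name.
old_row = ["MOUNTED", "2289640", "162292", "ST_RAC_DATA/"]
try:
    parse_new_format(old_row)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

The data itself is fine in both cases, which matches what the log shows: the disk group values are there, the check just cannot interpret the layout until agent and plugins match the server version.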