I was wondering how you all test your environment with real data before an update. We have a fairly large and heterogeneous network, and it’s not feasible to set up a test environment with all relevant checks and technologies from scratch, so the only option is to copy the production site to a new VM and apply an update to test the werks. I know about simulation mode, but I’m looking for a way to “isolate” the test environment from everything. For example, once you start the test site, notifications are still enabled and the distributed monitoring configuration is the same, so the test site will try to contact the main site.
The only thing I can think of is to manually edit the configuration in ~/etc before starting and hope I didn’t forget anything that will interfere with production in any way. How do you guys handle this situation?
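To take some of the "hoping I didn’t forget anything" out of the manual approach, one option is to scan the copied site's configuration for strings that still point at production before starting it. This is only a rough sketch: the function name and the example needles are made up for illustration, and it simply does a plain text search, it knows nothing about Checkmk's configuration format.

```python
from pathlib import Path


def find_production_references(etc_dir, needles):
    """Return {file path: [needles found]} for all files below etc_dir.

    needles is a list of strings that should no longer appear in the
    test site, e.g. production hostnames or IP addresses (hypothetical).
    """
    hits = {}
    for path in sorted(Path(etc_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        found = [n for n in needles if n in text]
        if found:
            hits[str(path)] = found
    return hits
```

Run as the site user on the copy, something like `find_production_references(Path.home() / "etc", ["prod-central.example.com"])` would list every file that still mentions that (hypothetical) production host. An empty result does not prove the copy is isolated, it only catches the references you thought to search for.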
Apparently this is still not available and requires a lot of extra work on your side.
It’s not only disabling the notifications manually; it’s also, for example, removing spool files so that the test site does not send out unprocessed notifications, and much more…
Some ideas have been collected here, and there have been some talks with QA and, for example, @robin.gierse from CMK as well.
Let’s vote it up and hope this finally comes with 2.4, so that everyone can quickly set up a bulletproof test environment without the extra work. That would also make CMK more stable in upcoming versions, since it would be easier for everyone to test quickly.
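As a rough illustration of the spool cleanup, here is a small sketch that lists and, if asked, deletes the files in a spool directory of the copied site while it is stopped. The function name is hypothetical, and the directory you pass in is an assumption on your side: I would expect the notification spool somewhere like `~/var/check_mk/notify/spool`, but verify that on your own version before deleting anything.

```python
from pathlib import Path


def clear_spool(spool_dir, dry_run=True):
    """Return the names of files in spool_dir; delete them unless dry_run.

    spool_dir is whatever directory holds unprocessed notifications in
    the copied site (an assumption, check it for your installation).
    """
    spool = Path(spool_dir)
    if not spool.is_dir():
        return []
    pending = sorted(p for p in spool.iterdir() if p.is_file())
    if not dry_run:
        for p in pending:
            p.unlink()
    return [p.name for p in pending]
```

The default is a dry run on purpose, so you first see what would be removed. Only run it against the test copy, never against the production site.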
Cheers
We have no way of testing this in full before upgrading in prod; our environment is too large and complex.
Instead we have both dev and stage environments. Our stage is “live”, meaning it contains hosts that we monitor just as if they were in production: they have notifications and API integrations for things like scheduled downtime.
But we can also import all hosts into a separate VM as we have a very well defined structure in our CMDB/IPAM for what should be monitored in Checkmk.
On that site we just disable monitoring for all hosts in the MAIN folder in WATO by setting the Criticality to “Do not monitor”.
as well as ours
And yes, we needed to build workarounds, but we believe this would work out of the box for many if we could get a “simulation clone/testing clone” function which takes into account some of the prerequisites already mentioned and enables EASY testing for everyone.
Could be, but for us the main site does not contain any hosts, it’s done on the slave sites, and we have like 70 of them around the world.
So to be able to test, we would need to create all 70 sites. We would also have to find a way to select what we want to test; running dual tests from both “test” and “production” would not make our network or the check agent happy.
For us it’s also important to test automation, and we would not be doing that if we just restored a config from the main site.
Same goes for us, with many sites and many of them with individual cases that are not found on every site.
But you would simulate this against agent data and not actually trigger the agent/host itself. So prod is doing prod and actually connecting to the host/device, and test is running against the latest copy of prod data, as written in the comments:
#Host Simulation Data
All data sources for each host will be switched to cat ~/var/check_mk/agent_output/$HOSTNAME$
This ensures the agent is not triggered, i.e. data is not pulled from the host by two sites (production and simulation)
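For the `cat ~/var/check_mk/agent_output/$HOSTNAME$` approach to work, the simulation site needs one file per host in that directory. A minimal sketch of that staging step could look like the following. It assumes you already have a directory with the latest production agent output, one file per host and named after the host (for example a copy of the production site's agent cache, which is an assumption about your setup); the function name is made up.

```python
import shutil
from pathlib import Path


def stage_agent_output(cache_dir, output_dir):
    """Copy one agent output file per host from cache_dir to output_dir.

    File names are kept as-is, so they must already match the host
    names that $HOSTNAME$ expands to on the simulation site.
    """
    src = Path(cache_dir)
    dst = Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file():
            shutil.copyfile(path, dst / path.name)
            copied.append(path.name)
    return copied
```

On the simulation site this would be called with the copied cache directory as source and `~/var/check_mk/agent_output` as target. Hosts without a file in the source directory simply get no data, so it is worth comparing the returned list against your host list.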
Same for us, we can’t run the agent a second time in parallel on the same host.