Best setup for testing in one environment and applying to prod

Hello,

I want a setup with two environments: one for test and one for prod. I think I could use Checkmk sites for that, unless that misuses the site concept.
I would configure everything beforehand on the “test” site: hosts, tags, labels, service configurations, alerts, etc.
When everything works as intended (monitoring itself, alerts, etc.), I want to transport the site “test” to “prod”, for example via omd cp test prod.

Is there anything wrong with my plan? Are there problems that could occur? Or are there better or built-in ways to accomplish what I want (test first, then transport to prod)?

Greetings,

Julien K.

Hello,

We use a three-tier environment: Test → PreProd → Prod. Each environment is independent of the others, and each is a distributed setup with a central site and some remote sites.

Basically, we work with “monitoring templates”. These are sets of rules defining the monitoring for a specific technology, e.g. Checkpoint firewalls.
Everything is managed in an ITIL-conformant way via change requests. The owner of each technology has to send me a change request; then we add some test hosts to the test environment and work with the technology owner on what needs to be monitored, with which levels, etc.
Once that is finalized, I write a procedure into the change request describing what has to be set up in Checkmk and forward it to the monitoring run team. They then try to apply the changes in PreProd, come back to me with any questions, and, if all works well, implement them in the prod environment. After that, we just add hosts to the prod system and the run team applies the corresponding template(s). In our case, host creation is triggered automatically by a request based on CMDB data.
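The staged flow described above (Test → PreProd → Prod, with the requester validating the change before it reaches prod) can be illustrated as a small ordering check. This is a minimal Python sketch under stated assumptions: ChangeRequest, STAGES, next_stage, and apply are hypothetical names, not part of Checkmk or any ITIL tool, and the sketch only models the promotion order, not the actual configuration work.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Promotion order as described in the thread: Test -> PreProd -> Prod.
STAGES = ["test", "preprod", "prod"]


@dataclass
class ChangeRequest:
    """Hypothetical record of one change request and the stages it has reached so far."""
    cr_id: str
    procedure: str                       # written procedure for the monitoring run team
    applied: List[str] = field(default_factory=list)
    requester_validated: bool = False    # requester sign-off, required before prod

    def next_stage(self) -> Optional[str]:
        """Return the only stage this change may be applied to next, or None when done."""
        if len(self.applied) < len(STAGES):
            return STAGES[len(self.applied)]
        return None

    def apply(self, stage: str) -> None:
        """Record that the change was applied to a stage, refusing to skip a stage."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"{self.cr_id}: expected stage {expected!r}, got {stage!r}")
        # Prod additionally requires the requester's validation.
        if stage == "prod" and not self.requester_validated:
            raise PermissionError(f"{self.cr_id}: requester has not validated the change")
        self.applied.append(stage)
```

For example, a request that is applied to "test" and "preprod" raises PermissionError on "prod" until requester_validated is set, and applying "prod" directly on a fresh request raises ValueError.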

I hope that gives you some ideas on how to limit the risk of changes.

BR

MF

Thank you very much for your response and for your ideas.

I think a three-stage environment would be too much for us, as I am the only one configuring Checkmk, so I will stick to Test and Prod. So you do not “copy” a whole site, but apply the changes manually from stage to stage? Noting down the changes made in the test environment before repeating them later in the prod environment is a good idea. How do you do that? Is there any tool assistance so you don't overlook changes? For example, if you change a config file on the test site, do you automatically save the change somewhere so it is not forgotten when you want to apply it on PreProd and then Prod?
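One way to get some tool assistance without a full staging pipeline is to compare the configuration trees of the two sites and list what differs. The Python sketch below assumes the standard OMD layout, where a site's Checkmk configuration lives under /omd/sites/<site>/etc/check_mk; the function names are hypothetical, and the sketch only detects file-level differences, it does not apply anything.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map every file below root (as a relative path) to the SHA-256 of its content."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_snapshots(test: Dict[str, str], prod: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify files as only in test, only in prod, or in both with different content."""
    return {
        "only_in_test": sorted(test.keys() - prod.keys()),
        "only_in_prod": sorted(prod.keys() - test.keys()),
        "changed": sorted(k for k in test.keys() & prod.keys() if test[k] != prod[k]),
    }


def diff_trees(test_root: Path, prod_root: Path) -> Dict[str, List[str]]:
    """Compare two config directories, e.g. the etc/check_mk trees of two sites."""
    return diff_snapshots(snapshot(test_root), snapshot(prod_root))
```

For example, diff_trees(Path("/omd/sites/test/etc/check_mk"), Path("/omd/sites/prod/etc/check_mk")) returns the files to review; it must run as a user that can read both sites. Some differences are expected to be site-specific (for example, distributed monitoring settings), so the output is a checklist to review, not something to copy over blindly.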


So the “monitoring templates” are basically all changes made through the Checkmk GUI plus manually edited config files on the monitoring host of the specific site?

Yes, a different team is doing this.

Basically, I write that into the change request. That way it's documented and the requester can review it. If it's complicated or needs longer context, I write it in OneNote first and then attach a PDF to the change request. Also, if we develop something, we attach the MKP to the change request. This way we have all content in one place and can see the history.

The requester has to validate the change before it goes to prod. Normally we don't forget things, but it could happen.

No, it's not transported automatically. Since every change needs a change request, we normally do not forget things.

No, a template describes the monitoring for a specific technology. For example, we have templates for Windows, Linux, and AIX, and in addition templates for databases like Oracle, DB2, and MSSQL, or even for specific applications like Active Directory domain controllers.

This is how the monitoring run team configures the host object:

Such a template may change over time, but every change requires its own change request.

So you want to overwrite your prod instance with your test instance? That does not sound like a good idea and is generally not recommended.

Just like @mike1098, we have three environments: Dev, Stage, and Prod (same purpose, I think).

Only we in the monitoring team have access to the “Dev” environment; it should run the latest stable Checkmk. “Stage” should be a copy of Prod (or, while a change is being verified in Stage, identical to Dev), and “Prod” is whatever we have released.

Stage and Prod are connected to the CMDB and other systems for automation, and, just like at Mike's, our internal users (say, the database team) verify their changes in Stage. We have quite a lot of integrations with Git, so we pull changes down to Dev for internal testing, push them to Stage for verification, and finally push them to Prod.

We use different hardware for all of this and have a different number of sites in distributed monitoring for each of the three. Dev is an MWP just for our own testing; Stage is an MWP for testing with our teams.
We run VMs for Dev and Stage.

On top of that, we have a fourth setup, a CI/CD pipeline where we build a complete Checkmk setup with Ansible every time there is a new Checkmk release, but that's another story.

Why is that not a good way? omd cp test prod would copy everything over. Is it really better to manually repeat on the prod environment every change I made beforehand on the test environment?

You would no longer have access to the metrics you collected in prod.


We also use test for developing new plugins and for testing a lot of other things. Sometimes we tinker with something just to see if it works, so the config ends up in a state that is not fit for a production environment.
It's a better approach to do ‘cp prod test’ to refresh your test system with your real config, so you can detect whether a change conflicts with your production config.

So you never ever change your prod? You never add a new host?

If you do
cp prod test
… stuff
cp test prod

But then, for example, all your notifications will be sent twice.

I don't think any of these methods is perfect; they all have their flaws. Say you have 1000 hosts and you copy the whole setup to a “test” site: all your servers will be contacted twice, and you get twice the number of notifications.

Even with distributed monitoring, this will break.

Basically, it was only a reply to this:

So I assumed that it would work in Julien's case.

We used this approach with Nagios 3, managed with Git, and there we really did contact the hosts from three monitoring servers: develop, incubator, and master.

In our new Checkmk environment, only master connects to the monitored hosts. In Test, we add hosts only on demand, when we need them to develop new templates or plugins. In general, adding hosts there is a burden because we have to ask the security team to open firewall ports.

Only our master environment sends notifications to our umbrella monitoring systems, where they are handled by our pilots. Test and PreProd allow individual mail notifications, but there is no global notification rule.

Hello, what exactly do you mean by “templates” or “rule templates”? Is it a feature within Checkmk? Could you give me an example?

A template is a set of rules defining an environment.
For example, we have a template for Linux servers where we define the following:

  • Checks we don't want (disabled checks)
  • Levels for CPU, Load, Filesystem, uptime etc.
  • Monitoring of defunct processes
  • Monitoring of general processes like virus protection, inventory for CMDB, etc.

The rules use a tag as their condition, e.g. linux for the template above.
For Windows, we maintain a tag for each version and an auxiliary (AUX) tag WINDOWS.

Templates exist for various OSs, databases, SAP application servers, AD domain controllers, etc.
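To illustrate the idea, here is a minimal Python model of templates as tag-conditioned rules, including per-version Windows tags that imply the aux tag WINDOWS. All tag names, rule texts, and functions are hypothetical; this is not Checkmk's rule engine, and it ignores how Checkmk decides between several matching rules in the same ruleset. It only shows why a rule conditioned on WINDOWS reaches every Windows version, while a version tag narrows a rule to one version.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical: each per-version Windows tag implies the auxiliary tag WINDOWS.
AUX_TAGS = {
    "win2016": {"WINDOWS"},
    "win2019": {"WINDOWS"},
}


@dataclass(frozen=True)
class Rule:
    """One rule of a template: a setting that applies when the host has all required tags."""
    template: str
    setting: str
    required_tags: frozenset


# Hypothetical template contents; setting texts are placeholders, not real ruleset IDs.
RULES = [
    Rule("Linux", "filesystem levels 80/90 %", frozenset({"linux"})),
    Rule("Linux", "monitor defunct processes", frozenset({"linux"})),
    Rule("Windows", "CPU levels 85/95 %", frozenset({"WINDOWS"})),
    Rule("Windows 2019", "version-specific service list", frozenset({"win2019"})),
]


def effective_tags(host_tags: Set[str]) -> Set[str]:
    """Return the host's tags plus every auxiliary tag they imply."""
    tags = set(host_tags)
    for tag in host_tags:
        tags |= AUX_TAGS.get(tag, set())
    return tags


def rules_for_host(host_tags: Set[str]) -> List[Rule]:
    """Return all rules whose tag condition the host satisfies, in rule order."""
    tags = effective_tags(host_tags)
    return [rule for rule in RULES if rule.required_tags <= tags]
```

For example, rules_for_host({"win2016"}) returns only the generic Windows rule, while rules_for_host({"win2019"}) also picks up the version-specific one.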


I hope that clarifies things.