Remove VMs (hosts) that are powered off

I have a vCenter server configured and we have VMs that are powered off. These VMs are detected as services with “power state: poweredOff” and Status WARN.

Naturally, I don’t want to create hosts from these powered-off VMs, as that would be silly. However, I cannot find any way to prevent it. DCD does not give me any option to do this, nor does CheckMK offer any other way to solve the problem.

I tried to find a way to:

  • not create these hosts with DCD
  • disable services with “power state: poweredOff”
  • move them into a separate folder
  • move them into a specific host group
  • delete them

None of this seems possible, at least not in WATO.

Can anyone give me an idea of how to solve this? I cannot be the only one who has powered-off VMs in vCenter that don’t need monitoring.

Hi and welcome to the Forum,

This is my interpretation of your case:

I think your approach is wrong. In monitoring it is desirable (if not mandatory) to have all hosts in the monitoring solution, so that a host/VM that should be up but is down is never mistakenly ignored (a ‘false negative’).

My reasoning:

  • Do not try to prevent host creation; all detected hosts should be in your monitoring (up or down).
  • Do not disable services with the “poweredOff” state; if a host that should be up is down, you will miss it.
  • Do not delete them, as they will be rediscovered, so this is a waste of time.

Instead, you (and your colleagues) should mark those hosts/VMs that are intentionally down in CMK with ‘Schedule Downtime’.
This way you exclude intentionally down hosts (or services, as you can schedule downtime on services as well as on complete hosts) from alerting, but still monitor the rest without issues.

Please see https://docs.checkmk.com/latest/en/basics_downtimes.html?lquery=schedule%20down for exactly how to do this.
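
If scheduling downtimes by hand is too tedious, it can also be scripted. The following is only a minimal sketch, assuming the Checkmk 2.x REST API with its host downtime endpoint and Bearer authentication of an automation user; the site URL, user, secret and host name are made-up placeholders, and the field names should be checked against the API documentation of your own site. It only builds the request and does not send anything.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.request import Request


def build_host_downtime_request(base_url, user, secret, host_name, start, hours, comment):
    """Build (but do not send) a REST request that schedules a host downtime.

    base_url is assumed to look like https://<server>/<site>/check_mk/api/1.0
    """
    end = start + timedelta(hours=hours)
    payload = {
        "downtime_type": "host",
        "host_name": host_name,
        "start_time": start.isoformat(),
        "end_time": end.isoformat(),
        "comment": comment,
    }
    return Request(
        url=f"{base_url}/domain-types/downtime/collections/host",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {user} {secret}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_host_downtime_request(
    base_url="https://monitoring.example.com/mysite/check_mk/api/1.0",
    user="automation",
    secret="not-a-real-secret",
    host_name="dev-vm-01",
    start=datetime.now(timezone.utc),
    hours=8,
    comment="Consultant VM intentionally powered off",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload before anything touches the monitoring site.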

  • Glowsome

I’m with @Glowsome; that is how I handle this situation in my systems.

But if you really want to do what you want, you can take a look at this.

and these example scripts showing how to use it.

Instead of setting labels, you can also use this to disable monitoring on automatically created hosts (change the criticality).
Or you can find services with some specific text and then create rules to disable these services. All of this should be possible with the API implementation from @r.sander.
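
To illustrate the "find services with some specific text" step, here is a small sketch. It assumes the service data has already been fetched (for example through an API or a Livestatus query) into a list of dictionaries; the keys `host_name` and `plugin_output` and the sample rows are assumptions made for this example, not the output format of any particular API.

```python
def powered_off_hosts(services, marker="power state: poweredOff"):
    """Return the sorted names of hosts that have a service whose output contains marker."""
    return sorted(
        {row["host_name"] for row in services if marker in row["plugin_output"]}
    )


# Hypothetical sample rows; real data would come from your monitoring site.
sample_services = [
    {"host_name": "dev-vm-01", "plugin_output": "power state: poweredOff"},
    {"host_name": "prod-db-01", "plugin_output": "power state: poweredOn"},
    {"host_name": "dev-vm-02", "plugin_output": "power state: poweredOff"},
]

print(powered_off_hosts(sample_services))  # ['dev-vm-01', 'dev-vm-02']
```

The resulting host list could then feed whatever follow-up action you choose, such as setting a label or tag that a rule matches on.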

Our latest addition is “auto_ack”, an alert handler that automatically acknowledges the problem as soon as it occurs:

We tag or label systems with auto_ack:yes, and the alert handler only becomes active for them. Combined with “maximum number of check attempts”, this also avoids alarms and unhandled problems.

Thank you all for your replies. I do get your points.

We have some hosts that exclusively run VMs for consultants or development; users start and stop these themselves as needed. We simply cannot manage to mark those, as this happens far too often every day.

I think it would be best to just monitor the hosts and not the VMs in those cases.

Other hosts run business critical VMs and those need to be monitored accordingly.

Cheers

Just my opinion on your response…

If this is all about development and/or consultant VMs that users start and stop on their own, it would seem fair to give those users the means to schedule downtimes in CheckMK for their machines.

After an explanation or how-to session on how to handle this, the responsibility for not triggering unwanted alerts in your monitoring system would be in their own hands.
Meanwhile, from a monitoring perspective, you would still have full oversight of all hosts.

Sometimes a solution is not only technical, but also organizational, as the symptoms are technical, but the solution lies in good conduct by the users.

  • Glowsome

I agree with you; however, the nature of an engineer or developer is to take on as little overhead as possible, and even the smallest form of documentation or responsibility outside of their project throws them into a hissy fit.

Not to mention that they will forget and we will have to contact them for the next 6-12 months to remind them to do this.

As Head of IT I could conspire with the ISB and make this part of the company guidelines :wink:

Have a nice day.
