Opinion Needed: Integrate CheckMK And PowerCLI For Monitoring Linux Appliances

Good Day All,

Happy Holidays!!! I have an issue that I would like to solve by leveraging CheckMK and PowerCLI, if possible. Here is my situation. I manage a VMware environment at a state-run hospital. We have Linux VM appliances given to us by different vendors to run. Recently, one of our Linux VMs (ghxvm1) stopped working at the application layer. The cause was that the root (/) filesystem ran out of space. I would like to monitor our production Linux OVA appliance servers, but here are the restrictions:

  1. No agent can be installed on any OVA server.
  2. SSH has been blocked between servers on the network by the networking team.
  3. SNMP has not been turned on for any Linux production OVA VM.

I would therefore like to leverage PowerCLI with CheckMK. The idea is for a single PowerCLI script to use the Invoke-VMScript cmdlet to tell each Linux production VM to execute a command within the guest OS. The script would then write the resulting CPU, memory, and root filesystem usage to a text file on the Docker server, which CheckMK can read, display, and alert on above a certain threshold.
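As a rough illustration of the parsing side of this idea, here is a minimal Python sketch. It assumes (this is not stated in the post) that the in-guest commands are `df -P /`, `cat /proc/meminfo`, and `cat /proc/loadavg`, and that their raw text has already been captured from Invoke-VMScript. The function names are hypothetical, and the 1-minute load average is only a stand-in for "CPU usage".

```python
def parse_df_root_percent(df_output: str) -> float:
    """Return the used percentage of / from the output of `df -P /`."""
    lines = [ln for ln in df_output.strip().splitlines() if ln.strip()]
    # Line 0 is the header. Line 1 holds the POSIX columns:
    # filesystem, 1024-blocks, used, available, capacity, mount point.
    fields = lines[1].split()
    return float(fields[4].rstrip("%"))


def parse_mem_used_percent(meminfo: str) -> float:
    """Return used memory in percent from the text of /proc/meminfo.

    Assumes the guest kernel reports MemAvailable; very old appliance
    kernels may not, in which case this raises KeyError.
    """
    values = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # value in kB
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return round(100.0 * (total - available) / total, 1)


def parse_load1(loadavg: str) -> float:
    """Return the 1-minute load average from the text of /proc/loadavg."""
    return float(loadavg.split()[0])
```

Keeping the guest side to plain, standard commands means nothing has to be installed on the appliance; all interpretation happens on the monitoring side.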

Here is my setup now:

Server: RHEL 9 VM running the pre-made CheckMK docker container.

  1. On the RHEL 9 VM host server, I installed PowerCLI running on Linux.
  2. The RHEL 9 VM would execute a PowerCLI ps1 script that connects to vCenter and, for each Linux VM, runs an Invoke-VMScript command so that the guest OS of each Linux OVA server gathers its current CPU, memory, and / disk usage.
  3. The same PowerCLI script would then write one text file per server, containing that server's CPU, memory, and root disk usage, to a folder on the RHEL 9 VM (the local Docker server).
  4. Within the CheckMK Docker container, set up hosts in CheckMK that read these text files, and create CheckMK services that can be monitored and alerted on.
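One possible shape for the text files in steps 3 and 4 is the Checkmk local check format, where each line carries a status, a service name, a metric with warn/crit levels, and a summary. The sketch below is only an illustration under assumptions: the service names, metric names, and 80/90 thresholds are hypothetical, and the `<<<local:sep(0)>>>` section header is what recent Linux agents emit (older versions used `<<<local>>>`), so check it against your Checkmk version.

```python
# Hypothetical (warn, crit) thresholds in percent; tune per appliance.
THRESHOLDS = {
    "cpu_percent": (80, 90),
    "mem_used_percent": (80, 90),
    "root_used_percent": (80, 90),
}

# Hypothetical service names; no spaces, so no quoting is needed.
SERVICE_NAMES = {
    "cpu_percent": "OVA_CPU",
    "mem_used_percent": "OVA_Memory",
    "root_used_percent": "OVA_Root_FS",
}


def local_check_line(service, metric, value, warn, crit):
    """Build one local check line.

    Status "P" asks Checkmk to compute OK/WARN/CRIT itself from the
    warn and crit levels attached to the metric.
    """
    return f"P {service} {metric}={value};{warn};{crit} {metric} is {value}"


def render_local_section(metrics):
    """Render a dict of metric name -> value as a local check section."""
    lines = ["<<<local:sep(0)>>>"]
    for metric, value in metrics.items():
        warn, crit = THRESHOLDS[metric]
        lines.append(
            local_check_line(SERVICE_NAMES[metric], metric, value, warn, crit)
        )
    return "\n".join(lines) + "\n"
```

For example, `render_local_section({"root_used_percent": 96.0})` yields a section with one line for `OVA_Root_FS`, which Checkmk would evaluate as critical against the assumed 90% level.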

That’s it

I would like opinions and suggestions on this proposed setup, or to hear if there is a better way to monitor Linux OVA appliances that really can’t be modified.

Thanks
Steve

I must say, your approach is highly creative, but I am missing both facts and arguments that I would need before even considering such an approach:

No agent can be installed on any OVA server

  • Are the vendors of the appliances involved in this process?
  • Do they offer an API of their own?
  • Is it that an agent “cannot be installed”, or that it is not allowed by IT rules?

SSH has been blocked between servers on the network by the networking team

  • Has this monitoring requirement even been discussed with the networking team and/or the security officer?
  • If SSH were an option, the network team would only have to allow SSH from the monitoring server. However, you would still need to gather the information in a form that CMK can read and parse.

SNMP has not been turned on for any Linux production OVA VM

  • Even if it were turned on, I would assume that the network team would block it, so a network change would be needed.

Glowsome

I might be late to the party… You can install just the agent script within the containers/VMs. Then use a cronjob to run the agent script once per minute and dump its output to a text file on some network share that is accessible by other hosts, either the “parent” or the Checkmk server.

Then you use the rule “Individual program call instead of agent access” to read these files from the Checkmk server. This rule can run an arbitrary command, so there are no limits to your creativity - even if a few hops are involved.
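A minimal sketch of such a program call could look like the following Python script, which prints a host's dumped agent output to stdout and refuses files that are too old. The spool directory, the `<host>.txt` naming, and the 300-second limit are assumptions for illustration only; the host name would be passed in by the rule as a command-line argument.

```python
import sys
import time
from pathlib import Path

SPOOL_DIR = Path("/var/spool/ova-metrics")  # hypothetical location
MAX_AGE_SECONDS = 300  # hypothetical staleness limit


def read_agent_output(host, spool_dir=SPOOL_DIR, max_age=MAX_AGE_SECONDS, now=None):
    """Return the dumped agent output for a host, or raise if it is stale."""
    path = Path(spool_dir) / f"{host}.txt"
    current = time.time() if now is None else now
    age = current - path.stat().st_mtime
    if age > max_age:
        raise RuntimeError(f"{path} is {int(age)}s old (limit {max_age}s)")
    return path.read_text()


def main(argv):
    if len(argv) != 2:
        print(f"usage: {argv[0]} HOSTNAME", file=sys.stderr)
        return 2
    try:
        sys.stdout.write(read_agent_output(argv[1]))
    except (OSError, RuntimeError) as exc:
        # A non-zero exit is meant to surface as a failed data source
        # instead of silently serving missing or outdated data.
        print(exc, file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The staleness check matters here because a cronjob that has quietly stopped would otherwise leave the last good values in place indefinitely.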

Just two notes: While you can get creative transferring this agent output to the Checkmk server, keep in mind that reading a file that is still being written will result in corrupted output. So you might want to write to temporary files and move them into place afterwards. Please also note that the agent output might differ from that of the regular Linux agent, depending on the environment. So you might also want to set some environment variables.
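The "write a temporary file, then move it" pattern can be sketched in Python as below. The function name is hypothetical. The sketch assumes the temporary file and the target live on the same filesystem, which is what makes the final rename atomic on POSIX systems; behaviour on a network share depends on the share and should be verified.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(target, content):
    """Write content to target so readers never see a half-written file."""
    target = Path(target)
    # Create the temp file next to the target so both are on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        # mkstemp creates the file as 0600; open it up for the reading user.
        os.chmod(tmp_name, 0o644)
        # Atomic rename: readers see either the old file or the new one.
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A shell-based cronjob can follow the same idea by redirecting into a temporary name and then calling `mv` onto the final name.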

Mschlenker, I actually did a setup similar to your suggestion in my home lab to get CPUReady values: a cron job dumps them to a text file on a share, and CheckMK then reads the file. Another idea is to install PowerCLI on the CheckMK server and call PowerCLI commands, via the Invoke-VMScript cmdlet, to grab the data. That may put a strain on the vCenter, but it should work as well :grinning: