Some LXC rule hints

Posting here to share some changes I’ve made for monitoring LXC containers.

First off, I added a new tag to differentiate the types of host - currently I have bare metal, VM, LXC, PVE (hypervisor node) and (network) switch.

While Check_MK is smart enough to NOT report load, there are a couple of other things it looks for which are not really appropriate.

Memory page tables
Like load, the size of the page tables comes from the hypervisor rather than the LXC host. Check_MK then calculates a percentage based on the memory assigned to the LXC, so the metric it uses is artificially inflated.

I have a rule in Service Monitoring → Operating System Resources → Memory and Swap usage on Linux setting the warning and critical thresholds to 30% and 40% where the host type is lxc.
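
To illustrate why the percentage looks inflated, here is a rough sketch of the arithmetic. The 37.4 MiB and 512 MiB figures are the page-table values quoted further down this thread; the 64 GiB hypervisor size and the helper name `percent_of` are purely hypothetical, and this is not how Check_MK itself is implemented.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def percent_of(value_bytes: float, total_bytes: float) -> float:
    """Return value_bytes as a percentage of total_bytes."""
    return 100.0 * value_bytes / total_bytes


# Page-table size as seen inside the container (figure from this thread).
page_tables = 37.4 * MIB

# Memory assigned to the LXC (figure from this thread).
container_memory = 512 * MIB

# Hypothetical hypervisor RAM, assumed only for comparison.
hypervisor_memory = 64 * GIB

# Ratio the check reports: host-wide page tables vs. container memory.
inflated = percent_of(page_tables, container_memory)

# Ratio against the (assumed) hypervisor RAM, where the value belongs.
realistic = percent_of(page_tables, hypervisor_memory)

print(f"vs container:  {inflated:.2f}%")
print(f"vs hypervisor: {realistic:.2f}%")
```

The smaller the container relative to the hypervisor, the larger the gap between the two ratios.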

systemd-journald-audit.socket
My Turnkey Linux LXCs on Proxmox don’t have these.

I have a rule in Service Monitoring → Applications, Processes & Services → Systemd Socket summary excluding units matching .*audit.* where the Host type is lxc.
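
As a quick sanity check of the pattern, the sketch below uses Python's `re` module to show which unit names `.*audit.*` would exclude. Only `systemd-journald-audit.socket` comes from this post; the other unit names are sample values, and matching from the start of the name is an assumption about how the rule applies the pattern.

```python
import re

# Exclusion pattern from the rule described above.
EXCLUDE = re.compile(r".*audit.*")


def is_excluded(unit: str) -> bool:
    """True if the unit name matches the exclusion pattern.

    re.match anchors at the start of the string (an assumption here);
    the leading '.*' lets 'audit' appear anywhere in the name.
    """
    return EXCLUDE.match(unit) is not None


# Sample unit names; only the first one is taken from this post.
units = [
    "systemd-journald-audit.socket",
    "systemd-journald.socket",
    "dbus.socket",
]

kept = [u for u in units if not is_excluded(u)]
print(kept)
```

Note that the pattern is broad: any unit with "audit" in its name is dropped, not just the journald audit socket.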

Hey!
Thanks so much for sharing this. I found a lot of pages about “the mystery of Linux memory management”, but none of them seemed to fit. Your explanation makes much more sense!

Do you know, by chance, if the “miscalculation” (size of page table as a ratio of container assigned memory) also applies to shared memory?

I have the following warning / data:

total virtual memory: 18.21% - 93.2 MiB of 512 MiB
Shared memory: 73.46% - 376 MiB of 512 MiB RAM (warn/crit at 70.00%/75.00% used) WARN
RAM: 18.21% - 93.2 MiB of 512 MiB
Commit Limit: -6162.16% - -30.8 GiB of 512 MiB virtual memory
Page tables: 7.30% - 37.4 MiB of 512 MiB RAM
Disk Writeback: 0% - 0 B of 512 MiB RAM
RAM available: 8.34% free - 42.7 MiB of 512 MiB
Hardware Corrupted: 0% - 0 B of 512 MiB RAM

Any recommendation for a “mitigation rule”?

Thanks,
Soc

On reflection, I think the mitigation I applied to general memory usage is not really appropriate. I should simply disable this check for LXCs and monitor the metric on the hypervisor instead.

A quick read over the following discussion suggests that it may be possible to get per-container shared memory if you feel like writing your own agent:

Thanks for the pointer :slightly_smiling_face:

For the time being, writing my own agent is out of scope. I have disabled the shared memory check for LXC containers in general, as the metric is bogus.