About the packages on the exchange, here is a longer explanation of what happened: Checkmk Exchange: Technical Difficulties with Package Reviews
In short, for some time we could not review the packages and now the team is doing their best to catch up. But it is true – some updates could not be reviewed on time, unfortunately.
Hello Andreas! So, the current situation is that uploading packages works, and the system is fixed – we are just a little behind on the reviews. Just to make sure – have you also uploaded the new version to the exchange?
I did things a little bit differently, but in the end the result is the same.
Also reworked the temperature check to use the normal check_temperature.
Can you please test if it is working as expected?
The system I got the data from had no missing sensors.
I’d only monitored fully loaded systems, too - until yesterday, when I noticed a lot of yellow services after discovery.
Not an issue anymore with your new version, I’ve tried a Tabula Rasa and I don’t see the Absent items any longer, and everything else is still there. Everything looks fine to me!
I’d like to get rid of the need to configure SNMP on the XCC boards altogether, but the “official” Lenovo plugins by Silvio Erdenberger use SNMP to poll health information, and while a lot of them have been made redundant by your excellent special agent, I’m wondering whether the missing pieces can be polled via Redfish as well, or are not even that useful:
CPUs
Disks
RAM DIMMs (we’ve got the temperature, likely this would tell us if a DIMM went critical)
PSU FRU / S/Ns
overall system health status
Most of the missing checks are only displaying FRU / S/N information and their only benefit is getting an inventory of all of your hardware servers without logging into each XCC GUI.
In terms of monitoring / alerting, if anything was wrong with them it would be enough to simply see a yellow or red “system health” service; one would certainly continue investigating (and checking the correct FRU for replacement) via manual XCC login / onsite? So probably not really necessary to add explicit CPU/RAM/Disk checks.
But is there a way to get the overall health status? Or how would we notice, via your XCC checks, whether a disk failed?
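To make the question concrete, here is a rough sketch of what I have in mind. It assumes the XCC exposes the standard Redfish ComputerSystem resource with a `Status` object containing `Health` and `HealthRollup`; I have not verified this against a real board. The function name, the sample payload, and the mapping to monitoring states are made up for illustration and are not taken from the special agent.

```python
# Hypothetical mapping from Redfish health strings to monitoring states
# (0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN). This is an assumption,
# not how the existing special agent does it.
HEALTH_TO_STATE = {"OK": 0, "Warning": 1, "Critical": 2}


def overall_health_state(system: dict) -> tuple[int, str]:
    """Derive a single state from a Redfish ComputerSystem document."""
    status = system.get("Status") or {}
    # Prefer HealthRollup, which is meant to include subordinate
    # resources; fall back to the resource's own Health.
    health = status.get("HealthRollup") or status.get("Health")
    state = HEALTH_TO_STATE.get(health, 3)
    return state, f"System health: {health or 'not reported'}"


# Invented sample payload, shaped like a Redfish ComputerSystem resource.
SAMPLE_SYSTEM = {
    "Id": "1",
    "Status": {"State": "Enabled", "Health": "OK", "HealthRollup": "Warning"},
}

print(overall_health_state(SAMPLE_SYSTEM))
```

If the rollup value really reflects subordinate components such as drives, a single service like this might be enough to notice a failed disk without separate CPU/RAM/Disk checks.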