Lenovo Xclarity plugin

Sara · October 7, 2022, 2:31pm

Hello Matjaž!

About the packages on the exchange, here is a longer explanation of what happened: Checkmk Exchange: Technical Difficulties with Package Reviews
In short, for some time we could not review the packages and now the team is doing their best to catch up. But it is true – some updates could not be reviewed on time, unfortunately.

Sara · October 7, 2022, 2:33pm

Hello Andreas! So, the current situation is that uploading packages works, and the system is fixed – we are just a little behind on the reviews. Just to make sure – have you also uploaded the new version to the exchange?

andreas-doehler · October 7, 2022, 2:34pm

At the moment I have 3 pending updates (added in August) and 1 new version added today.

Sara · October 7, 2022, 2:36pm

Got it, thanks. I will talk to the reviewers on Monday to check on the status. This information really helps.

Sara · October 18, 2022, 2:53pm

Hi @andreas-doehler! It seems everything with the package reviews was resolved last week. Is everything OK with your updated versions?

andreas-doehler · October 18, 2022, 3:07pm

Hi @Sara, it looks OK now.

bitwiz · October 22, 2022, 6:36pm

Should “Absent” items not be skipped on discovery?
(or rather, should there be a rule which “status” values are valid for discovery?)

Quick diff:

--- local/lib/check_mk/base/plugins/agent_based/utils/lenovo_xclarity.py        2022-10-22 20:54:37.189527505 +0200
+++ local/lib/check_mk/base/plugins/agent_based/utils/lenovo_xclarity.py.NEW    2022-10-22 20:53:03.747459228 +0200
@@ -34,4 +34,8 @@
 
 def discovery_lenovo_xclarity_multiple(section) -> DiscoveryResult:
     for item in section:
-        yield Service(item=item)
+        data = section.get(item)
+        state = data.get("Status", {"State": "Unknown"}).get("State", "Unknown")
+        if state != "Absent":
+            yield Service(item=item)
+
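
For anyone who wants to try the idea outside of Checkmk, here is a minimal standalone sketch of the same filter. The `Service` class below is only a stand-in for the Checkmk one, and the function name `discover_present_items`, the `skip_states` parameter and the sample section are made up for illustration; they are not part of the plugin.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Mapping, Sequence


@dataclass(frozen=True)
class Service:
    """Stand-in for Checkmk's Service class so the sketch runs without Checkmk."""
    item: str


def discover_present_items(
    section: Mapping[str, Mapping[str, Any]],
    skip_states: Sequence[str] = ("Absent",),  # hypothetical parameter
) -> Iterator[Service]:
    """Yield one service per item unless its Status.State is in skip_states."""
    for item, data in section.items():
        # A missing or empty "Status" is treated as state "Unknown".
        status = data.get("Status") or {}
        state = status.get("State", "Unknown")
        if state not in skip_states:
            yield Service(item=item)


# Made-up sample data, shaped like the section the diff above expects.
section = {
    "Fan 1": {"Status": {"State": "Enabled"}},
    "Fan 2": {"Status": {"State": "Absent"}},
    "Fan 3": {},
}
discovered = [service.item for service in discover_present_items(section)]
```

Items without any status information are still discovered here, which matches the "Unknown" fallback in the diff.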

andreas-doehler · October 22, 2022, 8:44pm

I did things a little differently, but in the end the result is the same.
I also reworked the temperature check to use the normal check_temperature.
Can you please test whether it works as expected?

The system I got the data from had no missing sensors.

bitwiz · October 22, 2022, 9:46pm

“My system where I got the data from had no missing sensors.”

I’d only monitored fully loaded systems, too - until yesterday, when I noticed a lot of yellow services after discovery.

Not an issue anymore with your new version: I’ve tried a tabula rasa and I no longer see the Absent items, and everything else is still there. Everything looks fine to me!

I’d like to get rid of the need to configure SNMP on the XCC boards altogether, but the “official” Lenovo plugins by Silvio Erdenberger use SNMP to poll health information, and while a lot of them have been made redundant by your excellent special agent, I’m wondering whether the missing pieces can be polled via Redfish as well, or are not even that useful:

CPUs
Disks
RAM DIMMs (we’ve got the temperature, likely this would tell us if a DIMM went critical)
PSU FRU / S/Ns
overall system health status

Most of the missing checks are only displaying FRU / S/N information and their only benefit is getting an inventory of all of your hardware servers without logging into each XCC GUI.

In terms of monitoring / alerting, if anything was wrong with them it would be enough to simply see a yellow or red “system health” service; one would certainly continue investigating (and checking the correct FRU for replacement) via manual XCC login / onsite? So probably not really necessary to add explicit CPU/RAM/Disk checks.

But is there a way to get the overall health status? Or how would we notice, via your XCC checks, whether a disk failed?
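
As a rough sketch of what such a “system health” service could look like: Redfish resources generally carry a `Status` object with `Health` and `HealthRollup` fields whose values are `OK`, `Warning` or `Critical`. Whether the XCC fills in `HealthRollup` for the system resource, and whether a failed disk shows up there, is an assumption I have not verified. The function name `health_state` and the mapping table are made up for illustration.

```python
from typing import Any, Mapping

# Redfish health values mapped to Checkmk-style states (0=OK, 1=WARN, 2=CRIT).
HEALTH_TO_STATE = {"OK": 0, "Warning": 1, "Critical": 2}


def health_state(resource: Mapping[str, Any]) -> int:
    """Return a monitoring state for a Redfish resource dict (hypothetical helper).

    Prefers HealthRollup (health including subordinate components) and falls
    back to Health; anything missing or unexpected becomes 3 (UNKNOWN).
    """
    status = resource.get("Status") or {}
    health = status.get("HealthRollup") or status.get("Health")
    return HEALTH_TO_STATE.get(health, 3)


# Made-up example: the system itself is fine, but something below it is not.
system = {"Status": {"State": "Enabled", "Health": "OK", "HealthRollup": "Warning"}}
state = health_state(system)
```

With a single service like this, a yellow or red state would be the cue to log in to the XCC and find the affected part.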

bitwiz · October 31, 2022, 10:51am

In case of error, the agent nevertheless exits with 0? This leads to Checkmk assuming that everything went well, when it didn’t.

OMD[MYSITE]:~$ /omd/sites/MYSITE/local/share/check_mk/agents/special/agent_lenovo_xclarity -u 'USERID' -p 'MYPASSWORD' -i '172.16.16.16'
Please check the username, password, IP is correct
Error: 0OMD[MYSITE]:~$ echo $?
0
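
A minimal sketch of the behaviour I would expect instead, assuming the agent is a Python script: print the error to stderr and return a non-zero exit code, so that Checkmk can tell the run failed. `FetchError`, `run_agent` and `failing_fetch` are hypothetical names for illustration and are not taken from the actual agent.

```python
import io
import sys
from typing import Callable, TextIO


class FetchError(Exception):
    """Hypothetical error for a failed login or connection to the XCC."""


def run_agent(
    fetch: Callable[[], str],
    out: TextIO = sys.stdout,
    err: TextIO = sys.stderr,
) -> int:
    """Write the agent output and return the exit code for the process."""
    try:
        payload = fetch()
    except FetchError as exc:
        # Report on stderr and signal failure with a non-zero exit code.
        err.write(f"Error: {exc}\n")
        return 1
    out.write(payload)
    return 0


def failing_fetch() -> str:
    """Simulates wrong credentials or an unreachable IP."""
    raise FetchError("please check that the username, password and IP are correct")


# In a real script the last line would be: sys.exit(run_agent(real_fetch))
captured_err = io.StringIO()
exit_code = run_agent(failing_fetch, out=io.StringIO(), err=captured_err)
```

With this, `echo $?` after a failed run would show 1 instead of 0.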