Lenovo Xclarity plugin

Hello Matjaž!

Regarding the packages on the Exchange, here is a longer explanation of what happened: “Checkmk Exchange: Technical Difficulties with Package Reviews”.
In short, for some time we could not review the packages and now the team is doing their best to catch up. But it is true – some updates could not be reviewed on time, unfortunately.

Hello Andreas! So, the current situation is that uploading packages works and the system is fixed – we are just a little behind on the reviews. Just to make sure – have you also uploaded the new version to the Exchange?

At the moment I have 3 pending updates (added in August) and 1 new version added today.

Got it, thanks. I will check the status with the reviewers on Monday. This information really helps.

Hi @andreas-doehler! It seems the package review issues were all resolved last week. Is everything OK with your updated versions?

Hi @Sara, it looks OK now.


Should “Absent” items not be skipped on discovery?
(or rather, should there be a rule defining which “status” values are valid for discovery?)

Quick diff:

--- local/lib/check_mk/base/plugins/agent_based/utils/lenovo_xclarity.py        2022-10-22 20:54:37.189527505 +0200
+++ local/lib/check_mk/base/plugins/agent_based/utils/lenovo_xclarity.py.NEW    2022-10-22 20:53:03.747459228 +0200
@@ -34,4 +34,8 @@
 
 def discovery_lenovo_xclarity_multiple(section) -> DiscoveryResult:
     for item in section:
-        yield Service(item=item)
+        data = section.get(item)
+        state = data.get("Status", {"State": "Unknown"}).get("State", "Unknown")
+        if state != "Absent":
+            yield Service(item=item)
+
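
To illustrate the "rule" idea from the question above, here is a minimal standalone sketch of the same filter with a configurable set of states to skip. It is only an illustration and not the plugin's actual code: the names `SKIPPED_STATES` and `discoverable_items` are made up, the sample section is invented, and the Checkmk `Service` object is left out so the snippet runs without a Checkmk site.

```python
from typing import Any, FrozenSet, Iterator, Mapping

# Hypothetical default rule value: states that should not produce a service.
SKIPPED_STATES: FrozenSet[str] = frozenset({"Absent"})


def discoverable_items(
    section: Mapping[str, Mapping[str, Any]],
    skipped_states: FrozenSet[str] = SKIPPED_STATES,
) -> Iterator[str]:
    """Yield the item names whose Status.State is not in skipped_states."""
    for item, data in section.items():
        # A missing "Status" block is treated as state "Unknown", as in the diff.
        status = data.get("Status") or {}
        state = status.get("State", "Unknown")
        if state not in skipped_states:
            yield item


# Invented sample data, shaped like the "Status" objects used in the diff.
section = {
    "DIMM 1": {"Status": {"State": "Enabled", "Health": "OK"}},
    "DIMM 2": {"Status": {"State": "Absent"}},
    "Fan 1": {},
}

print(list(discoverable_items(section)))  # ['DIMM 1', 'Fan 1']
```

In the real discovery function each yielded name would be wrapped as `Service(item=item)`; where the set of skipped states comes from (hard-coded or a discovery rule) is left open here.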

I did things a little differently, but in the end the result is the same :slight_smile:
I also reworked the temperature check to use the standard check_temperature.
Could you please test whether it works as expected?

The system I got the data from had no missing sensors.

I’d only monitored fully loaded systems, too - until yesterday, when I noticed a lot of yellow services after discovery.

Not an issue anymore with your new version, I’ve tried a Tabula Rasa and I don’t see the Absent items any longer, and everything else is still there. Everything looks fine to me!

I’d like to get rid of the need to configure SNMP on the XCC boards altogether, but the “official” Lenovo plugins by Silvio Erdenberger use SNMP to poll health information, and while a lot of them have been made redundant by your excellent special agent, I’m wondering whether the missing pieces can be polled via Redfish as well, or are not even that useful:

  • CPUs
  • Disks
  • RAM DIMMs (we’ve got the temperature, likely this would tell us if a DIMM went critical)
  • PSU FRU / S/Ns
  • overall system health status

Most of the missing checks are only displaying FRU / S/N information and their only benefit is getting an inventory of all of your hardware servers without logging into each XCC GUI.

In terms of monitoring / alerting, if anything was wrong with them it would be enough to simply see a yellow or red “system health” service; one would certainly continue investigating (and checking the correct FRU for replacement) via manual XCC login / onsite? So probably not really necessary to add explicit CPU/RAM/Disk checks.

But is there a way to get the overall health status? Or how would we notice, via your XCC checks, whether a disk failed?
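
As a rough sketch of what an overall health check could look like: in the Redfish schema, resources such as the ComputerSystem carry a `Status` object with `Health` and `HealthRollup` fields (values `OK`, `Warning`, `Critical`). Whether the XCC fills these reliably, and whether a failed disk shows up in the rollup, is an assumption that would need testing on real hardware. The function name `overall_health` and the sample data are made up for illustration.

```python
from typing import Any, Mapping, Tuple

# Redfish health values mapped to Checkmk-style states (0=OK, 1=WARN, 2=CRIT).
HEALTH_TO_STATE = {"OK": 0, "Warning": 1, "Critical": 2}
UNKNOWN = 3


def overall_health(system: Mapping[str, Any]) -> Tuple[int, str]:
    """Derive a monitoring state from a Redfish resource's Status object."""
    status = system.get("Status") or {}
    # Prefer the rollup (covers subordinate resources), fall back to Health.
    health = status.get("HealthRollup") or status.get("Health")
    if health is None:
        return UNKNOWN, "No health information available"
    return HEALTH_TO_STATE.get(health, UNKNOWN), f"System health: {health}"


# Invented sample: the system itself is fine, but a component is degraded.
sample = {"Status": {"State": "Enabled", "Health": "OK", "HealthRollup": "Warning"}}

print(overall_health(sample))  # (1, 'System health: Warning')
```

A single service built this way would at least turn yellow or red when something is wrong, leaving the detailed FRU lookup to the XCC GUI.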

In case of error, the agent nevertheless exits with 0? This leads to Checkmk assuming that everything went well, when it didn’t.

OMD[MYSITE]:~$ /omd/sites/MYSITE/local/share/check_mk/agents/special/agent_lenovo_xclarity -u 'USERID' -p 'MYPASSWORD' -i '172.16.16.16'
Please check the username, password, IP is correct
Error: 0OMD[MYSITE]:~$ echo $?
0
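
For illustration, a minimal sketch of how a special agent can propagate failures through its exit code. This is not the agent's actual code: `AgentError`, `run_agent` and the two fetch functions are hypothetical stand-ins, and the section header in the success case is invented.

```python
import sys
from typing import Callable


class AgentError(Exception):
    """Raised when the agent cannot fetch data (hypothetical name)."""


def run_agent(fetch: Callable[[], str]) -> int:
    """Run the fetch step and return the process exit code."""
    try:
        output = fetch()
    except AgentError as exc:
        # Diagnostics go to stderr so they do not end up in the agent output.
        print(f"Error: {exc}", file=sys.stderr)
        return 1  # non-zero tells the caller that the run failed
    sys.stdout.write(output)
    return 0


def failing_fetch() -> str:
    # Stand-in for a login failure against the XCC.
    raise AgentError("Please check the username, password, IP is correct")


def working_fetch() -> str:
    # Invented section header, only for the example.
    return "<<<lenovo_xclarity_example>>>\n"


# In the real script the result would be passed on, e.g. sys.exit(run_agent(...)).
print(run_agent(failing_fetch))  # 1
print(run_agent(working_fetch))  # 0
```

With a non-zero exit code, `echo $?` after the failed call above would no longer print 0, and the failure would be visible to the caller instead of looking like a successful run.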