Supermicro IPMI hardware monitoring

Hello,

Has anybody found a way to monitor the SEL (system event log) on Supermicro hardware?
I want to use it to catch hardware issues such as RAM ECC errors…

I found an old topic, but it got no responses…

Thanks!

It’s like you are reading my mind! I’m looking to do a very similar thing, and I wonder if you’ve run across some of the same pitfalls that I have.

  1. The old SuperMicro IPMI can’t send its logs anywhere but email, so the mailbox would have to be checked and the messages imported into the Event Console as new events. That feels hacky, but there’s no way around it since there’s no direct SNMP.

  2. The IPMI checks in Checkmk cover most things, but you’re right, they don’t get the ECC checks.

Right now I’m fumbling around trying to coax ChatGPT into giving me code that parses the log properly. It should find events from the past year or so, since something from 2017 isn’t likely to be actionable to me but newer entries might be, and it should also grep the event log messages for any mention of ECC.
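A minimal sketch of that filtering idea, not a finished check. It assumes the log is available in the pipe-separated format that `ipmitool sel elist` prints (`id | MM/DD/YYYY | HH:MM:SS | sensor | event | state`); SMCIPMITool output may look different and would need its own parsing. The function name `filter_recent_ecc` is made up for illustration.

```shell
#!/bin/bash
# filter_recent_ecc CUTOFF
# Reads SEL lines on stdin and prints those dated on or after CUTOFF (YYYYMMDD)
# that mention "ecc" (case-insensitive).
# ASSUMPTION: input looks like "ipmitool sel elist" output, date in field 2.
filter_recent_ecc() {
    awk -F'|' -v cutoff="$1" '
        {
            d = $2
            gsub(/ /, "", d)                 # strip padding around the date field
            if (split(d, p, "/") != 3) next  # skip entries without a usable date
            ymd = p[3] p[1] p[2]             # MM/DD/YYYY -> YYYYMMDD
            if (ymd >= cutoff && tolower($0) ~ /ecc/) print
        }'
}

# Example usage (GNU date assumed for the relative date):
# ipmitool sel elist | filter_recent_ecc "$(date -d '1 year ago' +%Y%m%d)"
```

Passing the cutoff as an argument keeps the date arithmetic out of the filter, so the same function works for "past year", "past month", or a fixed date.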

The other potential gotcha to such a script would be that you’d have to pass your IPMI password in clear text, so it would need to be something you don’t use for any other purpose.
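One way to soften the clear-text problem, sketched under the assumption that ipmitool is used over the network: ipmitool can read the password from a file with `-f`, so it does not have to appear on the command line or in the script. The helper name, the file path, the host, and the user below are placeholders, and `stat -c` assumes GNU coreutils.

```shell
#!/bin/bash
# pwfile_is_private FILE
# Succeeds only if FILE is readable and its mode is 600 or 400,
# i.e. nobody but the owner can read the stored password.
pwfile_is_private() {
    local mode
    [ -r "$1" ] || return 1
    mode=$(stat -c %a "$1") || return 1   # GNU stat assumed
    [ "$mode" = "600" ] || [ "$mode" = "400" ]
}

# Example usage (all names are placeholders):
# pwfile_is_private /etc/check_mk/ipmi.pw &&
#     ipmitool -I lanplus -H bmc.example.com -U monitor -f /etc/check_mk/ipmi.pw sel elist
```

The password is still stored unencrypted on disk, so a dedicated low-privilege IPMI account remains a good idea.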

Let me know if this helps inspire you somewhat. I do think that however I write this script, the best approach is to drop SMCIPMITool on the local server and run it there. Depending on your number of SuperMicro servers (we have over 1,000), a server-side check would bring the monitoring server to its knees. It would also help if your IPMI interfaces follow a standard naming convention similar to your regular hostnames.
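To illustrate why a naming convention helps: with one in place, each server can work out its own IPMI address without a lookup table. The convention used here (`-ipmi` appended to the short hostname) and the function name are purely hypothetical; substitute whatever scheme your site uses.

```shell
#!/bin/bash
# ipmi_name_for HOST
# Derives the BMC name from a hostname.
# HYPOTHETICAL convention: "<short name>-ipmi[.<domain>]".
ipmi_name_for() {
    local host=$1
    case $host in
        *.*) printf '%s-ipmi.%s\n' "${host%%.*}" "${host#*.}" ;;  # FQDN: keep the domain
        *)   printf '%s-ipmi\n' "$host" ;;                        # short name only
    esac
}

# Example usage:
# bmc=$(ipmi_name_for "$(hostname -f)")
```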

What you can do is extend my Redfish agent with an option to fetch the SEL logs, which are available inside the Redfish tree. Be aware that fetching them is very time-consuming, so you should have a plan: on the first run, determine which entry is the latest, and on later runs fetch only newer entries. I think this should be a valid way.
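The "remember the latest entry, then only handle newer ones" part can be sketched independently of Redfish. The following is an illustration only: it assumes each log entry has already been reduced to one line starting with a numeric ID followed by `|`, and it does not show how to query the Redfish tree. The function name and the state file are made up.

```shell
#!/bin/bash
# only_new_entries STATE_FILE
# Reads "ID|text" lines on stdin. On the first run (no state file yet) it only
# records the highest ID as a baseline and prints nothing. On later runs it
# prints the lines whose ID is higher than the stored one, then updates the file.
only_new_entries() {
    local state_file=$1 last=0 max first_run=0 id line
    if [ -r "$state_file" ]; then
        read -r last < "$state_file"
        last=${last//[!0-9]/}      # keep digits only
        last=${last:-0}
    else
        first_run=1
    fi
    max=$last
    while IFS= read -r line; do
        id=${line%%|*}
        id=${id//[!0-9]/}
        [ -n "$id" ] || continue               # skip lines without a numeric ID
        [ "$id" -gt "$last" ] || continue      # already seen
        [ "$id" -gt "$max" ] && max=$id
        [ "$first_run" -eq 1 ] || printf '%s\n' "$line"
    done
    printf '%s\n' "$max" > "$state_file"
}
```

Treating the first run as a baseline means old entries are never reported. If the existing history matters, it would have to be inspected once by hand.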

Sadly my stuff is mostly X9 Intel gear, so it’s too old even for Redfish. :frowning:

I’ve actually written a check for that:

#!/bin/bash
#
# /usr/lib/check_mk_agent/local/600/check_ipmi_sel
# by Jay2k1 2023
#
# checks ipmi system event log (SEL)
#
# TO CLEAR IPMI EVENT LOG: ipmi-sel --clear
#
# TO ADD EXCLUDES:
# - get full event log: /usr/sbin/ipmi-sel  --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
# - get the "type" column for the event you want to exclude, e.g. "Session Audit"
# - run "ipmi-sel -L" to get a list of machine readable event types and find your type there, e.g. "Session_Audit"
# - add that type to the exclude list below
exclude_types="Session_Audit,OEM_Reserved,OS_Boot,OS_Critical_Stop,Physical_Security"
#
# if you want to exclude a certain event text, not a type, add it to the "grep -v" in the ipmi-sel command
#
######################################################################################################

if [ ! -x /usr/sbin/ipmi-sel ]; then
    echo "3 'IPMI SEL' - /usr/sbin/ipmi-sel not found or not executable"
    exit 0
fi

count=0
critcount=0
warncount=0
lastmsg=""
selentries=""
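while IFS= read -r line; do
    count=$((count+1))
    # turn the comma-separated line into a " | " separated one for display
    lastmsg=$(echo "$line" | sed -E 's#,# | #g')
    # the literal "\n" is kept on purpose; it separates the detail lines in the check output
    selentries="$selentries\n$lastmsg"
    # last column only (the event text), used in the summary line
    lastmsg_short=$(echo "$lastmsg" | rev | cut -d '|' -f 1 | rev)

    if echo "$line" | grep -q ',Warning,'; then warncount=$((warncount+1))
    elif echo "$line" | grep -q ',Critical,'; then critcount=$((critcount+1))
    fi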

done < <(/usr/sbin/ipmi-sel \
       --output-event-state \
       --interpret-oem-data \
       --entity-sensor-names \
       --sensor-types=all \
       --exclude-sensor-types="$exclude_types" \
       --comma-separated-output \
       --no-header-output | grep -v ',Nominal,')

if [ $critcount -gt 0 ]; then echo -n 2
elif [ $warncount -gt 0 ]; then echo -n 1
else echo -n 0
fi

echo -n " 'IPMI SEL' - $count events found ($critcount CRIT, $warncount WARN) - click for details"
[ $count -gt 0 ] && echo -n " (last msg: $lastmsg_short)"
echo "\n$selentries"

exit 0

Thanks, Jay2k1.
This is a local-only check; I will try to do the same thing with ipmitool.
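A rough sketch of what an ipmitool-based variant might look like, not a tested replacement for the script above. `ipmitool sel elist` has no Warning/Critical column, so the severity rule here is an assumption: any entry yields WARN, and entries whose text contains "uncorrectable", "critical", or "non-recoverable" yield CRIT. The function reads the log from stdin so it can be tried on saved output first.

```shell
#!/bin/bash
# sel_to_local_check
# Reads "ipmitool sel elist" style lines on stdin and prints one Checkmk
# local check line. ASSUMPTION: the severity keywords below are a guess
# and should be adjusted to the events your hardware actually logs.
sel_to_local_check() {
    awk -F'|' '
        NF >= 5 {
            count++
            if (tolower($0) ~ /uncorrectable|critical|non-recoverable/) crit++
            last = $5                          # event description column
            sub(/^ +/, "", last); sub(/ +$/, "", last)
        }
        END {
            state = (crit > 0) ? 2 : (count > 0) ? 1 : 0
            printf "%d \"IPMI SEL\" - %d events found (%d CRIT)", state, count, crit
            if (count > 0) printf " (last msg: %s)", last
            printf "\n"
        }'
}

# Example usage inside a local check:
# ipmitool sel elist 2>/dev/null | sel_to_local_check
```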