We have some Dell servers where we would like to monitor their hardware
status, including RAID.
Using recent Dell iDRAC service processors it's possible to read all
relevant information via SNMP, without having to install management
software on the server itself. This is especially interesting on servers
running "bare-metal"-like hypervisors.
Is anyone working on an SNMP check for check_mk for this kind of
polling? If not, I might give it a try.
Of course, there's a complication here: The SNMP agent to contact is at a
different address from the host being monitored, and how would that best
be configured, keeping WATO in mind? I'm thinking that one might use
tagging to indicate the organization's service processor addressing
convention, so that an address doesn't have to be configured for each
host. A convention could be that hosts with hostname H.DOMAIN have a
service processor at sp-H.DOMAIN; another could be H.sp.DOMAIN -- etc.
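The two naming conventions mentioned above could be sketched as a small mapping function. This is only an illustration: the function name `service_processor_address` and the scheme labels `prefix` and `subdomain` are hypothetical and are not existing check_mk or WATO settings.

```python
def service_processor_address(hostname, scheme="prefix", label="sp"):
    """Derive a service processor name from a monitored host's name.

    scheme="prefix":    H.DOMAIN -> sp-H.DOMAIN
    scheme="subdomain": H.DOMAIN -> H.sp.DOMAIN
    (Both scheme names are made up for this sketch.)
    """
    # Split at the first dot: "web1.example.com" -> "web1", ".", "example.com"
    host, sep, domain = hostname.partition(".")
    if scheme == "prefix":
        return f"{label}-{host}{sep}{domain}"
    if scheme == "subdomain":
        if not domain:
            raise ValueError("subdomain scheme needs a fully qualified name")
        return f"{host}.{label}.{domain}"
    raise ValueError(f"unknown scheme: {scheme!r}")


print(service_processor_address("web1.example.com"))
print(service_processor_address("web1.example.com", scheme="subdomain"))
```

A host tag could then select which scheme applies to a group of hosts, so that no address has to be entered per host.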
Umm… it’s IPMI, which has a terrible track record when it’s done over a network (google ipmi super micro), but assuming that’s not an issue for you…
A classical check? Prefix all your iDRACs with console- in DNS, so your classical check could be something like
/mychecks/check_iDRAC -H console-$HOSTNAME$ -v 3 -u user -f /tmp/authfile --all
Then maybe a host tag to say that a machine has an iDRAC, so you can apply the rule at the main level. Not as integrated, but pretty straightforward. Other options would be doing it with an MRPE check on the host you are monitoring and a local host entry, or using OME to track the hardware instead of check_mk.
On Mon, Jun 23, 2014 at 4:51 PM, Troels Arvin troels@arvin.dk wrote:
[...]
There is already a Nagios plugin to check hardware that way:
If these systems are using IPMI, I strongly recommend you point a vulnerability scanner (Nessus does the job nicely) at them and/or the subnet they’re on, to help find the gaping security holes that may exist on your systems. I have a SAN that was affected by these IPMI vulnerabilities; I uncovered this during testing and was able to remotely power cycle the thing without providing any authentication.
Here’s a link to all kinds of fun info regarding this
On Mon, Jun 23, 2014 at 6:17 PM, Patrick Flaherty pflaherty@wsi.com wrote:
A classical check? and prefixing all your idracs with console- in dns
[...]
Yes, I already have that running. But I believe that "agent-less"
hardware monitoring is going to be common enough to have an SNMP check
built into check_mk, preferably including inventorying, based on one or
more naming schemes which the user configures.
On Tue, Jun 24, 2014 at 1:47 AM, Troels Arvin <troels@arvin.dk> wrote:
[...]
Is the Dell iDRAC using the same OIDs as the Windows agents?
No
E.g., the "check_openmanage" Nagios-plugin doesn't work with the iDRAC.
And the check_iDRAC Nagios-plugin doesn't work with a classical Dell
OpenManage installation.
Another way to look at it: with a classical OpenManage installation, one
may fetch the global status at 1.3.6.1.4.1.674.10892.1.200.10.1.2.1,
corresponding to MIB-Dell-10892::systemStateGlobalSystemStatus.1.
However, such an OID doesn't exist in the iDRAC.
The iDRAC, however, exposes
IDRAC-MIB::globalSystemStatus.0 == .1.3.6.1.4.1.674.10892.5.2.1.0
and
IDRAC-MIB::systemStateGlobalSystemStatus.1 == .1.3.6.1.4.1.674.10892.5.4.200.10.1.2.1
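As a rough sketch of how a check could evaluate the iDRAC global status OID named above, the following maps the returned integer to a Nagios-style state. The enumeration (1=other ... 6=nonRecoverable) is an assumption based on the status type commonly used in the Dell MIBs and should be verified against the IDRAC-MIB shipped with your firmware; the function name and the chosen state mapping are illustrative only, and fetching the value via SNMP is left out.

```python
# OID taken from the message above (IDRAC-MIB::globalSystemStatus.0).
IDRAC_GLOBAL_STATUS_OID = ".1.3.6.1.4.1.674.10892.5.2.1.0"

# Assumed status enumeration -> (name, Nagios state); verify against your MIB.
# Nagios states: 0=OK, 1=WARN, 2=CRIT, 3=UNKNOWN
DELL_STATUS = {
    1: ("other", 3),
    2: ("unknown", 3),
    3: ("ok", 0),
    4: ("nonCritical", 1),
    5: ("critical", 2),
    6: ("nonRecoverable", 2),
}

STATE_NAMES = {0: "OK", 1: "WARN", 2: "CRIT", 3: "UNKNOWN"}


def evaluate_global_status(raw_value):
    """Map a raw SNMP integer (as int or string) to (state, message)."""
    try:
        value = int(str(raw_value).strip())
    except ValueError:
        return 3, f"UNKNOWN - unparsable status value {raw_value!r}"
    # Values outside the assumed enumeration are reported as UNKNOWN.
    name, state = DELL_STATUS.get(value, ("undefined", 3))
    return state, f"{STATE_NAMES[state]} - global system status is {name} ({value})"


print(evaluate_global_status("3"))
print(evaluate_global_status(5))
```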
You can also get some hardware status info from Dells (and Suns)
using ipmitool:
ipmitool -I lanplus -H <idracIP or hostname> -U root -P <password> chassis status
The output for Dells looks something like:
System Power : on
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : previous
Last Power Event :
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
Sleep Button Disable : not allowed
Diag Button Disable : allowed
Reset Button Disable : not allowed
Power Button Disable : allowed
Sleep Button Disabled: false
Diag Button Disabled : true
Reset Button Disabled: false
Power Button Disabled: false
Not very detailed, but enough that someone can log in to the iDRAC
and check out the fan/PS/disk status, logs, and front panel message.
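Output in this "Key : value" form is easy to post-process. The sketch below parses such text into a dictionary and lists any fault fields that are not "false". The function names are hypothetical, and the sample text is a shortened, altered excerpt (one fault set to "true") purely for illustration; running ipmitool itself is left out.

```python
def parse_chassis_status(text):
    """Turn 'Key : value' lines into a dict, splitting on the first colon."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            status[key.strip()] = value.strip()
    return status


def find_faults(status):
    """Return sorted names of '... Fault' fields whose value is not 'false'."""
    return sorted(
        key for key, value in status.items()
        if key.endswith("Fault") and value.lower() != "false"
    )


# Illustrative sample, not real output: "Drive Fault" was changed to "true".
sample = """\
System Power : on
Main Power Fault : false
Drive Fault : true
Cooling/Fan Fault : false
"""

print(find_faults(parse_chassis_status(sample)))
```

A wrapper used as a classical check could return CRIT whenever the list is non-empty and OK otherwise.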
----- Original Message -----
From: "Troels Arvin" <troels@arvin.dk>
To: checkmk-en@lists.mathias-kettner.de
Sent: Tuesday, June 24, 2014 9:48:27 AM
Subject: Re: [Check_mk (english)] Agent-less Dell hardware monitoring
Hello,
Andreas Döhler wrote:
is the Dell iDRAC using the same OIDs as the windows agents?
[...]
OK, that’s not so nice.
Then you have to write all plugins from scratch. I was hoping that you could reuse the check_om_xxxx checks from the cmk checks.