[Check_mk (english)] problem with host state

Yesterday I upgraded the OS on my main monitoring box from Fedora 15 to Scientific Linux 6.3. I have OMD 0.56 installed and preserved the filesystem it lived on across the reinstall. After the install of SL6.3 everything on the system, including check_mk, is working with one exception. This server runs as a main server that checks six other OMD installs, which in turn check all the systems on each subnet. The service checks for the six servers are working, but the host state for each server is down.

I can force a check via the web browser and it returns the services as ok. I can run "cmk -v <host>" and it returns all service states just fine. But I still have these six other OMD servers showing up as down hosts. I'm pretty sure the livestatus stuff is working. Through the web pages of the main server I can force a check on a system monitored by one of the other servers and that works.

In the status for each host I see:
Output of host check plugin (Return code of 126 is out of bounds - plugin may be missing)

Which plugin would it be referring to here?

I've tried everything I can think of, checked the firewall, disabled the firewall. Both ping and nmap from the main server to the other servers work just fine. I can telnet to port 6556 on the other servers and client systems, no problem.


--
Stephen Berg
Systems Administrator
NRL Code: 7320
Office: 228-688-5738
stephen.berg.ctr@nrlssc.navy.mil

Stephen,

I think this is more of a Nagios setup issue than a CheckMK issue, but since you’re using OMD they are sort of tied together… Anyway, let’s check this out on the command line instead of through a web interface. Everything below is assuming you have not changed the default command used to check hosts…

Check this file → /omd/sites//etc/nagios/conf.d/check_mk_objects.cfg

In that file you should find a Nagios definition stanza for your host. It will start off like this:

define host {

There will be a line in the definition that probably says “use check_mk_host”

That is defined in the file “/omd/sites//etc/nagios/conf.d/check_mk_templates.cfg”.

Check that file and you’ll find that in the definition for check_mk_host it uses “check_mk_default”, which in turn is defined to use the command “check-mk-ping”.

The command definition for “check-mk-ping” is $USER4$/lib/nagios/plugins/check_icmp $ARG1$ $HOSTADDRESS$.
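The chain above (host, then check_mk_host, then check_mk_default, then check-mk-ping, then the command line) can be followed with grep instead of reading the files by hand. This is only a rough sketch: the function name show_definitions is made up for illustration, and the config directory you pass in is an assumption you need to fill in for your own site.

```shell
#!/bin/sh
# Hypothetical helper (not part of OMD or Check_MK): list every line in a
# Nagios config directory that mentions a given template or command name.
#   $1 = config directory (assumption: your site's etc/nagios/conf.d)
#   $2 = name to look for, e.g. check_mk_host or check-mk-ping
show_definitions() {
    conf_dir=$1
    name=$2
    # -r recurse, -n show line numbers, -w whole words only,
    # -F treat the name as a literal string, -- end of options
    grep -rnwF -- "$name" "$conf_dir"
}

# Example calls, one per link in the chain (CONF_DIR is a placeholder):
#   show_definitions "$CONF_DIR" check_mk_host
#   show_definitions "$CONF_DIR" check_mk_default
#   show_definitions "$CONF_DIR" check-mk-ping
```

Each call prints file name, line number and matching line, so you can see both where a name is defined and where it is used.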

Running the find command shows us the location of the file:

# find /omd/ -name check_icmp

/omd/versions/0.56/lib/nagios/plugins/check_icmp

Let’s check the permissions of the file:

# ll /omd/versions/0.56/lib/nagios/plugins/check_icmp

-rwsr-x---. 1 root omd 59658 Sep 27 20:43 /omd/versions/0.56/lib/nagios/plugins/check_icmp

Be sure to note that the file has the ‘suid’ bit set and has octal permissions of 4750.

Let me know if the permissions of your file do not match above or if you find anything else weird.
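If you would rather compare the values mechanically than by eye, something like the following could do it. It is a minimal sketch, assuming GNU stat (the -c option) as found on Linux; the function name check_plugin_perms is invented here, and the expected mode 4750 and group omd are simply the values from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's octal mode and group owner with
# the values you expect. Relies on GNU stat (-c), so Linux only.
#   $1 = file, $2 = expected octal mode, $3 = expected group name
check_plugin_perms() {
    file=$1
    want_mode=$2
    want_group=$3
    mode=$(stat -c '%a' "$file") || return 2
    group=$(stat -c '%G' "$file") || return 2
    status=0
    if [ "$mode" != "$want_mode" ]; then
        echo "mode is $mode, expected $want_mode"
        status=1
    fi
    if [ "$group" != "$want_group" ]; then
        echo "group is $group, expected $want_group"
        status=1
    fi
    # 0 = everything matches, 1 = mismatch, 2 = stat failed
    return $status
}

# Example, using the path and values from the listing above:
#   check_plugin_perms /omd/versions/0.56/lib/nagios/plugins/check_icmp 4750 omd
```

It prints one line per mismatch and stays silent when mode and group both match.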


On Thu, Dec 27, 2012 at 6:44 AM, Stephen Berg (Contractor) stephen.berg.ctr@nrlssc.navy.mil wrote:




.˙. Learn more about Dale Stubblefield at www.dalestubblefield.com .˙.

That was the problem, kind of. I installed the Nov 14 nightly rpm and updated the site and the problem went away. I compared both versions of check_icmp and it apparently was the group ownership that got futzed up when I went from Fedora to SciLinux. After the update to 0.57.2012114 the group owner is now correct.
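For anyone who finds this thread later: the "Return code of 126" message fits a permissions problem. A POSIX shell returns exit status 126 when a command is found but cannot be executed, which is probably what happened when the user running the check was not covered by the group on check_icmp. The sketch below reproduces that status with a throwaway file; the function name demo_exit_126 is made up, and it does not touch any OMD files.

```shell
#!/bin/sh
# Illustration only: run a file that exists but has no execute bit and
# print the exit status the shell hands back.
demo_exit_126() {
    tmp=$(mktemp) || return 1
    printf '#!/bin/sh\nexit 0\n' > "$tmp"
    chmod 0644 "$tmp"            # readable, but not executable by anyone
    rc=0
    "$tmp" 2>/dev/null || rc=$?  # the shell cannot execute the file
    rm -f "$tmp"
    echo "$rc"
}

# Example:
#   demo_exit_126    # prints 126
```

Status 127, by contrast, would mean the command was not found at all, so 126 points at permissions or ownership rather than a missing plugin.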


On 12/27/2012 07:19 AM, Dale Stubblefield wrote:

Let's check the permissions of the file:
# ll /omd/versions/0.56/lib/nagios/plugins/check_icmp
-rwsr-x---. 1 root omd 59658 Sep 27 20:43 /omd/versions/0.56/lib/nagios/plugins/check_icmp

Be sure to note that the file has the 'suid' bit set and has octal permissions of 4750.

Let me know if the permissions of your file do not match above or if you find anything else weird.

--
Stephen Berg
Systems Administrator
NRL Code: 7320
Office: 228-688-5738
stephen.berg.ctr@nrlssc.navy.mil