[Check_mk (english)] logging of execution of check_mk and ssh command

I'm checking 2 servers which are behind a firewall, so I use port forwarding to send the ssh requests to the servers. Unfortunately, 2-10 times per hour I get a notification that ssh failed. Most of the time it works, so I don't think it is a configuration error. I'm also checking some other servers which are local, and with them I don't have any problems at all. The error I get back is 255, which is just a general ssh error. So I wanted to ask whether it is possible to make check_mk log the whole output of the check to a file (or send it with the notification mail), including the output of the ssh command. I wasn't able to find anything about this in the documentation.

The error I get is:

CRIT - Programm ssh -l root -p 1023 -i /etc/check_mk/check_mk.key example.com exited with code 255, execution time 15.1 sec

And a few minutes later again:

OK - Agent version 1.1.10, execution time 7.3 sec

Thanks in advance :-)

ronney
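
A minimal sketch of one way to capture what ssh prints when it fails, assuming Python 3 is available on the monitoring server. The function name `run_and_log` and the log path `/tmp/datasource_ssh.log` are made up for illustration and are not part of check_mk. The idea is to put a small wrapper in front of the ssh call that appends ssh's stderr and exit code to a file while passing stdout and the exit code through unchanged:

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run a command, log its stderr and exit code to a file."""
import subprocess
import sys
import time

LOG_FILE = "/tmp/datasource_ssh.log"  # assumed location, adjust as needed


def run_and_log(command, log_file=LOG_FILE):
    """Run `command` (a list of arguments) and append a log entry.

    Returns (exit_code, stdout_bytes); stdout is passed back unmodified.
    """
    started = time.time()
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    duration = time.time() - started
    stderr_text = result.stderr.decode("utf-8", errors="replace")
    with open(log_file, "a") as log:
        # One summary line per run, followed by whatever the command wrote to stderr.
        log.write("%s exit=%d time=%.1fs cmd=%s\n" % (
            time.strftime("%Y-%m-%d %H:%M:%S"),
            result.returncode,
            duration,
            " ".join(command),
        ))
        if stderr_text:
            log.write(stderr_text.rstrip("\n") + "\n")
    return result.returncode, result.stdout


# Only act as a command-line wrapper when a command was actually given.
if __name__ == "__main__" and len(sys.argv) > 1:
    code, output = run_and_log(sys.argv[1:])
    sys.stdout.buffer.write(output)
    sys.exit(code)
```

Assuming the script is saved as `/usr/local/bin/ssh_log_wrapper.py` (a hypothetical path), the datasource program could become `/usr/local/bin/ssh_log_wrapper.py ssh -v -l root -p 1023 -i /etc/check_mk/check_mk.key example.com`. The `-v` option makes ssh print debugging messages to stderr, which then end up in the log file.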

Hi,

IIRC an exit code of 255 means that the ssh connection itself failed, so
no log file is generated when it fails: your
datasource program is not able to set up the ssh connection at that
time.

You could extend your datasource_program to handle the return code so that
it returns an "UNKNOWN"?
If you have many reconnects you might also want to switch to the caching
agent.

Regards,
Florian

--
Mathias Kettner GmbH | \/ | |/ / M A T H I A S K E T T N E R
Florian Heigl | |\/| | ' /
Steinstr. 44 | | | | . \ Linux Beratung & Schulung
81667 München |_| |_|_|\_\ http://mathias-kettner.de
Tel.: 089 / 1890 4210
Fax.: 089 / 1890 4211 Mail: fh@mathias-kettner.de

Hi

You were completely right about the ssh connection failing, there is no log at all :-(.

I just wasn't able to find out what the caching agent is. Could you maybe give me a hint where I could find it?


Hi Ronney,

> You were completely right about the ssh connection failing, there is no
> log at all :-(.

OK, you need to debug that first - that's an issue outside of check_mk.
Of course you could start with a legacy check that monitors ping & ssh to
the monitored host :>

> I just wasn't able to find out what the caching agent is. Could you
> maybe give me a hint where I could find it?

About the caching agent:
It's an agent that doesn't run all commands each time it is called. Instead it
first checks whether the results are still fresh enough, and if so it just
sends you the cached data from its last run.
(I haven't used it myself yet since I just didn't need it.)
There should be an agent.tar.gz or an agents directory on your Nagios
server.
Look either in /usr/share/check_mk or in
/opt/omd/versions/default/share/check_mk
for a directory called "agents".

In there you should find check_mk_agent.linux.

As said before, it cannot fix your SSH issue, it can just ease the pain.

Florian
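
The freshness check described above can be illustrated roughly like this. This is only a sketch of the general idea, not the code of the actual caching agent; the function name `cached_output`, the cache file handling and the 120 second default are assumptions chosen for the example:

```python
import os
import subprocess
import time


def cached_output(command, cache_file, max_age=120):
    """Return the cached output if `cache_file` is younger than `max_age`
    seconds; otherwise run `command`, store its stdout and return it."""
    try:
        age = time.time() - os.path.getmtime(cache_file)
        if age < max_age:
            with open(cache_file, "rb") as cache:
                return cache.read()  # still fresh: skip running the command
    except OSError:
        pass  # no cache file yet, fall through and run the command

    # check=True raises CalledProcessError on failure, so a failed run
    # never overwrites the last good cache file.
    output = subprocess.run(command, stdout=subprocess.PIPE, check=True).stdout
    with open(cache_file, "wb") as cache:
        cache.write(output)
    return output
```

As noted in the thread, caching of this kind only reduces how often the expensive work is done. It does not make a failing ssh connection succeed.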


Hi mathias

> > You were completely right about the ssh connection failing, there is no
> > log at all :-(.
>
> ok you need to debug that first - that's an issue outside of check_mk.
> Of course you could start with a legacy check that monitors ping & ssh to the
> monitored host :>

Well, ping wouldn't work: the gateway is configured not to respond to pings. The host availability checks are done with check_tcp, and that one never fails. It's only the ssh connection which fails from time to time. I have never had problems with manual ssh connections either. I have been running a script for 4 hours which executes the datasource_program every minute, and I haven't seen any error so far.
As far as I can guess it's probably a problem with the router there, which I cannot influence, so I'm just trying to ignore the error ;-).
As far as I understand the caching agent, using it will result in the same problem: it will just query the hosts less frequently, but if this query fails I will have the same problem again...

> You could expand your datasource_program to handle the return code so
> that it returns an "UNKNOWN"?

I tried to do that with a datasource_program entry of
ssh -l root -p 1022 -i /etc/check_mk/check_mk.key <IP> ; exit 3

which doesn't work. After looking at get_agent_info_program in check_mk_base.py, it looks like it raises an error as soon as the exit code is different from 0.
Is there another way to do it, or did I misunderstand you completely? ;-)

Thanks already
Ronney
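
Since any non-zero exit code is treated as an error, one possible workaround is to retry inside the datasource program itself and only report a failure if every attempt fails. A minimal sketch, assuming Python 3 on the monitoring server; `run_with_retries`, the number of attempts and the delay are hypothetical choices for illustration, not check_mk features:

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: retry a command a few times before giving up."""
import subprocess
import sys
import time


def run_with_retries(command, attempts=3, delay=5.0):
    """Run `command` up to `attempts` times.

    Returns (exit_code, stdout_bytes) of the first successful run,
    or of the last failed run if none succeeded.
    """
    code, output = 1, b""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(command, stdout=subprocess.PIPE)
        code, output = result.returncode, result.stdout
        if code == 0:
            break  # got a good answer, stop retrying
        if attempt < attempts:
            time.sleep(delay)  # give the connection a moment before retrying
    return code, output


# Only act as a command-line wrapper when a command was actually given.
if __name__ == "__main__" and len(sys.argv) > 1:
    code, output = run_with_retries(sys.argv[1:])
    sys.stdout.buffer.write(output)
    sys.exit(code)
```

Keep in mind that retries lengthen the execution time of the check. The failed run quoted at the start of the thread took about 15 seconds, so several failed attempts plus delays could exceed whatever timeout applies to the check.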


Hi,

As explained in a previous mail to Ronney, I solved my datasource program
issues this way:

/etc/check_mk/main.mk :

# ssh tag to use ssh for monitoring with Check_MK
all_hosts = [
    "myhost|tcp|ubuntu|linux|dmz|ssh",
]
datasource_programs = [
    ( "ssh -p 119 -l root <IP> check_mk_agent", ['ssh'], ALL_HOSTS ),
    # you can also use :
    # ( "ssh -l root <IP> check_mk_agent", ['dmz'], ALL_HOSTS ),
    # ( "ssh -p 119 -l root -i /var/lib/nagios/id_rsa <IP>", [ 'linux' ] ),
    # ( "ssh -p 119 -l root <IP> check_mk_agent", [ 'ssh' ] ),
]

You have to su to nagios and export the public key to host "myhost" (ssh on
TCP port 119 here):

su - nagios
# ssh-keygen -t rsa
ssh-copy-id "root@myhost -p 119"

Do not forget to export the root public key too:

su - root
# ssh-keygen -t rsa
ssh-copy-id "root@myhost -p 119"

Otherwise you will not be able to run these commands:

check_mk --flush myhost && check_mk -I myhost && check_mk -O

Do not try "ssh -l root -p 1022 -i /etc/check_mk/check_mk.key <IP>", it will not work...

I tried dozens of combinations...

Keep in mind that you must have your ssh keys in the Nagios home directory!

Ubuntu 10.04.2 LTS Server
Nagios® Core™ Version 3.2.3

Nagios and the other packages were compiled from source.
It also works with the precompiled packages from my Ubuntu distro.

It takes no more than half an hour to set up Nagios/Check_MK/Nagvis/PnP4Nagios.
Tuning Mathias' tool takes more time, but it's worth it!

Sorry if I misunderstood the problem.

Bye.


Hi fred

I'm already using a configuration quite similar to yours.

"ssh -l root -p 1022 -i /etc/check_mk/check_mk.key <IP>"

works like a charm with quite a few servers.

The problem is that there are 2 servers behind a NAT router, to which it sometimes fails to establish the ssh connection. One minute later it manages to connect again, and I'm getting flooded with emails because of that...

So I'm looking for a way to ignore the failed connection attempt.
