[Check_mk (english)] Agent Not responding

Good day all-

I have a problem on one specific node. Both the CMK server and the managed host run Ubuntu 18.04.

The agent is installed, but so far multiple attempts to run discovery to completion have failed.

I have opened all ports between the CMK server and the managed node in question, and I can connect to port 6556 with telnet:

telnet 10.x.x.x 6556
Trying 10.x.x.x...
Connected to 10.x.x.x.
Escape character is '^]'.
<<<check_mk>>>
Version: 1.5.0p11
AgentOS: linux
Hostname: xxxxxx01
AgentDirectory: /etc/check_mk
DataDirectory: /var/lib/check_mk_agent
SpoolDirectory: /var/lib/check_mk_agent/spool
PluginsDirectory: /usr/lib/check_mk_agent/plugins
LocalDirectory: /usr/lib/check_mk_agent/local
OnlyFrom:
<<<df>>>

This output is from after re-installing the agent and restarting xinetd. As you can see, it stops right after <<<df>>>, where the session just hangs until I break it.

All attempts to run discovery from the command line or WATO fail: the command line (cmk -vvII) hangs indefinitely, and the GUI times out after 110 seconds. I see no errors or recorded issues in journalctl or the logs, and running /usr/bin/check_mk_agent locally on the managed host immediately returns all discovery elements.
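
One way to pin down where a truncated transfer stops is to list the section headers that actually arrived. The sketch below is only an illustration and not part of Checkmk: the function names `sections_seen` and `last_section` are made up for this example, and the pattern assumes headers of the form `<<<name>>>` or `<<<name:option>>>`.

```python
import re
from typing import List, Optional

# Assumed header form: <<<name>>> or <<<name:option(...)>>> on a line of its own.
SECTION_RE = re.compile(r"^<<<([^>:]+)(?::[^>]*)?>>>\s*$")


def sections_seen(output: str) -> List[str]:
    """Return the section names in the order they appear in agent output."""
    names = []
    for line in output.splitlines():
        match = SECTION_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def last_section(output: str) -> Optional[str]:
    """Return the last section header received, or None if there was none."""
    names = sections_seen(output)
    return names[-1] if names else None


# Shortened copy of the telnet capture from this post.
captured = """<<<check_mk>>>
Version: 1.5.0p11
AgentOS: linux
OnlyFrom:
<<<df>>>
"""
print(last_section(captured))  # prints: df
```

Comparing this list for a remote capture against the one from a local run of the agent shows which section is the first to go missing.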

Does anyone have thoughts on how I could narrow down the cause of this issue? I'm running out of ideas.

Anything you may have is appreciated!

Thanks-

Karl

What the agent does at this stage is
  df -PTlk

Try running this command on the managed node and see what happens.
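
If the command is suspected of hanging, it can be run under a time limit so that the test itself does not block the terminal. This is a minimal sketch using Python's standard `subprocess` module; the helper name `run_with_timeout` and the 10-second limit are arbitrary choices for illustration.

```python
import subprocess
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[str]:
    """Run cmd and return its stdout, or None if it did not finish in time.

    subprocess.run() kills the child when the timeout expires. A process
    stuck in uninterruptible I/O may not react to that kill right away.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```

On the managed node, `run_with_timeout(["df", "-PTlk"], 10.0)` would return the df table if the command completes, and `None` if it is still running after ten seconds.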

Regards,
Robert

Thanks Robert for the quick reply-

It works properly:

df -PTlk
Filesystem     Type     1024-blocks   Used Available Capacity Mounted on
udev           devtmpfs   197408056      0 197408056       0% /dev
tmpfs          tmpfs       39486656 183692  39302964       1% /run
… (truncated)

K

_______________________________________________
checkmk-en mailing list
checkmk-en@lists.mathias-kettner.de
Manage your subscription or unsubscribe
https://lists.mathias-kettner.de/cgi-bin/mailman/listinfo/checkmk-en

Does the agent run through to completion on the local host?

telnet localhost 6556

and/or

/usr/bin/check_mk_agent (or wherever the agent lives)

Also, is the agent version the same as or older than the monitoring server's? A newer agent version could cause some conflicts.
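
The version comparison can be made mechanical. The following is a rough sketch, not Checkmk's own logic: it assumes version strings of the form `1.5.0`, `1.5.0p11`, `1.5.0b3` or `1.5.0i2`, and it assumes the ordering innovation (i) < beta (b) < release < patch (p). The names `parse_version` and `agent_is_newer` are invented for this example.

```python
import re
from typing import Tuple

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:([ibp])(\d+))?$")

# Assumed ordering of release stages; None stands for a plain release.
_STAGE_RANK = {"i": 0, "b": 1, None: 2, "p": 3}


def parse_version(version: str) -> Tuple[int, int, int, int, int]:
    """Turn a version string such as '1.5.0p11' into a comparable tuple."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major, minor, sub, stage, stage_no = match.groups()
    return (int(major), int(minor), int(sub),
            _STAGE_RANK[stage], int(stage_no or 0))


def agent_is_newer(agent: str, server: str) -> bool:
    """True if the agent version sorts after the server version."""
    return parse_version(agent) > parse_version(server)


print(agent_is_newer("1.5.0p11", "1.5.0p11"))  # prints: False
```

Comparing tuples rather than raw strings avoids the trap where `p9` would sort after `p11` alphabetically.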

On 2019-02-05 10:52:19-05:00 Paul Dott wrote:

> Does the agent run through completion on the local host?
> telnet localhost 6556
> and/or
> /usr/bin/check_mk_agent (or wherever the agent lives)

Yes, the agent run completes with both of the commands above.

> Also, is the agent version of equivalent or lesser than the monitoring server? Having a newer agent version could cause some conflicts.

They are the same, both 1.5.0p11.

I wonder if the following I receive when running cmk -vvII may cause an issue:

Sorry, the previous message was sent by mistake before I had finished:

I wonder if the following, which I receive when running cmk -vvII, may point to an issue:

[agent] Connecting via TCP to 10.51.169.15:6556 (5.0s timeout)

I wonder if a timeout is being hit while waiting. This is a remote host in the cloud reaching a co-lo in a different DC. Is there a way to raise that TCP timeout?

A timeout doesn't make much sense, though, because it's a 10GB direct connect.

Full output:

+ FETCHING DATA
[agent] No persisted sections loaded
[agent] Not using cache (Does not exist)
[agent] Execute data source
[agent] Connecting via TCP to 10.51.169.15:6556 (5.0s timeout)
[agent] Reading data from agent
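
The log reaches "Reading data from agent", which suggests the connect itself succeeded within the 5.0s limit and the stall happens while reading. The two phases can be told apart with a small test client. This is a sketch built on Python's standard `socket` module, not what cmk does internally; the function name `fetch_agent_output` and the 10-second read timeout are assumptions made for the example.

```python
import socket
from typing import Tuple


def fetch_agent_output(host: str, port: int = 6556,
                       connect_timeout: float = 5.0,
                       read_timeout: float = 10.0) -> Tuple[bytes, bool]:
    """Read agent output over TCP and return (data, stalled).

    stalled is True if a single read waited longer than read_timeout
    before the agent closed the connection. A failed or timed-out
    connect raises an exception instead.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        sock.settimeout(read_timeout)  # now applies to each recv() call
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                return b"".join(chunks), True
            if not chunk:  # peer closed the connection: output is complete
                return b"".join(chunks), False
            chunks.append(chunk)
```

Calling `fetch_agent_output("10.x.x.x")` from the CMK server returns whatever arrived before the stall, so the byte count and the last section header can be compared with a local run on the managed host.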

Hi,

Is the connection between the Server and the Remote Host going through a VPN tunnel? If so, maybe you are facing an MSS (Maximum Segment Size) issue.
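
For reference, the arithmetic behind an MSS problem is simple: packets that fit into a reduced path MTU get through, while full-size segments are dropped. The sketch below assumes IPv4 and TCP headers of 20 bytes each with no options; the 1400-byte tunnel MTU is a hypothetical value, not something measured in this thread.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
TCP_HEADER_BYTES = 20   # assumes no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits into one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def fits_in_one_segment(payload_bytes: int, mtu: int) -> bool:
    """True if the payload can be sent without a full-size segment split."""
    return payload_bytes <= mss_for_mtu(mtu)


print(mss_for_mtu(1500))  # prints: 1460 (standard Ethernet MTU)
print(mss_for_mtu(1400))  # prints: 1360 (hypothetical tunnel MTU)
```

A short reply such as the first agent section fits into one small packet, which would match the pattern of a connection that starts fine and then hangs once larger segments are sent.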

Best regards.

No, it is a direct-connect pipe with no VPN. However, I have discovered that the bond interface is dropping packets, so I am going to find out whether something is physically wrong with the host. It looks to be related.
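
On Linux, per-interface drop counters can be read from /proc/net/dev. The sketch below parses that file's text; the function name `drop_counters` is invented for this example, and the sample numbers are made up rather than taken from the affected host.

```python
from typing import Dict, Tuple


def drop_counters(proc_net_dev: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name to (rx_dropped, tx_dropped) from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, stats = line.split(":", 1)
        fields = stats.split()
        if len(fields) < 16:
            continue
        # Eight receive columns come first (bytes packets errs drop ...),
        # then eight transmit columns, so the drop counts sit at 3 and 11.
        counters[name.strip()] = (int(fields[3]), int(fields[11]))
    return counters


# Made-up sample in the /proc/net/dev layout.
sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
 bond0: 1000 10 0 7 0 0 0 0 2000 20 0 3 0 0 0 0
    lo: 500 5 0 0 0 0 0 0 500 5 0 0 0 0 0 0
"""
print(drop_counters(sample)["bond0"])  # prints: (7, 3)
```

Taking two readings a few seconds apart, for example with `drop_counters(open("/proc/net/dev").read())`, shows whether the counters for the bond and its member interfaces are still increasing.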

Thank you all, I’ll reply if I find a cause with the bond, but appreciate everyone’s input!

K

On Feb 5, 2019, at 11:13 AM, Ezequiel Tolstanov <ezequiel@atdt.com.ar> wrote:

> Hi,
>
> Is the connection between the Server and the Remote Host going through a VPN tunnel? If so, maybe you are facing an MSS (Maximum Segment Size) issue.
>
> Best regards.
