Connection refused from Agent Socket on 6556 for a NATted IP (Socket OK, Agent registered ok)

Checkmk Server:
Checkmk version: 2.2.0b4_0 raw
OS: Ubuntu 20.04.6 LTS
Package: check-mk-raw-2.2.0b4_0.focal_amd64.deb

Monitored System (Host):
Checkmk Agent version: 2.2.0b4-1
OS: Ubuntu 20.04.5 LTS
Package: check-mk-agent_2.2.0b4-1_all.deb

The Monitored System is in a local network and has only a local IP address: 192.168.1.110
On the border router/firewall, a NAT forward has been configured so that incoming connections on a public IP are forwarded to the Monitored System.
This way the Monitored System “serves” http/https, ssh, etc. requests.
The firewall is open for port 6556.

[Image: CheckMk_MonitoredHost_Setup.drawio]

The Agent on the Monitored System has been successfully installed and registered:

    sudo cmk-agent-ctl status
            Version: 2.2.0b4
            Agent socket: operational
            IP allowlist: any

            Connection: xx.xx.xxx.x/xxx_xxxxxxx
	            UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
	            Local:
		            Connection mode: pull-agent
		            Connecting to receiver port: 8000
		            Certificate issuer: Site 'aaa_bbbbbbb' agent signing CA
		            Certificate validity: Fri, 28 Apr 2023 16:06:26 +0000 - Fri, 28 Apr 2028 16:06:26 +0000
	            Remote:
		            Connection mode: pull-agent
		            Hostname: xxxxxxx.xxxxxxxxx.xx

Description of the problem:
The socket on the Monitored System works as expected when the connection points to the configured local IP.
Telnet connection from a shell on the Monitored System itself:

telnet 192.168.1.110 6556
    Trying 192.168.1.110...
    Connected to 192.168.1.110.
    Escape character is '^]'.
    16

But if I try to connect to port 6556 from the outside via the public IP (forwarded via NAT by the router/firewall), I get a “Connection refused”.
Notably, the same thing (Connection refused) happens even if I try to connect from a shell on the monitored server itself!

telnet PUBLIC_NATTED_IP 6556
    Trying XXX.XX.XXX.XXX...
    telnet: Unable to connect to remote host: Connection refused

The socket looks ok to me:

sudo ss -tlpn | grep 6556 
   LISTEN    0         4096                     *:6556                   *:*        users:(("cmk-agent-ctl",pid=78735,fd=9)) 

and it is started by systemd, not xinetd.

To debug the situation, I looked at the log produced by the Agent Controller:

journalctl -fb -u cmk-agent-ctl-daemon.service
    -- Logs begin at Fri 2023-02-10 08:39:12 CET. --
    Apr 28 19:18:25 dev systemd[1]: Started Checkmk agent controller daemon.
    Apr 29 12:45:15 dev cmk-agent-ctl[78735]: WARN [cmk_agent_ctl::modes::pull] [::ffff:127.0.0.1]:60636: Request failed. (received corrupt message)
    Apr 29 12:46:59 dev cmk-agent-ctl[78735]: WARN [cmk_agent_ctl::modes::pull] [::ffff:127.0.0.1]:53128: Request failed. (received corrupt message)
    Apr 29 15:03:57 dev cmk-agent-ctl[78735]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.1.111]:43988: Request failed. (received corrupt message)
    Apr 29 15:21:09 dev cmk-agent-ctl[78735]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.1.111]:40832: Request failed. (received corrupt message)
    Apr 29 17:00:12 dev cmk-agent-ctl[78735]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.1.110]:54602: Request failed. (received corrupt message)

but I only see the successful telnet connections to the local IP.
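
To check which source addresses actually reached the agent, the peer addresses can be pulled out of log lines like the ones above. This is only a minimal sketch: it assumes the log format shown in this post, where peers appear as IPv4-mapped addresses such as `[::ffff:192.168.1.111]:43988`, and the function name `peers_from_log` is made up for illustration.

```shell
# Read cmk-agent-ctl journal lines on stdin and print the unique peer addresses.
# Only IPv4-mapped peers of the form [::ffff:a.b.c.d]:port are recognised.
peers_from_log() {
    grep -oE '\[::ffff:[0-9.]+\]:[0-9]+' \
        | sed -E 's/^\[::ffff:([0-9.]+)\]:[0-9]+$/\1/' \
        | sort -u
}
```

It could be used as `journalctl -b -u cmk-agent-ctl-daemon.service | peers_from_log`. For the excerpt above it lists only 127.0.0.1, 192.168.1.110 and 192.168.1.111, none of which looks like a connection arriving from outside the local network.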

  • Is there a way to activate/see more detailed logs from the Agent Controller, or other logs that may contain useful information?
  • Any idea what is causing the problem?
  • Is there a way to configure the Monitored System to PUSH data to the CheckMK Server? (Communication in this direction works, since I was able to register the Agent.)

Maybe the problem is related to the public IP that is forwarded via NAT.
But the NAT forward is currently working correctly for the web server (80, 443) and for the SSH server (22), so I guess it should work for the Agent Controller as well…

Thanks in advance
l.

Hi @lucabuka,

and welcome to the forum.

What is the IP of the host in your Checkmk Server?
Is it 192.168.1.110 or the public IP of your Router/Firewall?

The registration worked because it’s the other way around. Your monitored host makes a HTTP/HTTPS request to the public Checkmk Server. TLS Registration normally uses Port 8000.

Another question is: Can you share a screenshot of your NAT rule on your firewall? I think there lies the problem. Another approach would be to packet capture the connection.
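
A packet capture on the monitored host could look like the sketch below. It assumes `tcpdump` is installed there and that the agent port is 6556; the helper name `agent_capture_filter` is hypothetical. If no SYN packets show up while someone connects to the public IP, the connection never reaches the host and the reply is coming from a device in front of it.

```shell
# Build a pcap filter that matches only connection attempts (SYN) and resets (RST)
# on one TCP port, so the capture stays readable.
agent_capture_filter() {
    local port=${1:-6556}
    printf 'tcp port %s and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)' "$port"
}

# Usage on the monitored host (needs root):
#   sudo tcpdump -ni any "$(agent_capture_filter 6556)"
```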

Regards
Norm

Hi Norm,
thanks for your reply.

> *What is the IP of the host in your Checkmk Server?*
> *Is it 192.168.1.110 or the public IP of your Router/Firewall*

The image I posted refers only to the “Monitored Server” and its network (the datacenter of a hosting provider; the Monitored Host is a VM I use as a web server, and the hosting provider manages the NAT/firewall).
The CMK Server is in another network (our private network with 6 public IPs, router/firewall, NAT, etc., directly managed by me).
The IP of the CheckMK Server is one of the public IPs of our network (IP_A in the image below).
This is the complete picture:

  
> *The registration worked because it’s the other way around. Your monitored host makes a HTTP/HTTPS request to the public Checkmk Server. TLS Registration normally uses Port 8000.*

Yes, I configured the firewall in our network to allow ports 80 and 8000 from the public IP of the monitored host.
I have direct access to that firewall's configuration, which is why I think it may be easier to set up a configuration where the Monitored System “pushes” the data to the Server in some way (if this is possible with the CheckMK Agent).

> *Another question is: Can you share a screenshot of your NAT rule on your firewall? I think there lies the problem. Another approach would be to packet capture the connection.*

The Monitored System is a VM hosted in a hosting provider's network.
I have no access to the firewall configuration, but the hosting team opened port 6556 for connections coming from the public IP of our network (corresponding to the CheckMK Server).
My impression was that the firewall setting was working OK: I immediately get a “Connection refused”, whereas when the port is closed by the firewall, the telnet connection just times out after a while.
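
The refused-versus-timeout distinction can be made explicit with a small probe. This is a sketch, assuming bash and the coreutils `timeout` command are available; `probe_tcp` is a made-up helper name. Note that "refused" only shows that something answered with a TCP reset: that can be the target host, but also a router or firewall in front of it, so it does not by itself prove that the forward is working.

```shell
# probe_tcp HOST PORT [SECONDS]: print open, refused, timeout or error
# for a single TCP connection attempt.
probe_tcp() {
    local host=$1 port=$2 secs=${3:-5} err rc
    # bash treats /dev/tcp/HOST/PORT as a TCP connection; timeout(1) aborts
    # the attempt if no answer arrives within $secs seconds.
    err=$(LC_ALL=C timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>&1)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo open
    elif [ "$rc" -eq 124 ]; then
        echo timeout    # no answer at all, typical for silently dropped packets
    elif printf '%s\n' "$err" | grep -q 'Connection refused'; then
        echo refused    # a reset came back from the host or from a device in front of it
    else
        echo error      # anything else, e.g. name resolution failure or no route
    fi
}
```

Running `probe_tcp 192.168.1.110 6556` and `probe_tcp PUBLIC_NATTED_IP 6556` from the same shell gives a direct comparison of the two paths.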

I found a way to increase the log level for cmk-agent-ctl by editing the systemd startup configuration for cmk-agent-ctl-daemon.service:

    [Service]
    ExecStart=/usr/bin/cmk-agent-ctl -vv daemon

and I can see the logs using

    journalctl -fb -u cmk-agent-ctl-daemon.service
        # Logs when running telnet from localhost to local IP (192.168.1.110)
        May 01 14:58:35 dev cmk-agent-ctl[229959]: INFO [cmk_agent_ctl::modes::pull] [::ffff:192.168.1.110]:51960: Handling pull request.
        May 01 14:58:35 dev cmk-agent-ctl[229959]: DEBUG [cmk_agent_ctl::modes::pull] [::ffff:192.168.1.110]:51960: Handling pull request DONE (Task detached).
        May 01 14:58:35 dev cmk-agent-ctl[229959]: DEBUG [cmk_agent_ctl::modes::pull] handle_request starts
        May 01 14:58:35 dev systemd[1]: Started Checkmk agent (PID 229959/UID 997).
        ......

but when I get a “Connection refused”, nothing is logged at all!
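
One way to make the verbosity change without touching the packaged unit file is a systemd drop-in. The sketch below assumes the unit name and `ExecStart` line quoted in this post; the drop-in file name `verbose.conf` and the helper `verbose_dropin` are arbitrary. The empty `ExecStart=` line clears the packaged command first, because systemd does not accept a second `ExecStart` for a service that is not of type oneshot.

```shell
# Print a drop-in that replaces the daemon's start command with a more verbose one.
verbose_dropin() {
    printf '%s\n' '[Service]' 'ExecStart=' 'ExecStart=/usr/bin/cmk-agent-ctl -vv daemon'
}

# Usage (needs root):
#   dir=/etc/systemd/system/cmk-agent-ctl-daemon.service.d
#   sudo mkdir -p "$dir" && verbose_dropin | sudo tee "$dir/verbose.conf"
#   sudo systemctl daemon-reload && sudo systemctl restart cmk-agent-ctl-daemon.service
```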

To be 100% sure the problem was not related to the CMK Agent Controller, I stopped the agent and opened a TCP socket listening on port 6556 with Netcat:

# Netcat: open a TCP socket and listen for connections (received characters are echoed to STDOUT on newline)
ncat -l 6556

sudo ss -tlp
        State     Recv-Q    Send-Q       Local Address:Port         Peer Address:Port   Process                                       
        LISTEN    0           10                [::]:6556                          [::]:*                          users:(("ncat",pid=236653,fd=3)) 

And I got the exact same behaviour (connection refused on telnet to the public IP).

So you are right: the problem is related to the firewall, or to some other host configuration blocking requests to port 6556.

I will investigate further and see what I find.

Thanks again
l.

As Norman stated in his reply, the problem was related to a misconfiguration of the NAT forward in the hosting provider's network (IP_B in the network schema).
The setup shown in the picture, with both the CheckMK Server and the Agent, works perfectly.
