2.0.0p3.cre (RAW): check_mk.socket: Too many incoming connections (4) from source x.x.x.x, dropping connection

check_mk.socket: Too many incoming connections (4) from source x.x.x.x, dropping connection

I keep getting this error sporadically, hilariously enough.

I'm monitoring from a remote LTE/ADSL connection, so it should be very hard to overwhelm the remote servers' 1 Gbit+ connection.

Yes, I can edit /etc/systemd/system/check_mk.socket, but why does this happen / what causes these sockets not to be released in a timely manner?

Restarting the socket unit (systemctl restart check_mk.socket) doesn't seem to fix it either.

Another 24 hours later and a bunch of hosts show the check_mk inventory / service as offline, with this in their agent log:

I have my checks set to a 3-minute interval, which I thought would be spaced far enough apart to prevent this sort of issue as well.

May 01 11:06:13 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.
May 01 11:08:59 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.
May 01 11:09:13 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.
May 01 11:10:43 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.
May 01 11:11:59 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.
May 01 11:12:13 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.
May 01 11:14:59 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.
May 01 11:15:13 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.
May 01 11:15:45 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.
May 01 11:17:59 ns2 systemd[1]: check_mk.socket: Too many incoming connections (31) from source 192.168.192.3, dropping connection.

The only thing that seems to work once a host has gotten into this state is:

systemctl restart check_mk.socket;systemctl daemon-reload;systemctl status check_mk.socket

It's super frustrating, as reliable monitoring alerts are the only reason to monitor something in the first place. Getting blasted by CRITs that are caused by the monitoring system itself makes me sad.

I can confirm that I have the same issue. I can add that the Raspberry Pi host I monitor runs on a Wi-Fi connection that connects back to the monitoring subnet over WireGuard.

For such special devices I would tend to use the connection over SSH.
This is very stable, even over VPN/DSL connections.
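
For reference, here is a minimal sketch of what that can look like on the Checkmk server, using a datasource program ("individual program call instead of agent access") over key-based SSH. The user name and SSH options are only examples, and the remote user must be allowed to run check_mk_agent:

ssh -o BatchMode=yes -o ConnectTimeout=10 monitoring@$HOSTADDRESS$ check_mk_agent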

I can confirm this also happens on a very "normal" Debian 10 server.
Reinstalling the agent did not help. The host gets monitored for some time, then the time needed to query the agent increases, and eventually the agent stops responding altogether.
We are still looking into this, but every bit of help is appreciated.

I started using the caching agent and set the socket to unlimited (no concurrent connection limit) to avoid getting blasted by CRITs, and it has helped. But… I am still having to flush out failed check_mk services almost daily on a bunch of normal machines, and I have no idea why.

I've just accepted it as a fact of life, but I wanted to leave the commands I use here for anyone else who wants to set up a cron job (see the sketch after the commands). Also, I am running checks over a remote LTE connection (the remote servers are hosted on a stable connection, though), so I understand that my connection itself is a bit shaky. Perhaps others with a better connection have less of an issue.

systemctl daemon-reload;  # This should NOT be needed, but sometimes the socket just doesn't clear up without it. No idea why.
systemctl reset-failed;  # Clears out all the failed services. Additionally, I have created a rule to ignore check_mk service failures in the systemd checks.
systemctl restart check_mk.socket;  # Restart the socket unit.
systemctl status check_mk.socket  # Make sure it's running afterwards.
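
If you want to run this from cron as mentioned above, a /etc/cron.d entry along these lines should do it; the 15-minute schedule and the file name are just examples, nothing official:

# /etc/cron.d/checkmk-socket-cleanup (example)
*/15 * * * * root systemctl daemon-reload; systemctl reset-failed; systemctl restart check_mk.socket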


I started using the caching agent and set the socket to unlimited (no concurrent connection limit) to avoid getting blasted by CRITs, and it has helped. But… I am still having to flush out failed check_mk services almost daily on a bunch of normal machines, and I have no idea why.

In your first paragraph, do you mean you modified the systemd unit file, or did you change a ruleset in Checkmk?

Actually both / all.

In the systemd unit(s)

Set up the caching agent:    nano -w /etc/systemd/system/check_mk@.service

Change  
ExecStart=-/usr/bin/check_mk_agent 
to      
ExecStart=-/usr/bin/check_mk_caching_agent

Adjust the socket's maximum connections per source:   nano -w /etc/systemd/system/check_mk.socket
Comment out MaxConnectionsPerSource=3 (so there is no limit):
# MaxConnectionsPerSource=3

And then, since the default rules still raise CRIT/WARN when services fail, I adjusted the systemd rule to ignore failed check_mk services. Alternatively, you can clear the failures on the monitored client via

 systemctl reset-failed
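
After making these edits, something along these lines should confirm the new socket setting is active (assuming a reasonably recent systemd that exposes the property; a value of 0 means no per-source limit):

systemctl daemon-reload
systemctl restart check_mk.socket
systemctl show check_mk.socket --property=MaxConnectionsPerSource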

Same here with 2.0.0p4.cre on Debian 10 and Ubuntu 20.04 servers in a "normal" LAN.

@_rb, may I address you here directly?
I think this is something for the devs. You probably need to configure the Checkmk agent socket to allow more connections. See the link (paywalled) below for reference. You basically need to configure the systemd unit file for check_mk.socket with a setting similar to this:

[Socket]
MaxConnectionsPerSource=20

I will verify this over the next few days and update here.

Update: I was just able to verify my suspicion. The value of MaxConnectionsPerSource is set to 3 by default. As we use a retry interval shorter than one minute (don't ask), we probably trigger this limit and the socket stops responding until a reset. So from my point of view there are two possible solutions: increase the default value, or make it configurable via Setup.

Link: systemd: <SomeService>.socket: Too many incoming connections (64) - Red Hat Customer Portal
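
If you prefer not to edit the packaged unit file directly (so the change is less likely to be lost on an agent update), a systemd drop-in override is one option. This is just a sketch, not an official Checkmk recommendation; the value 20 is the example from above:

mkdir -p /etc/systemd/system/check_mk.socket.d
cat > /etc/systemd/system/check_mk.socket.d/override.conf <<'EOF'
[Socket]
MaxConnectionsPerSource=20
EOF
systemctl daemon-reload
systemctl restart check_mk.socket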

Hi there,
I'm looking into systemd-related trouble at the moment. I think this is caused by long-running "asynchronous" processes (cached plugins, cached local checks or cached MRPE plugins that run longer than a check interval). Those keep the connection alive, resulting in too many connections per source.
Can anyone confirm that (by temporarily disabling those plugins)? If that's the case, I may be able to provide you with a workaround (and ultimately a fix, of course).
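
A rough way to test this on a monitored host, assuming the default Linux agent plugin path (adjust if your installation differs), is to move the plugins aside and watch whether the socket errors stop:

mkdir -p /root/disabled_agent_plugins
mv /usr/lib/check_mk_agent/plugins/* /root/disabled_agent_plugins/
# observe the journal for a few check intervals, then move everything back:
# mv /root/disabled_agent_plugins/* /usr/lib/check_mk_agent/plugins/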

Hi Moritz,

having the same issue here with a 2.0.0p7 agent, no cached plugins or cached local checks in use.

Cheers,
Christian

We had the Docker plugin enabled on the affected host, but the problem seems to have gone away on its own, so I cannot confirm your theory at the moment. I will update once I hit this issue again.

Thanks for the feedback.
Long-running processes are always a problem (and I think docker disk usage is notorious for running long, see the example config!). Usually, we'd advise users to configure the plugin to run asynchronously, but that's exactly what was broken with systemd.
It is fixed in the master branch now (check_mk_agent: Fix issues with systemd), and we are considering bringing the fix to the stable branch.
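
For reference, the Linux agent runs a plugin asynchronously (cached) when it is placed in a subdirectory of the plugin directory named after the cache interval in seconds. The path, plugin name and 300-second interval below are only examples:

mkdir -p /usr/lib/check_mk_agent/plugins/300
mv /usr/lib/check_mk_agent/plugins/mk_docker.py /usr/lib/check_mk_agent/plugins/300/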


Solved this with Ansible for now.

Why does the .deb package still deploy /etc/xinetd.d/check_mk when systemd is used?

I can confirm that I also have this issue (2.0.0p12). And I can confirm that it's from a long-running check.

In more detail, it's from a custom plugin that I've written to check the status of our MySQL backups. Every hour it reads the full contents of the backup to verify whether the backup was successful. Now, depending on the size of the databases this can take some time; more than the check interval of 1 minute (that's why I only run it once every hour).

But I can see that, indeed, every hour I now get one failed check_mk@blah.service, which is nicely in line with the long run every hour.

For now I’ve set MaxConnectionsPerSource to unlimited and I’ll clean up the failed services every now and then, but this is quite annoying.
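
If you clean these up by hand, you can target just the agent's per-connection service instances instead of resetting everything; the unit pattern below assumes the default naming seen in this thread:

systemctl list-units --failed 'check_mk@*.service'
systemctl reset-failed 'check_mk@*.service'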

I found that even after increasing the maximum connections per source, it often goes back to the default of 3 connections per source some time later.
