Cmk-agent - connection ok - dump has no output

CMK version: 2.1.0p26_0
OS version: Server and agent Debian 11

Error message:

Server: Service - Check_MK - [agent] MKTimeout('Fetcher for host "my_server" timed out after 60 seconds') (CRIT), Got no information from host (CRIT), execution time 60.0 sec
CRIT
Check_MK Discovery no unmonitored services found, 37 vanished services (apt:1, checkmk_agent:1, cpu_loads:1, cpu_threads:1, df:3, diskstat:1, kernel_performance:1, kernel_util:1, lnx_if:12, md:4, mem_linux:1, mounts:3, mrpe:3, systemd_units_services_summary:1, tcp_conn_stats:1, timesyncd:1, uptime:1), no new host labels, [agent] MKTimeout(‘Fetcher for host “my_server” timed out after 60 seconds’

Agent: ● cmk-agent-ctl-daemon.service - Checkmk agent controller daemon
Loaded: loaded (/lib/systemd/system/cmk-agent-ctl-daemon.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2023-05-07 16:34:36 CEST; 4h 0min ago
Main PID: 1009 (cmk-agent-ctl)
Tasks: 3 (limit: 77026)
Memory: 7.4M
CPU: 1.537s
CGroup: /system.slice/cmk-agent-ctl-daemon.service
└─1009 /usr/bin/cmk-agent-ctl daemon

May 07 20:23:29 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:59942: Request failed. (Too many active connections)
May 07 20:24:29 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:40886: Request failed. (Too many active connections)
May 07 20:24:34 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:34248: Request failed. (Broken pipe (os error 32))
May 07 20:26:35 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:59088: Request failed. (Broken pipe (os error 32))
May 07 20:28:34 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:50620: Request failed. (Broken pipe (os error 32))
May 07 20:31:29 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:54964: Request failed. (Too many active connections)
May 07 20:32:29 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:39884: Request failed. (Too many active connections)
May 07 20:33:29 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:40948: Request failed. (Too many active connections)
May 07 20:34:29 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:33710: Request failed. (Too many active connections)
May 07 20:34:35 my_server cmk-agent-ctl[1009]: WARN [cmk_agent_ctl::modes::pull] [::ffff:192.168.122.server]:52878: Request failed. (Broken pipe (os error 32))
root@myserver ~ # service check-mk-agent [pressing Tab]
check-mk-agent@67-1009-998 check-mk-agent@70-92535-998
check-mk-agent@68-1009-998 check-mk-agent-async
check-mk-agent@69-1009-998

What is this? Why are there different services here?

When I enter the command “cmk-agent-ctl dump”, I receive no output. On other clients, however, I get output quickly. Does anyone have an idea what the problem is here? I reinstalled the checkmk-agent, and I have had this error ever since.

Hi @hose93

In the output you provided you can find some hints like this one: “Request failed. (Too many active connections)”

That means the maximum concurrent connections are used and thus no more agent output can be provided.

Somebody on the forum had a similar issue and changed the settings to allow more connections.

But before you change this, you should ask yourself why multiple systems are connecting to your agent to receive output.

Hope this helps.

Regards
Norm

That’s strange. I have only 1 server with 1 instance. I had a different checkmk server before. How can I clean up the unnecessary connections? I don’t have a cmk-agent-ctl-daemon.service.d folder in my systemd configuration. I’m slowly losing track of what is going on here.

Looks like dangling connections: connections that time out but never get closed. First list the open network connections on the CMK server, for example with lsof or sockstat. Then try

time cmk-agent-ctl dump

on suspected hosts.
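A minimal sketch of the first step, using ss in place of lsof or sockstat. It assumes the agent listens on TCP port 6556 (the Checkmk default) and that the input has the column layout of `ss -tan`; the function name count_agent_conns is made up for this example.

```shell
#!/bin/sh
# Assumption: the agent listens on TCP port 6556 (the Checkmk default).
AGENT_PORT="${AGENT_PORT:-6556}"

# count_agent_conns (hypothetical helper): reads `ss -tan` style output on
# stdin and prints one line per TCP state with the number of connections
# whose local or peer address ends in the agent port.
count_agent_conns() {
    awk -v port=":${AGENT_PORT}" '
        # Skip the header line; $4 is the local and $5 the peer address.
        NR > 1 && ($4 ~ (port "$") || $5 ~ (port "$")) { states[$1]++ }
        END { for (s in states) print s, states[s] }
    ' | sort
}

# Usage on a live system (needs ss from iproute2):
#   ss -tan | count_agent_conns
```

A large number of connections stuck in one state for the same host would point to connections that are never closed.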

That’s really strange. I’m actually getting an output. However, it takes an incredibly long time:
real 9m6.657s
user 0m0.003s
sys 0m0.005s

Hi @hose93
Are there any plugins or local checks running on that host?

Please check under:

/usr/lib/check_mk_agent/plugins/
/usr/lib/check_mk_agent/local/

If yes, please remove the execute permission to test whether the long runtime is caused by one of your plugins (e.g. sudo chmod -x pluginname).

Then check the execution time again to find the cause of the long execution time.
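Instead of disabling plugins one at a time, each plugin can also be timed on its own. The following is only a sketch: the function name time_plugins is made up, it relies on timeout from GNU coreutils, and plugins that depend on environment variables set by the agent may behave differently when started by hand like this.

```shell
#!/bin/sh
# time_plugins DIR [LIMIT] (hypothetical helper): run every executable file
# in DIR and print "<seconds> <name>" per file, slowest first.
# LIMIT caps each run in seconds (default 60, the fetcher timeout from the
# error message in this thread).
time_plugins() {
    dir="$1"
    limit="${2:-60}"
    for f in "$dir"/*; do
        # Skip subdirectories and files without the execute bit.
        [ -f "$f" ] && [ -x "$f" ] || continue
        start=$(date +%s)
        # Discard plugin output; only the runtime matters here.
        timeout "$limit" "$f" >/dev/null 2>&1
        end=$(date +%s)
        echo "$((end - start)) $(basename "$f")"
    done | sort -rn
}

# Usage (run as root on the monitored host):
#   time_plugins /usr/lib/check_mk_agent/plugins
#   time_plugins /usr/lib/check_mk_agent/local
```

A plugin that shows up with the full LIMIT was cut off by timeout and is the most likely cause of the slow agent output.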

Regards
Norm


Actually… it worked. Apparently, an Nginx plugin is causing issues here. Once I deactivate it, there are no more problems. Interestingly, the entire agent works fine on another client. There, the Nginx plugin also recognizes a web server.

Should I remove unnecessary plugins, or how can I work around this issue? Nginx is not in use on the faulty host itself, only in a Docker environment as a reverse proxy.

I had the same issue.

$ systemctl status cmk-agent-ctl-daemon

WARN [cmk_agent_ctl::modes::pull] [::ffff:10.141.11.248]:40570: Request failed. (Too many active connections)

Thanks for the hint to take a closer look at the plugin directory!

/usr/lib/check_mk_agent/plugins/

In my case the nvidia plugin took very long to respond because nvidia-smi showed an error with one of the GPU cards.

So temporarily doing this on the plugin helped:

chmod 644 /usr/lib/check_mk_agent/plugins/nvidia_smi
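Since this is only a temporary measure, it helps to have the way back at hand as well. A small sketch, assuming (as the chmod commands in this thread do) that the agent only runs plugin files with the execute bit set; the function names are made up, and 755 is an assumed original mode.

```shell
#!/bin/sh
# disable_plugin FILE (hypothetical helper): clear the execute bit for
# everyone so the agent skips the plugin.
disable_plugin() {
    chmod a-x "$1"
}

# enable_plugin FILE (hypothetical helper): restore the execute bit once the
# underlying problem is fixed. Assumption: the plugin was mode 755 before.
enable_plugin() {
    chmod 755 "$1"
}

# Usage:
#   disable_plugin /usr/lib/check_mk_agent/plugins/nvidia_smi
#   enable_plugin  /usr/lib/check_mk_agent/plugins/nvidia_smi
```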

We experienced this recently with the mysql_capacity check.
I’ve made a PR to make part of the mysql check easy to disable: “make mysql capacity check optional (high load caused oom error)” by ITJamie, Pull Request #671 in Checkmk/checkmk on GitHub.

It would be great if the “Check_MK Discovery” check alerted on checks that take a long time to process.