Hello,
I have an issue registering a host with a server: the agent-receiver on the server side keeps crashing after around 15 to 20 seconds and then restarts.
The server and the host are on the same network, the same virtualization cluster.
The host can telnet to port 8000 of the server, and a curl request works (there is a certificate issue, but I don’t think that is the cause, since I’m passing --trust-cert with the registration command).
We use distributed monitoring, and both this host and this server are on site2 (site1 being the main one).
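In case it helps anyone reproduce the connectivity check without telnet or curl, here is a minimal Python sketch that separates the TCP connect from the TLS handshake and reports which stage fails. The function name `probe_tls` and its return format are my own invention, not part of Checkmk. Certificate verification is switched off on purpose, so this only tests whether the handshake completes at all.

```python
import socket
import ssl
import time


def probe_tls(host, port, timeout=5.0):
    """Try a TCP connect, then a TLS handshake; report which stage failed.

    Returns a tuple (stage, detail, elapsed_seconds). Hypothetical helper,
    not part of Checkmk or cmk-agent-ctl.
    """
    start = time.monotonic()
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp_failed", repr(exc), time.monotonic() - start)

    # Verification is deliberately disabled: the goal is only to see whether
    # the handshake finishes, not whether the certificate is trustworthy.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return ("tls_ok", tls.version(), time.monotonic() - start)
    except OSError as exc:  # ssl.SSLError and socket.timeout are OSError subclasses
        raw.close()
        return ("tls_failed", repr(exc), time.monotonic() - start)
```

Calling `probe_tls("10.44.251.1", 8000)` from the host should then show whether the connection gets as far as a completed handshake before it is closed.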
Error message:
Host side:
cmk-agent-ctl register --hostname webserver --server 10.44.251.1:8000 --site site2 --user automation --password <password> --trust-cert -v
INFO [cmk_agent_ctl] starting
INFO [cmk_agent_ctl] Loaded config from '"/etc/check_mk/cmk-agent-ctl.toml"', legacy pull 'LegacyPullMarker("/var/lib/cmk-agent/allow-legacy-pull")' exists
ERROR [cmk_agent_ctl] Error pairing with 10.44.251.1:8000/site2
Caused by:
0: error sending request for url (https://10.44.251.1:8000/site2/agent-receiver/pairing): connection closed before message completed
1: connection closed before message completed
Error on the server side, in agent-receiver/error.log:
[2022-06-13 17:30:48 +0200] [3787746] [DEBUG] Current configuration:
config: ./gunicorn.conf.py
wsgi_app: None
bind: ['0.0.0.0:8000']
backlog: 2048
workers: 1
worker_class: uvicorn.workers.UvicornWorker
threads: 1
worker_connections: 1000
max_requests: 0
max_requests_jitter: 0
timeout: 30
graceful_timeout: 30
keepalive: 2
limit_request_line: 4094
limit_request_fields: 100
limit_request_field_size: 8190
reload: False
reload_engine: auto
reload_extra_files: []
spew: False
check_config: False
print_config: False
preload_app: False
sendfile: None
reuse_port: False
chdir: /opt/omd/sites/site2
daemon: True
raw_env: []
pidfile: /omd/sites/site2/tmp/run/agent-receiver.pid
worker_tmp_dir: None
user: 996
group: 1000
umask: 0
initgroups: False
tmp_upload_dir: None
secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
forwarded_allow_ips: ['127.0.0.1']
accesslog: /omd/sites/site2/var/log/agent-receiver/access.log
disable_redirect_access_to_syslog: False
access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
errorlog: /omd/sites/site2/var/log/agent-receiver/error.log
loglevel: debug
capture_output: False
logger_class: gunicorn.glogging.Logger
logconfig: None
logconfig_dict: {}
syslog_addr: udp://localhost:514
syslog: False
syslog_prefix: None
syslog_facility: user
enable_stdio_inheritance: False
statsd_host: None
dogstatsd_tags:
statsd_prefix:
proc_name: None
default_proc_name: agent_receiver.apps:main_app()
pythonpath: None
paste: None
on_starting: <function OnStarting.on_starting at 0x7fe0c7f55550>
on_reload: <function OnReload.on_reload at 0x7fe0c7f55670>
when_ready: <function WhenReady.when_ready at 0x7fe0c7f55790>
pre_fork: <function Prefork.pre_fork at 0x7fe0c7f558b0>
post_fork: <function Postfork.post_fork at 0x7fe0c7f559d0>
post_worker_init: <function PostWorkerInit.post_worker_init at 0x7fe0c7f55af0>
worker_int: <function WorkerInt.worker_int at 0x7fe0c7f55c10>
worker_abort: <function WorkerAbort.worker_abort at 0x7fe0c7f55d30>
pre_exec: <function PreExec.pre_exec at 0x7fe0c7f55e50>
pre_request: <function PreRequest.pre_request at 0x7fe0c7f55f70>
post_request: <function PostRequest.post_request at 0x7fe0c7f62040>
child_exit: <function ChildExit.child_exit at 0x7fe0c7f62160>
worker_exit: <function WorkerExit.worker_exit at 0x7fe0c7f62280>
nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7fe0c7f623a0>
on_exit: <function OnExit.on_exit at 0x7fe0c7f624c0>
proxy_protocol: False
proxy_allow_ips: ['127.0.0.1']
keyfile: /omd/sites/site2/etc/ssl/agent_receiver_cert.pem
certfile: /omd/sites/site2/etc/ssl/agent_receiver_cert.pem
ssl_version: 2
cert_reqs: 0
ca_certs: None
suppress_ragged_eofs: True
do_handshake_on_connect: False
ciphers: None
raw_paste_global_conf: []
strip_header_spaces: False
[2022-06-13 17:30:48 +0200] [3787746] [INFO] Starting gunicorn 20.1.0
[2022-06-13 17:30:48 +0200] [3787746] [DEBUG] Arbiter booted
[2022-06-13 17:30:48 +0200] [3787746] [INFO] Listening at: https://0.0.0.0:8000 (3787746)
[2022-06-13 17:30:48 +0200] [3787746] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2022-06-13 17:30:48 +0200] [3787751] [INFO] Booting worker with pid: 3787751
[2022-06-13 17:30:48 +0200] [3787746] [DEBUG] 1 workers
[2022-06-13 17:30:49 +0200] [3787751] [INFO] Started server process [3787751]
[2022-06-13 17:30:49 +0200] [3787751] [INFO] Waiting for application startup.
[2022-06-13 17:30:49 +0200] [3787751] [INFO] Application startup complete.
[2022-06-13 17:31:35 +0200] [3787746] [CRITICAL] WORKER TIMEOUT (pid:3787751)
[2022-06-13 17:31:36 +0200] [3787746] [WARNING] Worker with pid 3787751 was terminated due to signal 9
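To measure how long a worker lives before gunicorn kills it, a small Python sketch like the one below can pair each "Booting worker" line with the matching "WORKER TIMEOUT" line. The function name `worker_lifetimes` and the regular expressions are my own, written only against the log format shown above, so they may need adjusting for other gunicorn versions.

```python
import re
from datetime import datetime

# Matches gunicorn error-log lines such as:
# [2022-06-13 17:30:48 +0200] [3787746] [INFO] Booting worker with pid: 3787751
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<pid>\d+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")
BOOT_RE = re.compile(r"Booting worker with pid: (\d+)")
TIMEOUT_RE = re.compile(r"WORKER TIMEOUT \(pid:(\d+)\)")


def worker_lifetimes(lines):
    """Return {worker_pid: seconds between boot and WORKER TIMEOUT}."""
    booted = {}
    lifetimes = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip the configuration dump and other non-matching lines
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S %z")
        boot = BOOT_RE.search(m["msg"])
        if boot:
            booted[int(boot.group(1))] = ts
            continue
        timed_out = TIMEOUT_RE.search(m["msg"])
        if timed_out:
            pid = int(timed_out.group(1))
            if pid in booted:
                lifetimes[pid] = (ts - booted[pid]).total_seconds()
    return lifetimes


# Lines copied from the error.log excerpt above.
sample = [
    "[2022-06-13 17:30:48 +0200] [3787746] [INFO] Booting worker with pid: 3787751",
    "[2022-06-13 17:30:49 +0200] [3787751] [INFO] Application startup complete.",
    "[2022-06-13 17:31:35 +0200] [3787746] [CRITICAL] WORKER TIMEOUT (pid:3787751)",
]
print(worker_lifetimes(sample))  # {3787751: 47.0}
```

For this excerpt the worker was killed 47 seconds after booting, which is longer than the configured timeout of 30 seconds.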
There is nothing in agent-receiver.log or in access.log, which seems strange to me, as if the process were crashing before even accepting the HTTP request.
CMK version: 2.1.0p2.cre
OS version: Ubuntu 20.04.4 LTS
This behavior doesn’t happen on the main site. I believe I opened the right network ports on both sides.
I did not find a way to put the agent-receiver into debug mode to get more logs to investigate. If anyone knows how to do this, please let me know.
It would be great if anybody has helpful tips, debugging tricks, or even the solution to this issue (I searched here and did not find it).
Thanks