MK Livestatus crashing Nagios 4: "Polling failed: Connection timed out"

CMK version: 2.1.0p40
OS version: Ubuntu 22.04 LTS

Error message:

2024-03-11 23:23:30 [main] Polling failed: Connection timed out
2024-03-11 23:23:33 [main] Polling failed: Connection timed out
2024-03-11 23:23:35 [main] Polling failed: Connection timed out
2024-03-11 23:23:38 [main] Polling failed: Connection timed out
2024-03-11 23:23:40 [main] Polling failed: Connection timed out
2024-03-11 23:23:43 [main] Polling failed: Connection timed out
2024-03-11 23:23:45 [main] Polling failed: Connection timed out
2024-03-11 23:23:47 [client 1] accepted client connection on fd 20
2024-03-11 23:23:47 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:47 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:47 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:48 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:48 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:48 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:48 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:48 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:49 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:49 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:49 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:49 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:49 [main] Polling failed: Connection timed out
2024-03-11 23:23:49 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:50 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:50 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:50 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:50 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:50 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:51 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:51 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:51 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:51 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:51 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:52 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:52 [main] Polling failed: Connection timed out
2024-03-11 23:23:52 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:52 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:52 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:52 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:53 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:53 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:53 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:53 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:53 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:54 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:54 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:54 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:54 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:54 [main] Polling failed: Connection timed out
2024-03-11 23:23:54 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:55 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:55 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:55 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:55 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:55 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:56 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:56 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:56 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:56 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:56 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:57 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:57 [main] Polling failed: Connection timed out
2024-03-11 23:23:57 [client 1] Polling failed: Connection timed out
2024-03-11 23:23:57 [client 1] Timeout of 10000 ms exceeded while reading query
2024-03-11 23:23:57 [client 1] error: terminating client connection: timeout
2024-03-11 23:23:58 [client 2] accepted client connection on fd 20
2024-03-11 23:23:58 [client 2] request: GET hosts\nStats: name !=\nStats: has_been_checked = 0\nStats: has_been_checked = 1\nStats: state = 0\nStatsAnd: 2\nStats: has_been_checked = 1\nStats: state = 1\nStatsAnd: 2\nStats: has_been_checked = 1\nStats: state = 1\nStats: active_checks_enabled = 1\nStats: acknowledged = 0\nStats: scheduled_downtime_depth = 0\nStatsAnd: 5\nStats: has_been_checked = 1\nStats: state = 2\nStatsAnd: 2\nStats: has_been_checked = 1\nStats: state = 2\nStats: active_checks_enabled = 1\nStats: acknowledged = 0\nStats: scheduled_downtime_depth = 0\nStatsAnd: 5\nOutputFormat: json\nResponseHeader: fixed16
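
The request logged for client 2 is hard to read because the whole query sits on one line with its newlines shown as `\n`. A minimal Python sketch for splitting such an entry back into individual Livestatus header lines, assuming the log writes each newline as a literal backslash followed by `n`; the helper name `split_logged_request` is made up for illustration:

```python
def split_logged_request(log_line: str) -> list[str]:
    """Split a livestatus.log 'request:' entry into its query lines."""
    marker = "request: "
    # Keep only the part after the 'request: ' marker, if it is present.
    _, _, query = log_line.partition(marker)
    if not query:
        query = log_line
    # Assumption: the log shows each newline as the two characters '\' and 'n'.
    return [part for part in query.split("\\n") if part]


if __name__ == "__main__":
    sample = (
        "2024-03-11 23:23:58 [client 2] request: GET hosts\\n"
        "Stats: has_been_checked = 0\\nOutputFormat: json\\n"
        "ResponseHeader: fixed16"
    )
    for line in split_logged_request(sample):
        print(line)
```

Read this way, the logged request is a `GET hosts` query made up of `Stats:` and `StatsAnd:` lines, followed by `OutputFormat: json` and `ResponseHeader: fixed16`.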

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

I compiled livestatus.o and unixcat standalone from the check-mk-raw source with the --with-nagios4 option. The module appears to load fine, and xinetd is configured to serve it on port 6557. Connectivity to the port seems fine, but the module logs the errors above and then crashes Nagios itself. I would really appreciate some help with this.

livestatus xinetd

service livestatus
{
        type            = UNLISTED
        port            = 6557
        socket_type     = stream
        protocol        = tcp
        wait            = no
        cps             = 100 3
        instances       = 500
        per_source      = 250
        flags           = NODELAY
        user            = nagios
        server          = /usr/local/bin/unixcat
        server_args     = /opt/nagios/var/rw/live
        disable         = no
}
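
To test the TCP path through xinetd separately from Thruk, a small client can send one query and read the reply using the same `ResponseHeader: fixed16` framing that appears in the logged request. This is only a sketch: the function names are illustrative, and it assumes the fixed16 header is 16 bytes long, with a three-digit status code, a space, the body length padded to 11 characters, and a newline.

```python
import socket


def parse_fixed16(header: bytes) -> tuple[int, int]:
    """Parse a 16-byte fixed16 response header into (status, body length)."""
    if len(header) != 16:
        raise ValueError(f"expected 16 header bytes, got {len(header)}")
    text = header.decode("ascii")
    # Bytes 0-2: status code; bytes 4-14: body length, padded with spaces.
    return int(text[0:3]), int(text[4:15])


def read_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes, or raise if the peer closes early."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("connection closed before the full response arrived")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def livestatus_query(host: str, port: int, query: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send one query over TCP and return (status code, response body)."""
    # A Livestatus query is terminated by an empty line.
    request = query.rstrip("\n") + "\nResponseHeader: fixed16\n\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("utf-8"))
        status, length = parse_fixed16(read_exact(sock, 16))
        body = read_exact(sock, length).decode("utf-8")
    return status, body


if __name__ == "__main__":
    # Offline demonstration of the header parsing only (no network access).
    print(parse_fixed16(b"200" + b" " * 10 + b"12\n"))
```

Calling `livestatus_query("127.0.0.1", 6557, "GET hosts\nColumns: name state")` against the service above would show whether the reply is an error status or a 200 with an empty body. The host address here is a placeholder.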

nagios.log

[1710199574] Nagios 4.5.1 starting... (PID=540)
[1710199574] Local time is Mon Mar 11 23:26:14 UTC 2024
[1710199574] LOG VERSION: 2.0
[1710199574] qh: Socket '/opt/nagios/var/rw/nagios.qh' successfully initialized
[1710199574] qh: core query handler registered
[1710199574] qh: echo service query handler registered
[1710199574] qh: help for the query handler registered
[1710199574] wproc: Successfully registered manager as @wproc with query handler
[1710199574] wproc: Registry request: name=Core Worker 544;pid=544
[1710199574] wproc: Registry request: name=Core Worker 546;pid=546
[1710199574] wproc: Registry request: name=Core Worker 542;pid=542
[1710199574] wproc: Registry request: name=Core Worker 543;pid=543
[1710199574] wproc: Registry request: name=Core Worker 545;pid=545
[1710199574] wproc: Registry request: name=Core Worker 547;pid=547
[1710199574] livestatus: setting debug level to 7
[1710199574] livestatus: Livestatus by Checkmk GmbH started with PID 540
[1710199574] livestatus: version 2.1.0p40 compiled Mon, 11 Mar 2024 22:17:20 +0000 on 38db34bfb518
[1710199574] livestatus: built with g++-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, using C++11 regex engine
[1710199574] livestatus: please visit us at https://checkmk.com/
[1710199574] livestatus: socket path = '/opt/nagios/var/rw/live'
[1710199574] livestatus: pnp path = ''
[1710199574] livestatus: inventory path = ''
[1710199574] livestatus: structured status path = ''
[1710199574] livestatus: robotmk html log path = '""'
[1710199574] livestatus: logwatch path = ''
[1710199574] livestatus: log file path = '/opt/nagios/var/livestatus.log'
[1710199574] livestatus: mkeventd socket path = '/opt/nagios/var/rw/mkeventd/status'
[1710199574] livestatus: rrdcached socket path = '/opt/nagios/var/rw/rrdcached.sock'
[1710199574] livestatus: Hint: Please try out Checkmk (https://checkmk.com/)
[1710199574] livestatus: removed old socket file /opt/nagios/var/rw/live
[1710199574] livestatus: opened UNIX socket at /opt/nagios/var/rw/live
[1710199574] livestatus: your event_broker_options are sufficient for livestatus.
[1710199574] livestatus: finished initialization, further log messages go to /opt/nagios/var/livestatus.log
[1710199574] Event broker module '/opt/nagios/mk-livestatus/livestatus.o' initialized successfully.

I ran some additional tests. The hosts and services tables return no output at all, not even a header line, while the status table works as expected.

root@nagios:/# echo 'GET status' | unixcat /opt/nagios/var/rw/live
accept_passive_host_checks;accept_passive_service_checks;average_latency_cmk;average_latency_fetcher;average_latency_generic;average_latency_real_time;average_runnable_jobs_checker;average_runnable_jobs_fetcher;cached_log_messages;carbon_bytes_sent;carbon_bytes_sent_rate;carbon_overflows;carbon_overflows_rate;carbon_queue_usage;carbon_queue_usage_rate;check_external_commands;check_host_freshness;check_service_freshness;connections;connections_rate;core_pid;enable_event_handlers;enable_flap_detection;enable_notifications;execute_host_checks;execute_service_checks;external_command_buffer_max;external_command_buffer_slots;external_command_buffer_usage;external_commands;external_commands_rate;forks;forks_rate;has_event_handlers;helper_usage_checker;helper_usage_cmk;helper_usage_fetcher;helper_usage_generic;helper_usage_real_time;host_checks;host_checks_rate;influxdb_bytes_sent;influxdb_bytes_sent_rate;influxdb_overflows;influxdb_overflows_rate;influxdb_queue_usage;influxdb_queue_usage_rate;interval_length;is_trial_expired;last_command_check;last_log_rotation;license_usage_history;livechecks;livechecks_rate;livestatus_active_connections;livestatus_overflows;livestatus_overflows_rate;livestatus_queued_connections;livestatus_threads;livestatus_usage;livestatus_version;log_messages;log_messages_rate;max_long_output_size;metrics_count;metrics_count_rate;mk_inventory_last;nagios_pid;neb_callbacks;neb_callbacks_rate;num_hosts;num_queued_alerts;num_queued_notifications;num_services;obsess_over_hosts;obsess_over_services;perf_data_count;perf_data_count_rate;process_performance_data;program_start;program_version;requests;requests_rate;rrdcached_bytes_sent;rrdcached_bytes_sent_rate;rrdcached_overflows;rrdcached_overflows_rate;rrdcached_queue_usage;rrdcached_queue_usage_rate;service_checks;service_checks_rate;state_file_created
1;1;0;0;1.21293e-320;0;0;0;0;0;0;0;0;0;0;1;0;1;1;0;1117;1;0;1;1;1;0;0;0;0;0;12;0.0192456;1;0;0;0;0;0;14;0.0437976;0;0;0;0;0;0;15;0;0;0;;0;0;10;0;0;-9;10;1;2.1.0p40;15;0.0217838;0;0;0;0;1117;913;2.63223;1;0;0;1;0;0;0;0;1;1710207962;4.5.1;1;0;0;0;0;0;0;0;161;0.503794;0
root@nagios:/# echo 'GET hosts' | unixcat /opt/nagios/var/rw/live
root@nagios:/# echo 'GET services' | unixcat /opt/nagios/var/rw/live
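
The `GET status` reply is easier to check once each value is paired with its column name. A short Python sketch of that pairing, assuming the default semicolon separator shown in the output above; `pair_columns` is a hypothetical helper name:

```python
def pair_columns(header_line: str, value_line: str, sep: str = ";") -> dict[str, str]:
    """Map each Livestatus column name to the value in the same position."""
    names = header_line.split(sep)
    values = value_line.split(sep)
    if len(names) != len(values):
        # A mismatch would mean the two lines cannot be paired by position.
        raise ValueError(f"{len(names)} columns but {len(values)} values")
    return dict(zip(names, values))


if __name__ == "__main__":
    # Shortened sample; paste the two full lines of 'GET status' output instead.
    header = "num_hosts;num_services;program_version"
    values = "1;1;4.5.1"
    status = pair_columns(header, values)
    for key in ("num_hosts", "num_services", "program_version"):
        print(f"{key} = {status[key]}")
```

Pairing the full output above by position, `num_hosts` and `num_services` both appear to be 1. If that reading is right, the core knows about one host and one service even though `GET hosts` and `GET services` print nothing.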

I would say there is no chance of building the current Livestatus for Nagios 4.
Livestatus expects information that the Nagios 4 core cannot provide, such as inventory data and some other things.
If you need a Nagios 4-like core, I would recommend checking out Naemon, which has its own working Livestatus implementation.

Is there a specific use case for which you need the Livestatus module with Nagios 4?

I’m trying to bring Nagios up to the current version and keep using it with Thruk, which we’ve been using for quite some time. Our old Nagios 4 installation (4.3.1) uses mk-livestatus 1.2.6p6, and that works perfectly fine. However, it runs on CentOS 6, which we need to retire as well. Unfortunately, the 1.2.6 source does not compile on any of the newer operating systems I’ve tried so far.

OK, if you want to use Thruk, then a core like Naemon would be the best solution.
This core should be compatible with your old Nagios configuration.