HTTP active checks no longer working after server reboot

Man-in-Black · March 4, 2022, 6:56am

CMK version:
2.0.0p21
OS version:
Ubuntu 20.04.3

Error message(s):

connect to address … and port 443: Connection refused
CRITICAL - Socket timeout after 10 seconds
Temporary failure in name resolution

I’ve had a problem since the last restart of my Checkmk server.
I set up some HTTP checks that worked great, but since the restart they fail with the error messages above. This applies to checks on both port 80 and port 443.
My rule for this is as follows:

When I run the command on the console I also get a timeout.
However, if I add the “-4” switch to the command, it works fine.
Here’s the error:
OMD[monitoring]:~$ /omd/sites/monitoring/lib/nagios/plugins/check_http --ssl --extended-perfdata --sni -p 443 -H [FQDN] [FQDN]
CRITICAL - Socket timeout after 10 seconds

No error here:
OMD[monitoring]:~$ /omd/sites/monitoring/lib/nagios/plugins/check_http --ssl --extended-perfdata --sni -p 443 -H [FQDN] [FQDN] -4
HTTP OK: HTTP/1.1 200 OK - 391 bytes in 0.009 second response time |time=0.008576s;;;0.000000;10.000000 size=391B;;;0 time_connect=0.001873s;;;;10.000000 time_ssl=0.005420s;;;;10.000000 time_headers=0.000006s;;;;10.000000 time_firstbyte=0.001088s;;;;10.000000 time_transfer=0.001190s;;;;10.000000

I assumed that the “Enforce IPv4” option does exactly that, but the result is the same no matter how I set it.
What can be the reason?

I have a simple question: Is there a way to work with wildcards via the “Notes URL for Host” rule?

We use a Dokuwiki and if I could set a wildcard here I would always get a search result quickly.

Best regards
Sascha

ChristianM · March 4, 2022, 7:22am

Hi Sascha,

did you check ~/var/log/web.log for an error related to the check_http wrapper? It looks like the “-4” parameter is not set correctly at this point. You say this started after a reboot. It looks like the HTTP service only binds to IPv6. Something seems to have changed in your system config.

Best regards,
Christian

Man-in-Black · March 4, 2022, 7:56am

The only entries in this log file are:

...
2022-03-04 06:20:01,928 [30] [cmk.web.background-job 2918318] Found no abandoned profile.
2022-03-04 07:21:01,639 [30] [cmk.web.background-job 3100445] Found no abandoned profile.
2022-03-04 08:22:01,282 [30] [cmk.web.background-job 3291933] Found no abandoned profile.

What exactly did you mean by

It looks like the HTTP service only binds to IPv6.

I understand this when I am using, for example, an Apache web server or something like that, but in this case I’m just sending a check command. How can a port be bound here?

Man-in-Black · March 7, 2022, 6:26am

I’ve investigated it the whole weekend and could not find any hint.
From the command line it works fine; only from the Checkmk GUI does it fail.

andreas-doehler · March 7, 2022, 7:46am

The problem is that the “Enforce IPv4” setting does not lead to any option in the command string.
Only the IPv6 setting has an effect. That’s clearly a bug in the transformation of the setup rule into the check command.
It would be good to change the title to something pointing in this direction.

Why this problem occurred after a reboot, I don’t know. I can only assume that you now get an IPv6 answer as the first answer to the DNS query.
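
One way to check that assumption is to list the addresses in the order the system resolver returns them. The following is only a minimal sketch in Python using the standard `socket` module; the helper name `resolve_in_order` is made up for illustration, and it assumes that a client without “-4” or “-6” tries addresses in resolver order.

```python
import socket

# Human-readable labels for the two address families of interest.
FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolve_in_order(host, port=443):
    """Return (family, address) pairs in the order the resolver returns them.

    Hypothetical helper for illustration; not part of Checkmk.
    """
    results = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        entry = (FAMILY_NAMES.get(family, str(family)), sockaddr[0])
        if entry not in results:  # drop duplicates, keep the original order
            results.append(entry)
    return results
```

Calling `resolve_in_order("[FQDN]")` as the site user, with the real host name in place of the placeholder, shows whether an IPv6 address comes first. If it does, and that address is not reachable, a timeout without “-4” would be plausible.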

Man-in-Black · March 7, 2022, 10:03am

I just checked it again, the error also occurs with hosts that only use IPv4.
Even if I disable IPv6 completely, the problem persists.

Is there a way to force IPv4 for the check, possibly on the command line?

andreas-doehler · March 7, 2022, 12:06pm

You can copy the “check_http” from “~/share/check_mk/checks/” to “~/local/share/check_mk/checks/” and edit the “common_args” section.

There you will find these lines:

    if host.family == "ipv6":
        args.append("-6")

Add two more lines here:

    if host.family == "ipv4":
        args.append("-4")

Now you should have the “-4” option also inside your command.
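
To make the effect of the change easier to see outside of Checkmk, here is a standalone sketch of the same decision. It is not the actual wrapper code: the function name `family_args` and its plain string parameter are assumptions for illustration, and only the “-4” and “-6” flags come from this thread.

```python
def family_args(family):
    """Return the address-family flag for check_http as a list of arguments.

    Hypothetical helper: `family` is "ipv4", "ipv6", or None (no preference).
    """
    args = []
    if family == "ipv6":
        args.append("-6")
    if family == "ipv4":
        args.append("-4")  # the branch that the two added lines provide
    return args
```

With `family` set to `"ipv4"` the result is `["-4"]`; with no preference the list stays empty and the plugin chooses the address family itself.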

Man-in-Black · March 7, 2022, 2:55pm

I now have the “-4” option in my command, but it is still not working.
I now get the following error in the details:
HTTP CRITICAL - Unable to open TCP socket

andreas-doehler · March 7, 2022, 4:42pm

Does running “check_http” manually with the same options as shown in the check command work? If the monitoring instance cannot open a TCP socket, you should get the same error message when executing it manually.
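
Independent of the plugin, a plain TCP connect restricted to one address family can show whether the site user can open a socket to the target at all. This is only a rough sketch with Python’s standard `socket` module; the helper name `try_connect` is made up, and the 10-second default simply mirrors the timeout from the error message above.

```python
import socket


def try_connect(host, port, family, timeout=10.0):
    """Try a TCP connect using only one address family.

    Returns (ok, detail). Hypothetical helper for illustration.
    """
    try:
        infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"name resolution failed: {exc}"

    last_error = "no addresses returned"
    for fam, socktype, proto, _canon, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True, f"connected to {sockaddr[0]}"
        except OSError as exc:  # covers refused connections and timeouts
            last_error = f"{sockaddr[0]}: {exc}"
    return False, last_error
```

For example, `try_connect("[FQDN]", 443, socket.AF_INET)` corresponds roughly to what “-4” enforces, and `socket.AF_INET6` to “-6”. If the IPv4 attempt also fails, the host most likely does not offer the service on that port.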

Man-in-Black · March 7, 2022, 10:31pm

Within the Checkmk GUI I got the following:

If I try to execute the command manually, I get the following error:

OMD[monitoring]:~$ ./local/share/check_mk/checks/check_http   
Traceback (most recent call last):
  File "./local/share/check_mk/checks/check_http", line 256, in <module>
    active_check_info["http"] = {
NameError: name 'active_check_info' is not defined

andreas-doehler · March 8, 2022, 7:23pm

For the manual test you can only use “~/lib/nagios/plugins/check_http” with the arguments shown in the GUI.
The check_http you posted is only the wrapper that generates all the options for the Nagios plugin.

Man-in-Black · March 15, 2022, 7:08am

I’m a bit embarrassed now:
It was probably less a problem with the check and more a stray rule I created. Whether the HTTP check runs is actually controlled by a tag, but I probably set the default wrong, so servers that do not provide HTTP at all were checked.
Thank you for your active support on the subject. It looks like I was able to solve it myself, but your tips guided me in the right direction.
