Mknotifyd on slave site not starting anymore after update to 2.1.0p3

CMK version: cee 2.1.0p3
OS version: rhel8

Error message:

2022-06-25 10:49:18,853 [20] [cmk.mknotifyd] -----------------------------------------------------------------
2022-06-25 10:49:18,853 [20] [cmk.mknotifyd] Check_MK Notification Spooler version 2.1.0p3 starting
2022-06-25 10:49:18,854 [20] [cmk.mknotifyd] Log verbosity: 0
2022-06-25 10:49:18,857 [20] [cmk.mknotifyd] Daemonized with PID 167872.
2022-06-25 10:49:18,859 [40] [cmk.mknotifyd] FATAL ERROR:
Traceback (most recent call last):
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 307, in main
    run_notifyd(args, config, paths, start_time)
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 62, in run_notifyd
    connection_manager.initialize_stunnel()
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 192, in initialize_stunnel
    config: str = create_stunnel_config_from_mknotifyd_config(
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 46, in create_stunnel_config_from_mknotifyd_config
    if incoming and incoming["encryption"] != "unencrypted":
KeyError: 'encryption'
2022-06-25 10:49:45,616 [20] [cmk.mknotifyd] -----------------------------------------------------------------
2022-06-25 10:49:45,616 [20] [cmk.mknotifyd] Check_MK Notification Spooler version 2.1.0p3 starting
2022-06-25 10:49:45,616 [20] [cmk.mknotifyd] Log verbosity: 0
2022-06-25 10:49:45,619 [20] [cmk.mknotifyd] Daemonized with PID 168980.
2022-06-25 10:49:45,622 [40] [cmk.mknotifyd] FATAL ERROR:
Traceback (most recent call last):
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 307, in main
    run_notifyd(args, config, paths, start_time)
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 62, in run_notifyd
    connection_manager.initialize_stunnel()
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 192, in initialize_stunnel
    config: str = create_stunnel_config_from_mknotifyd_config(
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 46, in create_stunnel_config_from_mknotifyd_config
    if incoming and incoming["encryption"] != "unencrypted":
KeyError: 'encryption'
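
For readers following along: the traceback points at a plain dictionary lookup. `incoming["encryption"]` raises `KeyError` whenever the incoming-connection settings have no `encryption` key, which would be plausible for settings written before the upgrade. Below is a minimal Python sketch of that failure mode. The function names, the example dict and its `listen_port` key are hypothetical, and treating a missing key as "unencrypted" is an assumption for illustration, not the actual Checkmk fix.

```python
from typing import Any, Mapping, Optional


def needs_stunnel_strict(incoming: Optional[Mapping[str, Any]]) -> bool:
    """Mirror the condition from the traceback: raises if the key is absent."""
    return bool(incoming and incoming["encryption"] != "unencrypted")


def needs_stunnel_tolerant(incoming: Optional[Mapping[str, Any]]) -> bool:
    """Assumption: a missing 'encryption' key is treated as 'unencrypted'."""
    return bool(
        incoming and incoming.get("encryption", "unencrypted") != "unencrypted"
    )


# Hypothetical settings dict without an 'encryption' key.
old_style = {"listen_port": 6555}

try:
    needs_stunnel_strict(old_style)
except KeyError as exc:
    print(f"strict lookup failed: KeyError {exc}")

# The tolerant variant falls back to the default instead of raising.
print(needs_stunnel_tolerant(old_style))
```

If the real cause is a missing key, this would also be consistent with the crash reappearing every time the configuration is rewritten without that key.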

I want to add that the notification spooler configuration is very confusing: when you configure the central site to ‘connect to remote sites’, this configuration is also synced to the remote sites. That’s not what anyone wants, I guess. You then have to enable WATO on the remote site, go into the spooler config and revert it to ‘accept incoming TCP connections’. I don’t get it.
Also, when you save the changes on the remote site, there are ‘no pending changes’, so I assume the settings are not saved.

After the update to 2.1.0p4, mknotifyd worked for about 5 minutes and then stopped again.
It seems something is being configured that crashes it. The update fixed it, but then the configuration got changed again, I guess.

2022-07-04 09:01:57,775 [20] [cmk.mknotifyd] sending command LOG;SERVICE NOTIFICATION RESULT: msteams;xxxxxxxxx;Systemd Timesyncd Time;OK;msteams;200;200
2022-07-04 09:02:17,805 [20] [cmk.mknotifyd] Configuration has changed.
2022-07-04 09:02:17,806 [40] [cmk.mknotifyd] FATAL ERROR:
Traceback (most recent call last):
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 307, in main
    run_notifyd(args, config, paths, start_time)
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 62, in run_notifyd
    connection_manager.initialize_stunnel()
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 192, in initialize_stunnel
    config: str = create_stunnel_config_from_mknotifyd_config(
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 46, in create_stunnel_config_from_mknotifyd_config
    if incoming and incoming["encryption"] != "unencrypted":
KeyError: 'encryption'

2022-07-04 09:01:57 was the last working notification

Hi, are all of your sites on 2.1? We fixed that with “Fix KeyError on mknotifyd start after upgrade to 2.1.0”.

Yes they are. I just updated both instances this morning.

For the record, 2.1.0p5 didn’t fix it.

2022-07-05 15:32:15,923 [20] [cmk.mknotifyd] -----------------------------------------------------------------
2022-07-05 15:32:15,924 [20] [cmk.mknotifyd] Check_MK Notification Spooler version 2.1.0p5 starting
2022-07-05 15:32:15,925 [20] [cmk.mknotifyd] Log verbosity: 0
2022-07-05 15:32:15,928 [20] [cmk.mknotifyd] Daemonized with PID 3054956.
2022-07-05 15:32:15,930 [20] [cmk.mknotifyd] Listening for remote unencrypted connections at port 6555
2022-07-05 15:32:58,393 [20] [cmk.mknotifyd] Configuration has changed.
2022-07-05 15:32:58,394 [40] [cmk.mknotifyd] FATAL ERROR:
Traceback (most recent call last):
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 307, in main
    run_notifyd(args, config, paths, start_time)
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 62, in run_notifyd
    connection_manager.initialize_stunnel()
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 192, in initialize_stunnel
    config: str = create_stunnel_config_from_mknotifyd_config(
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 46, in create_stunnel_config_from_mknotifyd_config
    if incoming and incoming["encryption"] != "unencrypted":
KeyError: 'encryption'

2.1.0p6 didn’t fix it either.

2022-07-08 12:48:33,749 [20] [cmk.mknotifyd] -----------------------------------------------------------------
2022-07-08 12:48:33,749 [20] [cmk.mknotifyd] Check_MK Notification Spooler version 2.1.0p6 starting
2022-07-08 12:48:33,750 [20] [cmk.mknotifyd] Log verbosity: 0
2022-07-08 12:48:33,764 [20] [cmk.mknotifyd] Daemonized with PID 2954988.
2022-07-08 12:48:33,766 [20] [cmk.mknotifyd] Listening for remote unencrypted connections at port 6555
2022-07-08 12:49:01,677 [20] [cmk.mknotifyd] Configuration has changed.
2022-07-08 12:49:01,712 [40] [cmk.mknotifyd] FATAL ERROR:
Traceback (most recent call last):
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 307, in main
    run_notifyd(args, config, paths, start_time)
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/main.py", line 62, in run_notifyd
    connection_manager.initialize_stunnel()
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 192, in initialize_stunnel
    config: str = create_stunnel_config_from_mknotifyd_config(
  File "/omd/sites/monitoring01/lib/python3/cmk/cee/mknotifyd/connection_manager.py", line 46, in create_stunnel_config_from_mknotifyd_config
    if incoming and incoming["encryption"] != "unencrypted":
KeyError: 'encryption'

Hi,
what do these settings look like for your remote site and for the central site?

Remote Site

Central site

But I disabled pushing notifications to the central site for now as the mknotifyd was not starting.
Should I try again?

EDIT: I configured the central site to connect to the remote site again, but this time I enabled encryption as well. It seems to be working now. Fingers crossed.

Anyway, could it be that unencrypted traffic is just not possible?

I now get the following error

Connection failed or terminated (CRIT)
Error reading data: [Errno 104] Connection reset by peer

This is the central site.
I tested port 6555 on the remote site from the central site with netcat, and the connection works.
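
For anyone without netcat at hand, the same reachability test can be sketched in Python with the standard `socket` module. `port_is_open` is a hypothetical helper name and the host in the usage comment is a placeholder. Note that a successful TCP connect only shows that something accepts connections on the port; it does not prove that both sides agree on TLS versus plaintext, so a "Connection reset by peer" can still follow.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Usage (placeholder host name, run from the central site):
#   port_is_open("remote-site.example.com", 6555)
```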

How is your configuration now?

Central Site


Remote Site

I will check that. Please disable TLS for now. I will come back to you

Hi, I checked that with the same settings, but I could not reproduce the issue.
Could you please check the mknotifyd.log on central and remote?
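
When checking the logs on several sites, a small script can save some scrolling. The following is a rough Python sketch that works on the log format pasted in this thread: it reports whether a FATAL ERROR was logged after the most recent start line and lists the exception lines it finds. The helper names are hypothetical, and reading the text from `var/log/mknotifyd.log` inside the site directory is an assumption about the log location.

```python
import re
from typing import List

START_MARKER = "Notification Spooler version"
FATAL_MARKER = "FATAL ERROR"


def last_start_failed(log_text: str) -> bool:
    """Return True if a FATAL ERROR was logged after the most recent start line."""
    failed = False
    for line in log_text.splitlines():
        if START_MARKER in line:
            failed = False  # a new start attempt resets the state
        elif FATAL_MARKER in line:
            failed = True
    return failed


def exception_lines(log_text: str) -> List[str]:
    """Collect final traceback lines such as "KeyError: 'encryption'"."""
    pattern = r"^(\w+(?:Error|Exception): .*)$"
    return re.findall(pattern, log_text, flags=re.MULTILINE)


# Sample taken from the log excerpts earlier in this thread.
sample = """\
2022-06-25 10:49:18,853 [20] [cmk.mknotifyd] Check_MK Notification Spooler version 2.1.0p3 starting
2022-06-25 10:49:18,859 [40] [cmk.mknotifyd] FATAL ERROR:
Traceback (most recent call last):
KeyError: 'encryption'
"""

print(last_start_failed(sample))
print(exception_lines(sample))
```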

Choosing plaintext now works. Let’s see for how long. Thanks for digging into it.

I had this problem on all five of my Checkmk appliances: I upgraded and mknotifyd would not start.

I changed the connection to plaintext and now it works for me too.

Strange that something that worked in an older version is now broken, especially since it adds security.

Anyway, I wanted to say thank you for getting me out of a fix :slight_smile:

We had the same problem today when upgrading from 2.0.0p37 to 2.1.0p30.
Mknotifyd died on all satellites with the same error message in mknotifyd.log as above (KeyError: ‘encryption’).
We solved it this way:

  1. Set all notification connections to unencrypted on the central sites and on the satellites
  2. Activate the changes
  3. Restart mknotifyd with omd restart <sitename> mknotifyd
  4. Set all connections to “Encrypt with TLS” on the central sites and on the satellites
  5. Activate the changes