Why do the notification scripts run twice? Do you plan to change it?

peterge · January 6, 2023, 2:49am

CMK version:
2.1.0p18, raw, docker
OS version:
Debian 11

Hey guys,
I came across something strange that I don’t understand yet. I was able to work around the “error” (I don’t want to call it a bug) for now. Let me explain what I did:

I am using notification scripts to recreate what is available with alert handlers, but in pure bash (we use SSH for agent calls, so I needed to work around the ForceCommand with a wrapper script which detects what the content of SSH_ORIGINAL_COMMAND is).
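For readers unfamiliar with the ForceCommand setup, here is a minimal sketch of what such a wrapper could look like. It is not the actual script from this post: the `cleanup` keyword, the function names and the cleanup script path are hypothetical, and `/usr/bin/check_mk_agent` is assumed to be the agent location.

```shell
#!/usr/bin/env bash
# Sketch of a ForceCommand wrapper. The "cleanup" keyword and the cleanup
# script path are hypothetical; only SSH_ORIGINAL_COMMAND comes from sshd.

# Map the command requested by the client to an allowed action name.
dispatch_command() {
    case "${1:-}" in
        "")      echo "agent" ;;    # plain agent call, no command given
        cleanup) echo "cleanup" ;;  # hypothetical remediation request
        *)       echo "reject" ;;   # anything else is refused
    esac
}

# Execute the action chosen for the current SSH session.
run_forced_command() {
    case "$(dispatch_command "${SSH_ORIGINAL_COMMAND:-}")" in
        agent)   exec /usr/bin/check_mk_agent ;;      # assumed agent path
        cleanup) exec /usr/local/bin/cleanup.sh ;;    # assumed script path
        *)       echo "command not allowed" >&2; return 1 ;;
    esac
}
```

In a real wrapper the last line would call `run_forced_command`, and the ForceCommand directive in sshd_config would point at the wrapper file.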
Before the script that executes the SSH command on $NOTIFY_HOSTNAME runs, I log a message to our Mattermost channel with a curl command.
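A rough sketch of that logging step, assuming a Mattermost incoming webhook that accepts a JSON body with a `text` field. The `MATTERMOST_WEBHOOK_URL` variable and both function names are made up for illustration, and the escaping only covers backslashes and double quotes in single-line messages.

```shell
#!/usr/bin/env bash
# Sketch only: MATTERMOST_WEBHOOK_URL is a hypothetical variable that would
# hold the incoming webhook URL of the channel.

# Build a JSON body of the form {"text": "..."} for a single-line message.
build_payload() {
    # Escape backslashes first, then double quotes, to keep the JSON valid.
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"text": "%s"}' "$escaped"
}

# Send the message to the webhook; fails early if the URL is not set.
post_to_mattermost() {
    curl -sS -X POST \
        -H 'Content-Type: application/json' \
        -d "$(build_payload "$1")" \
        "${MATTERMOST_WEBHOOK_URL:?webhook URL not set}"
}
```

A call inside the notification script could then look like `post_to_mattermost "$(date): running cleanup on $NOTIFY_HOSTNAME"`.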
Now here is what I found:

This got printed after the matching service went CRITICAL. Now, my question: Why does it get executed twice?
By printing the date (with $(date)), it became clear what’s going on: the script is executed twice, and the second run starts after the first has finished. The delay visible in the screenshot comes from a sleep 10 after sending the Mattermost message.

I was able to work around this “feature” (?) by creating a .lock file at the end of the notification script and checking whether it’s there before printing to Mattermost. If it’s found, I just rm it and exit 0. This way I can compensate for the double execution, but it feels like a dirty workaround.
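The lock-file logic could be sketched roughly like this. The lock path and the `run_once` helper are hypothetical names, and the approach only works because the two runs happen one after the other, as described above.

```shell
#!/usr/bin/env bash
# Sketch of the .lock workaround; the path below is a hypothetical default.
LOCK_FILE="${LOCK_FILE:-/tmp/notify_cleanup.lock}"

# Run the given command on the first call and create the lock afterwards.
# If the lock already exists, remove it and skip the command.
run_once() {
    if [ -e "$LOCK_FILE" ]; then
        rm -f "$LOCK_FILE"   # duplicate run: consume the lock, do nothing
        return 0
    fi
    "$@"                     # first run: do the real work
    : > "$LOCK_FILE"         # leave a marker for the duplicate run
}
```

For example, `run_once echo "cleanup started"` prints the message on the first call, stays silent on the second, and prints again on the third.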

Is this intended? Why?

Thanks!

Anders · January 6, 2023, 8:30am

Did you by any chance trigger this using a “fake alert”? If so, it will automatically be reset. So you should have one CRIT that goes back to “OK”, which makes two notifications in total.

I might have misunderstood you here; I have not used the RAW edition.

peterge · January 6, 2023, 9:06am

Good idea. I was using fallocate on the monitored client to achieve a full filesystem, which is set as a condition in the notification rule. That’s why I am running a ‘cleanup’ script via an SSH command.

But I just took the time and tested it with a fake check result.

And the following happened:

And these are the log messages triggered by the RECOVERY:

As you can see, they arrive at basically the same time (9:57), which is the automatic reset you mentioned.

So, in conclusion, the fake check result does not help at all; it just confirms the initial error (or bug?) I reported…

ChristianM · January 6, 2023, 9:10am

Hi.

I think this is the same issue as in an earlier discussion about alerting. Most of the time it’s a problem with the notification rule.
Double notifications - Troubleshooting - Checkmk Community
Best, Christian

Anders · January 6, 2023, 9:21am

Yes, I think you are right. It’s been some time since I wrote my last notification script, and I remember that I explicitly had to set that rule to not notify any contact groups or users.
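One way to check this from inside the script is to log which contact each run belongs to. This is only a diagnostic sketch: `log_invocation` is a made-up helper, and it assumes the script receives the usual `NOTIFY_CONTACTNAME` and `NOTIFY_NOTIFICATIONTYPE` variables alongside `NOTIFY_HOSTNAME`.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: print one line per script run so duplicate runs can be
# compared. Missing variables are reported as "unknown".
log_invocation() {
    printf '%s contact=%s host=%s type=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "${NOTIFY_CONTACTNAME:-unknown}" \
        "${NOTIFY_HOSTNAME:-unknown}" \
        "${NOTIFY_NOTIFICATIONTYPE:-unknown}"
}
```

If two runs for the same event show different contact names, the rule is most likely notifying more than one contact.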

peterge · January 6, 2023, 9:24am

Yep, that’s it. Thank you very much! (I really liked my .lock file logic.)
