Maximum long output size in notifications

mayrstefan · November 12, 2025, 2:52pm

I’m currently playing around with bulk notifications and observed something surprising:

I got a NOTIFY_LONGSERVICEOUTPUT of about 150K characters.

In the global settings we have configured a “Maximum long output size” of 6000 bytes.

Checkmk version is 2.3.0p39 (Enterprise Edition).

What should be the expected behaviour?

full output
output truncated to whatever is set in global settings

Also interesting: for regular notifications everything is passed as environment variables. When the notification plugin is called, the total size of all arguments and environment variables is limited by the operating system. On our Main instance this limit is currently

OMD[Main]:~$ getconf ARG_MAX
2097152

So this would be an upper limit for all data that is passed to a notification in non-bulk mode.
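To compare a notification environment against that limit, a rough sketch like the following could be used. The helper names `env_block_size` and `fits_arg_max` are made up for illustration, and the size is only an approximation: it counts each `KEY=VALUE` string plus its terminating NUL byte and ignores the pointer arrays the kernel also accounts for.

```python
import os


def env_block_size(env):
    """Approximate bytes needed for the environment strings.

    Each entry is passed to the new process as 'KEY=VALUE' followed by a
    NUL byte, so we add 1 for '=' and 1 for the terminator.
    """
    return sum(
        len(os.fsencode(key)) + 1 + len(os.fsencode(value)) + 1
        for key, value in env.items()
    )


def fits_arg_max(env, argv=()):
    """Return True if argv plus env stay below the value of `getconf ARG_MAX`."""
    arg_max = os.sysconf("SC_ARG_MAX")
    argv_size = sum(len(os.fsencode(arg)) + 1 for arg in argv)
    return env_block_size(env) + argv_size < arg_max


if __name__ == "__main__":
    print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))
    print("current environment:", env_block_size(dict(os.environ)), "bytes")
```

Note that on Linux ARG_MAX is not the only limit: a single argument or environment string is also capped separately (see "man execve"), so one very large variable can fail even when the total is well below ARG_MAX.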

mayrstefan · November 14, 2025, 6:31pm

Also interesting: Nagios limits output to 4K, see Nagios Core Plugin API · Nagios Core Documentation

Does that result in different output lengths between Raw (using Nagios Core) and CEE (using CMC)?

mayrstefan · November 16, 2025, 6:08pm

I did some testing over the weekend with a Checkmk 2.4 Raw Edition (Docker container) with the following findings:

Regular notification scripts get their output truncated if it is larger than 64K (65536 characters).
On truncated output the string “...” is appended (which results in a line of 65539 characters).
For bulk notifications the output does not seem to be truncated.

mayrstefan · November 16, 2025, 9:11pm

Found checkmk/cmk/base/notify.py at master · Checkmk/checkmk · GitHub

The limit is calculated from the page size. This is usually 4K, which gives us 64K (65536). Adding “...\nAttention: Removed remaining content because it was too long.” gives us 65601 characters.

I guess the appended text contains a bug: the “\n” for the newline should be escaped like all other newlines in the output.

The fallback calculation is also strange: it uses 4046 instead of 4096 as the page size. Maybe someone from the Checkmk team can tell us why they use 4046.
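The arithmetic can be reproduced with a small sketch. This is not the code from notify.py: the factor of 16 pages is an assumption chosen only because it turns a 4K page into the observed 64K, the function names are hypothetical, and the marker is assumed to start with three ASCII periods, which is what the measured lengths of 65539 and 65601 imply.

```python
import os

# Assumed notice text; three periods, a real newline, then the message.
TRUNCATION_NOTICE = "...\nAttention: Removed remaining content because it was too long."

# Assumption: limit = 16 pages, which yields 65536 for the usual 4K page.
PAGES = 16


def max_value_length():
    """Derive the per-variable limit from the system page size."""
    try:
        page_size = os.sysconf("SC_PAGESIZE")
    except (ValueError, OSError, AttributeError):
        page_size = 4096  # conventional page size used as a fallback here
    return PAGES * page_size


def truncate_value(value, limit):
    """Cut value at limit characters and append the notice if it was too long."""
    if len(value) <= limit:
        return value
    return value[:limit] + TRUNCATION_NOTICE


if __name__ == "__main__":
    long_output = "x" * 150_000
    print(len(truncate_value(long_output, 65536)))  # 65536 + 65 = 65601
```

With a limit of 65536 the notice adds 65 characters, which matches the 65601 characters seen in the test above.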

andreas-doehler · November 17, 2025, 7:27am

This affects only the web GUI, nothing else.

One question - what was the real problem in your case?
The 150kByte mail or something else?

msommer · November 17, 2025, 7:41am

This affects only the web GUI, nothing else.

…and is currently broken/ignored in 2.3.0p40 anyway.

mayrstefan · November 17, 2025, 9:15am

One question - what was the real problem in your case?

There are multiple answers to this question. So what brought me there:

We suffer from very slow notifications because of our large configuration. Each notification has a 4-5s delay while Checkmk loads the complete site configuration, although it only needs the notification configuration. When there are larger network hiccups, it takes hours until all notifications are processed. We were told to wait for 2.5, which should improve notification speed.
Because our regular notification scripts do not support bulk mode, I’m trying to write a generic wrapper that creates the necessary environment and calls the regular notification script (checkmk-extensions/generic_bulk_wrapper/src/notifications/generic_bulk_wrapper.py at main · mayrstefan/checkmk-extensions · GitHub). Its only purpose is to avoid the performance bottleneck described above. Occasionally I got an OSError 7 exception because I tried to create an environment that was too large. It worked in non-bulk mode but sometimes failed in bulk mode, which showed me that there is a difference.
I’m trying to understand what is specified, what is documented, and how these pieces of information relate to each other (or not). The problem with undocumented behaviour is that you work with assumptions that may be wrong. And sooner or later something will break.
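OSError 7 is E2BIG ("Argument list too long"). On Linux a single argument or environment string may not exceed 32 pages (see "man execve"), which is 128K with 4K pages, so one 150K variable is enough to trigger it regardless of ARG_MAX. A wrapper could check for this before calling the regular script. The following is only a sketch under those assumptions: `per_string_limit`, `oversized_vars` and `run_plugin` are hypothetical names, not taken from the linked wrapper, and the exit code 2 is chosen purely for illustration.

```python
import errno
import os
import subprocess
import sys


def per_string_limit():
    """Linux limit for one argv/envp string: 32 pages (see 'man execve')."""
    return 32 * os.sysconf("SC_PAGESIZE")


def oversized_vars(env, limit):
    """Names of variables whose 'KEY=VALUE' string plus NUL exceeds limit bytes."""
    return sorted(
        key
        for key, value in env.items()
        if len(os.fsencode(f"{key}={value}")) + 1 > limit
    )


def run_plugin(argv, env):
    """Run the regular notification script and report E2BIG readably."""
    try:
        return subprocess.run(argv, env=env, check=False).returncode
    except OSError as exc:
        if exc.errno != errno.E2BIG:
            raise
        too_big = oversized_vars(env, per_string_limit())
        print(f"environment too large, oversized variables: {too_big}", file=sys.stderr)
        return 2  # illustrative non-zero exit code


if __name__ == "__main__":
    demo_env = {"NOTIFY_LONGSERVICEOUTPUT": "x" * 150_000}
    print(oversized_vars(demo_env, per_string_limit()))
```

Checking up front makes it possible to shorten the offending variable before the call instead of losing the whole notification.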

The slow notification issue is an old one; we can find it, for example, in Spooled Notifications are to slow (only 1 notif per sec), and maybe Check_MK slow sending notifications was also related to it. Support told us to reduce notifications, but I don’t see how to do this: the more configuration objects you have, the slower the notifications get. The more things you monitor, the higher the probability that something will change state. So a growing site means more notifications, and they will get slower. We’ll see what we get with 2.5.

The 150kByte mail or something else?

The plugin with the 150K output was the Checkmk builtin NetScaler SNMP plugin.

andreas-doehler · November 17, 2025, 10:02am

I don’t know what is planned for 2.5, but in 2.3 and 2.4, when I had this problem, I tried to switch to “Enable synchronous delivery via SMTP” if possible. The biggest problem at the moment is the spooler. If you have a huge number of notifications, it does not work; you are right about that.

Also, in very big environments I try to handle the notifications directly on the distributed nodes and don’t use notification forwarding.

mayrstefan · November 17, 2025, 6:48pm

Our notifications are not mails. We forward our notifications as a sort of pseudo-XML data with its own protocol to our central event management system. So we need our own notification scripts to format the data correctly before we can pass it to a CLI command that sends it to the central system.

Also we already use the spoolers on each site to decouple things.