One question - what was the real problem in your case?
There are multiple answers to this question. Here is what brought me there:
- We suffer from very slow notifications because of our large configuration. Each notification has a 4-5s delay because Checkmk loads the complete site configuration although it only needs the notification configuration. When there are larger network hiccups, it takes hours until all notifications are processed. We were told to wait for 2.5, which should improve notification speed.
- Because our regular notification scripts do not support bulk mode, I’m trying to write a generic wrapper that creates the necessary environment and calls the regular notification script (checkmk-extensions/generic_bulk_wrapper/src/notifications/generic_bulk_wrapper.py at main · mayrstefan/checkmk-extensions · GitHub). Its only purpose is to avoid the previously described performance bottleneck. Occasionally I got an OSError 7 exception because I tried to create an environment that was too large. It worked in non-bulk mode but sometimes failed in bulk mode, which showed me that there is a difference.
- I’m trying to understand what is specified, what is documented, and how these pieces of information relate to each other (or not). The problem with undocumented behaviour is that you work with assumptions that may be wrong, and sooner or later something will break.
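To illustrate the OSError 7 case: errno 7 is E2BIG ("Argument list too long"), which on Linux is raised when the environment handed to a new process is too large, either in total or in a single `KEY=VALUE` string. Below is a minimal sketch of how a wrapper could guard against that before calling the regular script. It is not what my wrapper or Checkmk actually does. The names `entry_bytes`, `clamp_env` and `PER_STRING_LIMIT` are made up for this example, and the 128 KiB per-string limit assumes Linux with 4 KiB pages.

```python
# Assumption: Linux limits a single environment string to 32 pages,
# which is 128 KiB with 4 KiB pages. Other systems may differ.
PER_STRING_LIMIT = 128 * 1024


def entry_bytes(key, value):
    """Bytes needed for one 'KEY=VALUE' entry plus its terminating NUL."""
    return len(f"{key}={value}".encode("utf-8")) + 1


def clamp_env(env, limit=PER_STRING_LIMIT):
    """Return a copy of env where oversized values are truncated to fit limit."""
    clamped = {}
    for key, value in env.items():
        if entry_bytes(key, value) > limit:
            # Reserve room for the key, the '=' and the trailing NUL.
            budget = max(limit - len(key.encode("utf-8")) - 2, 0)
            # errors="ignore" drops a multi-byte character cut in half.
            value = value.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
        clamped[key] = value
    return clamped
```

A wrapper could then pass `clamp_env({**os.environ, **context})` as the `env` argument of `subprocess.run`, where `context` is a hypothetical dict holding the `NOTIFY_*` variables for one notification. This only handles the per-string limit; the total size of all entries would still need a separate check.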
The slow notification issue is an old one; it shows up, for example, in "Spooled Notifications are to slow (only 1 notif per sec)", and maybe "Check_MK slow sending notifications" was also related to it. Support told us to reduce notifications, but I don’t see how to do this: the more configuration objects you have, the slower each notification gets, and the more things you monitor, the higher the probability that something will change state. So a growing site means more notifications, and each of them gets slower. We’ll see what we get with 2.5.
The 150kByte mail or something else?
The plugin with the 150K output was the Checkmk builtin NetScaler SNMP plugin.