OK, so one server had 32k disabled services and the other had 46k (so many, in fact, that the “service discovery” page would run into a timeout).
Fun sidenote: My Chrome managed to display the discovery page for the server with 32k services, but the tab consumed 3.2 GB of RAM. When I tried to move some of the services to undecided, the page tried to rescan/refresh as it usually does, the tab’s memory usage went up to 5 GB, and then the tab crashed with “out of memory”.
I was able to get rid of the ZFS services by running `cmk --checks zfsget -II <hostname>`.
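If you need to do the same on more than one host, a small wrapper along these lines might save some typing. This is only a rough sketch: the function name `rediscover_zfsget`, the `CMK_BIN` variable, and the host names in the usage line are made up for illustration, and the `cmk` call itself is the same one as above. Setting `CMK_BIN=echo` gives you a dry run that just prints the arguments instead of touching anything.

```shell
#!/bin/sh
# Sketch: repeat the cmk call from the post for several hosts.
# CMK_BIN is a hypothetical override; set CMK_BIN=echo for a dry run.
CMK_BIN="${CMK_BIN:-cmk}"

rediscover_zfsget() {
    for host in "$@"; do
        # Same invocation as in the post, once per host.
        # Report failures on stderr and keep going with the next host.
        "$CMK_BIN" --checks zfsget -II "$host" || echo "failed: $host" >&2
    done
}
```

Usage would be something like `rediscover_zfsget server1 server2` (placeholder host names), run from wherever you normally call `cmk`.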
I made a few changes and activated them and it looks like I’m back to 23-26 seconds per activation. So maybe that was it!
During this journey, I also found out that a bulk discovery from the web interface does not remove vanished services that are disabled. I had to use the CLI.
Hopefully this helps someone else at some point.