Removing stale service alerts

I have several hosts with stale services showing up under Monitor > Stale services. A couple of the hosts have been unreachable for months and I’d like to just clear them out completely. Is there a way to clear these from the stale services list, or to tell the system to ignore these hosts and not check them at all, without deleting the hosts completely?

I do have a custom attribute for test/prod/ignore, but I was hoping to find a built-in method before using it to create any custom rules.

Thank you.

Stale is not an alert in itself; it is only shown inside the web GUI.
Your only option is to rediscover these hosts. All services that have no current data are then removed. If you don’t want to see stale services in the Overview in the sidebar, you can disable the stale services display there.

Thanks for the quick answer Andreas. The hosts are down right now so re-discovering just times out, so there’s no opportunity to remove vanished services. In this case I’ll probably just remove the hosts completely, or just leave the stale services as they are.

Thanks!

You can also set the hosts to ping only and then discover them → all services should be gone and the host is only shown as down.

Hi,
please have a look at OMD .* performance. If you see Helper Usage > 80 %, please adjust the core settings to handle more results (the settings differ between 1.6 and 2.0).

Cheers,
Christian

I agree with @ChristianM regarding monitoring performance.
However, if hosts are down it is pointless to check them. So I would suggest disabling them in Setup, or removing them altogether. Automating this might be possible, but I am not aware of a built-in way.

The problem is more a display problem. A host that is down should not show its services as stale in the sidebar “Overview”, as this makes no sense.
The default sidebar snap-in has the wrong filters set.

If a host is up, it is correct to see stale services for this host, but not for a host that is down.

On systems where I have many hosts that are down part of the time, I create my own sidebar snap-in and set the correct filters there, so that I only see stale services if the host is up.
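As a rough illustration of that filter, here is a minimal sketch of a Livestatus query that lists stale services only on hosts that are up. It assumes the `staleness` and `host_state` columns of the Livestatus `services` table and a staleness threshold of 1.5, which may differ on your site; the function name is hypothetical and this is not how the snap-in itself is configured.

```python
def build_stale_on_up_hosts_query(threshold: float = 1.5) -> str:
    """Build a Livestatus query for stale services on hosts that are up.

    Assumptions: the 'services' table exposes 'staleness' and 'host_state',
    and host_state 0 means UP. The threshold value is site-specific.
    """
    lines = [
        "GET services",
        "Columns: host_name description staleness",
        f"Filter: staleness >= {threshold}",  # service counts as stale
        "Filter: host_state = 0",             # only hosts that are UP
    ]
    # Consecutive Filter lines are combined with AND; a Livestatus
    # query is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(build_stale_on_up_hosts_query())
```

The resulting text could be sent to the site's Livestatus socket, for example through `lq` as the site user.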

Hello Andreas,
I always thought it was intended that the services of an unreachable host are shown as stale. As I understand you, you also find this nonsensical. Would it make sense to raise this with Checkmk somehow, maybe as a feature request?

@robin.gierse what do you think about this?

Regards
Christian

I would put it this way, from an operational point of view:
Host down → no service gets any data, which is expected, so no stale marker is needed here
Host up → some or all services get no data → stale should be shown, as this is not expected/normal for an up host
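The two rules above can be sketched in a few lines of Python. This is only an illustration of the proposed display logic, not Checkmk code; the `Service` class, the field names, and the threshold of 1.5 are assumptions made for the example.

```python
from dataclasses import dataclass

HOST_UP = 0  # assumption: 0 = UP, as in the usual host state numbering


@dataclass
class Service:
    host_name: str
    description: str
    host_state: int   # state of the owning host
    staleness: float  # age of the last result relative to the check interval


def visible_stale_services(services, threshold=1.5):
    """Return stale services, but only for hosts that are up."""
    return [
        s for s in services
        if s.host_state == HOST_UP and s.staleness >= threshold
    ]


if __name__ == "__main__":
    sample = [
        Service("web01", "CPU load", host_state=0, staleness=3.2),  # up, stale
        Service("web01", "Memory", host_state=0, staleness=0.4),    # up, fresh
        Service("old01", "CPU load", host_state=1, staleness=900),  # host down
    ]
    for svc in visible_stale_services(sample):
        print(svc.host_name, svc.description)
```

With the sample data, only the stale service on the up host is listed; the service on the down host is suppressed even though its data is much older.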

Alright, disclaimer first: The following is only my opinion, not necessarily that of the tribe.

@alanb: I think the easiest way to solve your immediate problem is to use the built-in host tag “Criticality” and set it to “Do not monitor this host”.

Regarding the general thoughts on staleness: I get what @andreas-doehler is saying, but I cannot fully support it. It depends on how you approach this topic. On one hand: Sure you could handle it like suggested, because to some that will be the intuitive understanding. But on the other hand checkmk knows for sure that the host is down, but it cannot know for sure the state of the services of that host. So technically speaking this is correct.
Should you bring this to the tribe as a feature request? I cannot say. Feel free, there are plenty of ways to do so. It will just be very low priority. :slight_smile:
The alternative is to go ahead as @andreas-doehler suggested and simply create a custom sidebar element based on the Overview, and filter it accordingly.

On one hand: Sure you could handle it like suggested, because to some that will be the intuitive understanding. But on the other hand checkmk knows for sure that the host is down, but it cannot know for sure the state of the services of that host. So technically speaking this is correct.

That makes sense from a purely technical point of view, but I think in over 90% of real-world cases it is the other way around, and hence the stale column becomes meaningless.

Maybe instead of a feature where users have to actively change this particular setting, or having everyone build a custom snap-in that each user then has to add as well, can we add this to a list of questions for the next conference? I think it was always a good idea that GUI changes with no obvious “wrong/right” are decided by popular vote :). I’m confident that the vast majority will share @andreas-doehler’s point of view.
