UPS battery alert during self-test

I have an APC Smart-UPS that can run self-tests on a schedule. The issue is that during a self-test the battery drops by a few percentage points and I get a notification.

Battery status: normal, Output status: on battery (calibration invalid) (self-test running), Capacity: 95% (warn/crit below 99.0%/80.0%)WARN, Time remaining: 98 m

As we can see, the check does report that the self-test is running. When this happens, the output status is on battery (that’s the purpose of the self-test). APC has a schedule for the self-test, but it isn’t very consistent: it runs at boot and then every week, so the timing depends on the boot time. We can blame APC all we want for the lack of options, but I’d like to find a solution within Checkmk and avoid getting any alert during the self-test. Is there a way to do that without editing the battery percentage thresholds? I do want to be notified if we’re on battery for any reason other than a self-test.

I have not checked the code, but unless you want to change the plugin, you will probably have to work with the battery percentage levels. Apparently the plugin does not take the state of the self-test into account.
Have you checked all available rule sets?

In the rule ‘APC Symmetra Checks’ you will find the option ‘Levels of battery parameters after calibration’, which might help. I have never tested it, so please report your results here.
Anyway, I would rather monitor the remaining runtime than the battery capacity percentage.

Have a good one

I wanna know as soon as we’re on battery power as long as it’s not a self-test. I don’t see any option to do that currently with the rules available and how the plugin works. Setting the percentage threshold warning to 99% gives me what I want except for the self-test situation which I’d like to ignore… the plugin needs improvement. I will see what I can do in that regard.

Maybe I am misinterpreting something, but reading the inline help, it sounds like exactly the function you are looking for:

After a battery calibration the battery capacity is reduced until the battery is fully charged again. Here you can specify an alternative lower level in this post-calibration phase. Since apc devices remember the time of the last calibration only as a date, the alternative lower level will be applied on the whole day of the calibration until midnight. You can extend this time period with an additional time span to make sure calibrations occuring just before midnight do not trigger false alarms.
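The time window described in that help text can be sketched as follows. This is only an illustration of the described behaviour, not the plugin's actual code; the function name `alt_levels_active` and its parameters are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def alt_levels_active(
    last_date: date, now: datetime, extra: timedelta = timedelta(0)
) -> bool:
    """Return True while the alternative lower level would apply.

    Assumption: the device reports only a date (no time of day), so the
    window covers the whole reported day plus an optional extra time span.
    """
    # Window starts at 00:00 of the reported day ...
    window_start = datetime.combine(last_date, time.min)
    # ... and ends at the following midnight, extended by the extra span.
    window_end = window_start + timedelta(days=1) + extra
    return window_start <= now < window_end
```

With `extra=timedelta(hours=1)`, a check at 00:30 on the following day would still fall inside the window, which is the "just before midnight" case the help text mentions.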

I would give it a try

Calibration and self-test are two different features. During a calibration, the UPS runs on battery until it is fully drained, so that it can measure how long it can truly last on batteries. Most of the time you don’t calibrate your UPS unless you suspect an issue.

A self-test, on the other hand, is just a quick test (less than 5 minutes) where the UPS goes on battery and checks that everything is OK with the battery, the voltage, all the sensors, etc.

OK, thank you for the clarification.
The only option left for you is to copy the plugin into the local structure and adjust the code to your needs, as mentioned by Robin.

That is a very common misunderstanding. You should calibrate your UPS at a regular interval (once per year) to get real runtime results. Without a valid calibration, all values shown for runtime are only an estimate.

That’s completely correct, and now comes the problem: the wording inside the rule is wrong.
The check takes the last diag date (self-test) as the time of the last calibration.
If I read the code correctly, the rule @mike1098 mentioned really applies to the time after a self-test and not, as the wording suggests, to the time after a calibration.

That’s a real bug @robin.gierse

Hi Andreas, thank you, that was also my assumption, but I wanted to avoid an argument :wink:
@dnLL Again, please give it a try and let us know if it works.
@robin.gierse If possible, please clarify and improve the inline help.

I refer to the official APC documentation: How do I perform a Battery Calibration on my Smart-UPS? - APC USA. More specifically, this:

Due to increased wear and tear on the battery we recommend performing a calibration no more than once every 6 months (if needed).

I guess what they mean by “if needed” is open to interpretation. For my homelab, I would recommend against it. For production environments… well, I know we’re not doing the calibration and we have hundreds of APC UPS units, so maybe this should be reconsidered.

I just tested it and I can confirm that you are correct. Here is how I set it up:

The wording needs to be adjusted, otherwise it works very well.

Also, as I was testing, the status detail incorrectly mentions calibration as well.

I would be happy if you could mark my post as the solution :wink:

Hi,

So I did some troubleshooting with this plugin, and I think the wording is correct, but it uses the wrong OID in the background. APC calls this TestDiagnostic and not Selftest in SNMP. Maybe that was misleading.

So I’ll forward this as a bug, and it should be fixed in the upcoming releases.

About the self-test:
We already take the running self-test into account in the overall state of the service. But as dnLL pointed out, the battery check is not included.

So here’s a question for all and especially the UPS Users. We have 4 options:

  • No change.
  • We always ignore the configured battery thresholds while a self-test is running.
  • Adding an option/checkbox “Ignore battery threshold while self-test is running”.
  • Integrate another alternate battery level configuration for the time the self-test is running.

What would you prefer?
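To make the third option concrete, here is a minimal sketch of how such a checkbox could behave. It is not the plugin's implementation; `capacity_state` and the `ignore_during_self_test` parameter are hypothetical names, and the levels are treated as lower levels as in the service output above (warn/crit below 99.0%/80.0%).

```python
OK, WARN, CRIT = 0, 1, 2


def capacity_state(
    capacity: float,
    warn: float,
    crit: float,
    self_test_running: bool,
    ignore_during_self_test: bool = False,
) -> int:
    """Evaluate battery capacity against lower levels.

    Hypothetical option: when ignore_during_self_test is set, the levels
    are skipped only while a self-test is running.
    """
    if self_test_running and ignore_during_self_test:
        return OK
    if capacity < crit:
        return CRIT
    if capacity < warn:
        return WARN
    return OK
```

With the values from the original post (95% capacity, levels 99.0%/80.0%, self-test running), the result is OK when the option is enabled and WARN when it is not. Being on battery for any other reason is still evaluated against the levels as usual.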

KR,
Max

The OIDs the check fetches are the correct ones for the self-test date and the calibration result. It is the wording that is incorrect. Inside the check there is no test for whether a calibration is running, only for whether a self-test is running.

    if state_output_state != "":
        # string contains a bitmask, convert to int
        output_state_bitmask = int(state_output_state, 2)
    else:
        output_state_bitmask = 0
    self_test_in_progress = output_state_bitmask & 1 << 35 != 0

The same can be done on the same bitmask to detect a running calibration.
If I counted correctly, it should be:

    calibration_in_progress = output_state_bitmask & 1 << 54 != 0
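Both tests can be wrapped in one small helper. The sketch below assumes the state arrives as a 64-character string of `0`/`1` flags with flag 1 leftmost, so that the 29th flag maps to bit 35 and the 10th flag to bit 54 after `int(..., 2)`. The helper name `flag_is_set` and the two constants are hypothetical, and the bit positions are the ones given above, not verified against a device.

```python
SELF_TEST_BIT = 35    # position used in the plugin snippet above
CALIBRATION_BIT = 54  # position suggested above, "if I counted correctly"


def flag_is_set(state_output_state: str, bit: int) -> bool:
    """Return True if the given bit is set in the binary flag string."""
    # An empty string means the device delivered no flags at all.
    bitmask = int(state_output_state, 2) if state_output_state else 0
    # Parentheses make the operator precedence explicit.
    return (bitmask & (1 << bit)) != 0


# Example: a 64-character string with only the 29th flag (from the left) set.
example = "0" * 28 + "1" + "0" * 35
self_test_in_progress = flag_is_set(example, SELF_TEST_BIT)
calibration_in_progress = flag_is_set(example, CALIBRATION_BIT)
```

In this example `self_test_in_progress` is `True` and `calibration_in_progress` is `False`.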

As a result, you need two options inside the rule set.
Option 1 - time after a self-test without thresholds
Option 2 - time after a calibration without thresholds - this time is significantly longer than the first one, as the battery is drained to around 25%

What matters here is the time after the self-test and after the calibration.

In addition, you need to specify what should happen while the test is running.

Hi Andreas,

I still think that the wording is correct, but we took the lastselftestdate and not the lastcalibrationdate.
Mainly because it wouldn’t make sense to configure a long-lasting alternate battery threshold after a self-test.
We also already know and take into account the calibration state. The variable is “calib_result”, and as long as it reads “calibration in progress” the state of the check will be OK.

But you’re right that we should also consider a short amount of time after the self-test, while the battery is recharging.

So I would suggest adding a checkbox to “Levels of battery capacity”.
For example: “Ignore configured levels while self-test is running + 10 minutes”

KR,
Max

Edit: OK, the post-self-test time span will be a problem, as we only have the date, not the time of day.
So we can switch off the battery check while the self-test is running, but it will go to WARN/CRIT once the self-test has finished and the battery is not yet fully recharged.
