Mdadm - check causing warning

akeilhofer · April 2, 2020, 6:01am

Hi,
Since the recent Debian templates on Hetzner images use this cron job to check software RAIDs, we are getting warnings:

Check:
19 2 2 * * root if [ -x /usr/share/mdadm/checkarray ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi

Warning:
WARN - Status: active, Spare: 0, Failed: 0, Active: 4, Status: 4/4, UUUU, [Resync/Recovery] Finish: 242.9min, Speed: 151795K/sec WARN

Output of /proc/mdstat:

Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid5 sdd2[4] sdb2[1] sda2[0] sdc2[2]
20957184 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
resync=DELAYED

md2 : active raid5 sdd3[4] sdb3[1] sda3[0] sdc3[2]
17558625792 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[============>…] check = 60.4% (3540764180/5852875264) finish=260.9min speed=147679K/sec
bitmap: 1/44 pages [4KB], 65536KB chunk

md0 : active raid1 sdc1[2] sdd1[3] sda1[0] sdb1[1]
523712 blocks super 1.2 [4/4] [UUUU]

unused devices:

Is there any way to disable warnings on rechecks?

thanks

tosch · April 2, 2020, 6:09am

Hi @akeilhofer and welcome to the checkmk community.

If you know when this will happen, you can also schedule a periodic downtime for this service. You can do this via a rule (Recurring downtimes for services) or via the action menu of the service.

If you don’t want notifications for this service during the rebuild, you can also disable notifications for this specific service with the rule Notification period for services.

Maybe these options help you with the issue.

akeilhofer · April 2, 2020, 6:26am

Hi,
Thanks, but this check runs on multiple machines at different times.

tosch · April 2, 2020, 6:41am

Then I guess you need multiple rules with multiple definitions.

My opinion on this:

A rebuild of a RAID is always a situation that isn’t normal for a system. It can affect I/O performance and is also a kind of degraded state of your system/RAID. Additionally, if you set the rebuild state to OK, you will never be informed when your RAID is rebuilding outside the cron-scheduled window.

akeilhofer · April 2, 2020, 6:54am

Hi,
I share your opinion, but this is a recheck, not a rebuild.
https://raid.wiki.kernel.org/index.php/Resync

Most Debian and Debian-derived distributions create a cron job which issues an array check at 0106 hours each first Sunday of the month in /etc/cron.d/mdadm. This task appears as resync in /proc/mdstat and syslog. So if you suddenly see RAID-resyncing for no apparent reason, this might be a place to take a look.

Hetzner seems to spread these checks across various times by default, but nonetheless this seems odd to me.
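To see on the host itself whether an array is running a check or a real resync/recovery, the Linux md driver exposes the current action in sysfs under /sys/block/<array>/md/sync_action. A minimal Python sketch of reading it, assuming that file is readable on the machine; the function names and the `sysfs_root` parameter are made up for illustration:

```python
from pathlib import Path


def read_sync_action(array: str, sysfs_root: str = "/sys/block") -> str:
    """Return the current md sync action for an array such as 'md2'.

    Typical values are 'idle', 'check', 'resync', 'recover' and 'repair'.
    """
    path = Path(sysfs_root) / array / "md" / "sync_action"
    return path.read_text().strip()


def is_check(action: str) -> bool:
    """True for a read-only consistency check, False for any other action."""
    return action == "check"


if __name__ == "__main__":
    # 'md2' is the array name from the /proc/mdstat output in this thread.
    action = read_sync_action("md2")
    print(action, is_check(action))
```

Note that this only tells a check apart from a resync or recovery; it cannot tell whether the check was started by the cron job or by hand.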

tosch · April 2, 2020, 7:51am

Yeah, I know this behavior from Hetzner. For mdstat, a rebuild and a recheck are the same thing: a syncing state. I currently have no software RAID system where I could look at the agent output during a RAID recheck. Maybe it’s possible to modify the check to distinguish between rebuild and recheck.

Can you send me the relevant section of your agent output in this situation?

akeilhofer · April 2, 2020, 8:04am

Hi,
btime 1582208749
processes 5710189
procs_running 3
procs_blocked 0
softirq 2889495877 10 1041581211 3461917 615710407 0 0 839752 847093929 795528 380013123
<<>>
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid5 sdd2[4] sdb2[1] sda2[0] sdc2[2]
20957184 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
resync=DELAYED

md2 : active raid5 sdd3[4] sdb3[1] sda3[0] sdc3[2]
17558625792 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[===============>…] check = 78.8% (4615807880/5852875264) finish=198.2min speed=104019K/sec
bitmap: 3/44 pages [12KB], 65536KB chunk

md0 : active raid1 sdc1[2] sdd1[3] sda1[0] sdb1[1]
523712 blocks super 1.2 [4/4] [UUUU]

unused devices:
<<<vbox_guest>>>
<<<ntp:cached(1585814551,30)>>>
% 0.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.000
% 1.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.000
% 2.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.000
% 3.debian.pool.n .POOL. 16 p - 64 0 0.000 0.000 0.000

Is this enough?

tosch · April 2, 2020, 8:07am

Okay, the full output of mdstat is available from the agent. Let me look at the code of the check; maybe it’s an easy task to adapt it to your needs.

tosch · April 2, 2020, 8:42am

The parse function for md has to be changed so that the check function knows a third state, check, besides recovery and resync. It’s not a task of a few minutes. You can write an email to feedback@checkmk.com or talk to one of the Checkmk partners to have an extension pack built (by inserting coins).

If you are comfortable with Python, you can try to get this working on your own by copying the ~site/share/check_mk/checks/md file to ~site/local/share/check_mk/checks/md and changing it to your needs.
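As a starting point, here is a rough standalone sketch of the idea: read the operation name from the mdstat progress line and treat check differently from resync and recovery. This is not the code of the shipped md check; the function names, the dictionary layout and the OK/WARN mapping are assumptions for illustration only.

```python
import re

# Progress line, e.g.
# "[====>....]  check = 60.4% (354/585) finish=260.9min speed=147679K/sec"
PROGRESS_RE = re.compile(
    r"\]\s+(?P<op>check|resync|recovery|reshape)\s*=\s*(?P<pct>[\d.]+)%"
)
# Array header line, e.g. "md2 : active raid5 sdd3[4] sdb3[1] ..."
ARRAY_RE = re.compile(r"^(?P<name>md\d+)\s*:\s*(?P<state>\S+)")


def parse_mdstat(text):
    """Return {array name: {"state", "operation", "percent"}} from mdstat text."""
    arrays = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = ARRAY_RE.match(line)
        if header:
            current = {"state": header.group("state"),
                       "operation": None, "percent": None}
            arrays[header.group("name")] = current
            continue
        if current is None:
            continue  # lines before the first array, e.g. "Personalities"
        progress = PROGRESS_RE.search(line)
        if progress:
            current["operation"] = progress.group("op")
            current["percent"] = float(progress.group("pct"))
    return arrays


def monitoring_state(array):
    """Assumed mapping: 0 (OK) when idle or checking, 1 (WARN) otherwise."""
    if array["operation"] in (None, "check"):
        return 0
    return 1


if __name__ == "__main__":
    sample = (
        "md2 : active raid5 sdd3[4] sdb3[1] sda3[0] sdc3[2]\n"
        "17558625792 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]\n"
        "[============>...] check = 60.4% (3540764180/5852875264) "
        "finish=260.9min speed=147679K/sec\n"
    )
    md2 = parse_mdstat(sample)["md2"]
    print(md2["operation"], monitoring_state(md2))  # prints: check 0
```

A line such as resync=DELAYED has no percentage and is ignored by this sketch, so a queued array counts as idle here. A resync or recovery still yields WARN, so a rebuild outside the scheduled check would still be reported.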
