ReliableRaid


Summary

RAIDs (redundant arrays of independent disks) allow systems to keep functioning even if some parts fail. You simply plug in more than one disk side by side. If a disk fails, the mdadm monitor triggers a buzzer, a notify-send message or an email to notify that a (new spare) disk has to be added to restore redundancy. All the while, the system keeps working unaffected.

Release Note

Event-driven RAID/crypt setup. (General hotplugging ability, with support for booting setups more complex than a simple root on an mdX device even when arrays are degraded.)

Rationale

Unfortunately, Ubuntu's md (software) RAID configuration is incomplete in several respects.

The assembly of arrays with "mdadm" has been moved from the Debian startup scripts to the hotplug system (udev rules). However, some bugs defeat the hotplug mechanism, and several things that are generally expected (i.e. that just work in other distros) are missing in Ubuntu:

  1. No handling of RAID degradation during boot for non-root filesystems (at all). (The boot simply stops at a recovery console.)
  2. There is no init script at all to start/run necessary regular (non-rootfs) arrays degraded. 259145 non-root raids fail to run degraded on boot

  3. Only limited and buggy handling of RAID degradation for the rootfs. (Working only for plain md devices without LVM/crypt, and only after applying a fix from the 9.10 release notes.)
  4. The initramfs boot process is not a state machine capable of assembling the base system from devices appearing in any order and starting necessary RAIDs degraded if they are not complete after some time.
    • 491463 upstart init within initramfs (Could handle most of the following nicely by now.)

    • 251164 boot impossible due to missing initramfs failure hook integration

    • 136252 mdadm, initramfs missing ARRAY lines

    • 247153 encrypted root initialisation races/fails on hotplug devices (does not wait)

    • 488317 installed system fails to boot with degraded raid holding cryptdisk

    • The proper mdadm --incremental option does not work in initramfs (not creating device nodes) 251663

  5. No notification of users/admins about RAID events is enabled (the email question is suppressed during install without a buzzer/notify-send replacement); see the configuration sketch below.
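
As a rough, hedged illustration of what enabling such notifications could look like (not a mandated solution; the mail address and helper script below are placeholders), mdadm's monitor can already send mail or run a program on RAID events via /etc/mdadm/mdadm.conf:

    # /etc/mdadm/mdadm.conf (excerpt) -- illustrative values only
    MAILADDR root
    # Hypothetical helper, e.g. a wrapper around notify-send or a buzzer:
    PROGRAM /usr/local/sbin/raid-event-notify

The monitor itself typically runs as a daemon, e.g. "mdadm --monitor --scan --daemonise".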

Note that no problem arises in a hotpluggable system if an array is degraded and a drive comes up later. It is simply added to the array (and synced in the background if writes have been performed in the meantime). The admin, however, can quickly be notified that a drive is starting to have problems.
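
For reference, the hotplug path amounts to a udev rule that hands newly appearing RAID members to mdadm for incremental assembly. The rule below is a simplified sketch, not the exact rule shipped by any particular Ubuntu release:

    # Simplified sketch of an incremental-assembly udev rule
    SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
        RUN+="/sbin/mdadm --incremental $env{DEVNAME}"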

There really isn't any problem that requires a rescue console/repair to boot if an array is degraded while the system was powered down. There is, however, a problem of not notifying anybody in all other cases of disk failures (the system boots straight up in those cases anyway).

Possible tasks that require admin action *after* the RAID has done what it is designed to do (save your data in case of a failure) can be:

  • Forcibly re-adding a drive marked faulty to the array (after an occasional block error that drives remap automatically); see the command sketch after this list.
  • Replacing a faulty drive.
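
As a hedged illustration of the forced re-add mentioned above (array and device names are examples only), the admin action would look roughly like:

    # Example: /dev/md0 with member /dev/sdb1 marked faulty after a transient block error
    mdadm --detail /dev/md0            # inspect the array state first
    mdadm /dev/md0 --remove /dev/sdb1  # remove the member marked faulty
    mdadm /dev/md0 --re-add /dev/sdb1  # re-add it; it is resynced in the background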

Use Cases

  • Angie installs Ubuntu on a RAID stripe for the rootfs (/) and a RAID mirror with LVM for /home and swap. When one of the RAID mirror members fails or is detached while the system is powered down, the system waits 20 seconds (default) for the missing member, then resumes booting with a degraded RAID and emits notifications by means of beeping, notify-send, and email (configurable). When the RAID mirror member is reattached later on (hotpluggable interface), it gets automatically synced in the background.
  • Bono does the same but uses lvm on top of cryptsetup on the raids.

Design

Event-driven degradation handling for mdadm should be possible with a simple configuration change to the mdadm package, hooking it into upstart so that a RAID is started degraded if it hasn't fully come up after a timeout. (This would amount to appropriately replacing, rather than dropping, the second mdadm init.d script present in the Debian package.) A rough sketch of such an upstart job follows below.

  • cryptsetup is already set up event-driven (with upstart, not yet with udev)
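
A minimal sketch of such an upstart hook, assuming a hypothetical job name, trigger event and a 20 second grace period (all of which would have to be fitted to the real boot sequence):

    # /etc/init/mdadm-degraded.conf -- hypothetical job, illustration only
    description "run still-incomplete md arrays degraded after a grace period"

    # Assumption: start once the initial coldplug/udev events have been processed.
    start on stopped udevtrigger
    task

    script
        sleep 20    # give missing members time to appear via hotplug
        # Try to run any array that incremental assembly left inactive
        # because members were still missing at this point.
        for md in /dev/md?*; do
            [ -b "$md" ] && mdadm --run "$md" || true
        done
    end script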

For event-driven behaviour in the initramfs: the initramfs scripts and their failure hooks look like way too much work and overcomplicate things. An event-based boot would have to be reimplemented with the initramfs scripts, instead of using upstart to set up (crypt, raid, lvm, ... and) the rootfs from the initramfs. It would be good to adapt the upstart approach described above to set up the rootfs within the initramfs.

  • cryptsetup will need to be converted to the event driven setup in initramfs

Implementation

  • The proper command (i.e. for boot scripts) to start *only specific* hotpluggable raids degraded (e.g. the rootfs after a timeout from the initramfs) may not be available. 251646 (Workaround maybe: removing a member and re-adding it with --incremental --run)

  • Using the legacy method to start an array degraded will break later --incremental (re)additions from udev/hotplugging.
  • The command "mdadm --incremental --scan --run" to start *all remaining* hotplugable raids degraded (something to execute only manually if at all!) does not start anything. 244808

  • mdadm still reads/depends on a static /etc/mdadm/mdadm.conf file containing UUIDs (in the initramfs!). It refuses to assemble any hotplugged array not mentioned there and tagged with its own hostname. (It does not default to simply assembling matching superblocks and running arrays (only) if they are complete.) This behaviour actually breaks the autodetection of every array newly created on a system, as well as plugging in (complete) md arrays from another system. 252345 For updating that initramfs refer to: http://ubuntuforums.org/showthread.php?p=8407182 (An example ARRAY line is sketched after this list.)

  • Ubuntu should make use of partitionable (/dev/md_dX type) arrays.
  • The Ubuntu Server manual claims that "If the array has become degraded, due to the chance of data corruption, by default Ubuntu Server Edition will boot to initramfs after thirty seconds. Once the initramfs has booted there is a fifteen second prompt giving you the option to go ahead and boot the system, or attempt manual recover." However:
    • The kernel will never autostart a RAID that does not produce a correct checksum, whether it is degraded or not. There is nothing to manually recover about a degraded RAID before it can be started. A recovery console is appropriate *after* starting a RAID degraded has failed.
    • A replacement disk can be added at any time if a spare disk isn't already installed from the beginning. If the disks are not connected over a hotpluggable interface, the system must be powered down for this. (A recovery console is also pointless in this case.)
    • If a drive fails while the system is powered up, by default nobody is notified and the system will simply reboot degraded afterwards anyway. Reason: 244810 inconsistency with the --no-degraded option.

    • The boot process will usually not be stopped (and should not be) for something (like adding and syncing a new drive to the RAID) that is designed to be done on live systems (quite a good default).
  • We've tried to avoid "fallback after a timeout" kind of behaviours in the past.
    • cryptsetup currently needs to time out, and does (it is not event-driven), in the initramfs. RAID setup always needs a timeout to decide about degrading. The usual implementation uses a second startup script later in the boot process. (But this has been silently dropped in Ubuntu without a proper replacement.)
  • How would you decide what device is needed?
    • This may be about the only reason for keeping an /etc/mdadm/mdadm.conf-like file around in a hotpluggable system (listing only those arrays required to boot). The cryptsetup package contains code to determine the dependencies that need to be set up early from within the initramfs. mountall looks for a bootwait parameter in the fstab, which may also point to arrays that might need to be degraded before running things that depend on them.
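
For illustration of the static configuration discussed above, an mdadm.conf restricted to the arrays required for booting could look like this (metadata version, name and UUID are placeholders):

    # /etc/mdadm/mdadm.conf (excerpt) -- placeholder values
    DEVICE partitions
    # List only the arrays needed to bring up the base system:
    ARRAY /dev/md0 metadata=1.2 name=examplehost:0 UUID=12345678:9abcdef0:12345678:9abcdef0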

UI Changes

None necessary.

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.


CategorySpec
