
Disclaimer: The original version of this article was first published on IBM developerWorks, and is the property of Westtech Information Services. This document is an updated version of the original article, and contains various improvements made by the Gentoo Linux Documentation team.
This document is not actively maintained.


Software RAID in the new Linux 2.4 kernel, Part 2

Content:

1.  Setting up RAID-1 in a production environment
2.  A RAID-1 walkthrough

Real-world RAID

In my previous article, I introduced you to Linux 2.4's software RAID functionality, showing you how to set up linear, RAID-0, and RAID-1 volumes. In this article, we look at what you need to know in order to use RAID-1 to increase availability in a production environment. This requires a lot more knowledge than just setting up RAID-1 on a test server or at home -- specifically, you'll need to know exactly what RAID-1 will protect you against, and how to keep your RAID volume up and running in case of a disk failure. We'll cover these topics, starting with an overview of what RAID-1, 4, and 5 can and can't do for you, and ending with a complete test simulation of a failed RAID-1 drive replacement -- something that you should actually perform (with this article as your guide) if at all possible. After going through the simulation, you'll have all the experience you need to handle a RAID-1 failure in a real-world environment.

What RAID doesn't do

The fault-tolerant features of RAID are designed to protect you from the negative impacts of a spontaneous, complete drive failure. That's a good thing. But RAID isn't a perfect fix for every kind of reliability problem. Before implementing a fault-tolerant form of RAID (1, 4, or 5) in a production environment, it's extremely important that you know exactly what RAID will and will not do for you. When we're depending on RAID to perform, we don't want to be operating under any false assumptions about what it does. Let's start by dispelling some common myths about RAID 1, 4, and 5.

A lot of people think that if they place all their important data on a RAID 1/4/5 volume, then they won't have to perform regular backups. This is completely false -- here's why. RAID 1/4/5 helps to protect against unplanned downtime caused by a random drive failure. However, it offers no protection against accidental or malicious data corruption. If you type cd /; rm -rf * as root on a RAID volume, you'll lose a lot of very important data in a matter of seconds, and the fact that you have a 10-drive RAID-5 configuration will be of little significance. Also, RAID won't help you if your server is physically stolen or if there's a fire in your building. And of course, if you don't implement a backup strategy, you won't have an archive of past data -- if someone in your office deletes a bunch of important files, you won't be able to recover them. That alone should be enough to convince you that, in most circumstances, you should plan and implement a backup strategy before even thinking about tackling RAID-1, 4, or 5.

Another mistake is to implement software RAID on a system composed of low-quality hardware. If you're putting together a server that's going to do something important, it makes sense to purchase the highest-quality hardware that's still comfortably within your budget. If your system is unstable or improperly cooled, you'll run into problems that RAID can't solve. On a similar note, RAID obviously can't give you any additional uptime in the case of a power outage. If your server is going to be doing anything relatively important, make sure that it's been equipped with an uninterruptible power supply (UPS).

Next, we move on to filesystem issues. The filesystem exists "on top" of your software RAID volume. This means that using software RAID does not allow you to escape filesystem issues, such as long and potentially problematic fscks if you happen to be using a non-journalled or flaky filesystem. So, software RAID isn't going to make the ext2 filesystem more reliable; that's why it's so important that the Linux community has ReiserFS, as well as JFS and XFS in the works. Software RAID and a reliable journalling filesystem make a great combination.

RAID - intelligent implementation

Hopefully, the previous section dispelled any RAID myths that you might have had. When you implement RAID-1, 4, or 5, it's very important that you view the technology as something that will enhance uptime. When you implement one of these RAID levels, you're protecting yourself against a very specific situation -- a spontaneous complete (single or multiple) drive failure. If you experience this situation, software RAID will allow the system to continue running, while you make arrangements to replace the failed drive with a new one. In other words, if you implement RAID 1, 4, or 5, you'll be reducing your risk of having a long, unplanned downtime due to a complete drive failure. Instead, you can have a short planned downtime -- just enough time to replace the dead drive. Obviously, this means that if having a highly-available system isn't a priority for you, then you shouldn't be implementing software RAID, unless you plan to use it primarily as a way to boost file I/O performance.

A smart system administrator uses software RAID for a specific purpose -- to improve the reliability of an already very reliable server. If you're a smart sysadmin, you've already covered the basics. You've protected your organization against catastrophe by implementing a regular backup plan. You've hooked your server up to a UPS, and have the UPS monitoring software up and running so that your server will shut down safely in the case of an extended power outage. Maybe you're using a journalling filesystem such as ReiserFS to reduce fsck time and increase filesystem reliability and performance. And hopefully, your server is well-cooled and is composed of high-quality hardware, and you've paid close attention to security issues. Now, and only now, should you consider implementing software RAID-1, 4 or 5 -- by doing so, you'll potentially give your server a few more percentage points of uptime by guarding it against a complete drive failure. Software RAID is that added layer of protection that makes an already rugged server even better.

2.  A RAID-1 walkthrough

Now that you've read about what RAID can and can't do, I hope you have reasonable expectations and the right attitude. In this section, I'll walk you through the process of simulating a disk failure, and then bringing your RAID volume back out of degraded mode. If you have the ability to set up a RAID-1 volume on a test machine and follow along with me, I highly recommend that you do so. This kind of simulation can be fun. And having a little fun right now will help to ensure that when a drive really fails, you'll be calm and collected, and know exactly what to do.

Important: To perform this test, it's essential that you set up your RAID-1 volume so that you can still boot your Linux system with one hard drive unplugged, because this is how we're going to simulate a drive failure.

OK, our first step is to set up a RAID-1 volume; refer to my previous article if you need a refresher on how to do this.
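
In case you don't have that article handy, here's a quick, illustrative sketch of the setup -- the device names below are from my machine (/dev/hda5 and /dev/hde1), so substitute your own. The heart of the configuration is /etc/raidtab, which for a simple two-disk RAID-1 volume looks roughly like this:

Code Listing: A sample /etc/raidtab

raiddev /dev/md0
    raid-level            1
    nr-raid-disks         2
    nr-spare-disks        0
    chunk-size            32
    persistent-superblock 1
    device                /dev/hda5
    raid-disk             0
    device                /dev/hde1
    raid-disk             1

With the raidtab in place, running mkraid /dev/md0 initializes the volume. Once you've set up your volume, you'll see something like this if you cat /proc/mdstat: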

Code Listing 2.1: Examining the RAID volume

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] 
read_ahead 1024 sectors
md0 : active raid1 ide/host2/bus0/target0/lun0/part1[1] ide/host0/bus0/target0/lun0/part5[0]
      4610496 blocks [2/2] [UU]
      [======>..............]  resync = 34.8% (1606276/4610496) finish=3.2min speed=15382K/sec

unused devices: <none>

Note that I'm using devfs, and that's why you see the extremely long device names listed above. I'm actually using /dev/hda5 and /dev/hde1 as my RAID-1 disks. At the moment, the kernel software RAID code is synchronizing the drives so that they're exact mirrors of each other. If your RAID-1 volume is at this point, you can go ahead and create a filesystem on the volume, and then mount it somewhere. Copy some files over to it, and then set up your /etc/fstab so that the volume (/dev/md0) will be mounted when your system boots. Here's the line I added to my fstab; yours may differ slightly:

Code Listing 2.2: fstab information

/dev/md0       /mnt/raid1              reiserfs        defaults            0 0
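
In case it's helpful, here's roughly what the filesystem creation and mount steps I mentioned above look like. I'm assuming ReiserFS and a /mnt/raid1 mount point here, so adjust the commands to match your own filesystem and mount location:

Code Listing: Creating and mounting a filesystem on the volume

# mkreiserfs /dev/md0
# mkdir /mnt/raid1
# mount /dev/md0 /mnt/raid1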

OK; we're almost ready to simulate a drive failure, but not quite. First, cat /proc/mdstat again, and wait until all your volume's disks are synchronized. When they are, your /proc/mdstat will look like this:

Code Listing 2.3: Re-examining the RAID volume

# cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid5] 
read_ahead 1024 sectors
md0 : active raid1 ide/host2/bus0/target0/lun0/part1[1] ide/host0/bus0/target0/lun0/part5[0]
      4610496 blocks [2/2] [UU]
      
unused devices: <none>
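
By the way, rather than typing cat /proc/mdstat over and over, you can watch the resync progress update itself -- assuming the watch utility is installed on your system:

Code Listing: Watching the resync

# watch -n 1 cat /proc/mdstat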

The simulation begins

OK, now that the resync is complete, we're ready for the simulation. Go ahead and shut down your machine and power it off. Then, open it up and unplug one of the hard disks that make up your RAID-1 array. Of course, you won't want to unplug the disk that contains your Linux root partition -- we'll need to boot Linux again! OK, now that the hard drive is unplugged, bring the machine back up. Once you log in, you should find that /dev/md0 is mounted and that you're still able to use the volume. When you cat /proc/mdstat, you'll see a major change:

Code Listing 2.4: Missing a disk

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target0/lun0/part5[0]
      4610496 blocks [2/1] [U_]
 
unused devices: <none>

Here, you can see that my /dev/md0 volume is running in degraded mode. I unplugged drive /dev/hde, so /dev/hde1 wasn't found when the kernel booted and tried to autostart my array. Fortunately, the kernel found /dev/hda5, and /dev/md0 was able to start in degraded mode. As you can see, the /dev/hde1 partition isn't listed in /proc/mdstat, and one of the RAID disks is marked as "down" ([U_] instead of [UU]). But hey, since /dev/md0 is still going, software RAID-1 is doing what it's supposed to do: keeping our data available.

Recovery

Right now, we're experiencing a simulated drive failure. If the drive that currently doesn't have power actually failed while the system was running, this is the kind of situation we'd be in. Our RAID-1 volume would be running in degraded mode, meaning that our volume is still available but without any redundancy. At a convenient time, we'd want to shut down the system, replace the failed drive, and start the system back up again. Our RAID-1 volume would still be running in degraded mode at this point.

Once we have the new drive in the machine, we'd want to create a RAID autodetect (FD) partition of the appropriate size on our new disk. An additional reboot may be needed so that Linux can reread the disk's partition tables. Once the new partition is visible to the system, we're ready to restore our degraded RAID-1 array -- then, we'll have some redundancy again.
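
If you've never created a RAID autodetect partition, the fdisk session looks roughly like this; the disk device (/dev/hde) and partition number below are just placeholders for whatever your replacement drive happens to be. After using n to create a new partition at least as large as the one it's replacing, use t to set the partition's type to fd (Linux raid autodetect), and w to write the new partition table:

Code Listing: Setting the partition type to fd

# fdisk /dev/hde
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Command (m for help): w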

Of course, we're only performing a simulation. To practice adding a partition back into our RAID array, we can do one of two things, depending on what kind of scenario you'd like to prepare for. You can either shut down your machine, plug the drive in, boot it up, and add the old partition back to the array, or you can shut down your machine, plug the drive in, boot up, wipe the drive, create a new RAID autodetect (FD) partition to add to the array (of the correct size, of course -- at least as big as the partition it's replacing), and then add this brand-new partition to the array. The second choice would be closer to what would happen in the event of a real drive failure, while the first would simulate something like a failed disk controller or bad cable situation -- where one of your mirror drives was temporarily unavailable, causing /dev/md0 to run in degraded mode, and requiring one of the partitions to be added back to the volume after the problem was remedied. Whichever simulation you choose, the "fix" is the same -- after the new partition is ready, we need to manually add it back to the /dev/md0 volume.

Looking at dmesg

Before we add the partition back to our array, this would be a good time to take a look at our kernel boot messages. Type dmesg | more to view them; you should see a bunch of text similar to this:

Code Listing 2.5: Kernel boot messages

linear personality registered
raid0 personality registered
raid1 personality registered
raid5 personality registered
raid5: measuring checksumming speed
   8regs     :  1291.209 MB/sec
   32regs    :  1195.197 MB/sec
   pII_mmx   :  2110.740 MB/sec
   p5_mmx    :  2652.522 MB/sec
raid5: using function: p5_mmx (2652.522 MB/sec)
md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md.c: sizeof(mdp_super_t) = 4096
autodetecting RAID arrays
(read) ide/host0/bus0/target0/lun0/part5's sb offset: 4610560 [events: 00000004]
(read) ide/host2/bus0/target0/lun0/part1's sb offset: 4610496 [events: 00000002]
autorun ...
considering ide/host2/bus0/target0/lun0/part1 ...
  adding ide/host2/bus0/target0/lun0/part1 ...
  adding ide/host0/bus0/target0/lun0/part5 ...
created md0
bind<ide/host0/bus0/target0/lun0/part5,1>
bind<ide/host2/bus0/target0/lun0/part1,2>
running: <ide/host2/bus0/target0/lun0/part1><ide/host0/bus0/target0/lun0/part5>
now!
ide/host2/bus0/target0/lun0/part1's event counter: 00000002
ide/host0/bus0/target0/lun0/part5's event counter: 00000004
md: superblock update time inconsistency -- using the most recent one
freshest: ide/host0/bus0/target0/lun0/part5
md: kicking non-fresh ide/host2/bus0/target0/lun0/part1 from array!
unbind<ide/host2/bus0/target0/lun0/part1,1>
export_rdev(ide/host2/bus0/target0/lun0/part1)
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: device ide/host0/bus0/target0/lun0/part5 operational as mirror 0
raid1: md0, not all disks are operational -- trying to recover array
raid1: raid set md0 active with 1 out of 2 mirrors
md: updating md0 RAID superblock on device
ide/host0/bus0/target0/lun0/part5 [events: 00000005](write) ide/host0/bus0/target0/lun0/part5's sb offset: 4610560
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
..
.... autorun DONE.

Now would be a good time to carefully read these messages, because they'll help you to understand the process that the kernel uses to autostart /dev/md0, giving you another valuable insight into the inner workings of Linux software RAID. If you read the kernel output listed above, you'll find that my kernel found /dev/hda5 and /dev/hde1, but hde1 was out of sync with hda5. So, the kernel started up /dev/md0 in degraded mode, using /dev/hda5 and not touching /dev/hde1 at all. Now, it's time to add our original (or newly created) partition to our volume. Here's how.

Restoration continues

First, if your replacement partition has a new device name, update /etc/raidtab so that it reflects this new information. Then, add the new partition to the volume using the following command, replacing /dev/hde1 with the device name of the partition you're adding:

Code Listing 2.6: Adding the new device

# raidhotadd /dev/md0 /dev/hde1

Your hard drive lights should begin glowing as reconstruction begins. Go ahead and cat /proc/mdstat to check the status of the RAID-1 reconstruction that's now in progress:

Code Listing 2.7: Check the status of the RAID-1 reconstruction

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] 
read_ahead 1024 sectors
md0 : active raid1 ide/host2/bus0/target0/lun0/part1[2] ide/host0/bus0/target0/lun0/part5[0]
      4610496 blocks [2/1] [U_]
      [>....................]  recovery =  1.8% (84480/4610496) finish=3.5min speed=21120K/sec
unused devices: <none>

In a matter of minutes, your RAID-1 volume will be back to normal:

Code Listing 2.8: The normal RAID volume

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] 
read_ahead 1024 sectors
md0 : active raid1 ide/host2/bus0/target0/lun0/part1[1] ide/host0/bus0/target0/lun0/part5[0]
      4610496 blocks [2/2] [UU]
      
unused devices: <none>

Voila! We've successfully recovered from a simulated drive failure, and you're ready to start using RAID-1 in a production environment. You can now affix your homemade "RAID-1 certified" sticker to your forehead and begin flapping your arms and running around the office to the delight of your coworkers. Actually, maybe that isn't such a great idea. See you in the next article.


Page updated October 9, 2005

Summary: In this two-part series, Daniel Robbins introduces you to Linux 2.4 Software RAID, a technology used to increase disk performance and reliability by distributing data over multiple disks. In this article, Daniel explains what software RAID-1, 4, and 5 can and cannot do for you and how you should approach the implementation of these RAID levels in a production environment. In the second half of the article, Daniel walks you through the simulation of a RAID-1 failed drive replacement.

Daniel Robbins
Author


Copyright 2001-2014 Gentoo Foundation, Inc.