Replace a Bad Drive in Software RAID1, or Replace Drives With Larger Drives in Software RAID1
Over the years, I’ve had to replace failed RAID1 drives or swap drives so that I could use larger disks in an existing RAID1 array. Since it seems I have to google the process every time, I figured I’d take a moment to jot it down.
1. My Configuration
In my example I have two hard drives in software RAID1 (mdadm), /dev/sda and /dev/sdb. The partitions are /dev/sda1, /dev/sda3, /dev/sdb1 and /dev/sdb3. They look like this:
/dev/sda1 and /dev/sdb1 are RAID1 array /dev/md0
/dev/sda3 and /dev/sdb3 are RAID1 array /dev/md1
/dev/md0 is my /boot partition
/dev/md1 is my / partition
This example will cover replacing a single failed drive, and also replacing both drives with larger disks, while maintaining the RAID1 array.
For now, I’ll pretend that /dev/sdb has failed and we will replace it.
Note: If you are replacing /dev/sda as you follow along, you’ll want to be sure that you have Grub installed on /dev/sdb first. See further down for details on how to do that.
2. Removing the Drive
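Before shutting down, if mdadm still lists the failed drive as an active array member, mark its partitions as failed and remove them from the arrays so the RAID state is clean. This is a sketch for this example’s layout; mdadm usually kicks a dead drive out on its own, so skip any command that complains the device is already gone.

```shell
# Mark the failed drive's partitions as faulty, then remove them from the
# arrays (one command per array, matching the md0/md1 layout in this example).
mdadm --manage /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm --manage /dev/md1 --fail /dev/sdb3 --remove /dev/sdb3
```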
Shut down your server
shutdown -h now
Remove the /dev/sdb hard drive from the server and replace it with a new drive. Be sure the new drive is the same size or larger. If it is the same size as the original, it’s recommended to use the same model as /dev/sda, or you may run into complications (not all same-size drives contain the same number of sectors).
Once the server has booted back up, you’ll want to copy the exact partition structure from /dev/sda to your new drive at /dev/sdb
* If your server did not boot and got stuck, it’s likely that grub was not installed on the remaining drive.
** If the replacement drive had an OS installed and that OS boots instead of the one you want, go into your BIOS and change which drive is the primary boot drive
3. Replicating Partition Structure
Now we’ll replicate the partition structure by copying it from the surviving RAID1 disk to the new disk you’ve just installed.
sfdisk -d /dev/sda | sfdisk /dev/sdb
Make sure you get the order of these correct or you’ll destroy your data!
To check that both drives have an identical partition structure, dump and compare both partition tables:
sfdisk -d /dev/sda
sfdisk -d /dev/sdb
If they are different, you’re likely using two disks that are the same size but different models. The new drive needs to contain at least as many sectors as the old drive.
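If you’d rather not eyeball the two dumps, they can be diffed with the device names masked out. This helper is my own addition, and the dump file paths are just examples:

```shell
# same_layout: compare two partition-table dumps (saved from `sfdisk -d`)
# with the device names masked, so only the start/size/type columns are
# diffed. Prints nothing and returns 0 when the layouts are identical.
same_layout() {
    diff <(sed 's|/dev/[a-z]*|DISK|g' "$1") \
         <(sed 's|/dev/[a-z]*|DISK|g' "$2")
}

# On the live system:
#   sfdisk -d /dev/sda > /tmp/sda.dump
#   sfdisk -d /dev/sdb > /tmp/sdb.dump
#   same_layout /tmp/sda.dump /tmp/sdb.dump && echo "layouts match"
```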
4. Adding the Disk to the RAID1 Array
Now we’ll add /dev/sdb into the raid array so that it can begin synchronizing.
mdadm --manage /dev/md0 --add /dev/sdb1
Do the same for the / partition
mdadm --manage /dev/md1 --add /dev/sdb3
Both arrays should now be syncing, though your md0 may already be complete if it’s a small partition.
This will show you the current status of the synchronization process:
cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1 sdb1
      104320 blocks [2/2] [UU]
md1 : active raid1 sda3 sdb3
      73850688 blocks [2/1] [_U]
      [=>...................] recovery = 5.6% (4180352/73850688) finish=281.6min speed=4120K/sec
unused devices: <none>
When that is complete, you should see [UU] for all arrays, as in the md0 line of the example above.
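If you want to script that check, a tiny helper (my own addition) can look for any status group that still contains an underscore, which means a member is missing or rebuilding:

```shell
# raid_clean: return 0 if no array in an mdstat-format file has a missing
# member, i.e. no status like [_U] or [U_] remains.
# Defaults to reading /proc/mdstat; pass a file path to check saved output.
raid_clean() {
    ! grep -q '\[U*_[U_]*\]' "${1:-/proc/mdstat}"
}

# On the live system:
#   raid_clean && echo "all arrays in sync"
```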
5. Install Grub on Secondary Drive
Now that the sync is done, we should install the Grub bootloader on the new drive as a failover. If your primary drive fails, you want to be able to boot off the secondary, right?
# grub
grub> find /grub/stage1
You’ll likely see:
 (hd0,0)
 (hd1,0)
This means both /dev/sda1 and /dev/sdb1 contain the grub files, but grub is really only installed on /dev/sda right now.
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
This tells grub that (hd0) should now refer to /dev/sdb, then installs grub to that drive’s MBR using the files on /dev/sdb1.
You May Be Done Already! See below.
If you’re only replacing a failed drive, you can stop here. You should be done. However, if you’re replacing both drives with larger ones, continue on. But first, be sure that your RAID synchronization process is complete…
6. Removing the Second Old RAID1 Disk
Shut down the server so that you can remove the second disk.
shutdown -h now
Pull out the /dev/sda drive and replace it with your new, larger drive. You may need to go into your BIOS and set the secondary drive as the primary boot drive. Since we’ve already completed syncing to /dev/sdb in the process above, it’s now the drive with the data we want.
If your server still doesn’t boot up after that, it’s likely grub wasn’t installed correctly on /dev/sdb. Plug /dev/sda back in, boot up, and follow the grub install steps above.
7. Replicate the Partition Structure
Ok, so your server is back online now. We’ll need to match the partition structure from /dev/sdb onto /dev/sda
sfdisk -d /dev/sdb | sfdisk /dev/sda
Verify they’re identical by comparing the output of sfdisk -d for both disks, as before.
8. Add the Disk to the RAID1 Array
Now we’ll add /dev/sda1 to /dev/md0 and /dev/sda3 to /dev/md1
mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda3
To see the progress…
cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1 sdb1
      104320 blocks [2/2] [UU]
md1 : active raid1 sda3 sdb3
      73850688 blocks [2/1] [_U]
      [=>...................] recovery = 5.6% (4180352/73850688) finish=281.6min speed=4120K/sec
unused devices: <none>
Wait for the synchronization process to complete by checking /proc/mdstat every now and then.
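If you’d rather not poll by hand, mdadm can block until recovery finishes, which is handy in scripts:

```shell
# Block until any ongoing resync/recovery on these arrays has completed.
mdadm --wait /dev/md0 /dev/md1
```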
9. Installing Grub on the New Primary Disk
Now install Grub onto /dev/sda. You may not need to do this, depending on how you got here, but it generally won’t hurt if you’re not sure.
# grub
grub> find /grub/stage1
 (hd0,0)
 (hd1,0)
Again, both disks contain the grub files (the /boot sync copied them over), but grub is currently only installed on /dev/sdb (hd1,0).
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
You should now have two new drives in your raid1 array, both with the original data from your old drives. In addition, both drives have grub installed, so should the primary disk fail, the secondary will still be bootable.
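One last sanity check, and my own addition: GRUB’s stage1 boot code embeds the literal string “GRUB” in the sector it occupies, so a quick heuristic can confirm both MBRs carry the bootloader:

```shell
# mbr_has_grub: heuristic check - GRUB stage1 embeds the string "GRUB" in
# the MBR, so reading the first 512 bytes of the disk and grepping for it
# is a quick (not bulletproof) way to confirm the bootloader is installed.
mbr_has_grub() {
    dd if="$1" bs=512 count=1 2>/dev/null | grep -q GRUB
}

# On the live system:
#   for d in /dev/sda /dev/sdb; do
#       mbr_has_grub "$d" && echo "$d: grub in MBR" || echo "$d: grub MISSING"
#   done
```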