
Recovering RAID 5 with Two Drive Failures

Sometimes we receive requests to recover a RAID 5 array with more than one failed drive.

The quick answer for such a situation: unless you manage to pull the data back from the failed drives themselves, recovery is impossible.

Now, if you have some time and patience, let me explain the complexity and possible solutions to this problem.

Why it's impossible to recover data from RAID 5 with more than 1 failed disk

In a 4-disk RAID 5 array, data is split into 3 data stripes plus 1 parity stripe per row. With a 256 KB stripe, a 2 MB file will be written like this:



1st HDD        2nd HDD        3rd HDD        4th HDD
0-256 KB       257-512 KB     513-768 KB     parity
769-1024 KB    1025-1280 KB   parity         1281-1536 KB
1537-1792 KB   parity         1793-2048 KB   empty


Note: each file starts from the beginning of a stripe row; two files are never written side by side within a single row. In the sample above, the next file will start from the first disk, not in the empty space on the 4th disk.

Each parity stripe can reconstruct exactly one missing stripe in its row, as the sketch below demonstrates.
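
To make the parity mechanism concrete, here is a minimal Python sketch of the layout above. The stripe size, disk count, and parity rotation follow our example; they are illustrative assumptions, not any particular controller's geometry.

    STRIPE = 256 * 1024   # 256 KB stripes, as in the example
    NDISKS = 4

    def xor_blocks(blocks):
        """XOR equally sized blocks together."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    def split_into_rows(data):
        """Lay a file out as rows of 3 data stripes plus 1 rotating parity stripe."""
        padded = data.ljust(-(-len(data) // STRIPE) * STRIPE, b"\x00")
        stripes = [padded[i:i + STRIPE] for i in range(0, len(padded), STRIPE)]
        rows = []
        for r in range(0, len(stripes), NDISKS - 1):
            row = stripes[r:r + NDISKS - 1]
            while len(row) < NDISKS - 1:
                row.append(b"\x00" * STRIPE)                # unused ("empty") stripe
            parity_pos = (NDISKS - 1 - len(rows)) % NDISKS  # parity walks right to left
            row.insert(parity_pos, xor_blocks(row))
            rows.append(row)
        return rows

    def rebuild_one(row, lost):
        """A single missing stripe is the XOR of the three survivors."""
        return xor_blocks([s for i, s in enumerate(row) if i != lost])

With one disk gone, rebuild_one recovers every row in turn; that is exactly the redundancy RAID 5 promises.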

Remove 2 disks from the set and every row is missing data that parity can no longer rebuild: where one of the failed disks held the row's parity, 256 KB of data is lost, and where both failed disks held data, 512 KB is lost, because a single parity stripe cannot rebuild two missing stripes. For example, suppose our sample RAID loses the 1st and 4th drives:



1st HDD   2nd HDD        3rd HDD        4th HDD
LOST      257-512 KB     513-768 KB     LOST
LOST      1025-1280 KB   parity         LOST
LOST      parity         1793-2048 KB   LOST


It's clear that such a 2 MB file will come back as worthless garbage.
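
This failure can be reproduced with the sketch above: once two stripes are gone from a row, the surviving parity yields only the XOR of the two missing stripes, one equation with two unknowns.

    import os

    file_data = os.urandom(2 * 1024 * 1024)   # a random 2 MB "file"
    row = split_into_rows(file_data)[1]       # second row: both lost stripes hold data
    lost = (0, 3)                             # the 1st and 4th drives failed

    survivors = [s for i, s in enumerate(row) if i not in lost]
    combined = xor_blocks(survivors)          # all that parity can give us now
    assert combined == xor_blocks([row[0], row[3]])
    # `combined` is row[0] XOR row[3]; there is no way to split it back
    # into the two original 256 KB stripes, so both are gone for good.
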
Naturally, there are lucky exceptions: if a file is no larger than the undamaged data stripes at the start of the first row, it will be recovered intact. In our sample of RAID 5 with 4 HDDs and the 3rd and 4th drives failed, up to 512 KB of data can be recovered:



1st HDD    2nd HDD      3rd HDD   4th HDD
0-256 KB   257-512 KB   LOST      LOST


Such recovery is limited to small files; still, it may be considered a last-hope solution in some cases. The current trend is toward larger stripe sizes: we have seen stripes up to 2 MB, a typical real-world size is about 512 KB, and 1 MB stripes also occur. A file server with 10 disks in the array and a 1 MB stripe whose last 2 disks malfunction may therefore yield files of up to 8 MB, which is enough for common documents.
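
The arithmetic behind these lucky exceptions is easy to automate. The sketch below, which assumes the same parity rotation as the tables above, computes how much contiguous data survives at the start of a row before the first hole:

    def recoverable_prefix(ndisks, stripe_kb, failed, row=0):
        """KB readable from the start of `row` before hitting a failed data disk."""
        parity_pos = (ndisks - 1 - row) % ndisks   # assumed rotation
        prefix = 0
        for disk in range(ndisks):
            if disk == parity_pos:
                continue                           # parity holds no file data
            if disk in failed:
                break                              # first hole: stop counting
            prefix += stripe_kb
        return prefix

    print(recoverable_prefix(4, 256, failed={2, 3}))    # 512 KB, as in the table
    print(recoverable_prefix(10, 1024, failed={8, 9}))  # 8192 KB = 8 MB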

Note: for easier understanding of the RAID structure, we have considered only one type of RAID geometry, that is, one scheme of how data stripes and parity are laid out across the RAID members.

Our recommendations

If there is no option other than recovering data in such a complex case, you need to try every possibility to revive the failed drives.

First of all, we recommend trying to image the failed drives using a low-level utility such as MHDD or similar. If the HDDs are not detected by Windows at all, you will need to send the disks to a data recovery lab for imaging. This is a costly operation: it may include swapping the HDD controller board from a donor drive or moving the platters to a donor drive, which requires a clean-room environment.
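
As a rough illustration of what such imaging does (real work should be left to MHDD, GNU ddrescue, or a lab), the sketch below reads a drive in chunks, zero-fills the regions it cannot read, and logs them; the device and image paths are placeholders:

    import os

    CHUNK = 512 * 1024        # coarse chunks; real tools retry with smaller ones

    def image_drive(device="/dev/sdb", image="failed_disk.img"):
        bad_ranges = []       # offsets of the regions that could not be read
        fd = os.open(device, os.O_RDONLY)
        try:
            size = os.lseek(fd, 0, os.SEEK_END)
            with open(image, "wb") as out:
                pos = 0
                while pos < size:
                    want = min(CHUNK, size - pos)
                    os.lseek(fd, pos, os.SEEK_SET)
                    try:
                        data = os.read(fd, want).ljust(want, b"\x00")
                    except OSError:                   # bad sectors: zero-fill the gap
                        data = b"\x00" * want
                        bad_ranges.append((pos, pos + want))
                    out.write(data)
                    pos += want
        finally:
            os.close(fd)
        return bad_ranges     # these "holes" matter when rebuilding the array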

Some admins image only enough drives to leave a single member missing from the set. However, we recommend imaging all failed drives, especially if the HDD malfunction was caused by bad blocks or a damaged head. That type of damage corrupts many sectors beyond recovery, and those sectors will also be missing from the disk image. Nothing can be done when the data is stored on a single drive; in a RAID 5 array, however, we can try alternate sets of disks while enumerating possible RAID configurations. Usually we are able to recover the correct data without damage.
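
Here is a sketch of what "enumerating RAID configurations" can mean in practice: hypothesize a disk order and parity rotation, assemble one test row, and score it, here by a known file signature (real tools also check filesystem metadata). The image layout, stripe size, and signature list are illustrative assumptions:

    from itertools import permutations

    SIGNATURES = [b"%PDF", b"PK\x03\x04"]   # PDF and ZIP/Office headers

    def plausible_geometries(images, stripe, probe_offset=0):
        """images: one byte string per disk image, or None for a lost member."""
        n = len(images)
        candidates = []
        for order in permutations(range(n)):
            for rotation in range(n):   # which disk holds parity in the probed row
                parity_pos = (n - 1 - rotation) % n
                row = []
                for pos, disk in enumerate(order):
                    if pos == parity_pos:
                        continue        # skip the assumed parity stripe
                    img = images[disk]
                    chunk = (b"\x00" * stripe if img is None
                             else img[probe_offset:probe_offset + stripe])
                    row.append(chunk)
                if any(sig in b"".join(row) for sig in SIGNATURES):
                    candidates.append((order, rotation))
        return candidates   # geometries worth a full reconstruction attempt

Each surviving candidate can then be rebuilt in full and checked against the filesystem; probing several offsets quickly weeds out false positives.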

We have examined a RAID 5 case of 4 HDDs with two failed drives. The same approach, with little or no alteration, may be applied to RAID 4 sets missing two drives and to RAID 6 arrays with three lost drives.



