VMFS Recovery

Recovers data from ESX server

RAID 5 recovery with 2 failed drives

We sometimes receive requests to recover a RAID 5 array with more than one failed drive.

The quick answer for such a situation: unless you manage to pull data off the failed drives, it is impossible.

Now, if you have some time and patience, let me explain the complexity of this problem and its possible solutions.

Why it's impossible to recover data from RAID 5 with more than 1 failed disk

In a 4-disk RAID 5 array, each row holds 3 data stripes plus 1 parity stripe used for recovery. With a 256 KB stripe, a 2 MB file is written like this:

1-st HDD       2-nd HDD       3-rd HDD       4-th HDD
0-256 KB       257-512 KB     513-768 KB     parity
769-1024 KB    1025-1280 KB   parity         1281-1536 KB
1537-1792 KB   parity         1793-2048 KB   empty

Note: in this simplified model, each file starts at the beginning of a stripe set, and two files are never written into the same row. In the sample above, the next file will start on the first disk; it will not be written into the empty space on the 4th disk.
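The layout in the table can be reproduced with a short script. This is only a sketch of the single geometry used in this article (parity starts on the last disk and moves one disk to the left in each row); the function name `raid5_layout` is ours, and offsets are given as half-open ranges in KB, so the first stripe is (0, 256) rather than 0-256.

```python
def raid5_layout(num_disks, chunk_kb, file_kb):
    """Return one list per row; each cell is 'parity', 'empty',
    or a (start_kb, end_kb) range of the file (end exclusive)."""
    rows = []
    offset = 0
    row = 0
    while offset < file_kb:
        # Assumed geometry: parity on the last disk in row 0,
        # then one disk to the left in every following row.
        parity_disk = (num_disks - 1 - row) % num_disks
        cells = []
        for disk in range(num_disks):
            if disk == parity_disk:
                cells.append("parity")
            elif offset < file_kb:
                end = min(offset + chunk_kb, file_kb)
                cells.append((offset, end))
                offset = end
            else:
                cells.append("empty")
        rows.append(cells)
        row += 1
    return rows


# The 2 MB file on 4 disks with a 256 KB stripe, as in the table above.
for cells in raid5_layout(4, 256, 2048):
    print(cells)
```

Other controllers may rotate parity in a different direction or start it on a different disk, so the result should be treated as an illustration of this one scheme only.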

The parity in each row can reconstruct that row's data for one missing disk, but not for more than one.
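RAID 5 parity is a byte-wise XOR of the data stripes in a row, which is why it can restore exactly one missing member. A minimal sketch, with tiny 4-byte blocks standing in for real stripes (the helper name `xor_blocks` is ours):

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three data stripes of one row and their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One disk lost: XOR of the surviving stripes and the parity gives it back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Two disks lost: the XOR of what is left equals the XOR of the two
# missing stripes, which is not enough to tell them apart.
mixed = xor_blocks([data[2], parity])
assert mixed == xor_blocks([data[0], data[1]])
```

With two members gone there is one equation and two unknowns per byte, so the row cannot be solved.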

If 2 disks are removed from the set, at least 256 KB of data will be missing in each row (each 768 KB of data will contain a "hole" of 256 KB or more). For example, suppose our sample RAID loses the 1st and 4th drives:

1-st HDD       2-nd HDD       3-rd HDD       4-th HDD
LOST           257-512 KB     513-768 KB     LOST
LOST           1025-1280 KB   parity         LOST
LOST           parity         1793-2048 KB   LOST

It's clear that such a 2 MB file will be worthless garbage.
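To see how much of the file is actually gone, each stripe can be mapped to the disk that holds it. The sketch below assumes the same geometry as the sample tables (parity on the last disk in the first row, moving left by one disk per row); `chunk_location` and `lost_ranges` are hypothetical helper names, disks are numbered from 0, and ranges are half-open KB offsets.

```python
def chunk_location(chunk_index, num_disks):
    """Map the n-th data stripe of a file to (row, disk)."""
    row, pos = divmod(chunk_index, num_disks - 1)
    parity_disk = (num_disks - 1 - row) % num_disks
    # Data stripes fill the row left to right, skipping the parity disk.
    disk = pos if pos < parity_disk else pos + 1
    return row, disk


def lost_ranges(file_kb, chunk_kb, num_disks, failed):
    """File ranges (in KB) stored on failed disks.
    Assumes two or more failed disks, so parity cannot help."""
    lost = []
    num_chunks = -(-file_kb // chunk_kb)  # ceiling division
    for k in range(num_chunks):
        _, disk = chunk_location(k, num_disks)
        if disk in failed:
            start = k * chunk_kb
            lost.append((start, min(start + chunk_kb, file_kb)))
    return lost


# 1st and 4th drives lost (indices 0 and 3).
holes = lost_ranges(2048, 256, 4, {0, 3})
print(holes)
print(sum(end - start for start, end in holes), "KB lost")
```

For the sample array this reports four missing stripes, 1024 KB in total, so half of the 2 MB file is gone.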
Naturally, lucky exceptions are possible: if a file is no larger than the undamaged part of the first row, it will be recovered fine. In our sample RAID 5 with 4 HDDs and 2 failed drives, it is possible to recover up to 512 KB of data:

1-st HDD       2-nd HDD       3-rd HDD       4-th HDD
0-256 KB       257-512 KB     LOST           LOST

This kind of recovery is limited to small files, but it may be considered a last-hope solution in some cases. The current trend is toward larger stripe sizes, and we've seen stripes of up to 2 MB. A typical real-world size is about 512 KB, but 1 MB stripes also occur. Therefore a file server with 10 disks in the array and a 1 MB stripe, with the last 2 disks malfunctioning, may recover files of up to 8 MB, which is enough for common documents.
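The size of the largest file that survives intact can be estimated by walking the stripes from the start of the file until the first one that sits on a failed disk. This is a sketch under the same assumptions as the sample tables (files start at the beginning of a row, parity starts on the last disk and rotates left); `recoverable_prefix_kb` is a name we made up for illustration, and disks are numbered from 0.

```python
def recoverable_prefix_kb(num_disks, chunk_kb, failed):
    """Size in KB of the largest file that is fully readable when
    two or more disks (0-based indices in `failed`) are lost."""
    if len(failed) < 2:
        raise ValueError("with fewer than 2 failed disks parity restores everything")
    if not all(0 <= d < num_disks for d in failed):
        raise ValueError("failed disk index out of range")
    k = 0
    while True:
        row, pos = divmod(k, num_disks - 1)
        parity_disk = (num_disks - 1 - row) % num_disks
        disk = pos if pos < parity_disk else pos + 1
        if disk in failed:
            return k * chunk_kb  # everything before this stripe is intact
        k += 1


print(recoverable_prefix_kb(4, 256, {2, 3}))    # 4 disks, last two failed
print(recoverable_prefix_kb(10, 1024, {8, 9}))  # 10 disks, 1 MB stripe
print(recoverable_prefix_kb(4, 256, {0, 3}))    # first disk failed
```

The three calls give 512 KB, 8192 KB (8 MB) and 0 KB. The last result shows how much the outcome depends on which disks failed: if the first disk is among them, even the smallest files are damaged.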

Note: for easier understanding of the RAID structure, we've taken into consideration only one type of RAID geometry, that is, one scheme of how data stripes and parity are laid out across the RAID members.

Our recommendations.

If there is no other option than to recover data from such a complex case, you need to try every possibility to revive data from the failed drives.

First of all, we recommend trying to image the failed drives with a low-level utility such as MHDD or similar. If the HDDs are not detected by Windows at all, you'll need to send the disks to a data recovery lab for imaging. This is a costly operation: it may involve swapping in the controller board from a donor drive or moving the platters to a donor drive, which requires a clean-room environment.

Some admins image only one of the failed drives, just enough to leave a single drive missing from the set. However, we recommend imaging all failed drives, especially if the HDD malfunction was caused by bad blocks or a damaged head. This type of damage leaves many sectors on the disk corrupt beyond recovery, and they will also be missing from the disk image. Nothing can be done about that when the data is stored on a single drive, but in a RAID 5 array we can try an alternate set of disks while enumerating the RAID configuration. Usually we are able to recover correct data without damage.
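The benefit of imaging every failed drive can be shown with a small sketch. It assumes each image is a list of equally sized blocks in which an unreadable block is marked as `None`; this data model and the names `xor_blocks` and `repair_rows` are our own illustration, not the behaviour of any particular tool. As long as bad blocks on different images do not fall into the same row, every row has at most one missing member and can be rebuilt by XOR.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def repair_rows(images):
    """images: one list of blocks per disk, None marks an unreadable block.
    Returns (repaired images, list of rows that could not be rebuilt)."""
    repaired = [list(img) for img in images]
    unrecoverable = []
    for row in range(len(images[0])):
        column = [img[row] for img in images]
        missing = [d for d, blk in enumerate(column) if blk is None]
        if len(missing) == 1:
            present = [blk for blk in column if blk is not None]
            repaired[missing[0]][row] = xor_blocks(present)
        elif len(missing) > 1:
            unrecoverable.append(row)
    return repaired, unrecoverable


# Three data disks with two rows each; parity is kept on a 4th disk here
# for brevity (XOR does not care which member holds the parity).
original = [[b"\x01", b"\x02"], [b"\x04", b"\x08"], [b"\x10", b"\x20"]]
original.append([xor_blocks(row) for row in zip(*original)])

# Two partially readable images with bad blocks in different rows.
damaged = [list(img) for img in original]
damaged[0][0] = None
damaged[3][1] = None

repaired, bad_rows = repair_rows(damaged)
assert repaired == original and bad_rows == []
```

Had only one of the two failed drives been imaged, the bad block on that image would sit in a row with two missing members and would stay lost.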

We have examined a RAID 5 case of 4 HDDs with two failed drives. However, the same approach, with minor or no alterations, may be applied to RAID 4 sets with two missing drives and RAID 6 arrays with three lost drives.
