Russell Coker: RAID Pain
One of my clients has a NAS device. Last week they tried to do what should have been a routine RAID operation: they added a new, larger disk as a hot-spare and told the RAID array to replace one of the active disks with it. The aim was to replace the disks one at a time to grow the array. But one of the other disks had an error during the rebuild and things fell apart.
I was called in after the NAS had been rebooted and was refusing to recognise the RAID. The first thing that occurred to me is that maybe RAID-5 isn't a good choice for this array. While it's theoretically possible for a RAID rebuild not to fail in such a situation (the data that couldn't be read from the disk with the error could have been regenerated from the disk that was being replaced) it seems that the RAID implementation in question couldn't do it. As the NAS is running Linux I presume that at least older versions of Linux have the same problem. Of course if you have a RAID array of 7 disks running RAID-6 with a hot-spare then you only get the capacity of 4 disks. But RAID-6 with no hot-spare should be at least as reliable as RAID-5 with a hot-spare.
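To spell out the capacity arithmetic for a 7 disk array (assuming disks of equal size):

RAID-6 with a hot-spare: 7 - 2 (parity) - 1 (spare) = 4 disks of capacity
RAID-6 with no spare: 7 - 2 (parity) = 5 disks of capacity
RAID-5 with a hot-spare: 7 - 1 (parity) - 1 (spare) = 5 disks of capacity

So dropping the hot-spare and moving to RAID-6 costs nothing in capacity compared to RAID-5 with a spare.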
Whenever you recover from disk problems the first thing you want to do is make a read-only copy of the data, so that you can't make things worse. This is a problem when you are dealing with 7 disks; fortunately they were only 3TB disks and each had only 2TB in use. So I found some space on a ZFS pool and bought a few 6TB disks which I formatted as BTRFS filesystems. For this task I only wanted filesystems that support snapshots, so I could work on snapshots rather than on the original copies.
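A minimal sketch of that kind of setup, assuming a BTRFS disk mounted on /mnt/backup and a member disk appearing as /dev/sdb (all names here are examples, not the actual devices used):

btrfs subvolume create /mnt/backup/images
dd if=/dev/sdb of=/mnt/backup/images/disk1.img bs=1M   # one image file per member disk
btrfs subvolume snapshot /mnt/backup/images /mnt/backup/work
# all further experiments use the copies under /mnt/backup/work, the originals stay untouched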
I expect that at some future time I will be called in when an array of 6+ disks of the largest available size fails. This will be a more difficult problem to solve as I don't own any system that can handle so many disks.
I copied a few of the disks to a ZFS filesystem on a Dell PowerEdge T110 running kernel 3.2.68. Unfortunately that system seems to have a problem with USB: when copying from 4 disks at once each disk was reading about 10MB/s, and when copying from 3 disks each disk was reading about 13MB/s. It seems that the system has an aggregate USB bandwidth of about 40MB/s, slightly greater than USB 2.0 speed. This made the process take longer than expected.
One of the disks had a read error, which was presumably the cause of the original RAID failure. dd has the option conv=noerror to make it continue after a read error. This initially seemed good, but the resulting file was smaller than the source partition. It seems that conv=noerror doesn't seek the output file to maintain alignment between input and output. If I had a hard drive filled with plain ASCII that MIGHT even be useful, but for a filesystem image it's worse than useless. The only option was to repeatedly run dd with matching skip and seek options, incrementing by 1K, until it had passed the section with errors.
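A rough sketch of that process, assuming the bad section lies between the 1KB offsets $START and $END (the device and file names are again just examples):

i=$START
while [ $i -lt $END ] ; do
  # conv=notrunc stops dd truncating the image, matching skip/seek keep it aligned,
  # and unreadable 1KB blocks are simply left behind as gaps
  dd if=/dev/sdb of=/mnt/backup/images/disk1.img bs=1K count=1 skip=$i seek=$i conv=notrunc
  i=$(($i + 1))
done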
for n in /dev/loop[0-6] ; do echo $n ; mdadm --examine -v -v --scan $n | grep Events ; done
Once I had all the images I had to assemble them. The Linux Software RAID code didn't like the array because not all the devices had the same event count. The way Linux Software RAID (and probably most RAID implementations) works is that each member of the array has an event counter that is incremented when disks are added, removed, and when data is written. If there is an error then after a reboot only disks with matching event counts will be used. The above command shows the Events count for all the disks.
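For the command above to work the image files first have to be attached as loop devices, something like the following (again with example paths):

for f in /mnt/backup/work/disk*.img ; do losetup -f $f ; done
losetup -a    # show which loop device got which image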
Fortunately different event numbers aren't going to stop us. After assembling the array (which failed to run) I ran "mdadm -R /dev/md1", which kicked some members out. I then added them back manually and forced the array to run. Unfortunately attempts to write to the array failed (presumably due to mismatched event counts).
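Something along these lines will do that kind of forced assembly; the device names are illustrative and this isn't necessarily the exact sequence used:

mdadm --assemble --force /dev/md1 /dev/loop[0-6]   # accept members with mismatched event counts
mdadm -R /dev/md1                                  # run the array even though it's degraded
mdadm /dev/md1 --re-add /dev/loop3                 # example of manually re-adding a kicked-out member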
Now my next problem is that I can make a 10TB degraded RAID-5 array which is read-only, but I can't mount the XFS filesystem because XFS wants to replay the journal. So my next step is to buy another 2*6TB disks to make a RAID-0 array to contain an image of that XFS filesystem.
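That plan would look something like this (device names and mount point are examples):

mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/sdx /dev/sdy
dd if=/dev/md1 of=/dev/md2 bs=16M    # copy the read-only degraded RAID-5 onto the writable RAID-0
mount /dev/md2 /mnt/recovery         # XFS can now replay its journal against the copy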
Finally, backups are a really good thing.