Search Results: "goswin"

7 December 2016

Jonas Meurer: On CVE-2016-4484, a (security)? bug in the cryptsetup initramfs integration

On November 4, I was made aware of a security vulnerability in the integration of cryptsetup into initramfs. The vulnerability was discovered by security researchers Hector Marco and Ismael Ripoll of the CyberSecurity UPV Research Group and was assigned CVE-2016-4484. In this post I'll try to reflect a bit on the issue.

What CVE-2016-4484 is all about

Basically, the vulnerability is about two separate but related issues:

1. Initramfs rescue shell considered harmful

The main topic that Hector Marco and Ismael Ripoll address in their publication is that Debian exits into a rescue shell in case of failure during initramfs, and that this can be triggered by entering a wrong password ~93 times in a row. Indeed, the Debian initramfs implementation as provided by initramfs-tools exits into a rescue shell (usually a busybox shell) after a defined number of failed attempts to make the root filesystem available. The loop in question is in local_device_setup() in the local initramfs script. In general, this behaviour is considered a feature: if the root device hasn't shown up after 30 rounds, the rescue shell is spawned to give the local user/admin a way to debug and fix things herself. Hector Marco and Ismael Ripoll argue that in special environments, e.g. on public computers with password-protected BIOS/UEFI and bootloader, this opens an attack vector and needs to be regarded as a security vulnerability:
It is common to assume that once the attacker has physical access to the computer, the game is over. The attackers can do whatever they want. And although this was true 30 years ago, today it is not. There are many "levels" of physical access. [...] In order to protect the computer in these scenarios: the BIOS/UEFI has one or two passwords to protect the booting or the configuration menu; the GRUB also has the possibility to use multiple passwords to protect unauthorized operations. And in the case of an encrypted system, the initrd shall block the maximum number of password trials and prevent the access to the computer in that case.
While Hector and Ismael have a valid point in that the rescue shell might open an additional attack vector in special setups, this is not true for the vast majority of Debian systems out there: in most cases a local attacker can alter the boot order, replace or add boot devices, modify boot options in the (GNU GRUB) bootloader menu, or modify/replace arbitrary hardware parts. The scenario required to make the initramfs rescue shell an additional attack vector is indeed very special: at least locked-down hardware, a password-protected BIOS and bootloader, and still local keyboard (or serial console) access are required. Hector and Ismael argue that the default should be changed for enhanced security:
[...] But then Linux is used in more hostile environments, this helpful (but naive) recovery services shall not be the default option.
For the reasons explained above, I tend to disagree with Hector's and Ismael's opinion here. And after discussing this topic with several people I find my opinion reconfirmed: the Debian Security Team disputes the security impact of the issue, and others agree. But leaving the disputable opinion on a sane default aside, I don't think that the cryptsetup package is the right place to change the default, if at all. If you want added security through a locked-down initramfs (i.e. no rescue shell spawned), then at least the bootloader (GNU GRUB) needs to be locked down by default as well. To make it clear: if one wants to lock down the boot process, bootloader and initramfs should be locked down together. And the right place to do this is the configurable behaviour of grub-mkconfig. There, one can set a password for GRUB and the boot parameter 'panic=1', which disables the spawning of a rescue shell in initramfs. But as mentioned, I don't agree that these would be sane defaults. The vast majority of Debian systems out there don't gain any security from a locked-down bootloader and initramfs, and in my opinion the benefit of a rescue shell for debugging purposes clearly outweighs the minor security impact. For the few setups which require the added security of a locked-down bootloader and initramfs, the relevant options are already documented in the Securing Debian Manual. After discussing the topic with the initramfs-tools maintainers today, Guilhem and I (the cryptsetup maintainers) finally decided not to change any defaults and just add a 'sleep 60' after the maximum allowed attempts have been reached.

2. tries=n option ignored, local brute-force slightly cheaper

Apart from the issue of a rescue shell being spawned, Hector and Ismael also discovered a programming bug in the cryptsetup initramfs integration. This bug in the cryptroot initramfs local-top script allowed endless retries of passphrase input, ignoring the tries=n option of crypttab (and the default of 3). As a result, theoretically unlimited attempts to unlock encrypted disks were possible when processed during the initramfs stage. The attack vector here is that local brute-force attacks become a bit cheaper: instead of having to reboot after the maximum number of tries is reached, one can keep trying passwords. Even though efficient brute-force attacks are mitigated by the PBKDF2 implementation in cryptsetup, this clearly is a real bug. The reason for the bug was twofold:
  • First, the condition in setup_mapping() responsible for making the function fail when the maximum number of allowed attempts is reached was never met:
    setup_mapping()
     
      [...]
      # Try to get a satisfactory password $crypttries times
      count=0
      while [ $crypttries -le 0 ] || [ $count -lt $crypttries ]; do
          export CRYPTTAB_TRIED="$count"
          count=$(( $count + 1 ))
          [...]
      done
      if [ $crypttries -gt 0 ] && [ $count -gt $crypttries ]; then
          message "cryptsetup: maximum number of tries exceeded for $crypttarget"
          return 1
      fi
      [...]
    As one can see, the while loop stops as soon as $count -lt $crypttries no longer holds, i.e. once $count equals $crypttries. Thus the second condition $count -gt $crypttries is never met. This can easily be fixed by decreasing $count by one in case of a successful unlock attempt, along with changing the second condition to $count -ge $crypttries:
    setup_mapping()
     
      [...]
      while [ $crypttries -le 0 ] || [ $count -lt $crypttries ]; do
          [...]
          # decrease $count by 1, apparently last try was successful.
          count=$(( $count - 1 ))
          [...]
      done
      if [ $crypttries -gt 0 ] && [ $count -ge $crypttries ]; then
          [...]
      fi
      [...]
     
    
    Christian Lamparter had already spotted this bug back in October 2011 and provided an (incomplete) patch, but back then I even managed to merge the patch in an improper way, making it even more useless: Christian's patch forgot to decrease $count by one in case of a successful unlock attempt, resulting in warnings about maximum tries exceeded even for successful attempts in some circumstances. But instead of adding the decrease myself and keeping the (almost correct) condition $count -eq $crypttries for detecting exceeded maximum tries, I changed the condition back to the wrong original $count -gt $crypttries, which again was never met. Apparently I didn't test the fix properly back then. I definitely should do better in the future!
  • Second, back in December 2013, I added a cryptroot initramfs local-block script as suggested by Goswin von Brederlow in order to fix bug #678692. The purpose of the cryptroot initramfs local-block script is to invoke the cryptroot initramfs local-top script again and again in a loop. This is required to support complex block device stacks. In fact, the countless possible combinations of stacked block devices are one of the biggest and most inglorious reasons that the cryptsetup initramfs integration scripts became so complex over the years. After all, we need to support setups like rootfs on top of LVM with two separate encrypted PVs, or rootfs on top of LVM on top of dm-crypt on top of MD raid. The problem with the local-block script is that exiting the setup_mapping() function merely triggers a new invocation of the very same function, so the tries limit never takes effect across invocations. The researchers who discovered the bug suggested a simple and good solution: when the maximum number of attempts is detected (by the second condition from above), the script sleeps for 60 seconds (see the sketch below). This mitigates the brute-force options for local attackers - even rebooting after max attempts would be faster.
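Roughly, the mitigation could look like this - a minimal sketch based on the description above, not the exact committed patch:

      if [ $crypttries -gt 0 ] && [ $count -ge $crypttries ]; then
          message "cryptsetup: maximum number of tries exceeded for $crypttarget"
          # Slow down local brute-force attempts: the local-block loop would
          # otherwise re-invoke this function immediately.
          sleep 60
          return 1
      fi

With the delay in place, simply rebooting after the maximum number of attempts is no faster than waiting, which removes the incentive the bug created.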

About disclosure, wording and clickbaiting

I'm happy that Hector and Ismael brought up the topic and made their argument about the security impact of an initramfs rescue shell, even though I have to admit that I was rather astonished that they got a CVE assigned. Nevertheless I'm very happy that they informed the Security Teams of Debian and Ubuntu prior to publishing their findings, which put me in the loop in turn. Also, Hector and Ismael were open and responsive when it came to discussing their proposed fixes. But unfortunately the way they advertised their finding was not very helpful. They announced a talk about this topic at DeepSec 2016 in Vienna with the headline Abusing LUKS to Hack the System. Honestly, this headline is misleading - if not wrong - in several ways:
  • First, the whole issue is not about LUKS, nor is it about cryptsetup itself. It's about Debian's integration of cryptsetup into the initramfs, which is a completely different story.
  • Second, the term hack the system suggests that an exploit to break into the system is revealed. This is not true. The device encryption is not endangered at all.
  • Third - as shown above - very special prerequisites need to be met before the mere existence of a LUKS encrypted device becomes the relevant factor in spawning a rescue shell during initramfs.
Unfortunately, the way this issue was published led to even worse articles in the tech news press. Headlines like Major security hole found in Cryptsetup script for LUKS disk encryption or Linux Flaw allows Root Shell During Boot-Up for LUKS Disk-Encrypted Systems suggest that a major security vulnerability was revealed and that it compromised the protection that cryptsetup and LUKS offer. If these articles did anything at all, it was causing damage to the cryptsetup project, which is not affected by the whole issue at all. After the cat was out of the bag, Marco and Ismael agreed that the way the news picked up the issue was suboptimal, but I cannot fight the feeling that the over-exaggeration was partly intended and that clickbaiting is taking place here. That's a bit sad.

6 January 2011

Michael Prokop: Booting ISO images from within GRUB2

You might be aware of GRUB's loopback option for booting an ISO; I wrote about it in Boot an ISO via Grub2 more than a year ago. A few months ago Goswin von Brederlow came up with this idea:
grml functions great as rescue system. So it would be nice to have a boot entry for it in grub instead of having to go look for the CD when needed. To make installing and updating simple it would be great if one could install this as a normal debian package which would register itself in grub (and maybe lilo too).
What a lovely idea: no need for a CD or USB stick, as long as the bootloader and the harddisk are still working. Minimal manual intervention needed to keep it up-to-date and working - sounds like the perfect rescue system. 8-) Since the end of 2010 a new stable version of the sysadmin's favorite live system (AKA Grml 2010.12) has been available, and the great news is that we came up with two independent solutions, known as grml-rescueboot and grub-imageboot. Having shipped them to one of my customers already, I'd like to write some words about them and why you should consider using them on all your systems where GRUB2 is available.

grml-rescueboot uses GRUB2 and its loopback feature for booting. Grml as well as Ubuntu provide everything that's needed out-of-the-box. If you're interested in the details, check out Jordan Uggla's excellent Loopback.cfg webpage in the supergrubdisk wiki. The best part: you can provide custom boot options automatically when booting Grml. For example, just set CUSTOM_BOOTOPTIONS="ssh=grml2011" in /etc/default/grml-rescueboot and the Grml rescue system will automatically start the OpenSSH server with the specified argument as password for user grml. Of course you can use all the other nifty boot options like scripts/netscript/ to further customize your Grml rescue system. [A note to other distributions like Ubuntu: I'd be interested to establish a mechanism to pass kernel options for loopback boot in a standardized way, drop me a note if you're interested in this.]

grub-imageboot uses GRUB2 and syslinux memdisk to boot ISOs and floppy images. It doesn't rely on the loopback feature but instead maps the ISO into memory directly. This is great for Linux ISOs that can't and won't support the loopback feature (and have everything inside their initrd, so the ISO doesn't have to find itself during booting), like BIOS or firmware updates. Also, for example, FreeDOS and Alpharev are known to work just fine. But sadly memdisk ISO emulation doesn't work with all Linux systems, as documented in the syslinux wiki. The good news is that Grml supports the memdiskfind/phram/mtdblock approach out-of-the-box. As far as I know, Grml is therefore the first Debian based live system supporting memdisk ISO boot, though my patches already went to the Debian-Live project, so if you're using live-boot >=2.0.14-1 or >=2.0.12-1+grml.04 your live system should be able to boot via memdisk ISO emulation as well.

Summary: If you want to boot a Linux system which supports loopback.cfg, use grml-rescueboot, as it's the more powerful tool to boot Linux live systems like Grml and Ubuntu. If you want to boot non-Linux systems (BIOS/firmware updates, FreeDOS, ...), use grub-imageboot.

Alright, how do you deploy this solution? Just grab and install grml-rescueboot and/or grub-imageboot. To deploy grml-rescueboot (adjust $VERSION if you want to use another flavour):
# choose Grml version:
VERSION=grml64-medium_2010.12
# create directory
mkdir -p /boot/grml
# download and verify ISO
cd /boot/grml
wget download.grml.org/${VERSION}.iso{,.md5}
md5sum -c ${VERSION}.iso.md5
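For illustration, a loopback-based GRUB2 entry like the one grml-rescueboot generates looks roughly like the following sketch; the menu title and especially the kernel/initrd paths inside the ISO are assumptions for this example, not the exact generated output:

menuentry "Grml Rescue System" {
    set iso=/boot/grml/grml64-medium_2010.12.iso
    loopback loop $iso
    # paths inside the ISO differ per flavour; these are placeholders
    linux (loop)/boot/grml64medium/vmlinuz boot=live findiso=$iso
    initrd (loop)/boot/grml64medium/initrd.img
}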
To deploy grub-imageboot:
# choose Grml version:
VERSION=grml64-medium_2010.12
# create directory
mkdir -p /boot/images
# download and verify ISO
cd /boot/images
wget download.grml.org/${VERSION}.iso{,.md5}
md5sum -c ${VERSION}.iso.md5
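grub-imageboot, in contrast, chainloads syslinux memdisk with the ISO as its initrd. A corresponding GRUB2 entry looks roughly like this sketch (title and paths are illustrative):

menuentry "Bootable ISO Image: grml64-medium_2010.12" {
    # memdisk maps the whole ISO into memory and boots it from there
    linux16 /boot/memdisk iso
    initrd16 /boot/images/grml64-medium_2010.12.iso
}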
That's it! Now when running update-grub you should get something like:
# update-grub
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-2.6.36-grml64
Found initrd image: /boot/initrd.img-2.6.36-grml64
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
  No volume groups found
Found Grml ISO image: /boot/grml/grml64-medium_2010.12.iso
Found memdisk: /boot/memdisk
Found iso image: /boot/images/grml64-medium_2010.12.iso
done
Voilà, when rebooting your system you should see something like: [Screenshot: grml-rescueboot] The Grml Rescue System entry is what grml-rescueboot provides, and Bootable ISO Image is what's provided by grub-imageboot. Just select the entry you'd like to use, press enter, and you should get the bootsplash of the according ISO. BTW: I've tested this with Ubuntu 10.10 too; grml-rescueboot works out-of-the-box with the Ubuntu ISO as well, it just doesn't support the memdisk ISO boot by grub-imageboot (yet). Tip: if you want to use grml-rescueboot and grub-imageboot with the same ISOs without having them twice on disk, just point the configuration option IMAGES in /etc/default/grub-imageboot to the according directory (like /boot/grml). Note: if you're using the ext{2,3,4} filesystem on the partition that's being used for /boot, please be aware that you need GRUB >=1.98+20100804-12 (a newer version is already available in Debian/unstable and hopefully migrates to squeeze in time) or GRUB >=1.99~20101122-1 because of the INDIRECT_BLOCKS support; see #543924 for the details. PS: We are thinking about providing grml-rescueboot and grub-imageboot within official Debian. If you're interested in seeing integration support for other distributions as well, please help and drop us a note.

4 July 2010

Torsten Landschoff: Postprocessing conference videos

I was planning to attend DebConf New York this year, but for a number of reasons I decided not to go. Fortunately, Ferdinand Thommes organized a MiniDebConf in Berlin at LinuxTag and I managed to attend. Thanks, Ferdinand! There were a number of interesting talks. I especially liked the talk of our DPL, and those about piuparts and git-buildpackage. In contrast to the other LinuxTag talks, we had a livestream of our talks and recorded (most of) them. Kudos for setting this up go to Alexander Wirt, who spent quite a few hours to get it up and running. I have to apologize for being late in bringing my notebook, which was intended to do the theora encoding of the livestream. This was a misunderstanding on my part; I should have known that this was not going to be set up in the night before show time. So to compensate for the extra hours he had to put in for me, I offered to do the post processing of the videos.

Basic approach for post processing

The main goal of post processing the videos was (of course) to compress them to a usable size from the original 143 GB. I also wanted to have a title on each video, and to show the sponsors at the end of the video. My basic idea to implement that consisted of the following steps:
  1. Create a title animation template.
  2. Generate title animations from template for all talks.
  3. Use a video editor to create a playlist of the parts: title, talk, epilogue.
  4. Run the video editor in batch mode to generate the combined video.
  5. Encode the resulting video as ogg theora.
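As an aside, steps 3-5 could in principle be scripted directly with MLT's command line tool melt instead of a video editor - a hypothetical sketch with placeholder filenames, not what was used for these videos:

melt title.dv talk.dv epilogue.dv \
     -consumer avformat:combined.ogv vcodec=libtheora acodec=libvorbis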
As always with technology, it turned out that the original plan needed a few modifications.

Title animations
[Video: title animation example - http://www.landschoff.net/blog/uploads/2010/07/mdc2010_title_anim1.ogv]
Originally I wanted to use Blender for the title animation, but I knew it is quite a complicated piece of software. So I looked for something simpler, and stumbled across an article that pointed me towards Synfig Studio for 2D animation. This is also in Debian, so I gave it a try. I was delighted that Synfig Studio has a command line renderer, which is just called synfig, and that the file format is XML, which would make it simple to batch-create the title animations. My title template can be found in this git repository.

Batch creation of title animations

I used a combination of make and a simple python script to substitute the author name and the title of the talk into the synfig XML file. The data for all talks is in another XML file, talks.xml. Basically, I used a simple XPath expression to find the relevant text node and change the data using the ElementTree API of the lxml python module. The same could be done using XSLT of course (for a constant replacement, see this file), but I found it easier to combine two XML files in python. Note that I create PNG files with synfig and use ffmpeg to generate a DV file from those. Originally, I had synfig create DV files directly, but those turned out quite gray for some reason. I am now unable to reproduce this problem.

Combining the title animation with the talk

For joining the title animation with the talk, I originally went with OpenShot, which somebody from the video team had running at the conference. My idea was to mix a single video manually and just replace the underlying data files for each talk. I expected that this would be easy using the openshot-render command, which renders the output video from the input clips and the OpenShot project file. However, OpenShot stores the video lengths in the project file and takes them literally, so this did not work for talks of different play times. I considered working with Kino or Kdenlive, but they did not look more appropriate for this use case. I noticed that OpenShot and Kdenlive both use the Media Lovin' Toolkit under the hood, and OpenShot actually serializes the MLT configuration to $HOME/.openshot/sequence.xml when rendering. I first tried to read that XML file from python (using the mlt python bindings from the python-mlt2 package) but did not find an API function to do that. So I just hard coded the video sequence in python. I ran into a few gotchas on the way:

Things to improve

While the results look quite okay to me now, there is a lot of room for improvement.

Availability

2 March 2009

Ingo Juergensmann: HowTo: Migrate RAID1 to RAID5

As I wrote last week (German), I had the plan to make a RAID5 out of my existing RAID1 after I received my third 1 TB harddisk. Migrating to RAID5 doesn't work online yet, because mdadm doesn't support the use of --grow and --level together, as you can read in the man page when looking for "-l, --level":


-l, --level=
Set raid level. When used with --create, options are: linear, raid0, 0, stripe, raid1, 1, mirror, raid4,
4, raid5, 5, raid6, 6, raid10, 10, multipath, mp, faulty. Obviously some of these are synonymous.

When used with --build, only linear, stripe, raid0, 0, raid1, multipath, mp, and faulty are valid.

Not yet supported with --grow.


But there's always a way to reach your goal. If you're brave enough you can find the solution with a quick internet search. OK, I'm not brave enough to try this without a backup, but you shouldn't do that kind of work without having a good backup anyway.

But before I describe how to do it, as usual, a strong disclaimer: use this howto at your own risk! There's no warranty that this will work for you, and you should take care to have a backup of your data. Whoever does this migration without having a backup risks the loss of all their data!

I'll describe the migration using an example... there are three logical volumes (LVs), 1 GB in size each. This is enough for testing purposes and demonstrates how the migration works. You can use real partitions like swap partitions if you want. The LVs are r1, r2, and r3 and live in the Volume Group (VG) /dev/vg:


r1 vg -wi-a- 1.00G
r2 vg -wi-a- 1.00G
r3 vg -wi-a- 1.00G
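
For reference, the three test LVs could have been created like this - a hypothetical sketch, with the VG name and sizes taken from the example above:

CODE:
lvcreate -L 1G -n r1 vg
lvcreate -L 1G -n r2 vg
lvcreate -L 1G -n r3 vg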


Two of these LVs are set up as a RAID1 /dev/md5:

CODE:
muaddib:/home/ij# mdadm --create /dev/md5 -l 1 -n 2 /dev/vg/r[12]
mdadm: /dev/vg/r1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Feb 26 16:16:59 2009
mdadm: /dev/vg/r2 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Feb 26 16:16:59 2009
Continue creating array? yes
mdadm: array /dev/md5 started.


The warning that the devices appear to be part of another raid array can be ignored; it's just a testing array. The result should look similar to:


md5 : active raid1 dm-12[1] dm-11[0]
1048512 blocks [2/2] [UU]
[========>............] resync = 40.2% (422900/1048512) finish=0.7min speed=14096K/sec


Now a filesystem can be created on the raid array and some data can be stored there:

CODE:
muaddib:/home/ij# mkfs.xfs -f /dev/md5
meta-data=/dev/md5 isize=256 agcount=4, agsize=65532 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=262128, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=1200, version=2
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
muaddib:/home/ij# mount /dev/md5 /mnt
muaddib:/home/ij# cp pics/backdrops/*.jpg /mnt/
muaddib:/home/ij# dir /mnt/
amigager_back1.jpg Dansk-Dunes14.jpg Dansk-Nature-2.jpg
CapeCod_085.jpg Dansk-Dunes15.jpg Dansk-Priel.jpg
ct-1-p1600.jpg Dansk-Dunes16.jpg Dansk-Woods-1.jpg
ct-2-p1600.jpg Dansk-Dunes17.jpg earth_lights_lrg.jpg
Dansk-Beach1.jpg Dansk-Dunes18.jpg HanseSail-AlterStrom2.jpg
Dansk-Beach2.jpg Dansk-Dunes2.jpg HanseSail-AlterStrom.jpg
Dansk-Beach3.jpg Dansk-Dunes3.jpg HanseSail-Mast.jpg
Dansk-Beach4.jpg Dansk-Dunes4.jpg HanseSail-Warnemuende.jpg
Dansk-Beach5.jpg Dansk-Dunes5.jpg LinuxInside_p1600.jpg
Dansk-Beachstones.jpg Dansk-Dunes6.jpg prerow.jpg
Dansk-Cliffs.jpg Dansk-Dunes7.jpg sgi-1440.jpg
Dansk-Dunes10.jpg Dansk-Dunes8.jpg sgi.jpg
Dansk-Dunes11.jpg Dansk-Dunes9.jpg Sonnenuntergang-2.jpg
Dansk-Dunes12.jpg Dansk-Dunes.jpg Sonnenuntergang.jpg
Dansk-Dunes13.jpg Dansk-Nature-1.jpg
muaddib:/home/ij# umount /mnt


After unmounting, the real challenge arises: writing a RAID5 label! Normally this is no issue, but I guess you like your data and will think twice before proceeding... but hey! You have a backup anyway, don't you?!

CODE:
muaddib:/home/ij# mdadm --create /dev/md5 --level=5 --raid-devices=2 /dev/vg/r[12]
mdadm: /dev/vg/r1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Mon Mar 2 18:18:09 2009
mdadm: /dev/vg/r2 appears to be part of a raid array:
level=raid1 devices=2 ctime=Mon Mar 2 18:18:09 2009
Continue creating array? yes
mdadm: array /dev/md5 started.
muaddib:/home/ij# cat /proc/mdstat
md5 : active (auto-read-only) raid5 dm-12[2](S) dm-11[0]
1048512 blocks level 5, 64k chunk, algorithm 2 [2/1] [U_]
muaddib:/home/ij# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md5 : active raid5 dm-12[2] dm-11[0]
1048512 blocks level 5, 64k chunk, algorithm 2 [2/1] [U_]
[>....................] recovery = 3.1% (33152/1048512) finish=0.5min speed=33152K/sec


After stopping /dev/md5, the new RAID5 label can be written. The data ought to be safe because of the RAID5 algorithm when running with 2 disks: with only one data block per stripe, the parity block is simply a copy of that block, so a 2-disk RAID5 stores the same data layout as a RAID1. To verify the theory you can mount /dev/md5 again. The next steps can be done with the array online.


muaddib:/home/ij# mount /dev/md5 /mnt
muaddib:/home/ij# dir /mnt/
amigager_back1.jpg Dansk-Dunes14.jpg Dansk-Nature-2.jpg
CapeCod_085.jpg Dansk-Dunes15.jpg Dansk-Priel.jpg
ct-1-p1600.jpg Dansk-Dunes16.jpg Dansk-Woods-1.jpg
ct-2-p1600.jpg Dansk-Dunes17.jpg earth_lights_lrg.jpg
Dansk-Beach1.jpg Dansk-Dunes18.jpg HanseSail-AlterStrom2.jpg
Dansk-Beach2.jpg Dansk-Dunes2.jpg HanseSail-AlterStrom.jpg
Dansk-Beach3.jpg Dansk-Dunes3.jpg HanseSail-Mast.jpg
Dansk-Beach4.jpg Dansk-Dunes4.jpg HanseSail-Warnemuende.jpg
Dansk-Beach5.jpg Dansk-Dunes5.jpg LinuxInside_p1600.jpg
Dansk-Beachstones.jpg Dansk-Dunes6.jpg prerow.jpg
Dansk-Cliffs.jpg Dansk-Dunes7.jpg sgi-1440.jpg
Dansk-Dunes10.jpg Dansk-Dunes8.jpg sgi.jpg
Dansk-Dunes11.jpg Dansk-Dunes9.jpg Sonnenuntergang-2.jpg
Dansk-Dunes12.jpg Dansk-Dunes.jpg Sonnenuntergang.jpg
Dansk-Dunes13.jpg Dansk-Nature-1.jpg


The next step is to add the third "disk":

CODE:
muaddib:/home/ij# mdadm --add /dev/md5 /dev/vg/r3
mdadm: added /dev/vg/r3
muaddib:/home/ij# cat /proc/mdstat
md5 : active raid5 dm-13[2](S) dm-12[1] dm-11[0]
1048512 blocks level 5, 64k chunk, algorithm 2 [2/2] [UU]


That was the first step. As you can see, the third "disk" has been added as a spare device. Because we don't want a spare but a RAID5, we need to expand the number of disks to 3 and tell the RAID to grow its capacity:

CODE:
muaddib:/home/ij# mdadm --grow /dev/md5 --raid-devices=3
mdadm: Need to backup 128K of critical section..
mdadm: ... critical section passed.
muaddib:/home/ij# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md5 : active raid5 dm-13[2] dm-12[1] dm-11[0]
1048512 blocks super 0.91 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
[>....................] reshape = 3.1% (33472/1048512) finish=3.0min speed=5578K/sec


The last two commands were run with the filesystem still mounted; it just needs to be enlarged now. With XFS you make this happen by invoking xfs_growfs:

CODE:
muaddib:/home/ij# df -h /mnt
Filesystem Size Used Avail Use% Mounted on
/dev/md5 1020M 21M 1000M 2% /mnt
muaddib:/home/ij# xfs_growfs /mnt
meta-data=/dev/md5 isize=256 agcount=4, agsize=65532 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=262128, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=1200, version=2
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
data blocks changed from 262128 to 524256
muaddib:/home/ij# df -h /mnt
Filesystem Size Used Avail Use% Mounted on
/dev/md5 2.0G 21M 2.0G 1% /mnt


Done!

If you have LVM on top of your RAID, you'll need to use the appropriate LVM tools to enlarge the PVs, VGs and LVs as well (see the sketch after the following listing). But the hardest work is already done. The reshaping can take a lot of time, as you can see here:


md4 : active raid5 sda6[0] sdc6[2] sdb6[1]
969161152 blocks super 0.91 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
[>....................] reshape = 2.8% (27990592/969161152) finish=1485.0min speed=10562K/sec
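
If the grown array is an LVM physical volume, the follow-up could look roughly like this - a hypothetical sketch, with device and LV names assumed for illustration:

CODE:
pvresize /dev/md5              # let the PV take up the new array capacity
lvextend -L +1G /dev/vg/data   # grow an LV by the amount you need
xfs_growfs /mountpoint         # finally grow the filesystem on top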


There might be other pitfalls if you change your setup from 2-disk arrays to 3-disk arrays. For example, you'll need to adapt configuration files like /etc/mdadm/mdadm.conf to the change. But this doesn't have much to do with the migration from RAID1 to RAID5 itself.
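Updating mdadm.conf could, for instance, work like this - a sketch, not tailored to any particular setup; review the result by hand:

CODE:
mdadm --detail --scan    # print the current ARRAY lines
# replace the stale ARRAY line for the array in /etc/mdadm/mdadm.conf with
# the new one, then refresh the initramfs so it knows the new layout:
update-initramfs -u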

Many thanks to Goswin Brederlow for his advice and writing a wishlist bug: #517731

7 November 2006

Julien Danjou: The man who did not know he had an amd64

On Sunday, I was looking at /proc/cpuinfo on one of my last servers. I saw that this Pentium 4 had a lot more cpu flags than the one on my workstation. I had discovered the nx flag and its purpose some days before, but I did not know what the lm flag was for... Oh my god, that's the 64 bit support. This box is an amd64 and it was installed as an i386. That's like using a knife to kill a kitten when you have an axe! So, even though the box was 800 km away from me, I decided to reinstall it from scratch, with the help of a serial cable connected to it. That was so easy. I just love Debian for such things. In the end, I'm happy, even if everyone is wondering why I took a server down for 10 hours just because it's better.
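As an aside, a quick way to check for the long mode flag mentioned above - a trivial sketch, nothing specific to this setup is assumed:

# lm (long mode) in the cpuinfo flags means the CPU is 64-bit capable
grep -qw lm /proc/cpuinfo && echo "64-bit capable" || echo "32-bit only"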