Escaping BTRFS bugs
Today I’d like to share a small yet valuable trick I had to learn while dealing with problems in the btrfs filesystem.
I switched to the filesystem about 8 months ago to better manage space allocation and shutdown times, as it should be quicker to recover upon errors. So far the filesystem performed admirably considering its reputation, but today I hit a nasty bug in the (behold) free space handling code.
Fiddling with large datasets, I managed to fill the whole volume, allocating chunks on the whole RAID1 configuration I setup. After removing the data though, things didn’t go as planned. I run rebalances from time to time to reclaim large contiguous free space regions, and this time it seemed appropriate, considering I filled the RAID with 2TB worth of data and reclaimed ~1.75TB after a few hours. Should be easy right?
# btrfs balance start /storage_vol
A nice ENOSPC error greets me. I re-checked I had free space available, nothing weird here (sample stat):
# btrfs fi usage /storage_vol Overall: Device size: 3.58TiB Device allocated: 944.06GiB Device unallocated: 0.00B Device missing: 0.00B Used: 940.34GiB Free (estimated): 1.33TiB (min: 1.33TiB) Data ratio: 2.00 Metadata ratio: 2.00 Global reserve: 512.00MiB (used: 0.00B) Data,RAID1: Size:469.00GiB, Used:468.46GiB /dev/sda3 469.00GiB /dev/sdb3 469.00GiB Metadata,RAID1: Size:3.00GiB, Used:1.70GiB /dev/sda 33.00GiB /dev/sdb 33.00GiB System,RAID1: Size:32.00MiB, Used:76.00KiB /dev/sda3 32.00MiB /dev/sdb3 32.00MiB Unallocated: /dev/sda 31.33TiB /dev/sdb 31.33TiB
There’s even global reserve available, so what is going on here?
Apparently btrfs is unable to detect that there’s plenty of free space for rebalancing, even though the space is available - I can write new files, I tried writing 100GB and everything works. Using the -dusage=99 parameter (with any value) didn’t have any effects, I would still get the “not enough free space” errors.
It seems as if the rebalance code is checking whether there is “Device unallocated” space, and if none is available it’s not even checking that there’s actually “Free (estimated)” space. At this point, I was thinking whether I would have ever been able to rebalance at all again, since I had no way of relocating the whole volume and recreating it from scratch.
After one hour spent browsing bug reports, I begun wondering if there was a simple hack that would be able to force the data in being relocated. It was at that point that I thought of the possibility of shrinking the volume. Btrfs, unlike XFS and ZFS, allows shrinking volumes, and since such operation is a safe operation, it should trigger data rebalancing and moving in case data was present in a zone-to-free area.
# btrfs fi show Label: 'storage' uuid: 949887cf-6716-4c4d-ad19-64a87dd88005 Total devices 2 FS bytes used 470.17GiB devid 1 size 1.79TiB used 472.03GiB path /dev/sdb3 devid 2 size 1.79TiB used 472.03GiB path /dev/sda3
Btrfs allows resizing single devices indipendently, the RAID1 replication is enforced at chunk level. I supposed that in order to allow rebalance I would have needed to free space on both devices, otherwise it would have run on one of the two devices but not on the second, because of lack of free space. Easy enough, the chunks were filled mostly with metadata so it didn’t take too long to run the resize:
# btrfs fi resize 1:-3G Resize '/storage_vol/' of '1:-3G' # btrfs fi resize 2:-3G Resize '/storage_vol/' of '2:-3G'
I had now a smaller filesystem that would still have triggered the bug because there would be no unallocated space. So, there goes the magic:
# btrfs fi resize 1:max Resize '/storage_vol/' of '1:max' # btrfs fi resize 2:max Resize '/storage_vol/' of '2:max'
Expanding the filesystem again we get:
# btrfs fi usage /storage_vol Overall: Device size: 3.58TiB Device allocated: 944.06GiB Device unallocated: 6.00GiB Device missing: 0.00B Used: 940.34GiB Free (estimated): 1.33TiB (min: 1.33TiB) Data ratio: 2.00 Metadata ratio: 2.00 Global reserve: 512.00MiB (used: 0.00B) Data,RAID1: Size:469.00GiB, Used:468.46GiB /dev/sda3 469.00GiB /dev/sdb3 469.00GiB Metadata,RAID1: Size:3.00GiB, Used:1.70GiB /dev/sda 33.00GiB /dev/sdb 33.00GiB System,RAID1: Size:32.00MiB, Used:76.00KiB /dev/sda3 32.00MiB /dev/sdb3 32.00MiB Unallocated: /dev/sda 31.33TiB /dev/sdb 31.33TiB
Now the volume has 6GB unallocated! No more ENOSPC!
Note: when I started writing this article, I was running kernel 4.6. I’m now on 4.8 and cannot be sure whether this can still be reproduced.