Bcachefs

From Roy's somewhat wise thoughts

The bcachefs filesystem started out as bcache, a block-level caching layer (hence the name) for Linux that lets faster disks (SSD/NVMe) act as a cache for slower disks (HDD). At some point, the developers found that it was almost a filesystem already, so they set out to make it one, adding fs to the end of the name.

Installation

Kernel

The filesystem is currently in development and hasn't been accepted into the Linux source tree. It is available for download from their site and is built together with the kernel it comes with (5.13.0 at the time of writing). Fetch the kernel with git and enter its directory. Then run cp /boot/config-$( uname -r ) .config, edit the .config file and look for CONFIG_SYSTEM_TRUSTED_KEYS="debian/certs/debian-uefi-certs.pem" (or similar if you're on a distro other than Debian). Comment this line out (and probably the lines above it; they'll be re-added automatically, but without that .pem file another one will be generated instead). Save and exit, then run make menuconfig (you'll need to install flex, bison and ncurses-dev first), enter "File systems" and enable bcachefs. Save and exit and type

# CPUCOUNT=$( nproc )
# J=$(( $CPUCOUNT + 1 ))
# make -j$J
# make modules modules_install
# make install
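
If you'd rather not edit .config by hand, the trusted-keys change described above can be scripted. This is only a minimal sketch: blank_trusted_keys is a made-up helper name, it assumes GNU sed (as shipped with Debian), and it empties the setting instead of commenting it out, which should have the same effect of the build no longer looking for the Debian .pem file.

```shell
# Sketch: empty CONFIG_SYSTEM_TRUSTED_KEYS in a kernel .config.
# blank_trusted_keys is a hypothetical helper, not part of the kernel tree.
blank_trusted_keys() {
    config_file="${1:-.config}"
    # Replace whatever certificate path is configured with an empty string.
    # Lines that do not start with CONFIG_SYSTEM_TRUSTED_KEYS= are left alone.
    sed -i 's|^CONFIG_SYSTEM_TRUSTED_KEYS=.*|CONFIG_SYSTEM_TRUSTED_KEYS=""|' "$config_file"
}
```

Run it from the kernel directory as blank_trusted_keys .config before make menuconfig.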

bcachefs-tools

Next come the userspace tools, bcachefs-tools, which you also pulled from git. Make sure to read the INSTALL file and install the needed dependencies.

Make Debian package

This builds a package that can be installed with apt/dpkg.

# apt install debhelper
# cd /path/to/bcachefs-tools
# dpkg-buildpackage

Directly from source

# make all install
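
Before formatting anything, it may be worth checking that the running kernel actually knows about bcachefs. The sketch below is not from the bcachefs docs. It just looks for the name in /proc/filesystems, assuming you have booted the freshly built kernel and that bcachefs is built in or its module is already loaded. has_filesystem is a made-up helper name, and the optional second argument only exists so the check can be tried against a sample file.

```shell
# Sketch: succeed if the given filesystem type is registered with the kernel.
# has_filesystem is a hypothetical helper name.
has_filesystem() {
    fs_name="$1"
    fs_list="${2:-/proc/filesystems}"
    # The filesystem name is the last field on each line; pseudo filesystems
    # have an extra leading "nodev" field, so compare against $NF.
    awk -v name="$fs_name" '$NF == name { found = 1 } END { exit !found }' "$fs_list"
}
```

For example: has_filesystem bcachefs && echo "kernel supports bcachefs"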

Replicated setup

Using a small VM on KVM/libvirt, we'll set up an initial test on vdisks.

vDisks

Device name   Size   Storage backend
/dev/sda      2GB    HDD
/dev/sdb      2GB    HDD
/dev/sdc      1GB    SSD
/dev/sdd      1GB    SSD
/dev/sde      5GB    HDD
/dev/sdg      5GB    HDD
/dev/sdh      8GB    HDD
/dev/sdi      10GB   HDD
/dev/sdj      10GB   HDD

lsblk lists them as:

sda             8:0    0    2G  0 disk
sdb             8:16   0    2G  0 disk
sdc             8:32   0    1G  0 disk
sdd             8:48   0    1G  0 disk
sde             8:64   0    5G  0 disk
sdg             8:96   0    5G  0 disk
sdh             8:112  0    8G  0 disk
sdi             8:128  0   10G  0 disk
sdj             8:144  0   10G  0 disk


# bcachefs format --compression=lz4 \
--encrypted \
--replicas=2 \
--label=ssd.ssd1 /dev/sda \
--label=ssd.ssd2 /dev/sdb \
--label=hdd.hdd1 /dev/sdc \
--label=hdd.hdd2 /dev/sdd \
--label=hdd.hdd3 /dev/sde \
--label=hdd.hdd4 /dev/sdf \
--foreground_target=ssd \
--promote_target=ssd \
--background_target=hdd
# mount -t bcachefs /dev/sda:/dev/sdb:/dev/sdc:/dev/sdd:/dev/sde:/dev/sdf /mnt

RAID-5 like test

bcachefs format --help tells me

--erasure_code Enable erasure coding (DO NOT USE YET)

But someone on the Internet (specifically someone on the #bcache IRC channel at irc.oftc.net) told me to do so anyway, so here we go

root@raidtest:~# bcachefs format --erasure_code --replicas=2 /dev/sd[hij]
External UUID:			e99690cd-08c2-4bfe-b649-22dfdfab0db6
Internal UUID:			2a62bd0c-fa9c-48d7-b8bd-bb8acdae04de
Device index:			2
Label:
Version:			15
Oldest version on disk:		15
Created:			Tue Sep 28 23:38:44 2021
Squence number:			0
Block_size:			512
Btree node size:		256.0K
Error action:			ro
Clean:				0
Features:			new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:
Metadata replicas:		2
Data replicas:			2
Metadata checksum type:		crc32c (1)
Data checksum type:		crc32c (1)
Compression type:		none (0)
Foreground write target:	none
Background write target:	none
Promote target:			none
Metadata target:                none
String hash type:		siphash (2)
32 bit inodes:			1
GC reserve percentage:		8%
Root reserve percentage:	0%
Devices:			3 live, 3 total
Sections:			members
Superblock size:		928

Members (size 176):
  Device 0:
    UUID:			c0609354-2b1c-4a2e-a8fc-fea9ad04b487
    Size:			16.0G
    Bucket size:		256.0K
    First bucket:		0
    Buckets:			65536
    Last mount:			(never)
    State:			rw
    Group:			(none)
    Data allowed:		journal,btree,user
    Has data:			(none)
    Replacement policy:		lru
    Discard:			0
  Device 1:
    UUID:			e5fa513f-ec29-486d-bad6-3395ba265825
    Size:			16.0G
    Bucket size:		256.0K
    First bucket:		0
    Buckets:			65536
    Last mount:			(never)
    State:			rw
    Group:			(none)
    Data allowed:		journal,btree,user
    Has data:			(none)
    Replacement policy:		lru
    Discard:			0
  Device 2:
    UUID:			f60cee55-dc13-484c-8691-9b02d3b28f9a
    Size:			16.0G
    Bucket size:		256.0K
    First bucket:		0
    Buckets:			65536
    Last mount:			(never)
    State:			rw
    Group:			(none)
    Data allowed:		journal,btree,user
    Has data:			(none)
    Replacement policy:		lru
    Discard:			0
initializing new filesystem
going read-write
mounted with opts: metadata_replicas=2,data_replicas=2,erasure_code
root@raidtest:~#

Mount it?

Now, good, we have a nice and stable filesystem, I think, so let's mount it. The syntax used here is quite peculiar, but we may get used to it some day

# mount -t bcachefs /dev/sdh:/dev/sdi:/dev/sdj /mnt
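
Since the colon-separated device list is easy to get wrong by hand, here is a minimal sketch of building it from separate device names. join_devices is a made-up helper name, not something bcachefs-tools provides, and the device names are just the ones formatted above.

```shell
# Sketch: join device paths with ':' for a multi-device bcachefs mount.
# join_devices is a hypothetical helper name.
join_devices() {
    # "$*" joins all arguments with the first character of IFS.
    # The subshell keeps the IFS change away from the caller.
    ( IFS=:; printf '%s\n' "$*" )
}

# Example use (as root):
#   mount -t bcachefs "$(join_devices /dev/sdh /dev/sdi /dev/sdj)" /mnt
```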

df now reports the filesystem as 44GB; it should be 48GB gross, but hell. I copied in 3.4GB and it now shows 8.7GB used. Perhaps I gave it too many replicas? Should the erasure code be sufficient without two replicas? I don't know. We'll see. I'm tired zzzzz
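
For a rough sanity check on those numbers, here is a minimal sketch of the raw usage you'd expect from plain replication alone. It assumes every byte is simply stored once per replica and ignores metadata, the journal and whatever erasure coding adds on top, so it is a lower bound at best. raw_usage is a made-up helper name.

```shell
# Sketch: expected raw usage in GB for a given data size and replica count.
# raw_usage is a hypothetical helper; it ignores metadata and journal overhead.
raw_usage() {
    data_gb="$1"
    replicas="$2"
    # awk handles the floating-point multiplication; LC_ALL=C keeps '.' as
    # the decimal separator regardless of the system locale.
    LC_ALL=C awk -v d="$data_gb" -v r="$replicas" 'BEGIN { printf "%.1f\n", d * r }'
}
```

raw_usage 3.4 2 prints 6.8, so two replicas alone don't account for the whole 8.7GB that df shows.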