Bcachefs
The bcachefs filesystem started out as bcache, a block-level cache (hence the name) for Linux that lets faster disks (SSD/NVMe) cache slower disks (HDD). At some point the developers realized it was almost a full filesystem already, so they set out to make it one, adding "fs" to the name.
Installation
Kernel
The filesystem is currently in development and has not yet been accepted into the mainline Linux source tree. It is available for download from the bcachefs site and can be built as part of the kernel tree it ships with (5.13.0 at the time of writing). Clone the kernel with git and enter the directory. Then cp /boot/config-$( uname -r ) .config, edit the .config file and look for CONFIG_SYSTEM_TRUSTED_KEYS="debian/certs/debian-uefi-certs.pem" (or similar if on another distro than Debian). Comment this out (probably along with the lines above it; they will be re-added automatically, but without that .pem file another certificate will be generated instead). Save and exit, then run make menuconfig (you will need to install flex, bison and ncurses-dev first), enter Filesystems and enable bcachefs. Save and exit.
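If you prefer to script the .config changes rather than edit them by hand, a minimal sketch, assuming the scripts/config helper that ships in the kernel tree and that the bcachefs Kconfig symbol is BCACHEFS_FS:
# cp /boot/config-$( uname -r ) .config
# ./scripts/config --set-str SYSTEM_TRUSTED_KEYS ""   # drop the Debian signing key reference
# ./scripts/config --enable BCACHEFS_FS               # assumed Kconfig symbol for bcachefs
# make olddefconfig                                    # fill in defaults for any new options
Either way, once the configuration is in place, build and install the kernel: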
# CPUCOUNT=$( lscpu | awk '/^CPU.s/ { print $2 }' )
# J=$(( $CPUCOUNT + 1 ))
# make -j$J
# make modules modules_install
# make install
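After make install and a reboot into the new kernel, it is worth confirming that bcachefs support actually made it in. A quick check, assuming bcachefs was built as a module:
# uname -r
# modprobe bcachefs
# grep bcachefs /proc/filesystems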
bcachefs-tools
Now for the userspace tools, bcachefs-tools, which you also pulled from git: make sure to read the INSTALL file and install the needed dependencies.
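Since the source tree carries Debian packaging, one way to pull in the build dependencies on Debian-based systems (assuming a reasonably recent apt that accepts a directory argument) is:
# cd /path/to/bcachefs-tools
# apt build-dep ./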
Make Debian package
This builds a package to install with apt/dpkg. It should work on other Debian-based distros as well, like Ubuntu and its derivatives.
# apt install debhelper
# cd /path/to/bcachefs-tools
# dpkg-buildpackage
This will probably give an error at the end, looking something like this:
dpkg-deb: building package 'bcachefs-tools-dbgsym' in '../bcachefs-tools-dbgsym_1.0.8-2~bpo8+1_amd64.deb'.
dpkg-deb: building package 'bcachefs-tools' in '../bcachefs-tools_1.0.8-2~bpo8+1_amd64.deb'.
 dpkg-genbuildinfo
 dpkg-genchanges >../bcachefs-tools_1.0.8-2~bpo8+1_amd64.changes
dpkg-genchanges: info: including full source code in upload
 dpkg-source --after-build .
dpkg-buildpackage: info: full upload; Debian-native package (full source is included)
 signfile bcachefs-tools_1.0.8-2~bpo8+1.dsc
gpg: skipped "Mathieu Parent <sathieu@debian.org>": No secret key
gpg: dpkg-sign.Q29ayekM/bcachefs-tools_1.0.8-2~bpo8+1.dsc: clear-sign failed: No secret key
This is OK, since we don't have Mathieu's private key. Still, we have the packages in the parent directory:
# cd ..
# apt install ./bcachefs-tools_1.0.8-2~bpo8+1_amd64.deb
…
N: Download is performed unsandboxed as root as file '/root/src/git/bcachefs/bcachefs-tools_1.0.8-2~bpo8+1_amd64.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)
I never got to the bottom of the "unsandboxed" message above, but the package installed correctly and works.
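For a quick sanity check that the tools landed where they should, something like this works (assuming the package name and the version subcommand):
# dpkg -l bcachefs-tools
# bcachefs version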
RHEL and friends
Under the packaging/ directory there are spec files for building packages for EPEL/Fedora etc., which should also work on RHEL, CentOS, Rocky and Alma Linux. I have not tested this personally.
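For the record, an untested sketch of how such a build might go with rpmbuild (the exact spec file name under packaging/ is an assumption):
# dnf install rpm-build
# rpmbuild -ba packaging/bcachefs-tools.spec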
Directly from source
If your OS/distro is not listed above, this will build and install to /usr/local:
# make all install
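If /usr/local is not what you want, the install prefix can most likely be overridden in the usual way (assuming the Makefile honours PREFIX):
# make PREFIX=/usr all install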
Replicated setup
Using a small VM on KVM/libvirt, we'll set up an initial test on virtual disks:
Device name | Size | Storage backend |
---|---|---|
/dev/sda | 2GB | HDD |
/dev/sdb | 2GB | HDD |
/dev/sdc | 1GB | SSD |
/dev/sdd | 1GB | SSD |
/dev/sde | 5GB | HDD |
/dev/sdg | 5GB | HDD |
/dev/sdh | 8GB | HDD |
/dev/sdj | 10GB | HDD |
/dev/sdi | 10GB | HDD |
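Before formatting, it does no harm to confirm that the devices and sizes match the table above, for example:
# lsblk -o NAME,SIZE,TYPE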
Create filesystem
# bcachefs format --compression=lz4 \
    --replicas=2 \
    --discard \
    --label=hdd.hdd1 /dev/sda \
    --label=hdd.hdd2 /dev/sdb \
    --label=hdd.hdd3 /dev/sde \
    --label=hdd.hdd4 /dev/sdg \
    --label=hdd.hdd5 /dev/sdh \
    --label=hdd.hdd6 /dev/sdj \
    --label=hdd.hdd7 /dev/sdi \
    --label=ssd.ssd1 /dev/sdc \
    --label=ssd.ssd2 /dev/sdd \
    --foreground_target=ssd \
    --promote_target=ssd \
    --background_target=hdd
External UUID: 58040b4b-f0f1-4755-b7c8-7955261eb132
Internal UUID: 939e5b2d-dd69-4362-a91b-0422918240e3
Device index: 8
Label:
Version: new_data_types
Oldest version on disk: new_data_types
Created: Fri Apr 29 18:48:16 2022
Sequence number: 0
Superblock size: 1800
Clean: 0
Devices: 9
Sections: members,disk_groups
Features: new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:
Options:
  block_size: 512
  btree_node_size: 128k
  errors: continue [ro] panic
  metadata_replicas: 2
  data_replicas: 2
  metadata_replicas_required: 1
  data_replicas_required: 1
  encoded_extent_max: 64k
  metadata_checksum: none [crc32c] crc64 xxhash
  data_checksum: none [crc32c] crc64 xxhash
  compression: none [lz4] gzip zstd
  background_compression: [none] lz4 gzip zstd
  str_hash: crc32c crc64 [siphash]
  metadata_target: none
  foreground_target: ssd
  background_target: hdd
  promote_target: ssd
  erasure_code: 0
  inodes_32bit: 1
  shard_inode_numbers: 1
  inodes_use_key_cache: 1
  gc_reserve_percent: 8
  gc_reserve_bytes: 0
  root_reserve_percent: 0
  wide_macs: 0
  acl: 1
  usrquota: 0
  grpquota: 0
  prjquota: 0
  journal_flush_delay: 1000
  journal_flush_disabled: 0
  journal_reclaim_delay: 100
  journal_transaction_names: 1
members (size 512):
  Device: 0
    UUID: 5580a092-9fe8-4ae7-9ebb-96365e620522
    Size: 2G
    Bucket size: 256k
    First bucket: 0
    Buckets: 8192
    Last mount: (never)
    State: rw
    Group: hdd1 (1)
    Data allowed: journal,btree,user
    Has data: (none)
    Discard: 1
    Freespace initialized: 0
  Device: 1
    UUID: 962562ea-7cd4-4e66-989b-c7098e8b5e73
    Size: 2G
    Bucket size: 256k
    First bucket: 0
    Buckets: 8192
    Last mount: (never)
    State: rw
    Group: hdd2 (2)
    Data allowed: journal,btree,user
    Has data: (none)
    Discard: 1
    Freespace initialized: 0
  Device: 2
    UUID: e5e21ba5-fcfc-4857-9f4f-47b70cdd9597
    Size: 5G
    Bucket size: 256k
    First bucket: 0
    Buckets: 20480
    Last mount: (never)
    State: rw
    Group: hdd3 (3)
    Data allowed: journal,btree,user
    Has data: (none)
    Discard: 1
    Freespace initialized: 0
  Device: 3
    UUID: b165f084-bf83-4cca-86cf-b0a8e5467ad5
    Size: 5G
    Bucket size: 256k
    First bucket: 0
    Buckets: 20480
    Last mount: (never)
    State: rw
    Group: hdd4 (4)
    Data allowed: journal,btree,user
    Has data: (none)
    Discard: 1
    Freespace initialized: 0
  Device: 4
    UUID: 9bf41b82-3d49-4dbc-ae7f-4554c360a79b
    Size: 8G
    Bucket size: 256k
    First bucket: 0
    Buckets: 32768
    Last mount: (never)
    State: rw
    Group: hdd5 (5)
    Data allowed: journal,btree,user
    Has data: (none)
    Discard: 1
    Freespace initialized: 0
  Device: 5
    UUID: acbea8d1-6b32-4d0d-8600-ec28f2acd426
    Size: 10G
    Bucket size: 256k
    First bucket: 0
    Buckets: 40960
    Last mount: (never)
    State: rw
    Group: hdd6 (6)
    Data allowed: journal,btree,user
    Has data: (none)
    Discard: 1
    Freespace initialized: 0
  Device: 6
    UUID: 3da85ced-9236-48e0-b3df-e20b9f76bce1
    Size: 10G
    Bucket size: 256k
    First bucket: 0
    Buckets: 40960
    Last mount: (never)
    State: rw
    Group: hdd7 (7)
    Data allowed: journal,btree,user
    Has data: (none)
    Discard: 1
    Freespace initialized: 0
  Device: 7
    UUID: 24813df6-7302-4a2f-8f4f-2d8c0047d2e9
    Size: 1G
    Bucket size: 128k
    First bucket: 0
    Buckets: 8192
    Last mount: (never)
    State: rw
    Group: ssd1 (9)
    Data allowed: journal,btree,user
    Has data: (none)
    Discard: 1
    Freespace initialized: 0
  Device: 8
    UUID: d196ba45-1a1d-4e37-a9b3-d155f1dae703
    Size: 1G
    Bucket size: 128k
    First bucket: 0
    Buckets: 8192
    Last mount: (never)
    State: rw
    Group: ssd2 (10)
    Data allowed: journal,btree,user
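The same superblock dump can be pulled up again at any time from one of the member devices, for example:
# bcachefs show-super /dev/sda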
Configure fstab and mount the filesystem
Add a line to /etc/fstab with the External UUID printed by bcachefs format, along with a mountpoint, giving the filesystem type as bcachefs.sh so that a custom helper script is used to mount it.
Note that the syntax below does not use UUID=xxxx as is normal for other filesystems.
58040b4b-f0f1-4755-b7c8-7955261eb132 /bcachefs bcachefs.sh defaults 0 0
The script is distributed with bcachefs-tools. Mount the filesystem and check:
# mount -a
# df -hT /bcachefs/
Filesystem                                                                        Type     Size  Used Avail Use% Mounted on
/dev/sda:/dev/sdb:/dev/sdf:/dev/sdg:/dev/sdh:/dev/sdj:/dev/sdi:/dev/sdc:/dev/sdd bcachefs  41G   17G   24G  41% /bcachefs
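df only shows the aggregate; for a per-device and per-group breakdown, bcachefs-tools has a usage subcommand (output omitted here):
# bcachefs fs usage /bcachefs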
RAID-5 like test
bcachefs format --help tells me:
--erasure_code Enable erasure coding (DO NOT USE YET)
But someone on the Internet (specifically someone on the #bcache IRC channel at irc.oftc.net) told me to do so anyway, so here we go
root@raidtest:~# bcachefs format --erasure_code --replicas=2 /dev/sd[hij]
External UUID: e99690cd-08c2-4bfe-b649-22dfdfab0db6
Internal UUID: 2a62bd0c-fa9c-48d7-b8bd-bb8acdae04de
Device index: 2
Label:
Version: 15
Oldest version on disk: 15
Created: Tue Sep 28 23:38:44 2021
Squence number: 0
Block_size: 512
Btree node size: 256.0K
Error action: ro
Clean: 0
Features: new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:
Metadata replicas: 2
Data replicas: 2
Metadata checksum type: crc32c (1)
Data checksum type: crc32c (1)
Compression type: none (0)
Foreground write target: none
Background write target: none
Promote target: none
Metadata target: none
String hash type: siphash (2)
32 bit inodes: 1
GC reserve percentage: 8%
Root reserve percentage: 0%
Devices: 3 live, 3 total
Sections: members
Superblock size: 928
Members (size 176):
  Device 0:
    UUID: c0609354-2b1c-4a2e-a8fc-fea9ad04b487
    Size: 16.0G
    Bucket size: 256.0K
    First bucket: 0
    Buckets: 65536
    Last mount: (never)
    State: rw
    Group: (none)
    Data allowed: journal,btree,user
    Has data: (none)
    Replacement policy: lru
    Discard: 0
  Device 1:
    UUID: e5fa513f-ec29-486d-bad6-3395ba265825
    Size: 16.0G
    Bucket size: 256.0K
    First bucket: 0
    Buckets: 65536
    Last mount: (never)
    State: rw
    Group: (none)
    Data allowed: journal,btree,user
    Has data: (none)
    Replacement policy: lru
    Discard: 0
  Device 2:
    UUID: f60cee55-dc13-484c-8691-9b02d3b28f9a
    Size: 16.0G
    Bucket size: 256.0K
    First bucket: 0
    Buckets: 65536
    Last mount: (never)
    State: rw
    Group: (none)
    Data allowed: journal,btree,user
    Has data: (none)
    Replacement policy: lru
    Discard: 0
initializing new filesystem
going read-write
mounted with opts: metadata_replicas=2,data_replicas=2,erasure_code
root@raidtest:~#
Mount it?
Now, good, we have a nice and stable filesystem, I think, so let's mount it. The syntax used here is quite peculiar, but we may get used to it some day
# mount -t bcachefs /dev/vdh:/dev/vdi:/dev/vdj /mnt
df now reports the filesystem as 44GB; it should be 48GB gross, but hell. I copied in 3.4GB and it now shows 8.7GB used. Perhaps I gave it too many replicas? With --replicas=2 every extent is presumably written twice, which alone would account for at least 6.8GB of raw space. Should the erasure coding be sufficient without two replicas? I don't know. We'll see. I'm tired zzzzz