Bcachefs

From Roy's somewhat wise thoughts

The bcachefs filesystem started out as bcache, a block-level cache (hence the name) for Linux that lets faster disks (SSD/NVMe) cache slower ones (HDD). At some point, the developers found out it was almost a filesystem already, so they set out to make it one, adding "fs" to the end of the name.

Installation

Kernel

At the time of writing, the filesystem is still in development and hasn't been accepted into the Linux source tree. It is available for download from their site and can be compiled together with the kernel it comes with (5.13.0 at the time of writing). Fetch the kernel tree with git and enter its directory. Then run cp /boot/config-$( uname -r ) .config, edit the .config file and look for CONFIG_SYSTEM_TRUSTED_KEYS="debian/certs/debian-uefi-certs.pem" (or similar if you're on a distro other than Debian). Comment this out (and probably the lines above it; they'll be re-added automatically, but without that .pem file, another will be generated instead). Save and exit, then run make menuconfig (you'll need to install flex, bison and ncurses-dev first), enter Filesystems and enable bcachefs. Save and exit and type

# J=$(( $(nproc) + 1 ))
# make -j"$J"
# make modules modules_install
# make install
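
The manual .config edit described above can also be scripted. What follows is only a minimal sketch, assuming GNU sed and that the option sits on a line of its own starting with CONFIG_SYSTEM_TRUSTED_KEYS=; the function name comment_out_trusted_keys is made up for this example and is not part of the kernel tree.

```shell
# Comment out CONFIG_SYSTEM_TRUSTED_KEYS in a kernel config file.
# Hypothetical helper for illustration; defaults to ./.config.
comment_out_trusted_keys() {
    config_file="${1:-.config}"
    # Prefix the matching line with "# "; every other line is left untouched.
    sed -i 's/^CONFIG_SYSTEM_TRUSTED_KEYS=/# &/' "$config_file"
}

# Usage, from the top of the kernel tree:
#   comment_out_trusted_keys .config
```

Run it after copying the distro config and before make menuconfig, so the build doesn't go looking for the distro's .pem file.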

bcachefs-tools

Next come the userspace tools, bcachefs-tools, which you also pulled from git. Make sure to read the INSTALL file and install the needed dependencies.

Make Debian package

This builds a package that can be installed with apt/dpkg. It should work on other Debian-based distros as well, such as Ubuntu and its derivatives.

# apt install debhelper
# cd /path/to/bcachefs-tools
# dpkg-buildpackage

This will probably give an error at the end, looking something like this

dpkg-deb: building package 'bcachefs-tools-dbgsym' in '../bcachefs-tools-dbgsym_1.0.8-2~bpo8+1_amd64.deb'.
dpkg-deb: building package 'bcachefs-tools' in '../bcachefs-tools_1.0.8-2~bpo8+1_amd64.deb'.
 dpkg-genbuildinfo
 dpkg-genchanges  >../bcachefs-tools_1.0.8-2~bpo8+1_amd64.changes
dpkg-genchanges: info: including full source code in upload
 dpkg-source --after-build .
dpkg-buildpackage: info: full upload; Debian-native package (full source is included)
 signfile bcachefs-tools_1.0.8-2~bpo8+1.dsc
gpg: skipped "Mathieu Parent <sathieu@debian.org>": No secret key
gpg: dpkg-sign.Q29ayekM/bcachefs-tools_1.0.8-2~bpo8+1.dsc: clear-sign failed: No secret key

This is OK, since we don't have Mathieu's private key. The packages are still built and end up in the parent directory

# cd ..
# apt install ./bcachefs-tools_1.0.8-2~bpo8+1_amd64.deb
…
N: Download is performed unsandboxed as root as file '/root/src/git/bcachefs/bcachefs-tools_1.0.8-2~bpo8+1_amd64.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)

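I never looked further into the "unsandboxed" notice above. It only says that apt's _apt user couldn't read the .deb under /root, so the file was read as root instead. The package installed correctly and works.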
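
The gpg signing failure shown earlier can probably be avoided altogether by asking dpkg-buildpackage not to sign anything. This is a rough sketch, not something from the bcachefs-tools documentation: -us and -uc skip signing the source package and the .changes file, and -b builds binary packages only. The wrapper name build_unsigned_deb and the DRY_RUN variable are made up for this example.

```shell
# Build unsigned binary packages from a Debian source tree.
# Hypothetical wrapper; set DRY_RUN=1 to print the command instead of running it.
build_unsigned_deb() {
    src_dir="$1"
    # -us/-uc: do not sign the source package or the .changes file.
    # -b: build binary packages only.
    set -- dpkg-buildpackage -us -uc -b
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "cd $src_dir && $*"
    else
        ( cd "$src_dir" && "$@" )
    fi
}

# Usage:
#   build_unsigned_deb /path/to/bcachefs-tools
```

The resulting .deb files still land in the parent directory of the source tree, just as with the plain dpkg-buildpackage run.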

RHEL and friends

Under the packaging/ directory, there are spec files for building packages for EPEL/Fedora etc., which should work with RHEL, CentOS, Rocky Linux and AlmaLinux. I have not tested this personally.

Directly from source

If your OS is not listed above, this will install to /usr/local

# make all install

Replicated setup

Using a small VM on KVM/libvirt, we'll set up an initial test on virtual disks

vDisks
Device name   Size    Storage backend
/dev/sda      2GB     HDD
/dev/sdb      2GB     HDD
/dev/sdc      1GB     SSD
/dev/sdd      1GB     SSD
/dev/sde      5GB     HDD
/dev/sdg      5GB     HDD
/dev/sdh      8GB     HDD
/dev/sdj      10GB    HDD
/dev/sdi      10GB    HDD

Create filesystem

# bcachefs format --compression=lz4 \
    --replicas=2 \
    --discard \
    --label=hdd.hdd1 /dev/sda \
    --label=hdd.hdd2 /dev/sdb \
    --label=hdd.hdd3 /dev/sde \
    --label=hdd.hdd4 /dev/sdg \
    --label=hdd.hdd5 /dev/sdh \
    --label=hdd.hdd6 /dev/sdj \
    --label=hdd.hdd7 /dev/sdi \
    --label=ssd.ssd1 /dev/sdc \
    --label=ssd.ssd2 /dev/sdd \
    --foreground_target=ssd \
    --promote_target=ssd \
    --background_target=hdd
External UUID:                  58040b4b-f0f1-4755-b7c8-7955261eb132
Internal UUID:                  939e5b2d-dd69-4362-a91b-0422918240e3
Device index:                   8
Label:
Version:                        new_data_types
Oldest version on disk:         new_data_types
Created:                        Fri Apr 29 18:48:16 2022
Sequence number:                0
Superblock size:                1800
Clean:                          0
Devices:                        9
Sections:                       members,disk_groups
Features:                       new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:

Options:
  block_size:                   512
  btree_node_size:              128k
  errors:                       continue [ro] panic
  metadata_replicas:            2
  data_replicas:                2
  metadata_replicas_required:   1
  data_replicas_required:       1
  encoded_extent_max:           64k
  metadata_checksum:            none [crc32c] crc64 xxhash
  data_checksum:                none [crc32c] crc64 xxhash
  compression:                  none [lz4] gzip zstd
  background_compression:       [none] lz4 gzip zstd
  str_hash:                     crc32c crc64 [siphash]
  metadata_target:              none
  foreground_target:            ssd
  background_target:            hdd
  promote_target:               ssd
  erasure_code:                 0
  inodes_32bit:                 1
  shard_inode_numbers:          1
  inodes_use_key_cache:         1
  gc_reserve_percent:           8
  gc_reserve_bytes:             0
  root_reserve_percent:         0
  wide_macs:                    0
  acl:                          1
  usrquota:                     0
  grpquota:                     0
  prjquota:                     0
  journal_flush_delay:          1000
  journal_flush_disabled:       0
  journal_reclaim_delay:        100
  journal_transaction_names:    1

members (size 512):
  Device:                       0
    UUID:                       5580a092-9fe8-4ae7-9ebb-96365e620522
    Size:                       2G
    Bucket size:                256k
    First bucket:               0
    Buckets:                    8192
    Last mount:                 (never)
    State:                      rw
    Group:                      hdd1 (1)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Discard:                    1
    Freespace initialized:      0
  Device:                       1
    UUID:                       962562ea-7cd4-4e66-989b-c7098e8b5e73
    Size:                       2G
    Bucket size:                256k
    First bucket:               0
    Buckets:                    8192
    Last mount:                 (never)
    State:                      rw
    Group:                      hdd2 (2)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Discard:                    1
    Freespace initialized:      0
  Device:                       2
    UUID:                       e5e21ba5-fcfc-4857-9f4f-47b70cdd9597
    Size:                       5G
    Bucket size:                256k
    First bucket:               0
    Buckets:                    20480
    Last mount:                 (never)
    State:                      rw
    Group:                      hdd3 (3)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Discard:                    1
    Freespace initialized:      0
  Device:                       3
    UUID:                       b165f084-bf83-4cca-86cf-b0a8e5467ad5
    Size:                       5G
    Bucket size:                256k
    First bucket:               0
    Buckets:                    20480
    Last mount:                 (never)
    State:                      rw
    Group:                      hdd4 (4)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Discard:                    1
    Freespace initialized:      0
  Device:                       4
    UUID:                       9bf41b82-3d49-4dbc-ae7f-4554c360a79b
    Size:                       8G
    Bucket size:                256k
    First bucket:               0
    Buckets:                    32768
    Last mount:                 (never)
    State:                      rw
    Group:                      hdd5 (5)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Discard:                    1
    Freespace initialized:      0
  Device:                       5
    UUID:                       acbea8d1-6b32-4d0d-8600-ec28f2acd426
    Size:                       10G
    Bucket size:                256k
    First bucket:               0
    Buckets:                    40960
    Last mount:                 (never)
    State:                      rw
    Group:                      hdd6 (6)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Discard:                    1
    Freespace initialized:      0
  Device:                       6
    UUID:                       3da85ced-9236-48e0-b3df-e20b9f76bce1
    Size:                       10G
    Bucket size:                256k
    First bucket:               0
    Buckets:                    40960
    Last mount:                 (never)
    State:                      rw
    Group:                      hdd7 (7)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Discard:                    1
    Freespace initialized:      0
  Device:                       7
    UUID:                       24813df6-7302-4a2f-8f4f-2d8c0047d2e9
    Size:                       1G
    Bucket size:                128k
    First bucket:               0
    Buckets:                    8192
    Last mount:                 (never)
    State:                      rw
    Group:                      ssd1 (9)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Discard:                    1
    Freespace initialized:      0
  Device:                       8
    UUID:                       d196ba45-1a1d-4e37-a9b3-d155f1dae703
    Size:                       1G
    Bucket size:                128k
    First bucket:               0
    Buckets:                    8192
    Last mount:                 (never)
    State:                      rw
    Group:                      ssd2 (10)
    Data allowed:               journal,btree,user

Configure fstab and mount the filesystem

Add a line to /etc/fstab with the External UUID printed by bcachefs format, along with a mountpoint, and give the filesystem type as bcachefs.sh so that a custom script is used to mount it.

Note that the line below does not use the UUID=xxxx syntax that is normal for other filesystems.

58040b4b-f0f1-4755-b7c8-7955261eb132            /bcachefs       bcachefs.sh     defaults        0       0

The script is distributed with bcachefs-tools. Mount it and check

# mount -a
# df -hT /bcachefs/
Filesystem                                                                       Type      Size  Used Avail Use% Mounted on
/dev/sda:/dev/sdb:/dev/sdf:/dev/sdg:/dev/sdh:/dev/sdj:/dev/sdi:/dev/sdc:/dev/sdd bcachefs   41G   17G   24G  41% /bcachefs
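
The fstab line used above can be put together from the format output instead of copying the UUID by hand. This is just a sketch, assuming the output contains a line starting with "External UUID:" as in the listing earlier, and assuming bcachefs show-super prints that same line for an existing filesystem. The function name bcachefs_fstab_line is made up for this example.

```shell
# Read `bcachefs format` (or show-super) output on stdin and print a
# matching /etc/fstab line for the given mountpoint.
# Hypothetical helper for illustration.
bcachefs_fstab_line() {
    mountpoint="$1"
    awk -v mp="$mountpoint" '
        /^External UUID:/ {
            # The UUID is the last whitespace-separated field on the line.
            printf "%s\t%s\tbcachefs.sh\tdefaults\t0\t0\n", $NF, mp
            exit
        }'
}

# Usage (assumes show-super prints the same "External UUID:" line):
#   bcachefs show-super /dev/sda | bcachefs_fstab_line /bcachefs >> /etc/fstab
```

Have a look at the printed line before appending it to /etc/fstab, since a bad entry there can get in the way at boot.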

RAID-5 like test

bcachefs format --help tells me

--erasure_code Enable erasure coding (DO NOT USE YET)

But someone on the Internet (specifically someone on the #bcache IRC channel at irc.oftc.net) told me to do so anyway, so here we go

root@raidtest:~# bcachefs format --erasure_code --replicas=2 /dev/sd[hij]
External UUID:			e99690cd-08c2-4bfe-b649-22dfdfab0db6
Internal UUID:			2a62bd0c-fa9c-48d7-b8bd-bb8acdae04de
Device index:			2
Label:
Version:			15
Oldest version on disk:		15
Created:			Tue Sep 28 23:38:44 2021
Squence number:			0
Block_size:			512
Btree node size:		256.0K
Error action:			ro
Clean:				0
Features:			new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:
Metadata replicas:		2
Data replicas:			2
Metadata checksum type:		crc32c (1)
Data checksum type:		crc32c (1)
Compression type:		none (0)
Foreground write target:	none
Background write target:	none
Promote target:			none
Metadata target:                none
String hash type:		siphash (2)
32 bit inodes:			1
GC reserve percentage:		8%
Root reserve percentage:	0%
Devices:			3 live, 3 total
Sections:			members
Superblock size:		928

Members (size 176):
  Device 0:
    UUID:			c0609354-2b1c-4a2e-a8fc-fea9ad04b487
    Size:			16.0G
    Bucket size:		256.0K
    First bucket:		0
    Buckets:			65536
    Last mount:			(never)
    State:			rw
    Group:			(none)
    Data allowed:		journal,btree,user
    Has data:			(none)
    Replacement policy:		lru
    Discard:			0
  Device 1:
    UUID:			e5fa513f-ec29-486d-bad6-3395ba265825
    Size:			16.0G
    Bucket size:		256.0K
    First bucket:		0
    Buckets:			65536
    Last mount:			(never)
    State:			rw
    Group:			(none)
    Data allowed:		journal,btree,user
    Has data:			(none)
    Replacement policy:		lru
    Discard:			0
  Device 2:
    UUID:			f60cee55-dc13-484c-8691-9b02d3b28f9a
    Size:			16.0G
    Bucket size:		256.0K
    First bucket:		0
    Buckets:			65536
    Last mount:			(never)
    State:			rw
    Group:			(none)
    Data allowed:		journal,btree,user
    Has data:			(none)
    Replacement policy:		lru
    Discard:			0
initializing new filesystem
going read-write
mounted with opts: metadata_replicas=2,data_replicas=2,erasure_code
root@raidtest:~#

Mount it?

Now, good, we have a nice and stable filesystem, I think, so let's mount it. The syntax used here is quite peculiar, but we may get used to it some day

# mount -t bcachefs /dev/sdh:/dev/sdi:/dev/sdj /mnt
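
The peculiar part of the syntax is that all member devices are given as one colon-separated argument. A small sketch for building that argument from a list of devices follows; the function name join_devices is made up for this example.

```shell
# Join device paths with ":" the way a multi-device bcachefs mount expects.
# Hypothetical helper for illustration.
join_devices() {
    # Setting IFS inside a subshell makes "$*" join the arguments with ":"
    # without changing IFS for the calling shell.
    ( IFS=:; printf '%s\n' "$*" )
}

# Usage, with DEV1..DEV3 standing in for the member devices:
#   mount -t bcachefs "$(join_devices DEV1 DEV2 DEV3)" /mnt
```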

df now reports the filesystem as being 44GB; it should be 48GB gross, but hell. I copy in 3.4GB and it now shows 8.7GB used. Perhaps I gave it too many replicas? Should the erasure code be sufficient without two replicas? I don't know. We'll see. I'm tired zzzzz
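
As a rough sanity check on those numbers: with data replicas set to 2, every extent is written twice, so 3.4GB of files needs at least 6.8GB of raw space before any metadata, journal or parity is counted. The sketch below only does that multiplication. It assumes df reports raw usage across all devices, which I have not verified, and it does not explain the rest of the 8.7GB. The function name raw_usage_gb is made up for this example.

```shell
# Raw space needed when every extent is stored `replicas` times.
# Hypothetical helper; awk is used because shell arithmetic is integer-only.
raw_usage_gb() {
    awk -v data="$1" -v replicas="$2" 'BEGIN { printf "%.1f\n", data * replicas }'
}

# 3.4GB of data with two replicas:
raw_usage_gb 3.4 2
```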