Bcachefs
Revision as of 17:11, 29 April 2022
The bcachefs filesystem started out as bcache, a block-level cache (hence the name) for Linux that lets faster disks (SSD/NVMe) cache slower ones (HDD). At some point, the developers found out it was almost a filesystem already, so they set off to make it one, adding "fs" at the end.
Installation
Kernel
The filesystem is currently in development and hasn't been accepted into the Linux source tree. It is available for download from the project's site, and from there it can be compiled together with the kernel it comes with (5.13.0 at the time of writing). Use git to pull the kernel out and enter its directory. Then run cp /boot/config-$( uname -r ) .config, edit the .config file and look for CONFIG_SYSTEM_TRUSTED_KEYS="debian/certs/debian-uefi-certs.pem" (or similar if you are on a distro other than Debian). Comment this out (and probably the lines above it as well; they'll be re-added automatically, but without that .pem file, another will be generated instead). Save and exit, then run make menuconfig (you'll need to install flex, bison and ncurses-dev first), enter Filesystems and enable bcachefs. Save and exit and type
# CPUCOUNT=$( lscpu | awk '/^CPU\(s\):/ { print $2 }' )
# J=$(( CPUCOUNT + 1 ))
# make -j$J
# make modules modules_install
# make install
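The .config edit described above can also be scripted instead of done by hand. This is only a minimal sketch, assuming the option sits on a single line starting with CONFIG_SYSTEM_TRUSTED_KEYS=. The function name comment_out_trusted_keys is made up for illustration and is not part of the kernel tree.

```shell
# Hypothetical helper (not part of the kernel tree): comment out the
# CONFIG_SYSTEM_TRUSTED_KEYS line in a kernel .config file.
# Usage: comment_out_trusted_keys /path/to/.config
comment_out_trusted_keys() {
    config_file=$1
    # Keep a backup next to the original before touching it.
    cp "$config_file" "$config_file.bak" || return 1
    # Prefix the matching line with "# "; every other line is left untouched.
    sed 's/^CONFIG_SYSTEM_TRUSTED_KEYS=/# &/' "$config_file.bak" > "$config_file"
}
```

Run it from the kernel source directory as comment_out_trusted_keys .config before starting make menuconfig. The original file stays around as .config.bak in case something goes wrong.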
bcachefs-tools
Next come the userspace tools, bcachefs-tools, which you also pulled from git. Make sure to read the INSTALL file and install the needed dependencies.
Make Debian package
This makes a package to install with apt/dpkg. This shoul
# apt install debhelper
# cd /path/to/bcachefs-tools
# dpkg-buildpackage
Directly from source
# make all install
Replicated setup
Using a small VM on KVM/libvirt, we'll set up an initial test on virtual disks.
Device name | Size | Storage backend |
---|---|---|
/dev/sda | 2GB | HDD |
/dev/sdb | 2GB | HDD |
/dev/sdc | 1GB | SSD |
/dev/sdd | 1GB | SSD |
/dev/sde | 5GB | HDD |
/dev/sdg | 5GB | HDD |
/dev/sdh | 8GB | HDD |
/dev/sdi | 10GB | HDD |
/dev/sdj | 10GB | HDD |
sda 8:0 0 2G 0 disk
sdb 8:16 0 2G 0 disk
sdc 8:32 0 1G 0 disk
sdd 8:48 0 1G 0 disk
sde 8:64 0 5G 0 disk
sdg 8:96 0 5G 0 disk
sdh 8:112 0 8G 0 disk
sdi 8:128 0 10G 0 disk
sdj 8:144 0 10G 0 disk

# bcachefs format --compression=lz4 \
    --encrypted \
    --replicas=2 \
    --label=ssd.ssd1 /dev/sda \
    --label=ssd.ssd2 /dev/sdb \
    --label=hdd.hdd1 /dev/sdc \
    --label=hdd.hdd2 /dev/sdd \
    --label=hdd.hdd3 /dev/sde \
    --label=hdd.hdd4 /dev/sdf \
    --foreground_target=ssd \
    --promote_target=ssd \
    --background_target=hdd
# mount -t bcachefs /dev/sda:/dev/sdb:/dev/sdc:/dev/sdd:/dev/sde:/dev/sdf /mnt
RAID-5 like test
bcachefs format --help tells me
--erasure_code Enable erasure coding (DO NOT USE YET)
But someone on the Internet (specifically, someone on the #bcache IRC channel at irc.oftc.net) told me to do so anyway, so here we go:
root@raidtest:~# bcachefs format --erasure_code --replicas=2 /dev/sd[hij]
External UUID: e99690cd-08c2-4bfe-b649-22dfdfab0db6
Internal UUID: 2a62bd0c-fa9c-48d7-b8bd-bb8acdae04de
Device index: 2
Label:
Version: 15
Oldest version on disk: 15
Created: Tue Sep 28 23:38:44 2021
Squence number: 0
Block_size: 512
Btree node size: 256.0K
Error action: ro
Clean: 0
Features: new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:
Metadata replicas: 2
Data replicas: 2
Metadata checksum type: crc32c (1)
Data checksum type: crc32c (1)
Compression type: none (0)
Foreground write target: none
Background write target: none
Promote target: none
Metadata target: none
String hash type: siphash (2)
32 bit inodes: 1
GC reserve percentage: 8%
Root reserve percentage: 0%
Devices: 3 live, 3 total
Sections: members
Superblock size: 928

Members (size 176):
  Device 0:
    UUID: c0609354-2b1c-4a2e-a8fc-fea9ad04b487
    Size: 16.0G
    Bucket size: 256.0K
    First bucket: 0
    Buckets: 65536
    Last mount: (never)
    State: rw
    Group: (none)
    Data allowed: journal,btree,user
    Has data: (none)
    Replacement policy: lru
    Discard: 0
  Device 1:
    UUID: e5fa513f-ec29-486d-bad6-3395ba265825
    Size: 16.0G
    Bucket size: 256.0K
    First bucket: 0
    Buckets: 65536
    Last mount: (never)
    State: rw
    Group: (none)
    Data allowed: journal,btree,user
    Has data: (none)
    Replacement policy: lru
    Discard: 0
  Device 2:
    UUID: f60cee55-dc13-484c-8691-9b02d3b28f9a
    Size: 16.0G
    Bucket size: 256.0K
    First bucket: 0
    Buckets: 65536
    Last mount: (never)
    State: rw
    Group: (none)
    Data allowed: journal,btree,user
    Has data: (none)
    Replacement policy: lru
    Discard: 0
initializing new filesystem
going read-write
mounted with opts: metadata_replicas=2,data_replicas=2,erasure_code
root@raidtest:~#
Mount it?
Now, good, we have a nice and stable filesystem, I think, so let's mount it. The syntax used here is quite peculiar, but we may get used to it some day
# mount -t bcachefs /dev/sdh:/dev/sdi:/dev/sdj /mnt
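The peculiar part of that syntax is that all member devices go into one argument, separated by colons. As a small sketch, the argument can be assembled from a list of devices like this. The helper name join_devices is hypothetical and not part of bcachefs-tools.

```shell
# Hypothetical helper: join device paths with ":" to form the single
# device argument that a multi-device bcachefs mount takes.
join_devices() {
    # Run in a subshell so the IFS change does not leak into the caller.
    # "$*" joins all arguments using the first character of IFS.
    ( IFS=:; printf '%s\n' "$*" )
}

# Example (needs root and an already formatted filesystem):
#   mount -t bcachefs "$(join_devices /dev/sdh /dev/sdi /dev/sdj)" /mnt
```

This is mostly handy once the device list gets long enough that typing the colons by hand becomes error-prone.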
df now reports the filesystem as 44GB; it should be 48GB gross, but hell. I copied in 3.4GB and it now shows 8.7GB used. Perhaps I gave it too many replicas? Should the erasure code be sufficient without two replicas? I don't know. We'll see. I'm tired zzzzz
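A rough sanity check on those numbers, assuming that --replicas=2 simply stores every block twice: 3.4GB of files should then take about 6.8GB, which leaves part of the reported 8.7GB unexplained. The sketch below only does that arithmetic. It does not model how bcachefs accounts for erasure coding, metadata or the journal, and the function names are made up for illustration.

```shell
# Expected on-disk usage if replication simply multiplies the data size.
# Arguments: data size in GB, number of replicas.
expected_usage() {
    awk -v data="$1" -v replicas="$2" 'BEGIN { printf "%.1f\n", data * replicas }'
}

# Difference between the usage df reports and the expected figure.
# Arguments: reported usage in GB, expected usage in GB.
unexplained_usage() {
    awk -v used="$1" -v expected="$2" 'BEGIN { printf "%.1f\n", used - expected }'
}

expected=$(expected_usage 3.4 2)
echo "expected: ${expected}GB"
echo "unexplained: $(unexplained_usage 8.7 "$expected")GB"
```

So two replicas account for most of the usage, and whatever is left over would have to come from something else, possibly the erasure coding on top of the replicas.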