More exotic filesystems.

This commit is contained in:
Philippe Pittoli 2025-02-07 04:43:59 +01:00
parent 9c98dd33ce
commit a10013f6a4

@ -1216,20 +1216,27 @@ So, worst case scenario, data rate is
.FRAC 1 4000 .FRAC 1 4000
(huge waste) meaning that a 1GB of data would require an entire 4TB hard drive\*[*] (without even taking the inodes' size into account). (huge waste) meaning that a 1GB of data would require an entire 4TB hard drive\*[*] (without even taking the inodes' size into account).
.FOOTNOTE1 .FOOTNOTE1
Ext4 can integrate up to 60 bytes of data into an extended inode. Ext4 can integrate up to 60 bytes of data into an inode.
.TBD
.FOOTNOTE2 .FOOTNOTE2
.KS .KS
.BULLET .BULLET
.B "Miscealeneous and advanced features" . .B "Miscellaneous and advanced features" .
A few other features need to be mentionned, such as block suballocation, file content included in the inode, etc. A few other features need to be mentioned, such as block suballocation\*[*], data inclusion in unused inode space\*[*] and compression for instance.
Some filesystems added more than a decade ago then under-explored features such as snapshots, compression and transactions. Along with more advanced features such as snapshotting and transactions, they all represent incremental improvements of filesystems made over the years and which are now stable and available for the many.
.KE .KE
.ENDBULLET .ENDBULLET
.FOOTNOTE1
Block suballocation saves some space by putting data from two different files in a single underused block.
.FOOTNOTE2
.FOOTNOTE1
An inode may be larger than what is strictly needed to index and retrieve a file: inodes can also store extended file attributes and such.
When this space isn't used for metadata, some filesystems allow it to hold file data directly for very small files (up to a few dozen bytes in ext4), reducing disk usage and indirections.
.FOOTNOTE2
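The worst-case ratio and the inline-data footnote can be illustrated with a little arithmetic. The following is only a sketch: it assumes 4096-byte blocks (a common default, not stated in the text) and reuses the 60-byte inline limit quoted for ext4; the function names are hypothetical.

```python
import math

BLOCK_SIZE = 4096   # assumed block size in bytes, not taken from the text
INLINE_LIMIT = 60   # inline-data capacity quoted for ext4 in the footnote


def allocated_bytes(file_size, block_size=BLOCK_SIZE, inline_limit=0):
    """Return the data space a file occupies outside of its inode."""
    if file_size <= inline_limit:
        # The content fits in unused inode space: no data block is needed.
        return 0
    # Otherwise the file uses whole blocks, the last one being partly empty.
    return math.ceil(file_size / block_size) * block_size


def data_rate(file_size, **kwargs):
    """Ratio of useful bytes to allocated bytes (1.0 means no waste)."""
    allocated = allocated_bytes(file_size, **kwargs)
    return 1.0 if allocated == 0 else file_size / allocated


if __name__ == "__main__":
    for size in (1, 40, 4097):
        print(size,
              allocated_bytes(size),
              allocated_bytes(size, inline_limit=INLINE_LIMIT))
```

With these assumptions a one-byte file still consumes a full block, which is where a ratio close to 1/4000 comes from, while a file under the inline limit consumes no data block at all.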
. .
.KS .KS
.SSS "Quick comparison between DBMSs and filesystems" .SSS "Quick comparison between DBMSs and filesystems"
The following table shows how close well-known database systems and ordinary filesystems are: both share a lot of features despite very different approaches.
.ds OK \[OK] .ds OK \[OK]
.ds NOK \[tmu] .ds NOK \[tmu]
.nr total 16.0c .nr total 16.0c
@ -1257,19 +1264,22 @@ T}
Durability : \*[OK] :limited (checksums) Durability : \*[OK] :limited (checksums)
Access Time : 0.1 to 2ms :a few µs (cache) to a few ms (first access with a hard disk) Access Time : 0.1 to 2ms :a few µs (cache) to a few ms (first access with a hard disk)
High avail. : \*[OK] :T{ High avail. : \*[OK] :T{
RAID & variants RAID & variants plus many distributed or cluster filesystems
T} T}
Transactions : \*[OK] :T{ Transactions : \*[OK] :T{
implemented in a few filesystems (BTRFS, ZFS) \*[OK] in a few filesystems (BTRFS, ZFS)
T}
Replication : \*[OK] :T{
\*[OK] in many filesystems (BTRFS, ZFS, GlusterFS, etc.)
T} T}
Performance : \*[OK] :T{ Performance : \*[OK] :T{
B trees and variants (used in all modern FS: BTRFS, ext4, Raiserfs4, NTFS, HAMMER…) are used to search data on the storage device but also to get an entry in a huge directory. B-trees and variants (used in all modern FS: BTRFS, ext4, Raiserfs4, NTFS, HAMMER…) are used to search data on the storage device but also to get an entry in a huge directory.
T} T}
Space waste :T{ Space waste :T{
almost none almost none
.ps .ps
T}:T{ T}:T{
depends on many factors, but generally important depends on many factors, but generally important on small data
T} T}
.TE .TE
.FOOTNOTE1 .FOOTNOTE1
@ -1277,10 +1287,19 @@ In a desktop environment this technique isn't viable, users usually just rewrite
However, considering a data management library, this method to ensure data integrity is a no-brainer. However, considering a data management library, this method to ensure data integrity is a no-brainer.
.FOOTNOTE2 .FOOTNOTE2
.KE .KE
This table gives an overview of some (mostly shared) features of DBMSs and filesystems.
Real deployments may involve a whole range of tools, including a mix of both of these solutions.
For example, key-value databases can be used as a cache in front of a DBMS to massively speed up data retrieval.
The main difference between DBMSs and filesystems is the
.I consistency
property.
Filesystems are almost exclusively built to store undefined streams of data with a very wide range of different shapes (plain text, multimedia, documents, etc.) and sizes (from empty to multiple terabytes and more), thus no consistency verification can be reasonably implemented.
. .
.KS .KS
.SSS "Exotic filesystems" .SSS "Exotic filesystems"
Filesystems have been developed over the years for various requirements. Filesystems have been developed over the years for various reasons.
Let's browse for a moment to provide an overview of what is possible. Let's browse for a moment to provide an overview of what is possible.
.B Kernel-related . .B Kernel-related .
@ -1291,12 +1310,44 @@ A whole class of filesystems is dedicated to provide an interface to the kernel,
(to tweak a few device parameters) or even (to tweak a few device parameters) or even
.I debugfs .I debugfs
(to provide debug info from the kernel to user-space). (to provide debug info from the kernel to user-space).
Providing information about the running system and enabling its modification through simple files and directories is a direct
.dq "everything is a file"
UNIX legacy.
.B "Cluster, network, high-availability, distributed" … .B "Network-related" .
.\" Many filesystems aim to provide a network-accessible cluster of storage, Many filesystems were designed specifically to be remotely mounted, either to be shared amongst many people in a company, or to be part of a giant cluster to provide a high-availability storage solution for tech giants with peculiar requirements or just to stack ever more commodity computers together and provide a gigantic storage space.
.\" Beside Filesystems can also be distributed with some replication in order to provide a fault-tolerant storage with ordinary computers sharing unused space.
.\" Research on filesystems Beside well-known
.KE .KE
.KS
.B "UnionFS" .
UnionFS (and its variants) allows several filesystems to be mounted on the same mount point and shows their contents superimposed, so that a read-only base image can be used together with persistent data for a specific instance.
This way, a
.dq "live-cd image"
for an operating system can become persistent by storing modifications on a USB stick.
UnionFS is a copy-on-write snapshotting filesystem on top of other filesystems.
.KE
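The layering idea behind UnionFS can be sketched in a few lines. This is a toy model, not how UnionFS is implemented: layers are plain dictionaries, the class and method names are hypothetical, and the deletion marker only loosely imitates the "whiteout" entries used by real union filesystems.

```python
class UnionView:
    """Toy union of a read-only lower layer and a writable upper layer."""

    _DELETED = object()  # marker hiding a file that exists in the lower layer

    def __init__(self, lower):
        self.lower = lower  # read-only base image: path -> content
        self.upper = {}     # persistent modifications: path -> content

    def read(self, path):
        # The upper layer always wins over the lower one.
        if path in self.upper:
            content = self.upper[path]
            if content is self._DELETED:
                raise FileNotFoundError(path)
            return content
        if path in self.lower:
            return self.lower[path]
        raise FileNotFoundError(path)

    def write(self, path, content):
        # Modifications only ever land in the upper layer.
        self.upper[path] = content

    def remove(self, path):
        self.read(path)  # raises if the file is not visible
        self.upper[path] = self._DELETED

    def listing(self):
        visible = set(self.lower) | set(self.upper)
        return sorted(p for p in visible
                      if self.upper.get(p) is not self._DELETED)


if __name__ == "__main__":
    base = {"/etc/motd": "hello"}
    view = UnionView(base)
    view.write("/etc/motd", "changed")
    print(view.read("/etc/motd"), base["/etc/motd"])
```

The base dictionary is never modified, which mirrors the read-only image of a live system, while the upper dictionary plays the role of the USB stick holding the persistent changes.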
.KS
.B "Archivemount" .
Archivemount mounts a compressed archive as a filesystem, so that day-to-day tools can be used to search for a file in the archive without the need to uncompress it first.
.KE
.KS
.B "RAM-based filesystems" .
For temporary data, intensive read and write operations on a small storage volume or for filesystem development, a chunk of the computer memory can be used as a filesystem thanks to
.B tmpfs
and variants\*[*] (ramdisk and ramfs).
.KE
.FOOTNOTE1
.B ramdisk
creates a block device backed by a chunk of RAM that needs to be formatted then mounted like any partition.
.B ramfs
directly mounts a RAM-based filesystem, without the need to format a fake partition.
Finally,
.B tmpfs
is the most flexible one: it is used like ramfs but can be resized and only uses the amount of RAM needed at a given point (memory is freed once a file is removed).
.FOOTNOTE2
. .
.KS .KS
.SSS "Conclusion on filesystems" .SSS "Conclusion on filesystems"