Filesystems.

Philippe Pittoli 2025-02-28 00:28:02 +01:00
parent c04104dce1
commit bed189deba


@@ -1063,17 +1063,14 @@ These new triggers could record user-defined procedures to perform database veri
.BULLET .BULLET
.B Isolation .B Isolation
is partially taken into account with a locking mechanism preventing race conditions when modifying a value. is partially taken into account with a locking mechanism preventing race conditions when modifying a value.
This may be seen as simplistic but This may be seen as simplistic but good enough for most applications.
.SHINE "good enough"
for most applications.
.BULLET .BULLET
.B Durability .B Durability
is taken into account. is taken into account.
Data is written on disk each time it changes. Data is written on disk each time it changes.
Again, this is basic but Data checksums are delegated to the filesystem or external tools.
.SHINE "good enough" Again, this is basic but good enough for most applications.
for most applications.
A future improvement could be to write a checksum for every file to detect corrupt data, but this overlaps with some filesystems which already provide this feature. A future improvement could be to write a checksum for every file to detect corrupt data, but this overlaps with some filesystems which already provide this feature.
.ENDBULLET .ENDBULLET
@@ -1149,6 +1146,8 @@ Traditional databases can be managed through command lines or a dedicated shell,
DODB cannot, for the very same reason it came into existence: enabling this kind of tooling implies an enormous amount of code and complexity, obfuscating core database operations that should be both understandable and customizable. DODB cannot, for the very same reason it came into existence: enabling this kind of tooling implies an enormous amount of code and complexity, obfuscating core database operations that should be both understandable and customizable.
.KE .KE
.ENDBULLET .ENDBULLET
In conclusion, the "missing" features are either irrelevant in the context of DODB or simple enough to implement and customize to one's needs.
. .
.SS "The state of file systems, their limitations and useful features for DODB instances" .SS "The state of file systems, their limitations and useful features for DODB instances"
A A
@@ -1184,7 +1183,7 @@ called
the files are on the disk, their size, the last time they were modified or accessed and other metadata. the files are on the disk, their size, the last time they were modified or accessed and other metadata.
A filesystem is split into a list of A filesystem is split into a list of
.I blocks .I blocks
of a certain size\*[*] (4 kilobytes by default). of a certain size\*[*] (4 kilobytes by default on ext4).
.FOOTNOTE1 .FOOTNOTE1
Working only with blocks (from 0 to x) is called Working only with blocks (from 0 to x) is called
.dq "Logical block addressing" . .dq "Logical block addressing" .
@@ -1243,7 +1242,7 @@ So, worst case scenario, data rate is
.FRAC 1 4000 .FRAC 1 4000
(huge waste) meaning that a 1GB of data would require an entire 4TB hard drive\*[*] (without even taking the inodes' size into account). (huge waste) meaning that a 1GB of data would require an entire 4TB hard drive\*[*] (without even taking the inodes' size into account).
.FOOTNOTE1 .FOOTNOTE1
Ext4 can integrate up to 60 bytes of data into an inode. To slightly mitigate this, ext4 can integrate up to 60 bytes of data into an inode.
.FOOTNOTE2 .FOOTNOTE2
.KS .KS
@@ -1286,7 +1285,7 @@ Filesystems can also be distributed with some replication in order to provide a
.KS .KS
.B "UnionFS" . .B "UnionFS" .
UnionFS (and its variants) is a filesystem enabling several filesystems to be mounted on the same mount-point and to show superposed contents, enabling a read-only base image to be used together with persistent data for a specific instance. UnionFS (and its variants) is a filesystem enabling several filesystems to be mounted on the same mount-point and to show overlapping contents, enabling a read-only base image to be used together with persistent data for a specific instance.
This way, a This way, a
.dq "live-cd image" .dq "live-cd image"
for an operating system can become persistent by storing modifications on an usb stick. for an operating system can become persistent by storing modifications on an usb stick.
@@ -1294,8 +1293,8 @@ for an operating system can become persistent by storing modifications on an usb
UnionFS is a copy-on-write snapshotting filesystem on top of other filesystems. UnionFS is a copy-on-write snapshotting filesystem on top of other filesystems.
Docker uses it to save space. Docker uses it to save space.
Docker provides different ready-to-run software as small virtual machines. Since the different software that Docker provides are ready-to-run virtual machines, a base OS image is shared amongst all instances so each instance only stores its own specific files (binaries, configuration and dependencies) written in a separate storage volume.
To preserve storage space, a base OS image is shared amongst all instances and each instance only stores its own specific files (binaries, configuration and dependencies) written in a separate storage volume. Thus, despite each software distribution requiring an entire operating system environment, the storage volume is kept reasonable.
.KS .KS
.B "Archivemount" . .B "Archivemount" .
@@ -1314,8 +1313,8 @@ creates a block file based on a chunk of RAM that needs to be formated then moun
.B ramfs .B ramfs
mounts directly a RAM-based filesystem, without the need to format a fake partition. mounts directly a RAM-based filesystem, without the need to format a fake partition.
Finally, Finally,
.B tmpfs .B tmpfs ,
is the more flexible one, it is used as ramfs but can be resized and only uses a necessary amount of RAM at a given point (memory is free'd once a file is removed). the more flexible option, is used as ramfs but can be resized and only uses a necessary amount of RAM at a given point since memory is free'd once a file is removed.
.FOOTNOTE2 .FOOTNOTE2
.KS .KS
@@ -1327,6 +1326,12 @@ As a side effect, searching for a file in this context can be done by computing
Well well well… doesn't that sound like the DODB tag triggers? Well well well… doesn't that sound like the DODB tag triggers?
As if databases and filesystems were intertwined somehow… As if databases and filesystems were intertwined somehow…
.FOOTNOTE2 .FOOTNOTE2
.KS
.B "And many more" !
Other specific filesystems may not be widespread like the ones mentioned above but they exist and are as exotic as the constraints in which they evolve.
.KE
.
.KS .KS
.SSS "Quick comparison between DBMSs and filesystems" .SSS "Quick comparison between DBMSs and filesystems"
The following table shows the proximity between famous database systems and ordinary filesystems, both sharing a lot of features despite very different approaches. The following table shows the proximity between famous database systems and ordinary filesystems, both sharing a lot of features despite very different approaches.
@@ -1345,19 +1350,20 @@ allbox tab(:);
c | c | c c | c | c
cw(\n[col1]u) | lw(\n[col2]u) | lw(\n[col3]u). cw(\n[col1]u) | lw(\n[col2]u) | lw(\n[col3]u).
Feature : DBMS : Filesystems Feature : DBMS : Filesystems
CRUD operations : SQL :files & directories CRUD operations : \*[OK] SQL :\*[OK] files & directories
Atomicity : \*[OK] :T{ Atomicity : \*[OK] :T{
locking mechanism based on files \*[OK] locking mechanism based on files
T} T}
Consistency : \*[OK] : \*[NOK] besides very specific filesystems Consistency : \*[OK] :\*[OK] in specific filesystems (the kernel-related ones for example)
Isolation : \*[OK] :T{ Isolation : \*[OK] :T{
\*[OK]
.dq "new file then mv" .dq "new file then mv"
technique\*[*] technique\*[*]
T} T}
Durability : \*[OK] :limited (checksums) Durability : \*[OK] :\*[OK] checksums
Access Time : 0.1 to 2ms :a few µs (cache) to a few ms (first access with a hard disk) Access Time : 0.1 to 2ms :a few µs (cache) to a few ms (first access with a hard disk)
High avail. : \*[OK] :T{ High avail. : \*[OK] :T{
RAID & variants plus many distributed or cluster filesystems \*[OK] RAID & variants plus many distributed or cluster filesystems
T} T}
Transactions : \*[OK] :T{ Transactions : \*[OK] :T{
\*[OK] in a few filesystems (BTRFS, ZFS) \*[OK] in a few filesystems (BTRFS, ZFS)
@@ -1366,13 +1372,14 @@ Replication : \*[OK] :T{
\*[OK] in many filesystems (BTRFS, ZFS, ClusterFS, etc.) \*[OK] in many filesystems (BTRFS, ZFS, ClusterFS, etc.)
T} T}
Performance : \*[OK] :T{ Performance : \*[OK] :T{
B-trees and variants (used in all modern FS: BTRFS, ext4, Raiserfs4, NTFS, HAMMER…) are used to search data on the storage device but also to get an entry in a huge directory. \*[OK] B-trees and variants (used in all modern FS: BTRFS, ext4, Raiserfs4, NTFS, HAMMER…) are used to search data on the storage device but also to get an entry in a huge directory
T} T}
Space waste :T{ Space waste :T{
almost none .ps -2
\*[OK] almost none
.ps .ps
T}:T{ T}:T{
depends on many factors, but generally important on small data \*[NOK] generally important on small data (there is a room for improvement), that's why just to mimic relational databases doesn't work well with current filesystem inner workings, but document-oriented databases (having a whole set of related data in a single file) make sense
T} T}
.TE .TE
.FOOTNOTE1 .FOOTNOTE1
@@ -1383,12 +1390,12 @@ However, considering a data management library, this method to ensure data integ
This table shows an overview of some (mostly shared) DBMSs and filesystems features. This table shows an overview of some (mostly shared) DBMSs and filesystems features.
Real deployments may involve a whole range of tools, including a mix of both of these solutions. Real deployments may involve a whole range of tools, including a mix of both of these solutions.
For example, key-value databases can be used as DBMSs' cache to massively speed data retrieval up. For example, key-value databases often are used as DBMSs' cache to massively speed data retrieval up.
The main difference between DBMSs and filesystems is the The main difference between DBMSs and filesystems is the
.I consistency .I consistency
property. property.
Filesystems are almost exclusively built to store undefined streams of data with a very wide range of different shapes (plain text, multimedia, documents, etc.) and sizes (from empty to multiple terabytes and more), thus no consistency verification can be reasonably implemented. Filesystems are almost exclusively built to store undefined streams of data with a very wide range of different shapes (plain text, multimedia, documents, etc.) and sizes (from empty to multiple terabytes and more), thus no consistency verification can be reasonably implemented outside very specific contexts (such as kernel-related filesystems).
. .
. .
.KS .KS