Filesystem introduction.

This commit is contained in:
Philippe Pittoli 2025-04-12 21:02:27 +02:00
parent a9cc12e096
commit 41cdefef46
2 changed files with 62 additions and 59 deletions

View file

@ -29,5 +29,5 @@ subtexth = 0.2
"(actually stores data)" at TXTDEV + (0, -subtexth)
"files and" "directories" at 1/2 between FS.e and FS.w + (0, h)
"logical blocks" "ex: 0x4b1f" at 1/2 between DRI.e and DRI.w + (0, h)
"physical blocks" "ex: Cylinder 12," "Head 3, Sector 33" at 1/2 between DEV.e and DEV.w + (0, h)
"logical blocks" "ex: 0x10" at 1/2 between DRI.e and DRI.w + (0, h)
"physical blocks, AHCI commands" "ex: READ 0x2058 (LBA)" at 1/2 between DEV.e and DEV.w + (0, h)

View file

@ -1159,7 +1159,7 @@ A
.dq filesystem
is the code responsible for how data and metadata are written on a storage device, which essentially amounts to low-level CRUD operations.
This code links the user interface (files and directories) with the device drivers, which ultimately write bytes to a storage device such as a hard drive.
The next paragraphs will give an idea of how filesystems work, the limitations this implies for DODB\*[*] since it uses filesystems in an overtly naive way, and the filesystem features DODB instances could use for better data management.
This section will give an idea of how filesystems work, the limitations this implies for DODB\*[*] since it uses filesystems in an overtly naive way, and the filesystem features DODB instances could use for better data management.
.FOOTNOTE1
Explaining how filesystems work and how they are designed is out of the scope of this document, so this part is kept short for readability.
.FOOTNOTE2
@ -1169,38 +1169,37 @@ Explaining the way filesystem work and their design is out of the scope of this
.
Modern computing systems rely on storage devices to persistently retain data, even when power is turned off.
The two most common types of storage media are Hard Disk Drives (HDDs) and Solid-State Drives (SSDs), each with distinct characteristics.
A filesystem is a software layer that structures data into files and directories, providing mechanisms for storage, retrieval, and management from a user perspective.
Thus, the filesystem abstracts the physical storage details, allowing users and applications to interact with data through logical operations (e.g., open, read, write) rather than dealing with low-level interactions with the storage devices.
The following paragraphs will explain the relation between filesystems, device drivers and storage devices.
Users do not write data directly to these devices; instead, they use a
.dq filesystem ,
a software layer that structures data into files and directories, providing mechanisms for data storage from a user perspective.
Drivers bind both storage devices and filesystems by handling device-specific operations.
The following paragraphs will further explain the relation between filesystems, device drivers and storage devices.
From a logical perspective, HDDs and SSDs appear as linear arrays of fixed-size blocks (typically 512 bytes or 4 KiB), akin to a vast sequence of memory cells that can be read or written individually.
This abstraction allows higher-level software (filesystems) to interact with storage uniformly, regardless of the underlying hardware.
For example, a filesystem might request "block 0x42A1" without needing to know whether it resides on an HDD's magnetic platter or an SSD's NAND chip.
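To make this logical-block view concrete, here is a minimal sketch in C of what reading one logical block looks like from user space.
The path, block number and block size are arbitrary examples (reading a raw device node also requires sufficient privileges), and error handling is kept to a minimum.

/* read_block.c — read one logical block from a disk image or raw device
 * (illustrative sketch only). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 4096                 /* assumed logical block size */

int main(void)
{
    const char *device = "disk.img";    /* arbitrary example: a disk image or a node such as /dev/sda */
    off_t block = 0x42A1;               /* logical block number, as in the text */

    int fd = open(device, O_RDONLY);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    unsigned char buf[BLOCK_SIZE];
    /* The offset is simply block_number * block_size: the storage is seen
     * as a flat array of fixed-size blocks, whatever the hardware is. */
    ssize_t n = pread(fd, buf, sizeof buf, block * (off_t)BLOCK_SIZE);
    if (n < 0) { perror("pread"); close(fd); return EXIT_FAILURE; }

    printf("read %zd bytes from logical block %#llx\n", n, (unsigned long long)block);
    close(fd);
    return EXIT_SUCCESS;
}

Whether those bytes come from a magnetic platter or a NAND chip is invisible at this level.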
However, the physical reality of these devices is far more complex.
The operating system's driver layer bridges the gap between filesystems and devices: it translates block requests from the filesystem into physical block addresses\*[*] and device-specific commands (AHCI).
.FOOTNOTE1
Filesystems indicate block numbers relative to the start of the partition, whereas devices require block numbers relative to the device's first block.
There may be an offset due to a partition table at the beginning of the disk, for example.
.FOOTNOTE2
Drivers can also introduce optimizations; for SSDs, this could mean remapping blocks to distribute wear.
Crucially, this complexity is hidden from the filesystem… well, almost.
Despite the apparent separation of duties between filesystems, drivers and devices, the physical reality is complex and so are the ramifications.
HDDs store data on spinning platters divided into tracks and sectors, with read/write heads moving mechanically to access specific locations.
This introduces latency due to seek times and rotational delays.
This introduces latency due to seek times and rotational delays, a problem amplified, for example, when the requested blocks lie at opposite ends of the disk.
Meanwhile, SSDs use arrays of flash memory cells organized into pages and blocks, with no moving parts but constrained by erase-before-write cycles and wear-leveling requirements.
The operating system's driver layer bridges this gap: it translates logical block requests into device-specific commands.
For HDDs, this might involve calculating cylinder-head-sector (CHS) geometries; for SSDs, it could mean remapping blocks to distribute wear.
Crucially, this complexity is hidden from the file system, which operates on the logical block abstraction.
However, raw storage devices and device drivers cannot organize data efficiently all by themselves.
Filesystems are designed to overcome some physical limitations of the devices, for example to reduce seek times in a hard drive by distributing data in a
Thus, filesystems need to be designed to overcome some physical limitations of the devices, for example to reduce seek times in a hard drive by distributing data in a
.dq packed
way that minimizes the disk's head movements.
Thus, filesystems, drivers and devices are intertwined and share responsibility for performance and device durability even though they work at different levels.
way (avoiding fragmentation) that minimizes the disk's head movements.
Raw storage devices and device drivers cannot organize data efficiently all by themselves.
Therefore, filesystems, drivers and devices are intertwined and share responsibility for performance and device durability even though they work at different levels.
The file system builds upon logical blocks of data, adding structure and hierarchy (files and directories) to raw blocks.
From the user perspective, the file system brings some semantics.
A filesystem manages:
.STARTBULLET
.BULLET organization: mapping files and directories to sequences of blocks;
.BULLET metadata: tracking ownership, permissions, and block allocation;
.BULLET optimization: minimizing HDD seek times or SSD wear.
.ENDBULLET
To summarize, the filesystem builds upon logical blocks of data, adding structure and hierarchy (files and directories) to raw blocks.
The filesystem also manages metadata to track ownership, permissions and block allocation.
Moreover, the filesystem provides some optimizations such as minimizing HDD seek times or SSD wear.
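As an illustration of the metadata mentioned above, the following short C sketch prints a few of the fields the filesystem keeps for each file through the stat(2) interface: ownership, permissions, size, allocated blocks and the last modification time.
The file name is an arbitrary example.

/* show_meta.c — print a few metadata fields the filesystem tracks for a file. */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "example.txt";   /* arbitrary example file */
    struct stat st;

    if (stat(path, &st) != 0) { perror("stat"); return 1; }

    printf("owner uid:          %ld\n", (long)st.st_uid);      /* ownership        */
    printf("permissions:        %o\n", st.st_mode & 0777);     /* permissions      */
    printf("size:               %lld bytes\n", (long long)st.st_size);
    printf("allocated:          %lld blocks of 512 bytes\n",   /* block allocation */
           (long long)st.st_blocks);
    printf("preferred I/O size: %ld bytes\n", (long)st.st_blksize);
    printf("last modification:  %s", ctime(&st.st_mtime));
    return 0;
}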
The following figure summarizes the relations between filesystems, drivers and devices.
@ -1209,45 +1208,48 @@ reset
copy "filesystem-driver-device.pic"
.PE
.QP
This figure is a simplification; real-life device drivers do not use the Cylinder-Head-Sector (CHS) scheme anymore.
The complexity of handling actual head movements shifted towards the devices.
Users interact with the filesystem through files and directories (along with some metadata over them).
The filesystem handles this abstract view and maps it to its own representation of the disk (the list of blocks it manages under the hood).
The filesystem requests block operations from the driver, which translates the block number (relative to the partition) into the physical block number\*[*] (relative to the device's first block).
Finally, the driver sends AHCI commands to the device.
.QE
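The translation performed by the driver can be sketched with simple arithmetic.
The C fragment below uses made-up values chosen to echo the figure (a partition assumed to start at LBA 0x2048 and, for simplicity, one filesystem block assumed to map to exactly one device sector) and ignores everything a real driver does besides the offset computation.

/* lba_translation.c — naive sketch of the block-number translation a driver performs. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t partition_start_lba = 0x2048;  /* made-up partition offset on the device    */
    uint64_t fs_block            = 0x10;    /* block number requested by the filesystem  */

    /* The driver adds the partition offset to obtain a device-wide address,
     * which is then sent to the device as an AHCI read command. */
    uint64_t lba = partition_start_lba + fs_block;

    printf("filesystem block %#llx -> READ %#llx (LBA)\n",
           (unsigned long long)fs_block, (unsigned long long)lba);
    return 0;
}

In reality, a filesystem block usually spans several device sectors, so the block number would also be multiplied by the number of sectors per block before the offset is added.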
Most filesystems have some
.dq "special files"
called
.I inodes
(hidden from the user) to keep track of
.I where
the files are on the disk, their size, the last time they were modified or accessed and other metadata.
A filesystem is split into a list of
.I blocks
of a certain size\*[*] (4 kilobytes by default on ext4).
.FOOTNOTE1
Working only with blocks (from 0 to x) is called
.dq "Logical block addressing" .
Before that, other schemes were used such as
.I cylinder-head-sector
(CHS) but this is fairly obsolete since even hard disks do not use this anymore.
.FOOTNOTE2
Since all files cannot be reasonably expected to be written in a continuous segment of data, inodes store the block numbers where the file has been written.
Filesystems may allow tweaking the block size (related to the
.I "sector"
size of the storage device) either to reduce fragmentation and metadata (bigger block sizes to partitions with big files) or to avoid wasting space (smaller block sizes to partitions with a huge number of files under the size of a block).
Filesystems designed for specific constraints, such as writing data on a compact disk\*[*] or providing a network filesystem, are out-of-scope of this document.
.FOOTNOTE1
A compact disk has specific constraints since the device will then only provide read-only access to the data, obviating the need for most of the complexity revolving around fragmentation, inode management and so on.
All storage devices have their own particularities, but regular hard drives and solid-state drives are the important ones for this discussion since filesystems have mostly been designed for them.
.FOOTNOTE2
The rest of this section will address more
.dq generic
filesystems\*[*] unless explicitly stated otherwise.
.FOOTNOTE1
Furthermore, the rich history behind filesystems is inherently related to the rich history of storage devices, this document is not supposed to be a survey on either of those.
Let's keep it short and simple.
(CHS), but this is fairly obsolete: hard disks do not use it anymore.
The complexity of handling actual head movements shifted towards the devices.
.FOOTNOTE2
Finally, a file's metadata is stored in a
.dq "special file"
called
.I inode
(hidden from the user) to keep track of
.I where
the file is stored on the disk (its block addresses\*[*]), its size, the last time it was modified or accessed, etc.
.FOOTNOTE1
Since the filesystem's block size is rather small (4 kilobytes by default on ext4), files cannot reasonably be expected to fit in a single block.
Furthermore, files are often written in a non-contiguous sequence of blocks to avoid losing space.
Thus, inodes store the block numbers where the file has been written.
.FOOTNOTE2
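As a rough illustration (a hypothetical layout, not the actual on-disk format of ext4 or any other filesystem), an inode can be pictured as a small fixed-size record along these lines.

/* Hypothetical, simplified inode layout (illustration only; real
 * filesystems use richer structures and more addressing levels). */
#include <stdint.h>

#define DIRECT_BLOCKS 12

struct inode {
    uint16_t mode;                    /* file type and permissions              */
    uint16_t uid;                     /* owner                                  */
    uint16_t gid;                     /* group                                  */
    uint64_t size;                    /* file size in bytes                     */
    uint64_t atime;                   /* last access time                       */
    uint64_t mtime;                   /* last modification time                 */
    uint32_t blocks[DIRECT_BLOCKS];   /* block numbers where the data lives     */
    uint32_t indirect_block;          /* block containing further block numbers,
                                         for files larger than 12 blocks        */
};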
Filesystems may allow tweaking the block size, either to reduce fragmentation and metadata (bigger blocks for partitions holding big files) or to avoid wasting space (smaller blocks for partitions holding many small files).
Also, the block size ultimately imposes a limit on the possible number of files on a partition.
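To give a rough, made-up order of magnitude: storing 100,000 files of 500 bytes each consumes at least 100,000 × 4 KiB ≈ 390 MiB with 4 KiB blocks, against roughly 97 MiB with 1 KiB blocks, while a single 10 GiB file requires about 2.6 million block references with 4 KiB blocks and four times as many with 1 KiB blocks.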
.
.\" Filesystems designed for specific constraints, such as writing data on a compact disk\*[*] or providing a network filesystem, are out-of-scope of this document.
.\" .FOOTNOTE1
.\" A compact disk has specific constraints since the device will then only provide read-only access to the data, obviating the need for most of the complexity revolving around fragmentation, inode management and so on.
.\" All storage devices have their own particularities, but regular hard drives and solid-state drives are the important ones for this discussion since filesystems have mostly been designed for them.
.\" .FOOTNOTE2
.\" The rest of this section will address more
.\" .dq generic
.\" filesystems\*[*] unless explicitely stated otherwise.
.\" .FOOTNOTE1
.\" Furthermore, the rich history behind filesystems is inherently related to the rich history of storage devices, this document is not supposed to be a survey on either of those.
.\" Let's keep it short and simple.
.\" .FOOTNOTE2
.
.SSS "Objectives of a filesystem"
Filesystems share a (loosely) common set of objectives.
@ -1442,10 +1444,11 @@ In a desktop environment this technique isn't viable, users usually just rewrite
However, considering a data management library, this method to ensure data integrity is a no-brainer.
.FOOTNOTE2
.KE
.QP
This table gives an overview of some (mostly shared) DBMS and filesystem features.
Real deployments may involve a whole range of tools, including a mix of both of these solutions.
For example, key-value databases are often used as caches in front of DBMSs to massively speed up data retrieval.
.QE
The main difference between DBMSs and filesystems is the
.I consistency
@ -1671,7 +1674,7 @@ This section presents all the features I want to see in a future version of the
.SS New types of storage facility
The Log-Structured Merge-Tree algorithm is interesting for databases with intensive updates.
Database modifications are deferred to be written sequentially, greatly improving the throughput.
Implementing this algorithm in DODB (while still keeping an eye on the code complexity) could open new possibilities, bringing DODB to a new class of usage.
Adding an alternative storage facility to DODB that implements this algorithm could open new possibilities, bringing DODB to a new class of usage.
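To give an idea of the principle, here is a hedged sketch in C of the general LSM technique (not a design proposal for DODB, whose implementation language and API differ): modifications are buffered in an in-memory table and, once the table is full, written out sequentially, sorted by key, into an immutable segment file.
The keys, values and sizes below are arbitrary examples.

/* lsm_sketch.c — minimal illustration of the Log-Structured Merge idea:
 * writes are buffered in memory, then flushed sequentially to a sorted,
 * immutable segment file.  Reads, compaction and crash safety are omitted. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MEMTABLE_CAPACITY 4            /* tiny on purpose, to force a flush */

struct entry { char key[32]; char value[64]; };

static struct entry memtable[MEMTABLE_CAPACITY];
static int memtable_len = 0;
static int segment_id = 0;

static int compare_entries(const void *a, const void *b)
{
    return strcmp(((const struct entry *)a)->key, ((const struct entry *)b)->key);
}

/* Write the whole memtable at once, sorted by key: one sequential write
 * instead of many scattered in-place updates. */
static void flush_memtable(void)
{
    char path[64];
    snprintf(path, sizeof path, "segment-%04d.dat", segment_id++);

    qsort(memtable, memtable_len, sizeof memtable[0], compare_entries);

    FILE *f = fopen(path, "wb");
    if (!f) { perror("fopen"); exit(EXIT_FAILURE); }
    fwrite(memtable, sizeof memtable[0], memtable_len, f);
    fclose(f);

    printf("flushed %d entries to %s\n", memtable_len, path);
    memtable_len = 0;
}

static void put(const char *key, const char *value)
{
    snprintf(memtable[memtable_len].key, sizeof memtable[0].key, "%s", key);
    snprintf(memtable[memtable_len].value, sizeof memtable[0].value, "%s", value);
    if (++memtable_len == MEMTABLE_CAPACITY)
        flush_memtable();              /* deferred, sequential write */
}

int main(void)
{
    put("doc-0042", "{...}");          /* arbitrary example documents */
    put("doc-0007", "{...}");
    put("doc-0013", "{...}");
    put("doc-0099", "{...}");          /* triggers a flush of 4 sorted entries */
    if (memtable_len > 0)
        flush_memtable();
    return 0;
}

A real implementation would also keep a write-ahead log for crash safety and periodically merge (compact) the segment files, which is where most of the added complexity would come from.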
Storing data in separate files, as is currently done, is great in many respects but becomes cumbersome with large databases.
One way to enable large databases in DODB could be to add a new storage class that works differently, but this would inevitably introduce complexity.