Compare commits
10 commits
03ae174112
...
69fc674a2a
| Author | SHA1 | Date | |
|---|---|---|---|
| | 69fc674a2a | | |
| | cdfaf3a006 | | |
| | bed189deba | | |
| | c04104dce1 | | |
| | 7e20de3d0e | | |
| | a10013f6a4 | | |
| | 9c98dd33ce | | |
| | 5dbc282027 | | |
| | f4ab8154f9 | | |
| | 1c6ea90389 | | |
1 changed files with 384 additions and 64 deletions
448
paper/paper.ms
|
|
@ -74,13 +74,16 @@ Document sync'ed with DODB \*[VERSION]
|
|||
.fi
|
||||
.br
|
||||
.po
|
||||
.B "Status of this document" .
|
||||
Although fairly advanced, this document still lacks a few reviews, a complete discussion about filesystems (section not entirely finished), a presentation of alternatives to DODB and a final conclusion.
|
||||
|
||||
.SECTION Introduction to DODB
|
||||
A database consists in managing data, enabling queries to add, to retrieve, to modify and to delete a piece of information.
|
||||
These actions are grouped under the acronym CRUD: creation, retrieval, update and deletion.
|
||||
CRUD operations are the foundation for the most basic databases.
|
||||
Yet, almost every single database engine goes far beyond this minimalistic set of features.
|
||||
Of course, almost every single database engine goes far beyond this minimalistic set of features.
|
||||
|
||||
Although everyone using the filesystem of their computer as some sort of database (based on previous definition) by storing raw data (files) in a hierarchical manner (directories), computer science classes introduce a particularly convoluted way of managing data.
|
||||
Although everyone uses the filesystem of their computer as some sort of database (based on the previous definition) by storing raw data (files) in a hierarchical manner (directories), computer science classes introduce a particularly convoluted way of managing data.
|
||||
Universities all around the world teach about Structured Query Language (SQL) and relational databases.
|
||||
These two concepts are closely interlinked and require a brief explanation.
|
||||
|
||||
|
|
@ -141,12 +144,13 @@ And so on.
|
|||
For many reasons, SQL is not a silver bullet to
|
||||
.I solve
|
||||
the database problem.
|
||||
The encountered difficulties mentioned above and the original objectives of SQL not being universal\*[*], other database designs were created\*[*].
|
||||
The encountered difficulties mentioned above and the original objectives of SQL not being universal\*[*],
|
||||
.FOOTNOTE1
|
||||
To say the least!
|
||||
Not everyone needs to let users access the database without going through the application.
|
||||
For instance, writing a \f[I]blog\f[] for a small event or to share small stories about your life doesn't require manual operations on the database, fortunately.
|
||||
.FOOTNOTE2
|
||||
other database designs were created\*[*].
|
||||
.FOOTNOTE1
|
||||
A lot of designs won't be mentioned here.
|
||||
The actual history of databases is often quite unclear since the categories of databases are sometimes vague, underspecified.
|
||||
|
|
@ -168,22 +172,45 @@ And that's exactly what is being done in Document Oriented DataBase (DODB).
|
|||
|
||||
.UL "The stated goal of DODB"
|
||||
is to provide a simple and easy-to-use
|
||||
.UL library
|
||||
for developers to perform CRUD operations on documents (undescribed data structures).
|
||||
DODB aims basic to medium-sized projects, up to a few million entries\*[*].
|
||||
.UL library \*[*]
|
||||
for developers to store documents (undescribed data structures).
|
||||
.FOOTNOTE1
|
||||
Or as people might call it:
|
||||
.dq "serverless architecture" .
|
||||
.FOOTNOTE2
|
||||
|
||||
.STARTBULLET
|
||||
.KS
|
||||
.BULLET
|
||||
.B Simple ,
|
||||
because the approach is indeed trivial: the database entries are written as simple files in a directory.
|
||||
This simplicity has a snowballing effect: it only requires a few dozen lines of code.
|
||||
DODB is implemented in only a thousand lines of code in total, despite including optional features and optimized alternative implementations to make the library efficient and cover most cases.
|
||||
.KE
|
||||
|
||||
DODB doesn't strive to be minimalistic, but it avoids intermediary language and low-level optimizations.
|
||||
Storing data is writing a file.
|
||||
Indexing data is making symbolic links.
|
||||
It is that simple.
|
||||
|
||||
.KS
|
||||
.BULLET
|
||||
.B Easy-to-use ,
|
||||
because the API is high-level and doesn't take any superfluous parameter.
|
||||
Creating a database only requires a path, updating an entry only requires the new version of the entry, and so on.
|
||||
Everything is designed to be enjoyable for the developers.
|
||||
.KE
|
||||
.ENDBULLET
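As an illustration of the two points above, here is a minimal sketch of the approach: storing an entry is writing a file, indexing it is making a symbolic link, and creating the database only requires a path.
It is written in Python rather than Crystal for brevity, and every name in it (TinyStore, the data and indexes/by_name directories, the zero-padded file names) is an assumption made for this example, not the actual DODB API or on-disk layout.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical file-backed store: one file per entry, symlinks as index."""

    def __init__(self, path):
        self.data_dir = os.path.join(path, "data")
        self.index_dir = os.path.join(path, "indexes", "by_name")
        os.makedirs(self.data_dir, exist_ok=True)
        os.makedirs(self.index_dir, exist_ok=True)
        # Auto-incremented key; this sketch has no deletion, so counting
        # the existing entries is enough to find the next free key.
        self.next_key = len(os.listdir(self.data_dir))

    def _entry_path(self, key):
        return os.path.join(self.data_dir, f"{key:010d}.json")

    def add(self, document):
        key = self.next_key
        self.next_key += 1
        entry = self._entry_path(key)
        with open(entry, "w") as f:  # storing data is writing a file
            json.dump(document, f)
        # Indexing data is making a symbolic link named after the indexed value.
        link = os.path.join(self.index_dir, document["name"])
        os.symlink(os.path.relpath(entry, self.index_dir), link)
        return key

    def get(self, key):
        with open(self._entry_path(key)) as f:
            return json.load(f)

    def get_by_name(self, name):
        # Opening the symlink transparently follows it to the entry file.
        with open(os.path.join(self.index_dir, name)) as f:
            return json.load(f)


with tempfile.TemporaryDirectory() as root:
    db = TinyStore(root)
    key = db.add({"name": "corvet", "color": "red"})
    first = db.get(key)
    same = db.get_by_name("corvet")
```

Since the index is a plain directory of symbolic links, the same lookup can be done outside the application with ordinary tools such as ls or cat.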
|
||||
|
||||
DODB aims for small and medium-size projects\*[*], up to a few hundred million entries with commodity hardware.
|
||||
.FOOTNOTE1
|
||||
There is no real hard limit besides the underlying filesystem: DODB can accept billions of entries.
|
||||
.br
|
||||
See the section
|
||||
.dq "Limits of DODB" .
|
||||
.FOOTNOTE2
|
||||
Code simplicity implies hackability.
|
||||
Traditional SQL relational databases have a snowballing effect on code complexity, including for applications with basic requirements.
|
||||
However, DODB may be a great starting point to implement more sophisticated features for creative minds.
|
||||
|
||||
.UL "The non-goals of DODB"
|
||||
are:
|
||||
.STARTBULLET
|
||||
.BULLET to provide a generic library w
|
||||
.ENDBULLET
|
||||
Its simplicity (approach and code) makes any modification for specific needs trivial.
|
||||
DODB may be a great starting point to implement more sophisticated features for creative minds.
|
||||
|
||||
.UL "Contrary to SQL" ,
|
||||
DODB has a very narrow scope: to provide a library to store, retrieve, modify and delete data.
|
||||
|
|
@ -192,8 +219,11 @@ DODB doesn't provide an interactive shell, there is no request language to perfo
|
|||
Instead, DODB reduces the complexity of the infrastructure, stores data in plain files and enables simple manual scripting with widespread unix tools.
|
||||
Simplicity is key.
|
||||
|
||||
Traditional SQL relational databases have a snowballing effect on code complexity, even for applications with basic requirements.
|
||||
Furthermore, data description in tables and relations is not intuitive contrary to storing whole documents which is simply serializing structures used in the code.
|
||||
|
||||
.UL "Contrary to other NoSQL databases" ,
|
||||
DODB doesn't provide an application but a library, nothing else.
|
||||
DODB isn't an application but a library.
|
||||
The idea is to help developers to store their data themselves, not depending on
|
||||
.I yet-another-all-in-one
|
||||
massive tool.
|
||||
|
|
@ -202,7 +232,7 @@ The library writes (and removes) data on a storage device, has a few retrieval a
|
|||
The lack of features
|
||||
.I is
|
||||
the feature.
|
||||
Even with that motto, the tool still is expected to be convenient for most applications.
|
||||
Yet, the tool is expected to be convenient for most applications.
|
||||
.FOOTNOTE2
|
||||
|
||||
Section 2 provides extensive documentation on how DODB works and how to use it.
|
||||
|
|
@ -221,8 +251,8 @@ Finally, section 12 provides a conclusion.
|
|||
.SECTION How DODB works and basic usage
|
||||
DODB is a hash table.
|
||||
The key of the hash is an auto-incremented number and the value is the stored data.
|
||||
The following section will explain how to use DODB for basic cases including the few added mechanisms to speed-up searches.
|
||||
Also, the filesystem representation of the data will be presented since it enables easy off-application searches.
|
||||
This section explains how to use DODB for basic cases, including the few added mechanisms to speed up searches.
|
||||
Also, the filesystem representation of the data is presented since it enables easy off-application searches.
|
||||
|
||||
The presented code is in Crystal, like the DODB library itself.
|
||||
Keep in mind that this document is all about the method more than the current implementation.
|
||||
|
|
@ -995,12 +1025,13 @@ With Postgres, the request duration of a single value varies from 0.1 to 2 ms on
|
|||
.
|
||||
.
|
||||
.SECTION Limits of DODB
|
||||
DODB provides basic database operations such as storing, retrieving, modifying and removing data.
|
||||
However, DODB doesn't fully handle ACID properties\*[*]: atomicity, consistency, isolation and durability.
|
||||
This section presents the limits of DODB, whether the current implementation or the approach, and presents some suggestions to fill the gaps.
|
||||
DODB provides basic database operations such as storing, retrieving, modifying and removing data but doesn't fully handle ACID properties nor a few other aspects generally associated with databases\*[*].
|
||||
.FOOTNOTE1
|
||||
Traditional SQL databases handle ACID properties and may have created some "expectations" towards databases from a general public standpoint.
|
||||
Traditional SQL databases may have created some "expectations" towards databases from a general public standpoint, such as the ACID properties (atomicity, consistency, isolation and durability), transactions and replication.
|
||||
.FOOTNOTE2
|
||||
This section presents the limits of DODB, whether the current implementation or the approach.
|
||||
The state of filesystems will be discussed since DODB heavily relies on the underlying filesystem.
|
||||
Finally, this section presents some suggestions to fill the gaps with traditional databases on a few points.
|
||||
|
||||
.SS "Current state of DODB regarding ACID properties"
|
||||
.STARTBULLET
|
||||
|
|
@ -1032,19 +1063,17 @@ These new triggers could record user-defined procedures to perform database veri
|
|||
.BULLET
|
||||
.B Isolation
|
||||
is partially taken into account with a locking mechanism preventing race conditions when modifying a value.
|
||||
This may be seen as simplistic but
|
||||
.SHINE "good enough"
|
||||
for most applications.
|
||||
This may be seen as simplistic but good enough for most applications.
|
||||
|
||||
.BULLET
|
||||
.B Durability
|
||||
is taken into account.
|
||||
Data is written on disk each time it changes.
|
||||
Again, this is basic but
|
||||
.SHINE "good enough"
|
||||
for most applications.
|
||||
Data checksums are delegated to the filesystem or external tools.
|
||||
Again, this is basic but good enough for most applications.
|
||||
|
||||
A future improvement could be to write a checksum for every file to detect corrupt data, but this overlaps with some filesystems which already provide this feature.
|
||||
.ENDBULLET
|
||||
A future improvement could be to write a checksum for every written data, to easily remove corrupt data from a database.
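The checksum idea mentioned above could look like the following sketch, which records a SHA-256 digest next to each entry and compares it on verification.
This is only an assumption about how such a feature might be built: the sidecar file suffix and the function names are hypothetical, and the example is in Python rather than Crystal.

```python
import hashlib
import os
import tempfile


def write_with_checksum(path, payload: bytes):
    """Write the entry, then a sidecar file holding its SHA-256 digest."""
    with open(path, "wb") as f:
        f.write(payload)
    with open(path + ".sha256", "w") as f:
        f.write(hashlib.sha256(payload).hexdigest())


def is_intact(path) -> bool:
    """Recompute the digest of the entry and compare it with the recorded one."""
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    with open(path + ".sha256") as f:
        return f.read() == actual


with tempfile.TemporaryDirectory() as root:
    entry = os.path.join(root, "0000000000.json")
    write_with_checksum(entry, b'{"name": "corvet"}')
    before = is_intact(entry)
    with open(entry, "wb") as f:  # simulate a silent corruption of the entry
        f.write(b'{"name": "c0rvet"}')
    after = is_intact(entry)
```

Filesystems with built-in checksums (BTRFS or ZFS, for example) make this redundant, which is the overlap mentioned above.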
|
||||
|
||||
.SS "Discussion on ACID properties"
|
||||
First and foremost, both atomicity and isolation properties are inherently related to parallelism, whether through concurrent threads or applications.
|
||||
|
|
@ -1055,13 +1084,13 @@ Therefore, DODB could theoretically serve millions of requests per second from a
|
|||
.FOOTNOTE1
|
||||
FYI, the service
|
||||
.I netlib.re
|
||||
uses DODB and since the database is fast enough, parallelism isn't required despite enabling several thousand requests per second.
|
||||
uses DODB and since the database is fast enough, parallelism isn't required despite enabling several thousand requests per second in a virtual machine on low-end hardware released almost two decades ago.
|
||||
.FOOTNOTE2
|
||||
Considering this swiftness, parallelism may seem optional.
|
||||
|
||||
The consistency property is a safety net for potentially defective software.
|
||||
Always nice to have, but not entirely necessary, especially for document-oriented databases.
|
||||
Contrary to a traditional SQL database which often requires several modifications to different tables in one go to be kept consistent, a document-oriented database stores an entire document which already is internally consistent.
|
||||
Contrary to a traditional SQL database which often requires several modifications of different tables in one go to be kept consistent, a document-oriented database stores an entire document which already is internally consistent.
|
||||
When several documents are involved (which happens from time to time), consistency needs to be checked, but this may not require much code\*[*].
|
||||
Not checking systematically for consistency upon any database modification is a tradeoff between simplicity of the code plus speed, and security.
|
||||
.FOOTNOTE1
|
||||
|
|
@ -1071,8 +1100,9 @@ Database verifications are just the last bastion against inserting junk data.
|
|||
|
||||
Moreover, the consistency property in traditional SQL databases is often used for simple tasks but quickly becomes difficult to deal with.
|
||||
Some companies and organizations (such as Doctors Without Borders for instance) cannot afford to implement all the preventive measures in their DBMSs due to the sheer complexity of it.
|
||||
Instead, these organizations adopt curative measures that they may call "data-fix".
|
||||
Thus, having some verifications in the database is not a silver bullet, it is complementary to other measures.
|
||||
Instead, these organizations adopt curative measures that they may call
|
||||
.dq data-fix .
|
||||
Having verifications in the database is not a silver bullet but a complementary measure at most.
|
||||
|
||||
DODB may provide some form of atomicity and consistency at some point, but nothing fancy nor too advanced.
|
||||
The whole point of the DODB project is to keep the code simple, hackable, enjoyable even.
|
||||
|
|
@ -1082,11 +1112,42 @@ Which also results from a lack of time.
|
|||
.FOOTNOTE2
|
||||
|
||||
.SS "Beyond ACID properties \[en] modern databases' features"
|
||||
Most current databases (traditional relational databases, some key-value databases and so on) provide additional features.
|
||||
These features may include for example high availability toolsets (replication, clustering, etc.), some forms of modularity (several storage backends, specific interfaces with other tools, etc.), interactive command lines or shells, user and authorization management, administration of databases, and so on.
|
||||
Most current databases (traditional relational databases, some key-value databases and so on) provide additional features that need to be addressed.
|
||||
|
||||
Because DODB is a library and doesn't support an intermediary language for generic requests,
|
||||
.TBD
|
||||
.STARTBULLET
|
||||
.KS
|
||||
.BULLET
|
||||
.B "High availability toolsets"
|
||||
(replication, clustering, etc.).
|
||||
This is out-of-scope.
|
||||
These toolsets don't match the goal of DODB to begin with, which is to provide a database for small projects.
|
||||
The author of this document did not explore this idea and probably never will.
|
||||
.KE
|
||||
|
||||
A (maybe limited) version of these features could be provided by the filesystem itself.
|
||||
For example, CephFS is a filesystem designed for replication, fault tolerance, large-scale deployment and so on.
|
||||
|
||||
.KS
|
||||
.BULLET
|
||||
.B Modularity .
|
||||
Traditional DBMSs often have several storage backends to meet the needs in different contexts.
|
||||
Technically, DODB already implements several storage backends since the DODB RAMOnly database doesn't record data on a storage device contrary to the other implementations.
|
||||
More importantly, the definition of a database in DODB is simple enough to consider developing a specialized backend for any specific need.
|
||||
The RAMOnly database only has 33 lines of code and is a great starting point for more complex implementations.
|
||||
.KE
|
||||
|
||||
Also, traditional DBMSs may have specific interfaces with other tools, for example to delegate a feature to external software such as Elasticsearch for complex requests on strings (which may require some sophisticated text analysis).
|
||||
There is no facility in DODB to provide this; however, providing data to an external tool could be as simple as implementing a new trigger, which could be achieved in a few dozen lines.
|
||||
|
||||
.KS
|
||||
.BULLET
|
||||
.B "Database administration" .
|
||||
Traditional databases can be managed through command lines or a dedicated shell, enabling interactive CRUD on databases (and tables) themselves, user and authorization management, etc.
|
||||
DODB cannot, for the very same reason it came into existence: enabling this kind of tooling implies an enormous amount of code and complexity, obfuscating core database operations that should be both understandable and customizable.
|
||||
.KE
|
||||
.ENDBULLET
|
||||
|
||||
In conclusion, the "missing" features are either irrelevant in the context of DODB or simple enough to implement and customize to one's needs.
|
||||
.
|
||||
.SS "The state of file systems, their limitations and useful features for DODB instances"
|
||||
A
|
||||
|
|
@ -1097,15 +1158,46 @@ The next paragraphs will give an idea of how filesystems work, the implied limit
|
|||
.FOOTNOTE1
|
||||
Explaining the way filesystems work and their design is out of the scope of this document, so this part will be kept short for readability reasons.
|
||||
.FOOTNOTE2
|
||||
|
||||
Beside filesystems designed for specific constraints, such as writing data on a compact disk\*[*] or providing a network filesystem, most
|
||||
.dq generic
|
||||
filesystems share a (loosely) common set of objectives.
|
||||
.
|
||||
.SSS "How a filesystem works"
|
||||
Filesystems designed for specific constraints, such as writing data on a compact disk\*[*] or providing a network filesystem, are outside the scope of this document.
|
||||
.FOOTNOTE1
|
||||
A compact disk has specific constraints since the device will then only provide read-only access to the data, obviating the need for most of the complexity revolving around fragmentation, inode management and so on.
|
||||
All storage devices have their own particularities, but regular hard drives and solid-state drives are the important ones for this discussion since filesystems have mostly been designed for them.
|
||||
.FOOTNOTE2
|
||||
These features could be summarized in a few points.
|
||||
The rest of this section will address more
|
||||
.dq generic
|
||||
filesystems\*[*] unless explicitly stated otherwise.
|
||||
.FOOTNOTE1
|
||||
Furthermore, the rich history behind filesystems is inherently related to the rich history of storage devices; this document is not supposed to be a survey on either of those.
|
||||
Let's keep it short and simple.
|
||||
.FOOTNOTE2
|
||||
|
||||
Filesystems have similarities in their inner workings.
|
||||
For instance, they have some
|
||||
.dq "special files"
|
||||
called
|
||||
.I inodes
|
||||
(hidden from the user) to keep track of
|
||||
.I where
|
||||
the files are on the disk, their size, the last time they were modified or accessed and other metadata.
|
||||
A filesystem is split into a list of
|
||||
.I blocks
|
||||
of a certain size\*[*] (4 kilobytes by default on ext4).
|
||||
.FOOTNOTE1
|
||||
Working only with blocks (from 0 to x) is called
|
||||
.dq "Logical block addressing" .
|
||||
Before that, other schemes were used such as
|
||||
.I cylinder-head-sector
|
||||
(CHS) but this is fairly obsolete since even hard disks do not use this anymore.
|
||||
.FOOTNOTE2
|
||||
Since files cannot reasonably be expected to be written in a single contiguous segment, inodes store the numbers of the blocks where each file has been written.
|
||||
Filesystems may allow tweaking the block size (related to the
|
||||
.I "sector"
|
||||
size of the storage device) either to reduce fragmentation and metadata (bigger block sizes for partitions with big files) or to avoid wasting space (smaller block sizes for partitions with a huge number of files under the size of a block).
|
||||
.
|
||||
.SSS "Objectives of a filesystem"
|
||||
Filesystems share a (loosely) common set of objectives.
|
||||
|
||||
.STARTBULLET
|
||||
.KS
|
||||
|
|
@ -1114,19 +1206,16 @@ These features could be summarized in a few points.
|
|||
Above all, as already established, filesystems enable CRUD operations on a storage device through the concepts of directories and files; this is how users have been directly interacting with their computer to store data for decades.
|
||||
.KE
|
||||
|
||||
.BULLET
|
||||
.KS
|
||||
.BULLET
|
||||
.B "Reliability and safety" .
|
||||
.TBD
|
||||
Since computers do not run in a vacuum, many problems can occur during operation including the loss of the energy supply.
|
||||
Filesystems try to mitigate damage by keeping a journal of operations (journalized filesystems).
|
||||
Advanced filesystems may also detect file corruption with automated checksums.
|
||||
Since applications do not run in a vacuum, many problems can occur during normal computer operation, including power loss, slow filesystem corruption due to a failing storage device, etc.
|
||||
Filesystems try to mitigate damage in different ways, such as keeping a journal of operations (journaling filesystems) to avoid losing files upon power loss, or detecting file corruption with automated checksums.
|
||||
.KE
|
||||
|
||||
.BULLET
|
||||
.KS
|
||||
.BULLET
|
||||
.B "Security" .
|
||||
.KE
|
||||
File access should be limited in a number of cases.
|
||||
For example, several applications with networking features might run on a computer.
|
||||
If one of these applications is successfully attacked, the attacker shouldn't be able to access other services' data or users' data.
|
||||
|
|
@ -1134,13 +1223,14 @@ Same thing for shared computers, one user shouldn't be able to see other users'
|
|||
Therefore, the most widespread form of security comes from filesystem permissions, enabling a user (or a group of users) to access (or to be denied from accessing) specific data (files and directories).
|
||||
Those permissions include the right to read, to modify or to execute a file, to list or to remove files from a directory, to create or to remove directories and a few other permissions.
|
||||
Extended permissions and attributes exist but are out-of-scope.
|
||||
.KE
|
||||
|
||||
Besides permissions, encryption also brings some kind of security.
|
||||
In this case, the point is to prevent attackers from accessing protected data despite retrieving files.
|
||||
Some advanced filesystems can encrypt files individually, others provide the encryption of a whole partition, both methods having their pros and cons.
|
||||
|
||||
.BULLET
|
||||
.KS
|
||||
.BULLET
|
||||
.B "Performance and capacity" .
|
||||
Many filesystems were developed over the years to circumvent contemporary limitations on file or partition sizes, the number of possible files, the limitation on path name lengths, etc.
|
||||
While storage devices mostly impose physical limitations, a filesystem may be wasting resources because of a simplistic or inadequate design.
|
||||
|
|
@ -1152,22 +1242,193 @@ So, worst case scenario, data rate is
|
|||
.FRAC 1 4000
|
||||
(huge waste), meaning that 1 GB of data would require an entire 4 TB hard drive\*[*] (without even taking the inodes' size into account).
|
||||
.FOOTNOTE1
|
||||
Ext4 can integrate up to 60 bytes of data into an extended inode.
|
||||
To slightly mitigate this, ext4 can integrate up to 60 bytes of data into an inode.
|
||||
.FOOTNOTE2
|
||||
|
||||
.BULLET
|
||||
.KS
|
||||
.B "Miscealeneous and advanced features" .
|
||||
A few other features need to be mentionned, such as block suballocation, file content included in the inode, etc.
|
||||
Some filesystems added more than a decade ago then under-explored features such as snapshots, compression and transactions.
|
||||
.BULLET
|
||||
.B "Miscellaneous and advanced features" .
|
||||
A few other features need to be mentioned, such as block suballocation\*[*], data inclusion in unused inode space\*[*] and compression for instance.
|
||||
Along with more advanced features such as snapshotting and transactions, they all represent incremental improvements of filesystems made over the years and which are now stable and available for the many.
|
||||
.KE
|
||||
.ENDBULLET
|
||||
.FOOTNOTE1
|
||||
Block suballocation saves some space by putting data from two different files in a single underused block.
|
||||
.FOOTNOTE2
|
||||
.FOOTNOTE1
|
||||
An inode may be bigger than what is strictly needed to index and retrieve a file, since inodes can store extended file attributes and such.
|
||||
In case this space isn't used for metadata, some filesystems allow using it to store the data of very small files directly (up to a few dozen bytes in ext4), reducing disk usage and indirections.
|
||||
.FOOTNOTE2
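The worst-case ratio quoted in the performance and capacity point can be checked with a few lines of arithmetic.
The sketch below (in Python) assumes a simplified model where every file occupies whole blocks of 4096 bytes, the ext4 default, and where inodes, suballocation and inline data are ignored; the function names are made up for this example.

```python
import math


def allocated_bytes(file_size, block_size=4096):
    """Bytes consumed on disk in this simplified model (at least one block per file)."""
    return max(1, math.ceil(file_size / block_size)) * block_size


def useful_ratio(file_size, block_size=4096):
    """Share of the allocated space that actually holds data."""
    return file_size / allocated_bytes(file_size, block_size)


one_byte = useful_ratio(1)                   # 1/4096, roughly the 1/4000 quoted above
hundred_bytes = useful_ratio(100)            # still under 3% of useful data
total_for_1gb = 10**9 * allocated_bytes(1)   # 1 GB stored as one-byte files
```

With these assumptions, total_for_1gb is 4.096 × 10^12 bytes, which matches the 4 TB figure; grouping related data in a single document instead of many tiny files moves the ratio much closer to 1.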
|
||||
.
|
||||
.KS
|
||||
.SSS "Exotic filesystems"
|
||||
Filesystems have been developed over the years for various reasons.
|
||||
Let's browse for a moment to provide an overview of what is possible.
|
||||
.KE
|
||||
|
||||
In conclusion, no current filesystem has been designed to be used the way DODB use them.
|
||||
However, having a few millions entries is fine on most filesystems.
|
||||
.B Kernel-related .
|
||||
A whole class of filesystems is dedicated to providing an interface to the kernel, such as
|
||||
.I procfs
|
||||
(information about running processes),
|
||||
.I sysfs
|
||||
(to tweak a few device parameters) or even
|
||||
.I debugfs
|
||||
(to provide debug info from the kernel to user-space).
|
||||
Providing information about the running system and enabling its modification through simple files and directories is a direct
|
||||
.dq "everything is a file"
|
||||
UNIX legacy.
|
||||
Data cannot be freely written, files are directly related to specific structures which only accept a finite set of possible values; consistency is preserved with verifications written in the drivers.
|
||||
|
||||
.B "Network-related" .
|
||||
Many filesystems were designed specifically to be remotely mounted, either to be shared amongst many people in a company, or to be part of a giant cluster to provide a high-availability storage solution for tech giants with peculiar requirements or just to stack ever more commodity computers together and provide a gigantic storage space.
|
||||
Filesystems can also be distributed with some replication in order to provide a fault-tolerant storage with ordinary computers sharing unused space.
|
||||
|
||||
.KS
|
||||
.B "UnionFS" .
|
||||
UnionFS (and its variants) is a filesystem enabling several filesystems to be mounted on the same mount-point and to show overlapping contents, enabling a read-only base image to be used together with persistent data for a specific instance.
|
||||
This way, a
|
||||
.dq "live-cd image"
|
||||
for an operating system can become persistent by storing modifications on a USB stick.
|
||||
.KE
|
||||
|
||||
UnionFS is a copy-on-write snapshotting filesystem on top of other filesystems.
|
||||
Docker uses it to save space.
|
||||
Since the software images that Docker distributes are ready-to-run containers, each bundling its own operating system environment, a base OS image is shared amongst all instances so each instance only stores its own specific files (binaries, configuration and dependencies) written in a separate storage volume.
|
||||
Thus, despite each software distribution requiring an entire operating system environment, the storage volume is kept reasonable.
|
||||
|
||||
.KS
|
||||
.B "Archivemount" .
|
||||
Archivemount mounts a compressed archive as a filesystem, which enables day-to-day tools to search for a file in the archive without the need to uncompress it first.
|
||||
.KE
|
||||
|
||||
.KS
|
||||
.B "RAM-based filesystems" .
|
||||
For temporary data, intensive read and write operations on a small storage volume or for filesystem development, a chunk of the computer memory can be used as a filesystem thanks to
|
||||
.B tmpfs
|
||||
and variants\*[*] (ramdisk and ramfs).
|
||||
.KE
|
||||
.FOOTNOTE1
|
||||
.B ramdisk
|
||||
creates a block file based on a chunk of RAM that needs to be formatted then mounted like any partition.
|
||||
.B ramfs
|
||||
mounts directly a RAM-based filesystem, without the need to format a fake partition.
|
||||
Finally,
|
||||
.B tmpfs ,
|
||||
the most flexible option, is used like ramfs but can be resized and only uses the amount of RAM necessary at a given point since memory is freed once a file is removed.
|
||||
.FOOTNOTE2
|
||||
|
||||
.KS
|
||||
.B "Semantic (tag-based) filesystems" .
|
||||
Some filesystems (such as tagsistant) store data based on tags for each file, which makes it possible to index a file based on many attributes and not a single path.
|
||||
As a side effect, searching for a file in this context can be done by computing the intersection of different tags\*[*].
|
||||
.KE
|
||||
.FOOTNOTE1
|
||||
Well well well… doesn't that sound like the DODB tag triggers?
|
||||
As if databases and filesystems were intertwined somehow…
|
||||
.FOOTNOTE2
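The tag intersection described above boils down to a set operation, as the following sketch shows.
The tag index and the file names are made up for the example (in Python); a tag-based filesystem or a tag trigger would build this mapping from directories of links rather than from an in-memory dictionary.

```python
# Hypothetical tag index: each tag maps to the set of files carrying it.
tags = {
    "red": {"corvet.json", "ferrari.json"},
    "fast": {"ferrari.json", "porsche.json"},
    "italian": {"ferrari.json", "fiat.json"},
}


def search(index, *wanted):
    """Return the files carrying every requested tag."""
    if not wanted:
        return set()
    # An unknown tag matches no file, hence the empty set as default.
    candidates = [index.get(tag, set()) for tag in wanted]
    return set.intersection(*candidates)


result = search(tags, "red", "fast")
```

Each additional tag can only shrink the result, so a search on several tags never costs more than scanning the smallest set involved.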
|
||||
|
||||
.KS
|
||||
.B "And many more" !
|
||||
Other specific filesystems may not be widespread like the ones mentioned above but they exist and are as exotic as the constraints in which they evolve.
|
||||
.KE
|
||||
.
|
||||
.KS
|
||||
.SSS "Quick comparison between DBMSs and filesystems"
|
||||
The following table shows the proximity between famous database systems and ordinary filesystems, both sharing a lot of features despite very different approaches.
|
||||
.ds OK \[OK]
|
||||
.ds NOK \[tmu]
|
||||
.nr total 16.0c
|
||||
.nr col1 3.0c
|
||||
.nr col2 (\n[total]-\n[col1])/6
|
||||
.nr col3 (\n[total]-\n[col1]-\n[col2])
|
||||
.\"total: \n[total]
|
||||
.\"col1: \n[col1]
|
||||
.\"col2: \n[col2]
|
||||
.\"col3: \n[col3]
|
||||
.TS
|
||||
allbox tab(:);
|
||||
c | c | c
|
||||
cw(\n[col1]u) | lw(\n[col2]u) | lw(\n[col3]u).
|
||||
Feature : DBMS : Filesystems
|
||||
CRUD operations : \*[OK] SQL :\*[OK] files & directories
|
||||
Atomicity : \*[OK] :T{
|
||||
\*[OK] locking mechanism based on files
|
||||
T}
|
||||
Consistency : \*[OK] :\*[OK] in specific filesystems (the kernel-related ones for example)
|
||||
Isolation : \*[OK] :T{
|
||||
\*[OK]
|
||||
.dq "new file then mv"
|
||||
technique\*[*]
|
||||
T}
|
||||
Durability : \*[OK] :\*[OK] checksums
|
||||
Access Time : 0.1 to 2ms :T{
|
||||
a few µs (cache), a few dozen µs (SSD+NVMe), a few hundred µs (SSD)
|
||||
and up to a dozen ms (hard disk)
|
||||
T}
|
||||
High avail. : \*[OK] :T{
|
||||
\*[OK] RAID & variants plus many distributed or cluster filesystems
|
||||
T}
|
||||
Transactions : \*[OK] :T{
|
||||
\*[OK] in a few filesystems (BTRFS, ZFS)
|
||||
T}
|
||||
Replication : \*[OK] :T{
|
||||
\*[OK] in many filesystems (BTRFS, ZFS, GlusterFS, etc.)
|
||||
T}
|
||||
Performance : \*[OK] :T{
|
||||
\*[OK] B-trees and variants (used in all modern FS: BTRFS, ext4, Reiser4, NTFS, HAMMER…) are used to search data on the storage device but also to get an entry in a huge directory
|
||||
T}
|
||||
Space waste :T{
|
||||
.ps -2
|
||||
\*[OK] almost none
|
||||
.ps
|
||||
T}:T{
|
||||
\*[NOK] generally significant on small data (there is room for improvement), which is why merely mimicking relational databases doesn't work well with the inner workings of current filesystems, whereas document-oriented databases (having a whole set of related data in a single file) make sense
|
||||
T}
|
||||
.TE
|
||||
.FOOTNOTE1
|
||||
In a desktop environment this technique isn't viable: users usually just rewrite data in-place.
|
||||
However, considering a data management library, this method to ensure data integrity is a no-brainer.
|
||||
.FOOTNOTE2
|
||||
.KE
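The "new file then mv" technique listed in the table can be sketched as follows: the new content is written to a temporary file in the same directory, then renamed over the target, so a reader sees either the old version or the new one, never a half-written file.
This is a generic illustration in Python using standard library calls, with a hypothetical function name; it is not a description of how DODB itself writes entries.

```python
import os
import tempfile


def atomic_write(path, payload: bytes):
    """Write to a temporary file in the same directory, then rename it over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())    # make sure the content reached the device
        os.replace(tmp_path, path)  # atomic on POSIX within a single filesystem
    except BaseException:
        os.unlink(tmp_path)         # do not leave a stray temporary file behind
        raise


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "entry.json")
    atomic_write(target, b'{"version": 1}')
    atomic_write(target, b'{"version": 2}')
    with open(target, "rb") as f:
        content = f.read()
    leftovers = os.listdir(root)
```

The temporary file has to live on the same filesystem as the target, otherwise the rename cannot be atomic; creating it in the same directory is the simplest way to guarantee this.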
|
||||
|
||||
This table shows an overview of some (mostly shared) DBMS and filesystem features.
|
||||
Real deployments may involve a whole range of tools, including a mix of both of these solutions.
|
||||
For example, key-value databases are often used as a cache for DBMSs to massively speed up data retrieval.
|
||||
|
||||
The main difference between DBMSs and filesystems is the
|
||||
.I consistency
|
||||
property.
|
||||
Filesystems are almost exclusively built to store undefined streams of data with a very wide range of different shapes (plain text, multimedia, documents, etc.) and sizes (from empty to multiple terabytes and more), thus no consistency verification can be reasonably implemented outside very specific contexts (such as kernel-related filesystems).
|
||||
.
|
||||
.
|
||||
.SECTION Alternatives
|
||||
.KS
|
||||
.SSS "Conclusion on filesystems"
|
||||
The gap between the feature sets of traditional databases and filesystems has narrowed slightly over time.
|
||||
The discrepancy will always be there since they do not share the same goal, yet some features overlap.
|
||||
Even though no current filesystem has been designed to be used the way DODB uses them, this kind of database system can benefit from some
|
||||
.dq recent
|
||||
developments in the filesystem world (such as transactions).
|
||||
.KE
|
||||
|
||||
Also, the codebase size (and complexity) necessary to create a database system that provides acceptable performance for a small project\*[*] has shrunk drastically thanks to hardware and filesystem developments.
|
||||
.FOOTNOTE1
|
||||
Besides CRUD operations, a small project could involve basic relations between data, some simple transactions, a few databases (or
|
||||
.I tables
|
||||
in DBMS jargon) and a few thousand operations per second.
|
||||
Both relations and transactions could be handled by the application, not necessarily by the database system itself.
|
||||
.FOOTNOTE2
|
||||
Performance simply isn't a problem for most uses nowadays.
|
||||
Having a directory with a few million entries is fine on modern filesystems since reaching a file by name (with its full path) doesn't trigger a linear search.
|
||||
The first file access is slow\*[*] then the kernel
|
||||
.B automatically
|
||||
caches the file, making it reachable in a few dozen µs, which is virtually nothing.
|
||||
.FOOTNOTE1
|
||||
The first access to a file on a hard drive can take as long as a few milliseconds, compared with about a hundred microseconds on an SSD.
|
||||
But today with the NVMe protocol, the latency to the first file access can be as low as a dozen microseconds.
|
||||
.FOOTNOTE2
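To illustrate how little code such a system can require, here is a minimal sketch of a file-per-document store in Python. It is not DODB's implementation: the class name `TinyDocStore`, the JSON serialization and the `.json` suffix are all assumptions made for the example. Each document lives in its own file, and retrieval by key is a direct path lookup, which the filesystem resolves through its directory index rather than a linear scan.

```python
import json
import os


class TinyDocStore:
    """Hypothetical file-per-document store, for illustration only."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Keys become file names, so reject empty or path-like keys.
        if not key or os.path.basename(key) != key:
            raise ValueError(f"invalid key: {key!r}")
        return os.path.join(self.root, key + ".json")

    def put(self, key: str, document: dict) -> None:
        """Create or update: write a temporary file, then rename it."""
        path = self._path(key)
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(document, f)
        os.replace(tmp, path)

    def get(self, key: str):
        """Retrieve a document, or None when the key does not exist."""
        try:
            with open(self._path(key), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def delete(self, key: str) -> bool:
        """Delete a document; return False when it did not exist."""
        try:
            os.unlink(self._path(key))
            return True
        except FileNotFoundError:
            return False
```

After the first read, the kernel page cache serves later `get` calls for the same key without touching the storage device, so the sketch needs no caching layer of its own.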
|
||||
.
|
||||
.
|
||||
.SECTION Alternative databases
|
||||
Other approaches have been used to store data over the years, including but not limited to SQL and key-value stores.
|
||||
This section briefly presents some of them and how they differ from DODB.
|
||||
|
||||
|
|
@ -1191,21 +1452,80 @@ These applications are inherently complex for different reasons.
|
|||
MariaDB has 2.3 million lines of code (MLOC) and Postgres has 1.7 MLOC.
|
||||
Other mentioned DBMSs aren't open-source software, but it seems reasonable to consider their number of LOC to be in the same ballpark.
|
||||
.br
|
||||
Just to put things into perspective, DODB is less than 1300 lines of code.
|
||||
Just to put things into perspective, DODB is just a thousand lines of code.
|
||||
Sure, DODB doesn't have the same features, but are they worth multiplying the codebase by 1700?
|
||||
.FOOTNOTE2
|
||||
|
||||
.BULLET
|
||||
.B "Embedded SQL database" .
|
||||
Example:
|
||||
.B SQLite .
|
||||
This is a library (for
|
||||
.dq "serverless"
|
||||
applications) implementing SQL for database operations, making it far more complex and slower than DODB.
|
||||
|
||||
Like SQLite,
|
||||
.B DuckDB
|
||||
is also a library, with a slightly different objective.
|
||||
Instead of being written and optimized to answer real-time requests, the goal is to perform operations on large data sets with a focus on analytical processing.
|
||||
And again, like SQLite, DuckDB implements SQL and sophisticated operations, making it far more complex than DODB.
|
||||
However, its stated goal is fairly different from the subject of this paper, which may explain its complexity.
|
||||
|
||||
.BULLET
|
||||
.B "New types of SQL" .
|
||||
Example:
|
||||
.B EdgeDB
|
||||
(modern SQL-like, with easy-to-use complex relations),
|
||||
.B RethinkDB
|
||||
(modern SQL-like, JSON exchanges, distributed).
|
||||
These new applications try to improve the SQL language with the benefit of hindsight provided by the experience with current SQL technologies.
|
||||
Besides a few operations that are simpler than their current SQL equivalents, and some performance improvements thanks to fine-grained typing, none of them tackles the complexity problem of the database itself.
|
||||
A new SQL-like language would still require an enormous piece of code to run.
|
||||
|
||||
.BULLET
|
||||
.B "Key-value stores."
|
||||
.B "Memcached"
|
||||
Example:
|
||||
.B Memcached
|
||||
(an application for caching data, not meant to be used as a primary database system).
|
||||
KV stores are often used as a cache for traditional DBMSs.
|
||||
KV stores have the advantage of being simpler than SQL databases; Memcached
|
||||
.dq only
|
||||
has 61 kloc for example.
|
||||
|
||||
.B "Redis"
|
||||
However, most KV stores implement features beyond the core functionality.
|
||||
.B Redis ,
|
||||
and its open-source fork
|
||||
.B Valkey ,
|
||||
are complex KV stores with a lot of features, including support for many data types, a message broker, clustering, distributed caching, optional durability, server-side scripting, etc.
|
||||
|
||||
Many other KV stores can be mentioned, such as
|
||||
.B LevelDB
|
||||
(embedded),
|
||||
.B RocksDB
|
||||
(fork of LevelDB with added features, such as transactions, snapshots, bloom filters, optimizations for multi-CPUs, etc.),
|
||||
.B CockroachDB
|
||||
(proprietary, distributed, ACID transactions), etc.
|
||||
|
||||
Features vary, but all these implementations of KV stores are efficient at data retrieval compared to SQL databases.
|
||||
|
||||
.KS
|
||||
.BULLET
|
||||
.B "Document databases" .
|
||||
Many other document-oriented databases exist besides DODB.
|
||||
For example,
|
||||
.B CouchDB
|
||||
(distributed, fault-tolerant, RESTful HTTP and JSON API…),
|
||||
.B "MongoDB"
|
||||
|
||||
.B "duckdb"
|
||||
(proprietary, ACID transactions, replication…),
|
||||
.B UnQLite
|
||||
(embedded, ACID transactions, embedded scripting language…).
|
||||
As far as the author knows, none of them is as simple as DODB.
|
||||
.KE
|
||||
.ENDBULLET
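The use of a KV store as a cache in front of a traditional DBMS, mentioned in the list above, can be sketched as follows. This is a simplified illustration, not a description of any of the products named above: a plain Python dictionary stands in for the KV store, an SQLite connection stands in for the DBMS, and the `users` table, the class `CachedUsers` and its methods are hypothetical.

```python
import sqlite3


class CachedUsers:
    """Hypothetical read-through cache in front of an SQL table."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        self.cache = {}    # stands in for a KV store such as Memcached
        self.db_reads = 0  # number of lookups that reached the database

    def name_of(self, user_id: int):
        """Return the user's name, or None when the user is unknown."""
        if user_id in self.cache:
            return self.cache[user_id]  # served without touching the DBMS
        self.db_reads += 1
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[user_id] = row[0]
        return row[0]

    def rename(self, user_id: int, name: str) -> None:
        """Update the database, then drop the stale cache entry."""
        self.db.execute(
            "UPDATE users SET name = ? WHERE id = ?", (name, user_id)
        )
        self.db.commit()
        self.cache.pop(user_id, None)
```

Only the first `name_of` call for a given identifier reaches the database; later calls are answered from the dictionary until `rename` invalidates the entry.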
|
||||
|
||||
.B Cassandra
|
||||
|
||||
.TBD
|
||||
.
|
||||
.SECTION Future work
|
||||
|
|
|
|||