Introduction++
parent
7e20de3d0e
commit
c04104dce1
1 changed files with 114 additions and 84 deletions
198
paper/paper.ms
|
@ -81,9 +81,9 @@ Although fairly advanced, this document lacks a few reviews, a bit of discussion
|
|||
A database consists in managing data, enabling queries to add, to retrieve, to modify and to delete a piece of information.
|
||||
These actions are grouped under the acronym CRUD: creation, retrieval, update and deletion.
|
||||
CRUD operations are the foundation for the most basic databases.
|
||||
Yet, almost every single database engine goes far beyond this minimalistic set of features.
|
||||
Of course, almost every single database engine goes far beyond this minimalistic set of features.
|
||||
|
||||
Although everyone using the filesystem of their computer as some sort of database (based on previous definition) by storing raw data (files) in a hierarchical manner (directories), computer science classes introduce a particularly convoluted way of managing data.
|
||||
Although everyone is using the filesystem of their computer as some sort of database (based on the previous definition) by storing raw data (files) in a hierarchical manner (directories), computer science classes introduce a particularly convoluted way of managing data.
|
||||
Universities all around the world teach about Structured Query Language (SQL) and relational databases.
|
||||
These two concepts are closely interlinked and require a brief explanation.
|
||||
|
||||
|
@ -144,12 +144,13 @@ And so on.
|
|||
For many reasons, SQL is not a silver bullet to
|
||||
.I solve
|
||||
the database problem.
|
||||
The encountered difficulties mentioned above and the original objectives of SQL not being universal\*[*], other database designs were created\*[*].
|
||||
The encountered difficulties mentioned above and the original objectives of SQL not being universal\*[*],
|
||||
.FOOTNOTE1
|
||||
To say the least!
|
||||
Not everyone needs to let users access the database without going through the application.
|
||||
For instance, writing a \f[I]blog\f[] for a small event or to share small stories about your life doesn't require manual operations on the database, fortunately.
|
||||
.FOOTNOTE2
|
||||
other database designs were created\*[*].
|
||||
.FOOTNOTE1
|
||||
A lot of designs won't be mentioned here.
|
||||
The actual history of databases is often quite unclear since the categories of databases are sometimes vague, underspecified.
|
||||
|
@ -171,22 +172,45 @@ And that's exactly what is being done in Document Oriented DataBase (DODB).
|
|||
|
||||
.UL "The stated goal of DODB"
|
||||
is to provide a simple and easy-to-use
|
||||
.UL library
|
||||
for developers to perform CRUD operations on documents (undescribed data structures).
|
||||
DODB aims basic to medium-sized projects, up to a few million entries\*[*].
|
||||
.UL library \*[*]
|
||||
for developers to store documents (undescribed data structures).
|
||||
.FOOTNOTE1
|
||||
Or as people might call it:
|
||||
.dq "serverless architecture" .
|
||||
.FOOTNOTE2
|
||||
|
||||
.STARTBULLET
|
||||
.KS
|
||||
.BULLET
|
||||
.B Simple ,
|
||||
because the approach is indeed trivial: the database entries are written as simple files in a directory.
|
||||
This simplicity has a snowballing effect: it only requires a few dozen lines of code.
|
||||
DODB is implemented in only a thousand lines of code in total, despite including optional features and optimized alternative implementations to make the library efficient and cover most cases.
|
||||
.KE
|
||||
|
||||
DODB doesn't strive to be minimalistic, but it avoids intermediary language and low-level optimizations.
|
||||
Storing data is writing a file.
|
||||
Indexing data is making symbolic links.
|
||||
It is that simple.
|
||||
|
||||
.KS
|
||||
.BULLET
|
||||
.B Easy-to-use ,
|
||||
because the API is high-level and doesn't take any superfluous parameter.
|
||||
Creating a database only requires a path, updating an entry only requires the new version of the entry, and so on.
|
||||
Everything is designed to be enjoyable for the developers.
|
||||
.KE
|
||||
.ENDBULLET
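
The two ideas above (storing data is writing a file, indexing data is making a symbolic link) can be sketched in a few lines.
The following is only an illustration written in Python rather than Crystal; it is not the DODB API.
The function names, the JSON serialization and the directory layout ("data" and "indexes/by_name") are assumptions made for this example, and a POSIX filesystem with symbolic links is assumed.

```python
import json
import os
import tempfile


def store(db_path, key, document):
    """Store a document: write it as a file named after its numeric key."""
    data_dir = os.path.join(db_path, "data")  # hypothetical layout
    os.makedirs(data_dir, exist_ok=True)
    entry_path = os.path.join(data_dir, f"{key:010d}.json")
    with open(entry_path, "w", encoding="utf-8") as f:
        json.dump(document, f)
    return entry_path


def index(db_path, index_name, value, entry_path):
    """Index a document: create a symbolic link named after the indexed value."""
    index_dir = os.path.join(db_path, "indexes", index_name)  # hypothetical layout
    os.makedirs(index_dir, exist_ok=True)
    link_path = os.path.join(index_dir, f"{value}.json")
    # A relative target keeps the link valid if the database directory is moved.
    os.symlink(os.path.relpath(entry_path, index_dir), link_path)
    return link_path


def get_by_index(db_path, index_name, value):
    """Retrieve a document by following the symbolic link of an index."""
    link_path = os.path.join(db_path, "indexes", index_name, f"{value}.json")
    with open(link_path, encoding="utf-8") as f:
        return json.load(f)


db = tempfile.mkdtemp()
car = {"name": "Corvet", "color": "red"}
entry = store(db, 0, car)
link = index(db, "by_name", car["name"], entry)
found = get_by_index(db, "by_name", "Corvet")
```

Since entries and indexes are plain files and links, the same lookup can be done outside the application with ordinary tools such as ls, cat or find.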
|
||||
|
||||
DODB aims for small and medium-size projects\*[*], up to a few hundred million entries with commodity hardware.
|
||||
.FOOTNOTE1
|
||||
There is no real hard limit besides the underlying filesystem; DODB can accept billions of entries.
|
||||
.br
|
||||
See the section
|
||||
.dq "Limits of DODB" .
|
||||
.FOOTNOTE2
|
||||
Code simplicity implies hackability.
|
||||
Traditional SQL relational databases have a snowballing effect on code complexity, including for applications with basic requirements.
|
||||
However, DODB may be a great starting point to implement more sophisticated features for creative minds.
|
||||
|
||||
.UL "The non-goals of DODB"
|
||||
are:
|
||||
.STARTBULLET
|
||||
.BULLET to provide a generic library w
|
||||
.ENDBULLET
|
||||
Its simplicity (approach and code) makes any modification for specific needs trivial.
|
||||
DODB may be a great starting point to implement more sophisticated features for creative minds.
|
||||
|
||||
.UL "Contrary to SQL" ,
|
||||
DODB has a very narrow scope: to provide a library enabling to store, to retrieve, to modify and to delete data.
|
||||
|
@ -195,8 +219,11 @@ DODB doesn't provide an interactive shell, there is no request language to perfo
|
|||
Instead, DODB reduces the complexity of the infrastructure, stores data in plain files and enables simple manual scripting with widespread unix tools.
|
||||
Simplicity is key.
|
||||
|
||||
Traditional SQL relational databases have a snowballing effect on code complexity, even for applications with basic requirements.
|
||||
Furthermore, describing data in tables and relations is not intuitive, contrary to storing whole documents, which simply means serializing the structures used in the code.
|
||||
|
||||
.UL "Contrary to other NoSQL databases" ,
|
||||
DODB doesn't provide an application but a library, nothing else.
|
||||
DODB isn't an application but a library.
|
||||
The idea is to help developers to store their data themselves, not depending on
|
||||
. I yet-another-all-in-one
|
||||
massive tool.
|
||||
|
@ -205,7 +232,7 @@ The library writes (and removes) data on a storage device, has a few retrieval a
|
|||
The lack of features
|
||||
.I is
|
||||
the feature.
|
||||
Even with that motto, the tool still is expected to be convenient for most applications.
|
||||
Yet, the tool is expected to be convenient for most applications.
|
||||
.FOOTNOTE2
|
||||
|
||||
Section 2 provides extensive documentation on how DODB works and how to use it.
|
||||
|
@ -224,8 +251,8 @@ Finally, section 12 provides a conclusion.
|
|||
.SECTION How DODB works and basic usage
|
||||
DODB is a hash table.
|
||||
The key of the hash is an auto-incremented number and the value is the stored data.
|
||||
The following section will explain how to use DODB for basic cases including the few added mechanisms to speed-up searches.
|
||||
Also, the filesystem representation of the data will be presented since it enables easy off-application searches.
|
||||
This section explains how to use DODB for basic cases, including the few added mechanisms to speed up searches.
|
||||
Also, the filesystem representation of the data is presented since it enables easy off-application searches.
|
||||
|
||||
The presented code is in Crystal, like the DODB library itself.
|
||||
Keep in mind that this document is all about the method more than the current implementation.
|
||||
|
@ -1235,6 +1262,72 @@ In case this space isn't used for metadata, some filesystems enables to use it f
|
|||
.FOOTNOTE2
|
||||
.
|
||||
.KS
|
||||
.SSS "Exotic filesystems"
|
||||
Filesystems have been developed over the years for various reasons.
|
||||
Let's browse for a moment to provide an overview of what is possible.
|
||||
.KE
|
||||
|
||||
.B Kernel-related .
|
||||
A whole class of filesystems is dedicated to providing an interface to the kernel, such as
|
||||
.I procfs
|
||||
(information about running processes),
|
||||
.I sysfs
|
||||
(to tweak a few device parameters) or even
|
||||
.I debugfs
|
||||
(to provide debug info from the kernel to user-space).
|
||||
Providing information about the running system and enabling its modification through simple files and directories is a direct
|
||||
.dq "everything is a file"
|
||||
UNIX legacy.
|
||||
Data cannot be freely written: files are directly related to specific structures which only accept a finite set of possible values, and consistency is preserved by verifications written in the drivers.
|
||||
|
||||
.B "Network-related" .
|
||||
Many filesystems were designed specifically to be remotely mounted, either to be shared amongst many people in a company, or to be part of a giant cluster to provide a high-availability storage solution for tech giants with peculiar requirements or just to stack ever more commodity computers together and provide a gigantic storage space.
|
||||
Filesystems can also be distributed with some replication in order to provide a fault-tolerant storage with ordinary computers sharing unused space.
|
||||
|
||||
.KS
|
||||
.B "UnionFS" .
|
||||
UnionFS (and its variants) is a filesystem enabling several filesystems to be mounted on the same mount-point and to show superposed contents, enabling a read-only base image to be used together with persistent data for a specific instance.
|
||||
This way, a
|
||||
.dq "live-cd image"
|
||||
for an operating system can become persistent by storing modifications on a USB stick.
|
||||
.KE
|
||||
|
||||
UnionFS is a copy-on-write snapshotting filesystem on top of other filesystems.
|
||||
Docker uses it to save space.
|
||||
Docker provides ready-to-run software as containers, comparable to small virtual machines.
|
||||
To preserve storage space, a base OS image is shared amongst all instances and each instance only stores its own specific files (binaries, configuration and dependencies) written in a separate storage volume.
|
||||
|
||||
.KS
|
||||
.B "Archivemount" .
|
||||
Archivemount mounts a compressed archive as a filesystem, so that day-to-day tools can search for a file in the archive without the need to uncompress it first.
|
||||
.KE
|
||||
|
||||
.KS
|
||||
.B "RAM-based filesystems" .
|
||||
For temporary data, intensive read and write operations on a small storage volume or for filesystem development, a chunk of the computer memory can be used as a filesystem thanks to
|
||||
.B tmpfs
|
||||
and variants\*[*] (ramdisk and ramfs).
|
||||
.KE
|
||||
.FOOTNOTE1
|
||||
.B ramdisk
|
||||
creates a block file based on a chunk of RAM that needs to be formatted then mounted like any partition.
|
||||
.B ramfs
|
||||
directly mounts a RAM-based filesystem, without the need to format a fake partition.
|
||||
Finally,
|
||||
.B tmpfs
|
||||
is the most flexible one: it is used like ramfs but can be resized and only uses the amount of RAM necessary at a given point (memory is freed once a file is removed).
|
||||
.FOOTNOTE2
|
||||
|
||||
.KS
|
||||
.B "Semantic (tag-based) filesystems" .
|
||||
Some filesystems (such as tagsistant) store data based on tags attached to each file, which enables indexing a file based on many attributes rather than a single path.
|
||||
As a side effect, searching for a file in this context can be done by computing the intersection of different tags\*[*].
|
||||
.KE
|
||||
.FOOTNOTE1
|
||||
Well well well… doesn't that sound like the DODB tag triggers?
|
||||
As if databases and filesystems were intertwined somehow…
|
||||
.FOOTNOTE2
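
The tag intersection mentioned above can be sketched as a set operation.
This is a minimal illustration in Python, not the implementation of tagsistant or of DODB; the in-memory tag index, the function name and the file names are hypothetical.

```python
def files_with_all_tags(tag_index, wanted_tags):
    """Return the files carrying every requested tag (set intersection)."""
    # An unknown tag maps to an empty set, which empties the intersection.
    sets = [tag_index.get(tag, set()) for tag in wanted_tags]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical index: each tag maps to the set of files carrying it.
tag_index = {
    "red": {"corvet.json", "ferrari.json"},
    "fast": {"ferrari.json", "bullet-gt.json"},
    "elegant": {"corvet.json", "ferrari.json", "bullet-gt.json"},
}

result = files_with_all_tags(tag_index, ["red", "fast"])
missing = files_with_all_tags(tag_index, ["red", "cheap"])
```

On a tag-based filesystem the index would be a set of directories rather than a dictionary, but the search remains the same intersection.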
|
||||
.KS
|
||||
.SSS "Quick comparison between DBMSs and filesystems"
|
||||
The following table shows the proximity between famous database systems and ordinary filesystems, both sharing a lot of features despite very different approaches.
|
||||
.ds OK \[OK]
|
||||
|
@ -1256,7 +1349,7 @@ CRUD operations : SQL :files & directories
|
|||
Atomicity : \*[OK] :T{
|
||||
locking mechanism based on files
|
||||
T}
|
||||
Consistency : \*[OK] : \*[NOK]
|
||||
Consistency : \*[OK] : \*[NOK] besides very specific filesystems
|
||||
Isolation : \*[OK] :T{
|
||||
.dq "new file then mv"
|
||||
technique\*[*]
|
||||
|
@ -1297,67 +1390,6 @@ The main difference between DBMSs and filesystems is the
|
|||
property.
|
||||
Filesystems are almost exclusively built to store undefined streams of data with a very wide range of different shapes (plain text, multimedia, documents, etc.) and sizes (from empty to multiple terabytes and more), thus no consistency verification can be reasonably implemented.
|
||||
.
|
||||
.KS
|
||||
.SSS "Exotic filesystems"
|
||||
Filesystems have been developed over the years for various reasons.
|
||||
Let's browse for a moment to provide an overview of what is possible.
|
||||
|
||||
.B Kernel-related .
|
||||
A whole class of filesystems is dedicated to provide an interface to the kernel, such as
|
||||
.I procfs
|
||||
(information about running processes),
|
||||
.I sysfs
|
||||
(to tweak a few device parameters) or even
|
||||
.I debugfs
|
||||
(to provide debug info from the kernel to user-space).
|
||||
Providing information about the running system and enabling its modification through simple files and directories is a direct
|
||||
.dq "everything is a file"
|
||||
UNIX legacy.
|
||||
|
||||
.B "Network-related" .
|
||||
Many filesystems were designed specifically to be remotely mounted, either to be shared amongst many people in a company, or to be part of a giant cluster to provide a high-availability storage solution for tech giants with peculiar requirements or just to stack ever more commodity computers together and provide a gigantic storage space.
|
||||
Filesystems can also be distributed with some replication in order to provide a fault-tolerant storage with ordinary computers sharing unused space.
|
||||
.KE
|
||||
|
||||
.KS
|
||||
.B "UnionFS" .
|
||||
UnionFS (and its variants) is a filesystem enabling several filesystems to be mounted on the same mount-point and to show superposed contents, enabling a read-only base image to be used together with persistent data for a specific instance.
|
||||
This way, a
|
||||
.dq "live-cd image"
|
||||
for an operating system can become persistent by storing modifications on an usb stick.
|
||||
UnionFS is a copy-on-write snapshotting filesystem on top of other filesystems.
|
||||
.KE
|
||||
|
||||
.KS
|
||||
.B "Archivemount" .
|
||||
Mounting a compressed archive, enabling to use day-to-day tools to search for a file in an archive without the need to uncompress it.
|
||||
.KE
|
||||
|
||||
.KS
|
||||
.B "RAM-based filesystems" .
|
||||
For temporary data, intensive read and write operations on a small storage volume or for filesystem development, a chunk of the computer memory can be used as a filesystem thanks to
|
||||
.B tmpfs
|
||||
and variants\*[*] (ramdisk and ramfs).
|
||||
.KE
|
||||
.FOOTNOTE1
|
||||
.B ramdisk
|
||||
creates a block file based on a chunk of RAM that needs to be formated then mounted as any partition.
|
||||
.B ramfs
|
||||
mounts directly a RAM-based filesystem, without the need to format a fake partition.
|
||||
Finally,
|
||||
.B tmpfs
|
||||
is the more flexible one, it is used as ramfs but can be resized and only uses a necessary amount of RAM at a given point (memory is free'd once a file is removed).
|
||||
.FOOTNOTE2
|
||||
|
||||
.KS
|
||||
.B "Semantic (tag-based) filesystems" .
|
||||
Some filesystems (such as tagsistant) store data based on tags for each file which enables to index a file based on many attributes and not a single path.
|
||||
As a side effect, searching for a file in this context can be done by computing the intersection of different tags\*[*].
|
||||
.KE
|
||||
.FOOTNOTE1
|
||||
Well well well… doesn't that sound like the DODB tag triggers?
|
||||
As if databases and filesystems were intertwined somehow…
|
||||
.FOOTNOTE2
|
||||
.
|
||||
.KS
|
||||
.SSS "Conclusion on filesystems"
|
||||
|
@ -1380,8 +1412,6 @@ Having a directory with a few million entries is fine on modern filesystems.
|
|||
The first file access is slow (a few ms) then the kernel
|
||||
.B automatically
|
||||
caches the file, making it reachable in a few dozen µs, which is virtually nothing.
|
||||
|
||||
TODO: dedicated filesystems
|
||||
.
|
||||
.
|
||||
.SECTION Alternatives
|
||||
|
@ -1408,7 +1438,7 @@ These applications are inherently complex for different reasons.
|
|||
MariaDB has 2.3 million lines of code (MLOC) and Postgres has 1.7 MLOC.
|
||||
Other mentioned DBMSs aren't open-source software, but it seems reasonable to consider their number of LOC to be in the same ballpark.
|
||||
.br
|
||||
Just to put things into perspective, DODB is less than 1300 lines of code.
|
||||
Just to put things into perspective, DODB is just a thousand lines of code.
|
||||
Sure, DODB doesn't have the same features, but are they worth multiplying the codebase by 1700?
|
||||
.FOOTNOTE2
|
||||
|
||||
|
|