Limits of DODB++.
parent
f4ab8154f9
commit
5dbc282027
1 changed files with 79 additions and 20 deletions
|
@ -998,12 +998,13 @@ With Postgres, the request duration of a single value varies from 0.1 to 2 ms on
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SECTION Limits of DODB
|
.SECTION Limits of DODB
|
||||||
DODB provides basic database operations such as storing, retrieving, modifying and removing data.
|
DODB provides basic database operations such as storing, retrieving, modifying and removing data, but doesn't fully handle ACID properties or a few other aspects generally associated with databases\*[*].
|
||||||
However, DODB doesn't fully handle ACID properties\*[*]: atomicity, consistency, isolation and durability.
|
|
||||||
This section presents the limits of DODB, whether the current implementation or the approach, and presents some suggestions to fill the gaps.
|
|
||||||
.FOOTNOTE1
|
.FOOTNOTE1
|
||||||
Traditional SQL databases handle ACID properties and may have created some "expectations" towards databases from a general public standpoint.
|
Traditional SQL databases may have created some "expectations" towards databases from a general public standpoint, such as the ACID properties (atomicity, consistency, isolation and durability), transactions and replication.
|
||||||
.FOOTNOTE2
|
.FOOTNOTE2
|
||||||
|
This section presents the limits of DODB, whether the current implementation or the approach.
|
||||||
|
The state of filesystems will be discussed since DODB heavily relies on the underlying filesystem.
|
||||||
|
Finally, this section presents some suggestions to fill the gaps with traditional databases on a few points.
|
||||||
|
|
||||||
.SS "Current state of DODB regarding ACID properties"
|
.SS "Current state of DODB regarding ACID properties"
|
||||||
.STARTBULLET
|
.STARTBULLET
|
||||||
|
@ -1046,8 +1047,9 @@ Data is written on disk each time it changes.
|
||||||
Again, this is basic but
|
Again, this is basic but
|
||||||
.SHINE "good enough"
|
.SHINE "good enough"
|
||||||
for most applications.
|
for most applications.
|
||||||
|
|
||||||
|
A future improvement could be to write a checksum for every file to detect corrupt data, but this overlaps with some filesystems which already provide this feature.
|
||||||
.ENDBULLET
|
.ENDBULLET
|
||||||
A future improvement could be to write a checksum for every piece of written data, to easily remove corrupt data from a database.
|
|
||||||
|
|
||||||
.SS "Discussion on ACID properties"
|
.SS "Discussion on ACID properties"
|
||||||
First and foremost, both atomicity and isolation properties are inherently related to parallelism, whether through concurrent threads or applications.
|
First and foremost, both atomicity and isolation properties are inherently related to parallelism, whether through concurrent threads or applications.
|
||||||
|
@ -1058,13 +1060,13 @@ Therefore, DODB could theoretically serve millions of requests per second from a
|
||||||
.FOOTNOTE1
|
.FOOTNOTE1
|
||||||
FYI, the service
|
FYI, the service
|
||||||
.I netlib.re
|
.I netlib.re
|
||||||
uses DODB and since the database is fast enough, parallelism isn't required despite enabling several thousand requests per second.
|
uses DODB and, since the database is fast enough, parallelism isn't required despite serving several thousand requests per second in a virtual machine on low-end hardware released almost two decades ago.
|
||||||
.FOOTNOTE2
|
.FOOTNOTE2
|
||||||
Considering this swiftness, parallelism may seem optional.
|
Considering this swiftness, parallelism may seem optional.
|
||||||
|
|
||||||
The consistency property is a safety net for potentially defective software.
|
The consistency property is a safety net for potentially defective software.
|
||||||
Always nice to have, but not entirely necessary, especially for document-oriented databases.
|
Always nice to have, but not entirely necessary, especially for document-oriented databases.
|
||||||
Contrary to a traditional SQL database which often requires several modifications to different tables in one go to be kept consistent, a document-oriented database stores an entire document which already is internally consistent.
|
Contrary to a traditional SQL database which often requires several modifications of different tables in one go to be kept consistent, a document-oriented database stores an entire document which already is internally consistent.
|
||||||
When several documents are involved (which happens from time to time), consistency needs to be checked, but this may not require much code\*[*].
|
When several documents are involved (which happens from time to time), consistency needs to be checked, but this may not require much code\*[*].
|
||||||
Not checking systematically for consistency upon any database modification is a tradeoff: simplicity of the code and speed on one side, security on the other.
|
Not checking systematically for consistency upon any database modification is a tradeoff: simplicity of the code and speed on one side, security on the other.
|
||||||
.FOOTNOTE1
|
.FOOTNOTE1
|
||||||
|
@ -1074,8 +1076,9 @@ Database verifications are just the last bastion against inserting junk data.
|
||||||
|
|
||||||
Moreover, the consistency property in traditional SQL databases is often used for simple tasks but quickly becomes difficult to deal with.
|
Moreover, the consistency property in traditional SQL databases is often used for simple tasks but quickly becomes difficult to deal with.
|
||||||
Some companies and organizations (such as Doctors Without Borders for instance) cannot afford to implement all the preventive measures in their DBMSs due to the sheer complexity of it.
|
Some companies and organizations (such as Doctors Without Borders for instance) cannot afford to implement all the preventive measures in their DBMSs due to the sheer complexity of it.
|
||||||
Instead, these organizations adopt curative measures that they may call "data-fix".
|
Instead, these organizations adopt curative measures that they may call
|
||||||
Thus, having some verifications in the database is not a silver bullet, it is complementary to other measures.
|
.dq data-fix .
|
||||||
|
Having verifications in the database is not a silver bullet but a complementary measure at most.
|
||||||
|
|
||||||
DODB may provide some form of atomicity and consistency at some point, but nothing fancy nor too advanced.
|
DODB may provide some form of atomicity and consistency at some point, but nothing fancy nor too advanced.
|
||||||
The whole point of the DODB project is to keep the code simple, hackable, enjoyable even.
|
The whole point of the DODB project is to keep the code simple, hackable, enjoyable even.
|
||||||
|
@ -1086,7 +1089,36 @@ Which also results from a lack of time.
|
||||||
|
|
||||||
.SS "Beyond ACID properties \[en] modern databases' features"
|
.SS "Beyond ACID properties \[en] modern databases' features"
|
||||||
Most current databases (traditional relational databases, some key-value databases and so on) provide additional features.
|
Most current databases (traditional relational databases, some key-value databases and so on) provide additional features.
|
||||||
These features may include for example high availability toolsets (replication, clustering, etc.), some forms of modularity (several storage backends, specific interfaces with other tools, etc.), interactive command lines or shells, user and authorization management, administration of databases, and so on.
|
|
||||||
|
.STARTBULLET
|
||||||
|
.KS
|
||||||
|
.BULLET
|
||||||
|
.B "High availability toolsets"
|
||||||
|
(replication, clustering, etc.).
|
||||||
|
This simply doesn't match DODB's goal of providing a database for small projects.
|
||||||
|
These tools imply an unreasonable amount of code compared to the current DODB library.
|
||||||
|
.KE
|
||||||
|
|
||||||
|
However, some of these features could be provided by the filesystem itself.
|
||||||
|
|
||||||
|
.KS
|
||||||
|
.BULLET
|
||||||
|
.B Modularity
|
||||||
|
(several storage backends, specific interfaces with other tools, etc.).
|
||||||
|
.KE
|
||||||
|
|
||||||
|
.KS
|
||||||
|
.BULLET
|
||||||
|
.B "Interactive management"
|
||||||
|
(through command lines or a dedicated shell).
|
||||||
|
.KE
|
||||||
|
|
||||||
|
.KS
|
||||||
|
.BULLET
|
||||||
|
.B "Database administration"
|
||||||
|
(CRUD on databases themselves, user and authorization management, etc.).
|
||||||
|
.KE
|
||||||
|
.ENDBULLET
|
||||||
|
|
||||||
Because DODB is a library and doesn't support an intermediary language for generic requests,
|
Because DODB is a library and doesn't support an intermediary language for generic requests,
|
||||||
.TBD
|
.TBD
|
||||||
|
@ -1198,7 +1230,7 @@ Some filesystems added more than a decade ago then under-explored features such
|
||||||
.ds NOK \[tmu]
|
.ds NOK \[tmu]
|
||||||
.nr total 16.0c
|
.nr total 16.0c
|
||||||
.nr col1 3.0c
|
.nr col1 3.0c
|
||||||
.nr col2 (\n[total]-\n[col1])/3
|
.nr col2 (\n[total]-\n[col1])/6
|
||||||
.nr col3 (\n[total]-\n[col1]-\n[col2])
|
.nr col3 (\n[total]-\n[col1]-\n[col2])
|
||||||
.\"total: \n[total]
|
.\"total: \n[total]
|
||||||
.\"col1: \n[col1]
|
.\"col1: \n[col1]
|
||||||
|
@ -1208,24 +1240,51 @@ Some filesystems added more than a decade ago then under-explored features such
|
||||||
allbox tab(:);
|
allbox tab(:);
|
||||||
c | c | c
|
c | c | c
|
||||||
cw(\n[col1]u) | lw(\n[col2]u) | lw(\n[col3]u).
|
cw(\n[col1]u) | lw(\n[col2]u) | lw(\n[col3]u).
|
||||||
Feature : Traditional databases : Filesystems
|
Feature : DBMS : Filesystems
|
||||||
CRUD operations : SQL : Files & directories
|
CRUD operations : SQL :files & directories
|
||||||
Atomicity : \*[OK] :T{
|
Atomicity : \*[OK] :T{
|
||||||
transactions are implemented in a few filesystems (ex: BTRFS)
|
locking mechanism based on files
|
||||||
and there is a locking mechanism based on files
|
|
||||||
T}
|
T}
|
||||||
Consistency : \*[OK] : \*[NOK]
|
Consistency : \*[OK] : \*[NOK]
|
||||||
Isolation : \*[OK] :T{
|
Isolation : \*[OK] :T{
|
||||||
.dq "new file then mv"
|
.dq "new file then mv"
|
||||||
technique
|
technique
|
||||||
T}
|
T}
|
||||||
Durability : \*[OK] : yes (checksums)
|
Durability : \*[OK] :limited (checksums)
|
||||||
Access Time : 0.1 to 2ms : a few µs (cache) to a few ms (first access)
|
Access Time : 0.1 to 2ms :a few µs (cache) to a few ms (first access with a hard disk)
|
||||||
Transactions : :
|
Transactions : \*[OK] :T{
|
||||||
|
implemented in a few filesystems (BTRFS, ZFS)
|
||||||
|
T}
|
||||||
|
Performance : \*[OK] :T{
|
||||||
|
B-trees and variants (used in all modern FS: BTRFS, ext4, Reiser4, NTFS, HAMMER…) are used to locate data on the storage device but also to look up an entry in a huge directory.
|
||||||
|
T}
|
||||||
|
Space waste :T{
|
||||||
|
almost none
|
||||||
|
.ps
|
||||||
|
T}:T{
|
||||||
|
depends on many factors, but generally significant
|
||||||
|
T}
|
||||||
.TE
|
.TE
|
||||||
|
|
||||||
In conclusion, no current filesystem has been designed to be used the way DODB uses them.
|
.B "Conclusion" .
|
||||||
However, having a few million entries is fine on most filesystems.
|
The gap between the feature sets of traditional databases and filesystems has slightly narrowed over time.
|
||||||
|
The discrepancy will always be there since they do not share the same goal, yet some features overlap.
|
||||||
|
Even though no current filesystem has been designed to be used the way DODB uses them, this kind of database system can profit from some
|
||||||
|
.dq recent
|
||||||
|
developments in the filesystem world (such as transactions).
|
||||||
|
The codebase size (and complexity) necessary to create a database system that provides acceptable performance for a small project\*[*] has shrunk drastically thanks to hardware and filesystem developments.
|
||||||
|
.FOOTNOTE1
|
||||||
|
Besides CRUD operations, a small project could imply basic relations between data, some simple transactions, a few databases (or
|
||||||
|
.I tables
|
||||||
|
in DBMS jargon) and a few thousand operations per second.
|
||||||
|
Both relations and transactions could be handled by the application, not necessarily by the database system itself.
|
||||||
|
.FOOTNOTE2
|
||||||
|
|
||||||
|
Performance is simply not a problem for most uses.
|
||||||
|
Having a directory with a few million entries is fine on modern filesystems.
|
||||||
|
The access time is slow (a few ms) only on the first access: the kernel
|
||||||
|
.B automatically
|
||||||
|
caches accessed files, after which an access takes a few dozen µs, which is virtually nothing.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SECTION Alternatives
|
.SECTION Alternatives
|
||||||
|
|
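The "new file then mv" technique listed in the Isolation row of the table can be illustrated as follows. This is a minimal Python sketch, not DODB's actual code: the function name atomic_write and the ".tmp-" prefix are hypothetical, and it assumes a POSIX-like filesystem where a rename inside one directory is atomic. The point is that a concurrent reader sees either the old document or the new one, never a half-written file.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` using write-then-rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory as the target,
    # so the final rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the kernel to push the content to the storage device
            # before the new name becomes visible.
            os.fsync(tmp.fileno())
        # Atomically swap the new file in place of the old one.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, remove the temporary file and leave the target untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Calling atomic_write("db/data/42.json", b"...") twice in a row leaves a single file holding the second payload, with no temporary file left behind. The path is only an example.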