Philippe Pittoli 2025-01-26 05:37:46 +01:00
parent d58856b9d8
commit b08c8d43d8

@ -956,12 +956,12 @@ As a side note, let's keep in mind that requesting several thousand entries in D
with SQL (varies from 0.1 to 2 ms on my machine for a single value without a search, just the first available entry).
This should help put things into perspective.
.
.
.
.SECTION Limits of DODB
DODB provides basic database operations such as storing, retrieving, modifying and removing data.
However, DODB doesn't fully handle ACID properties\*[*]: atomicity, consistency, isolation and durability.
This section presents the limits of
.UL "the current implementation"
of DODB.
This section presents the limits of DODB, whether the current implementation or the approach, and some suggestions to fill the gaps.
.FOOTNOTE1
Traditional SQL databases handle ACID properties and may have shaped what the general public "expects" from a database.
.FOOTNOTE2
@ -970,27 +970,32 @@ Traditional SQL databases handle ACID properties and may have created some "expe
.BULLET
.B Atomicity
isn't handled in DODB.
Instructions cannot be chained and rollback if one of them fails.
Multiple instructions cannot be chained and taken into account at the same time.
However, this is a limitation of the current implementation, not the approach.
This issue could be resolved for example by introducing a
.I "global lock"
to prevent any modification while processing multiple instructions in one go.
This lock could be shared across multiple DODB instances, in case the chained instructions modify several databases\*[*].
.FOOTNOTE1
Modifying several DODB instances at the same time is a rather common occurrence.
.FOOTNOTE2
.BULLET
.B Consistency
isn't handled in DODB.
No mechanism prevents invalid values from being added.
Once again, this is only a shortcoming of the implementation.
For the moment,
.I triggers
are used to create indexes, but the idea could fit another purpose: to create predicates to avoid invalid actions.
These new triggers could record user-defined procedures to perform database verifications and revert (or prevent) changes in case of an invalid insertion, modification or deletion.
.BULLET
.B Isolation
is partially taken into account with a locking mechanism preventing race conditions when modifying a value.
This property is inherently related to parallelism, which is mostly required to respond to a large number of clients at the same time.
SQL databases require communication between the application and the database, which has an inherent latency and slows down the requests despite the fast algorithms used to search for a value within the database.
Parallelism is required for SQL databases because of this latency (at least partially), which doesn't exist with DODB\*[*].
.FOOTNOTE1
FYI, the service
.I netlib.re
uses DODB and since the database is fast enough, parallelism isn't required despite enabling more than a thousand requests per second.
.FOOTNOTE2
With a cache, data is retrieved five hundred times quicker than with a SQL database.
Thus, parallelism is probably not needed but a locking mechanism is provided anyway, just in case; this may be overly simplistic but
This may be seen as simplistic but
.SHINE "good enough"
for most applications.
@ -1004,26 +1009,33 @@ for most applications.
.ENDBULLET
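The global lock suggested for atomicity can be sketched as follows.
This is only an illustration written in Python, not DODB code: the ToyDatabase class, the GLOBAL_LOCK variable and the atomic helper are hypothetical names, and restoring a snapshot on failure is an assumption about how a rollback could work.

```python
import threading
from contextlib import contextmanager


# Hypothetical stand-in for a DODB instance: a plain in-memory key/value store.
class ToyDatabase:
    def __init__(self):
        self.data = {}


# A single lock shared by every database taking part in chained instructions.
GLOBAL_LOCK = threading.Lock()


@contextmanager
def atomic(*databases):
    """Run a group of instructions on `databases` as a single unit."""
    with GLOBAL_LOCK:
        # Shallow copies are enough here because the toy values are immutable.
        snapshots = [dict(db.data) for db in databases]
        try:
            yield
        except Exception:
            # Any failure restores every database to its previous state.
            for db, snapshot in zip(databases, snapshots):
                db.data = snapshot
            raise


users, domains = ToyDatabase(), ToyDatabase()
users.data["alice"] = 0

try:
    with atomic(users, domains):
        users.data["alice"] = 1
        domains.data["example.org"] = "alice"
        raise RuntimeError("simulated failure in the middle of the chain")
except RuntimeError:
    pass

print(users.data, domains.data)
```

In this sketch, the lock only protects code that goes through the atomic helper; a real implementation would also have to make every other write take the same lock.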
.B "Discussion on ACID properties" .
The author of this document sees these database properties as a sort of "fail-safe".
Always nice to have, but not entirely necessary; at least not for every single application.
DODB will provide some form of atomicity and consistency at some point, but nothing fancy or too advanced.
The whole point of the DODB project is to keep the code simple (almost
.B "stupidly"
simple).
Not handling these properties isn't a limitation of the DODB approach but a choice for this project\*[*].
First and foremost, both atomicity and isolation properties are inherently related to parallelism, whether through concurrent threads or applications.
Traditional SQL databases require both atomicity and isolation properties because they cannot afford not to have parallelism.
Since DODB is a library (and not a separate application) and is kept simple (no intermediary language to interpret, no complicated algorithm), it doesn't suffer from any communication latency or processing delay slowing down the requests.
As the experiments showed, retrieving a value in DODB only takes about 20 µs, or 200 ns with a data cache.
Therefore, DODB could theoretically serve millions of requests per second from a single thread\*[*].
.FOOTNOTE1
Which results from a lack of time, mostly.
FYI, the service
.I netlib.re
uses DODB and since the database is fast enough, parallelism isn't required despite enabling more than a thousand requests per second.
.FOOTNOTE2
Considering this swiftness, parallelism may seem optional.
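The order of magnitude can be checked with simple arithmetic.
The sketch below (in Python, for illustration only) assumes that a single thread serves requests back to back and that the retrieval time is the only cost, which is an idealization; requests_per_second is a hypothetical helper name.

```python
NS_PER_SECOND = 1_000_000_000


def requests_per_second(latency_ns):
    # Upper bound for one thread doing nothing but retrievals.
    return NS_PER_SECOND // latency_ns


uncached = requests_per_second(20_000)  # about 20 µs per retrieval
cached = requests_per_second(200)       # about 200 ns with a data cache
print(uncached, cached)
```

Under these assumptions, the bound is 50 000 requests per second without a cache and 5 million with one, so the "millions of requests per second" figure corresponds to the cached case.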
The consistency property is a safety net for potentially defective software.
Always nice to have, but not entirely necessary, especially for document-oriented databases.
Contrary to a traditional SQL database which often requires several modifications to different tables in one go to be kept consistent, a document-oriented database stores an entire document which already is internally consistent.
When several documents are involved (which happens from time to time), consistency needs to be checked, but this may not require much code\*[*].
Not systematically checking for consistency upon every database modification is a tradeoff between code simplicity and speed on one hand, and safety on the other.
.FOOTNOTE1
As a side note, consistency is already taken care of within the application anyway.
Database verifications are just the last bastion against inserting junk data.
.FOOTNOTE2
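As an illustration of how little code an application-side check across several documents may require, here is a minimal sketch in Python.
The two dictionaries stand in for two databases, and the names users, domains and store_domain are hypothetical; none of this is part of DODB.

```python
# Two hypothetical document stores, represented as plain dictionaries.
users = {"alice": {"email": "alice@example.org"}}
domains = {}


def store_domain(name, domain):
    # Cross-document check: a domain must belong to an existing user.
    if domain["owner"] not in users:
        raise ValueError(f"unknown owner {domain['owner']!r} for {name!r}")
    domains[name] = domain


store_domain("example.org", {"owner": "alice"})

try:
    store_domain("example.net", {"owner": "bob"})
    rejected = False
except ValueError:
    rejected = True

print(sorted(domains), rejected)
```

The check runs before the write, so an invalid document is refused rather than reverted afterwards.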
Not handling all the ACID properties within the DODB library doesn't mean they cannot be achieved.
Applications can have these properties, often with just a few lines of code.
They just don't come
.I "by default"
with the library\*[*].
DODB may provide some form of atomicity and consistency at some point, but nothing fancy or too advanced.
The whole point of the DODB project is to keep the code simple and hackable.
Not handling these properties isn't a limitation of the DODB approach but a choice for this project\*[*].
.FOOTNOTE1
As a side note, the
.I consistency
property is often taken care of within the application despite being handled by the database, for various reasons.
Which also results from a lack of time.
.FOOTNOTE2
.
.
@ -1072,8 +1084,7 @@ This is not acceptable for databases with large partitions and tags: memory will
.
.SS DODB and security
Right now, security isn't managed in DODB, at all.
Sure, DODB isn't vulnerable to SQL injections, but an internet-facing application may encounter a few other problems including, but not limited to, code injection, buffer overflows, etc.
Of course, DODB isn't a mechanism to protect applications from any possible attack, so most of the vulnerabilities cannot be countered by the library.
DODB isn't vulnerable to SQL injections, but an internet-facing application may encounter a few other problems including, but not limited to, buffer overflows and code injection.
However, a few security mechanisms exist to prevent data leaks or data modification by an outsider, and the DODB library may implement some of them in the future.
.B "Preventing data leak with in-app memory management" .