Limitations of the DODB approach.

Philippe PITTOLI 2024-05-16 14:42:11 +02:00
parent e6e503e475
commit 4c136ddf82


@ -181,13 +181,13 @@ end
Let's create a DODB database for our cars.
.SOURCE Ruby ps=10
# Database creation
database = DODB::DataBase(Car).new "path/to/db-cars"
# Adding an element to the db
database << Car.new "Corvet", "red", ["elegant", "fast"]
# Reaching all objects in the database
database.each do |car|
pp! car
end
.SOURCE
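The storage model behind these calls can be sketched in a few lines of Ruby. Every entry is a file in the data directory, named after its key padded to ten digits, as in the directory trees of the following sections. This is a hypothetical sketch, not DODB's code. The TinyStore class is made up for illustration, and JSON serialization is an assumption.
.SOURCE Ruby ps=10
require "json"
require "fileutils"
require "tmpdir"

# Hypothetical file-per-entry store, only illustrating the idea (not DODB's API).
# Assumption: each entry is serialized as JSON in its own file.
class TinyStore
  def initialize(path)
    @data = File.join(path, "data")
    FileUtils.mkdir_p(@data)
  end

  # An entry's file is named after its key, zero-padded to ten digits.
  def file_for(key)
    File.join(@data, format("%010d", key))
  end

  # Keys are recovered from the file names.
  def keys
    Dir.children(@data).map(&:to_i).sort
  end

  # Store a value under the next free key (highest key + 1).
  def <<(value)
    key = (keys.last || -1) + 1
    File.write(file_for(key), JSON.generate(value))
    self
  end

  # Yield every stored value, in key order.
  def each
    keys.each { |key| yield JSON.parse(File.read(file_for(key))) }
  end
end

# Usage, in a throwaway directory.
Dir.mktmpdir do |dir|
  cars = TinyStore.new(File.join(dir, "db-cars"))
  cars << { "name" => "Corvet", "color" => "red" }
  cars.each { |car| p car }
end
.SOURCE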
@ -234,13 +234,13 @@ Next step, to retrieve, to modify or to delete a value, its key will be required
.QP
.SOURCE Ruby ps=10
# Get a value based on its key.
database[key]
# Update a value based on its key.
database[key] = new_value
# Delete a value based on its key.
database.delete key
.SOURCE
.QE
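Each of these operations maps to a single file operation: reading, overwriting or removing the entry's file. The following is a minimal sketch, not DODB's implementation. It assumes hypothetical helper names, ten-digit file names in a data directory, and JSON serialization.
.SOURCE Ruby ps=10
require "json"
require "fileutils"
require "tmpdir"

# Hypothetical helpers (not DODB's API): one JSON file per entry under data/.
def entry_path(db_path, key)
  File.join(db_path, "data", format("%010d", key))
end

# Get a value based on its key (nil if there is no such entry).
def get_entry(db_path, key)
  path = entry_path(db_path, key)
  File.exist?(path) ? JSON.parse(File.read(path)) : nil
end

# Update (or create) a value based on its key.
def set_entry(db_path, key, value)
  FileUtils.mkdir_p(File.join(db_path, "data"))
  File.write(entry_path(db_path, key), JSON.generate(value))
end

# Delete a value based on its key; a missing entry is ignored.
def delete_entry(db_path, key)
  FileUtils.rm_f(entry_path(db_path, key))
end

# Usage, mirroring the three operations above.
Dir.mktmpdir do |db|
  set_entry(db, 0, { "color" => "red" })
  set_entry(db, 0, { "color" => "blue" })
  p get_entry(db, 0)
  delete_entry(db, 0)
end
.SOURCE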
.
@ -250,7 +250,7 @@ lists the entries with their keys.
.
.QP
.SOURCE Ruby ps=10
database.each_with_index do |value, key|
puts "#{key}: #{value}"
end
.SOURCE
@ -329,23 +329,37 @@ directory.
The basic indexes as shown in this section already give a taste of what is possible to do with DODB.
The following indexes will cover some other usual cases.
.
.
.SSS Partitions (1 to n relations)
An attribute can have a value that is shared by other entries in the database, such as the
.I color
attribute of our cars.
.SOURCE Ruby ps=10
# Create a partition based on the "color" attribute of the cars.
cars_by_color = database.new_partition "color" do |car|
car.color
end
.SOURCE
As with basic indexes, once the partition is requested from the database, every new or modified entry will be indexed.
.KS
Let's imagine having three cars: one is blue and the other two are red.
.TREE1
$ tree db-cars/
db-cars
+-- data
|   +-- 0000000000 <- this car is blue
|   +-- 0000000001 <- this car is red
|   `-- 0000000002 <- this car is red, too
| ...
`-- partitions
    `-- by_color
        +-- blue
        |   `-- 0000000000 -> 0000000000
        `-- red
            +-- 0000000001 -> 0000000001
            `-- 0000000002 -> 0000000002
.TREE2
.QP
Listing all the blue cars is as simple as a
@ -354,14 +368,103 @@ in the
.DIRECTORY db-cars/partitions/by_color/blue
directory!
.QE
.KE
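The partition above comes down to one directory per color and one symbolic link per car. The Ruby sketch below reproduces that layout. The helper names are hypothetical, and the relative link targets are an assumption: the tree above shows simplified targets, and DODB may name them differently.
.SOURCE Ruby ps=10
require "fileutils"

# Hypothetical partition index (not DODB's implementation):
# partitions/by_<name>/<value>/<key> is a symlink to data/<key>.
def index_partition(db_path, name, key, value)
  file = format("%010d", key)
  dir = File.join(db_path, "partitions", "by_#{name}", value)
  FileUtils.mkdir_p(dir)
  # Relative target: three levels up from the <value> directory is the database root.
  FileUtils.ln_sf(File.join("..", "..", "..", "data", file), File.join(dir, file))
end

# Listing a partition is the Ruby counterpart of running ls in its directory.
def partition_keys(db_path, name, value)
  dir = File.join(db_path, "partitions", "by_#{name}", value)
  Dir.exist?(dir) ? Dir.children(dir).sort.map(&:to_i) : []
end
.SOURCE
When an entry's color changes, a real index also has to remove the entry's link from the old color's directory, which this sketch leaves out.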
.
.
.
.SSS Tags (n to n relations)
Tags are basically partitions, except that the attribute can have multiple values.
.SOURCE Ruby ps=10
# Create a tag based on the "keywords" attribute of the cars.
cars_by_keywords = database.new_tags "keywords" do |car|
car.keywords
end
.SOURCE
As with other indexes, once the tag is requested from the database, every new or modified entry will be indexed.
.
.
.KS
Let's imagine having two cars with different associated keywords.
.TREE1
.ps -2
$ tree db-cars/
db-cars
+-- data
|   +-- 0000000000 <- this car is fast and cheap
|   `-- 0000000001 <- this car is fast and elegant
`-- tags
    `-- by_keywords
        +-- cheap
        |   `-- 0000000000 -> 0000000000
        +-- elegant
        |   `-- 0000000001 -> 0000000001
        `-- fast
            +-- 0000000000 -> 0000000000
            `-- 0000000001 -> 0000000001
.ps
.TREE2
.QP
Listing all the fast cars is as simple as a
.COMMAND ls
in the
.DIRECTORY db-cars/tags/by_keywords/fast
directory!
.QE
.KE
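A tag index differs from a partition only in that one entry gets a link in several directories, one per keyword. This is again a hypothetical Ruby sketch, not DODB's implementation. It reuses the layout of the tree above, with the relative link targets as an assumption.
.SOURCE Ruby ps=10
require "fileutils"

# Hypothetical tag index (not DODB's implementation): unlike a partition,
# one entry gets a link in the directory of each of its keywords.
# tags/by_<name>/<keyword>/<key> is a symlink to data/<key>.
def index_tags(db_path, name, key, keywords)
  file = format("%010d", key)
  keywords.uniq.each do |keyword|
    dir = File.join(db_path, "tags", "by_#{name}", keyword)
    FileUtils.mkdir_p(dir)
    FileUtils.ln_sf(File.join("..", "..", "..", "data", file), File.join(dir, file))
  end
end

# Keys of all the entries carrying a given keyword.
def tagged_keys(db_path, name, keyword)
  dir = File.join(db_path, "tags", "by_#{name}", keyword)
  Dir.exist?(dir) ? Dir.children(dir).sort.map(&:to_i) : []
end
.SOURCE
Cars that are both fast and cheap are then the intersection of two listings: tagged_keys(db, "keywords", "fast") & tagged_keys(db, "keywords", "cheap").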
.
.SECTION A few more options
.TBD
.SECTION Limits of DODB
DODB provides basic database operations such as storing, searching, modifying and removing data.
However, SQL databases have a few
.I properties
that enable more standardized behavior and shape what the general public expects from a database.
These properties are called "ACID": atomicity, consistency, isolation and durability.
DODB doesn't fully handle ACID properties.
DODB doesn't provide
.I atomicity .
Several instructions cannot be chained so that all of them are rolled back if one fails.
DODB doesn't handle
.I consistency .
There is currently no mechanism to prevent adding invalid values.
.I Isolation
is partially taken into account with a locking mechanism preventing race conditions.
However, parallelism is mostly needed to serve a large number of clients at the same time.
Also, communication between the application and an SQL database has an inherent latency, which slows down requests despite the fast algorithms used to search for a value within the database.
This latency is (at least partially) why SQL databases need parallelism, and it doesn't exist with DODB\*[*].
.FOOTNOTE1
FYI, the
.I netlib.re
service uses DODB; since the database is fast enough, parallelism isn't required to serve more than a thousand requests per second.
.FOOTNOTE2
With a cache, data is retrieved five hundred times quicker than with a SQL database.
.I Durability
is taken into account.
Data is written to disk each time it changes.
Again, this is basic but
.SHINE "good enough"
for most applications.
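Writing to disk at each change can be made more robust with two common techniques, shown here in a short Ruby sketch that is not DODB's code. The first is an exclusive file lock, so that concurrent writers don't interleave (isolation). The second is writing a temporary file, flushing it, then renaming it over the entry, so that readers never see a half-written entry. Renaming is atomic on POSIX file systems as long as both files are on the same file system. The lock and temporary file names are assumptions.
.SOURCE Ruby ps=10
require "json"

# Hypothetical write path (not DODB's code): lock, write a temporary
# file, flush it to disk, then atomically rename it over the target.
def safe_write(path, value)
  # An exclusive lock on a side file serializes concurrent writers.
  File.open("#{path}.lock", File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX)
    tmp = "#{path}.tmp"
    File.open(tmp, "w") do |f|
      f.write(JSON.generate(value))
      f.fsync # ask the OS to push the data to the disk
    end
    # rename replaces the target in a single step on POSIX file systems.
    File.rename(tmp, path)
  end # closing the lock file releases the lock
end
.SOURCE
A fully crash-safe version would also flush the parent directory after the rename.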
.B "Discussion on ACID properties" .
The author of this document sees these database properties as a sort of "fail-safe".
They are always nice to have, but not strictly necessary, at least not for every application.
DODB will provide some form of atomicity and consistency at some point, but nothing fancy or too advanced.
The whole point of the DODB project is to keep the code simple (almost
.B "stupidly"
simple).
Thus, whether these properties are managed or not isn't a limitation of the DODB approach but a choice made for this specific project.
Not handling all the ACID properties within the DODB library doesn't mean they cannot be achieved.
Applications can have these properties with a few lines of code.
They just don't come
.I "by default"
with the library.
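As an illustration, an application-level wrapper can validate values before storing them (consistency) and undo a group of changes if one of them fails (atomicity). The CheckedStore class is hypothetical and keeps its entries in a Hash for brevity. An application built on DODB would have to snapshot and restore the database entries instead.
.SOURCE Ruby ps=10
# Hypothetical application-level wrapper (not part of DODB).
class CheckedStore
  class Invalid < StandardError; end

  def initialize(&validator)
    @entries = {}
    @validator = validator
  end

  def [](key)
    @entries[key]
  end

  # Consistency: refuse values that do not pass the validator.
  def []=(key, value)
    raise Invalid, "invalid value for key #{key}" unless @validator.call(value)
    @entries[key] = value
  end

  # Atomicity: apply every change of the block, or none of them.
  # The snapshot is a shallow copy: values must be replaced, not mutated.
  def transaction
    snapshot = @entries.dup
    yield self
  rescue StandardError
    @entries = snapshot
    raise
  end
end

# Usage: the invalid color cancels the whole transaction.
cars = CheckedStore.new { |car| %w[red blue].include?(car[:color]) }
begin
  cars.transaction do |db|
    db[0] = { color: "red" }
    db[1] = { color: "green" }
  end
rescue CheckedStore::Invalid
end
cars[0] # => nil
.SOURCE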
.
.
.
.SECTION Experimental scenario
.LP
The following experiment shows the performance of DODB based on query durations.