From 4c136ddf821af2735a3408bf8741fa702d741158 Mon Sep 17 00:00:00 2001
From: Philippe PITTOLI
Date: Thu, 16 May 2024 14:42:11 +0200
Subject: [PATCH] Limitations of the DODB approach.

---
 graphs/graphs.ms | 129 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 116 insertions(+), 13 deletions(-)

diff --git a/graphs/graphs.ms b/graphs/graphs.ms
index 09e52a1..c71f1c3 100644
--- a/graphs/graphs.ms
+++ b/graphs/graphs.ms
@@ -181,13 +181,13 @@ end
 Let's create a DODB database for our cars.
 .SOURCE Ruby ps=10
 # Database creation
-db = DODB::DataBase(Car).new "path/to/db-cars"
+database = DODB::DataBase(Car).new "path/to/db-cars"
 
 # Adding an element to the db
-db << Car.new "Corvet", "red", ["elegant", "fast"]
+database << Car.new "Corvet", "red", ["elegant", "fast"]
 
-# Reaching all objects in the db
-db.each do |car|
+# Reaching all objects in the database
+database.each do |car|
 	pp! car
 end
 .SOURCE
@@ -234,13 +234,13 @@ Next step, to retrieve, to modify or to delete a value, its key will be required
 .QP
 .SOURCE Ruby ps=10
 # Get a value based on its key.
-db[key]
+database[key]
 
 # Update a value based on its key.
-db[key] = new_value
+database[key] = new_value
 
 # Delete a value based on its key.
-db.delete 0
+database.delete 0
 .SOURCE
 .QE
 .
@@ -250,7 +250,7 @@ lists the entries with their keys.
 .
 .QP
 .SOURCE Ruby ps=10
-db.each_with_index do |value, key|
+database.each_with_index do |value, key|
 	puts "#{key}: #{value}"
 end
 .SOURCE
@@ -329,23 +329,37 @@ directory.
 The basic indexes as shown in this section already give a taste of what is possible to do with DODB.
 The following indexes will cover some other usual cases.
 .
+.
 .SSS Partitions (1 to n relations)
 An attribute can have a value that is shared by other entries in the database, such as the
 .I color
-attribute in our cars.
-.
+attribute of our cars.
+
+.SOURCE Ruby ps=10
+# Create a partition based on the "color" attribute of the cars.
+cars_by_color = database.new_partition "color" do |car|
+	car.color
+end
+.SOURCE
+As with basic indexes, once the partition is requested from the database, every new or modified entry will be indexed.
+
+.KS
+Let's imagine having three cars: one is blue and the other two are red.
 .TREE1
+$ tree db-cars/
 db-cars
 +-- data
 |  +-- 0000000000 <- this car is blue
-|  `-- 0000000001 <- this car is red
+|  +-- 0000000001 <- this car is red
+|  `-- 0000000002 <- this car is red, too
 | ...
 `-- partitions
     `-- by_color
         +-- blue
             `-- 0000000000 -> 0000000000
         `-- red
-            `-- 0000000001 -> 0000000001
+            +-- 0000000001 -> 0000000001
+            `-- 0000000002 -> 0000000002
 .TREE2
 .QP
 Listing all the blue cars is simple as a
@@ -354,14 +368,103 @@ in the
 .DIRECTORY db-cars/partitions/by_color/blue
 directory!
 .QE
+.KE
+.
+.
 .
 .SSS Tags (n to n relations)
 Tags are basically partitions but the attribute can have multiple values.
+
+.SOURCE Ruby ps=10
+# Create a tag based on the "keywords" attribute of the cars.
+cars_by_keywords = database.new_tags "keywords" do |car|
+	car.keywords
+end
+.SOURCE
+As with other indexes, once the tag is requested from the database, every new or modified entry will be indexed.
+.
+.
+.KS
+Let's imagine having two cars with different associated keywords.
+.TREE1
+.ps -2
+$ tree db-cars/
+db-cars
++-- data
+|  +-- 0000000000 <- this car is fast and cheap
+|  `-- 0000000001 <- this car is fast and elegant
+`-- tags
+    `-- by_keywords
+        +-- cheap
+            `-- 0000000000 -> 0000000000
+        `-- fast
+            +-- 0000000000 -> 0000000000
+            `-- 0000000001 -> 0000000001
+.ps
+.TREE2
+.QP
+Listing all the fast cars is as simple as a
+.COMMAND ls
+in the
+.DIRECTORY db-cars/tags/by_keywords/fast
+directory!
+.QE
+.KE
 .
 .SECTION A few more options
 .TBD
 .SECTION Limits of DODB
-.TBD
+DODB provides basic database operations such as storing, searching, modifying and removing data.
+However, SQL databases have a few
+.I properties
+that enable a more standardized behavior and that the general public may have come to expect from any database.
+These properties are called "ACID": atomicity, consistency, isolation and durability.
+DODB doesn't fully handle ACID properties.
+
+DODB doesn't provide
+.I atomicity .
+Instructions cannot be chained and rolled back if one of them fails.
+
+DODB doesn't handle
+.I consistency .
+There is currently no mechanism to prevent adding invalid values.
+
+.I Isolation
+is partially taken into account with a locking mechanism preventing race conditions.
+However, parallelism is mostly required to respond to a large number of clients at the same time.
+Also, SQL databases require communication between the application and the database, whose inherent latency slows down requests despite the fast algorithms used to search for a value within the database.
+Parallelism is required for SQL databases (at least partially) because of this latency, which doesn't exist with DODB\*[*].
+.FOOTNOTE1
+FYI, the service
+.I netlib.re
+uses DODB and, since the database is fast enough, parallelism isn't required despite handling more than a thousand requests per second.
+.FOOTNOTE2
+With a cache, data is retrieved five hundred times quicker than with a SQL database.
+
+.I Durability
+is taken into account.
+Data is written to disk each time it changes.
+Again, this is basic but
+.SHINE "good enough"
+for most applications.
+
+.B "Discussion on ACID properties" .
+The author of this document sees these database properties as a sort of "fail-safe".
+Always nice to have, but not entirely necessary; at least not for every single application.
+DODB will provide some form of atomicity and consistency at some point, but nothing fancy or too advanced.
+The whole point of the DODB project is to keep the code simple (almost
+.B "stupidly"
+simple).
+Thus, whether or not these properties are handled isn't a limitation of the DODB approach but a choice for this specific project.
+
+Not handling all the ACID properties within the DODB library doesn't mean they cannot be achieved.
+Applications can have these properties with a few lines of code.
+They just don't come
+.I "by default"
+with the library.
+.
+.
+.
 .SECTION Experimental scenario
 .LP
 The following experiment shows the performance of DODB based on quering durations.