Documentation PDF: smaller trees and source code, some more API doc.

Philippe PITTOLI 2024-05-22 03:58:15 +02:00
parent ce50f6f334
commit 2b24fbc8a0


@ -1,7 +1,8 @@
.so macros.roff
.de TREE1
.QP
.ps -2
.ps -3
.vs -3
.KS
.ft CW
.b1
@ -12,18 +13,24 @@
.fi
.b2
.ps
.vs
.KE
.QE
..
.de FUNCTION_CALL
.de CLASS
.I \\$*
..
.de FUNCTION_CALL
.I "\\$*"
..
.
.de COMMAND
.I "\\$*"
..
.de DIRECTORY
.ps -2
.I "\\$*"
.ps +2
..
.de PRIMARY_KEY
.I \\$1 \\$2 \\$3
@ -40,12 +47,16 @@
DODB is a database-as-library, enabling a very simple way to store applications' data: storing serialized
.I documents
(basically any data type) in plain files.
To speed-up searches, attributes of these documents can be used as indexes which leads to create a few symbolic links
To speed up searches, attributes of these documents can be used as indexes.
DODB can provide a file-system representation of those indexes through a few symbolic links
.I symlinks ) (
on the disk.
This enables administrators to search for data outside the application with the most basic tools, like
.I ls .
This document briefly presents DODB and its main differences from other database engines.
An experiment is described and analysed to understand the performance that can be expected from this approach.
The limits of this approach are discussed.
An experiment is described and analyzed to understand the performance that can be expected.
.ABSTRACT2
.SINGLE_COLUMN
.SECTION Introduction to DODB
@ -173,7 +184,7 @@ First things first, the following code is the structure used in the rest of the
This is a simple object
.I Car ,
with a name, a color and a list of associated keywords (fast, elegant, etc.).
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
class Car
property name : String
property color : String
@ -184,7 +195,7 @@ end
.
.SS DODB basic usage
Let's create a DODB database for our cars.
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# Database creation
database = DODB::DataBase(Car).new "path/to/db-cars"
@ -216,7 +227,7 @@ In this example, the directory
contains the serialized value, with a formatted number as file name.
The file "0000000000" contains the following:
.QP
.SOURCE JSON ps=10
.SOURCE JSON ps=9 vs=10
{
"name": "Corvet",
"color": "red",
@ -234,7 +245,7 @@ The car is serialized as expected in the file
Next, retrieving, modifying or deleting a value requires its key.
.
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# Get a value based on its key.
database[key]
@ -251,7 +262,7 @@ The function
lists the entries with their keys.
.
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
database.each_with_key do |value, key|
puts "#{key}: #{value}"
end
@ -274,7 +285,7 @@ This
.I name
attribute can be used to speed up searches.
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# Create an index based on the "name" attribute of the cars.
cars_by_name = cars.new_index "name", do |car|
car.name
@ -290,7 +301,7 @@ The
.I "index object"
has several useful functions.
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# Retrieve the car named "Corvet".
corvet = cars_by_name.get? "Corvet"
@ -337,12 +348,14 @@ An attribute can have a value that is shared by other entries in the database, s
.I color
attribute of our cars.
.SOURCE Ruby ps=10
.QP
.SOURCE Ruby ps=9 vs=10
# Create a partition based on the "color" attribute of the cars.
cars_by_color = database.new_partition "color", do |car|
car.color
end
.SOURCE
.QE
As with basic indexes, once the partition is requested from the database, every new or modified entry will be indexed.
.KS
@ -364,7 +377,7 @@ db-cars
  `-- 0000000002 -> 0000000002
.TREE2
.QP
Listing all the blue cars is simple as a
Listing all the blue cars is as simple as running
.COMMAND ls
in the
.DIRECTORY db-cars/partitions/by_color/blue
@ -375,15 +388,17 @@ directory!
.
.
.SSS Tags (n to n relations)
Tags are basically partitions but the attribute can have multiple values.
Tags are basically partitions but the indexed attribute can have multiple values.
.SOURCE Ruby ps=10
.QP
.SOURCE Ruby ps=9 vs=10
# Create a tag based on the "keywords" attribute of the cars.
cars_by_keywords = database.new_tags "keywords", do |car|
car.keywords
end
.SOURCE
As with other indexes, once the tag is requested from the database, every new or modified entry will be indexed.
.QE
.
.
.KS
@ -394,16 +409,18 @@ db-cars
+-- data
|  +-- 0000000000 <- this car is fast and cheap
|  `-- 0000000001 <- this car is fast and elegant
`-- partitions
   `-- by_color
`-- tags
   `-- by_keywords
+-- cheap
`-- 0000000000 -> 0000000000
+-- elegant
`-- 0000000001 -> 0000000001
`-- fast
+-- 0000000000 -> 0000000000
`-- 0000000001 -> 0000000001
.TREE2
.QP
Listing all the fast cars is simple as a
Listing all the fast cars is as simple as running
.COMMAND ls
in the
.DIRECTORY db-cars/tags/by_keywords/fast
@ -442,14 +459,14 @@ Indexes can easily be cached, thanks to simple hash tables.
.SS Cached database
A cached database has the same API as the other DODB databases.
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# Create a cached database
database = DODB::CachedDataBase(Car).new "path/to/db-cars"
.SOURCE
All operations of the
.I DODB::DataBase
.CLASS DODB::DataBase
class are available for
.I DODB::CachedDataBase .
.CLASS DODB::CachedDataBase .
.QE
.
.SS Cached indexes
@ -483,7 +500,7 @@ small layer over a hash table.
Instantiating a RAM-only database is as simple as the other options.
Moreover, this database has exactly the same API as the others, thus changing from one to another is painless.
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# RAM-only database creation
database = DODB::RAMOnlyDataBase(Car).new "path/to/db-cars"
.SOURCE
@ -496,7 +513,7 @@ Also, I worked enough already, leave me alone.
.SS RAM-only indexes
Indexes have their RAM-only version.
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# RAM-only basic indexes.
cars_by_name = cars.new_RAM_index "name", &.name
@ -527,7 +544,7 @@ See the "Future work" section.
.
.SS Uncached database
By default, the database (provided by
.I "DODB::DataBase" )
.CLASS "DODB::DataBase" )
isn't cached.
.
.SS Uncached indexes
@ -537,7 +554,7 @@ of the data).
For that reason, indexes are cached by default.
But for highly memory-constrained environments, the cache can be removed.
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# Uncached basic indexes.
cars_by_name = cars.new_uncached_index "name", &.name
@ -565,7 +582,7 @@ command enables to browse the full documentation with a web browser.
.
.SS Database creation
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# Uncached, cached and RAM-only database creation.
database = DODB::DataBase(Car).new "path/to/db-cars"
database = DODB::CachedDataBase(Car).new "path/to/db-cars"
@ -575,7 +592,7 @@ database = DODB::RAMOnlyDataBase(Car).new "path/to/db-cars"
.
.SS Browsing the database
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# List all the values in the database
database.each do |value|
# ...
@ -584,7 +601,7 @@ end
.QE
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# List all the values in the database with their key
database.each_with_key do |value, key|
# ...
@ -595,16 +612,16 @@ end
.SS Database search, update and deletion with the key (integer associated with the value)
.KS
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
value = database[key] # May throw a MissingEntry exception
value = database[key]? # Return nil if the value doesn't exist
value = database[key]? # Returns nil if the value doesn't exist
database[key] = value
database.delete key
.SOURCE
Side note for the
.I []
function: in case the value isn't in the database, the function throws an exception named
.I DODB::MissingEntry .
.CLASS DODB::MissingEntry .
To avoid this exception and get a
.I nil
value instead, use the
@ -616,7 +633,7 @@ function.
.
.SS Indexes creation
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
# Uncached, cached and RAM-only basic indexes.
cars_by_name = cars.new_uncached_index "name", &.name
cars_by_name = cars.new_index "name", &.name
@ -636,40 +653,49 @@ cars_by_keywords = cars.new_RAM_tags "keywords", &.keywords
.
.
.SS Database retrieval, update and deletion with an index
.
.QP
.SOURCE Ruby ps=10
# Get a value from an index.
.SOURCE Ruby ps=9 vs=10
# Get a value from a 1-1 index.
car = cars_by_name.get "Corvet" # May throw a MissingEntry exception
car = cars_by_name.get? "Corvet" # Return nil if the value doesn't exist
.SOURCE
Works the same for partitions and tags, the API is consistent.
.QE
.QP
.SOURCE Ruby ps=10
# Update a value.
car = cars_by_name.update updated_car
# In case the indexed attribute changed.
car = cars_by_name.update "Corvet", updated_car
car = cars_by_name.get? "Corvet" # Returns nil if the value doesn't exist
.SOURCE
.QE
.
.QP
.SOURCE Ruby ps=10
# In case the value may not exist.
car = cars_by_name.update_or_create updated_car
# In case the indexed attribute has changed.
car = cars_by_name.update_or_create "Corvet", updated_car
.SOURCE Ruby ps=9 vs=10
# Get a value from a partition (1-n relations) or a tag (n-n relations) index.
red_cars = cars_by_color.get "red" # empty array if no such cars exist
fast_cars = cars_by_keywords.get "fast" # empty array if no such cars exist
# Several tags can be selected at the same time, to narrow the search.
cars_both_fast_and_expensive = cars_by_keywords.get ["fast", "expensive"]
.SOURCE
.QE
.
The basic 1-1
.I "index object"
can update a value by selecting a unique entry in the database.
.QP
.SOURCE Ruby ps=10
cars_by_name.delete "Corvet"
.SOURCE Ruby ps=9 vs=10
car = cars_by_name.update updated_car # If the `name` hasn't changed.
car = cars_by_name.update "Corvet", updated_car # If the `name` has changed.
# Delete all red cars.
cars_by_color.delete "red"
car = cars_by_name.update_or_create updated_car # Updates or creates the value.
car = cars_by_name.update_or_create "Corvet", updated_car # Same.
.SOURCE
.QE
For deletion, database entries can be selected based on any index.
Partitions and tags can take a block of code to narrow the selection.
.QP
.SOURCE Ruby ps=9 vs=10
cars_by_name.delete "Corvet" # Deletes the car named "Corvet".
cars_by_color.delete "red" # Deletes all red cars.
# Delete all cars that are both blue and slow.
# Deletes cars that are both slow and expensive.
cars_by_keywords.delete ["slow", "expensive"]
# Deletes all cars that are both blue and slow.
cars_by_color.delete "blue", do |car|
car.keywords.includes? "slow"
end
@ -686,7 +712,7 @@ end
The Tag index enables searching for values based on multiple keys.
For example, searching for all cars that are both fast and elegant can be written this way:
.QP
.SOURCE Ruby ps=10
.SOURCE Ruby ps=9 vs=10
fast_elegant_cars = cars_by_keywords.get ["fast", "elegant"]
.SOURCE
Used with a list of keys, the
@ -801,7 +827,7 @@ The library is written in Crystal and so is the benchmark (\f[CW]spec/benchmark-
Nonetheless, despite a few technicalities, the objective of this document is to provide insight into the approach used in DODB rather than into this particular implementation.
The manipulated data type can be found in \f[CW]spec/db-cars.cr\f[].
.SOURCE Ruby ps=9 vs=9p
.SOURCE Ruby ps=9 vs=10
class Car
property name : String # 1-1 relation
property color : String # 1-n relation