Some more explanations.

toying-with-ramdb
Philippe PITTOLI 2024-05-30 06:09:58 +02:00
parent 25bfab34e0
commit 1d03f906e6
2 changed files with 224 additions and 184 deletions

@ -10,3 +10,9 @@
%T RFC 8259, The JavaScript Object Notation (JSON) Data Interchange Format
%D 2017
%I Internet Engineering Task Force (IETF)
%K darkhttpd
%A Emil Mikulic
%T DarkHTTPd, when you need a webserver in a hurry.
%D 2017
%I https://unix4lyfe.org/darkhttpd/

@ -671,161 +671,6 @@ is exactly the same as the others.
.QE
.
.
.
.SECTION Recap of the DODB API
This section provides a quick shorthand manual for the most important parts of the DODB API.
For an exhaustive API documentation, please generate the development documentation for the library.
The command
.COMMAND "make doc"
generates the documentation, then the
.COMMAND "make serve-doc"
command lets you browse the full documentation with a web browser.
.
.SS Database creation
.QP
.SOURCE Ruby ps=9 vs=10
# Uncached, cached, common and RAM-only database creation.
database = DODB::Storage::Uncached(Car).new "path/to/db"
database = DODB::Storage::Cached(Car).new "path/to/db"
database = DODB::Storage::Common(Car).new "path/to/db", 50000 # nb cache entries
database = DODB::Storage::RAMOnly(Car).new "path/to/db"
.SOURCE
.QE
.
.SS Browsing the database
.QP
.SOURCE Ruby ps=9 vs=10
# List all the values in the database
database.each do |value|
# ...
end
.SOURCE
.QE
.QP
.SOURCE Ruby ps=9 vs=10
# List all the values in the database with their key
database.each_with_key do |value, key|
# ...
end
.SOURCE
.QE
.
.SS Database search, update and deletion with the key (integer associated to the value)
.KS
.QP
.SOURCE Ruby ps=9 vs=10
value = database[key] # May throw a MissingEntry exception
value = database[key]? # Returns nil if the value doesn't exist
database[key] = value
database.delete key
.SOURCE
Side note for the
.I []
function: in case the value isn't in the database, the function throws an exception named
.CLASS DODB::MissingEntry .
To avoid this exception and get a
.I nil
value instead, use the
.I []?
function.
.QE
.KE
.
.
.SS Trigger creation
.QP
.SOURCE Ruby ps=9 vs=10
# Uncached, cached and RAM-only basic indexes.
cars_by_name = cars.new_uncached_index "name", &.name
cars_by_name = cars.new_index "name", &.name
cars_by_name = cars.new_RAM_index "name", &.name
# Uncached, cached and RAM-only partitions.
cars_by_color = cars.new_uncached_partition "color", &.color
cars_by_color = cars.new_partition "color", &.color
cars_by_color = cars.new_RAM_partition "color", &.color
# Uncached, cached and RAM-only tags.
cars_by_keywords = cars.new_uncached_tags "keywords", &.keywords
cars_by_keywords = cars.new_tags "keywords", &.keywords
cars_by_keywords = cars.new_RAM_tags "keywords", &.keywords
.SOURCE
.QE
.
.
.SS Database retrieval, update and deletion with an index
.
.QP
.SOURCE Ruby ps=9 vs=10
# Get a value from a 1-1 index.
car = cars_by_name.get "Corvet" # May throw a MissingEntry exception
car = cars_by_name.get? "Corvet" # Returns nil if the value doesn't exist
.SOURCE
.QE
.
.QP
.SOURCE Ruby ps=9 vs=10
# Get a value from a partition (1-n relations) or a tag (n-n relations) index.
red_cars = cars_by_color.get "red" # empty array if no such cars exist
fast_cars = cars_by_keywords.get "fast" # empty array if no such cars exist
# Several tags can be selected at the same time, to narrow the search.
cars_both_fast_and_expensive = cars_by_keywords.get ["fast", "expensive"]
.SOURCE
.QE
.
The basic 1-1
.I "index object"
can update a value by selecting a unique entry in the database.
.QP
.SOURCE Ruby ps=9 vs=10
car = cars_by_name.update updated_car # If the `name` hasn't changed.
car = cars_by_name.update "Corvet", updated_car # If the `name` has changed.
car = cars_by_name.update_or_create updated_car # Updates or creates the value.
car = cars_by_name.update_or_create "Corvet", updated_car # Same.
.SOURCE
.QE
For deletion, database entries can be selected based on any index.
Partitions and tags can take a block of code to narrow the selection.
.QP
.SOURCE Ruby ps=9 vs=10
cars_by_name.delete "Corvet" # Deletes the car named "Corvet".
cars_by_color.delete "red" # Deletes all red cars.
# Deletes cars that are both slow and expensive.
cars_by_keywords.delete ["slow", "expensive"]
# Deletes all cars that are both blue and slow.
cars_by_color.delete "blue" do |car|
  car.keywords.includes? "slow"
end
# Same.
cars_by_keywords.delete "slow" do |car|
  car.color == "blue"
end
.SOURCE
.QE
.
.
.SSS Tags: search on multiple keys
The tag index makes it possible to search for values based on multiple keys.
For example, searching for all cars that are both fast and elegant can be written this way:
.QP
.SOURCE Ruby ps=9 vs=10
fast_elegant_cars = cars_by_keywords.get ["fast", "elegant"]
.SOURCE
Used with a list of keys, the
.FUNCTION_CALL get
function returns an empty list when no entry matches all the keys.
.br
The implementation was designed to be simple (7 lines of code), not efficient.
However, with data and index caches, the search is expected to be fast enough for nearly all uses, provided that the tags are small enough (a few thousand entries).
.QE
.
.
.SECTION Limits of DODB
DODB provides basic database operations such as storing, searching, modifying and removing data.
Though, SQL databases have a few
@ -980,6 +825,8 @@ That is why alternative encodings, such as CBOR,
CBOR
.]
should be considered for large databases.
.
.
.SS Partitions (1 to n relations)
The previous example showed the retrieval of a single value from the database.
The following will show what happens when thousands of entries are retrieved.
@ -1017,6 +864,7 @@ and
databases, and the more data there is to retrieve, the worse it gets.
However, retrieving thousands and thousands of entries in a single request may not be a typical usage of databases, anyway.
.
.
.SS Tags (n to n relations)
A tag index makes it possible to match a list of entries based on an attribute with potentially multiple values (such as an array).
In the experiment, a database of cars is created along with a tag index on a list of
@ -1071,40 +919,58 @@ This database is to be considered to achieve maximum speed for data-sets fitting
.B "Common database"
makes it possible to lower the memory requirements as much as desired.
The eviction policy requires extra operations, which lead to poorer, though still acceptable, performance.
.
.ps -2
.TS
allbox tab(:);
c | lw(3.6i) | cew(1.4i).
DODB instance:Comment and database usage:T{
compared to RAM-only
T}
RAM only:T{
Worst memory footprint, best performance.
T}:-
Cached db and index:T{
Performance for retrieving a value is the same as RAM only while
enabling the admin to manually search for data on-disk.
T}:about the same perfs
Common db, cached index:T{
Performance is still excellent while requiring a
.UL configurable
amount of RAM.
Should be used by default.
T}:T{
67% slower (about 200 ns) which still is great
T}
Uncached db, cached index:Very slow. Common database should be considered instead.:170 to 180x slower
Uncached db and index:T{
Best memory footprint, worst performance.
T}:200 to 210x slower
.TE
.ps \n[PS]
.B "Uncached database"
is in this experiment mostly as a control sample, to show the worst possible performance of DODB.
Cached indexes should be considered for most applications, or even their RAM-only version in case the file-system representation isn't necessary.
.
.\" .ps -2
.\" .TS
.\" allbox tab(:);
.\" c | lw(3.6i) | cew(1.4i).
.\" DODB instance:Comment and database usage:T{
.\" compared to RAM-only
.\" T}
.\" RAM only:T{
.\" Worst memory footprint, best performance.
.\" T}:-
.\" Cached db and index:T{
.\" Performance for retrieving a value is the same as RAM only while
.\" enabling the admin to manually search for data on-disk.
.\" T}:about the same perfs
.\" Common db, cached index:T{
.\" Performance is still excellent while requiring a
.\" .UL configurable
.\" amount of RAM.
.\" Should be used by default.
.\" T}:T{
.\" 67% slower (about 200 ns) which still is great
.\" T}
.\" Uncached db, cached index:Very slow. Common database should be considered instead.:170 to 180x slower
.\" Uncached db and index:T{
.\" Best memory footprint, worst performance.
.\" T}:200 to 210x slower
.\" .TE
.\" .ps \n[PS]
.
.SS Conclusion on performance
As expected, retrieving a single value is fast and the size of the database doesn't matter much.
Each deserialization and, more importantly, each disk access is a pain point.
Caching the value enables a massive performance gain: data can be retrieved several hundred times faster.
The more entries are requested, the slower the request gets and, more importantly, the worse the performance gets
.UL "per entry" .
The eviction policy also costs some performance since it requires operations to select the data to keep in the cache.
However, the implementation is as simple as it gets, and some approaches could be considered to make it faster.
Notably, specific data sets or database uses could justify adapting the eviction policy.
The same goes for the entire caching mechanism.
The current implementation offers a simple and generic way to store data based on typical database uses.
As a side note, let's keep in mind that requesting several thousand entries in DODB, with the common database for instance, is as slow as getting
.B "a single entry"
with SQL (varies from 0.1 to 2 ms on my machine for a single value without a search, just the first available entry).
This should help put things into perspective.
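As an illustration of the bookkeeping an eviction policy adds, here is a minimal sketch of a size-bounded cache, assuming a first-in, first-out policy.
It is written in plain Ruby and is not taken from the DODB source code: the
.CLASS FIFOCache
class and its methods are hypothetical names.
.QP
.SOURCE Ruby ps=9 vs=10
# Size-bounded cache with first-in, first-out eviction (illustrative sketch,
# not the DODB implementation).
class FIFOCache
  def initialize(max_entries)
    raise ArgumentError, "max_entries must be >= 1" if max_entries < 1
    @max_entries = max_entries
    @values = {} # key => cached value
    @order = []  # cached keys, oldest first
  end

  # Caches a value; a new key evicts the oldest one when the cache is full.
  def put(key, value)
    unless @values.key?(key)
      @order.push(key)
      @values.delete(@order.shift) if @order.size > @max_entries
    end
    @values[key] = value
  end

  # Returns the cached value, or nil on a cache miss.
  def get(key)
    @values[key]
  end

  # Number of entries currently cached.
  def size
    @values.size
  end
end

cache = FIFOCache.new(2)
cache.put(1, "first car")
cache.put(2, "second car")
cache.put(3, "third car") # the cache is full: key 1, the oldest, is evicted
.SOURCE
.QE
In this sketch, every insertion of a new key costs a queue update and, once the cache is full, a removal: this is the kind of extra work that an uncapped cache does not have to do.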
.
.SECTION Future work
This section presents all the features I want to see in a future version of the DODB library.
@ -1157,6 +1023,9 @@ Since this implementation of DODB is related to the Crystal language (which isn'
.
.
.SECTION Conclusion
The
.I common
database should be an acceptable choice for most applications.
.TBD
.APPENDIX FIFO vs Efficient FIFO
@ -1203,3 +1072,168 @@ When the cache size is not sufficient, the requests are hundred times slower, wh
This figure shows the request durations to retrieve data based on a tag containing up to 5k entries.
.QE
As for partitions, the response time depends on the number of entries to retrieve and the duration increases linearly with the number of elements.
.
.
.APPENDIX Recap of the DODB API
This section provides a quick shorthand manual for the most important parts of the DODB API.
For an exhaustive API documentation, please generate the development documentation for the library.
The command
.COMMAND "make doc"
generates the documentation, then the
.COMMAND "make serve-doc"
command lets you browse the full documentation with a web browser\*[*].
.FOOTNOTE1
The
.COMMAND "make serve-doc"
command requires darkhttpd
.[
darkhttpd
.]
but this can be adapted to any other web server.
.FOOTNOTE2
.
.SS Database creation
.QP
.SOURCE Ruby ps=9 vs=10
# Uncached, cached, common and RAM-only database creation.
database = DODB::Storage::Uncached(Car).new "path/to/db"
database = DODB::Storage::Cached(Car).new "path/to/db"
database = DODB::Storage::Common(Car).new "path/to/db", 50000 # nb cache entries
database = DODB::Storage::RAMOnly(Car).new "path/to/db"
.SOURCE
.QE
.
.SS Browsing the database
.QP
.SOURCE Ruby ps=9 vs=10
# List all the values in the database
database.each do |value|
# ...
end
.SOURCE
.QE
.QP
.SOURCE Ruby ps=9 vs=10
# List all the values in the database with their key
database.each_with_key do |value, key|
# ...
end
.SOURCE
.QE
.
.SS Database search, update and deletion with the key (integer associated to the value)
.KS
.QP
.SOURCE Ruby ps=9 vs=10
value = database[key] # May throw a MissingEntry exception
value = database[key]? # Returns nil if the value doesn't exist
database[key] = value
database.delete key
.SOURCE
Side note for the
.I []
function: in case the value isn't in the database, the function throws an exception named
.CLASS DODB::MissingEntry .
To avoid this exception and get a
.I nil
value instead, use the
.I []?
function.
.QE
.KE
.
.
.SS Trigger creation
.QP
.SOURCE Ruby ps=9 vs=10
# Uncached, cached and RAM-only basic indexes.
cars_by_name = cars.new_uncached_index "name", &.name
cars_by_name = cars.new_index "name", &.name
cars_by_name = cars.new_RAM_index "name", &.name
# Uncached, cached and RAM-only partitions.
cars_by_color = cars.new_uncached_partition "color", &.color
cars_by_color = cars.new_partition "color", &.color
cars_by_color = cars.new_RAM_partition "color", &.color
# Uncached, cached and RAM-only tags.
cars_by_keywords = cars.new_uncached_tags "keywords", &.keywords
cars_by_keywords = cars.new_tags "keywords", &.keywords
cars_by_keywords = cars.new_RAM_tags "keywords", &.keywords
.SOURCE
.QE
.
.
.SS Database retrieval, update and deletion with an index
.
.QP
.SOURCE Ruby ps=9 vs=10
# Get a value from a 1-1 index.
car = cars_by_name.get "Corvet" # May throw a MissingEntry exception
car = cars_by_name.get? "Corvet" # Returns nil if the value doesn't exist
.SOURCE
.QE
.
.QP
.SOURCE Ruby ps=9 vs=10
# Get a value from a partition (1-n relations) or a tag (n-n relations) index.
red_cars = cars_by_color.get "red" # empty array if no such cars exist
fast_cars = cars_by_keywords.get "fast" # empty array if no such cars exist
# Several tags can be selected at the same time, to narrow the search.
cars_both_fast_and_expensive = cars_by_keywords.get ["fast", "expensive"]
.SOURCE
.QE
.
The basic 1-1
.I "index object"
can update a value by selecting a unique entry in the database.
.QP
.SOURCE Ruby ps=9 vs=10
car = cars_by_name.update updated_car # If the `name` hasn't changed.
car = cars_by_name.update "Corvet", updated_car # If the `name` has changed.
car = cars_by_name.update_or_create updated_car # Updates or creates the value.
car = cars_by_name.update_or_create "Corvet", updated_car # Same.
.SOURCE
.QE
For deletion, database entries can be selected based on any index.
Partitions and tags can take a block of code to narrow the selection.
.QP
.SOURCE Ruby ps=9 vs=10
cars_by_name.delete "Corvet" # Deletes the car named "Corvet".
cars_by_color.delete "red" # Deletes all red cars.
# Deletes cars that are both slow and expensive.
cars_by_keywords.delete ["slow", "expensive"]
# Deletes all cars that are both blue and slow.
cars_by_color.delete "blue" do |car|
  car.keywords.includes? "slow"
end
# Same.
cars_by_keywords.delete "slow" do |car|
  car.color == "blue"
end
.SOURCE
.QE
.
.
.SSS Tags: search on multiple keys
The tag index makes it possible to search for values based on multiple keys.
For example, searching for all cars that are both fast and elegant can be written this way:
.QP
.SOURCE Ruby ps=9 vs=10
fast_elegant_cars = cars_by_keywords.get ["fast", "elegant"]
.SOURCE
Used with a list of keys, the
.FUNCTION_CALL get
function returns an empty list when no entry matches all the keys.
.br
The implementation was designed to be simple (7 lines of code), not efficient.
However, with data and index caches, the search is expected to be fast enough for nearly all uses, provided that the tags are small enough (a few thousand entries).
.QE
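A search on several tags can be seen as the intersection of the entries found for each tag.
The following plain Ruby sketch shows one possible way to write it; it is not the DODB implementation, and the
.I TAG_INDEX
table and the
.FUNCTION_CALL keys_matching_all
function are hypothetical names used for illustration only.
.QP
.SOURCE Ruby ps=9 vs=10
# Hypothetical tag index: tag => keys of the entries carrying this tag.
TAG_INDEX = {
  "fast" => [1, 2, 5],
  "elegant" => [2, 3, 5],
  "expensive" => [5, 7],
}

# Returns the keys of the entries carrying every requested tag.
# An unknown tag matches nothing, so the result is then an empty list.
def keys_matching_all(index, tags)
  return [] if tags.empty?
  tags.map { |tag| index.fetch(tag, []) }
      .reduce { |common, keys| common & keys }
end

fast_elegant_keys = keys_matching_all(TAG_INDEX, ["fast", "elegant"])
.SOURCE
.QE
Each additional tag can only shrink the result, which is why selecting several tags narrows the search.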
.
.