s/Storage::Basic/Storage::Uncached/

toying-with-ramdb
Philippe PITTOLI 2024-05-27 21:42:20 +02:00
parent cd48aad945
commit 279f4379e8
6 changed files with 58 additions and 41 deletions

@ -272,14 +272,21 @@ Of course, browsing the entire database to find a value (or its key) is a waste
That is when indexes come into play.
.
.
.SS Indexes
Entries can be
.I indexed
based on their attributes.
There are currently three main ways to search for a value by its attributes: basic indexes, partitions and tags.
.SS Triggers
A simple way to quickly retrieve a piece of data is to create
.I indexes
based on its attributes.
When a value is added to the database, or when it is modified, a
.I trigger
can be called to index it.
There are currently three main triggers in
.CLASS DODB
to index values: basic indexes, partitions and tags.
.
.SSS Basic indexes (1 to 1 relations)
Basic indexes represent one-to-one relations, such as an index in SQL.
Basic indexes
.CLASS DODB::Trigger::Index ) (
represent one-to-one relations, such as an index in SQL.
In the Car database, each car has a dedicated (unique) name.
This
.I name
@ -296,9 +303,9 @@ cars_by_name = cars.new_index "name", { |car| car.name }
cars_by_name = cars.new_index "name", &.name
.SOURCE
Once the index has been created, every added or modified entry in the database will be indexed.
Adding an index (basic index, partition or tag) provides an
Adding a trigger provides an
.I object
used to manipulate the database based on this index.
used to manipulate the database based on the related attribute.
Let's call it an
.I "index object" .
In the code above, the
@ -349,7 +356,7 @@ directory.
.QE
.
The basic indexes as shown in this section already give a taste of what is possible to do with DODB.
The following indexes will cover some other usual cases.
The following triggers will cover some other usual cases.
.
.
.SSS Partitions (1 to n relations)
@ -401,7 +408,7 @@ directory!
.
.SSS Tags (n to n relations)
Tags are basically partitions but the indexed attribute can have multiple values.
.
.QP
.SOURCE Ruby ps=9 vs=10
# Create a tag based on the "keywords" attribute of the cars.
@ -445,15 +452,15 @@ directory!
.
.
.
.SSS Side note about indexes
DODB presents a few possible indexes (basic indexes, partitions and tags) which respond to an obvious need for fast searches.
.SSS Side note about triggers
DODB presents a few possible triggers (basic indexes, partitions and tags) which respond to an obvious need for fast searches.
However, their implementation via the creation of symlinks is the result of a certain vision about how a database should behave in order to provide a practical way for users to sort the entries.
The implementation can be completely changed.
Also, other kinds of indexes could
Also, other kinds of triggers could
.B easily
be implemented in addition to those presented.
The new indexes may have completely different objectives than providing a file-system representation of the data.
The new triggers may have completely different objectives than providing a file-system representation of the data.
The following sections cover precisely this aspect.
.
.
@ -469,10 +476,9 @@ Several hundred times faster, see the experiment section.
.FOOTNOTE2
Same thing for cached indexes.
Indexes can easily be cached, thanks to simple hash tables.
.
.
.SS Cached database
A cached database has the same API as the other DODB databases.
.B "Cached database" .
A cached database has the same API as the other DODB databases and keeps a copy of the entire database in memory for fast retrieval.
.QP
.SOURCE Ruby ps=9 vs=10
# Create a cached database
@ -484,12 +490,12 @@ class are available for
.CLASS Storage::Cached .
.QE
.
.SS Cached indexes
.B "Cached indexes" .
Since indexes do not require nearly as much memory as caching the entire database, they are cached by default.
.
.
.
.SECTION Common database: caching only recently used data
.SECTION Common database
Storing the entire data-set in memory is an effective way to make the requests fast, as does
the
.I "cached database"
@ -497,11 +503,14 @@ presented in the previous section.
Not all data-sets are compatible with this approach, for obvious reasons.
Thus, a tradeoff could be found to enable fast retrieval of data without requiring much memory.
Caching only a part of the data-set could already enable a massive speed-up even in memory-constrained environments.
The most effective strategy could differ from one application to another; providing a generic algorithm that should work for all possible constraints is a hazardous endeavor.
The most effective strategy could differ from one application to another\*[*].
.FOOTNOTE1
Providing a generic algorithm that should work for all possible constraints is a hazardous endeavor.
.FOOTNOTE2
However, caching only the most recently requested values is a simple policy which may be efficient in many cases.
This strategy is implemented in
.I "common database"
and this section will explain how it works.
This strategy is implemented in the
.CLASS DODB::Storage::Common
database and this section will explain how it works.
Common database implements a simple strategy to keep only relevant values in memory:
caching
@ -514,7 +523,8 @@ Any value that is requested or added to the database is considered
Each time a value is added to the database, its key is put as the first element of a list.
In this list,
.B "values are unique" .
Adding a value that is already present in the list is considered as "using the value",
Adding a value that is already present in the list is considered as
.I "using the value" ,
thus it is moved to the start of the list.
In case the number of entries exceeds what is allowed,
the least recently used value (the last list entry) is removed,
@ -526,7 +536,9 @@ the duration of adding a value is constant, it doesn't change with the number of
This efficiency is a memory tradeoff.
All the entries are added to a
.B "doubly linked list"
(to keep track of the order of the added keys) and to a
(to keep track of the order of the added keys)
.UL and
to a
.B hash
to perform efficient searches of the keys in the list.
Thus, all the nodes are added twice, once in the list, once in the hash.
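A minimal sketch of the policy described above, written in plain Ruby rather than taken from DODB's own code: a doubly linked list keeps the keys ordered by recency, and a hash gives constant-time access to the list nodes.
The names (RecentKeys, use, max_entries) are hypothetical and only serve the illustration; the actual DODB::Storage::Common implementation may differ.

```ruby
# Hypothetical illustration of the "recently used keys" policy.
# This is NOT DODB's code: class and method names are assumptions.
class RecentKeys
  # A list node: the key plus links to its two neighbours.
  Node = Struct.new(:key, :prev_node, :next_node)

  def initialize(max_entries)
    @max_entries = max_entries
    @nodes = {}   # key => node, to find a key in the list in constant time
    @head = nil   # most recently used key
    @tail = nil   # least recently used key
  end

  # Marks `key` as just used.
  # Returns the evicted key when the limit is exceeded, nil otherwise.
  def use(key)
    node = @nodes[key]
    if node
      unlink node                 # already known: detach it, keys stay unique
    else
      node = Node.new(key, nil, nil)
      @nodes[key] = node          # the entry is stored twice: hash and list
    end
    push_front node
    evict_if_needed
  end

  # Keys ordered from most to least recently used.
  def keys
    result = []
    node = @head
    while node
      result << node.key
      node = node.next_node
    end
    result
  end

  private

  # Detaches a node from the list, fixing head and tail when needed.
  def unlink(node)
    if node.prev_node
      node.prev_node.next_node = node.next_node
    else
      @head = node.next_node
    end
    if node.next_node
      node.next_node.prev_node = node.prev_node
    else
      @tail = node.prev_node
    end
    node.prev_node = nil
    node.next_node = nil
  end

  # Puts a detached node at the start of the list.
  def push_front(node)
    node.next_node = @head
    @head.prev_node = node if @head
    @head = node
    @tail ||= node
  end

  # Drops the last list entry once the allowed number of entries is exceeded.
  def evict_if_needed
    return nil if @nodes.size <= @max_entries
    oldest = @tail
    unlink oldest
    @nodes.delete oldest.key
    oldest.key
  end
end
```

With a limit of two entries, using the keys 1, 2, 1 and then 3 evicts key 2, the least recently used one, and leaves the list as 3, 1.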
@ -606,13 +618,17 @@ It is perfectly reasonable to have a cached database with a policy of keeping ju
But for now, the cached version keeps everything.
See the "Future work" section.
.FOOTNOTE2
.
.SS Uncached database
By default, the database (provided by
.CLASS "DODB::Storage::Basic" )
isn't cached.
.
.SS Uncached indexes
.B "Uncached database" .
The
.CLASS "DODB::Storage::Uncached"
database has no data cache at all and can be used in very constrained environments.
However, the
.CLASS DODB::Storage::Common
should (probably) be considered instead, even if the configured number of entries is low.
A small data cache is still better than no cache.
.B "Uncached indexes" .
Cached indexes do not require a large amount of memory since the only stored data is an integer (the
.I key
of the data).
@ -696,7 +712,7 @@ function.
.KE
.
.
.SS Indexes creation
.SS Triggers creation
.QP
.SOURCE Ruby ps=9 vs=10
# Uncached, cached and RAM-only basic indexes.

@ -13,7 +13,7 @@ class DODBCached < DODB::Storage::Cached(Ship)
end
end
class DODBUnCached < DODB::Storage::Basic(Ship)
class DODBUnCached < DODB::Storage::Uncached(Ship)
def initialize(storage_ext = "", remove_previous_data = true)
storage_dir = "test-storage#{storage_ext}"

@ -1,4 +1,4 @@
class SPECDB::Uncached(V) < DODB::Storage::Basic(V)
class SPECDB::Uncached(V) < DODB::Storage::Uncached(V)
property storage_dir : String
def initialize(storage_ext = "", remove_previous_data = true)
@storage_dir = "specdb-storage-uncached#{storage_ext}"

@ -6,7 +6,7 @@ def fork_process(&)
Process.new Crystal::System::Process.fork { yield }
end
describe "DODB::Storage::Basic" do
describe "DODB::Storage::Uncached" do
describe "basics" do
it "store and get data" do
db = SPECDB::Uncached(Ship).new
@ -585,7 +585,7 @@ describe "DODB::Storage::Cached" do
db2 = SPECDB::Cached(Ship).new remove_previous_data: false
db2 << Ship.mutsuki
# Only difference with DODB::Storage::Basic: concurrent DBs cannot coexist.
# Only difference with DODB::Storage::Uncached: concurrent DBs cannot coexist.
db2.to_a.size.should eq(2)
db1.rm_storage_dir

@ -50,7 +50,7 @@ class DODB::Storage::Cached(V) < DODB::Storage(V)
end
# Load the database in RAM at start-up.
DODB::Storage::Basic(V).new(@directory_name).each_with_key do |v, key|
DODB::Storage::Uncached(V).new(@directory_name).each_with_key do |v, key|
puts "\rloading data from #{@directory_name} at key #{key}"
self[key] = v
end

@ -3,7 +3,7 @@
#
# ```
# # Creates a DODB (uncached) database.
# car_database = DODB::Storage::Basic.new "/path/to/db"
# car_database = DODB::Storage::Uncached.new "/path/to/db"
#
# # Creates a (cached) index.
# cars_by_name = car_database.new_index "name", &.name
@ -22,5 +22,6 @@
# ```
#
# NOTE: slow but doesn't require much memory.
class DODB::Storage::Basic(V) < DODB::Storage(V)
# NOTE: for a database with a configurable data cache size, use `DODB::Storage::Common`.
class DODB::Storage::Uncached(V) < DODB::Storage(V)
end