comments + paper

This commit is contained in:
Philippe PITTOLI 2024-05-27 04:58:32 +02:00
parent c01ec614ae
commit e152bc0ee7
4 changed files with 87 additions and 36 deletions

View File

@ -488,6 +488,56 @@ class are available for
Since indexes do not require nearly as much memory as caching the entire database, they are cached by default.
.
.
.
.SECTION Common database: caching only recently used data
Storing the entire data-set in memory is an effective way to speed up requests, as the
.I "cached database"
presented in the previous section does.
However, not all data-sets fit in memory.
Thus, a trade-off has to be found to enable fast data retrieval without requiring too much memory.
Caching only a part of the data-set can already provide a massive speed-up, even in memory-constrained environments.
The most effective strategy may differ from one application to another; providing a generic algorithm that works under all possible constraints is a hazardous endeavor.
However, caching only the most recently requested values is a simple policy which may be efficient in many cases.
This strategy is implemented in
.I "common database"
and this section explains how it works.
Common database keeps only relevant values in memory by caching
.I "recently used"
values.
Any value that is requested or added to the database is considered
.I recent .
.B "How this works" .
Each time a value is added to the database, its key is put as the first element of a list.
In this list,
.B "values are unique" .
Adding a value that is already present in the list counts as "using the value",
so it is moved back to the start of the list.
When the number of entries exceeds the configured limit,
the least recently used value (the last list entry) is removed,
along with its related data from the cache.
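The following Crystal sketch illustrates this policy.
It is a deliberately naive illustration (none of these names appear in DODB); the actual implementation, described below, is more efficient.
.nf
# Naive sketch of the "recently used" caching policy (illustration only).
class RecentlyUsedCache(K, V)
  def initialize(@max_entries : Int32)
    @keys = Deque(K).new   # most recently used keys first
    @data = Hash(K, V).new # the cached values
  end

  def put(key : K, value : V)
    @keys.delete key       # re-adding a key counts as "using" it
    @keys.unshift key      # the key becomes the most recent entry
    @data[key] = value
    if @keys.size > @max_entries
      oldest = @keys.pop   # evict the least recently used key
      @data.delete oldest  # and drop its cached value
    end
  end

  def get?(key : K) : V?
    if @data.has_key? key  # a request also counts as "using" the key
      @keys.delete key
      @keys.unshift key
    end
    @data[key]?
  end
end
.fi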
.B "Implementation details" .
The implementation is time-efficient;
the duration of adding a value is constant, it doesn't change with the number of entries.
This efficiency is a memory tradeoff.
All the entries are added to a
.B "double-linked list"
(to keep track of the order of the added keys) and to a
.B hash
to perform efficient searches of the keys in the list.
Thus, all the nodes are added twice, once in the list, once in the hash.
This way, adding, removing and searching for an entry in the list is fast,
no matter the size of the list.
Moreover,
.I "common database"
enables to adjust the number of stored entries.
.
.
.SECTION RAM-only database for short-lived data
Databases are built around the objective to actually
.I store
@ -911,13 +961,6 @@ Caching the value enables a massive performance gain, data can be retrieved seve
.SECTION Future work
This section presents all the features I want to see in a future version of the DODB library.
.
.SS Cached database and indexes with selective memory
Right now, both cached database and cached indexes will store any cached value indefinitely.
Giving the cache the ability to select the values to keep in memory would enable a massive speed-up even in memory-constrained environments.
The policy could be as simple as keeping in memory only the most recently requested values.
These new versions of cached database and indexes will become the standard, default DODB behavior.
.
.SS Pagination via the indexes: offset and limit
Right now, browsing the entire database by requesting a limited list at a time is possible, thanks to some functions accepting an
.I offset

View File

@ -12,6 +12,7 @@ require "./db-cars.cr"
# ENV["REPORT_DIR"] rescue "results"
# ENV["NBRUN"] rescue 100
# ENV["MAXINDEXES"] rescue 5_000
# ENV["FIFO_SIZE"] rescue 10_000
class Context
class_property report_dir = "results"
@ -20,6 +21,7 @@ class Context
class_property from = 1_000
class_property to = 50_000
class_property incr = 1_000
class_property fifo_size = 10_000
end
# To simplify the creation of graphs, it's better to have fake data for
@ -101,7 +103,7 @@ end
def bench_searches()
cars_ram = SPECDB::RAMOnly(Car).new
cars_cached = SPECDB::Cached(Car).new
cars_fifo = SPECDB::FIFO(Car).new "", 5000 # With only 5_000 entries
cars_fifo = SPECDB::Common(Car).new "-#{Context.fifo_size}", Context.fifo_size
cars_semi = SPECDB::Uncached(Car).new "-semi"
cars_uncached = SPECDB::Uncached(Car).new
@ -134,7 +136,7 @@ end
def bench_add()
cars_ram = SPECDB::RAMOnly(Car).new
cars_cached = SPECDB::Cached(Car).new
cars_fifo = SPECDB::FIFO(Car).new "", 5_000
cars_fifo = SPECDB::Common(Car).new "-#{Context.fifo_size}", Context.fifo_size
cars_semi = SPECDB::Uncached(Car).new "-semi"
cars_uncached = SPECDB::Uncached(Car).new
@ -166,9 +168,9 @@ def bench_add()
end
def bench_50_shades_of_fifo()
cars_fifo1 = SPECDB::FIFO(Car).new "", 1_000
cars_fifo5 = SPECDB::FIFO(Car).new "", 5_000
cars_fifo10 = SPECDB::FIFO(Car).new "", 10_000
cars_fifo1 = SPECDB::Common(Car).new "-1k", 1_000
cars_fifo5 = SPECDB::Common(Car).new "-5k", 5_000
cars_fifo10 = SPECDB::Common(Car).new "-10k", 10_000
fifo_Sby_name1, fifo_Sby_color1, fifo_Sby_keywords1 = cached_indexes cars_fifo1
fifo_Sby_name5, fifo_Sby_color5, fifo_Sby_keywords5 = cached_indexes cars_fifo5
@ -189,6 +191,7 @@ ENV["NBRUN"]?.try { |it| Context.nb_run = it.to_i }
ENV["DBSIZE"]?.try { |it| Context.to = it.to_i }
ENV["DBSIZE_START"]?.try { |it| Context.from = it.to_i }
ENV["DBSIZE_INCREMENT"]?.try { |it| Context.incr = it.to_i }
ENV["FIFO_SIZE"]?.try { |it| Context.fifo_size = it.to_i }
pp! Context.nb_run
pp! Context.from

View File

@ -1,8 +1,8 @@
# Common database: only recently requested entries are kept in memory.
# Common database: only **recently added or requested** entries are kept in memory.
#
# Least recently used entries may be removed from the cache in order to keep the amount of memory used reasonable.
#
# The number of entries to keep in memory is configurable.
# The number of entries to keep in memory is **configurable**.
#
# This database is relevant for high-demand applications,
# meaning both a high number of entries (data cannot fit entirely in RAM),
@ -33,9 +33,9 @@
#
# NOTE: fast for frequently requested data and requires a stable (and configurable) amount of memory.
class DODB::Storage::Common(V) < DODB::Storage::Cached(V)
# The *fifo* a simple FIFO instance where the key of the requested data is pushed.
# The *fifo* is an `EfficientFIFO` instance where the key of the requested data is pushed.
# In case the number of stored entries exceeds what is allowed, the least recently used entry is removed.
property fifo : FIFO(Int32)
property fifo : EfficientFIFO(Int32)
# Initializes the `DODB::Storage::Common` database with a maximum number of entries in the cache.
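	#
	# A minimal usage sketch (the stored type, directory name and size below are illustrative):
	# ```
	# # Keep at most 10_000 entries in memory; data is stored in the "cars" directory.
	# db = DODB::Storage::Common(Car).new "cars", 10_000_u32
	# ```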
def initialize(@directory_name : String, max_entries : UInt32)

View File

@ -1,12 +1,14 @@
require "./list.cr"
# This class enables to keep track of used data.
# This class is a simpler (but slower) counterpart of `EfficientFIFO`, the structure used to implement
# the data-cache eviction policy of `DODB::Storage::Common`.
# It keeps track of recently used data.
#
# Each time a value is added, it is put in a FIFO structure.
# Adding a value several times is considered as "using the value",
# so it is pushed back at the entry of the FIFO (as a new value).
# In case the number of entries exceeds what is allowed,
# the least recently used value is removed.
# **How this works**.
# Each time a value is added to the database, its key is put in this "FIFO" structure.
# In this structure, **values are unique** and adding a value several times counts as "using the value",
# so it is moved back to the front of the FIFO structure, as if it were a new value.
# In case the number of entries exceeds what is allowed, the least recently used value is removed.
# ```
# fifo = FIFO(Int32).new 3 # Only 3 allowed entries.
#
@ -20,7 +22,8 @@ require "./list.cr"
# ```
#
# The number of entries in the FIFO structure is configurable.
# WARNING: this implementation becomes slow very fast, but doesn't cost much memory.
# WARNING: this implementation becomes slow as the structure grows (O(n) insertion), but doesn't cost much memory.
# WARNING: this *FIFO* class doesn't allow the same value to appear multiple times.
class FIFO(V)
# This array is used as the *fifo structure*.
property data : Array(V)
@ -47,19 +50,14 @@ class FIFO(V)
end
end
# This class enables to keep track of used data.
# This class is used to implement the cache eviction policy of `DODB::Storage::Common`.
# It keeps track of recently used data.
#
# **Implementation details.**
# Contrary to the `FIFO` class, this implementation is time-efficient.
# However, this efficiency is a memory tradeoff: all the entries are added to a double-linked list to keep
# track of the order **and** to a hash to perform efficient searches of the values in the double-linked list.
# Thus, all the nodes are added twice, once in the list, once in the hash.
#
# Each time a value is added, it is put in a FIFO structure.
# Adding a value several times is considered as "using the value",
# so it is pushed back at the entry of the FIFO (as a new value).
# In case the number of entries exceeds what is allowed,
# the least recently used value is removed.
# **How this works**.
# Each time a value is added to the database, its key is put in this "FIFO" structure.
# In this structure, **values are unique** and adding a value several times counts as "using the value",
# so it is moved back to the front of the FIFO structure, as if it were a new value.
# In case the number of entries exceeds what is allowed, the least recently used value is removed.
# ```
# fifo = EfficientFIFO(Int32).new 3 # Only 3 allowed entries.
#
@ -72,10 +70,17 @@ end
# pp! fifo << 5 # -> 3 (least recently used data)
# ```
#
# **Implementation details.**
# Contrary to the `FIFO` class, this implementation is time-efficient.
# However, this efficiency comes at a memory cost: each entry is added to a double-linked list (to keep
# track of the order) **and** to a hash (to perform efficient searches of the values in the double-linked list).
# Thus, each value is stored twice: once in the list and once in the hash.
#
# The number of entries in the FIFO structure is configurable.
# NOTE: this implementation is time-efficient, but costs some memory.
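#
# For readers unfamiliar with this technique, here is a self-contained sketch of the same
# least-recently-used policy written with Crystal's insertion-ordered `Hash` only.
# This is a hedged illustration, **not** the `EfficientFIFO` implementation (which relies on
# the explicit double-linked list and hash described above):
# ```
# class OrderedHashLRU(V)
#   def initialize(@max_entries : Int32)
#     @entries = Hash(V, Nil).new
#   end
#
#   # Pushes a value; returns the evicted value, if any.
#   def <<(value : V) : V?
#     @entries.delete value        # re-adding counts as "using" the value
#     @entries[value] = nil        # (re)insert it as the most recent entry
#     if @entries.size > @max_entries
#       oldest, _ = @entries.shift # drop the least recently used value
#       oldest
#     end
#   end
# end
# ```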
class EfficientFIFO(V)
# This array is used as the *fifo structure*.
# Both this list and the hash are used as the *fifo structures*.
# The list preserves the *order* of the entries while the *hash* enables fast retrieval of entries in the list.
property list : DoubleLinkedList(V)
property hash : Hash(V, DoubleLinkedList::Node(V))