comments + paper

parent c01ec614ae
commit e152bc0ee7

@@ -488,6 +488,56 @@ class are available for
Since indexes do not require nearly as much memory as caching the entire database, they are cached by default.
.
.
.
.SECTION Common database: caching only recently used data
Storing the entire data-set in memory is an effective way to make requests fast, as the
.I "cached database"
presented in the previous section demonstrates.
However, not all data-sets fit in memory, so this approach cannot always be used.
Thus, a tradeoff has to be found to enable fast retrieval of data without requiring too much memory.
Caching only a part of the data-set can already provide a massive speed-up, even in memory-constrained environments.
The most effective caching strategy may differ from one application to another, and providing a generic algorithm that works for all possible constraints is a hazardous endeavor.
However, caching only the most recently requested values is a simple policy which may be efficient in many cases.
This strategy is implemented in
.I "common database"
and this section explains how it works.

Common database implements a simple strategy to keep only relevant values in memory:
caching
.I "recently used"
values.
Any value that is requested or added to the database is considered
.I recent .

.B "How this works" .
|
||||
Each time a value is added in the database, its key is put as the first element of a list.
|
||||
In this list,
|
||||
.B "values are unique" .
|
||||
Adding a value that is already present in the list is considered as "using the value",
|
||||
thus it is moved at the start of the list.
|
||||
In case the number of entries exceeds what is allowed,
|
||||
the least recently used value (the last list entry) is removed,
|
||||
along with its related data from the cache.
|
||||
|
||||
.B "Implementation details" .
|
||||
The implementation is time-efficient;
|
||||
the duration of adding a value is constant, it doesn't change with the number of entries.
|
||||
This efficiency is a memory tradeoff.
|
||||
All the entries are added to a
|
||||
.B "double-linked list"
|
||||
(to keep track of the order of the added keys) and to a
|
||||
.B hash
|
||||
to perform efficient searches of the keys in the list.
|
||||
Thus, all the nodes are added twice, once in the list, once in the hash.
|
||||
This way, adding, removing and searching for an entry in the list is fast,
|
||||
no matter the size of the list.
|
||||
|
||||
Moreover,
.I "common database"
allows adjusting the number of stored entries.
.
.
.SECTION RAM-only database for short-lived data
Databases are built around the objective to actually
.I store

@@ -911,13 +961,6 @@ Caching the value enables a massive performance gain, data can be retrieved seve
.SECTION Future work
This section presents all the features I want to see in a future version of the DODB library.
.
.SS Cached database and indexes with selective memory
Right now, both cached database and cached indexes will store any cached value indefinitely.
Giving the cache the ability to select the values to keep in memory would enable a massive speed-up even in memory-constrained environments.
The policy could be as simple as keeping in memory only the most recently requested values.

These new versions of cached database and indexes will become the standard, default DODB behavior.
.
.SS Pagination via the indexes: offset and limit
Right now, browsing the entire database by requesting a limited number of entries at a time is possible, thanks to some functions accepting an
.I offset

@@ -12,6 +12,7 @@ require "./db-cars.cr"
# ENV["REPORT_DIR"] rescue "results"
# ENV["NBRUN"] rescue 100
# ENV["MAXINDEXES"] rescue 5_000
# ENV["FIFO_SIZE"] rescue 10_000

class Context
  class_property report_dir = "results"

@@ -20,6 +21,7 @@ class Context
  class_property from = 1_000
  class_property to = 50_000
  class_property incr = 1_000
  class_property fifo_size = 10_000
end

# To simplify the creation of graphs, it's better to have fake data for
@@ -101,7 +103,7 @@ end
def bench_searches()
  cars_ram = SPECDB::RAMOnly(Car).new
  cars_cached = SPECDB::Cached(Car).new
  cars_fifo = SPECDB::FIFO(Car).new "", 5000 # With only 5_000 entries
  cars_fifo = SPECDB::Common(Car).new "-#{Context.fifo_size}", Context.fifo_size
  cars_semi = SPECDB::Uncached(Car).new "-semi"
  cars_uncached = SPECDB::Uncached(Car).new

@@ -134,7 +136,7 @@ end
def bench_add()
  cars_ram = SPECDB::RAMOnly(Car).new
  cars_cached = SPECDB::Cached(Car).new
  cars_fifo = SPECDB::FIFO(Car).new "", 5_000
  cars_fifo = SPECDB::Common(Car).new "-#{Context.fifo_size}", Context.fifo_size
  cars_semi = SPECDB::Uncached(Car).new "-semi"
  cars_uncached = SPECDB::Uncached(Car).new

@@ -166,9 +168,9 @@ def bench_add()
end

def bench_50_shades_of_fifo()
  cars_fifo1 = SPECDB::FIFO(Car).new "", 1_000
  cars_fifo5 = SPECDB::FIFO(Car).new "", 5_000
  cars_fifo10 = SPECDB::FIFO(Car).new "", 10_000
  cars_fifo1 = SPECDB::Common(Car).new "-1k", 1_000
  cars_fifo5 = SPECDB::Common(Car).new "-5k", 5_000
  cars_fifo10 = SPECDB::Common(Car).new "-10k", 10_000

  fifo_Sby_name1, fifo_Sby_color1, fifo_Sby_keywords1 = cached_indexes cars_fifo1
  fifo_Sby_name5, fifo_Sby_color5, fifo_Sby_keywords5 = cached_indexes cars_fifo5

@@ -189,6 +191,7 @@ ENV["NBRUN"]?.try { |it| Context.nb_run = it.to_i }
ENV["DBSIZE"]?.try { |it| Context.to = it.to_i }
ENV["DBSIZE_START"]?.try { |it| Context.from = it.to_i }
ENV["DBSIZE_INCREMENT"]?.try { |it| Context.incr = it.to_i }
ENV["FIFO_SIZE"]?.try { |it| Context.fifo_size = it.to_i }

pp! Context.nb_run
pp! Context.from

@@ -1,8 +1,8 @@
# Common database: only recently requested entries are kept in memory.
# Common database: only **recently added or requested** entries are kept in memory.
#
# Least recently used entries may be removed from the cache in order to keep the amount of memory used reasonable.
#
# The number of entries to keep in memory is configurable.
# The number of entries to keep in memory is **configurable**.
#
# This database is relevant for high-demand applications,
# which means both a high number of entries (data cannot fit entirely in RAM),
@@ -33,9 +33,9 @@
#
# NOTE: fast for frequently requested data and requires a stable (and configurable) amount of memory.
class DODB::Storage::Common(V) < DODB::Storage::Cached(V)
  # The *fifo* a simple FIFO instance where the key of the requested data is pushed.
  # The *fifo* is an `EfficientFIFO` instance where the key of the requested data is pushed.
  # In case the number of stored entries exceeds what is allowed, the least recently used entry is removed.
  property fifo : FIFO(Int32)
  property fifo : EfficientFIFO(Int32)

  # Initializes the `DODB::Storage::Common` database with a maximum number of entries in the cache.
  def initialize(@directory_name : String, max_entries : UInt32)

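For context, a minimal usage sketch of the class documented above. Only the constructor signature comes from this diff; the `dodb` require path, the `Car` value type, and the `<<`/`[]?` accessors are assumptions borrowed from the other DODB storage classes.

```crystal
require "json"
require "dodb" # assumed shard entry point

# Illustrative value type, assumed JSON-serializable as in the DODB examples.
struct Car
  include JSON::Serializable
  property name : String

  def initialize(@name : String)
  end
end

# Persist values on disk under "db-cars", but keep at most
# 10_000 recently used entries cached in memory.
cars = DODB::Storage::Common(Car).new "db-cars", 10_000_u32

cars << Car.new("Corvet") # adding a value marks it as recently used
pp! cars[0]?              # requesting a value refreshes it in the cache
```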
src/fifo.cr
@@ -1,12 +1,14 @@
require "./list.cr"

# This class enables to keep track of used data.
# This class is a simpler implementation of `EfficientFIFO`, used to implement an eviction policy for the data cache
# of `DODB::Storage::Common`.
# It enables keeping track of recently used data.
#
# Each time a value is added, it is put in a FIFO structure.
# Adding a value several times is considered as "using the value",
# so it is pushed back at the entry of the FIFO (as a new value).
# In case the number of entries exceeds what is allowed,
# the least recently used value is removed.
# **How this works**.
# Each time a value is added to the database, its key is put in this "FIFO" structure.
# In this structure, **values are unique** and adding a value several times is considered as "using the value",
# so it is pushed back to the front of the FIFO structure, as a new value.
# In case the number of entries exceeds what is allowed, the least recently used value is removed.
# ```
# fifo = FIFO(Int32).new 3 # Only 3 allowed entries.
#
@@ -20,7 +22,8 @@ require "./list.cr"
# ```
#
# The number of entries in the FIFO structure is configurable.
# WARNING: this implementation becomes slow very fast, but doesn't cost much memory.
# WARNING: this implementation becomes slow as the number of entries grows (O(n) complexity), but doesn't cost much memory.
# WARNING: this *FIFO* class doesn't allow the same value multiple times.
class FIFO(V)
  # This array is used as the *fifo structure*.
  property data : Array(V)
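To illustrate why this array-backed version is O(n), here is a hedged sketch of what its push operation could look like. It is a reconstruction for illustration only: the real method body is not part of this diff, and `@max_entries` is an assumed instance variable.

```crystal
class FIFO(V)
  # Sketch only: push a value to the front and return the evicted value on
  # overflow (mirroring the documented `pp! fifo << 5 # -> 3` example), else nil.
  def <<(value : V) : V?
    @data.delete value    # O(n): drop the value if already present ("using" it)
    @data.unshift value   # O(n): put it back at the front as the most recent entry
    if @data.size > @max_entries
      @data.pop           # evict the least recently used value (the last element)
    end
  end
end
```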
@@ -47,19 +50,14 @@ class FIFO(V)
  end
end

# This class enables to keep track of used data.
# This class is used to implement a cache policy for `DODB::Storage::Common`.
# It enables keeping track of recently used data.
#
# **Implementation details.**
# Contrary to the `FIFO` class, this implementation is time-efficient.
# However, this efficiency is a memory tradeoff: all the entries are added to a double-linked list to keep
# track of the order **and** to a hash to perform efficient searches of the values in the double-linked list.
# Thus, all the nodes are added twice, once in the list, once in the hash.
#
# Each time a value is added, it is put in a FIFO structure.
# Adding a value several times is considered as "using the value",
# so it is pushed back at the entry of the FIFO (as a new value).
# In case the number of entries exceeds what is allowed,
# the least recently used value is removed.
# **How this works**.
# Each time a value is added to the database, its key is put in this "FIFO" structure.
# In this structure, **values are unique** and adding a value several times is considered as "using the value",
# so it is pushed back to the front of the FIFO structure, as a new value.
# In case the number of entries exceeds what is allowed, the least recently used value is removed.
# ```
# fifo = EfficientFIFO(Int32).new 3 # Only 3 allowed entries.
#
@@ -72,10 +70,17 @@ end
# pp! fifo << 5 # -> 3 (least recently used data)
# ```
#
# **Implementation details.**
# Contrary to the `FIFO` class, this implementation is time-efficient.
# However, this efficiency is a memory tradeoff: all the entries are added to a double-linked list to keep
# track of the order **and** to a hash to perform efficient searches of the values in the double-linked list.
# Thus, all the nodes are added twice, once in the list, once in the hash.
#
# The number of entries in the FIFO structure is configurable.
# NOTE: this implementation is time-efficient, but costs some memory.
class EfficientFIFO(V)
  # This array is used as the *fifo structure*.
  # Both this list and the hash are used as the *fifo structures*.
  # The list preserves the *order* of the entries while the *hash* enables fast retrieval of entries in the list.
  property list : DoubleLinkedList(V)
  property hash : Hash(V, DoubleLinkedList::Node(V))
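To make the list-plus-hash bookkeeping concrete, here is a hedged sketch of what the push operation could look like given the two properties above. The method names on `DoubleLinkedList` (`unshift`, `delete`, `pop`, `size`) and the `@max_entries` variable are assumptions; the actual implementation is not part of this diff.

```crystal
class EfficientFIFO(V)
  # Sketch only: O(1) push thanks to the hash lookup into the linked list.
  # Returns the evicted value when the structure overflows, else nil.
  def <<(value : V) : V?
    if node = @hash[value]?
      @list.delete node                 # value already present: unlink its node
    end
    @hash[value] = @list.unshift value  # (re)insert at the front, remember its node

    return nil if @list.size <= @max_entries
    evicted = @list.pop                 # drop the least recently used value (the tail)
    @hash.delete evicted
    evicted
  end
end
```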