From e152bc0ee705aaaee8b2eda133c00e7355bde418 Mon Sep 17 00:00:00 2001
From: Philippe PITTOLI
Date: Mon, 27 May 2024 04:58:32 +0200
Subject: [PATCH] comments + paper

---
 paper/paper.ms             | 57 +++++++++++++++++++++++++++++++++-----
 spec/benchmark-cars.cr     | 13 +++++----
 src/dodb/storage/common.cr |  8 +++---
 src/fifo.cr                | 45 +++++++++++++++++-------------
 4 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/paper/paper.ms b/paper/paper.ms
index b1fab9d..b1d5b5b 100644
--- a/paper/paper.ms
+++ b/paper/paper.ms
@@ -488,6 +488,56 @@ class are available for
 Since indexes do not require nearly as much memory as caching the entire database, they are cached by default.
 .
 .
+.
+.SECTION Common database: caching only recently used data
+Storing the entire data-set in memory is an effective way to make requests fast, as is done by
+the
+.I "cached database"
+presented in the previous section.
+Not all data-sets are compatible with this approach, for obvious reasons.
+Thus, a tradeoff must be found to enable fast retrieval of data without requiring much memory.
+Caching only a part of the data-set could already enable a massive speed-up even in memory-constrained environments.
+The most effective strategy may differ from one application to another, so providing a generic algorithm that works under all possible constraints is a hazardous endeavor.
+However, caching only the most recently requested values is a simple policy which may be efficient in many cases.
+This strategy is implemented in
+.I "common database"
+and this section explains how it works.
+
+Common database implements a simple strategy to keep only relevant values in memory:
+caching
+.I "recently used"
+values.
+Any value that is requested or added to the database is considered
+.I recent .
+
+.B "How this works" .
+Each time a value is added to the database, its key is put at the front of a list.
+In this list,
+.B "values are unique" .
+Adding a value that is already present in the list is considered as "using the value",
+so it is moved to the start of the list.
+In case the number of entries exceeds what is allowed,
+the least recently used value (the last list entry) is removed,
+along with its related data from the cache.
+
+.B "Implementation details" .
+The implementation is time-efficient;
+adding a value takes constant time, regardless of the number of entries.
+This efficiency comes at a memory cost.
+All the entries are added to a
+.B "doubly linked list"
+(to keep track of the order of the added keys) and to a
+.B hash
+to perform efficient searches of the keys in the list.
+Thus, all the entries are stored twice, once in the list, once in the hash.
+This way, adding, removing and searching for an entry in the list is fast,
+no matter the size of the list.
+
+Moreover,
+.I "common database"
+allows the number of stored entries to be adjusted.
+.
+.
 .SECTION RAM-only database for short-lived data
 Databases are built around the objective to actually
 .I store
@@ -911,13 +961,6 @@ Caching the value enables a massive performance gain, data can be retrieved seve
 .SECTION Future work
 This section presents all the features I want to see in a future version of the DODB library.
 .
-.SS Cached database and indexes with selective memory
-Right now, both cached database and cached indexes will store any cached value indefinitively.
-Giving the cache the ability to select the values to keep in memory would enable a massive speed-up even in memory-constrained environments.
-The policy could be as simple as keeping in memory only the most recently requested values.
-
-These new versions of cached database and indexes will become the standard, default DODB behavior.
-.
 .SS Pagination via the indexes: offset and limit
 Right now, browsing the entire database by requesting a limited list at a time is possible, thanks to some functions accepting an
 .I offset
diff --git a/spec/benchmark-cars.cr b/spec/benchmark-cars.cr
index a23678b..feb72ec 100644
--- a/spec/benchmark-cars.cr
+++ b/spec/benchmark-cars.cr
@@ -12,6 +12,7 @@ require "./db-cars.cr"
 # ENV["REPORT_DIR"] rescue "results"
 # ENV["NBRUN"] rescue 100
 # ENV["MAXINDEXES"] rescue 5_000
+# ENV["FIFO_SIZE"] rescue 10_000
 
 class Context
 	class_property report_dir = "results"
@@ -20,6 +21,7 @@ class Context
 	class_property from = 1_000
 	class_property to = 50_000
 	class_property incr = 1_000
+	class_property fifo_size = 10_000
 end
 
 # To simplify the creation of graphs, it's better to have fake data for
@@ -101,7 +103,7 @@ end
 def bench_searches()
 	cars_ram = SPECDB::RAMOnly(Car).new
 	cars_cached = SPECDB::Cached(Car).new
-	cars_fifo = SPECDB::FIFO(Car).new "", 5000 # With only 5_000 entries
+	cars_fifo = SPECDB::Common(Car).new "-#{Context.fifo_size}", Context.fifo_size
 	cars_semi = SPECDB::Uncached(Car).new "-semi"
 	cars_uncached = SPECDB::Uncached(Car).new
 
@@ -134,7 +136,7 @@ end
 def bench_add()
 	cars_ram = SPECDB::RAMOnly(Car).new
 	cars_cached = SPECDB::Cached(Car).new
-	cars_fifo = SPECDB::FIFO(Car).new "", 5_000
+	cars_fifo = SPECDB::Common(Car).new "-#{Context.fifo_size}", Context.fifo_size
 	cars_semi = SPECDB::Uncached(Car).new "-semi"
 	cars_uncached = SPECDB::Uncached(Car).new
 
@@ -166,9 +168,9 @@ def bench_add()
 end
 
 def bench_50_shades_of_fifo()
-	cars_fifo1 = SPECDB::FIFO(Car).new "", 1_000
-	cars_fifo5 = SPECDB::FIFO(Car).new "", 5_000
-	cars_fifo10 = SPECDB::FIFO(Car).new "", 10_000
+	cars_fifo1 = SPECDB::Common(Car).new "-1k", 1_000
+	cars_fifo5 = SPECDB::Common(Car).new "-5k", 5_000
+	cars_fifo10 = SPECDB::Common(Car).new "-10k", 10_000
 
 	fifo_Sby_name1, fifo_Sby_color1, fifo_Sby_keywords1 = cached_indexes cars_fifo1
 	fifo_Sby_name5, fifo_Sby_color5, fifo_Sby_keywords5 = cached_indexes cars_fifo5
@@ -189,6 +191,7 @@ ENV["NBRUN"]?.try { |it| Context.nb_run = it.to_i }
 ENV["DBSIZE"]?.try { |it| Context.to = it.to_i }
 ENV["DBSIZE_START"]?.try { |it| Context.from = it.to_i }
 ENV["DBSIZE_INCREMENT"]?.try { |it| Context.incr = it.to_i }
+ENV["FIFO_SIZE"]?.try { |it| Context.fifo_size = it.to_i }
 
 pp! Context.nb_run
 pp! Context.from
diff --git a/src/dodb/storage/common.cr b/src/dodb/storage/common.cr
index 1efb384..171d1a9 100644
--- a/src/dodb/storage/common.cr
+++ b/src/dodb/storage/common.cr
@@ -1,8 +1,8 @@
-# Common database: only recently requested entries are kept in memory.
+# Common database: only **recently added or requested** entries are kept in memory.
 #
 # Least recently used entries may be removed from the cache in order to keep the amount of memory used reasonable.
 #
-# The number of entries to keep in memory is configurable.
+# The number of entries to keep in memory is **configurable**.
 #
 # This database is relevant for high demand applications;
 # which means both a high number of entries (data cannot fit entirely in RAM),
@@ -33,9 +33,9 @@
 #
 # NOTE: fast for frequently requested data and requires a stable (and configurable) amount of memory.
 class DODB::Storage::Common(V) < DODB::Storage::Cached(V)
-	# The *fifo* a simple FIFO instance where the key of the requested data is pushed.
+	# The *fifo* is an `EfficientFIFO` instance where the key of the requested data is pushed.
 	# In case the number of stored entries exceeds what is allowed, the least recently used entry is removed.
-	property fifo : FIFO(Int32)
+	property fifo : EfficientFIFO(Int32)
 
 	# Initializes the `DODB::Storage::Common` database with a maximum number of entries in the cache.
 	def initialize(@directory_name : String, max_entries : UInt32)
diff --git a/src/fifo.cr b/src/fifo.cr
index a8ba34f..175e9ce 100644
--- a/src/fifo.cr
+++ b/src/fifo.cr
@@ -1,12 +1,14 @@
 require "./list.cr"
 
-# This class enables to keep track of used data.
+# This class is a simpler implementation of `EfficientFIFO`, used to implement a cache eviction policy
+# for `DODB::Storage::Common`.
+# It keeps track of recently used data.
 #
-# Each time a value is added, it is put in a FIFO structure.
-# Adding a value several times is considered as "using the value",
-# so it is pushed back at the entry of the FIFO (as a new value).
-# In case the number of entries exceeds what is allowed,
-# the least recently used value is removed.
+# **How this works**.
+# Each time a value is added to the database, its key is put in this "FIFO" structure.
+# In this structure, **values are unique** and adding a value several times is considered as "using the value",
+# so it is moved back to the entry of the FIFO structure, as if it were a new value.
+# In case the number of entries exceeds what is allowed, the least recently used value is removed.
 # ```
 # fifo = FIFO(Int32).new 3 # Only 3 allowed entries.
 #
@@ -20,7 +22,8 @@ require "./list.cr"
 # ```
 #
 # The number of entries in the FIFO structure is configurable.
-# WARNING: this implementation becomes slow very fast, but doesn't cost much memory.
+# WARNING: this implementation slows down quickly as the number of entries grows (O(n) complexity), but doesn't cost much memory.
+# WARNING: this *FIFO* class doesn't allow the same value multiple times.
 class FIFO(V)
 	# This array is used as the *fifo structure*.
 	property data : Array(V)
@@ -47,19 +50,14 @@ class FIFO(V)
 	end
 end
 
-# This class enables to keep track of used data.
+# This class is used to implement a cache policy for `DODB::Storage::Common`.
+# It keeps track of recently used data.
 #
-# **Implementation details.**
-# Contrary to the `FIFO` class, this implementation is time-efficient.
-# However, this efficiency is a memory tradeoff: all the entries are added to a double-linked list to keep
-# track of the order **and** to a hash to perform efficient searches of the values in the double-linked list.
-# Thus, all the nodes are added twice, once in the list, once in the hash.
-#
-# Each time a value is added, it is put in a FIFO structure.
-# Adding a value several times is considered as "using the value",
-# so it is pushed back at the entry of the FIFO (as a new value).
-# In case the number of entries exceeds what is allowed,
-# the least recently used value is removed.
+# **How this works**.
+# Each time a value is added to the database, its key is put in this "FIFO" structure.
+# In this structure, **values are unique** and adding a value several times is considered as "using the value",
+# so it is moved back to the entry of the FIFO structure, as if it were a new value.
+# In case the number of entries exceeds what is allowed, the least recently used value is removed.
 # ```
 # fifo = EfficientFIFO(Int32).new 3 # Only 3 allowed entries.
 #
@@ -72,10 +70,17 @@ end
 # pp! fifo << 5 # -> 3 (least recently used data)
 # ```
 #
+# **Implementation details.**
+# Contrary to the `FIFO` class, this implementation is time-efficient.
+# However, this efficiency comes at a memory cost: all the entries are added to a doubly linked list to keep
+# track of the order **and** to a hash to perform efficient searches of the values in the doubly linked list.
+# Thus, all the entries are stored twice, once in the list, once in the hash.
+#
 # The number of entries in the FIFO structure is configurable.
 # NOTE: this implementation is time-efficient, but costs some memory.
 class EfficientFIFO(V)
-	# This array is used as the *fifo structure*.
+	# Both this list and the hash make up the *fifo structure*.
+	# The list preserves the *order* of the entries while the *hash* enables fast retrieval of entries in the list.
 	property list : DoubleLinkedList(V)
 	property hash : Hash(V, DoubleLinkedList::Node(V))
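
Note on the eviction policy described in the comments above: keys are unique, adding a key again counts as using it, and the least recently used key is dropped once the limit is exceeded. This can be sketched in a few lines of Crystal. It is only a minimal sketch and not the code from `src/fifo.cr`: the `RecentKeys` class and its `keys` method are hypothetical names, and instead of the list plus hash pair used by `EfficientFIFO` it assumes that Crystal's `Hash` keeps its entries in insertion order.

```crystal
# Hypothetical illustration, not part of the DODB sources.
class RecentKeys(K)
  # Keys ordered from least to most recently used
  # (assumption: Hash iterates in insertion order).
  @entries : Hash(K, Bool)

  def initialize(@max_entries : Int32)
    @entries = Hash(K, Bool).new
  end

  # Marks *key* as the most recently used one.
  # Returns the evicted key when the limit is exceeded, nil otherwise.
  def <<(key : K) : K?
    @entries.delete key # Re-adding an existing key moves it to the end.
    @entries[key] = true
    return nil if @entries.size <= @max_entries
    @entries.shift[0]   # Drop and return the least recently used key.
  end

  # Keys from least to most recently used.
  def keys : Array(K)
    @entries.keys
  end
end

recent = RecentKeys(Int32).new 3 # Only 3 allowed entries.
recent << 1
recent << 2
recent << 3
recent << 1     # 1 is used again: the order is now 2, 3, 1.
pp! recent << 4 # -> 2 (least recently used key)
pp! recent.keys # -> [3, 1, 4]
```

Relying on the ordered hash keeps the sketch short. The patch itself keeps the order in a separate `DoubleLinkedList` and uses the hash only to find nodes in that list.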