diff --git a/paper/paper.ms b/paper/paper.ms
index b1d5b5b..c2dd3c0 100644
--- a/paper/paper.ms
+++ b/paper/paper.ms
@@ -272,14 +272,21 @@ Of course, browsing the entire database to find a value (or its key) is a waste
 That is when indexes come into play.
 .
 .
-.SS Indexes
-Entries can be
-.I indexed
-based on their attributes.
-There are currently three main ways to search for a value by its attributes: basic indexes, partitions and tags.
+.SS Triggers
+A simple way to quickly retrieve a piece of data is to create
+.I indexes
+based on its attributes.
+When a value is added to the database, or when it is modified, a
+.I trigger
+can be called to index it.
+There are currently three main triggers in
+.CLASS DODB
+to index values: basic indexes, partitions and tags.
 .
 .SSS Basic indexes (1 to 1 relations)
-Basic indexes represent one-to-one relations, such as an index in SQL.
+Basic indexes
+.CLASS DODB::Trigger::Index ) (
+represent one-to-one relations, such as an index in SQL.
 In the Car database, each car has a dedicated (unique) name.
 This
 .I name
@@ -296,9 +303,9 @@ cars_by_name = cars.new_index "name", { |car| car.name }
 cars_by_name = cars.new_index "name", &.name
 .SOURCE
 Once the index has been created, every added or modified entry in the database will be indexed.
-Adding an index (basic index, partition or tag) provides an
+Adding a trigger provides an
 .I object
-used to manipulate the database based on this index.
+used to manipulate the database based on the related attribute.
 Let's call it an
 .I "index object" .
 In the code above, the
@@ -349,7 +356,7 @@ directory.
 .QE
 .
 The basic indexes as shown in this section already give a taste of what is possible to do with DODB.
-The following indexes will cover some other usual cases.
+The following triggers will cover some other usual cases.
 .
 .
 .SSS Partitions (1 to n relations)
@@ -401,7 +408,7 @@ directory!
 .
 .SSS Tags (n to n relations)
 Tags are basically partitions but the indexed attribute can have multiple values.
-
+.
 .QP
 .SOURCE Ruby ps=9 vs=10
 # Create a tag based on the "keywords" attribute of the cars.
@@ -445,15 +452,15 @@ directory!
 .
 .
 .
-.SSS Side note about indexes
-DODB presents a few possible indexes (basic indexes, partitions and tags) which respond to an obvious need for fast searches.
+.SSS Side note about triggers
+DODB presents a few possible triggers (basic indexes, partitions and tags) which respond to an obvious need for fast searches.
 Though, their implementation via the creation of symlinks is the result of a certain vision about how a database should behave in order to provide a practical way for users to sort the entries.
 The implementation can be completely changed.
-Also, other kinds of indexes could
+Also, other kinds of triggers could
 .B easily
 be implemented in addition of those presented.
-The new indexes may have completely different objectives than providing a file-system representation of the data.
+The new triggers may have completely different objectives than providing a file-system representation of the data.
 The following sections will precisely cover this aspect.
 .
 .
@@ -469,10 +476,9 @@ Several hundred times faster, see the experiment section.
 .FOOTNOTE2
 Same thing for cached indexes.
 Indexes can easily be cached, thanks to simple hash tables.
-.
-.
-.SS Cached database
-A cached database has the same API as the other DODB databases.
+
+.B "Cached database" .
+A cached database has the same API as the other DODB databases and keeps a copy of the entire database in memory for fast retrieval.
 .QP
 .SOURCE Ruby ps=9 vs=10
 # Create a cached database
@@ -484,12 +490,12 @@ class are available for
 .CLASS Storage::Cached .
 .QE
 .
-.SS Cached indexes
+.B "Cached indexes" .
 Since indexes do not require nearly as much memory as caching the entire database, they are cached by default.
 .
 .
 .
-.SECTION Common database: caching only recently used data
+.SECTION Common database
 Storing the entire data-set in memory is an effective way to make the requests fast, as does the
 .I "cached database"
@@ -497,11 +503,14 @@ presented in the previous section.
 Not all data-sets are compatible with this approach, for obvious reasons.
 Thus, a tradeoff could be found to enable fast retrieval of data without requiring much memory.
 Caching only a part of the data-set could already enable a massive speed-up even in memory-constrained environments.
-The most effective strategy could differ from an application to another, providing a generic algorithm that should work for all possible constraints is an hazardous endeavor.
+The most effective strategy could differ from one application to another\*[*].
+.FOOTNOTE1
+Providing a generic algorithm that should work for all possible constraints is a hazardous endeavor.
+.FOOTNOTE2
 However, caching only the most recently requested values is a simple policy which may be efficient in many cases.
-This strategy is implemented in
-.I "common database"
-and this section will explain how it works.
+This strategy is implemented in the
+.CLASS DODB::Storage::Common
+database and this section will explain how it works.
 
 Common database implements a simple strategy to keep only relevant values in memory:
 caching
@@ -514,7 +523,8 @@ Any value that is requested or added to the database is considered
 Each time a value is added in the database, its key is put as the first element of a list.
 In this list,
 .B "values are unique" .
-Adding a value that is already present in the list is considered as "using the value",
+Adding a value that is already present in the list is considered as
+.I "using the value" ,
 thus it is moved at the start of the list.
 In case the number of entries exceeds what is allowed, the least recently used value (the last list entry) is removed,
@@ -526,7 +536,9 @@ the duration of adding a value is constant, it doesn't change with the number of
 This efficiency is a memory tradeoff.
 All the entries are added to a
 .B "double-linked list"
-(to keep track of the order of the added keys) and to a
+(to keep track of the order of the added keys)
+.UL and
+to a
 .B hash
 to perform efficient searches of the keys in the list.
 Thus, all the nodes are added twice, once in the list, once in the hash.
@@ -606,13 +618,17 @@ It is perfectly reasonable to have a cached database with a policy of keeping ju
 But for now, the cached version keeps everything.
 See the "Future work" section.
 .FOOTNOTE2
-.
-.SS Uncached database
-By default, the database (provided by
-.CLASS "DODB::Storage::Basic" )
-isn't cached.
-.
-.SS Uncached indexes
+
+.B "Uncached database" .
+The
+.CLASS "DODB::Storage::Uncached"
+database has no data cache at all and can be used in very constrained environments.
+However, the
+.CLASS DODB::Storage::Common
+should (probably) be considered instead, even if the configured number of entries is low.
+A small data cache is still better than no cache.
+
+.B "Uncached indexes" .
 Cached indexes do not require a large amount of memory since the only stored data is an integer (the
 .I key
 of the data).
@@ -696,7 +712,7 @@ function.
 .KE
 .
 .
-.SS Indexes creation
+.SS Triggers creation
 .QP
 .SOURCE Ruby ps=9 vs=10
 # Uncached, cached and RAM-only basic indexes.
diff --git a/spec/benchmark-todo.cr b/spec/benchmark-todo.cr
index 668a54f..0f564b3 100644
--- a/spec/benchmark-todo.cr
+++ b/spec/benchmark-todo.cr
@@ -13,7 +13,7 @@ class DODBCached < DODB::Storage::Cached(Ship)
   end
 end
 
-class DODBUnCached < DODB::Storage::Basic(Ship)
+class DODBUnCached < DODB::Storage::Uncached(Ship)
   def initialize(storage_ext = "", remove_previous_data = true)
     storage_dir = "test-storage#{storage_ext}"
 
diff --git a/spec/spec-database.cr b/spec/spec-database.cr
index 63ce7d4..f5aa922 100644
--- a/spec/spec-database.cr
+++ b/spec/spec-database.cr
@@ -1,4 +1,4 @@
-class SPECDB::Uncached(V) < DODB::Storage::Basic(V)
+class SPECDB::Uncached(V) < DODB::Storage::Uncached(V)
   property storage_dir : String
   def initialize(storage_ext = "", remove_previous_data = true)
     @storage_dir = "specdb-storage-uncached#{storage_ext}"
diff --git a/spec/test-ships.cr b/spec/test-ships.cr
index 361d8ed..0bc20ac 100644
--- a/spec/test-ships.cr
+++ b/spec/test-ships.cr
@@ -6,7 +6,7 @@ def fork_process(&)
   Process.new Crystal::System::Process.fork { yield }
 end
 
-describe "DODB::Storage::Basic" do
+describe "DODB::Storage::Uncached" do
   describe "basics" do
     it "store and get data" do
       db = SPECDB::Uncached(Ship).new
@@ -585,7 +585,7 @@ describe "DODB::Storage::Cached" do
       db2 = SPECDB::Cached(Ship).new remove_previous_data: false
       db2 << Ship.mutsuki
 
-      # Only difference with DODB::Storage::Basic: concurrent DB cannot coexists.
+      # Only difference with DODB::Storage::Uncached: concurrent DBs cannot coexist.
       db2.to_a.size.should eq(2)
 
       db1.rm_storage_dir
diff --git a/src/dodb/storage/cached.cr b/src/dodb/storage/cached.cr
index c5e8d27..b0257d6 100644
--- a/src/dodb/storage/cached.cr
+++ b/src/dodb/storage/cached.cr
@@ -50,7 +50,7 @@ class DODB::Storage::Cached(V) < DODB::Storage(V)
     end
 
     # Load the database in RAM at start-up.
-    DODB::Storage::Basic(V).new(@directory_name).each_with_key do |v, key|
+    DODB::Storage::Uncached(V).new(@directory_name).each_with_key do |v, key|
       puts "\rloading data from #{@directory_name} at key #{key}"
       self[key] = v
     end
diff --git a/src/dodb/storage/uncached.cr b/src/dodb/storage/uncached.cr
index 43bde68..c8f7708 100644
--- a/src/dodb/storage/uncached.cr
+++ b/src/dodb/storage/uncached.cr
@@ -3,7 +3,7 @@
 #
 # ```
 # # Creates a DODB (uncached) database.
-# car_database = DODB::Storage::Basic.new "/path/to/db"
+# car_database = DODB::Storage::Uncached.new "/path/to/db"
 #
 # # Creates a (cached) index.
 # cars_by_name = car_database.new_index "name", &.name
@@ -22,5 +22,6 @@
 # ```
 #
 # NOTE: slow but doesn't require much memory.
-class DODB::Storage::Basic(V) < DODB::Storage(V)
+# NOTE: for a database with a configurable data cache size, use `DODB::Storage::Common`.
+class DODB::Storage::Uncached(V) < DODB::Storage(V)
 end
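
Review note: the paper text touched by this patch describes the policy behind `DODB::Storage::Common`. Each key sits in a doubly linked list ordered by recency and in a hash for fast lookup, and the last list entry is dropped when the configured limit is exceeded. The block below is a minimal sketch of that idea in plain Ruby (the language tag the paper's listings use), not DODB's actual implementation. `LRUSketch`, its methods and the `max_entries` parameter are hypothetical names chosen for illustration only.

```ruby
# Hypothetical sketch, NOT DODB code: a bounded cache that evicts the least
# recently used key, built from a doubly linked list plus a hash.
class LRUSketch
  # `newer` points toward the most recently used end, `older` toward the least.
  Node = Struct.new(:key, :value, :newer, :older)

  def initialize(max_entries)
    raise ArgumentError, "max_entries must be positive" if max_entries < 1
    @max_entries = max_entries
    @nodes = {}  # key => node, so finding a key in the list is constant time
    @head = nil  # most recently used node
    @tail = nil  # least recently used node
  end

  # Adding a value (new or already present) counts as using it.
  def []=(key, value)
    node = @nodes[key]
    if node
      node.value = value
      unlink(node)
    else
      node = Node.new(key, value)
      @nodes[key] = node
    end
    push_front(node)
    evict if @nodes.size > @max_entries
    value
  end

  # Requesting a value also moves its key to the start of the list.
  def [](key)
    node = @nodes[key]
    return nil unless node
    unlink(node)
    push_front(node)
    node.value
  end

  # Keys ordered from most to least recently used.
  def keys
    result = []
    node = @head
    while node
      result << node.key
      node = node.older
    end
    result
  end

  private

  # Detach a node from the list without touching the hash.
  def unlink(node)
    if node.newer then node.newer.older = node.older else @head = node.older end
    if node.older then node.older.newer = node.newer else @tail = node.newer end
    node.newer = nil
    node.older = nil
  end

  # Put a detached node at the most recently used end.
  def push_front(node)
    node.older = @head
    @head.newer = node if @head
    @head = node
    @tail ||= node
  end

  # Drop the least recently used entry from both the list and the hash.
  def evict
    oldest = @tail
    unlink(oldest)
    @nodes.delete(oldest.key)
  end
end

cache = LRUSketch.new(2)
cache[:a] = 1
cache[:b] = 2
cache[:a]      # reading :a makes :b the least recently used key
cache[:c] = 3  # exceeds the limit, so :b is evicted
p cache.keys   # prints [:c, :a]
```

Every stored key appears twice, once in the list and once in the hash, which is the memory tradeoff the paper mentions. In exchange, adding, reading and evicting all take constant time regardless of the number of entries.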