The review continues.

Philippe Pittoli 2025-04-03 14:42:48 +02:00
parent df565f0426
commit 493deef217


@ -613,14 +613,16 @@ Storing the entire data-set in memory is an effective way to make the requests f
the
.I "cached database"
presented in the previous section.
Not all data-sets are compatible with this approach, for obvious reasons.
Thus, a tradeoff could be found to enable fast retrieval of data without requiring much memory.
Caching only a part of the data-set could already enable a massive speed-up even in memory-constrained environments.
The most effective strategy could differ from one application to another\*[*].
Not all data-sets are compatible with this approach due to their size.
Thus, a tradeoff can be found to enable fast retrieval of data without requiring much memory.
Caching only parts of the data-set already enables a massive speed-up even in memory-constrained environments.
The most effective strategy could differ depending on the context; providing a generic algorithm that works under all possible constraints is a hazardous endeavor.
However, keeping in memory the most recently requested values should be efficient in many cases\*[*].
.FOOTNOTE1
Providing a generic algorithm that works under all possible constraints is a hazardous endeavor.
LRU is efficient when a subset of the database is requested more regularly than the rest of the data-set, at least at a given point in time.
For example, an online retailer may present the same set of products over and over to new customers, first when they arrive on the website, then when they select a category of products.
In both these cases, a cache based on an LRU policy is efficient.
.FOOTNOTE2
However, caching only the most recently requested values is a simple policy which may be efficient in many cases.
This strategy is implemented in the
.CLASS DODB::Storage::Common
database and this section will explain how it works.
@ -628,10 +630,10 @@ database and this section will explain how it works.
Common database implements a
.I "Least Recently Used"
(LRU) cache eviction policy.
The strategy is simple, keeping only the most
The strategy keeps only the most
.I "recently used"
values in memory.
Added, requested or modified values are considered
Requested, added or modified values are considered
.I recent .
When a new value is added to the cache and the number of entries exceeds the cache size, the least recently used value is evicted from the cache, along with its related data.
@ -660,6 +662,7 @@ to perform efficient searches of the keys in the list.
Thus, all the nodes are added twice, once in the list, once in the lookup table.
This way, adding, removing and searching for an entry in the list is fast,
no matter the size of the list.
See annex I for a performance comparison between LRU implementations.
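The list-plus-lookup-table structure can be illustrated with a minimal Python sketch.
This is not the DODB implementation (which is written in Crystal); the names
LRUCache, max_entries, get, put and keys_by_recency are hypothetical and chosen only for the example.
Each node lives both in a doubly-linked list (ordered by recency) and in a hash table (for constant-time search),
so adding, refreshing and evicting an entry do not depend on the size of the list.

```python
class Node:
    """One cached entry, stored in a doubly-linked list."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    """Illustrative LRU cache (hypothetical API, not DODB's)."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self.lookup = {}  # key -> Node, the lookup table
        # Sentinel nodes avoid special cases at both ends of the list.
        self.head = Node()  # head.next is the most recently used entry
        self.tail = Node()  # tail.prev is the least recently used entry
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        """Return the cached value (and mark it recent), or None on a miss."""
        node = self.lookup.get(key)
        if node is None:
            return None
        self._unlink(node)
        self._push_front(node)
        return node.value

    def put(self, key, value):
        """Add or modify a value; return the evicted key, if any."""
        node = self.lookup.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return None
        node = Node(key, value)
        self.lookup[key] = node
        self._push_front(node)
        if len(self.lookup) > self.max_entries:
            oldest = self.tail.prev
            self._unlink(oldest)
            del self.lookup[oldest.key]
            return oldest.key
        return None

    def keys_by_recency(self):
        """Keys from most to least recently used."""
        keys, node = [], self.head.next
        while node is not self.tail:
            keys.append(node.key)
            node = node.next
        return keys
```

In this sketch, put returns the evicted key so that a caller could also drop the data related to that entry, as the text describes.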
Moreover,
.I "common database"
@ -675,24 +678,26 @@ The
class has the same API as the other database classes.
.QE
.
.SECTION RAM-only database for short-lived data
.SECTION RAM-only database and triggers for short-lived data
Databases are built around the objective of actually
.I store
data.
But sometimes the data has only the same lifetime as the application.
But sometimes the data only has the same lifetime as the application.
Stop the application and the data becomes irrelevant.
This happens on several occasions, for example when the application keeps track of the connected users.
This case is not covered by traditional databases; it is out of scope for them, and short-lived data is only handled
This happens for example when the application keeps track of the connected users.
This case is not covered by traditional databases; short-lived data is handled
.UL within
the application.
Since DODB is a library and not a separate application, providing a way to handle this usage of the database can be relevant.
Having the same API to handle both long and short-lived data can be useful.
Having the same API to handle both long and short-lived data is useful.
Moreover, the previously mentioned triggers (basic indexes, partitions and tags) would also work the same way for this short-lived data.
Of course, in this case, the filesystem representation may be completely irrelevant.
Therefore, the
.I RAM-only
database and the
was created.
Also, indexes don't always require a filesystem representation and could be used only to speed up data retrieval, for example.
Therefore, as for the database,
.I RAM-only
triggers were created.
@ -739,7 +744,7 @@ The API of the
is exactly the same as the others.
.QE
As for the database API itself, switching from one version of an index to another is painless.
This way, one can opt for a cached index and, after some time not using the filesystem representation, decide to switch to its RAM-only version; a 4-character modification and nothing else.
This way, one can opt for the default index (with a filesystem representation and a cache) then, after some time, decide to switch to its RAM-only version, which requires a 4-character modification and nothing else.
.
.
.
@ -804,11 +809,11 @@ Loop and repeat.
Five instances of DODB are tested:
.STARTBULLET
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory);
.BULLET \fIuncached database but cached index\f[] shows the improvement to expect with an index cache alone;
.BULLET \fIcommon database\f[] shows the most basic use of DODB, with a limited cache (100k entries)\*[*];
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in memory);
.BULLET \fIuncached database but cached index\f[] shows the improvement to expect with only a cached index;
.BULLET \fIcommon database\f[] shows the expected usage of DODB in most instances, with a limited data cache (100k entries in our scenario)\*[*] and a cached index;
.BULLET \fIcached database\f[] represents a database with all the entries in cache (no eviction mechanism);
.BULLET \fIRAM only\f[], the database doesn't have a representation on disk (no data is written on it).
.BULLET and finally, \fIRAM only\f[], the database for volatile data (no data is written on the storage device).
.ENDBULLET
.FOOTNOTE1
The data cache can be fine-tuned with the "common database", enabling the use of DODB in environments with low memory.
@ -820,7 +825,7 @@ is actually a
.I "temporary filesystem (tmpfs)"
to enable maximum efficiency.
.FOOTNOTE1
A very simple $50 PC, bought online.
A very simple $50 PC, second-hand, bought online.
Nothing fancy.
.FOOTNOTE2