Philippe PITTOLI 2024-05-17 01:43:22 +02:00
parent e903b9df43
commit 9da39972d3


@ -411,6 +411,8 @@ directory!
.QE
.KE
.
.
.
.SSS Side note about indexes
DODB presents a few possible indexes (basic indexes, partitions and tags) which respond to an obvious need for fast searches.
However, their implementation via the creation of symlinks reflects a certain vision of how a database should behave in order to provide a practical way for users to sort the entries.
@ -419,13 +421,105 @@ The implementation can be completely changed.
Also, other kinds of indexes could
.B easily
be implemented in addition to those presented.
.TBD
The new indexes may have completely different objectives than providing a file-system representation of the data.
The following sections cover precisely this aspect.
.
.
.SSS Cached indexes and ram-only indexes
.SECTION DODB, slow? Nope. Let's talk about caches
DODB acts like a hash table.
Internally, it literally has one
.I "by default"
to cache data.
This means data is being stored in memory as well as on the file-system, so the retrieval is incredibly fast;
same thing for the indexes.
Sure, having a file-system representation of the data (including the indexes) is convenient for the administrator, but input-output operations on a file-system are slow.
Indexes can easily be cached instead, as simple hash tables.
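The idea of a hash table kept in front of the file-system can be sketched in a few lines of plain Ruby.
This is not DODB code: the CachedStore class, its methods and the JSON file layout are hypothetical, chosen only to show why a cached read never touches the disk.

```ruby
require "tmpdir"
require "json"

# Hypothetical write-through store (not the DODB implementation):
# every value is written on disk and also kept in a hash table.
class CachedStore
  attr_reader :disk_reads

  def initialize(directory)
    @directory = directory
    @cache = {}      # key => value, kept in memory
    @disk_reads = 0  # number of reads that had to hit the file-system
  end

  def path_for(key)
    File.join(@directory, "#{key}.json")
  end

  # Writes the value on disk, then remembers it in the cache.
  def []=(key, value)
    File.write(path_for(key), JSON.generate(value))
    @cache[key] = value
  end

  # Reads from the cache when possible, from the file-system otherwise.
  def [](key)
    return @cache[key] if @cache.key?(key)
    @disk_reads += 1
    @cache[key] = JSON.parse(File.read(path_for(key)))
  end

  # Forgets the in-memory copies; the files remain.
  def drop_cache
    @cache.clear
  end
end
```

The disk_reads counter is there only to make the effect visible: as long as the value is in the hash table, reading it costs no input-output operation.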
.SS Cached and uncached database
.TBD
.SECTION DODB library: all the important parts
.SS Cached and uncached indexes
.TBD
.SS RAM-only indexes
In case the file-system representation isn't required, indexes can be stored in memory,
.I only .
.TBD
.SECTION RAM-only database for short-lived data
Databases are built around the objective of actually
.I store
data.
But sometimes the data only lives as long as the application.
Stop the application and the data itself becomes irrelevant, which happens on several occasions, for instance when the application keeps track of the connected users.
Traditional databases do not cover this case: it is out of scope, and short-lived data is handled only within the application.
Yet, since DODB is a library and not a separate application (read: DODB is much faster), this usage of the database can be relevant.
Having the same API to handle both long and short-lived data can be useful.
Moreover, the previously mentioned indexes (basic indexes, partitions and tags) would also work the same way for this short-lived data.
Of course, in this case, the file-system representation may be completely irrelevant.
And for all these reasons, the
.I RAM-only
DODB database and
.I RAM-only
indexes were created.
Let's recap the advantages of the RAM-only DODB database.
The DODB API is the same for short-lived (read: temporary) and long-lived data.
This includes the same indexes too, so a file-system representation of the current state of the application is possible.
RAM-only also means incredible performance since DODB is only a
.I very
small layer over a hash table.
.SS RAM-only database
Instantiating a RAM-only database is as simple as with the other options.
Moreover, this database has exactly the same API as the others, thus changing from one to another is painless.
.QP
.SOURCE Ruby ps=10
# RAM-only database creation
database = DODB::RAMOnlyDataBase(Car).new "path/to/db-cars"
.SOURCE
Yes, the path is still required, which may be seen as a quirk, but the rationale\*[*] is sound.
.QE
.FOOTNOTE1
A path is still required despite the database being only in memory for two reasons.
First, indexes can still be instantiated for the database, and those indexes can provide a file-system representation of the data.
Second, I worked enough already, leave me alone.
.FOOTNOTE2
.SS RAM-only indexes
All indexes have their RAM-only counterpart.
.QP
.SOURCE Ruby ps=10
# RAM-only basic indexes.
cars_by_name = cars.new_RAM_index "name", &.name
# RAM-only partitions.
cars_by_colors = cars.new_RAM_partition "color", &.color
# RAM-only tags.
cars_by_keywords = cars.new_RAM_tags "keywords", &.keywords
.SOURCE
The API of the
.I "RAM-only index objects"
is exactly the same as the others.
.QE
As for the database API itself, changing from one version of an index to another is painless.
This way, one can opt for a cached index and, after some time not using the file-system representation, decide to switch to its RAM-only version; a 4-character modification and nothing else.
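Why the switch is painless can be illustrated with a minimal sketch in plain Ruby, outside of DODB.
The SymlinkIndex and RAMIndex classes below are hypothetical: they only assume that both kinds of index expose the same methods, one backed by symlinks and the other by a hash table.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical index with a file-system representation: one symlink
# per indexed value, pointing to the file holding the data.
class SymlinkIndex
  def initialize(directory)
    @directory = directory
    FileUtils.mkdir_p(@directory)
  end

  def index(value, data_path)
    File.symlink(data_path, File.join(@directory, value))
  end

  # Returns the path of the data file, or nil when the value is unknown.
  def get(value)
    link = File.join(@directory, value)
    File.symlink?(link) ? File.readlink(link) : nil
  end
end

# Hypothetical RAM-only counterpart: same methods, hash table only.
class RAMIndex
  def initialize
    @paths = {}
  end

  def index(value, data_path)
    @paths[value] = data_path
  end

  def get(value)
    @paths[value]
  end
end
```

Since the calling code only uses index and get, replacing one class by the other changes a single line: the one creating the index.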
.
.
.
.SECTION DODB and memory constraint
In contrast with the previous section, some environments have a memory constraint.
For example, in case the database is larger than the available memory, it won't be possible to use a data cache\*[*].
.FOOTNOTE1
Keep in mind that for the moment "cached database" means "all data in memory".
It is perfectly reasonable to have a cached database with a policy of keeping just a certain amount of values in memory, in order to limit the memory required by selecting the relevant values to keep in cache (the most recently used, for example).
But for now, the cached version keeps everything.
.FOOTNOTE2
.SS Uncached database
.SS Uncached indexes
.
.SECTION Recap of the DODB API
.TBD
.SS Database creation
.SS Database update and deletion with the key
.SS Indexes creation
.SS Database update and deletion with an index
.SSS Tags: specific functions
.SECTION Limits of DODB
DODB provides basic database operations such as storing, searching, modifying and removing data.
Though, SQL databases have a few
@ -453,6 +547,9 @@ FYI, the service
uses DODB and since the database is fast enough, parallelism isn't required despite enabling more than a thousand requests per second.
.FOOTNOTE2
With a cache, data is retrieved five hundred times quicker than with a SQL database.
Thus, parallelism is probably not needed but a locking mechanism is provided anyway, just in case; this may be overly simplistic but
.SHINE "good enough"
for most applications.
.I Durability
is taken into account.
@ -501,9 +598,9 @@ The scenario is simple: adding values to a database with indexes (basic, partiti
Loop and repeat.
Four instances of DODB are tested:
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory);
.BULLET \fIuncached data but cached index\f[] shows the improvement you can expect by having a cache on indexes;
.BULLET \fIcached database\f[] shows the most basic use of DODB\*[*];
.BULLET \fIRAM only\f[], the database doesn't have a representation on disk (no data is written on it).
The \fIRAM only\f[] instance shows a possible way to use DODB: to keep a consistent API to store data, including in-memory data with a lifetime related to the application's.
.ENDBULLET
@ -585,6 +682,17 @@ Caching the value enables a massive performance gain, data can be retrieved seve
.so graph_query_tag.grap
.
.SECTION Future work
.TBD
This section presents all the features I want to see in a future version of the DODB library.
.SS Cached database and indexes with selective memory
Right now, both the cached database and the cached indexes store every cached value indefinitely.
Giving the cache the ability to select the values to keep in memory would enable a massive speed-up even in memory-constrained environments.
The policy could be as simple as keeping in memory only the most recently requested values.
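Such a policy could look like the following sketch, written in plain Ruby.
Nothing here exists in DODB yet: the RecentCache class and its get and put methods are hypothetical names, and the eviction rule (least recently used first) is only one possible choice.

```ruby
# Hypothetical cache keeping only the most recently used values.
# Ruby hashes preserve insertion order, so the first key is the oldest.
class RecentCache
  def initialize(max_size)
    @max_size = max_size
    @values = {}
  end

  # Returns the cached value (or nil) and marks the key as recently used.
  def get(key)
    return nil unless @values.key?(key)
    value = @values.delete(key)
    @values[key] = value # re-inserted at the end: most recently used
  end

  # Stores a value, evicting the least recently used one when full.
  def put(key, value)
    @values.delete(key)
    @values[key] = value
    @values.shift if @values.size > @max_size
    value
  end

  # Keys currently in memory, from least to most recently used.
  def keys
    @values.keys
  end
end
```

With a maximum size of two, storing a third value evicts whichever of the first two was requested least recently, so the memory used by the cache stays bounded.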
.SS Pagination via the indexes: offset and limit
Right now, browsing the entire database by requesting a limited list at a time is possible, thanks to some functions accepting an
.I offset
and a
.I size .
However, this is not possible with the indexes: when querying a partition, for example, the API provides the entire list of matching values.
This is not acceptable for databases with large partitions and tags: memory will be overused and requests will be slow.
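A minimal sketch of what pagination on a partition could look like, in plain Ruby.
This is an assumption about a future API, not existing DODB code: the Partition class and the offset and size parameters of its get method are hypothetical.

```ruby
# Hypothetical in-memory partition: an indexed value (a color, for
# example) maps to the list of keys of the matching entries.
class Partition
  def initialize
    @keys_by_value = Hash.new { |hash, value| hash[value] = [] }
  end

  def add(value, key)
    @keys_by_value[value] << key
  end

  # Returns at most `size` keys, skipping the first `offset` matches,
  # instead of the entire list of matching keys.
  def get(value, offset: 0, size: 10)
    keys = @keys_by_value.fetch(value, [])
    keys[offset, size] || []
  end
end
```

A caller would then browse a large partition one page at a time, for example with offset 0, then 10, then 20, and stop once an empty list is returned.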
.SECTION Conclusion
.TBD