Blah.
parent e903b9df43
commit 9da39972d3
graphs/graphs.ms | 122
@@ -411,6 +411,8 @@ directory!
.QE
.KE
.
.
.
.SSS Side note about indexes
DODB provides a few kinds of indexes (basic indexes, partitions and tags), which address an obvious need for fast searches.
However, their implementation through symlinks reflects a certain vision of how a database should behave in order to give users a practical way to sort the entries.
@@ -419,13 +421,105 @@ The implementation can be completely changed.
|
|||||||
Also, other kinds of indexes could
|
Also, other kinds of indexes could
|
||||||
.B easily
|
.B easily
|
||||||
be implemented in addition of those presented.
|
be implemented in addition of those presented.
|
||||||
.TBD
|
The new indexes may have completely different objectives than providing a file-system representation of the data.
|
||||||
|
The following sections will precisely cover this aspect.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SECTION DODB, slow? Nope. Let's talk about caches
DODB acts like a hash table.
Internally, it literally has one
.I "by default"
to cache data.
This means data is stored in memory as well as on the file-system, so retrieval is incredibly fast;
the same goes for the indexes.
Sure, having a file-system representation of the data (including the indexes) is convenient for the administrator, but input-output operations on a file-system are slow.
Indexes can easily be cached instead, as simple hash tables.
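To make the idea concrete, here is a minimal sketch in plain Ruby of a store that writes each entry to the file-system and, when caching is enabled, mirrors it in a Hash. This is not DODB's implementation. The class name SketchStore, the one-JSON-file-per-entry layout and the cached: flag are assumptions made for illustration. With the flag turned off, the sketch behaves like an uncached database, where every read pays for a file-system access.

```ruby
require "json"
require "fileutils"
require "tmpdir"

# Hypothetical sketch, not DODB's code: each entry is a JSON file on disk,
# optionally mirrored in a Hash so reads skip the file-system entirely.
class SketchStore
  def initialize(dir, cached: true)
    @dir = dir
    @cache = cached ? {} : nil   # nil: every read goes to the file-system
    FileUtils.mkdir_p(@dir)
  end

  def []=(key, value)
    File.write(path(key), JSON.generate(value))  # file-system representation
    @cache[key] = value if @cache                # in-memory copy
  end

  def [](key)
    return @cache[key] if @cache&.key?(key)      # cache hit: no I/O at all
    return nil unless File.exist?(path(key))
    value = JSON.parse(File.read(path(key)))     # cache miss: read the file
    @cache[key] = value if @cache
    value
  end

  private

  def path(key)
    File.join(@dir, "#{key}.json")
  end
end

Dir.mktmpdir do |dir|
  cars = SketchStore.new(File.join(dir, "cars"))
  cars["0"] = { "name" => "Mustang", "color" => "red" }
  # A cached index is just another Hash: indexed value => entry key.
  cars_by_name = { "Mustang" => "0" }
  puts cars[cars_by_name["Mustang"]]["color"]   # prints red
end
```

Cached reads never touch the disk, and that is where the speed-up comes from. The cost is keeping every value in memory.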
.SS Cached and uncached database
.TBD
.SS Cached and uncached indexes
.TBD
.SS RAM-only indexes
If the file-system representation isn't required, indexes can be stored in memory
.I only .
.TBD
.SECTION RAM-only database for short-lived data
Databases are built with the objective of actually
.I storing
data.
But sometimes the data only has the same lifetime as the application.
Stop the application and the data itself becomes irrelevant; this happens on several occasions, for instance when the application keeps track of the connected users.
This case is not covered by traditional databases: it is out of scope, and short-lived data is only handled within the application.
Yet, since DODB is a library and not a separate application (read: no inter-process communication is involved, which makes DODB much faster), this usage of the database can be relevant.
Having the same API to handle both long and short-lived data can be useful.
Moreover, the previously mentioned indexes (basic indexes, partitions and tags) would also work the same way for this short-lived data.
Of course, in this case, the file-system representation may be completely irrelevant.
And for all these reasons, the
.I RAM-only
DODB database and
.I RAM-only
indexes were created.

Let's recap the advantages of the RAM-only DODB database.
The DODB API is the same for short-lived (read: temporary) and long-lived data.
This includes the same indexes too, so a file-system representation of the current state of the application is possible.
RAM-only also means incredible performance, since DODB is only a
.I very
small layer over a hash table.
.SS RAM-only database
Instantiating a RAM-only database is as simple as the other options.
Moreover, this database has exactly the same API as the others, so switching from one to another is painless.
.QP
.SOURCE Ruby ps=10
# RAM-only database creation
database = DODB::RAMOnlyDataBase(Car).new "path/to/db-cars"
.SOURCE
Yes, the path is still required, which may be seen as a quirk, but the rationale\*[*] is sound.
.QE
.FOOTNOTE1
A path is still required despite the database being only in memory, for two reasons.
First, indexes can still be instantiated for the database, and those indexes can provide a file-system representation of the data.
Second, I worked enough already, leave me alone.
.FOOTNOTE2
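The claim that switching between back-ends is painless can be illustrated with a small sketch in plain Ruby. The public API lives in a base class, and only the persistence hooks differ between subclasses. The classes SketchDB, FileBackedDB and RAMOnlyDB are hypothetical and do not mirror DODB's internals. The RAM-only constructor still takes a path only to mirror the quirk discussed above.

```ruby
require "json"
require "fileutils"
require "tmpdir"

# Hypothetical sketch: the public API lives in the base class,
# and subclasses only decide how (or whether) entries are persisted.
class SketchDB
  def initialize(path)
    @path = path  # stored even by the RAM-only back-end, which never uses it
    @data = {}
    @next_key = 0
  end

  # Stores a value under a fresh integer key and returns that key.
  def <<(value)
    key = @next_key
    @next_key += 1
    @data[key] = value
    persist(key, value)
    key
  end

  def [](key)
    @data[key]
  end

  def delete(key)
    erase(key)
    @data.delete(key)
  end

  def size
    @data.size
  end
end

# Keeps entries in memory and also writes each one to disk as JSON.
class FileBackedDB < SketchDB
  def initialize(path)
    super
    FileUtils.mkdir_p(path)
  end

  private

  def persist(key, value)
    File.write(File.join(@path, "#{key}.json"), JSON.generate(value))
  end

  def erase(key)
    FileUtils.rm_f(File.join(@path, "#{key}.json"))
  end
end

# Same API, but nothing is ever written to the file-system.
class RAMOnlyDB < SketchDB
  private

  def persist(_key, _value); end

  def erase(_key); end
end

Dir.mktmpdir do |dir|
  # Switching back-ends only means changing the class name.
  [FileBackedDB, RAMOnlyDB].each do |klass|
    path = File.join(dir, "users-#{klass}")
    users = klass.new(path)
    key = users << { "login" => "alice" }   # e.g. a connected user
    login = users[key]["login"]
    puts "#{klass}: #{login}, directory created: #{Dir.exist?(path)}"
  end
end
```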
.SS RAM-only indexes
All indexes have their RAM-only counterpart.
.QP
.SOURCE Ruby ps=10
# RAM-only basic indexes.
cars_by_name = cars.new_RAM_index "name", &.name

# RAM-only partitions.
cars_by_colors = cars.new_RAM_partition "color", &.color

# RAM-only tags.
cars_by_keywords = cars.new_RAM_tags "keywords", &.keywords
.SOURCE
The API of the
.I "RAM-only index objects"
is exactly the same as the others.
.QE
As with the database API itself, switching from one version of an index to another is painless.
This way, one can opt for a cached index and, after some time without using the file-system representation, decide to switch to its RAM-only version: a 4-character modification and nothing else.
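All three index kinds boil down to hash tables, so a RAM-only version of each can be sketched in a few lines of plain Ruby. The classes RAMIndex, RAMPartition and RAMTags below are hypothetical illustrations, not DODB's index objects:

- A basic index maps a unique value to one entry key.
- A partition maps a value to the set of keys of matching entries.
- Tags file an entry under each of its keywords.

```ruby
require "set"

# Hypothetical sketch: a basic index maps one unique value to one entry key.
class RAMIndex
  def initialize(&extract)
    @extract = extract   # how to get the indexed value out of an entry
    @map = {}
  end

  def add(key, entry)
    value = @extract.call(entry)
    raise ArgumentError, "duplicate value #{value.inspect}" if @map.key?(value)
    @map[value] = key
  end

  def remove(_key, entry)
    @map.delete(@extract.call(entry))
  end

  def get(value)
    @map[value]
  end
end

# A partition maps each value to the set of keys of matching entries.
class RAMPartition
  def initialize(&extract)
    @extract = extract
    @map = Hash.new { |hash, value| hash[value] = Set.new }
  end

  def add(key, entry)
    @map[@extract.call(entry)] << key
  end

  def remove(key, entry)
    @map[@extract.call(entry)].delete(key)
  end

  def get(value)
    @map.fetch(value, Set.new).to_a
  end
end

# Tags are a partition in which one entry is filed under every keyword it has.
class RAMTags < RAMPartition
  def add(key, entry)
    @extract.call(entry).each { |tag| @map[tag] << key }
  end

  def remove(key, entry)
    @extract.call(entry).each { |tag| @map[tag].delete(key) }
  end
end

Car = Struct.new(:name, :color, :keywords)
cars = {
  0 => Car.new("Mustang", "red", %w[fast american]),
  1 => Car.new("Clio", "red", %w[small]),
}
cars_by_name = RAMIndex.new(&:name)
cars_by_colors = RAMPartition.new(&:color)
cars_by_keywords = RAMTags.new(&:keywords)
cars.each do |key, car|
  [cars_by_name, cars_by_colors, cars_by_keywords].each { |index| index.add(key, car) }
end

p cars_by_name.get("Mustang")    # => 0
p cars_by_colors.get("red")      # => [0, 1]
p cars_by_keywords.get("fast")   # => [0]
```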
.
.
.
.SECTION DODB and memory constraint
In contrast with the previous section, some environments have a memory constraint.
For example, if the database is larger than the available memory, a data cache cannot be used\*[*].
.FOOTNOTE1
Keep in mind that, for the moment, "cached database" means "all data in memory".
It is perfectly reasonable to have a cached database with a policy of keeping only a certain number of values in memory, in order to limit the required memory by selecting the relevant values to keep in cache (the most recently used, for example).
But for now, the cached version keeps everything.
.FOOTNOTE2
.SS Uncached database
.SS Uncached indexes
.
.SECTION Recap of the DODB API
.TBD
.SS Database creation
.SS Database update and deletion with the key
.SS Indexes creation
.SS Database update and deletion with an index
.SSS Tags: specific functions
.SECTION Limits of DODB
DODB provides basic database operations such as storing, searching, modifying and removing data.
Though, SQL databases have a few
@@ -453,6 +547,9 @@ FYI, the service
uses DODB, and since the database is fast enough, parallelism isn't required despite handling more than a thousand requests per second.
.FOOTNOTE2
With a cache, data is retrieved five hundred times faster than with a SQL database.
Thus, parallelism is probably not needed, but a locking mechanism is provided anyway, just in case; it may be overly simplistic but
.SHINE "good enough"
for most applications.
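The following sketch shows what an "overly simplistic" lock can look like: every access is serialized behind a single Mutex. It is a hypothetical example (LockedStore and update are made-up names), not DODB's locking mechanism. It only shows why one coarse lock is enough to avoid lost updates when each operation is short.

```ruby
# Hypothetical sketch, not DODB's locking code: a single coarse-grained lock
# serializes every read and write, so read-modify-write updates never interleave.
class LockedStore
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  # Applies the given block to the current value while holding the lock.
  def update(key)
    @lock.synchronize { @data[key] = yield(@data[key]) }
  end

  def [](key)
    @lock.synchronize { @data[key] }
  end
end

store = LockedStore.new
threads = Array.new(4) do
  Thread.new { 1000.times { store.update("hits") { |count| (count || 0) + 1 } } }
end
threads.each(&:join)
puts store["hits"]   # prints 4000
```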

.I Durability
is taken into account.
@@ -501,9 +598,9 @@ The scenario is simple: adding values to a database with indexes (basic, partiti
Loop and repeat.

Four instances of DODB are tested:
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in memory);
.BULLET \fIuncached data but cached index\f[] shows the improvement you can expect by having a cache on indexes;
.BULLET \fIcached database\f[] shows the most basic use of DODB\*[*];
.BULLET \fIRAM only\f[], the database doesn't have a representation on disk (no data is written to it).
The \fIRAM only\f[] instance shows a possible way to use DODB: to keep a consistent API to store data, including in-memory data with a lifetime tied to the application's.
.ENDBULLET
@@ -585,6 +682,17 @@ Caching the value enables a massive performance gain, data can be retrieved seve
.so graph_query_tag.grap
.
.SECTION Future work
This section presents all the features I want to see in a future version of the DODB library.
.SS Cached database and indexes with selective memory
Right now, both the cached database and the cached indexes keep every cached value indefinitely.
Giving the cache the ability to select the values to keep in memory would enable a massive speed-up even in memory-constrained environments.
The policy could be as simple as keeping in memory only the most recently requested values.
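Here is a minimal sketch of such a policy in plain Ruby, assuming a least-recently-used (LRU) eviction rule. LRUCache and its loader block are hypothetical names. The loader stands for the slow path (for example, reading an entry from the file-system) and is only called on a cache miss.

```ruby
# Hypothetical sketch of a "selective memory" cache using a least-recently-used
# (LRU) policy. Ruby hashes keep insertion order, so re-inserting a key on every
# access moves it to the "most recent" end, and the first key is the oldest.
class LRUCache
  def initialize(capacity, &loader)
    raise ArgumentError, "capacity must be at least 1" if capacity < 1
    @capacity = capacity
    @loader = loader   # slow path, e.g. reading the entry from the file-system
    @entries = {}
  end

  def [](key)
    if @entries.key?(key)
      value = @entries.delete(key)                  # hit: refresh its recency
    else
      value = @loader.call(key)                     # miss: load the value
      @entries.shift if @entries.size >= @capacity  # evict the least recently used
    end
    @entries[key] = value
  end

  def keys
    @entries.keys
  end
end

reads = []
cache = LRUCache.new(2) { |key| reads << key; "value of #{key}" }
cache["a"]
cache["b"]
cache["a"]        # hit: "a" becomes the most recently used entry
cache["c"]        # miss with a full cache: "b" is evicted, not "a"
p cache.keys      # => ["a", "c"]
p reads           # => ["a", "b", "c"]
```

Re-inserting the key on every hit is enough to track recency, so the sketch needs no separate linked list.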
.SS Pagination via the indexes: offset and limit
Right now, browsing the entire database by requesting a limited list at a time is possible, thanks to some functions accepting an
.I offset
and a
.I size .
However, this is not possible with the indexes: when querying a partition, for example, the API returns the entire list of matching values.
This is not acceptable for databases with large partitions and tags: memory will be over-used and requests will be slow.
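Here is one possible shape for this feature, assuming each partition keeps its entry keys sorted. The lookup takes an offset and a limit and returns only one page of keys, so only those entries would then need to be loaded. PagedPartition and its get(value, offset:, limit:) signature are hypothetical, not an existing DODB function. In this sketch the index itself still lives entirely in memory; only the returned list is bounded.

```ruby
# Hypothetical sketch, not an existing DODB function: a partition lookup that
# accepts an offset and a limit, returning one page of matching keys.
class PagedPartition
  def initialize(&extract)
    @extract = extract
    @map = Hash.new { |hash, value| hash[value] = [] }   # value => sorted keys
  end

  def add(key, entry)
    keys = @map[@extract.call(entry)]
    pos = keys.bsearch_index { |k| k >= key } || keys.size
    keys.insert(pos, key) unless keys[pos] == key   # keep sorted, skip duplicates
  end

  # Returns at most "limit" keys starting at "offset"; [] past the end.
  def get(value, offset: 0, limit: 10)
    return [] unless @map.key?(value)
    @map[value][offset, limit] || []
  end
end

Car = Struct.new(:name, :color)
cars_by_colors = PagedPartition.new(&:color)
25.times { |key| cars_by_colors.add(key, Car.new("car #{key}", key.even? ? "red" : "blue")) }

# Browse the "red" partition five keys at a time.
page = 0
loop do
  keys = cars_by_colors.get("red", offset: page * 5, limit: 5)
  break if keys.empty?
  puts "page #{page}: #{keys.inspect}"
  page += 1
end
```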
.SECTION Conclusion
.TBD