diff --git a/graphs/graphs.ms b/graphs/graphs.ms
index 3777c1d..ddba61a 100644
--- a/graphs/graphs.ms
+++ b/graphs/graphs.ms
@@ -411,6 +411,8 @@ directory!
 .QE
 .KE
 .
+.
+.
 .SSS Side note about indexes
 DODB presents a few possible indexes (basic indexes, partitions and tags) which respond to an obvious need for fast searches.
 Though, their implementation via the creation of symlinks is the result of a certain vision about how a database should behave in order to provide a practical way for users to sort the entries.
@@ -419,13 +421,105 @@ The implementation can be completely changed.
 Also, other kinds of indexes could
 .B easily
 be implemented in addition of those presented.
-.TBD
+The new indexes may have completely different objectives than providing a file-system representation of the data.
+The following sections will precisely cover this aspect.
 .
 .
-.SSS Cached indexes and ram-only indexes
+.SECTION DODB, slow? Nope. Let's talk about caches
+DODB acts like a hash table.
+Internally, it literally has one
+.I "by default"
+to cache data.
+This means data is being stored in memory as well as on the file-system, so the retrieval is incredibly fast;
+same thing for the indexes.
+Sure, having a file-system representation of the data (including the indexes) is convenient for the administrator, but input-output operations on a file-system are slow.
+Indexes can easily be cached instead, as simple hash tables.
+.SS Cached and uncached database
 .TBD
-.SECTION DODB library: all the important parts
+.SS Cached and uncached indexes
 .TBD
+.SS RAM-only indexes
+In case the file-system representation isn't required, indexes can be stored in memory,
+.I only .
+.TBD
+.SECTION RAM-only database for short-lived data
+Databases are built around the objective to actually
+.I store
+data.
+But sometimes the data has only the same lifetime as the application.
+Stop the application and the data itself becomes irrelevant, which happens on several occasions, for instance when the application keeps track of the connected users.
+This case is not covered by traditional databases; this is out of scope, and short-lived data is handled only within the application.
+Yet, since DODB is a library and not a separate application (read: DODB is far faster), this usage of the database can be relevant.
+Having the same API to handle both long and short-lived data can be useful.
+Moreover, the previously mentioned indexes (basic indexes, partitions and tags) would also work the same way for this short-lived data.
+Of course, in this case, the file-system representation may be completely irrelevant.
+And for all these reasons, the
+.I RAM-only
+DODB database and
+.I RAM-only
+indexes were created.
+
+Let's recap the advantages of the RAM-only DODB database.
+The DODB API is the same for short-lived (read: temporary) and long-lived data.
+This includes the same indexes too, so a file-system representation of the current state of the application is possible.
+RAM-only also means incredible performance since DODB is only a
+.I very
+small layer over a hash table.
+.SS RAM-only database
+Instantiating a RAM-only database is as simple as the other options.
+Moreover, this database has exactly the same API as the others, thus changing from one to another is painless.
+.QP
+.SOURCE Ruby ps=10
+# RAM-only database creation
+database = DODB::RAMOnlyDataBase(Car).new "path/to/db-cars"
+.SOURCE
+Yes, the path is still required, which may be seen as a quirk, but the rationale\*[*] is sound.
+.QE
+.FOOTNOTE1
+A path is still required despite the database being only in memory, for two reasons.
+First, indexes can still be instantiated for the database, and those indexes can provide a file-system representation of the data.
+Second, I worked enough already, leave me alone.
+.FOOTNOTE2
+.SS RAM-only indexes
+All indexes have their RAM-only counterpart.
+.QP
+.SOURCE Ruby ps=10
+# RAM-only basic indexes.
+cars_by_name = cars.new_RAM_index "name", &.name
+
+# RAM-only partitions.
+cars_by_colors = cars.new_RAM_partition "color", &.color
+
+# RAM-only tags.
+cars_by_keywords = cars.new_RAM_tags "keywords", &.keywords
+.SOURCE
+The API of the
+.I "RAM-only index objects"
+is exactly the same as the others.
+.QE
+As for the database API itself, changing from one version of an index to another is painless.
+This way, one can opt for a cached index and, after some time not using the file-system representation, decide to switch to its RAM-only version; a 4-character modification and nothing else.
+.
+.
+.
+.SECTION DODB and memory constraint
+In contrast with the previous section, some environments have a memory constraint.
+For example, in case the database is larger than the available memory, it won't be possible to use a data cache\*[*].
+.FOOTNOTE1
+Keep in mind that for the moment "cached database" means "all data in memory".
+It is perfectly reasonable to have a cached database with a policy of keeping just a certain amount of values in memory, in order to limit the memory required by selecting the relevant values to keep in cache (the most recently used, for example).
+But for now, the cached version keeps everything.
+.FOOTNOTE2
+.SS Uncached database
+.SS Uncached indexes
+.
+.SECTION Recap of the DODB API
+.TBD
+.SS Database creation
+.SS Database update and deletion with the key
+.SS Indexes creation
+.SS Database update and deletion with an index
+.SSS Tags: specific functions
 .SECTION Limits of DODB
 DODB provides basic database operations such as storing, searching, modifying and removing data.
 Though, SQL databases have a few
@@ -453,6 +547,9 @@ FYI, the service uses DODB and since the database is fast enough, parallelism
 isn't required despite enabling more than a thousand requests per second.
 .FOOTNOTE2
 With a cache, data is retrieved five hundred times quicker than with a SQL database.
+Thus, parallelism is probably not needed but a locking mechanism is provided anyway, just in case; this may be overly simplistic but
+.SHINE "good enough"
+for most applications.
 
 .I Durability
 is taken into account.
@@ -501,9 +598,9 @@ The scenario is simple: adding values to a database with indexes (basic, partiti
 Loop and repeat.
 
 Four instances of DODB are tested:
-.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory) ;
-.BULLET \fIuncached data but cached index\f[] shows the improvement you can expect by having a cache on indexes ;
-.BULLET \fIcached database\f[] shows the most basic use of DODB\*[*] ;
+.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory);
+.BULLET \fIuncached data but cached index\f[] shows the improvement you can expect by having a cache on indexes;
+.BULLET \fIcached database\f[] shows the most basic use of DODB\*[*];
 .BULLET \fIRAM only\f[], the database doesn't have a representation on disk (no data is written on it).
 The \fIRAM only\f[] instance shows a possible way to use DODB: to keep a consistent API to store data, including in-memory data with a lifetime related to the application's.
 .ENDBULLET
@@ -585,6 +682,17 @@ Caching the value enables a massive performance gain, data can be retrieved seve
 .so graph_query_tag.grap
 .
 .SECTION Future work
-.TBD
+This section presents all the features I want to see in a future version of the DODB library.
+.SS Cached database and indexes with selective memory
+Right now, both cached database and cached indexes will store any cached value indefinitely.
+Giving the cache the ability to select the values to keep in memory would enable a massive speed-up even in memory-constrained environments.
+The policy could be as simple as keeping in memory only the most recently requested values.
+.SS Pagination via the indexes: offset and limit
+Right now, browsing the entire database by requesting a limited list at a time is possible, thanks to some functions accepting an
+.I offset
+and a
+.I size .
+However, this is not possible with the indexes; thus, when querying a partition for example, the API provides the entire list of matching values.
+This is not acceptable for databases with large partitions and tags: memory will be over-used and requests will be slow.
 .SECTION Conclusion
 .TBD