Philippe PITTOLI 2024-05-29 03:56:11 +02:00
parent 751baef391
commit 02e7e82fa1
2 changed files with 39 additions and 10 deletions


@@ -4,3 +4,9 @@
%T RFC 8949, Concise Binary Object Representation (CBOR)
%D 2020
%I Internet Engineering Task Force (IETF)
%K JSON
%A Tim Bray
%T RFC 8259, The JavaScript Object Notation (JSON) Data Interchange Format
%D 2017
%I Internet Engineering Task Force (IETF)


@@ -211,7 +211,14 @@ end
When a value is added, it is serialized\*[*] and written to a dedicated file.
.FOOTNOTE1
Serialization is currently in JSON.
.[
JSON
.]
CBOR
.[
CBOR
.]
is a work-in-progress.
Nothing binds DODB to a particular format.
.FOOTNOTE2
The hash key is an auto-incremented number, used as the name of the stored file.
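The storage model described above (one serialized file per value, named after an auto-incremented key) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DODB's code: the class `FileStore`, its methods, the `data` directory and the zero-padded `.json` file names are all hypothetical. JSON is used because it is the format the text mentions.

```python
import json
import os


class FileStore:
    """Hypothetical minimal store: each value is serialized to JSON and
    written to its own file, named after an auto-incremented integer key."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        # Resume numbering after the highest key already on disk.
        keys = [int(name[:-5]) for name in os.listdir(directory)
                if name.endswith(".json")]
        self.next_key = max(keys, default=-1) + 1

    def _path(self, key: int) -> str:
        # Zero-padded names keep a directory listing sorted by key.
        return os.path.join(self.directory, f"{key:010d}.json")

    def add(self, value) -> int:
        """Serialize `value` into a dedicated file and return its key."""
        key = self.next_key
        self.next_key += 1
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(value, f)
        return key

    def get(self, key: int):
        """Read and deserialize the value stored under `key` (no caching)."""
        with open(self._path(key), encoding="utf-8") as f:
            return json.load(f)
```

Reopening the same directory resumes numbering after the highest existing key, because the counter is recomputed from the file names rather than kept in memory.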
@@ -896,16 +903,18 @@ Three possible indexes exist in DODB:
The scenario is simple: add values to a database with indexes (basic, partitions and tags), then query a value 100 times through each of the indexes.
Loop and repeat.
Five instances of DODB are tested:
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in memory);
.BULLET \fIuncached database but cached index\f[] shows the improvement you can expect by having a cache on indexes;
.BULLET \fIcommon database\f[] shows the most basic use of DODB, with a limited cache (100k entries)\*[*];
.BULLET \fIcached database\f[] represents a database with all the entries in cache (no eviction mechanism);
.BULLET \fIRAM only\f[] is a database with no representation on disk (nothing is written to disk).
The \fIRAM only\f[] instance shows a possible way to use DODB: keeping a consistent API to store data, including in-memory data whose lifetime is tied to the application's.
.ENDBULLET
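The benchmark loop described above (repeat a query 100 times, then aggregate) could be timed with a harness like the following. It is a hypothetical sketch: the function name `median_query_ns`, the use of `time.perf_counter_ns` and the choice of the median as the reported statistic are assumptions, since the text does not say how durations are aggregated.

```python
import statistics
import time
from typing import Callable


def median_query_ns(query: Callable[[], object], repeats: int = 100) -> float:
    """Call `query` `repeats` times and return the median duration in
    nanoseconds (hypothetical harness; the aggregation is an assumption)."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        query()
        durations.append(time.perf_counter_ns() - start)
    return statistics.median(durations)
```

For instance, `median_query_ns(lambda: cache[key])` would time a lookup in a hypothetical in-memory dictionary `cache`, which roughly corresponds to a fully cached database; timing such a lambda in Python says nothing about DODB's own numbers.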
.FOOTNOTE1
Having a cached database will probably be the most widespread use of DODB.
When memory isn't scarce, there is no reason not to use it to achieve better performance.
Moreover, the "common database" allows configuring the cache size, so this database remains relevant even when the dataset is bigger than the available memory.
.FOOTNOTE2
The computer on which this test is performed\*[*] is an AMD PRO A10-8770E R7 (4 cores), 2.8 GHz.
When mentioned, the
@@ -955,26 +964,40 @@ This is slightly more (about 200 ns) for Common database since there is a few mo
When the value is on disk, deserialization takes about 15 µs (see \f[CW]Uncached db\f[]).
The request takes a little longer when the index isn't cached (see \f[CW]Uncached db and index\f[]); in this case DODB walks the file system to find the right symlink to follow, slowing the process further, by up to 20%.
The logarithmic scale version of this figure shows that \fIRAM-only\f[] and \fIcached\f[] databases have exactly the same performance.
The \fIcommon\f[] database is somewhat slower than these two due to its caching policy: when a value is requested, the \fIcommon\f[] database puts its key at the start of a list to mark a
.I recent
use of this data (conversely, the last keys in this list are the least recently used entries).
Thus, the \fIcommon\f[] database spends 80 ns on its caching policy, which makes it about 67% slower than the previous ones to retrieve a value.
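A least-recently-used (LRU) policy like the one described above can be sketched with Python's `collections.OrderedDict`. This is an illustration under assumptions, not DODB's implementation: the class `LRUCache`, the `load` callback (standing in for reading and deserializing a file) and the orientation of the order (most recent entry at the end of the `OrderedDict`, rather than at the start of a list) are choices made for the example. A capacity of 100,000 would mirror the limit mentioned for the common database.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Hypothetical bounded cache with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Ordered from least recently used (first) to most recently used (last).
        self._entries: OrderedDict = OrderedDict()

    def get(self, key: Hashable, load: Callable[[Hashable], Any]) -> Any:
        """Return the value for `key`, calling `load` on a cache miss."""
        if key in self._entries:
            # Hit: record a recent use by moving the key to the end.
            self._entries.move_to_end(key)
            return self._entries[key]
        value = load(key)
        self._entries[key] = value  # new keys are inserted at the end
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry (the first one).
            self._entries.popitem(last=False)
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._entries

    def __len__(self) -> int:
        return len(self._entries)
```

Every hit pays for a lookup plus a `move_to_end` call; that bookkeeping is the kind of overhead the 80 ns figure above accounts for, although timings of this Python sketch would say nothing about DODB's.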
Uncached databases lag far behind, as shown by the logarithmically scaled figure.
The data cache makes requests at least 170 times faster.
The results depend on the data size; the bigger the data, the slower the serialization (and deserialization).
In this example, the database entries are almost empty; they have very few attributes and little content (a few dozen characters at most).
Thus, with real-world data, the performance of uncached databases will be even more severely impacted.
That is why alternative encodings, such as CBOR,
.[
CBOR
.]
should be considered for large databases.
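To give a concrete idea of what CBOR looks like on the wire, here is a minimal encoder for JSON-like values (null, booleans, integers within 64 bits, floats, strings, arrays and maps). It follows the initial-byte layout of RFC 8949: a 3-bit major type followed by a 5-bit argument, extended to 1, 2, 4 or 8 bytes when needed. It is an illustrative sketch with hypothetical function names (`cbor_encode`, `encoded_sizes`), not the encoder DODB uses. It always emits 64-bit floats and does not handle tags, byte strings or indefinite lengths.

```python
import json
import struct


def _head(major: int, n: int) -> bytes:
    """Initial byte (major type + additional info) plus argument bytes."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24, n])
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def cbor_encode(value) -> bytes:
    """Encode None, bool, int, float, str, list/tuple and dict values."""
    # Booleans are checked before int, since bool is a subclass of int.
    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        # Major type 0 for n >= 0, major type 1 stores -1 - n.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # double precision
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        return _head(5, len(value)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in value.items())
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encoded_sizes(value) -> tuple[int, int]:
    """Return (compact JSON size, CBOR size) in bytes for `value`."""
    json_size = len(json.dumps(value, separators=(",", ":")).encode("utf-8"))
    return json_size, len(cbor_encode(value))
```

For a small hypothetical record such as `{"name": "Corvet", "color": "red", "year": 1998}`, `encoded_sizes` returns `(43, 31)`: 43 bytes of compact JSON text against 31 bytes of CBOR. Being pure Python, this encoder only illustrates the format; its speed should not be compared with a native JSON implementation.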
.SS Partitions (1 to n relations)
.LP
The previous example showed the retrieval of a single value from the database.
The following one shows what happens when thousands of entries are retrieved.
A partition index makes it possible to match a list of entries based on an attribute.
In the experiment, a database of cars is created along with a partition on their color.
Performance is analyzed based on the partition size (the number of red cars) and the time needed to retrieve all its entries.
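A partition index can be pictured as one directory per attribute value, holding symbolic links to the files of the matching entries. Retrieving a partition then means listing that directory and deserializing every linked file, so for an uncached database the work grows with the partition size. The sketch below assumes a POSIX file system with symlink support; the directory layout (`data/`, `partitions/by_<attribute>/<value>/`) and the function names are hypothetical and do not document DODB's actual layout.

```python
import json
import os


def add_to_partition(root: str, key: int, value: dict, attribute: str) -> None:
    """Store `value` as data/<key>.json and link it from the directory of
    its attribute value (hypothetical layout, for illustration only)."""
    data_dir = os.path.join(root, "data")
    os.makedirs(data_dir, exist_ok=True)
    data_path = os.path.join(data_dir, f"{key:010d}.json")
    with open(data_path, "w", encoding="utf-8") as f:
        json.dump(value, f)
    part_dir = os.path.join(root, "partitions", f"by_{attribute}",
                            str(value[attribute]))
    os.makedirs(part_dir, exist_ok=True)
    # Relative symlink, so the whole database directory can be moved.
    link = os.path.join(part_dir, f"{key:010d}.json")
    os.symlink(os.path.relpath(data_path, part_dir), link)


def get_partition(root: str, attribute: str, attr_value) -> list:
    """Return every entry whose `attribute` equals `attr_value`."""
    part_dir = os.path.join(root, "partitions", f"by_{attribute}",
                            str(attr_value))
    if not os.path.isdir(part_dir):
        return []
    entries = []
    # One directory listing, then one open + deserialization per entry:
    # without a cache, the cost is linear in the partition size.
    for name in sorted(os.listdir(part_dir)):
        with open(os.path.join(part_dir, name), encoding="utf-8") as f:
            entries.append(json.load(f))  # open() follows the symlink
    return entries
```

A cached database would keep the deserialized entries (and possibly the list of keys) in memory, skipping the per-entry file reads that dominate this sketch.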
.ps -2
.so graphs/query_partition.grap
.ps \n[PS]
.QP
This figure shows the retrieval of cars based on a partition (their color), with both a linear and a logarithmic scale.
.QE
In this example, both linear and logarithmic scales are shown to make the differences between all databases easier to grasp.
The linear scale shows that the request time of uncached databases grows linearly with the partition size.
Likewise, the logarithmically scaled figure shows the same for cached databases,
which appear flat on the linear scale since they are all hundreds of times quicker than the uncached ones.
.SS Tags (n to n relations)
.LP