Paper
parent 751baef391
commit 02e7e82fa1
@ -4,3 +4,9 @@
|
||||
%T RFC 8949, Concise Binary Object Representation (CBOR)
|
||||
%D 2020
|
||||
%I Internet Engineering Task Force (IETF)
|
||||
|
||||
%K JSON
|
||||
%A Tim Bray
|
||||
%T RFC 8259, The JavaScript Object Notation (JSON) Data Interchange Format
|
||||
%D 2017
|
||||
%I Internet Engineering Task Force (IETF)
|
||||
|
@ -211,7 +211,14 @@ end
|
||||
When a value is added, it is serialized\*[*] and written in a dedicated file.
|
||||
.FOOTNOTE1
|
||||
Serialization is currently in JSON.
|
||||
CBOR is a work-in-progress.
|
||||
.[
|
||||
JSON
|
||||
.]
|
||||
CBOR
|
||||
.[
|
||||
CBOR
|
||||
.]
|
||||
is a work-in-progress.
|
||||
Nothing binds DODB to a particular format.
|
||||
.FOOTNOTE2
|
||||
The key of the hash is a number, auto-incremented, used as the name of the stored file.
|
||||
@ -896,16 +903,18 @@ Three possible indexes exist in DODB:
|
||||
The scenario is simple: values are added to a database with indexes (basic, partitions and tags), then a value is queried 100 times through each of the indexes.
|
||||
Loop and repeat.
|
||||
|
||||
Four instances of DODB are tested:
|
||||
Five instances of DODB are tested:
|
||||
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory);
|
||||
.BULLET \fIuncached data but cached index\f[] shows the improvement you can expect by having a cache on indexes;
|
||||
.BULLET \fIcached database\f[] shows the most basic use of DODB\*[*];
|
||||
.BULLET \fIuncached database but cached index\f[] shows the improvement you can expect by having a cache on indexes;
|
||||
.BULLET \fIcommon database\f[] shows the most basic use of DODB, with a limited cache (100k entries)\*[*];
|
||||
.BULLET \fIcached database\f[] represents a database with all the entries in cache (no eviction mechanism);
|
||||
.BULLET \fIRAM only\f[], the database doesn't have a representation on disk (no data is written on it).
|
||||
The \fIRAM only\f[] instance shows a possible way to use DODB: to keep a consistent API to store data, including in-memory data with a lifetime related to the application's.
|
||||
.ENDBULLET
|
||||
.FOOTNOTE1
|
||||
Having a cached database will probably be the most widespread use of DODB.
|
||||
When memory isn't scarce, there is no point not using it to achieve better performance.
|
||||
Moreover, the "common database" makes the cache size configurable, so this database remains relevant even when the data set is bigger than the available memory.
|
||||
.FOOTNOTE2
|
||||
|
||||
The computer on which this test is performed\*[*] is an AMD PRO A10-8770E R7 (4 cores), 2.8 GHz. When mentioned, the
|
||||
@ -955,26 +964,40 @@ This is slightly more (about 200 ns) for Common database since there is a few mo
|
||||
In case the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db\f[]).
|
||||
The request is a little longer when the index isn't cached (see \f[CW]Uncached db and index\f[]); in this case DODB walks the file-system to find the right symlink to follow, thus slowing the process even more, by up to 20%.
|
||||
|
||||
The logarithmic scale version of this figure shows that RAM-only and Cached databases have exactly the same performance.
|
||||
The Common database is somewhat slower than these two due to the caching policy: when a value is asked, the Common database puts its key at the start of a list to represent a
|
||||
The logarithmic scale version of this figure shows that \fIRAM-only\f[] and \fIcached\f[] databases have exactly the same performance.
|
||||
The \fIcommon\f[] database is somewhat slower than these two due to the caching policy: when a value is requested, the \fIcommon\f[] database puts its key at the start of a list to represent a
|
||||
.I recent
|
||||
use of this data (conversely, the last values in this list are the least recently used entries).
|
||||
Thus, Common database takes 80 ns for its caching policy, which makes this database about 67% slower than the previous ones to retrieve a value.
|
||||
Thus, the \fIcommon\f[] database takes 80 ns for its caching policy, which makes this database about 67% slower than the previous ones to retrieve a value.
|
||||
Uncached databases are far away from these results, as shown by the logarithmically scaled figure.
|
||||
The data cache improves the duration of the requests, this makes them at least a hundred times faster.
|
||||
The data cache shortens the requests, making them at least 170 times faster.
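The caching policy of the \fIcommon\f[] database described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not DODB's actual implementation: the class name \f[CW]LRUCache\f[], its methods and the \f[CW]max_entries\f[] parameter are hypothetical. It only shows the idea of moving a key to the start of a list on each access and evicting from the end once the cache is full.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded cache; an illustration, not DODB's code."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        # Keys are kept in usage order: most recently used first.
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None  # a real database would read the file on disk here
        # Mark the key as recently used by moving it to the start.
        self.entries.move_to_end(key, last=False)
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key, last=False)
        if len(self.entries) > self.max_entries:
            # The last key is the least recently used one: evict it.
            self.entries.popitem(last=True)


cache = LRUCache(max_entries=2)
cache.put(1, "a")
cache.put(2, "b")
cache.get(1)        # key 1 becomes the most recently used
cache.put(3, "c")   # the cache is full: key 2 is evicted
```

The extra bookkeeping on every read (moving the key) is the kind of cost that a cache without eviction does not pay.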
|
||||
|
||||
The results depend on the data size; the bigger the data, the slower the serialization (and deserialization).
|
||||
In this example, the database entries are almost empty; they have very few attributes and not much content (a few dozen characters max).
|
||||
Thus, performance of non-cached databases will be even more severely impacted with real-world data.
|
||||
That is why alternative encodings, such as CBOR,
|
||||
.[
|
||||
CBOR
|
||||
.]
|
||||
should be considered for large databases.
|
||||
|
||||
.SS Partitions (1 to n relations)
|
||||
.LP
|
||||
The previous example showed the retrieval of a single value from the database.
|
||||
The following will show what happens when thousands of entries are retrieved.
|
||||
|
||||
A partition index makes it possible to match a list of entries based on an attribute.
|
||||
In the experiment, a database of cars is created along with a partition on their color.
|
||||
Performance is analyzed based on the partition size (the number of red cars) and the duration to retrieve all the entries.
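The idea of a partition on the color of cars can be sketched as follows. This is a simplified in-memory illustration in Python, not DODB's implementation: the \f[CW]PartitionIndex\f[] class, its methods and the sample cars are hypothetical, and the index is assumed to be a plain dictionary from an attribute value to the keys of the matching entries.

```python
from collections import defaultdict


class PartitionIndex:
    """Hypothetical 1-to-n index: one attribute value, many entries."""

    def __init__(self, attribute):
        self.attribute = attribute
        # Maps an attribute value (e.g. "red") to the keys of matching entries.
        self.partitions = defaultdict(list)

    def add(self, key, entry):
        self.partitions[entry[self.attribute]].append(key)

    def get(self, value):
        # Returns a copy; an unknown value yields an empty list.
        return list(self.partitions.get(value, []))


# Sample data, invented for the example.
cars = {
    0: {"name": "Corvet", "color": "red"},
    1: {"name": "Bullet-GT", "color": "blue"},
    2: {"name": "GTI", "color": "red"},
}

by_color = PartitionIndex("color")
for key, car in cars.items():
    by_color.add(key, car)

red_cars = [cars[key] for key in by_color.get("red")]
```

Retrieving a partition then costs one lookup plus one fetch per matching entry, which is consistent with a request time that grows with the partition size.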
|
||||
|
||||
.ps -2
|
||||
.so graphs/query_partition.grap
|
||||
.ps \n[PS]
|
||||
.QP
|
||||
This figure shows the retrieval of cars based on a partition (their color), with both a linear and a logarithmic scale.
|
||||
.QE
|
||||
In this example, both the linear and the logarithmic scales are represented to better grasp the difference between all databases.
|
||||
The linear scale shows the linearity of the request time for uncached databases.
|
||||
Likewise, the logarithmically scaled figure does the same for cached databases,
|
||||
which are flattened in the linear scale since they are all hundreds of times faster than the uncached ones.
|
||||
|
||||
.SS Tags (n to n relations)
|
||||
.LP
|
||||
|