Talk a bit more about the results (index).

This commit is contained in:
Philippe PITTOLI 2024-05-28 22:28:08 +02:00
parent 948f995ef4
commit 751baef391
3 changed files with 55 additions and 22 deletions


@@ -0,0 +1,6 @@
%K CBOR
%A C. Bormann
%A P. Hoffman
%T RFC 8949, Concise Binary Object Representation (CBOR)
%D 2020
%I Internet Engineering Task Force (IETF)


@@ -44,7 +44,7 @@ define legend {
cy = cy - hdiff
legend_line(cy,lstartx,lendx,tstartx,black,"Cached db and index")
cy = cy - hdiff
legend_line(cy,lstartx,lendx,tstartx,pink,"FIFO db and cached index")
legend_line(cy,lstartx,lendx,tstartx,pink,"Common db, cached index")
cy = cy - hdiff
legend_line(cy,lstartx,lendx,tstartx,blue,"Uncached db, cached index")
cy = cy - hdiff


@@ -943,14 +943,47 @@ The experiment starts with a database containing 1,000 cars and goes up to 250,0
.so graphs/query_index.grap
.ps \n[PS]
.QP
This figure shows the request durations to retrieve data based on a basic index with a database containing up to 250k entries.
This figure shows the request durations to retrieve data based on a basic index with a database containing up to 250k entries, both with linear and logarithmic scales.
.QE
Since there is only one value to retrieve, the request is quick and its duration is almost constant regardless of the database size.
When the value and the index are kept in memory (see \f[CW]RAM only\f[] and \f[CW]Cached db\f[]), the retrieval is almost instantaneous (about 50 to 120 ns).
In case the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db, cached index\f[]).
When the value and the index are kept in memory (see \f[CW]RAM only\f[], \f[CW]Cached db\f[] and \f[CW]Common db\f[]), the retrieval is almost instantaneous\*[*].
.FOOTNOTE1
About 110 to 120 ns for the RAM-only and cached databases.
The Common database is slightly slower (about 200 ns) since it has a few more steps to perform to maintain its internal structure.
.FOOTNOTE2
When the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db\f[]).
The request takes a little longer when the index isn't cached (see \f[CW]Uncached db and index\f[]); in this case DODB walks the file system to find the right symlink to follow, which slows the process even further, by up to 20%.
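.LP
The sketch below, in Crystal (the language DODB is written in), illustrates what such an uncached lookup could look like; the directory layout, the \f[CW]Car\f[] type and the helper function are assumptions made for this illustration, not DODB's actual code.
.ps -2
.ft CW
.nf
require "json"

# Hypothetical record type, standing in for the cars of the benchmark.
struct Car
  include JSON::Serializable
  property name : String
  property color : String
end

# Follow an on-disk index entry (a symlink) and deserialize its target.
# Both steps are paid on every request when neither the db nor the index
# is cached.
def fetch_by_name(storage_dir : String, name : String) : Car
  link   = File.join(storage_dir, "indices", "by_name", name) # assumed layout
  target = File.realpath(link)       # walk the file system, resolve the symlink
  Car.from_json(File.read(target))   # deserialization: the ~15 µs part
end
.fi
.ft
.ps \n[PS]
.LP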
The logarithmic scale version of this figure shows that RAM-only and Cached databases have exactly the same performance.
The Common database is somewhat slower than these two because of its caching policy: when a value is requested, the Common database moves its key to the front of a list to mark it as
.I recently
used (conversely, the keys at the end of this list belong to the least recently used entries).
This bookkeeping takes about 80 ns per request, which makes the Common database about 67% slower than the previous two at retrieving a value.
Uncached databases are far from these results, as the logarithmically scaled figure shows.
The data cache shortens the requests dramatically, making them at least a hundred times faster.
The results depend on the data size; the bigger the data, the slower the serialization (and deserialization).
That is why alternative encodings, such as CBOR,
.[
CBOR
.]
should be considered for large databases.
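.LP
To make the caching policy above concrete, here is a minimal, hypothetical sketch in Crystal of a capped cache that moves a key to the front of a recency list on each access; it only illustrates the idea and is not DODB's actual implementation.
.ps -2
.ft CW
.nf
# Hypothetical move-to-front cache: recently used keys sit at the front
# of the recency list, least recently used keys drift to the back and
# are evicted first once the configured capacity is reached.
class RecencyCache(K, V)
  def initialize(@max_entries : Int32)
    @data   = Hash(K, V).new
    @recent = Deque(K).new
  end

  def get?(key : K) : V?
    return nil unless @data.has_key? key
    touch key            # the extra bookkeeping (the ~80 ns mentioned above)
    @data[key]
  end

  def put(key : K, value : V)
    touch key
    @data[key] = value
    if @data.size > @max_entries
      if oldest = @recent.pop?    # least recently used key is at the back
        @data.delete oldest
      end
    end
  end

  private def touch(key : K)
    @recent.delete key      # remove the key wherever it was...
    @recent.unshift key     # ...and put it back at the front
  end
end
.fi
.ft
.ps \n[PS]
.LP
Bounding the number of cached entries like this is what makes the memory consumption of the Common database configurable, as mentioned in the summary table below.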
.SS Partitions (1 to n relations)
.LP
.ps -2
.so graphs/query_partition.grap
.ps \n[PS]
.SS Tags (n to n relations)
.LP
.ps -2
.so graphs/query_tag.grap
.ps \n[PS]
.
.SS Summary
.ps -2
.TS
allbox tab(:);
@@ -965,31 +998,25 @@ Cached db and index:T{
Performance for retrieving a value is the same as RAM only while
enabling the admin to manually search for data on-disk.
T}:about the same performance
Uncached db, cached index::300 to 400x slower
Common db, cached index:T{
Performance is still excellent while requiring a
.UL configurable
amount of RAM.
Should be used by default.
T}:T{
67% slower (about 200 ns), which is still excellent
T}
Uncached db, cached index:Very slow; the Common database should be considered instead.:170 to 180x slower
Uncached db and index:T{
Best memory footprint, worst performance.
T}:400 to 500x slower
T}:200 to 210x slower
.TE
.ps \n[PS]
.B Conclusion :
as expected, retrieving a single value is fast and the size of the database doesn't matter much.
.SS Conclusion on performance
As expected, retrieving a single value is fast and the size of the database doesn't matter much.
Each deserialization and, more importantly, each disk access is a pain point.
Caching the value enables a massive performance gain: data can be retrieved several hundred times faster.
.SS Partitions (1 to n relations)
.LP
.ps -2
.so graphs/query_partition.grap
.ps \n[PS]
.SS Tags (n to n relations)
.LP
.ps -2
.so graphs/query_tag.grap
.ps \n[PS]
.
.
.
.SECTION Future work
This section presents all the features I want to see in a future version of the DODB library.