From 751baef391b94988e7f8e7286a477e331e1fd44e Mon Sep 17 00:00:00 2001
From: Philippe PITTOLI
Date: Tue, 28 May 2024 22:28:08 +0200
Subject: [PATCH] Talk a bit more about the results (index).

---
 paper/bibliography |  6 ++++
 paper/legend.grap  |  2 +-
 paper/paper.ms     | 69 ++++++++++++++++++++++++++++++++--------------
 3 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/paper/bibliography b/paper/bibliography
index e69de29..075b055 100644
--- a/paper/bibliography
+++ b/paper/bibliography
@@ -0,0 +1,6 @@
+%K CBOR
+%A C. Bormann
+%A P. Hoffman
+%T RFC 8949, Concise Binary Object Representation (CBOR)
+%D 2020
+%I Internet Engineering Task Force (IETF)
diff --git a/paper/legend.grap b/paper/legend.grap
index 8ec53ed..9cbe433 100644
--- a/paper/legend.grap
+++ b/paper/legend.grap
@@ -44,7 +44,7 @@ define legend {
 	cy = cy - hdiff
 	legend_line(cy,lstartx,lendx,tstartx,black,"Cached db and index")
 	cy = cy - hdiff
-	legend_line(cy,lstartx,lendx,tstartx,pink,"FIFO db and cached index")
+	legend_line(cy,lstartx,lendx,tstartx,pink,"Common db, cached index")
 	cy = cy - hdiff
 	legend_line(cy,lstartx,lendx,tstartx,blue,"Uncached db, cached index")
 	cy = cy - hdiff
diff --git a/paper/paper.ms b/paper/paper.ms
index 896ce51..c542759 100644
--- a/paper/paper.ms
+++ b/paper/paper.ms
@@ -943,14 +943,47 @@ The experiment starts with a database containing 1,000 cars and goes up to 250,0
 .so graphs/query_index.grap
 .ps \n[PS]
 .QP
-This figure shows the request durations to retrieve data based on a basic index with a database containing up to 250k entries.
+This figure shows the request durations to retrieve data based on a basic index with a database containing up to 250k entries, both with linear and logarithmic scales.
 .QE
 Since there is only one value to retrieve, the request is quick and time is almost constant.
-When the value and the index are kept in memory (see \f[CW]RAM only\f[] and \f[CW]Cached db\f[]), the retrieval is almost instantaneous (about 50 to 120 ns).
-In case the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db, cached index\f[]).
+When the value and the index are kept in memory (see \f[CW]RAM only\f[], \f[CW]Cached db\f[] and \f[CW]Common db\f[]), the retrieval is almost instantaneous\*[*].
+.FOOTNOTE1
+About 110 to 120 ns for RAM-only and cached database.
+This is slightly more (about 200 ns) for Common database since there is a few more steps due to the inner structure to maintain.
+.FOOTNOTE2
+In case the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db\f[]).
 The request is a little longer when the index isn't cached (see \f[CW]Uncached db and index\f[]); in this case DODB walks the file-system to find the right symlink to follow, thus slowing the process even more, by up to 20%.
+The logarithmic scale version of this figure shows that RAM-only and Cached databases have exactly the same performance.
+The Common database is somewhat slower than these two due to the caching policy: when a value is asked, the Common database puts its key at the start of a list to represent a
+.I recent
+use of this data (respectively, the last values in this list are the least recently used entries).
+Thus, Common database takes 80 ns for its caching policy, which makes this database about 67% slower than the previous ones to retrieve a value.
+Uncached databases are far away from these results, as shown by the logarithmically scaled figure.
+The data cache improves the duration of the requests, this makes them at least a hundred times faster.
+
+The results depend on the data size; the bigger the data, the slower the serialization (and deserialization).
+That is why alternative encodings, such as CBOR,
+.[
+CBOR
+.]
+should be considered for large databases.
+
+.SS Partitions (1 to n relations)
+.LP
+.ps -2
+.so graphs/query_partition.grap
+.ps \n[PS]
+
+.SS Tags (n to n relations)
+.LP
+.ps -2
+.so graphs/query_tag.grap
+.ps \n[PS]
+.
+
+.SS Summary
 .ps -2
 .TS
 allbox tab(:);
@@ -965,31 +998,25 @@
 Cached db and index:T{
 Performance for retrieving a value is the same as RAM only while enabling the admin to manually search for data on-disk.
 T}:about the same perfs
-Uncached db, cached index::300 to 400x slower
+Common db, cached index:T{
+Performance is still excellent while requiring a
+.UL configurable
+amount of RAM.
+Should be used by default.
+T}:T{
+67% slower (about 200 ns) which still is great
+T}
+Uncached db, cached index:Very slow. Common database should be considered instead.:170 to 180x slower
 Uncached db and index:T{
 Best memory footprint, worst performance.
-T}:400 to 500x slower
+T}:200 to 210x slower
 .TE
 .ps \n[PS]
-.B Conclusion :
-as expected, retrieving a single value is fast and the size of the database doesn't matter much.
+.SS Conclusion on performance
+As expected, retrieving a single value is fast and the size of the database doesn't matter much.
 Each deserialization and, more importantly, each disk access is a pain point.
 Caching the value enables a massive performance gain, data can be retrieved several hundred times quicker.
-.SS Partitions (1 to n relations)
-.LP
-
-.ps -2
-.so graphs/query_partition.grap
-.ps \n[PS]
-
-.SS Tags (n to n relations)
-.LP
-.ps -2
-.so graphs/query_tag.grap
-.ps \n[PS]
-.
-.
 .
 .SECTION Future work
 This section presents all the features I want to see in a future version of the DODB library.