Talk a bit more about the results (index).

This commit is contained in:
Philippe PITTOLI 2024-05-28 22:28:08 +02:00
parent 948f995ef4
commit 751baef391
3 changed files with 55 additions and 22 deletions


@@ -0,0 +1,6 @@
%K CBOR
%A C. Bormann
%A P. Hoffman
%T RFC 8949, Concise Binary Object Representation (CBOR)
%D 2020
%I Internet Engineering Task Force (IETF)


@@ -44,7 +44,7 @@ define legend {
cy = cy - hdiff
legend_line(cy,lstartx,lendx,tstartx,black,"Cached db and index")
cy = cy - hdiff
legend_line(cy,lstartx,lendx,tstartx,pink,"FIFO db and cached index")
legend_line(cy,lstartx,lendx,tstartx,pink,"Common db, cached index")
cy = cy - hdiff
legend_line(cy,lstartx,lendx,tstartx,blue,"Uncached db, cached index")
cy = cy - hdiff


@@ -943,14 +943,47 @@ The experiment starts with a database containing 1,000 cars and goes up to 250,0
.so graphs/query_index.grap
.ps \n[PS]
.QP
This figure shows the request durations to retrieve data based on a basic index with a database containing up to 250k entries.
This figure shows the request durations to retrieve data based on a basic index with a database containing up to 250k entries, both with linear and logarithmic scales.
.QE
Since there is only one value to retrieve, the request is quick and its duration is almost constant regardless of the database size.
When the value and the index are kept in memory (see \f[CW]RAM only\f[] and \f[CW]Cached db\f[]), the retrieval is almost instantaneous (about 50 to 120 ns).
In case the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db, cached index\f[]).
When the value and the index are kept in memory (see \f[CW]RAM only\f[], \f[CW]Cached db\f[] and \f[CW]Common db\f[]), the retrieval is almost instantaneous\*[*].
.FOOTNOTE1
About 110 to 120 ns for the RAM-only and cached databases.
The Common database is slightly slower (about 200 ns) since it has a few more steps to perform to maintain its internal structure.
.FOOTNOTE2
When the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db\f[]).
The request takes a little longer when the index isn't cached (see \f[CW]Uncached db and index\f[]); in this case DODB walks the file system to find the right symlink to follow, which slows the process even further, by up to 20%.
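.LP
The sketch below, in Crystal (the language DODB is written in), illustrates what such an uncached lookup could look like; the directory layout, the \f[CW]Car\f[] type and the helper function are assumptions made for this illustration, not DODB's actual code.
.ps -2
.ft CW
.nf
require "json"

# Hypothetical record type, standing in for the cars of the benchmark.
struct Car
  include JSON::Serializable
  property name : String
  property color : String
end

# Follow an on-disk index entry (a symlink) and deserialize its target.
# Both steps are paid on every request when neither the db nor the index
# is cached.
def fetch_by_name(storage_dir : String, name : String) : Car
  link   = File.join(storage_dir, "indices", "by_name", name) # assumed layout
  target = File.realpath(link)       # walk the file system, resolve the symlink
  Car.from_json(File.read(target))   # deserialization: the ~15 µs part
end
.fi
.ft
.ps \n[PS]
.LP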
The logarithmic scale version of this figure shows that RAM-only and Cached databases have exactly the same performance.
The Common database is somewhat slower than these two because of its caching policy: when a value is requested, the Common database moves its key to the front of a list to mark it as
.I recently
used (conversely, the keys at the end of this list belong to the least recently used entries).
This bookkeeping takes about 80 ns per request, which makes the Common database about 67% slower than the previous two at retrieving a value.
Uncached databases are far from these results, as the logarithmically scaled figure shows.
The data cache shortens the requests dramatically, making them at least a hundred times faster.
The results depend on the data size; the bigger the data, the slower the serialization (and deserialization).
That is why alternative encodings, such as CBOR,
.[
CBOR
.]
should be considered for large databases.
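.LP
To make the caching policy above concrete, here is a minimal, hypothetical sketch in Crystal of a capped cache that moves a key to the front of a recency list on each access; it only illustrates the idea and is not DODB's actual implementation.
.ps -2
.ft CW
.nf
# Hypothetical move-to-front cache: recently used keys sit at the front
# of the recency list, least recently used keys drift to the back and
# are evicted first once the configured capacity is reached.
class RecencyCache(K, V)
  def initialize(@max_entries : Int32)
    @data   = Hash(K, V).new
    @recent = Deque(K).new
  end

  def get?(key : K) : V?
    return nil unless @data.has_key? key
    touch key            # the extra bookkeeping (the ~80 ns mentioned above)
    @data[key]
  end

  def put(key : K, value : V)
    touch key
    @data[key] = value
    if @data.size > @max_entries
      if oldest = @recent.pop?    # least recently used key is at the back
        @data.delete oldest
      end
    end
  end

  private def touch(key : K)
    @recent.delete key      # remove the key wherever it was...
    @recent.unshift key     # ...and put it back at the front
  end
end
.fi
.ft
.ps \n[PS]
.LP
Bounding the number of cached entries like this is what makes the memory consumption of the Common database configurable, as mentioned in the summary table below.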
.SS Partitions (1 to n relations)
.LP
.ps -2
.so graphs/query_partition.grap
.ps \n[PS]
.SS Tags (n to n relations)
.LP
.ps -2
.so graphs/query_tag.grap
.ps \n[PS]
.
.SS Summary
.ps -2
.TS
allbox tab(:);
@@ -965,31 +998,25 @@ Cached db and index:T{
Performance for retrieving a value is the same as RAM only while
enabling the admin to manually search for data on-disk.
T}:about the same performance
Uncached db, cached index::300 to 400x slower
Common db, cached index:T{
Performance is still excellent while requiring a
.UL configurable
amount of RAM.
Should be used by default.
T}:T{
67% slower (about 200 ns), which is still excellent
T}
Uncached db, cached index:Very slow; the Common database should be considered instead.:170 to 180x slower
Uncached db and index:T{
Best memory footprint, worst performance.
T}:400 to 500x slower
T}:200 to 210x slower
.TE
.ps \n[PS]
.B Conclusion :
as expected, retrieving a single value is fast and the size of the database doesn't matter much.
.SS Conclusion on performance
As expected, retrieving a single value is fast and the size of the database doesn't matter much.
Each deserialization and, more importantly, each disk access is a pain point.
Caching the value enables a massive performance gain: data can be retrieved several hundred times faster.
.SS Partitions (1 to n relations)
.LP
.ps -2
.so graphs/query_partition.grap
.ps \n[PS]
.SS Tags (n to n relations)
.LP
.ps -2
.so graphs/query_tag.grap
.ps \n[PS]
.
.
.
.SECTION Future work
This section presents all the features I want to see in a future version of the DODB library.