Explanations.

2024-05-30 03:20:05 +02:00
commit 25bfab34e0
parent 814baced05
2 changed files with 65 additions and 7 deletions
--- a/paper/macros.roff
+++ b/paper/macros.roff
@@ -80,7 +80,8 @@ accumulate
 .
 .de ENUM           \" Numbered list
 .nr LIST_NUMBER +1
-.IP \\n[LIST_NUMBER] 2
+.IP \\n[LIST_NUMBER]. 3
+.sp -1
 .ie '\\$1'' \
 .
 .el         \\$*
--- a/paper/paper.ms
+++ b/paper/paper.ms
@@ -998,23 +998,80 @@ The number of cars retrieved scales from 2000 to 10000.
 In this example, both the linear and the logarithmic scales are represented to better grasp the difference between all databases.
 The linear scale shows the linearity of the request time for uncached databases.
 Respectively, the logarithmically scaled figure does the same for cached databases,
-which are flattened in the linear scale since they all are hundreds of time quicker than the uncached ones.
+which are flattened in the linear scale since they are between one and five hundred times quicker than the uncached ones.

 The duration of a retrieval grows linearly with the number of matched entries.
 On both figures, a dashed line is drawn representing a linear growth based on the quickest retrieval observed from basic indexes for each database.
 This dashed line and the observed results differ slightly; observed results grow more than what has been calculated.
-This difference comes, at least partially, from the additional process of putting all the results in an array (which may also include some memory management) and the accumulated random delays for the retrieval of each value (due to processus scheduling on the machine).
-
-Further analysis of the results may be interesting but are far beyond the scope of this document.
+This difference comes, at least partially, from the additional process of putting all the results in an array (which may also include some memory management) and the accumulated random delays for the retrieval of each value (due to process scheduling on the machine, for example).

+Further analysis of the results may be interesting, but it is far beyond the scope of this document.
+The objective of this experiment is to give an idea of the performance that can be expected from DODB.
+Basically, uncached databases are between 70 and 600 times slower than cached ones.
+The eviction policy in the
+.I common
+database slows down retrievals, which makes it from 70% to 6 times slower than
+.I cached
+and
+.I RAM-only
+databases, and the more data there is to retrieve, the worse it gets.
+However, retrieving thousands of entries in a single request may not be a typical use of a database anyway.
+.
 .SS Tags (n to n relations)
-.LP
+A tag index matches a list of entries based on an attribute that may hold multiple values (such as an array).
+In the experiment, a database of cars is created along with a tag index on a list of
+.I keywords
+associated with the cars, such as "elegant", "fast" and so on.
+Performance is analyzed based on the number of entries retrieved (the number of elegant cars) and the request duration.
+.
 .ps -2
 .so graphs/query_tag.grap
 .ps \n[PS]
+.QP
+This figure shows the retrieval of cars based on a tag (all cars tagged as
+.I elegant ),
+with both a linear and a logarithmic scale.
+The number of cars retrieved scales from 1000 to 5000.
+.QE
 .
+.
+The results are similar to those obtained with partition indexes, because both retrievals are fundamentally the same:
+.ENUM both tag and partition indexes retrieve a list of entries;
+.ENUM the keys of the database entries come from listing the content of a directory (uncached indexes) or are directly available from a hash (cached indexes);
+.ENUM data is retrieved irrespective of the index: it is either read from the storage device or retrieved from a data cache, depending on the type of database.
+.ENDENUM
+Retrieving data from a partition or a tag involves exactly the same actions, which leads to the same results.

-.SS Summary
+A particularity of the tag index compared to partitions is that it allows multiple values for the same attribute, so a database entry can be referenced in multiple directories.
+For example, a car can be both
+.I elegant
+and
+.I fast .
+The retrieval of entries corresponding to a single
+.I tag
+is then exactly the same as retrieving a partition\*[*].
+.FOOTNOTE1
+It would be different when retrieving entries corresponding to
+.I several
+tags, such as selecting cars that are
+.UL "both elegant and fast" .
+This test may be done in a future version of this document.
+.FOOTNOTE2
+.
+.
+.SS Summary of the different databases and their use
+.LP
+.B "RAM-only database"
+is the fastest database but has a limited use since data isn't saved.
+
+.B "Cached database"
+offers the same data retrieval performance as the RAM-only database while actually storing data on a storage device.
+This database should be considered to achieve maximum speed for data sets that fit in memory.
+
+.B "Common database"
+lowers the memory requirements as much as desired.
+The eviction policy implies some operations that lead to poorer, though still acceptable, performance.
+.
 .ps -2
 .TS
 allbox tab(:);