Minor changes.

Philippe Pittoli 2025-04-15 00:08:53 +02:00
parent bc2fb503f2
commit d113026d40

@@ -296,7 +296,7 @@ CBOR
.[
CBOR
.]
-is a work-in-progress.
+will be provided in a future version.
Nothing binds DODB to a particular format.
.FOOTNOTE2
The key of the lookup table is an auto-incremented number used as the name of the stored file.
@@ -339,7 +339,7 @@ database[key]
database[key] = new_value
# Delete a value based on its key.
-database.delete 0
+database.delete key
.SOURCE
.QE
.
@@ -395,7 +395,7 @@ An index also requires a callback, a procedure to extract the value used for ind
In this case, the procedure takes a car as a parameter and returns its "name" attribute\*[*].
.FOOTNOTE1
This procedure can be arbitrarily complex and include any necessary data transformation.
-For example, the netlibre project (discussed later in the papper) indexes their users' email, but emails are first encoded in base64 to avoid messing around with the filesystem.
+For example, the netlibre project (discussed later in the paper) indexes their users' email, but emails are first encoded in base64 to avoid messing around with the filesystem.
.FOOTNOTE2
Once the index has been created, every inserted or modified entry in the database will be indexed.
@@ -608,7 +608,7 @@ Since indexes do not require nearly as much memory as caching the entire databas
.
.
.
-.SECTION Common database
+.SECTION Common database: the common use of DODB
Storing the entire data-set in memory is an effective way to make the requests fast, as does
the
.I "cached database"
@@ -694,9 +694,9 @@ Having the same API to handle both long and short-lived data is useful.
Moreover, the previously mentioned triggers (basic indexes, partitions and tags) would also work the same way for these short-lived data.
Therefore, the
.I RAM-only
-was created.
+database was created.
-Also, indexes don't always require a filesystem representation and could be used only to speed up data retrieval for example.
+Also, indexes don't always require a filesystem representation and could be used only to speed up data retrieval.
Therefore, as for the database,
.I RAM-only
triggers were created.
@@ -812,7 +812,7 @@ Five instances of DODB are tested:
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in memory);
.BULLET \fIuncached database but cached index\f[] shows the improvement to expect with only a cached index;
.BULLET \fIcommon database\f[] shows the expected usage DODB in most instances, with a limited data cache (100k entries in our scenario)\*[*] and a cached index;
-.BULLET \fIcached database\f[] represents a database will all the entries in cache (no eviction mechanism);
+.BULLET \fIcached database\f[] represents a database with all the entries in cache (no eviction mechanism);
.BULLET and finally, \fIRAM only\f[], the database for volatile data (no data is written on the storage device).
.ENDBULLET
.FOOTNOTE1
@@ -897,14 +897,7 @@ A partition index enables to match a list of entries based on an attribute.
In the experiment, a database of cars is created along with a partition on their color.
Performance is analyzed based the partition size (the number of red cars) and the duration to retrieve all the entries.
-.ps -2
-.so graphs/query_partition.grap
-.ps \n[PS]
-.QP
-This figure shows the retrieval of cars based on a partition (their color), with both a linear and a logarithmic scale.
-The number of cars retrieved scales from 2000 to 10000.
-.QE
-In this example, both the linear and the logarithmic scales are represented to better grasp the difference between all databases.
+In the following figure, both the linear and the logarithmic scales are represented to better grasp the difference between all databases.
The linear scale shows the linearity of the request time for uncached databases.
Respectively, the logarithmically scaled figure does the same for cached databases,
which are flattened in the linear scale since they are between one to five hundred times quicker than the uncached ones.
@@ -912,7 +905,7 @@ which are flattened in the linear scale since they are between one to five hundr
The duration of a retrieval grows linearly with the number of matched entries.
On both figures, a dashed line is drawn representing a linear growth based on the quickest retrieval observed from basic indexes for each database.
This dashed line and the observed results differ slightly; observed results grow more than what has been calculated.
-This difference comes, at least partially, from the additional process of putting all the results in an array (which may also include some memory management) and the accumulated random delays for the retrieval of each value (due to the cache policy processing, to the processus scheduling on the machine, etc.).
+This difference comes, at least partially, from the additional process of putting all the results in an array (which may also include some memory management) and the accumulated random delays for the retrieval of each value (due to the cache policy processing, the processus scheduling on the machine, etc.).
Further analysis of the results may be interesting but this is far beyond the scope of this document.
The objective of this experiment is to give an idea of the performance that can be expected from DODB.
@@ -925,7 +918,15 @@ and
.I RAM-only
databases, and the more data there is to retrieve, the worst it gets.
However, retrieving thousands and thousands of entries in a single request may not be a typical usage of databases, anyway.
.
+.ps -2
+.so graphs/query_partition.grap
+.ps \n[PS]
+.QP
+This figure shows the retrieval of cars based on a partition (their color), with both a linear and a logarithmic scale.
+The number of cars retrieved scales from 2000 to 10000.
+Dashed lines represent a linear growth based on the quickest retrieval observed from basic indexes.
+.QE
.
.SS Tags (n to n relations)
A tag index enables to match a list of entries based on an attribute with potentially multiple values (such as an array).
@@ -980,7 +981,7 @@ The eviction policy implies some operations leading to poorer performances, howe
is essentially a debug mode and is not expected to run in most real-life scenarii.
The purpose is to produce a control sample (involving only raw IO operations) to compare it to other (more realistic) implementations.
-Cached indexes should be considered for most applications, and even more their RAM-only version in case the filesystem representation isn't necessary.
+Cached indexes should be considered for most applications, or their RAM-only version in case the filesystem representation isn't necessary.
.
.\" .ps -2
.\" .TS