.so macros.roff
.TITLE Brief performance analysis of Document Oriented DataBase (DODB)
.AUTHOR Philippe P.
.ABSTRACT1
DODB is a database-as-library, enabling a very simple way to store applications' data: storing serialized
.I documents
(basically any data type) in plain files.
To speed up searches, attributes of these documents can be used as indexes, which leads to the creation of a few symbolic links
.I symlinks ) (
on the disk.
.br
See the \f[CW]README\f[] for a longer explanation.
This document briefly presents an experiment to understand the performance we can get with this approach.
.ABSTRACT2
.SECTION Experimental scenario
.LP
The following experiment shows the performance of DODB based on querying durations.
Data can be searched via
.I indexes ,
as for SQL databases.
Three kinds of indexes exist in DODB:
(a) basic indexes, representing 1 to 1 relations: the document's attribute has a single value, and this value is unique across the database,
(b) partitions, representing 1 to n relations: the attribute has a single value, which can be shared by other documents,
(c) tags, representing n to n relations: the attribute can have multiple values, each of which can be shared by other documents.
The scenario is simple: values are added to a database with indexes (basic, partitions and tags), then a value is queried 100 times through each index.
Loop and repeat.
Four instances of DODB are tested:
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in memory);
.BULLET \fIuncached data but cached index\f[] shows the improvement you can expect by having a cache on indexes;
.BULLET \fIcached database\f[] shows the most basic use of DODB\*[*];
.BULLET \fIRAM only\f[]: the database has no representation on disk (no data is written to it).
The \fIRAM only\f[] instance shows a possible way to use DODB: keeping a consistent API to store data, including in-memory data whose lifetime is tied to the application's.
.ENDBULLET
.FOOTNOTE1
Having a cached database will probably be the most widespread use of DODB.
When memory isn't scarce, there is no point not using it to achieve better performance.
.FOOTNOTE2
The computer on which this test is performed\*[*] is an AMD PRO A10-8770E R7 (4 cores), 2.8 GHz.
When mentioned, the
.I disk
is actually a
.I "temporary file-system (tmpfs)"
to enable maximum efficiency.
.FOOTNOTE1
A very simple $50 PC, bought online.
Nothing fancy.
.FOOTNOTE2
The library is written in Crystal and so is the benchmark (\f[CW]spec/benchmark-cars.cr\f[]).
Nonetheless, despite a few technicalities, the objective of this document is to provide insight into the approach used in DODB rather than into this particular implementation.
The manipulated data type can be found in \f[CW]spec/db-cars.cr\f[].
.SOURCE Ruby ps=9 vs=9p
class Car
  property name : String            # 1-1 relation
  property color : String           # 1-n relation
  property keywords : Array(String) # n-n relation
end
.SOURCE
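.LP
To illustrate the three kinds of relations, here is a minimal sketch of how such a car could be written to a plain file and indexed through symlinks.
This is not DODB's actual code: the directory layout and the \f[CW]add_link\f[] and \f[CW]store\f[] helpers are assumptions made for this example only.
.SOURCE Ruby ps=9 vs=9p
require "json"
require "file_utils"

# Same shape as the Car type above, made serializable for the example.
class Car
  include JSON::Serializable
  property name : String
  property color : String
  property keywords : Array(String)

  def initialize(@name, @color, @keywords)
  end
end

# Hypothetical helper: creates the symlink and its parent directory.
def add_link(target : String, path : String)
  Dir.mkdir_p(File.dirname(path))
  File.symlink(target, path)
end

# Hypothetical layout: <root>/data/<id>.json plus one directory per index.
def store(root : String, id : Int32, car : Car)
  data = File.join(root, "data", "#{id}.json")
  Dir.mkdir_p(File.dirname(data))
  File.write(data, car.to_json)

  # (a) basic index, 1 to 1: exactly one symlink per name.
  add_link(data, File.join(root, "by_name", car.name))
  # (b) partition, 1 to n: one directory per color, one symlink per car.
  add_link(data, File.join(root, "by_color", car.color, "#{id}"))
  # (c) tags, n to n: the same car is linked under each of its keywords.
  car.keywords.each do |keyword|
    add_link(data, File.join(root, "by_keyword", keyword, "#{id}"))
  end
end

root = File.join(Dir.tempdir, "dodb-sketch-#{Process.pid}")
store(root, 0, Car.new("Corvet", "red", ["shiny", "fast"]))
store(root, 1, Car.new("Bullet", "red", ["fast"]))

# 1 to 1: follow a single symlink and deserialize the document.
corvet = Car.from_json(File.read(File.join(root, "by_name", "Corvet")))
# 1 to n and n to n: every entry of the directory is a matching car.
red = Dir.children(File.join(root, "by_color", "red")).size
fast = Dir.children(File.join(root, "by_keyword", "fast")).size
puts "#{corvet.name}: #{red} red cars, #{fast} fast cars"

FileUtils.rm_rf(root)
.SOURCE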
.
.SECTION Basic indexes (1 to 1 relations)
.LP
An index makes it possible to retrieve a single value from a small string.
In our example, each \f[CW]car\f[] has a unique \fIname\f[] which is used as an index.
The following graph represents the result of 100 queries of a car based on its name.
The experiment starts with a database containing 1,000 cars and goes up to 250,000 cars.
Since there is only one value to retrieve, the request is quick and time is almost constant.
When the value and the index are kept in memory (see \f[CW]RAM only\f[] and \f[CW]Cached db\f[]), the retrieval is almost instantaneous (about 50 to 120 ns).
When the value is on disk, deserialization takes about 15 µs (see \f[CW]Uncached db, cached index\f[]).
The request takes a little longer when the index isn't cached: in this case DODB walks the file-system to find the right symlink to follow, which slows the process further, by up to 20%.
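.LP
The kind of measurement described here can be sketched as follows: a value is requested 100 times, once from an in-memory hash and once by following a symlink on disk, and the median duration is kept.
This is only an illustration, not the actual benchmark (see \f[CW]spec/benchmark-cars.cr\f[]): the \f[CW]median_duration\f[] helper, the directory layout and the stored values are assumptions.
.SOURCE Ruby ps=9 vs=9p
require "file_utils"

# Hypothetical helper: runs the block `count` times, returns the median.
def median_duration(count : Int32, &) : Time::Span
  durations = [] of Time::Span
  count.times { durations << Time.measure { yield } }
  durations.sort[count // 2]
end

root = File.join(Dir.tempdir, "dodb-bench-#{Process.pid}")
Dir.mkdir_p(File.join(root, "data"))
Dir.mkdir_p(File.join(root, "by_name"))

# In-memory index and data (name -> content), filled while writing files.
cache = {} of String => String
1000.times do |i|
  data = File.join(root, "data", "#{i}")
  File.write(data, "car-#{i}")
  File.symlink(data, File.join(root, "by_name", "car-#{i}"))
  cache["car-#{i}"] = "car-#{i}"
end

link = File.join(root, "by_name", "car-500")
in_memory = median_duration(100) { cache["car-500"] }
on_disk = median_duration(100) { File.read(link) }
puts "in memory: #{in_memory.total_nanoseconds} ns"
puts "on disk:   #{on_disk.total_nanoseconds} ns"

FileUtils.rm_rf(root)
.SOURCE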
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord y 0,50
ticks left out from 0 to 50 by 10
ticks bot out at 50000 "50,000", 100000 "100,000", 150000 "150,000", 200000 "200,000", 250000 "250,000"
label left "Request duration with" unaligned "an index (us)" "(Median)" left 0.8
label bot "Number of cars in the database" down 0.1
obram = obuncache = obcache = obsemi = 0 # old bullets
cbram = cbuncache = cbcache = cbsemi = 0 # current bullets
legendxleft = 100000
legendxright = 250000
legendyup = 15
legendydown = 2
boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)
copy "../data/index.d" thru X
cx = $1*5
y_scale = 1000
# ram cached semi uncached
line from cx,$2/y_scale to cx,$4/y_scale
line from cx,$5/y_scale to cx,$7/y_scale
line from cx,$8/y_scale to cx,$10/y_scale
line from cx,$11/y_scale to cx,$13/y_scale
#ty = $3
cx = $1*5
cbram = $3/y_scale
cbcache = $6/y_scale
cbsemi = $9/y_scale
cbuncache = $12/y_scale
if (obram > 0) then {line from cx,cbram to ox,obram}
if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor
obram = cbram
obcache = cbcache
obsemi = cbsemi
obuncache = cbuncache
ox = cx
# ram cached semi uncached
.gcolor red
bullet at cx,cbram
.gcolor
bullet at cx,cbcache
.gcolor blue
bullet at cx,cbsemi
.gcolor
.gcolor green
bullet at cx,cbuncache
.gcolor
X
.G2
.bp
.SECTION Partitions (1 to n relations)
.LP
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord x 0,5000*2 y 0,350
ticks left out from 0 to 350 by 50
label left "Request duration" unaligned "for a partition (ms)" "(Median)" left 0.8
label bot "Number of cars matching the partition" down 0.1
obram = obuncache = obcache = obsemi = 0
cbram = cbuncache = cbcache = cbsemi = 0
legendxleft = 1000
legendxright = 6500
legendyup = 330
legendydown = 230
boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)
copy "../data/partitions.d" thru X
cx = $1*2
y_scale = 1000000
# ram cached semi uncached
line from cx,$2/y_scale to cx,$4/y_scale
line from cx,$5/y_scale to cx,$7/y_scale
line from cx,$8/y_scale to cx,$10/y_scale
line from cx,$11/y_scale to cx,$13/y_scale
#ty = $3
cbram = $3/y_scale
cbcache = $6/y_scale
cbsemi = $9/y_scale
cbuncache = $12/y_scale
if (obram > 0) then {line from cx,cbram to ox,obram}
if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor
obram = cbram
obcache = cbcache
obsemi = cbsemi
obuncache = cbuncache
ox = cx
# ram cached semi uncached
.gcolor red
bullet at cx,cbram
.gcolor
bullet at cx,cbcache
.gcolor blue
bullet at cx,cbsemi
.gcolor
.gcolor green
bullet at cx,cbuncache
.gcolor
X
.G2
.bp
.SECTION Tags (n to n relations)
.LP
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord x 0,5000 y 0,170
ticks left out from 0 to 170 by 20
label left "Request duration" unaligned "for a tag (ms)" "(Median)" left 0.8
label bot "Number of cars matching the tag" down 0.1
obram = obuncache = obcache = obsemi = 0
cbram = cbuncache = cbcache = cbsemi = 0
legendxleft = 200
legendxright = 3000
legendyup = 170
legendydown = 120
boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)
copy "../data/tags.d" thru X
cx = $1
y_scale = 1000000
# ram cached semi uncached
line from cx,$2/y_scale to cx,$4/y_scale
line from cx,$5/y_scale to cx,$7/y_scale
line from cx,$8/y_scale to cx,$10/y_scale
line from cx,$11/y_scale to cx,$13/y_scale
#ty = $3
cbram = $3/y_scale
cbcache = $6/y_scale
cbsemi = $9/y_scale
cbuncache = $12/y_scale
if (obram > 0) then {line from cx,cbram to ox,obram}
if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor
obram = cbram
obcache = cbcache
obsemi = cbsemi
obuncache = cbuncache
ox = cx
# ram cached semi uncached
.gcolor red
bullet at cx,cbram
.gcolor
bullet at cx,cbcache
.gcolor blue
bullet at cx,cbsemi
.gcolor
.gcolor green
bullet at cx,cbuncache
.gcolor
X
.G2