.so macros.roff
.TITLE Brief performance analysis of Document Oriented DataBase (DODB)
.AUTHOR Philippe P.
.ABSTRACT1
DODB is a database-as-library, enabling a very simple way to store applications' data: storing serialized
.I documents
(basically any data type) in plain files.
To speed up searches, attributes of these documents can be used as indexes, which leads DODB to create a few symbolic links
.I symlinks ) (
on the disk.
.br
See the \f[CW]README\f[] for a longer explanation.

This document briefly presents an experiment to understand the performance this approach can achieve.
.ABSTRACT2
.SECTION Experimental scenario
.LP
The following experiment measures the performance of DODB through query durations.
Data can be searched via
.I indexes ,
as in SQL databases.
Three kinds of indexes exist in DODB:
(a) basic indexes, representing 1 to 1 relations: the document's attribute has a single value, and each value of this attribute is unique,
(b) partitions, representing 1 to n relations: the attribute has a single value, and this value can be shared by other documents,
(c) tags, representing n to n relations: the attribute can have multiple values, each of which can be shared by other documents.
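
As an illustration, the three kinds of indexes can be modeled with plain in-memory maps.
This is a minimal Ruby sketch, not DODB's API: the \f[CW]Car\f[] structure mirrors the data type manipulated in this experiment, while the sample cars and the \f[CW]by_name\f[], \f[CW]by_color\f[] and \f[CW]by_keyword\f[] names are assumptions made for the example.
.SOURCE Ruby ps=9 vs=9p
# Hypothetical in-memory model of the three index kinds (not DODB's API).
Car = Struct.new(:name, :color, :keywords)

cars = [
	Car.new("Corvet", "red", ["shiny", "impressive"]),
	Car.new("Bullet-GT", "red", ["shiny", "fast"]),
	Car.new("GTI", "blue", ["fast"]),
]

# (a) basic index, 1 to 1: one unique value per document
by_name = {}
cars.each { |car| by_name[car.name] = car }

# (b) partition, 1 to n: one value per document, shared by several documents
by_color = Hash.new { |hash, key| hash[key] = [] }
cars.each { |car| by_color[car.color] << car }

# (c) tags, n to n: several values per document, each shared by several documents
by_keyword = Hash.new { |hash, key| hash[key] = [] }
cars.each { |car| car.keywords.each { |kw| by_keyword[kw] << car } }

puts by_name["GTI"].color                      # blue
puts by_color["red"].map(&:name).join(", ")    # Corvet, Bullet-GT
puts by_keyword["fast"].map(&:name).join(", ") # Bullet-GT, GTI
.SOURCE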

The scenario is simple: add values to a database with indexes (basic, partitions and tags), then query a value 100 times through each index.
Then add more values and repeat.
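
The measurement loop can be sketched as follows.
This is a simplified Ruby illustration and not the actual benchmark (which is written in Crystal): the \f[CW]measure\f[] helper, the \f[CW]Hash\f[] standing in for the database and the database sizes are all assumptions made for the example.
.SOURCE Ruby ps=9 vs=9p
# Hypothetical helper: run the given query several times and
# return the median duration in nanoseconds (upper median for even counts).
def measure(repetitions = 100)
	durations = Array.new(repetitions) do
		start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
		yield
		Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) - start
	end
	durations.sort[durations.size / 2]
end

# Stand-in database: a Hash keyed by car name, grown between two measurements.
db = {}
[1_000, 2_000, 3_000].each do |size|
	(db.size...size).each { |i| db["car-#{i}"] = { name: "car-#{i}", color: "red" } }
	median = measure { db["car-#{size - 1}"] }
	puts "#{size} cars: median #{median} ns"
end
.SOURCE
Reporting a median rather than a mean keeps a few slow outliers from dominating the result.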

Four instances of DODB are tested:
.BULLET \fIuncached database\f[] shows the achievable performance under a strong memory constraint (nothing can be kept in memory);
.BULLET \fIuncached data but cached index\f[] shows the improvement you can expect from caching the indexes;
.BULLET \fIcached database\f[] shows the most basic use of DODB\*[*];
.BULLET \fIRAM only\f[]: the database has no representation on disk (no data is written to it).
The \fIRAM only\f[] instance shows a possible way to use DODB: keeping a consistent API to store data, including in-memory data whose lifetime is tied to the application's.
.ENDBULLET
.FOOTNOTE1
Having a cached database will probably be the most widespread use of DODB.
When memory isn't scarce, there is no reason not to use it to achieve better performance.
.FOOTNOTE2

The computer on which this test is performed\*[*] is an AMD PRO A10-8770E R7 (4 cores), 2.8 GHz.
When mentioned, the
.I disk
is actually a
.I "temporary file-system (tmpfs)"
to enable maximum efficiency.
.FOOTNOTE1
A very simple $50 PC, bought online.
Nothing fancy.
.FOOTNOTE2

The library is written in Crystal and so is the benchmark (\f[CW]spec/benchmark-cars.cr\f[]).
Nonetheless, despite a few technicalities, the objective of this document is to provide insight into the approach used in DODB rather than into this particular implementation.

The manipulated data type can be found in \f[CW]spec/db-cars.cr\f[].
.SOURCE Ruby ps=9 vs=9p
class Car
	property name     : String        # 1-1 relation
	property color    : String        # 1-n relation
	property keywords : Array(String) # n-n relation
end
.SOURCE
.
.SECTION Basic indexes (1 to 1 relations)
.LP
An index matches a single value based on a small string.
In our example, each \f[CW]car\f[] has a unique \fIname\f[] which is used as an index.

The following graph represents the result of 100 queries of a car based on its name.
The experiment starts with a database containing 1,000 cars and goes up to 250,000 cars.

Since there is only one value to retrieve, the request is quick and its duration is almost constant.
When the value and the index are kept in memory (see \f[CW]RAM only\f[] and \f[CW]Cached db\f[]), the retrieval is almost instantaneous (about 50 to 120 ns).
When the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db, cached index\f[]).
The request is a little longer when the index isn't cached: in this case, DODB walks the file-system to find the right symlink to follow, which slows the process further, by up to 20%.
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord y 0,50
ticks left out from 0 to 50 by 10
ticks bot out at 50000 "50,000", 100000 "100,000", 150000 "150,000", 200000 "200,000", 250000 "250,000"

label left "Request duration with" unaligned "an index (us)" "(Median)" left 0.8
label bot "Number of cars in the database" down 0.1

obram = obuncache = obcache = obsemi = 0 # old bullets
cbram = cbuncache = cbcache = cbsemi = 0 # current bullets

legendxleft  = 100000
legendxright = 250000
legendyup    = 15
legendydown  = 2

boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)

copy "../data/index.d" thru X
	cx = $1*5

	y_scale = 1000

	# ram cached semi uncached
	line from cx,$2/y_scale  to cx,$4/y_scale
	line from cx,$5/y_scale  to cx,$7/y_scale
	line from cx,$8/y_scale  to cx,$10/y_scale
	line from cx,$11/y_scale to cx,$13/y_scale

	#ty = $3

	cx = $1*5

	cbram     = $3/y_scale
	cbcache   = $6/y_scale
	cbsemi    = $9/y_scale
	cbuncache = $12/y_scale

	if (obram > 0) then {line from cx,cbram to ox,obram}
	if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
	if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
	if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor

	obram = cbram
	obcache = cbcache
	obsemi = cbsemi
	obuncache = cbuncache
	ox = cx

	# ram cached semi uncached
.gcolor red
	bullet at cx,cbram
.gcolor
	bullet at cx,cbcache
.gcolor blue
	bullet at cx,cbsemi
.gcolor
.gcolor green
	bullet at cx,cbuncache
.gcolor
X
.G2
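.LP
The difference between a cached and an uncached index can be illustrated with a small Ruby sketch of a symlink-based index.
It is not DODB's code: the directory layout, the JSON serialization and the \f[CW]store\f[], \f[CW]get_with_cached_index\f[] and \f[CW]get_uncached\f[] names are assumptions made for the example.
.SOURCE Ruby ps=9 vs=9p
require "json"
require "tmpdir"
require "fileutils"

# Hypothetical layout: documents in data/, one symlink per car name
# in indexes/by_name/. Paths and file names are illustrative only.
def store(root, id, car)
	data = File.join(root, "data", "#{id}.json")
	link = File.join(root, "indexes", "by_name", "#{car["name"]}.json")
	FileUtils.mkdir_p([File.dirname(data), File.dirname(link)])
	File.write(data, JSON.generate(car))
	File.symlink(data, link)
end

# Cached index: the name-to-path mapping is already in memory,
# only reading and deserializing the document remains.
def get_with_cached_index(index, name)
	JSON.parse(File.read(index.fetch(name)))
end

# Uncached index: walk the index directory to find the symlink, then follow it.
def get_uncached(root, name)
	dir = File.join(root, "indexes", "by_name")
	entry = Dir.children(dir).find { |file| file == "#{name}.json" }
	JSON.parse(File.read(File.realpath(File.join(dir, entry))))
end

Dir.mktmpdir do |root|
	store(root, 0, { "name" => "Corvet", "color" => "red" })
	store(root, 1, { "name" => "GTI", "color" => "blue" })
	index = { "GTI" => File.join(root, "data", "1.json") }
	puts get_with_cached_index(index, "GTI")["color"] # blue
	puts get_uncached(root, "GTI")["color"]           # blue
end
.SOURCE
Both functions pay for reading and deserializing the document; only the uncached variant also pays for the directory walk.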
.bp
.SECTION Partitions (1 to n relations)
.LP
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord x 0,5000*2 y 0,350
ticks left out from 0 to 350 by 50

label left "Request duration" unaligned "for a partition (ms)" "(Median)" left 0.8
label bot "Number of cars matching the partition" down 0.1

obram = obuncache = obcache = obsemi = 0
cbram = cbuncache = cbcache = cbsemi = 0

legendxleft  = 1000
legendxright = 6500
legendyup    = 330
legendydown  = 230

boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)

copy "../data/partitions.d" thru X
	cx = $1*2

	y_scale = 1000000

	# ram cached semi uncached
	line from cx,$2/y_scale  to cx,$4/y_scale
	line from cx,$5/y_scale  to cx,$7/y_scale
	line from cx,$8/y_scale  to cx,$10/y_scale
	line from cx,$11/y_scale to cx,$13/y_scale

	#ty = $3

	cbram     = $3/y_scale
	cbcache   = $6/y_scale
	cbsemi    = $9/y_scale
	cbuncache = $12/y_scale

	if (obram > 0) then {line from cx,cbram to ox,obram}
	if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
	if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
	if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor

	obram = cbram
	obcache = cbcache
	obsemi = cbsemi
	obuncache = cbuncache
	ox = cx

	# ram cached semi uncached
.gcolor red
	bullet at cx,cbram
.gcolor
	bullet at cx,cbcache
.gcolor blue
	bullet at cx,cbsemi
.gcolor
.gcolor green
	bullet at cx,cbuncache
.gcolor
X
.G2
.bp
.SECTION Tags (n to n relations)
.LP
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord x 0,5000 y 0,170
ticks left out from 0 to 170 by 20
label left "Request duration" unaligned "for a tag (ms)" "(Median)" left 0.8
label bot "Number of cars matching the tag" down 0.1

obram = obuncache = obcache = obsemi = 0
cbram = cbuncache = cbcache = cbsemi = 0

legendxleft  = 200
legendxright = 3000
legendyup    = 170
legendydown  = 120

boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)

copy "../data/tags.d" thru X
	cx = $1

	y_scale = 1000000

	# ram cached semi uncached
	line from cx,$2/y_scale  to cx,$4/y_scale
	line from cx,$5/y_scale  to cx,$7/y_scale
	line from cx,$8/y_scale  to cx,$10/y_scale
	line from cx,$11/y_scale to cx,$13/y_scale

	#ty = $3

	cbram     = $3/y_scale
	cbcache   = $6/y_scale
	cbsemi    = $9/y_scale
	cbuncache = $12/y_scale

	if (obram > 0) then {line from cx,cbram to ox,obram}
	if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
	if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
	if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor

	obram = cbram
	obcache = cbcache
	obsemi = cbsemi
	obuncache = cbuncache
	ox = cx

	# ram cached semi uncached
.gcolor red
	bullet at cx,cbram
.gcolor
	bullet at cx,cbcache
.gcolor blue
	bullet at cx,cbsemi
.gcolor
.gcolor green
	bullet at cx,cbuncache
.gcolor
X
.G2