.so macros.roff
.TITLE Brief performance analysis of Document Oriented DataBase (DODB)
.AUTHOR Philippe P.
.ABSTRACT1
DODB is a database-as-library, enabling a very simple way to store applications' data: storing serialized
.I documents
(basically any data type) in plain files.
To speed up searches, attributes of these documents can be used as indexes, which results in the creation of a few symbolic links
.I symlinks ) (
on the disk.
.br
See the \f[CW]README\f[] for a longer explanation.
This document briefly presents an experiment to understand the performance we can get with this approach.
.br
.UL Status :
WIP
.ABSTRACT2
.SECTION Experimental scenario
.LP
The following experiment shows the performance of DODB based on querying durations.
Data can be searched via
.I indexes ,
as for SQL databases.
Three kinds of indexes exist in DODB:
(a) basic indexes, representing 1 to 1 relations: the document's attribute is related to a value and each value of this attribute is unique,
(b) partitions, representing 1 to n relations: the attribute has a value and this value can be shared by other documents,
(c) tags, representing n to n relations: the attribute can have multiple values, which can be shared by other documents.
The scenario is simple: values are added to a database with indexes (basic, partitions and tags), then a value is queried 100 times through each of the indexes.
Loop and repeat.
Four instances of DODB are tested:
.BULLET
\fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory);
.BULLET
\fIuncached data but cached index\f[] shows the improvement you can expect by having a cache on indexes;
.BULLET
\fIcached database\f[] shows the most basic use of DODB\*[*];
.BULLET
\fIRAM only\f[], the database doesn't have a representation on disk (no data is written on it).
The \fIRAM only\f[] instance shows a possible way to use DODB: keeping a consistent API to store data, including in-memory data whose lifetime is tied to the application's.
.ENDBULLET
.FOOTNOTE1
Having a cached database will probably be the most widespread use of DODB.
When memory isn't scarce, there is no reason not to use it to achieve better performance.
.FOOTNOTE2
The computer on which this test is performed\*[*] is an AMD PRO A10-8770E R7 (4 cores), 2.8 GHz.
When mentioned, the
.I disk
is actually a
.I "temporary file-system (tmpfs)"
to enable maximum efficiency.
.FOOTNOTE1
A very simple $50 PC, bought online.
Nothing fancy.
.FOOTNOTE2
The library is written in Crystal and so is the benchmark (\f[CW]spec/benchmark-cars.cr\f[]).
Nonetheless, apart from a few technicalities, the objective of this document is to provide insight into the approach used in DODB rather than into this particular implementation.
The manipulated data type can be found in \f[CW]spec/db-cars.cr\f[].
.SOURCE Ruby ps=9 vs=9p
class Car
    property name : String            # 1-1 relation
    property color : String           # 1-n relation
    property keywords : Array(String) # n-n relation
end
.SOURCE
.
.SECTION Basic indexes (1 to 1 relations)
.LP
An index matches a single value based on a small string.
Since there is only one value to retrieve, the request is quick and its duration is almost constant.
When the value and the index are kept in memory (see \f[CW]RAM only\f[] and \f[CW]Cached db\f[]), the retrieval is almost instantaneous (about 50 to 120 ns).
When the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db, cached index\f[]).
The request takes a little longer when the index isn't cached: DODB then walks the file-system to find the right symlink to follow, which slows the process by up to 20%.
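.LP
The lookup through a basic index can be sketched as follows.
This is only a minimal illustration of the symlink approach, not DODB's actual code: the directory layout, the file names and the sample car are assumptions made for the example.
.SOURCE Ruby ps=9 vs=9p
require "json"
require "file_utils"

# Hypothetical on-disk layout, for illustration only:
#   <root>/data/<id>.json        serialized documents
#   <root>/indexes/name/<value>  symlink to the matching document
class Car
    include JSON::Serializable

    property name : String
    property color : String
    property keywords : Array(String)

    def initialize(@name : String, @color : String, @keywords : Array(String))
    end
end

root      = File.join(Dir.tempdir, "dodb-sketch-#{Process.pid}")
data_dir  = File.join(root, "data")
index_dir = File.join(root, "indexes", "name")
Dir.mkdir_p(data_dir)
Dir.mkdir_p(index_dir)

# Store the document, then index it by its (unique) name.
car = Car.new("Corvet", "red", ["shiny", "impressive"])
document = File.join(data_dir, "0000000001.json")
File.write(document, car.to_json)
File.symlink(document, File.join(index_dir, car.name))

# Uncached index: follow the symlink, read the file, deserialize.
found = Car.from_json(File.read(File.join(index_dir, "Corvet")))
puts found.color

# Cached index: the name-to-path mapping is kept in memory,
# only reading and deserializing the document touches the file-system.
cached_index = {car.name => document}
found_again = Car.from_json(File.read(cached_index["Corvet"]))
puts found_again.keywords.size

FileUtils.rm_rf(root)
.SOURCE
.LP
Both lookups return the same document.
In this sketch the only difference is where the path to the document comes from: the file-system in the first case, an in-memory table in the second.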
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord y 0,50
ticks left out from 0 to 50 by 10
ticks bot out at 50000 "50,000", 100000 "100,000", 150000 "150,000", 200000 "200,000", 250000 "250,000"
label left "Request duration with" unaligned "an index (us)" "(Median)" left 0.8
label bot "Number of cars in the database" down 0.1
obram = obuncache = obcache = obsemi = 0 # old bullets
cbram = cbuncache = cbcache = cbsemi = 0 # current bullets
legendxleft = 100000
legendxright = 250000
legendyup = 15
legendydown = 2
boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)
copy "../data/index.d" thru X
    cx = $1*5
    y_scale = 1000
    # ram cached semi uncached
    line from cx,$2/y_scale to cx,$4/y_scale
    line from cx,$5/y_scale to cx,$7/y_scale
    line from cx,$8/y_scale to cx,$10/y_scale
    line from cx,$11/y_scale to cx,$13/y_scale
    #ty = $3
    cx = $1*5
    cbram = $3/y_scale
    cbcache = $6/y_scale
    cbsemi = $9/y_scale
    cbuncache = $12/y_scale
    if (obram > 0) then {line from cx,cbram to ox,obram}
    if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
    if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
    if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor
    obram = cbram
    obcache = cbcache
    obsemi = cbsemi
    obuncache = cbuncache
    ox = cx
    # ram cached semi uncached
.gcolor red
    bullet at cx,cbram
.gcolor
    bullet at cx,cbcache
.gcolor blue
    bullet at cx,cbsemi
.gcolor
.gcolor green
    bullet at cx,cbuncache
.gcolor
X
.G2
.bp
.SECTION Partitions (1 to n relations)
.LP
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord x 0,5000*2 y 0,350
ticks left out from 0 to 350 by 50
label left "Request duration" unaligned "for a partition (ms)" "(Median)" left 0.8
label bot "Number of cars matching the partition" down 0.1
obram = obuncache = obcache = obsemi = 0
cbram = cbuncache = cbcache = cbsemi = 0
legendxleft = 1000
legendxright = 6500
legendyup = 330
legendydown = 230
boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)
copy "../data/partitions.d" thru X
    cx = $1*2
    y_scale = 1000000
    # ram cached semi uncached
    line from cx,$2/y_scale to cx,$4/y_scale
    line from cx,$5/y_scale to cx,$7/y_scale
    line from cx,$8/y_scale to cx,$10/y_scale
    line from cx,$11/y_scale to cx,$13/y_scale
    #ty = $3
    cbram = $3/y_scale
    cbcache = $6/y_scale
    cbsemi = $9/y_scale
    cbuncache = $12/y_scale
    if (obram > 0) then {line from cx,cbram to ox,obram}
    if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
    if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
    if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor
    obram = cbram
    obcache = cbcache
    obsemi = cbsemi
    obuncache = cbuncache
    ox = cx
    # ram cached semi uncached
.gcolor red
    bullet at cx,cbram
.gcolor
    bullet at cx,cbcache
.gcolor blue
    bullet at cx,cbsemi
.gcolor
.gcolor green
    bullet at cx,cbuncache
.gcolor
X
.G2
.bp
.SECTION Tags (n to n relations)
.LP
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord x 0,5000 y 0,170
ticks left out from 0 to 170 by 20
label left "Request duration" unaligned "for a tag (ms)" "(Median)" left 0.8
label bot "Number of cars matching the tag" down 0.1
obram = obuncache = obcache = obsemi = 0
cbram = cbuncache = cbcache = cbsemi = 0
legendxleft = 200
legendxright = 3000
legendyup = 170
legendydown = 120
boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)
copy "../data/tags.d" thru X
    cx = $1
    y_scale = 1000000
    # ram cached semi uncached
    line from cx,$2/y_scale to cx,$4/y_scale
    line from cx,$5/y_scale to cx,$7/y_scale
    line from cx,$8/y_scale to cx,$10/y_scale
    line from cx,$11/y_scale to cx,$13/y_scale
    #ty = $3
    cbram = $3/y_scale
    cbcache = $6/y_scale
    cbsemi = $9/y_scale
    cbuncache = $12/y_scale
    if (obram > 0) then {line from cx,cbram to ox,obram}
    if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
    if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
    if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor
    obram = cbram
    obcache = cbcache
    obsemi = cbsemi
    obuncache = cbuncache
    ox = cx
    # ram cached semi uncached
.gcolor red
    bullet at cx,cbram
.gcolor
    bullet at cx,cbcache
.gcolor blue
    bullet at cx,cbsemi
.gcolor
.gcolor green
    bullet at cx,cbuncache
.gcolor
X
.G2