Compare commits


2 Commits
master ... cbor

Author SHA1 Message Date
Karchnu b288972dcc Shards: cbor branch master. 2020-11-23 17:02:30 +01:00
Karchnu ffce08b36c WIP: CBOR implementation. 2020-11-15 03:07:02 +01:00
32 changed files with 210 additions and 2695 deletions

View File

@ -1,17 +0,0 @@
all: build

OPTS ?= --progress
Q ?= @
SHOULD_UPDATE = ./bin/should-update
DBDIR = /tmp/tests-on-dodb

benchmark-cars:
	$(Q)crystal build spec/benchmark-cars.cr $(OPTS)

build: benchmark-cars

wipe-db:
	rm -r $(DBDIR)

release:
	make build OPTS="--release --progress"

149
README.md
View File

@ -1,3 +1,4 @@
# dodb.cr
DODB stands for Document Oriented DataBase.
@ -12,47 +13,16 @@ A brief summary:
- no SQL
- objects are serialized (currently in JSON)
- indexes (simple symlinks on the FS) can be created to significantly speed up searches in the db
- db is fully integrated in the language (basically a simple array with a few more functions)
Also, data can be `cached`.
The entire base is then kept in memory (if it fits), enabling incredible speeds.
## Limitations
DODB doesn't fully handle ACID properties:
**TODO**: speed tests, elaborate on the matter.
- no *atomicity*: you can't chain instructions and roll back if one of them fails;
- no *consistency*: there is currently no mechanism to prevent adding invalid values;
DODB is not suitable for projects:
- having an absolute priority on speed,
however, DODB is efficient in most cases with the right indexes.
- having relational data
*Isolation* is partially taken into account, with a locking mechanism preventing a few race conditions.
FYI, in my projects the database is fast enough (by far) that I don't even need parallelism.
*Durability* is taken into account.
Data is written on-disk each time it changes.
**NOTE:** what I need is mostly there.
What DODB doesn't provide, I hack in a few lines in my app.
DODB will provide some form of atomicity and consistency at some point, but nothing fancy nor too advanced.
The whole point is to keep it simple.
## Speed
Since DODB doesn't use SQL and doesn't even try to handle stuff like atomicity or consistency, speed is great.
Reading data from disk takes a few dozen microseconds, and not much more when searching indexed data.
**On my more-than-decade-old, slow-as-fuck machine**, the simplest possible SQL request to Postgres takes about 100 to 900 microseconds.
With DODB, to reach on-disk data: 13 microseconds.
To search then retrieve indexed data: almost the same, 16 microseconds on average, since we only have to build a path to a symlink.
With the `cached` version of DODB, no deserialization even happens, so a fetch takes about 7 nanoseconds.
Indexes (basic indexes, partitions and tags) are also cached **by default**.
The speed-up over the uncached version is great, since you don't walk the file-system at all.
Searching an index takes about 35 nanoseconds when cached.
To avoid the memory cost of cached indexes, you can explicitly ask for uncached ones.
**NOTE:** of course SQL and DODB cannot be fairly compared based on performance since they don't have the same properties.
But still, this is the kind of speed you can get with the tool.
# Installation
@ -135,24 +105,24 @@ After adding a few objects in the database, here is the index in action on the file
$ tree storage/
storage
├── data
│   ├── 0000000000
│   ├── 0000000001
│   ├── 0000000002
│   ├── 0000000003
│   ├── 0000000004
│   └── 0000000005
│   ├── 0000000000.json
│   ├── 0000000001.json
│   ├── 0000000002.json
│   ├── 0000000003.json
│   ├── 0000000004.json
│   └── 0000000005.json
├── indices
│   └── by_id
│   ├── 6e109b82-25de-4250-9c67-e7e8415ad5a7 -> ../../data/0000000003
│   ├── 2080131b-97d7-4300-afa9-55b93cdfd124 -> ../../data/0000000000
│   ├── 2118bf1c-e413-4658-b8c1-a08925e20945 -> ../../data/0000000005
│   ├── b53fab8e-f394-49ef-b939-8a670abe278b -> ../../data/0000000004
│   ├── 7e918680-6bc2-4f29-be7e-3d2e9c8e228c -> ../../data/0000000002
│   └── 8b4e83e3-ef95-40dc-a6e5-e6e697ce6323 -> ../../data/0000000001
│   ├── 6e109b82-25de-4250-9c67-e7e8415ad5a7.json -> ../../data/0000000003.json
│   ├── 2080131b-97d7-4300-afa9-55b93cdfd124.json -> ../../data/0000000000.json
│   ├── 2118bf1c-e413-4658-b8c1-a08925e20945.json -> ../../data/0000000005.json
│   ├── b53fab8e-f394-49ef-b939-8a670abe278b.json -> ../../data/0000000004.json
│   ├── 7e918680-6bc2-4f29-be7e-3d2e9c8e228c.json -> ../../data/0000000002.json
│   └── 8b4e83e3-ef95-40dc-a6e5-e6e697ce6323.json -> ../../data/0000000001.json
```
We have 6 objects in the DB; each has a unique ID attribute, and each ID refers to a single object.
Getting an object by its ID is as simple as `cat storage/indices/by_id/<id>`.
Getting an object by its ID is as simple as `cat storage/indices/by_id/<id>.json`.
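Since an index entry is just a symlink, the whole lookup can be reproduced by hand. A minimal sketch of the mechanism (the paths and the shortened UUID are made up for illustration, this is not DODB's API):

```shell
# Build a throwaway copy of the layout shown above.
db=$(mktemp -d)
mkdir -p "$db/data" "$db/indices/by_id"

# Store one serialized document, then index it by a hypothetical UUID.
echo '{"id": "6e109b82", "name": "Corvet"}' > "$db/data/0000000003.json"
ln -s ../../data/0000000003.json "$db/indices/by_id/6e109b82.json"

# The indexed lookup is a single path resolution:
cat "$db/indices/by_id/6e109b82.json"

rm -r "$db"
```

No table scan, no query planner: the file-system resolves the symlink and the read costs one open(2).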
Now we want to group cars based on their `color` attribute.
@ -168,14 +138,14 @@ $ tree storage/
├── partitions
│   └── by_color
│   ├── blue
│   │   ├── 0000000000 -> ../../../data/0000000000
│   │   └── 0000000004 -> ../../../data/0000000004
│   │   ├── 0000000000.json -> ../../../data/0000000000.json
│   │   └── 0000000004.json -> ../../../data/0000000004.json
│   ├── red
│   │   ├── 0000000001 -> ../../../data/0000000001
│   │   ├── 0000000002 -> ../../../data/0000000002
│   │   └── 0000000003 -> ../../../data/0000000003
│   │   ├── 0000000001.json -> ../../../data/0000000001.json
│   │   ├── 0000000002.json -> ../../../data/0000000002.json
│   │   └── 0000000003.json -> ../../../data/0000000003.json
│   └── violet
│   └── 0000000005 -> ../../../data/0000000005
│   └── 0000000005.json -> ../../../data/0000000005.json
```
Now the attribute corresponds to a directory (blue, red, violet, etc.) containing a symlink for each related object.
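To make the cost concrete, here is a hand-made sketch of what a partition read boils down to (illustrative paths, not DODB's API): fetching all red cars is one directory listing plus one read per symlink.

```shell
db=$(mktemp -d)
mkdir -p "$db/data" "$db/partitions/by_color/red"

# Two red cars, stored once then linked into the partition directory.
echo '{"name": "Corvet", "color": "red"}'    > "$db/data/0000000001.json"
echo '{"name": "Bullet-GT", "color": "red"}' > "$db/data/0000000002.json"
ln -s ../../../data/0000000001.json "$db/partitions/by_color/red/0000000001.json"
ln -s ../../../data/0000000002.json "$db/partitions/by_color/red/0000000002.json"

# "All red cars" is just reading every symlink in one directory.
cat "$db/partitions/by_color/red/"*.json

rm -r "$db"
```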
@ -192,16 +162,25 @@ $ tree storage/
...
└── tags
└── by_keyword
├── elegant
│   ├── 0000000000 -> ../../../data/0000000000
│   └── 0000000003 -> ../../../data/0000000003
├── impressive
│   ├── 0000000000 -> ../../../data/0000000000
│   ├── 0000000001 -> ../../../data/0000000001
│   └── 0000000003 -> ../../../data/0000000003
└── other-tags
├── average
│   ├── data
│   │   └── 0000000004.json -> ../../../../..//data/0000000004.json
...
├── dirty
│   ├── data
│   │   └── 0000000005.json -> ../../../../..//data/0000000005.json
...
├── elegant
│   ├── data
│   │   ├── 0000000000.json -> ../../../../..//data/0000000000.json
│   │   └── 0000000003.json -> ../../../../..//data/0000000003.json
...
```
Tags are very similar to partitions and are used the exact same way for search, update and deletion.
This is very similar to partitions, but with a bit more complexity, since we eventually search for a car matching a combination of keywords.
**TODO**: explanations about our tag-based search and an example.
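One plausible way to resolve a multi-keyword query with this layout is to intersect the tag directories. A sketch of the idea under a simplified flat layout (plain files stand in for the real symlinks; this is not necessarily how DODB implements it):

```shell
db=$(mktemp -d)
mkdir -p "$db/tags/by_keyword/elegant" "$db/tags/by_keyword/fast"

# Car 0 is elegant *and* fast; car 3 is only elegant.
touch "$db/tags/by_keyword/elegant/0000000000.json"
touch "$db/tags/by_keyword/elegant/0000000003.json"
touch "$db/tags/by_keyword/fast/0000000000.json"

# Cars matching BOTH keywords: intersect the two (sorted) directory listings.
ls "$db/tags/by_keyword/elegant" > "$db/elegant.list"
ls "$db/tags/by_keyword/fast"    > "$db/fast.list"
comm -12 "$db/elegant.list" "$db/fast.list"   # -> 0000000000.json

rm -r "$db"
```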
## Updating an object
@ -209,7 +188,7 @@ In our last example we had a `Car` class, we stored its instances in `cars` and
Now, we want to update a car:
```Crystal
# we find a car we want to modify
car = cars_by_id.get "86a07924-ab3a-4f46-a975-e9803acba22d"
car = cars_by_id "86a07924-ab3a-4f46-a975-e9803acba22d"
# we modify it
car.color = "Blue"
@ -225,41 +204,23 @@ cars_by_id.update "86a07924-ab3a-4f46-a975-e9803acba22d", car
Or, in the case the object may not yet exist:
```Crystal
cars_by_id.update_or_create car.id, car
# Search by partitions: all blue cars.
pp! cars_by_color.get "blue"
# Search by tags: all elegant cars.
pp! cars_by_keyword.get "elegant"
```
Changing a value that is related to a partition or a tag will automatically do what you would expect: de-index then re-index.
You won't find yourself with a bunch of invalid symbolic links all over the place.
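The de-index/re-index step can be pictured directly on the file-system. A hand-run sketch (again with illustrative paths) of what an update from red to blue entails:

```shell
db=$(mktemp -d)
mkdir -p "$db/data" "$db/partitions/by_color/red" "$db/partitions/by_color/blue"

# A red car, indexed in the "red" partition.
echo '{"name": "GTI", "color": "red"}' > "$db/data/0000000001.json"
ln -s ../../../data/0000000001.json "$db/partitions/by_color/red/0000000001.json"

# Update: rewrite the document, drop the stale link, create the new one.
echo '{"name": "GTI", "color": "blue"}' > "$db/data/0000000001.json"
rm "$db/partitions/by_color/red/0000000001.json"
ln -s ../../../data/0000000001.json "$db/partitions/by_color/blue/0000000001.json"

ls "$db/partitions/by_color/red"   # now empty: no dangling symlink left behind
ls "$db/partitions/by_color/blue"  # -> 0000000001.json

rm -r "$db"
```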
## Removing an object
```Crystal
# Remove a value based on an index.
cars_by_id.delete "86a07924-ab3a-4f46-a975-e9803acba22d"
# Remove a value based on a partition.
cars_by_color.delete "red"
cars_by_color.delete "blue" do |car|
cars_by_color.delete "red" do |car|
  car.keywords.empty?
end
# Remove a value based on a tag.
cars_by_keyword.delete "shiny"
cars_by_keyword.delete "elegant" do |car|
  car.name == "GTI"
end
```
In this code snippet, the function is applied to blue cars only;
and blue cars are removed only if they have no associated keywords.
Same thing for elegant cars.
In this last example, the function is applied to red cars only.
This is a performance boost compared to applying the function to all the cars.
# Complete example
```Crystal
@ -331,10 +292,11 @@ pp! cars_by_name.get "Corvet"
# based on a partition (print all red cars)
pp! cars_by_color.get "red"
# based on a tag (print all fast cars)
# based on a tag
pp! cars_by_keyword.get "fast"
############
# Updating #
############
@ -352,11 +314,7 @@ cars_by_name.update "Bullet-GT", car # the name changed
car = Car.new "Mustang", "red", [] of String
cars_by_name.update_or_create car.name, car
# We all know it, elegant cars are also expensive.
cars_by_keyword.get("elegant").each do |car|
car.keywords << "expensive"
cars_by_name.update car.name, car
end
###############
# Deleting... #
@ -370,8 +328,9 @@ cars_by_color.delete "red"
# based on a color (but not only)
cars_by_color.delete "blue", &.name.==("GTI")
# based on a keyword
cars_by_keyword.delete "solid"
# based on a keyword (but not only)
cars_by_keyword.delete "fast", &.name.==("Corvet")
## TAG-based deletion, soon.
# # based on a keyword
# cars_by_keyword.delete "solid"
# # based on a keyword (but not only)
# cars_by_keyword.delete "fast", &.name.==("Corvet")
```

View File

@ -1,8 +0,0 @@
# API
Cached indexes (index, partition, tags) should be used by default.
Uncached indexes should be an option, through a new function `add_uncached_index` or something.
# Performance
Search with some kind of "pagination" system: ask for entries with a limit on the number of elements and an offset.
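Since entries are stored as numbered files, such a pagination could boil down to a slice of the sorted directory listing. A rough shell sketch of the idea (not an API proposal):

```shell
db=$(mktemp -d)
mkdir -p "$db/data"
for i in 0 1 2 3 4 5; do touch "$db/data/000000000$i.json"; done

# Take `limit` entries starting at `offset`, in key order.
offset=2 limit=3
ls "$db/data" | sort | tail -n +$((offset + 1)) | head -n "$limit"
# -> 0000000002.json 0000000003.json 0000000004.json (one per line)

rm -r "$db"
```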

View File

@ -1,24 +0,0 @@
#!/bin/sh
if [ $# -ne 1 ]
then
	echo "usage: $0 result-directory" >&2
	exit 1
fi
d=$1
echo "from data (.d) to truncated data (.t)"
for i in $d/*.d
do
	fname=$(echo $i | sed "s/[.]d$/.t/")
	awk '{ print $2, $3, $5 }' < $i > $fname
done
awk '{ print $1 }' < $d/ram_index.d > it
mkdir data
echo "from truncated data (.t) to graphed data data/XXX.d"
paste it $d/ram_index.t $d/cached_index.t $d/semi_index.t $d/uncached_index.t > ./data/index.d
paste it $d/ram_partitions.t $d/cached_partitions.t $d/semi_partitions.t $d/uncached_partitions.t > ./data/partitions.d
paste it $d/ram_tags.t $d/cached_tags.t $d/semi_tags.t $d/uncached_tags.t > ./data/tags.d

View File

@ -1,39 +0,0 @@
#!/usr/bin/awk -f
BEGIN {
	FOUND_95pct = 0
	FOUND_mean = 0
}

FOUND_95pct == 1 {
	pct_min = $1
	pct_max = $2
	FOUND_95pct = 0
}

FOUND_mean == 1 {
	mean = $1
	print pct_min, median, mean, pct_max, t, df, pvalue
	FOUND_mean = 0
}

/^t = / {
	gsub(",", "", $3)
	t = $3
	gsub(",", "", $6)
	df = $6
	pvalue = $9
}

/mean of x/ {
	FOUND_mean = 1
}

/Median/ {
	gsub(":", "")
	median = $2
}

/95 percent confidence/ {
	FOUND_95pct = 1
}

View File

@ -1,66 +0,0 @@
#!/bin/sh
extract="./bin/extract-final-data.sh"
summary="./bin/summary.r"
summary_to_line="./bin/rsum2line.awk"
if [ $# -ne 1 ]
then
	echo "usage: $0 result-directory" >&2
	exit 1
fi
dir="$1"
raw_to_summary() {
	for i in $dir/*.raw
	do
		summary_with_bad_format=$(echo $i | sed "s/.raw$/.unconveniently_formated_summary/")
		target=$(echo $i | sed "s/.raw$/.summary/")
		if [ -f $summary_with_bad_format ]; then
			echo -n "\r$summary_with_bad_format already exists: skipping "
		else
			Rscript $summary $i > $summary_with_bad_format
		fi
		if [ -f $target ]; then
			echo -n "\r$target already exists: skipping "
		else
			$summary_to_line $summary_with_bad_format > $target
		fi
	done
	echo ""

	# Beyond a certain number of entries, retrieving data from partitions and tags isn't tested anymore.
	# This leads to creating "fake entries" with a duration of 0, which causes problems with the
	# statistical analysis. So, we need to replace "NaN" with "0" in summaries.
	sed -i "s/NaN/0/g" $dir/*.summary
}
# List raw files with the number of iterations as a prefix so they can then be sorted.
sort_summary_files() {
	for i in $dir/*.summary ; do f $i ; done | sort -n
}

f() {
	echo $* | sed "s/[_./]/ /g" | xargs echo "$* " | awk '{ printf "%s %s/%s_%s %s\n", $4, $2, $3, $5, $1 }'
}

fill() {
	while read LINE; do
		nb_it=$(echo $LINE | awk '{ print $1 }')
		target=$(echo $LINE | awk '{ print $2 }')
		fname=$(echo $LINE | awk '{ print $3 }')
		cat $fname | xargs echo "$nb_it " >> $target.d
	done
}
raw_to_summary
sort_summary_files | fill
extract_final_data() {
	$extract $dir
}
extract_final_data

View File

@ -1,14 +0,0 @@
# Rscript summary handshake-duration.txt
require(grDevices) # for colours
tbl <- read.table(file=commandArgs(TRUE)[1])
val <- tbl[1]
summary(val)
# standarddeviation=sd(unlist(val))
sd(unlist(val))
# print (standarddeviation, zero.print="standard deviation: ")
# confint.default (val)
t.test (val)

View File

@ -1,71 +0,0 @@
extension "groff"
doctemplate
"
.MT 0
$header
.TL
$title
.AU \"\"
.ND
.SA 0
.DS I
"
".DE
$footer
"
end
nodoctemplate
"
"
"
"
end
bold "\f[CB]$text\fP"
italics "\f[CI]$text\fP"
underline "\f[CI]$text\fP"
fixed "\fC$text\fP"
color "\m[$style]$text\m[]"
anchor "$infilename : $linenum - $text"
reference "$text \(-> $infile:$linenum, page : $infilename:$linenum"
#lineprefix "\fC\(em\fP "
#lineprefix "\fC\n(ln\fP "
lineprefix ""
colormap
"green" "green"
"red" "red"
"darkred" "darkred"
"blue" "blue"
"brown" "brown"
"pink" "pink"
"yellow" "yellow"
"cyan" "cyan"
"purple" "purple"
"orange" "orange"
"brightorange" "brightorange"
"brightgreen" "brightgreen"
"darkgreen" "darkgreen"
"black" "black"
"teal" "teal"
"gray" "gray"
"darkblue" "darkblue"
default "black"
end
translations
"\\" "\\\\"
##"\n" " \\\\\n"
##" " "\\ "
##"\t" "\\ \\ \\ \\ \\ \\ \\ \\ "
"\t" " "
"|" "|"
"---" "\(em"
"--" "\(mi"
end

View File

@ -1,5 +0,0 @@
SRC ?= graphs
ODIR ?= /tmp/
export ODIR SRC
include Makefile.in

View File

@ -1,79 +0,0 @@
SRC ?= graphs
ODIR ?= .
BIBLIOGRAPHY ?= bibliography
ALLSRC = $(shell find .)
SOELIM_OPTS ?=
SOELIM = soelim $(SOELIM_OPTS)
PRECONV_OPTS ?= -e utf-8
PRECONV = preconv $(PRECONV_OPTS)
EQN_OPTS ?= -Tpdf
EQN = eqn $(EQN_OPTS)
# source-highlight stuff
# GH_INTRO: instructions before each source code provided by source-highlight
# GH_OUTRO: ------------ after ---- ------ ---- -------- -- ----------------
# GH_INTRO/GH_OUTRO: values are separated by ';'
#
GH_INTRO := .nr DI 0;.DS I;.fam C;.b1;.sp -0.1i
GH_OUTRO := .sp -0.2i;.b2;.fam;.DE
#
export GH_INTRO
export GH_OUTRO
#
# SHOPTS: cmd line parameter given to source-highlight
SHOPTS = --outlang-def=.source-highlight_groff-output-definition
export SHOPTS
# ghighlight brings `source-highlight` to troff
GHIGHLIGHT_OPTS ?=
GHIGHLIGHT = ./bin/ghighlight $(GHIGHLIGHT_OPTS)
GRAP_OPTS ?=
GRAP = grap $(GRAP_OPTS)
PIC_OPTS ?= -Tpdf
PIC = pic $(PIC_OPTS)
# -P => move punctuation after reference
# -S => label and bracket-label options
# -e => accumulate (use a reference section)
# -p bib => bibliography file
REFER_OPTS ?= -PS -e -p $(BIBLIOGRAPHY)
REFER = refer $(REFER_OPTS)
# -k => iconv conversion (did it ever work?)
# -ms => ms macro
# -U => unsafe (because of PDF inclusion)
# -Tpdf => output device is PDF
# -mspdf => include PDF (so, images converted in PDF) in the document
# NOTE: a custom troffrc (configuration file) is necessary on OpenBSD
# to have correctly justified paragraphs. Otherwise, the default
# configuration removes this possibility, for bullshit reasons. Sad.
# -M dir => path to custom troffrc
# TODO: no change with or without the following options -P -e
# This has to be investigated: how to make PDFs look nice in browsers?
# -P -e => provide "-e" to gropdf to embed fonts
GROFF_OPTS ?= -ms -t -Tpdf -U -mspdf -mpdfmark -M ./bin -P -e
GROFF = groff $(GROFF_OPTS)
$(SRC).pdf:
	$(SOELIM) < $(SRC).ms |\
	./bin/utf8-to-ms.sh |\
	$(PRECONV) |\
	$(EQN) |\
	$(GHIGHLIGHT) |\
	$(GRAP) |\
	$(PIC) |\
	$(REFER) |\
	$(GROFF) > $(ODIR)/$@

# Keep options in memory for the recursive 'make' call
export SOELIM_OPTS PRECONV_OPTS EQN_OPTS GHIGHLIGHT_OPTS GRAP_OPTS PIC_OPTS REFER_OPTS

serve:
	@#find . -name "*.ms" -or -name "*.d" | entr gmake -B $(SRC).pdf
	find . | entr gmake -B $(SRC).pdf

View File

View File

@ -1,286 +0,0 @@
#! /usr/bin/env perl
# ghighlight - A simple preprocessor for adding code highlighting in a groff file
# Copyright (C) 2014-2018 Free Software Foundation, Inc.
# Written by Bernd Warken <groff-bernd.warken-72@web.de>.
my $version = '0.9.0';
# This file is part of 'ghighlight', which is part of 'groff'.
# 'groff' is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 2 of the License, or
# (at your option) any later version.
# 'groff' is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
# You can find a copy of the GNU General Public License in the internet
# at <http://www.gnu.org/licenses/gpl-2.0.html>.
########################################################################
use strict;
use warnings;
#use diagnostics;
# current working directory
use Cwd;
# $Bin is the directory where this script is located
use FindBin;
# open3 for a bidirectional communication with a child process
use IPC::Open3;
########################################################################
# system variables and exported variables
########################################################################
$\ = "\n"; # final part for print command
########################################################################
# read-only variables with double-@ construct
########################################################################
our $File_split_env_sh;
our $File_version_sh;
our $Groff_Version;
my $before_make; # script before run of 'make'
{
my $at = '@';
$before_make = 1 if '@VERSION@' eq "${at}VERSION${at}";
}
my %at_at;
my $file_perl_test_pl;
my $groffer_libdir;
if ($before_make) {
my $highlight_source_dir = $FindBin::Bin;
$at_at{'BINDIR'} = $highlight_source_dir;
$at_at{'G'} = '';
} else {
$at_at{'BINDIR'} = '@BINDIR@';
$at_at{'G'} = '@g@';
}
########################################################################
# options
########################################################################
foreach (@ARGV) {
if ( /^(-h|--h|--he|--hel|--help)$/ ) {
print q(Usage for the 'ghighlight' program:);
print 'ghighlight [-] [--] [filespec...] normal file name arguments';
print 'ghighlight [-h|--help] gives usage information';
print 'ghighlight [-v|--version] displays the version number';
print q(This program is a 'groff' preprocessor that handles highlighting source code ) .
q(parts in 'roff' files.);
exit;
} elsif ( /^(-v|--v|--ve|--ver|--vers|--versi|--versio|--version)$/ ) {
print q('ghighlight' version ) . $version;
exit;
}
}
my $macros = "groff_mm";
if ( $ENV{'GHLENABLECOLOR'} ) {
$macros = "groff_mm_color";
}
########################################################################
# input
########################################################################
my $source_mode = 0;
my @lines = ();
sub getTroffLine {
my ($opt) = @_;
if ($opt =~ /^ps=([0-9]+)/) {".ps $1"}
elsif ($opt =~ /^vs=(\S+)/) {".vs $1"}
	else { print STDERR "didn't recognise '$opt'"; ""}
}
sub getTroffLineOpposite {
my ($opt) = @_;
if ($opt =~ /^ps=/) {".ps"}
elsif ($opt =~ /^vs=/) {".vs"}
	else { print STDERR "didn't recognise '$opt'"; ""}
}
# language for codeblocks
my $lang = '';
my @options = ();
foreach (<>) {
chomp;
s/\s+$//;
my $line = $_;
my $is_dot_Source = $line =~ /^[.']\s*(``|SOURCE)(|\s+.*)$/;
unless ( $is_dot_Source ) { # not a '.SOURCE' line
if ( $source_mode ) { # is running in SOURCE mode
push @lines, $line;
} else { # normal line, not SOURCE-related
print $line;
}
next;
}
##########
# now the line is a '.SOURCE' line
my $args = $line;
$args =~ s/\s+$//; # remove final spaces
$args =~ s/^[.']\s*(``|SOURCE)\s*//; # omit .source part, leave the arguments
my @args = split /\s+/, $args;
##########
# start SOURCE mode
$lang = $args[0] if ( @args > 0 && $args[0] ne 'stop' );
if ( @args > 0 && $args[0] ne 'stop' ) {
# For '.``' no args or first arg 'start' means opening 'SOURCE' mode.
# Everything else means an ending command.
shift @args;
@options = @args;
if ( $source_mode ) {
# '.SOURCE' was started twice, ignore
print STDERR q('.``' starter was run several times);
next;
} else { # new SOURCE start
$source_mode = 1;
@lines = ();
next;
}
}
##########
# now the line must be a SOURCE ending line (stop)
unless ( $source_mode ) {
print STDERR 'ghighlight.pl: there was a SOURCE ending without being in ' .
'SOURCE mode:';
print STDERR ' ' . $line;
next;
}
$source_mode = 0; # 'SOURCE' stop calling is correct
my $shopts = $ENV{"SHOPTS"} || "";
##########
# Run source-highlight on lines
# Check if language was specified
my $cmdline = "source-highlight -f $macros $shopts --output STDOUT";
if ($lang ne '') {
$cmdline .= " -s $lang";
}
# Start `source-highlight`
my $pid = open3(my $child_in, my $child_out, my $child_err, $cmdline)
or die "open3() failed $!";
# Provide source code to `source-highlight` in its standard input
print $child_in $_ for @lines;
close $child_in;
if (my $v = $ENV{"GH_INTRO"}) {
print for split /;/, $v;
}
for (@options) {
my $l = getTroffLine $_;
print $l if ($l ne "");
}
# Print `source-highlight` output
while (<$child_out>) {
chomp;
print;
}
close $child_out;
for (reverse @options) {
my $l = getTroffLineOpposite $_;
print $l if ($l ne "");
}
if (my $v = $ENV{"GH_OUTRO"}) {
print for split /;/, $v;
}
my @print_res = (1);
# Start argument processing
# remove 'stop' arg if exists
# shift @args if ( $args[0] eq 'stop' );
# if ( @args == 0 ) {
# # no args for saving, so @print_res doesn't matter
# next;
# }
# my @var_names = ();
# my @mode_names = ();
# my $mode = '.ds';
# for ( @args ) {
# if ( /^\.?ds$/ ) {
# $mode = '.ds';
# next;
# }
# if ( /^\.?nr$/ ) {
# $mode = '.nr';
# next;
# }
# push @mode_names, $mode;
# push @var_names, $_;
# }
# my $n_vars = @var_names;
# if ( $n_vars < $n_res ) {
# print STDERR 'ghighlight: not enough variables for Python part: ' .
# $n_vars . ' variables for ' . $n_res . ' output lines.';
# } elsif ( $n_vars > $n_res ) {
# print STDERR 'ghighlight: too many variablenames for Python part: ' .
# $n_vars . ' variables for ' . $n_res . ' output lines.';
# }
# if ( $n_vars < $n_res ) {
# print STDERR 'ghighlight: not enough variables for Python part: ' .
# $n_vars . ' variables for ' . $n_res . ' output lines.';
# }
# my $n_min = $n_res;
# $n_min = $n_vars if ( $n_vars < $n_res );
# exit unless ( $n_min );
# $n_min -= 1; # for starting with 0
# for my $i ( 0..$n_min ) {
# my $value = $print_res[$i];
# chomp $value;
# print $mode_names[$i] . ' ' . $var_names[$i] . ' ' . $value;
# }
}
1;
# Local Variables:
# mode: CPerl
# End:

View File

@ -1,69 +0,0 @@
.\" Startup file for troff.
.
.\" This is tested by pic.
.nr 0p 0
.
.\" Load composite mappings.
.do mso composite.tmac
.
.\" Load generic fallback mappings.
.do mso fallbacks.tmac
.
.\" Use .do here, so that it works with -C.
.\" The groff command defines the .X string if the -X option was given.
.ie r.X .do ds troffrc!ps Xps.tmac
.el .do ds troffrc!ps ps.tmac
.do ds troffrc!pdf pdf.tmac
.do ds troffrc!dvi dvi.tmac
.do ds troffrc!X75 X.tmac
.do ds troffrc!X75-12 X.tmac
.do ds troffrc!X100 X.tmac
.do ds troffrc!X100-12 X.tmac
.do ds troffrc!ascii tty.tmac
.do ds troffrc!latin1 tty.tmac
.do ds troffrc!utf8 tty.tmac
.do ds troffrc!cp1047 tty.tmac
.do ds troffrc!lj4 lj4.tmac
.do ds troffrc!lbp lbp.tmac
.do ds troffrc!html html.tmac
.do if d troffrc!\*[.T] \
. do mso \*[troffrc!\*[.T]]
.do rm troffrc!ps troffrc!Xps troffrc!dvi troffrc!X75 troffrc!X75-12 \
troffrc!X100 troffrc!X100-12 troffrc!lj4 troff!lbp troffrc!html troffrc!pdf
.
.\" Test whether we work under EBCDIC and map the no-breakable space
.\" character accordingly.
.do ie '\[char97]'a' \
. do tr \[char160]\~
.el \
. do tr \[char65]\~
.
.\" Set the hyphenation language to 'us'.
.do hla us
.
.\" Disable hyphenation:
.\" Do not load hyphenation patterns and exceptions.
.\"do hpf hyphen.us
.\"do hpfa hyphenex.us
.
.\" Disable adjustment by default,
.\" such that manuals look similar with groff and mandoc(1).
.\".ad l
.\".de ad
.\"..
.\" Handle paper formats.
.do mso papersize.tmac
.
.\" Handle PS images.
.do mso pspic.tmac
.do mso pdfpic.tmac
.
.\" ====================================================================
.\" Editor settings
.\" ====================================================================
.
.\" Local Variables:
.\" mode: nroff
.\" fill-column: 72
.\" End:
.\" vim: set filetype=groff textwidth=72:

View File

@ -1,154 +0,0 @@
#!/bin/sh
# This program isn't by any means complete.
# Most of text markers, accents and ligatures are handled.
# However, nothing else currently is.
# Please, do provide more translations.
# Convert input into hexadecimal and a single byte per line.
to_hex_one_column() xxd -p -c 1
# Reverse hexadecimal to original value.
from_hex() xxd -p -r
regroup_lines() awk '
BEGIN {
line_start=1
}
{
if (line_start == 1)
line = $1;
else
line = line " " $1;
line_start = 0;
if ($1 == "0a") {
print line;
line_start = 1
}
}
END {
if (line_start == 0)
print line
}
'
accents() sed \
-e "s/c3 81/5c 5b 27 41 5d/g"\
-e "s/c3 89/5c 5b 27 45 5d/g"\
-e "s/c3 8d/5c 5b 27 49 5d/g"\
-e "s/c3 93/5c 5b 27 4f 5d/g"\
-e "s/c3 9a/5c 5b 27 55 5d/g"\
-e "s/c3 9d/5c 5b 27 59 5d/g"\
-e "s/c3 a1/5c 5b 27 61 5d/g"\
-e "s/c3 a9/5c 5b 27 65 5d/g"\
-e "s/c3 ad/5c 5b 27 69 5d/g"\
-e "s/c3 b3/5c 5b 27 6f 5d/g"\
-e "s/c3 ba/5c 5b 27 75 5d/g"\
-e "s/c3 bd/5c 5b 27 79 5d/g"\
-e "s/c3 84/5c 5b 3a 41 5d/g"\
-e "s/c3 8b/5c 5b 3a 45 5d/g"\
-e "s/c3 8f/5c 5b 3a 49 5d/g"\
-e "s/c3 96/5c 5b 3a 4f 5d/g"\
-e "s/c3 9c/5c 5b 3a 55 5d/g"\
-e "s/c3 a4/5c 5b 3a 61 5d/g"\
-e "s/c3 ab/5c 5b 3a 65 5d/g"\
-e "s/c3 af/5c 5b 3a 69 5d/g"\
-e "s/c3 b6/5c 5b 3a 6f 5d/g"\
-e "s/c3 bc/5c 5b 3a 75 5d/g"\
-e "s/c3 bf/5c 5b 3a 79 5d/g"\
-e "s/c3 82/5c 5b 5e 41 5d/g"\
-e "s/c3 8a/5c 5b 5e 45 5d/g"\
-e "s/c3 8e/5c 5b 5e 49 5d/g"\
-e "s/c3 94/5c 5b 5e 4f 5d/g"\
-e "s/c3 9b/5c 5b 5e 55 5d/g"\
-e "s/c3 a2/5c 5b 5e 61 5d/g"\
-e "s/c3 aa/5c 5b 5e 65 5d/g"\
-e "s/c3 ae/5c 5b 5e 69 5d/g"\
-e "s/c3 b4/5c 5b 5e 6f 5d/g"\
-e "s/c3 bb/5c 5b 5e 75 5d/g"\
-e "s/c3 80/5c 5b 60 41 5d/g"\
-e "s/c3 88/5c 5b 60 45 5d/g"\
-e "s/c3 8c/5c 5b 60 49 5d/g"\
-e "s/c3 92/5c 5b 60 4f 5d/g"\
-e "s/c3 99/5c 5b 60 55 5d/g"\
-e "s/c3 a0/5c 5b 60 61 5d/g"\
-e "s/c3 a8/5c 5b 60 65 5d/g"\
-e "s/c3 ac/5c 5b 60 69 5d/g"\
-e "s/c3 b2/5c 5b 60 6f 5d/g"\
-e "s/c3 b9/5c 5b 60 75 5d/g"\
-e "s/c3 83/5c 5b 7e 41 5d/g"\
-e "s/c3 91/5c 5b 7e 4e 5d/g"\
-e "s/c3 95/5c 5b 7e 4f 5d/g"\
-e "s/c3 a3/5c 5b 7e 61 5d/g"\
-e "s/c3 b1/5c 5b 7e 6e 5d/g"\
-e "s/c3 b5/5c 5b 7e 6f 5d/g"\
-e "s/c3 87/5c 5b 2c 43 5d/g"\
-e "s/c3 a7/5c 5b 2c 63 5d/g"\
-e "s/c3 85/5c 5b 6f 41 5d/g"\
-e "s/c3 a5/5c 5b 6f 61 5d/g"\
-e "s/c5 b8/5c 5b 3a 59 5d/g"\
-e "s/c5 a0/5c 5b 76 53 5d/g"\
-e "s/c5 a1/5c 5b 76 73 5d/g"\
-e "s/c5 bd/5c 5b 76 5a 5d/g"\
-e "s/c5 be/5c 5b 76 7a 5d/g"
# Ligatures.
ligatures() sed \
-e "s/ef ac 80/5c 5b 66 66 5d/g"\
-e "s/ef ac 81/5c 5b 66 69 5d/g"\
-e "s/ef ac 82/5c 5b 66 6c 5d/g"\
-e "s/ef ac 83/5c 5b 46 69 5d/g"\
-e "s/ef ac 84/5c 5b 46 6c 5d/g"\
-e "s/c5 81/5c 5b 2f 4c 5d/g"\
-e "s/c5 82/5c 5b 2f 6c 5d/g"\
-e "s/c3 98/5c 5b 2f 4f 5d/g"\
-e "s/c3 b8/5c 5b 2f 6f 5d/g"\
-e "s/c3 86/5c 5b 41 45 5d/g"\
-e "s/c3 a6/5c 5b 61 65 5d/g"\
-e "s/c5 92/5c 5b 4f 45 5d/g"\
-e "s/c5 93/5c 5b 6f 65 5d/g"\
-e "s/c4 b2/5c 5b 49 4a 5d/g"\
-e "s/c4 b3/5c 5b 69 6a 5d/g"\
-e "s/c4 b1/5c 5b 2e 69 5d/g"\
-e "s/c8 b7/5c 5b 2e 6a 5d/g"
# Text markers.
text_markers() sed \
-e "s/e2 97 8b/5c 5b 63 69 5d/g"\
-e "s/e2 80 a2/5c 5b 62 75 5d/g"\
-e "s/e2 80 a1/5c 5b 64 64 5d/g"\
-e "s/e2 80 a0/5c 5b 64 67 5d/g"\
-e "s/e2 97 8a/5c 5b 6c 7a 5d/g"\
-e "s/e2 96 a1/5c 5b 73 71 5d/g"\
-e "s/c2 b6/5c 5b 70 73 5d/g"\
-e "s/c2 a7/5c 5b 73 63 5d/g"\
-e "s/e2 98 9c/5c 5b 6c 68 5d/g"\
-e "s/e2 98 9e/5c 5b 72 68 5d/g"\
-e "s/e2 86 b5/5c 5b 43 52 5d/g"\
-e "s/e2 9c 93/5c 5b 4f 4b 5d/g"
# These markers shouldn't be automatically translated in ms macros.
# @ "s/40/5c 5b 61 74 5d/g"
# # "s/23/5c 5b 73 68 5d/g"
# Legal symbols.
legal_symbols() sed \
-e "s/c2 a9/5c 5b 63 6f 5d/g"\
-e "s/c2 ae/5c 5b 72 67 5d/g"\
-e "s/e2 84 a2/5c 5b 74 6d 5d/g"
# TODO: ├─│└
misc() sed \
-e "s/e2 94 9c/+/g"\
-e "s/e2 94 80/-/g"\
-e "s/e2 94 82/|/g"\
-e 's/e2 94 94/+/g'
hexutf8_to_hexms() {
text_markers | accents | ligatures | legal_symbols | misc
}
to_hex_one_column | regroup_lines | hexutf8_to_hexms | from_hex

View File

@ -1,69 +0,0 @@
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord y 0,50
ticks left out from 0 to 50 by 10
ticks bot out at 50000 "50,000", 100000 "100,000", 150000 "150,000", 200000 "200,000", 250000 "250,000"
label left "Request duration with" unaligned "an index (µs)" "(Median)" left 0.8
label bot "Number of cars in the database" down 0.1
obram = obuncache = obcache = obsemi = 0 # old bullets
cbram = cbuncache = cbcache = cbsemi = 0 # current bullets
legendxleft = 100000
legendxright = 250000
legendyup = 15
legendydown = 2
boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)
copy "../data/index.d" thru X
cx = $1*5
y_scale = 1000
# ram cached semi uncached
line from cx,$2/y_scale to cx,$4/y_scale
line from cx,$5/y_scale to cx,$7/y_scale
line from cx,$8/y_scale to cx,$10/y_scale
line from cx,$11/y_scale to cx,$13/y_scale
#ty = $3
cx = $1*5
cbram = $3/y_scale
cbcache = $6/y_scale
cbsemi = $9/y_scale
cbuncache = $12/y_scale
if (obram > 0) then {line from cx,cbram to ox,obram}
if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor
obram = cbram
obcache = cbcache
obsemi = cbsemi
obuncache = cbuncache
ox = cx
# ram cached semi uncached
.gcolor red
bullet at cx,cbram
.gcolor
bullet at cx,cbcache
.gcolor blue
bullet at cx,cbsemi
.gcolor
.gcolor green
bullet at cx,cbuncache
.gcolor
X
.G2

View File

@ -1,66 +0,0 @@
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord x 0,5000*2 y 0,350
ticks left out from 0 to 350 by 50
label left "Request duration" unaligned "for a partition (ms)" "(Median)" left 0.8
label bot "Number of cars matching the partition" down 0.1
obram = obuncache = obcache = obsemi = 0
cbram = cbuncache = cbcache = cbsemi = 0
legendxleft = 1000
legendxright = 6500
legendyup = 330
legendydown = 230
boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)
copy "../data/partitions.d" thru X
cx = $1*2
y_scale = 1000000
# ram cached semi uncached
line from cx,$2/y_scale to cx,$4/y_scale
line from cx,$5/y_scale to cx,$7/y_scale
line from cx,$8/y_scale to cx,$10/y_scale
line from cx,$11/y_scale to cx,$13/y_scale
#ty = $3
cbram = $3/y_scale
cbcache = $6/y_scale
cbsemi = $9/y_scale
cbuncache = $12/y_scale
if (obram > 0) then {line from cx,cbram to ox,obram}
if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor
obram = cbram
obcache = cbcache
obsemi = cbsemi
obuncache = cbuncache
ox = cx
# ram cached semi uncached
.gcolor red
bullet at cx,cbram
.gcolor
bullet at cx,cbcache
.gcolor blue
bullet at cx,cbsemi
.gcolor
.gcolor green
bullet at cx,cbuncache
.gcolor
X
.G2

View File

@ -1,65 +0,0 @@
.G1
copy "legend.grap"
frame invis ht 3 wid 4 left solid bot solid
coord x 0,5000 y 0,170
ticks left out from 0 to 170 by 20
label left "Request duration" unaligned "for a tag (ms)" "(Median)" left 0.8
label bot "Number of cars matching the tag" down 0.1
obram = obuncache = obcache = obsemi = 0
cbram = cbuncache = cbcache = cbsemi = 0
legendxleft = 200
legendxright = 3000
legendyup = 170
legendydown = 120
boite(legendxleft,legendxright,legendyup,legendydown)
legend(legendxleft,legendxright,legendyup,legendydown)
copy "../data/tags.d" thru X
cx = $1
y_scale = 1000000
# ram cached semi uncached
line from cx,$2/y_scale to cx,$4/y_scale
line from cx,$5/y_scale to cx,$7/y_scale
line from cx,$8/y_scale to cx,$10/y_scale
line from cx,$11/y_scale to cx,$13/y_scale
#ty = $3
cbram = $3/y_scale
cbcache = $6/y_scale
cbsemi = $9/y_scale
cbuncache = $12/y_scale
if (obram > 0) then {line from cx,cbram to ox,obram}
if (obcache > 0) then {line from cx,cbcache to ox,obcache}
.gcolor blue
if (obsemi > 0) then {line from cx,cbsemi to ox,obsemi}
.gcolor
.gcolor green
if (obuncache > 0) then {line from cx,cbuncache to ox,obuncache}
.gcolor
obram = cbram
obcache = cbcache
obsemi = cbsemi
obuncache = cbuncache
ox = cx
# ram cached semi uncached
.gcolor red
bullet at cx,cbram
.gcolor
bullet at cx,cbcache
.gcolor blue
bullet at cx,cbsemi
.gcolor
.gcolor green
bullet at cx,cbuncache
.gcolor
X
.G2


@ -1,334 +0,0 @@
.so macros.roff
.TITLE Document Oriented DataBase (DODB)
.AUTHOR Philippe P.
.ABSTRACT1
DODB is a database-as-library, enabling a very simple way to store applications' data: storing serialized
.I documents
(basically any data type) in plain files.
To speed up searches, attributes of these documents can be used as indexes, which leads to the creation of a few symbolic links
.I symlinks ) (
on the disk.
This document briefly presents DODB and its main differences with other database engines.
An experiment is described and analysed to understand the performance that can be expected from this approach.
.ABSTRACT2
.SINGLE_COLUMN
.SECTION Introduction to DODB
A database manages data, enabling queries (preferably fast) to retrieve, modify, add and delete pieces of information.
Anything else is
.UL accessory .
Universities all around the world teach about Structured Query Language (SQL) and relational databases.
.
.de PRIMARY_KEY
.I \\$1 \\$2 \\$3
..
.de FOREIGN_KEY
.I \\$1 \\$2 \\$3
..
.UL "Relational databases"
are built around the idea of putting data into
.I tables ,
with typed columns so the database can optimize operations and storage.
A database is a list of tables with relations between them.
For example, let's imagine a database of a movie theater.
The database will have a
.I table
for the list of movies they have
.PRIMARY_KEY idmovie , (
title, duration, synopsis),
a table for the scheduling
.PRIMARY_KEY idschedule , (
.FOREIGN_KEY idmovie ,
.FOREIGN_KEY idroom ,
time slot),
a table for the rooms
.PRIMARY_KEY idroom , (
name), etc.
Tables have relations, for example the table "scheduling" has a column
.I idmovie
which points to entries in the "movie" table.
.UL "The SQL language"
enables arbitrary operations on databases: add, search, modify and delete entries.
Furthermore, SQL also makes it possible to manage administrative operations of the databases themselves: creating databases and tables, managing users with fine-grained authorizations, etc.
SQL is used between the application and the database, to perform operations and to provide results when due.
SQL is also used
.UL outside
the application, by admins for managing databases and potentially by some
.I non-developer
users to retrieve some data without a dedicated interface\*[*].
.FOOTNOTE1
One of the first objectives of SQL was to enable a class of
.I non-developer
users to talk directly to the database so they can access the data without bothering the developers.
This has value for many companies and organizations.
.FOOTNOTE2
Many tools were used or even developed over the years specifically to alleviate the inherent complexity and limitations of SQL.
For example, designing databases becomes difficult when the list of tables grows;
Unified Modeling Language (UML) is then used to provide a graphical overview of the relations between tables.
SQL databases may be fast to retrieve data despite complicated operations, but they become slow when multiple sequential operations are required, because of all the back-and-forths with the application;
thus, SQL databases can be scripted to automate operations and provide a massive speed up
.I "stored procedures" , (
see
.I "PL/SQL" ).
Writing SQL requests requires a lot of boilerplate since there is no integration with the host programming language, leading to multiple function calls for any operation on the database;
thus, object-relational mapping (ORM) libraries were created to reduce the massive code duplication.
And so on.
For many reasons, SQL is not a silver bullet to
.I solve
the database problem.
Because of the difficulties mentioned above, and since the original objectives of SQL are not universal\*[*], other database designs were created\*[*].
.FOOTNOTE1
To say the least!
Not everyone needs to let users access the database without going through the application.
For instance, writing a \f[I]blog\f[] for a small event or to share small stories about your life doesn't require manual operations on the database, fortunately.
.FOOTNOTE2
.FOOTNOTE1
A lot of designs won't be mentioned here.
The actual history of databases is often quite unclear since the categories of databases are sometimes vague and underspecified.
As mentioned, SQL is not a silver bullet and a lot of developers shifted towards other solutions, that's the important part.
.FOOTNOTE2
The NoSQL movement started because the stated goals of many actors from the early Web boom were different from SQL.
The need for very fast operations far exceeded what was practical at the moment with SQL.
This led to the use of more basic methods to manage data such as
.I "key-value stores" ,
which simply associate a value with an
.I index
for fast retrieval.
In this case, there is no need for the database to have
.I tables ,
data may be untyped, the entries may even have different attributes.
Since homogeneity is not necessary anymore, databases have fewer (or different) constraints.
Document-oriented databases are a sub-class of key-value stores, where metadata can be extracted from the entries for further optimizations.
And that's exactly what is being done in Document Oriented DataBase (DODB).
.UL "Contrary to SQL" ,
DODB has a very narrow scope: to provide a library to store, retrieve, modify and delete data.
In this way, DODB transforms any application into a database manager.
DODB doesn't provide an interactive shell, there is no request language to perform arbitrary operations on the database, no statistical optimizations of the requests based on query frequencies, etc.
Instead, DODB reduces the complexity of the infrastructure, stores data in plain files and enables simple manual scripting with widespread unix tools.
Simplicity is key.
.UL "Contrary to other NoSQL databases" ,
DODB doesn't provide an application but a library, nothing else.
The idea is to help developers store their data themselves, without depending on
.I yet-another-all-in-one
massive tool.
The library writes (and removes) data on a storage device, has a few retrieval and update mechanisms and that's it\*[*].
.FOOTNOTE1
The lack of features
.I is
the feature.
Even with that motto, the tool still is expected to be convenient for most applications.
.FOOTNOTE2
This document provides extensive documentation on how DODB works and how to use it.
The presented code is in Crystal, as is the DODB library for now, but keep in mind that this document is about the method more than the actual implementation: anyone could implement the exact same library in almost any other language.
Limitations are also clearly stated in a dedicated section.
A few experiments are described to provide an overview of the performance you can expect from this approach.
Finally, a conclusion is drawn based on a real-world usage of this library.
.
.SECTION How DODB works and basic usage
DODB is a hash table.
The key of the hash is an auto-incremented number and the value is the stored data.
The following section explains how to use DODB for basic cases, including the few added mechanisms to speed up searches.
Also, the file-system representation of the data will be presented since it enables easy off-application searches.
.SS Before starting: the example database
First things first, the following code is the structure used in the rest of the document to present the different aspects of DODB.
This is a simple object
.I Car ,
with a name, a color and a list of associated keywords (fast, elegant, etc.).
.SOURCE Ruby ps=10
class Car
property name : String
property color : String
property keywords : Array(String)
end
.SOURCE
.SS DODB basic usage
Let's create a DODB database for our cars.
.SOURCE Ruby ps=10
# Database creation
db = DODB::DataBase(Car).new "path/to/db-cars"
# Adding an element to the db
db << Car.new "Corvet GT", "red", ["elegant", "fast"]
# Reaching all objects in the db
db.each do |car|
pp! car
end
.SOURCE
.SS Storing data
When a value is added, it is serialized\*[*] and written to a dedicated file.
.FOOTNOTE1
Serialization is currently in JSON.
CBOR is a work-in-progress.
Nothing binds DODB to a particular format.
.FOOTNOTE2
The key of the hash is a number, auto-incremented, used as the name of the stored file.
The following example shows the content of the file system after adding three values.
.de TREE1
.QP
.KS
.ft CW
.b1
.nf
..
.de TREE2
.ft
.fi
.b2
.KE
.QE
..
.TREE1
$ tree storage/
storage
`-- data
  +-- 0000000000
  +-- 0000000001
  `-- 0000000002
.TREE2
In this example, the directory
.I storage/data
contains all three serialized values, with a formatted number as their file name.
.SS Indexes
Database entries can be
.I indexed
based on their attributes.
There are currently three main ways to search a value by its attributes: basic indexes, partitions and tags.
.SSS Basic indexes (1 to 1 relation)
Basic indexes represent one-to-one relations, such as an index in SQL.
For example, in a database of
.I cars ,
each car can have a dedicated (unique) name.
This
.I name
attribute can be used to speed-up searches.
On the file-system, this translates to:
.TREE1
storage
+-- data
|  `-- 0000000000
`-- indexes
   `-- by_name
   `-- Ford C-MAX -> ../../data/0000000000
.TREE2
As shown, the file "Ford C-MAX" is a symbolic link to a data file.
The name of the symlink has been extracted from the value itself, making it possible to list all the cars and their names with a simple
.UL ls
in the
.I storage/indexes/by_name/
directory.
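In code, a basic index is declared by naming it and providing a way to extract the indexed attribute from a value; it can then be queried directly.
The calls below follow the API used in the project's own benchmark (\f[CW]spec/benchmark-cars.cr\f[]); the car name is an example.
.SOURCE Ruby ps=10
# Declare a 1-1 index on the "name" attribute.
cars_by_name = db.new_index "name", &.name

# Retrieve a single car through its name.
car = cars_by_name.get "Ford C-MAX"
.SOURCE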
.TBD
.SECTION A few more options
.TBD
.SECTION Limits of DODB
.TBD
.SECTION Experimental scenario
.LP
The following experiment shows the performance of DODB based on querying durations.
Data can be searched via
.I indexes ,
as for SQL databases.
Three possible indexes exist in DODB:
(a) basic indexes, representing 1 to 1 relations, the document's attribute is related to a value and each value of this attribute is unique,
(b) partitions, representing 1 to n relations, the attribute has a value and this value can be shared by other documents,
(c) tags, representing n to n relations, enabling the attribute to have multiple values which may be shared by other documents.
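These three mechanisms are declared in the same way, by naming the index and providing the attribute to extract; the calls below are the ones used in the benchmark code (\f[CW]spec/benchmark-cars.cr\f[]).
.SOURCE Ruby ps=9 vs=9p
by_name    = db.new_index     "name",    &.name     # (a) basic index, 1-1
by_color   = db.new_partition "color",   &.color    # (b) partition, 1-n
by_keyword = db.new_tags      "keyword", &.keywords # (c) tags, n-n
.SOURCE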
The scenario is simple: add values to a database with indexes (basic, partitions and tags), then query a value 100 times through the different indexes.
Loop and repeat.
Four instances of DODB are tested:
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory) ;
.BULLET \fIuncached data but cached index\f[] shows the improvement you can expect by having a cache on indexes ;
.BULLET \fIcached database\f[] shows the most basic use of DODB\*[*] ;
.BULLET \fIRAM only\f[], the database doesn't have a representation on disk (no data is written on it).
The \fIRAM only\f[] instance shows a possible way to use DODB: to keep a consistent API to store data, including in-memory data with a lifetime related to the application's.
.ENDBULLET
.FOOTNOTE1
Having a cached database will probably be the most widespread use of DODB.
When memory isn't scarce, there is no point not using it to achieve better performance.
.FOOTNOTE2
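Declaring the cached and uncached variants only differs by the class or method name; the following is a minimal sketch using the constructors found in \f[CW]spec/benchmark-cars.cr\f[] (the storage paths are examples).
.SOURCE Ruby ps=9 vs=9p
# Cached database vs. uncached database.
db_cached   = DODB::CachedDataBase(Car).new "/tmp/db-cached"
db_uncached = DODB::DataBase(Car).new       "/tmp/db-uncached"

# Indexes also come in cached and uncached flavors.
by_name = db_uncached.new_index          "name", &.name # cached index
by_name = db_uncached.new_uncached_index "name", &.name # uncached index
.SOURCE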
The computer on which this test is performed\*[*] is an AMD PRO A10-8770E R7 (4 cores), 2.8 GHz. When mentioned, the
.I disk
is actually a
.I "temporary file-system (tmpfs)"
to enable maximum efficiency.
.FOOTNOTE1
A very simple $50 PC, bought online.
Nothing fancy.
.FOOTNOTE2
The library is written in Crystal and so is the benchmark (\f[CW]spec/benchmark-cars.cr\f[]).
Nonetheless, despite a few technicalities, the objective of this document is to provide an insight into the approach used in DODB rather than into this particular implementation.
The manipulated data type can be found in \f[CW]spec/db-cars.cr\f[].
.SOURCE Ruby ps=9 vs=9p
class Car
property name : String # 1-1 relation
property color : String # 1-n relation
property keywords : Array(String) # n-n relation
end
.SOURCE
.
.SS Basic indexes (1 to 1 relations)
.LP
An index enables matching a single value based on a small string.
In our example, each \f[CW]car\f[] has a unique \fIname\f[] which is used as an index.
The following graph represents the result of 100 queries of a car based on its name.
The experiment starts with a database containing 1,000 cars and goes up to 250,000 cars.
.so graph_query_index.grap
Since there is only one value to retrieve, the request is quick and time is almost constant.
When the value and the index are kept in memory (see \f[CW]RAM only\f[] and \f[CW]Cached db\f[]), the retrieval is almost instantaneous (about 50 to 120 ns).
In case the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db, cached index\f[]).
The request is a little longer when the index isn't cached (see \f[CW]Uncached db and index\f[]); in this case DODB walks the file-system to find the right symlink to follow, thus slowing the process even more, by up to 20%.
.TS
allbox tab(:);
c | lw(4.0i) | cew(1.4i).
DODB instance:Comment and database usage:T{
compared to RAM only
T}
RAM only:T{
Worst memory footprint (all data must be in memory), best performance.
T}:-
Cached db and index:T{
Performance for retrieving a value is the same as RAM only while
enabling the admin to manually search for data on-disk.
T}:about the same performance
Uncached db, cached index::300 to 400x slower
Uncached db and index:T{
Best memory footprint, worst performance.
T}:400 to 500x slower
.TE
.B Conclusion :
as expected, retrieving a single value is fast and the size of the database doesn't matter much.
Each deserialization and, more importantly, each disk access is a pain point.
Caching the value enables a massive performance gain, data can be retrieved several hundred times quicker.
.bp
.SS Partitions (1 to n relations)
.LP
.so graph_query_partition.grap
.bp
.SS Tags (n to n relations)
.LP
.so graph_query_tag.grap
.
.SECTION Future work
.TBD
.SECTION Conclusion
.TBD


@ -1,47 +0,0 @@
define boite {
xleft = $1
xright = $2
yup = $3
ydown = $4
line from xleft,ydown to xright,ydown
line from xleft,yup to xright,yup
line from xleft,yup to xleft,ydown
line from xright,yup to xright,ydown
}
define legend {
xleft = $1
xright = $2
yup = $3
ydown = $4
diffx = xright - xleft
diffy = yup - ydown
hdiff = diffy/4.3
cy = yup - (diffy/6)
cx = (diffx/20) + xleft
lstartx = cx
lendx = cx + diffx/8
tstartx = lendx + diffx/20
.gcolor red
line from lstartx,cy to lendx,cy
.gcolor
"RAM only" ljust at tstartx,cy
cy = cy - hdiff
line from lstartx,cy to lendx,cy
"Cached db and index" ljust at tstartx,cy
cy = cy - hdiff
.gcolor blue
line from lstartx,cy to lendx,cy
.gcolor
"Uncached db, cached index" ljust at tstartx,cy
cy = cy - hdiff
.gcolor green
line from lstartx,cy to lendx,cy
.gcolor
"Uncached db and index" ljust at tstartx,cy
}


@ -1,624 +0,0 @@
.\" .RP = report document
.nr PO 0.5i \" page offset default 1i
.nr LL 7.0i \" line length default 6i
.nr FM 0.3i \" page foot margin default 1i
.nr DI 0
.nr FF 3 \" footnotes' type: numbered, with point, indented
.nr PS 12
.
.nr LIST_NUMBER 0 +1
.
.R1
no-label-in-reference
accumulate
.R2
.
. \" COLORS
.defcolor darkgreen rgb 0.1 0.5 0.2
.defcolor darkblue rgb 0.3 0.3 0.7
.defcolor darkred rgb 0.7 0.3 0.3
.defcolor black rgb 0 0 0
.defcolor color_box rgb 1 1 .6
.
. \" with semantic
.defcolor citation rgb 0.4 0.4 0.4
.defcolor citationbar rgb 0.3 0.3 0.7
.defcolor explanation rgb 0.7 0.4 0.4
.defcolor explanationbar rgb 0.8 0.3 0.3
.
.defcolor specialcolor_command rgb 0.7 0.3 0.3
.defcolor specialcolor_type rgb 0.6 0.3 0.5
.defcolor specialcolor_constructor rgb 0.1 0.5 0.2
.defcolor specialcolor_module rgb 0.1 0.5 0.2
.defcolor specialcolor_function rgb 0.4 0.4 0.7
.defcolor specialcolor_question rgb 0.0 0.0 0.7
.defcolor specialcolor_operator rgb 0.3 0.8 0.3
.defcolor specialcolor_shine rgb 0.3 0.3 0.7
.
. \" SIZES
.nr specialsize_command 10
.nr specialsize_type 8
.nr specialsize_constructor 8
.nr specialsize_module 8
.nr specialsize_function 8
.nr specialsize_operator 9
.nr specialsize_question 10 \" Current point size, no change.
.nr specialsize_shine 11
.
. \" FONTS
.ds specialfont_command CW
.ds specialfont_type CW
.ds specialfont_constructor CW
.ds specialfont_module CW
.ds specialfont_function CW
.ds specialfont_operator CW
.ds specialfont_question I
.ds specialfont_shine B
.
.
.de BELLOWEXPLANATION1
.sp 0.5
.ps 7 \" point size (~= font size)
.vs 8p \" vertical spacing between lines
..
.de BELLOWEXPLANATION2
.br
.ps 9
.vs 11p
..
.
.\" BULLET and ENUM => do not add space when no parameters are provided
.de BULLET \" Bullet points
.IP \(bu 2
.ie '\\$1'' \
.
.el \\$*
..
.de ENDBULLET
.in -2 \" indent
..
.
.de ENUM \" Numbered list
.nr LIST_NUMBER +1
.IP \\n[LIST_NUMBER] 2
.ie '\\$1'' \
.
.el \\$*
..
.de ENDENUM
.nr LIST_NUMBER 0
.in -2 \" indent
..
.
.de b1 \" Begin code box
.B1
.sp 0.2
.ft CW
..
.de b2 \" End code box
.sp 0.5
.B2
.ft
..
.
.de CITATION1
.KS \" start a keep
.ft I \" citation in italics
.mk C \" set a marker for line drawing
.in +1 \" indent a bit
.gcolor citation
..
.ig
The CITATION2 macro closes the quote then draws a line
from current line to the start of the quote.
..
.de CITATION2
.mk D \" set second marker to come back here
.ft \" back to previous font
.in -1 \" remove indent
.gcolor \" remove previous color
.gcolor citationbar
.\" r = move upward
.\" Z D t = drawing thickness
.\" L = draw the line
\r\
\Z'\D't 1p''\
\L'|\\nCu' \" draw line
.gcolor black \" remove previous color
.sp -2 \" get two lines back
\Z'\D't 1'' \" get the previous drawing thickness back
.KE \" end of the keep
..
.
.de NAMECITATION
.QP
.vs -\\n[legendps]p
.ps -\\n[legendps]
.in -1.2
.ll +1.2
\h'|-2'\(em\h'|-0.4'
\\$*
.br
.LP
..
.
.de EXPLANATION1
.KS \" start a keep
.ft B \" citation in italics
.mk C \" set a marker for line drawing
.in +1 \" indent a bit
.gcolor explanation
..
.de EXPLANATION2
.ft \" back to previous font
.in -1 \" remove indent
.gcolor \" remove previous color
.gcolor explanationbar
\r\L'|\\nCu' \" draw line (\r moves upward, \L draw the line, ...)
.gcolor \" remove previous color
.sp -1 \" get two lines back
.KE \" end of the keep
..
.
.de METAINFO1
.ft CW \" constant width font
.ps 8 \" small font
.vs 9p \" smaller vertical spacing between lines
..
.de METAINFO2
.sp 1
.vs \" come back to the previous vertical spacing
.ps \" come back to the previous point size
.ft \" come back to the previous font
.sp -1 \" return one line above
..
.
.
.de FRAC
.ie '\\$3'' \{\
\v'-.7m\s[\\n(.s*6u/10u]+.7m'\\$1\v'-.7m\s0+.7m'\
\(f/\s[\\n(.s*6u/10u]\\$2\s0
\}
.el \{\
\v'-.7m\s[\\n(.s*6u/10u]+.7m'\\$1\v'-.7m\s0+.7m'\
\(f/\s[\\n(.s*6u/10u]\\$2\s0\\$3
\}
..
.de FOOTNOTE_TO_COLUMN_WIDTH
.nr pg@fn-colw \\n[pg@colw] \" footnotes' column width
..
.de SINGLE_COLUMN
.1C
.\" .FOOTNOTE_TO_COLUMN_WIDTH
.nr FL (\n[LL]*97/100)
..
.de TWO_COLUMNS
.2C
.FOOTNOTE_TO_COLUMN_WIDTH
..
.de HORIZONTALLINE
\l'15'
.FOOTNOTE_TO_COLUMN_WIDTH
..
.
. \" Fonts and colors.
.
.de SPECIAL_WORDS
.ie !'\\$3'' \\$3\c
.nr current_size \\n[.s] \" Current point size.
.gcolor specialcolor_\\*[semantictoken]
.
.if !((\\n[current_size] == \\n[specialsize_\\*[semantictoken]]) \
.ps \\n[specialsize_\\*[semantictoken]]
.
.ie '\\$2'' \{\
\f[\\*[specialfont_\\*[semantictoken]]]\\$1\f[]
. ps \\n[current_size]
. gcolor black \" FIXME: should be the previous color
\}
.el \{\
\f[\\*[specialfont_\\*[semantictoken]]]\\$1\f[]\c
. ps \\n[current_size]
. gcolor black \" FIXME: should be the previous color
\\$2
\}
..
.de SMALLFONT
.ps 8
.vs 9p
..
.de NORMALFONT
.vs
.ps
..
.de COMMAND1
.b1
..
.de COMMAND2
.b2
..
.de COMMANDNAME
.ds semantictoken command
.SPECIAL_WORDS \\$@
..
.de FUNCTION
.ds semantictoken function
.SPECIAL_WORDS \\$@
..
.de TYPE
.ds semantictoken type
.SPECIAL_WORDS \\$@
..
.de TYPECLASS
.I "\\$1" "\\$2"
..
.de OPERATOR
.ds semantictoken operator
.SPECIAL_WORDS \\$@
..
.de QUESTION
.ds semantictoken question
.SPECIAL_WORDS \\$@
\h'5p'
..
.de CONSTRUCTOR
.ds semantictoken constructor
.SPECIAL_WORDS \\$@
..
.de MODULE
.ds semantictoken module
.SPECIAL_WORDS \\$@
..
.de SHINE
.ds semantictoken shine
.SPECIAL_WORDS \\$@
..
.de MODULEX
.MODULE \\$1 ,
..
.de TBD
.ft B
To be defined or to finish.
.ft R
..
.de ARROW
.br
\(->\h'5p' \\$*
..
.af dy 00
.af mo 00
.ds CURRENT_DATE \\n(dy/\\n(mo/\\n[year]
.ds WEBSITE https://t.karchnu.fr/doc
.ds EMAIL karchnu@karchnu.fr
.de INFORMATIONS
Check out for newer versions:
.ft CW
.ps 8
\h'2p' \\$1
.ps
.ft
.br
And if you have questions:
.ft CW
\h'13p' \\$2
.ft
.\" .DE
.LP
Lastly compiled the
.SHINE \*[CURRENT_DATE]
(day/month/year, you know, like in any sane civilization).
..
.de INFORMATIONS_FR
.LP
Nouvelles versions :
.ft CW
.ps 8
\h'2p' \\$1
.ps
.ft
.br
Questions :
.ft CW
\h'36p' \\$2
.ft
.\" .DE
.LP
Compilé pour la dernière fois le
.SHINE \*[CURRENT_DATE]
..
.
.\" RENAMING REQUESTS
.
.de SECTION
.NH
.ps +3
.fam H \" helvetica family
\\$*
.fam \" back to previous font family
.ps
.PARAGRAPH_INDENTED
..
.de SUBSECTION
.NH 2
.ps +1
.fam H \" helvetica family
\\$*
.fam \" back to previous font family
.ps
.PARAGRAPH_INDENTED
..
.de SUBSUBSECTION
.NH 3
.fam H \" helvetica family
\\$*
.fam \" back to previous font family
.ps
.PARAGRAPH_INDENTED
..
.de SUBSUBSUBSECTION
.NH 4
.fam H \" helvetica family
\\$*
.fam \" back to previous font family
.PARAGRAPH_INDENTED
..
.de SECTION_NO_NUMBER
.SH
.fam H \" helvetica family
\\$*
.fam \" back to previous font family
.PARAGRAPH_INDENTED
..
.de SUBSECTION_NO_NUMBER
.SH 2
.fam H \" helvetica family
\\$*
.fam \" back to previous font family
.PARAGRAPH_INDENTED
..
.de PARAGRAPH_INDENTED
.PP
..
.de PARAGRAPH_UNINDENTED
.LP
..
.de NO_ABSTRACT
.AB no
..
.de ABSTRACT1
.AB
..
.de ABSTRACT2
.AE
..
.ds CH Page %
.de TITLE
.TL
\\$*
.ds LH \\$*
.de HD .XX
.sp -2.3
.nr LINEWIDTH (\n[LL]/1.0i)
\l'\\\\n[LINEWIDTH]i'
.sp +1.5
.br
..XX
..
.de AUTHOR
. AU
. ie !'\\$1'' \\$*
..
.de FOOTNOTE1
. FS
..
.de FOOTNOTE2
. FE
..
.de VOCABULARY1
. KS
. BULLET
. UL "\\$*" :
..
.de VOCABULARY2
. KE
..
.
.
.de HIGHLIGHT
.
. nr @wd \w'\\$1'
. nr x1 0
. nr y1 (\\n[rst]u - \\n[rsb]u + .4m)
. nr x2 (\\n[@wd]u + .4m)
. nr y2 0
. nr x3 0
. nr y3 (\\n[rst]u - \\n[rsb]u + .4m)
. nr x4 (\\n[@wd]u + .4m)
. nr y4 0
.
\h'.2m'\
\h'-.2m'\v'(.2m - \\n[rsb]u)'\
\M[color_box]\
\D'P \\n[x1] -\\n[y1]u \\n[x2]u \\n[y2]u \\n[x3]u \\n[y3]u -\\n[x4]u \\n[y4]u '\
\h'.2m'\v'-(.2m - \\n[rsb]u)'\
\M[]\
\\$1\
\h'.2m'
..
.
.
.
.ds SPACE_SS_NUMBER_TITLE 0.5\" not a number register because of leading 0
.nr CURRENT_SECTION 0 +1
.nr CURRENT_APPENDIX 0
.af CURRENT_APPENDIX I
.nr CURRENT_SUBSECTION 0 +1
.nr CURRENT_SSSECTION 0 +1
.rm SECTION
.de SECTION
. nr CURRENT_SUBSECTION 0 \" reset current subsection numbering
. nr CURRENT_SSSECTION 0 \" reset current subsubsection numbering
. ie !(\\n[CURRENT_SECTION]=0) .sp +1
. br
. ie (\\n[APPENDIX_TIME]=0) \
. ds RH \\n+[CURRENT_SECTION].\h'\\*[SPACE_SS_NUMBER_TITLE]' \\$*
. el \{
. ds RH \\n[CURRENT_APPENDIX].\h'\\*[SPACE_SS_NUMBER_TITLE]' \\$*
. bp \}
. ps +2
. fam H \" helvetica family
. ft B
. ne 4 \" should be at least a few lines left at the bottom of the page
\\*[RH]
. ft
. fam \" back to previous font family
. ps -2
. PARAGRAPH_INDENTED
..
.nr APPENDIX_TIME 0
.de APPENDIX
. nr CURRENT_APPENDIX +1
. nr APPENDIX_TIME 1
. SECTION \\$*
..
.de SS
. nr CURRENT_SSSECTION 0
. ie (\\n[APPENDIX_TIME]=0) \
. SUBSECTION_NO_NUMBER \\n[CURRENT_SECTION].\
\\n+[CURRENT_SUBSECTION]\h'\\*[SPACE_SS_NUMBER_TITLE]' \\$*
.el \
. SUBSECTION_NO_NUMBER \\n[CURRENT_APPENDIX].\
\\n+[CURRENT_SUBSECTION]\h'\\*[SPACE_SS_NUMBER_TITLE]' \\$*
..
.de SSS
. br
. ps -2
. fam H \" helvetica family
. ft B
. ie (\\n[APPENDIX_TIME]=0) \
. SUBSECTION_NO_NUMBER \\n[CURRENT_SECTION].\
\\n[CURRENT_SUBSECTION].\\n+[CURRENT_SSSECTION]\h'\
\\*[SPACE_SS_NUMBER_TITLE]' \\$*
. el \
\\n[CURRENT_APPENDIX].\
\\n[CURRENT_SUBSECTION].\\n+[CURRENT_SSSECTION]\h'\
\\*[SPACE_SS_NUMBER_TITLE]' \\$*
. ft
. fam \" back to previous font family
. ps +2
. PARAGRAPH_INDENTED
..
.de INNERBULLET
. in +1
. br
\(bu
. in +1
. sp -1
\\$*
. in -2
..
.de EENUM \" Numbered list
. nr ENUM_INDENTATION 2
. ie !(\\n[LIST_NUMBER]=0) .in -\\n[ENUM_INDENTATION]
. br
\\n+[LIST_NUMBER].
. in +\\n[ENUM_INDENTATION]
. sp -1
\\$*
..
.de EENDENUM
. nr LIST_NUMBER 0
. in -\\n[ENUM_INDENTATION]
..
.nr legendps 2
.de LEGEND1
. QP
. vs -\\n[legendps]p
. ps -\\n[legendps]
. in -1.2
. ll +1.2
. br
..
.de LEGEND2
. br
. vs +\\n[legendps]p
. ps +\\n[legendps]
. br
. LP
..
.de IEME
\\$1\u\s-4\\$2\s+4\d
..
.de CENTERED
. ce
\\$*
. br
..
.de GIVEEXAMPLE1
. in +1
. ll -1
. KS \" start a keep
. \" .ft I \" citation in italics
. mk C \" set a marker for line drawing
. in +1 \" indent a bit
. gcolor citation
..
.de GIVEEXAMPLE2
. mk D \" set second marker to come back here
. \" .ft \" back to previous font
. in -1 \" remove indent
. gcolor black\" remove previous color
. gcolor citationbar
. \" r = move upward
. \" Z D t = drawing thickness
. \" L = draw the line
\r\
\Z'\D't 1p''\
\L'|\\nCu' \" draw line
. gcolor black \" remove previous color
. sp -2 \" get two lines back
\Z'\D't 0.5p'' \" get the previous drawing thickness back
. KE \" end of the keep
. ll +1
. in -1
..
.de ST
.nr ww \w'\\$1'
\Z@\v'-.25m'\l'\\n[ww]u'@\\$1
..
.de INCREMENT
.br
.in \\*[PINCREMENT]
.br
\h'-\\*[DECALAGE]'\\*[CHARACTER]\h'|0'\\$*
..
.de D
.ds DECALAGE 1.0
.ds PINCREMENT 2
.ds CHARACTER \(bu
.INCREMENT \\$*
..
.de DD
.ds DECALAGE 1.0
.ds PINCREMENT 3
.ds CHARACTER \(bu
.INCREMENT \\$*
..
.de AA
.ds DECALAGE 1.5
.ds PINCREMENT 3
.ds CHARACTER \(->
.INCREMENT \\$*
..
.de AAA
.ds DECALAGE 1.5
.ds PINCREMENT 4
.ds CHARACTER \(->
.INCREMENT \\$*
..
.de ED
.br
.in 0
..


@ -1,5 +1,5 @@
name: dodb
version: 0.3.0
version: 0.2.2
authors:
- Luka Vandervelden <lukc@upyum.com>
@ -8,4 +8,9 @@ authors:
description: |
Simple, embeddable Document-Oriented DataBase in Crystal.
dependencies:
cbor:
branch: master
git: https://git.baguette.netlib.re/Baguette/crystal-cbor
license: MIT


@ -1,181 +0,0 @@
require "benchmark"
require "./benchmark-utilities.cr"
require "../src/dodb.cr"
require "./test-data.cr"
class DODBCachedCars < DODB::CachedDataBase(Car)
property storage_dir : String
def initialize(storage_ext = "", remove_previous_data = true)
@storage_dir = "test-storage-cars-cached#{storage_ext}"
if remove_previous_data
::FileUtils.rm_rf storage_dir
end
super storage_dir
end
def rm_storage_dir
::FileUtils.rm_rf @storage_dir
end
end
class DODBUnCachedCars < DODB::DataBase(Car)
property storage_dir : String
def initialize(storage_ext = "", remove_previous_data = true)
@storage_dir = "test-storage-cars-uncached#{storage_ext}"
if remove_previous_data
::FileUtils.rm_rf storage_dir
end
super storage_dir
end
def rm_storage_dir
::FileUtils.rm_rf @storage_dir
end
end
class DODBSemiCachedCars < DODB::DataBase(Car)
property storage_dir : String
def initialize(storage_ext = "", remove_previous_data = true)
@storage_dir = "test-storage-cars-semi#{storage_ext}"
if remove_previous_data
::FileUtils.rm_rf storage_dir
end
super storage_dir
end
def rm_storage_dir
::FileUtils.rm_rf @storage_dir
end
end
def init_indexes(storage : DODB::Storage)
n = storage.new_index "name", &.name
c = storage.new_partition "color", &.color
k = storage.new_tags "keyword", &.keywords
return n, c, k
end
def init_uncached_indexes(storage : DODB::Storage)
n = storage.new_uncached_index "name", &.name
c = storage.new_uncached_partition "color", &.color
k = storage.new_uncached_tags "keyword", &.keywords
return n, c, k
end
def add_cars(storage : DODB::Storage, nb_iterations : Int32)
i = 0
car1 = Car.new "Corvet", "red", [ "shiny", "impressive", "fast", "elegant" ]
car2 = Car.new "Bullet-GT", "blue", [ "shiny", "fast", "expensive" ]
car3 = Car.new "Deudeuche", "beige", [ "curvy", "sublime" ]
car4 = Car.new "Ford-5", "red", [ "unknown" ]
car5 = Car.new "C-MAX", "gray", [ "spacious", "affordable" ]
while i < nb_iterations
car1.name = "Corvet-#{i}"
car2.name = "Bullet-GT-#{i}"
car3.name = "Deudeuche-#{i}"
car4.name = "Ford-5-#{i}"
car5.name = "C-MAX-#{i}"
storage << car1
storage << car2
storage << car3
storage << car4
storage << car5
i += 1
STDOUT.write "\radding value #{i}".to_slice
end
puts ""
end
cars_cached = DODBCachedCars.new
cars_uncached = DODBUnCachedCars.new
cars_semi = DODBSemiCachedCars.new
cached_searchby_name, cached_searchby_color, cached_searchby_keywords = init_indexes cars_cached
uncached_searchby_name, uncached_searchby_color, uncached_searchby_keywords = init_uncached_indexes cars_uncached
semi_searchby_name, semi_searchby_color, semi_searchby_keywords = init_indexes cars_semi
add_cars cars_cached, 1_000
add_cars cars_uncached, 1_000
add_cars cars_semi, 1_000
# Searching for data with an index.
Benchmark.ips do |x|
x.report("(cars db) searching a data with an index (with a cache)") do
corvet = cached_searchby_name.get "Corvet-500"
end
x.report("(cars db) searching a data with an index (semi: cache is only on index)") do
corvet = semi_searchby_name.get "Corvet-500"
end
x.report("(cars db) searching a data with an index (without a cache)") do
corvet = uncached_searchby_name.get "Corvet-500"
end
end
# Searching for data with a partition.
Benchmark.ips do |x|
x.report("(cars db) searching a data with a partition (with a cache)") do
red_cars = cached_searchby_color.get "red"
end
x.report("(cars db) searching a data with a partition (semi: cache is only on partition)") do
red_cars = semi_searchby_color.get "red"
end
x.report("(cars db) searching a data with a partition (without a cache)") do
red_cars = uncached_searchby_color.get "red"
end
end
# Searching for data with a tag.
Benchmark.ips do |x|
x.report("(cars db) searching a data with a tag (with a cache)") do
red_cars = cached_searchby_keywords.get "spacious"
end
x.report("(cars db) searching a data with a tag (semi: cache is only on tags)") do
red_cars = semi_searchby_keywords.get "spacious"
end
x.report("(cars db) searching a data with a tag (without a cache)") do
red_cars = uncached_searchby_keywords.get "spacious"
end
end
cars_cached.rm_storage_dir
cars_uncached.rm_storage_dir
cars_cached = DODBCachedCars.new
cars_uncached = DODBUnCachedCars.new
#init_indexes cars_cached
#init_indexes cars_uncached
cached_searchby_name, cached_searchby_color, cached_searchby_keywords = init_indexes cars_cached
uncached_searchby_name, uncached_searchby_color, uncached_searchby_keywords = init_uncached_indexes cars_uncached
add_cars cars_cached, 1_000
add_cars cars_uncached, 1_000
nb_run = 1000
perform_benchmark_average_verbose "(cached) search db with an index", nb_run do
cached_searchby_name.get "Corvet-500"
end
perform_benchmark_average_verbose "(uncached) search db with an index", nb_run do
uncached_searchby_name.get "Corvet-500"
end
cars_cached.rm_storage_dir
cars_uncached.rm_storage_dir
cars_semi.rm_storage_dir


@ -1,32 +0,0 @@
def perform_something(&block)
start = Time.monotonic
yield
Time.monotonic - start
end
def perform_benchmark_average(ntimes : Int32, &block)
i = 1
sum = Time::Span.zero
while i <= ntimes
elapsed_time = perform_something &block
sum += elapsed_time
i += 1
end
sum / ntimes
end
def perform_benchmark_average_verbose(title : String, ntimes : Int32, &block)
i = 1
sum = Time::Span.zero
puts "Execute '#{title}' × #{ntimes}"
while i <= ntimes
elapsed_time = perform_something &block
sum += elapsed_time
STDOUT.write "\relapsed_time: #{elapsed_time}, average: #{sum/i}".to_slice
i += 1
end
puts ""
puts "Average: #{sum/ntimes}"
end
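The helpers above accumulate wall-clock time over `ntimes` runs and report the mean. A minimal Ruby sketch of the same averaging logic (the project itself is Crystal; the names mirror the helpers above for illustration only):

```ruby
# Ruby sketch of the Crystal benchmark helpers above: time a block once
# with a monotonic clock, then average the elapsed time over `ntimes` runs.
def perform_something
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

def perform_benchmark_average(ntimes, &block)
  sum = 0.0
  ntimes.times { sum += perform_something(&block) }
  sum / ntimes
end

avg = perform_benchmark_average(5) { (1..10_000).sum }
puts "average: #{avg}s"
```

A monotonic clock is used rather than `Time.now` so that NTP adjustments can't produce negative elapsed times.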

View File

@ -1,10 +1,10 @@
require "uuid"
require "json"
require "cbor"
# FIXME: Split the test data into separate files. We don't care about those here.
class Ship
include JSON::Serializable
include CBOR::Serializable
def_clone
@ -65,7 +65,7 @@ end
# This will be used for migration testing, but basically it's a variant of
# the class above, with a few extra fields and a few missing ones.
class PrimitiveShip
include JSON::Serializable
include CBOR::Serializable
property id : String
property name : String
@ -85,24 +85,3 @@ class PrimitiveShip
@@asakaze
]
end
class Car
include JSON::Serializable
property name : String # unique to each instance (1-1 relations)
property color : String # a simple attribute (1-n relations)
property keywords : Array(String) # tags about a car, example: "shiny" (n-n relations)
def_clone
def initialize(@name, @color, @keywords)
end
class_getter cars = [
Car.new("Corvet", "red", [ "shiny", "impressive", "fast", "elegant" ]),
Car.new("SUV", "red", [ "solid", "impressive" ]),
Car.new("Mustang", "red", [ "shiny", "impressive", "elegant" ]),
Car.new("Bullet-GT", "red", [ "shiny", "impressive", "fast", "elegant" ]),
Car.new("GTI", "blue", [ "average" ]),
Car.new("Deudeuch", "violet", [ "dirty", "slow", "only French will understand" ])
]
end

View File

@ -277,7 +277,7 @@ describe "DODB::DataBase" do
end
# Removing the “flagship” tag, brace for impact.
flagship, index = db_ships_by_tags.get_with_indice("flagship")[0]
flagship, index = db_ships_by_tags.get_with_indices("flagship")[0]
flagship.tags = [] of String
db[index] = flagship

View File

@ -1,5 +1,5 @@
require "file_utils"
require "json"
require "cbor"
class Hash(K,V)
def reverse
@ -44,11 +44,9 @@ class DODB::CachedDataBase(V) < DODB::Storage(V)
# FIXME: rescues any error the same way.
return nil
end
# WARNING: data isn't cloned.
# You have to do it yourself in case you modify any value,
# otherwise you may encounter problems (at least with indexes).
def [](key : Int32) : V
# raise MissingEntry.new(key) unless ::File.exists? file_path key
# read file_path key
@data[key] rescue raise MissingEntry.new(key)
end
@ -60,12 +58,12 @@ class DODB::CachedDataBase(V) < DODB::Storage(V)
# Removes any old indices or partitions pointing to a value about
# to be replaced.
if old_value
remove_indexes index, old_value
remove_partitions index, old_value
end
# Avoids corruption in case the application crashes while writing.
file_path(index).tap do |path|
::File.write "#{path}.new", value.to_json
::File.write "#{path}.new", value.to_cbor
::FileUtils.mv "#{path}.new", path
end
@ -102,7 +100,7 @@ class DODB::CachedDataBase(V) < DODB::Storage(V)
# FIXME: Only intercept “no such file” errors
end
remove_indexes key, value
remove_partitions key, value
@data.delete key
value

View File

@ -1,8 +1,12 @@
require "file_utils"
require "json"
require "cbor"
require "./dodb/*"
module DODB
class_property file_extension = ".cbor"
end
abstract class DODB::Storage(V)
property directory_name : String
@ -112,24 +116,12 @@ abstract class DODB::Storage(V)
##
# name is the name that will be used on the file system.
def new_index(name : String, &block : Proc(V, String))
CachedIndex(V).new(self, @directory_name, name, block).tap do |indexer|
@indexers << indexer
end
end
def new_nilable_index(name : String, &block : Proc(V, String | DODB::NoIndex))
CachedIndex(V).new(self, @directory_name, name, block).tap do |indexer|
@indexers << indexer
end
end
def new_uncached_index(name : String, &block : Proc(V, String))
Index(V).new(self, @directory_name, name, block).tap do |indexer|
@indexers << indexer
end
end
def new_nilable_uncached_index(name : String, &block : Proc(V, String | DODB::NoIndex))
def new_nilable_index(name : String, &block : Proc(V, String | DODB::NoIndex))
Index(V).new(self, @directory_name, name, block).tap do |indexer|
@indexers << indexer
end
@ -144,12 +136,6 @@ abstract class DODB::Storage(V)
##
# name is the name that will be used on the file system.
def new_partition(name : String, &block : Proc(V, String))
CachedPartition(V).new(self, @directory_name, name, block).tap do |table|
@indexers << table
end
end
def new_uncached_partition(name : String, &block : Proc(V, String))
Partition(V).new(self, @directory_name, name, block).tap do |table|
@indexers << table
end
@ -166,21 +152,15 @@ abstract class DODB::Storage(V)
end
def new_tags(name : String, &block : Proc(V, Array(String)))
CachedTags(V).new(self, @directory_name, name, block).tap do |tags|
@indexers << tags
end
end
def new_uncached_tags(name : String, &block : Proc(V, Array(String)))
Tags(V).new(self, @directory_name, name, block).tap do |tags|
Tags(V).new(@directory_name, name, block).tap do |tags|
@indexers << tags
end
end
def get_tags(name, key : String)
tag = @indexers.find &.name.==(name)
partition = @indexers.find &.name.==(name)
tag.not_nil!.as(DODB::Tags).get name, key
partition.not_nil!.as(DODB::Tags).get name, key
end
def new_directed_graph(name : String, index : DODB::Index(V), &block : Proc(V, Array(String))) : DirectedGraph(V)
@ -224,7 +204,7 @@ abstract class DODB::Storage(V)
end
private def file_path(key : Int32)
"#{data_path}/%010i" % key
"#{data_path}/%010i#{DODB.file_extension}" % key
end
private def locks_directory : String
@ -240,7 +220,7 @@ abstract class DODB::Storage(V)
end
private def read(file_path : String)
V.from_json ::File.read file_path
V.from_cbor ::File.read(file_path).to_slice
end
private def remove_data!
@ -270,7 +250,7 @@ abstract class DODB::Storage(V)
end
end
def remove_indexes(key : Int32, value : V)
def remove_partitions(key : Int32, value : V)
@indexers.each &.deindex(stringify_key(key), value)
end
@ -313,12 +293,12 @@ class DODB::DataBase(V) < DODB::Storage(V)
# Removes any old indices or partitions pointing to a value about
# to be replaced.
if old_value
remove_indexes index, old_value
remove_partitions index, old_value
end
# Avoids corruption in case the application crashes while writing.
file_path(index).tap do |path|
::File.write "#{path}.new", value.to_json
::File.write "#{path}.new", value.to_cbor
::FileUtils.mv "#{path}.new", path
end
@ -336,10 +316,11 @@ class DODB::DataBase(V) < DODB::Storage(V)
begin
::File.delete file_path key
rescue File::NotFoundError
rescue
# FIXME: Only intercept “no such file” errors
end
remove_indexes key, value
remove_partitions key, value
value
end
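The `"#{path}.new"` write followed by `mv` in both `[]=` implementations is the standard crash-safe write pattern: `rename(2)` is atomic on POSIX when source and destination live on the same filesystem. A small Ruby sketch of the same idea (file name and payload are illustrative, not from the repo):

```ruby
# Write-to-temp-then-rename: a reader always sees either the complete
# old file or the complete new one, never a torn partial write.
require "fileutils"
require "json"
require "tmpdir"

def atomic_write(path, bytes)
  tmp = "#{path}.new"
  File.write(tmp, bytes)
  FileUtils.mv(tmp, path)  # atomic replace on the same filesystem
end

path = File.join(Dir.tmpdir, "dodb-demo-#{Process.pid}.json")
atomic_write(path, JSON.generate({ "name" => "Corvet" }))
```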

View File

@ -1,10 +1,8 @@
require "file_utils"
require "json"
require "cbor"
require "./indexer.cr"
# WARNING: this code hasn't been reviewed or used in years.
class DODB::DirectedGraph(V) < DODB::Indexer(V)
property name : String
property key_proc : Proc(V, Array(String))
@ -80,7 +78,7 @@ class DODB::DirectedGraph(V) < DODB::Indexer(V)
return r_value unless Dir.exists? incoming_links_directory
Dir.each_child incoming_links_directory do |child|
r_value << V.from_json ::File.read "#{incoming_links_directory}/#{child}"
r_value << V.from_cbor ::File.read("#{incoming_links_directory}/#{child}").to_slice
end
r_value
@ -93,7 +91,7 @@ class DODB::DirectedGraph(V) < DODB::Indexer(V)
return r_value unless Dir.exists? incoming_links_directory
Dir.each_child incoming_links_directory do |child|
r_value << child.sub /.json$/, ""
r_value << child.sub /#{DODB.file_extension}$/, ""
end
r_value
@ -109,7 +107,7 @@ class DODB::DirectedGraph(V) < DODB::Indexer(V)
return r_value unless Dir.exists? outgoing_links_directory
Dir.each_child outgoing_links_directory do |child|
r_value << V.from_json ::File.read "#{outgoing_links_directory}/#{child}"
r_value << V.from_cbor ::File.read("#{outgoing_links_directory}/#{child}").to_slice
end
r_value
@ -122,7 +120,7 @@ class DODB::DirectedGraph(V) < DODB::Indexer(V)
return r_value unless Dir.exists? outgoing_links_directory
Dir.each_child outgoing_links_directory do |child|
r_value << child.sub /.json$/, ""
r_value << child.sub /#{DODB.file_extension}$/, ""
end
r_value
@ -134,7 +132,7 @@ class DODB::DirectedGraph(V) < DODB::Indexer(V)
private def get_key(path : String) : Int32
::File.readlink(path)
.sub(/\.json$/, "")
.sub(/#{DODB.file_extension}$/, "")
.sub(/^.*\//, "")
.to_i
end
@ -144,7 +142,7 @@ class DODB::DirectedGraph(V) < DODB::Indexer(V)
end
private def get_node_symlink(node : String, key : String)
"#{indexing_directory node}/#{key}.json"
"#{indexing_directory node}/#{key}#{DODB.file_extension}"
end
private def get_outgoing_links_directory(node)
@ -164,13 +162,13 @@ class DODB::DirectedGraph(V) < DODB::Indexer(V)
end
private def get_data_symlink(key : String)
"../../../../data/#{key}.json"
"../../../../data/#{key}#{DODB.file_extension}"
end
# Roughly matches Index#file_path_index, but works if @storage_root
# is an absolute path as well.
private def get_cross_index_data_symlink(node : String)
"../../../../indices/by_#{@index.name}/#{node}.json"
"../../../../indices/by_#{@index.name}/#{node}#{DODB.file_extension}"
end
end

View File

@ -1,4 +1,5 @@
require "file_utils"
require "cbor"
require "./exceptions.cr"
require "./indexer.cr"
@ -26,7 +27,7 @@ class DODB::Index(V) < DODB::Indexer(V)
return if symlink == file_path_index old_key.to_s
end
raise IndexOverload.new "index '#{@name}' is overloaded for key '#{key}', file #{symlink} exists"
raise IndexOverload.new "index '#{@name}' is overloaded for key '#{key}'"
end
end
@ -54,20 +55,15 @@ class DODB::Index(V) < DODB::Indexer(V)
symlink = file_path_index index_key
begin
::File.delete symlink
rescue File::NotFoundError
end
end
# Get the key (ex: 343) for an entry in the DB.
# Without caching, it translates to walking the file-system in `db/indices/by_#{name}/<index>`.
def get_key(index : String) : Int32
get_key_on_fs index
::File.delete symlink
end
def get(index : String) : V
@storage[get_key index]
file_path = file_path_index index
raise MissingEntry.new(@name, index) unless ::File.exists? file_path
V.from_cbor ::File.read(file_path).to_slice
end
def get?(index : String) : V?
@ -94,12 +90,15 @@ class DODB::Index(V) < DODB::Indexer(V)
yield nil
end
def get_key_on_fs(index : String) : Int32
def get_key(index : String) : Int32
file_path = file_path_index index
raise MissingEntry.new(@name, index) unless ::File.exists? file_path
::File.readlink(file_path).sub(/^.*\//, "").to_i
::File.readlink(file_path)
.sub(/#{DODB.file_extension}$/, "")
.sub(/^.*\//, "")
.to_i
end
def get_with_key(index : String) : Tuple(V, Int32)
@ -120,7 +119,7 @@ class DODB::Index(V) < DODB::Indexer(V)
end
def update(index : String, new_value : V)
key = get_key index
_, key = get_with_key index
@storage[key] = new_value
end
@ -143,60 +142,11 @@ class DODB::Index(V) < DODB::Indexer(V)
# FIXME: Now that it's being used outside of this class, name it properly.
def file_path_index(index_key : String)
"#{indexing_directory}/#{index_key}"
"#{indexing_directory}/#{index_key}#{DODB.file_extension}"
end
private def get_data_symlink_index(key : String)
"../../data/#{key}"
"../../data/#{key}#{DODB.file_extension}"
end
end
class DODB::CachedIndex(V) < DODB::Index(V)
# This hash contains the relation between the index key and the data key.
property data = Hash(String, Int32).new
def check!(key, value, old_value)
index_key = key_proc.call value
# FIXME: Check its not pointing to “old_value”, if any, before raising.
if data[index_key]?
if old_value
old_key = key_proc.call old_value
return if index_key == old_key
end
raise IndexOverload.new "index '#{@name}' is overloaded for key '#{key}'"
end
end
def index(key, value)
super(key, value)
index_key = key_proc.call value
return if index_key.is_a? NoIndex
@data[index_key] = key.to_i
end
def deindex(key, value)
super(key, value)
index_key = key_proc.call value
return if index_key.is_a? NoIndex
@data.delete index_key
end
# Get the key (ex: 343) for an entry in the DB.
# With caching, the key is probably stored in a hash, or we'll search in the FS.
def get_key(index : String) : Int32
if k = @data[index]?
k
elsif k = get_key_on_fs(index)
@data[index] = k
k
else
raise MissingEntry.new(@name, index)
end
end
end

9
src/dodb/lib_c.cr Normal file
View File

@ -0,0 +1,9 @@
lib LibC
{% if flag?(:linux) %}
O_EXCL = 0o200
{% elsif flag?(:openbsd) %}
O_EXCL = 0x0800
{% end %}
end

View File

@ -1,4 +1,5 @@
require "file_utils"
require "cbor"
require "./indexer.cr"
@ -7,7 +8,6 @@ class DODB::Partition(V) < DODB::Indexer(V)
property key_proc : Proc(V, String)
getter storage_root : String
# Required to remove an entry in the DB.
@storage : DODB::Storage(V)
def initialize(@storage, @storage_root, @name, @key_proc)
@ -36,13 +36,10 @@ class DODB::Partition(V) < DODB::Indexer(V)
symlink = get_partition_symlink(partition, key)
begin
::File.delete symlink
rescue File::NotFoundError
end
::File.delete symlink
end
def get(partition) : Array(V)
def get(partition)
r_value = Array(V).new
partition_directory = indexing_directory partition
@ -50,18 +47,12 @@ class DODB::Partition(V) < DODB::Indexer(V)
return r_value unless Dir.exists? partition_directory
Dir.each_child partition_directory do |child|
r_value << @storage[get_key child]
r_value << V.from_cbor ::File.read("#{partition_directory}/#{child}").to_slice
end
r_value
end
def get?(partition) : Array(V)?
get partition
rescue MissingEntry
nil
end
def delete(partition)
delete partition, do true end
end
@ -72,10 +63,12 @@ class DODB::Partition(V) < DODB::Indexer(V)
return unless Dir.exists? partition_directory
Dir.each_child partition_directory do |child|
key = get_key child
item = @storage[key]
path = "#{partition_directory}/#{child}"
item = V.from_cbor ::File.read(path).to_slice
if yield item
key = get_key path
@storage.delete key
end
end
@ -86,7 +79,10 @@ class DODB::Partition(V) < DODB::Indexer(V)
end
private def get_key(path : String) : Int32
path.sub(/^.*\//, "").to_i
::File.readlink(path)
.sub(/#{DODB.file_extension}$/, "")
.sub(/^.*\//, "")
.to_i
end
private def indexing_directory(partition)
@ -94,60 +90,11 @@ class DODB::Partition(V) < DODB::Indexer(V)
end
private def get_partition_symlink(partition : String, key : String)
"#{indexing_directory partition}/#{key}"
"#{indexing_directory partition}/#{key}#{DODB.file_extension}"
end
private def get_data_symlink(key : String)
"../../../data/#{key}"
"../../../data/#{key}#{DODB.file_extension}"
end
end
class DODB::CachedPartition(V) < DODB::Partition(V)
# This hash contains the relation between the index key and the data keys.
property data = Hash(String, Array(Int32)).new
def index(key, value)
super(key, value)
partition = key_proc.call value
array = if v = @data[partition]?
v
else
Array(Int32).new
end
array << key.to_i
@data[partition] = array
end
def deindex(key, value)
super(key, value)
partition = key_proc.call value
if v = @data[partition]?
v.delete key.to_i
@data[partition] = v
end
end
def get(partition)
r_value = Array(Tuple(V, Int32)).new
if keys = @data[partition]?
keys.each do |data_key|
r_value << { @storage[data_key], data_key }
end
else
# Get the key from the database representation on the file-system.
partition_directory = indexing_directory partition
raise MissingEntry.new(@name, partition) unless Dir.exists? partition_directory
Dir.each_child partition_directory do |child|
r_value << { @storage[get_key child], get_key child }
end
@data[partition] = r_value.map &.[1]
end
r_value.map &.[0]
end
end

View File

@ -1,174 +1,113 @@
require "file_utils"
require "cbor"
class DODB::Tags(V) < DODB::Indexer(V)
property name : String
property key_proc : Proc(V, Array(String))
getter storage_root : String
# Required to remove an entry in the DB.
@storage : DODB::Storage(V)
def initialize(@storage, @storage_root, @name, @key_proc)
def initialize(@storage_root, @name, @key_proc)
::Dir.mkdir_p indexing_directory
end
# FIXME: The slowdown is damn too high.
def tag_combinations(tags)
combinations = [] of Array(String)
tags.size.times do |i|
combinations.concat tags.permutations (i+1)
end
return combinations
end
def index(key, value)
indices = key_proc.call(value).sort
tag_combinations(indices).each do |previous_indices|
# FIXME: Not on `index`, but on the list of all previous indices.
symdir = symlinks_directory previous_indices
otdir = other_tags_directory previous_indices
::Dir.mkdir_p symdir
::Dir.mkdir_p otdir
symlink = get_tagged_entry_path(key, previous_indices)
::File.delete symlink if ::File.exists? symlink
::File.symlink get_data_symlink(key, previous_indices), symlink
end
end
def deindex(key, value)
indices = key_proc.call(value).sort
tag_combinations(indices).each do |previous_indices|
# FIXME: Not on `index`, but on the list of all previous indices.
symdir = symlinks_directory previous_indices
otdir = other_tags_directory previous_indices
::Dir.mkdir_p symdir
::Dir.mkdir_p otdir
symlink = get_tagged_entry_path(key, previous_indices)
::File.delete symlink if ::File.exists? symlink
# FIXME: Remove directories if empty?
end
end
def check!(key, value, old_value)
return true # Tags don't have collisions or overloads.
end
def index(key, value)
indices = key_proc.call(value)
indices.each do |i|
symlink = get_tagged_entry_path(i, key)
Dir.mkdir_p ::File.dirname symlink
# FIXME: Should not happen anymore. Should we remove this?
::File.delete symlink if ::File.exists? symlink
::File.symlink get_data_symlink(key), symlink
end
end
def deindex(key, value)
indices = key_proc.call(value)
indices.each do |i|
symlink = get_tagged_entry_path(i, key)
begin
::File.delete symlink
rescue File::NotFoundError
end
end
end
def get_with_indice(tag : String) : Array(Tuple(V, Int32))
r_value = Array(Tuple(V, Int32)).new
tag_directory = indexing_directory tag
return r_value unless Dir.exists? tag_directory
Dir.each_child tag_directory do |child|
key = get_key child
r_value << { @storage[key], key }
end
r_value
def get_with_indices(key : String) : Array(Tuple(V, Int32))
get_with_indices [key]
end
def get_with_indices(keys : Array(String)) : Array(Tuple(V, Int32))
r_value = Array(Tuple(V, Int32)).new
keys.each do |tag|
r_value.concat get_with_indice tag
partition_directory = symlinks_directory keys
return r_value unless Dir.exists? partition_directory
Dir.each_child partition_directory do |child|
r_value << {
V.from_cbor(::File.read("#{partition_directory}/#{child}").to_slice),
File.basename(child).gsub(/#{DODB.file_extension}$/, "").to_i
}
end
r_value
end
def get(tag : String) : Array(V)
get_with_indice(tag).map &.[0]
end
def get?(tag : String) : Array(V)?
get tag
rescue MissingEntry
nil
def get(key : String) : Array(V)
get_with_indices(key).map &.[0]
end
def get(keys : Array(String)) : Array(V)
get_with_indices(keys.sort).map &.[0]
end
def delete(tag)
delete tag, do true end
end
def delete(tag, &matcher)
tag_directory = indexing_directory tag
return unless Dir.exists? tag_directory
Dir.each_child tag_directory do |child|
key = get_key child
item = @storage[key]
if yield item
@storage.delete key
end
end
end
private def get_key(path : String) : Int32
path.sub(/^.*\//, "").to_i
end
def indexing_directory : String
"#{@storage_root}/tags/by_#{@name}"
end
private def indexing_directory(tag)
"#{indexing_directory}/#{tag}"
private def symlinks_directory(previous_indices : Array(String))
"#{indexing_directory}#{previous_indices.map { |i| "/other-tags/#{i}" }.join}/data"
end
private def other_tags_directory(previous_indices : Array(String))
"#{indexing_directory}#{previous_indices.map { |i| "/other-tags/#{i}" }.join}/other-tags"
end
private def get_tagged_entry_path(tag : String, key : String)
"#{indexing_directory}/#{tag}/#{key}"
private def get_tagged_entry_path(key : String, indices : Array(String))
"#{indexing_directory}#{indices.map { |i| "/other-tags/#{i}" }.join}/data/#{key}#{DODB.file_extension}"
end
private def get_data_symlink(key : String)
"../../../data/#{key}"
private def get_data_symlink(key : String, indices : Array(String))
"../../../#{indices.map { "../../" }.join}/data/#{key}#{DODB.file_extension}"
end
end
class DODB::CachedTags(V) < DODB::Tags(V)
# This hash contains the relation between the index key and the data keys.
property data = Hash(String, Array(Int32)).new
def index(key, value)
super(key, value)
indices = key_proc.call value
indices.each do |tag|
array = if v = @data[tag]?
v
else
Array(Int32).new
end
array << key.to_i
@data[tag] = array
end
end
def deindex(key, value)
super(key, value)
indices = key_proc.call value
indices.each do |tag|
if v = @data[tag]?
v.delete key.to_i
@data[tag] = v
end
end
end
def get_with_indice(tag : String) : Array(Tuple(V, Int32))
r_value = Array(Tuple(V, Int32)).new
if keys = @data[tag]?
keys.each do |data_key|
r_value << { @storage[data_key], data_key }
end
else
# Get the key from the database representation on the file-system.
tag_directory = indexing_directory tag
raise MissingEntry.new(@name, tag) unless Dir.exists? tag_directory
Dir.each_child tag_directory do |child|
r_value << { @storage[get_key child], get_key child }
end
@data[tag] = r_value.map &.[1]
end
r_value
end
end
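The `tag_combinations` helper introduced in this diff indexes every permutation of every non-empty subset of an entry's tags, so multi-tag queries become a single directory lookup. The FIXME about slowness is warranted: the number of symlink directories grows as the sum over k of n!/(n−k)!. A Ruby sketch of the same enumeration shows the growth:

```ruby
# Sketch of the Crystal `tag_combinations` above: every permutation of
# every non-empty subset of the tag list. The count is sum_k n!/(n-k)!,
# which explodes quickly as the number of tags per entry grows.
def tag_combinations(tags)
  (1..tags.size).flat_map { |k| tags.permutation(k).to_a }
end

p tag_combinations(%w[a b]).size      # 2 + 2 = 4
p tag_combinations(%w[a b c]).size    # 3 + 6 + 6 = 15
p tag_combinations(%w[a b c d]).size  # 4 + 12 + 24 + 24 = 64
```

Sorting the tags before indexing (as `index` does with `key_proc.call(value).sort`) would allow combinations instead of permutations, shrinking the tree considerably — likely the direction the FIXME points at.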