# dodb.cr

DODB stands for Document Oriented DataBase.

## Objective

The objective is to do without a DBMS when storing simple files directly on the file-system is enough.

## Overview

A brief summary:
- no SQL
- objects are serialized (currently in JSON)
- indexes (simple symlinks on the FS) can be created to significantly speed up searches in the db
- db is fully integrated in the language (basically a simple array with a few more functions)

Also, data can be `cached`.
The entire database is then kept in memory (if it fits), enabling incredible speeds.

## Limitations

DODB doesn't fully handle ACID properties:

- no *atomicity*: you can't chain instructions and roll back if one of them fails;
- no *consistency*: there is currently no mechanism to prevent adding invalid values.

*Isolation* is partially taken into account, with a locking mechanism preventing a few race conditions.
FYI, in my projects the database is fast enough that I don't even need parallelism (like, by far).

*Durability* is taken into account.
Data is written on-disk each time it changes.

**NOTE:** what I need is mostly there.
What DODB doesn't provide, I hack in a few lines in my app.
DODB will provide some form of atomicity and consistency at some point, but nothing fancy or too advanced.
The whole point is to keep it simple.

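The missing consistency check can be approximated in application code, in the spirit of the "few lines in my app" mentioned above. This is a minimal sketch, not DODB's API: the `valid?` rule and the `add_checked` helper are hypothetical, and a plain `Array` stands in for the database since both accept `<<`.

```crystal
# Hypothetical document type with an application-level validity rule.
class Car
  getter name : String
  getter color : String

  def initialize(@name, @color)
  end

  # Assumption for this sketch: a car needs a non-empty name and color.
  def valid? : Bool
    !name.empty? && !color.empty?
  end
end

# Hypothetical helper: refuse invalid values before they reach the store.
# `db` is anything that accepts `<<`; an Array stands in for the database here.
def add_checked(db, car : Car)
  raise ArgumentError.new("invalid car: #{car.name.inspect}") unless car.valid?
  db << car
end

store = [] of Car
add_checked store, Car.new("GTI", "blue")

begin
  add_checked store, Car.new("", "blue")
rescue ex : ArgumentError
  puts "rejected: #{ex.message}"
end

pp! store.size
```

Only the valid car ends up in `store`; the invalid one is rejected before anything is written.
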
## Speed

Since DODB doesn't use SQL and doesn't even try to handle stuff like atomicity or consistency, speed is great.
Reading data from disk takes a few dozen microseconds, and not much more when searching indexed data.

**On my more-than-decade-old, slow-as-fuck machine**, the simplest possible SQL request to Postgres takes about 100 to 900 microseconds.
With DODB, to reach on-disk data: 13 microseconds.
To search then retrieve indexed data: almost the same thing, 16 microseconds on average, since it's just a path to a symlink we have to build.

With the `cached` version of DODB, there is not even deserialization happening, so 7 nanoseconds.

Indexes (indexes, partitions and tags) are also cached **by default**.
The speed-up is great compared to the uncached version since you won't walk the file-system.
Searching an index takes about 35 nanoseconds when cached.
To avoid the memory cost of cached indexes, you can explicitly ask for uncached ones.

**NOTE:** of course SQL and DODB cannot be fairly compared based on performance since they don't have the same properties.
But still, this is the kind of speed you can get with the tool.

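To get an order of magnitude on your own machine, a rough measurement can be made with Crystal's standard `Benchmark` module. This sketch does not call DODB: it only times reading and parsing one small JSON file, which approximates an uncached on-disk read. The file name, sample document and iteration count are arbitrary choices for illustration.

```crystal
require "json"
require "benchmark"

# Scratch file holding one small JSON document.
path = File.join(Dir.tempdir, "dodb-bench-#{Random.rand(1_000_000)}.json")
File.write path, {name: "Corvet", color: "red"}.to_json

iterations = 10_000

# Time the whole loop, then divide to get an average per read.
elapsed = Benchmark.realtime do
  iterations.times { JSON.parse File.read(path) }
end

puts "#{(elapsed.total_microseconds / iterations).round(2)} µs per read + parse"

File.delete path
```

Build with `crystal build --release` before drawing conclusions; timings from a non-optimized build are not representative.
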
# Installation

Add the following to your `shard.yml`.
You may want to add version information to avoid unexpected breakages.

```yaml
dependencies:
  dodb:
    git: https://git.baguette.netlib.re/Baguette/dodb.cr
```

# Basic usage

```crystal
# Database creation
db = DODB::DataBase(Thing).new "path/to/storage/directory"

# Adding an element to the db
db << Thing.new

# Reaching all objects in the db
db.each do |thing|
  pp! thing
end
```

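The snippet above assumes a `Thing` type that can be serialized, since objects are stored as JSON. As a minimal sketch using only Crystal's standard library (the `Thing` class and its two fields are hypothetical), this is the kind of round trip a stored object goes through:

```crystal
require "json"

# Hypothetical document type: any class including JSON::Serializable
# can be turned into JSON text and rebuilt from it.
class Thing
  include JSON::Serializable

  property name : String
  property size : Int32

  def initialize(@name, @size)
  end
end

thing = Thing.new "example", 42

# JSON text of this kind is what one would expect to find in a data file.
json = thing.to_json
puts json

# Reading it back gives an equivalent object.
copy = Thing.from_json json
puts copy.name
puts copy.size
```
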
# Basic API

## Create the database

Creating the DB simply creates a few directories on the file-system.

```crystal
db = DODB::DataBase(Thing).new "path/to/storage/directory"
```

## Adding a new object

```crystal
db << Thing.new
```

## Sorting the objects

To speed up searches in the DB, we can sort the objects, based on their attributes for example.
There are 3 sorting methods:
- index, 1-1 relations, an attribute value is bound to a single object (an identifier)
- partition, 1-n relations, an attribute value may be related to several objects (the color of a car, for instance)
- tags, n-n relations, each object may have several tags, each tag may be related to several objects

Let's take an example.
```crystal
require "json"
require "uuid"

class Car
  include JSON::Serializable
  property id : String
  property color : String
  property keywords : Array(String)

  def initialize(@color, @keywords)
    @id = UUID.random.to_s
  end
end
```

We want to store `cars` in a database and index them on their `id` attribute:
```crystal
cars = DODB::DataBase(Car).new "path/to/storage/directory"

# We give a name to the index, then the code to extract the id from a Car instance
cars_by_id = cars.new_index "id", &.id
```

After adding a few objects in the database, here is the index in action on the file-system:

```sh
$ tree storage/
storage
├── data
│   ├── 0000000000
│   ├── 0000000001
│   ├── 0000000002
│   ├── 0000000003
│   ├── 0000000004
│   └── 0000000005
├── indices
│   └── by_id
│       ├── 6e109b82-25de-4250-9c67-e7e8415ad5a7 -> ../../data/0000000003
│       ├── 2080131b-97d7-4300-afa9-55b93cdfd124 -> ../../data/0000000000
│       ├── 2118bf1c-e413-4658-b8c1-a08925e20945 -> ../../data/0000000005
│       ├── b53fab8e-f394-49ef-b939-8a670abe278b -> ../../data/0000000004
│       ├── 7e918680-6bc2-4f29-be7e-3d2e9c8e228c -> ../../data/0000000002
│       └── 8b4e83e3-ef95-40dc-a6e5-e6e697ce6323 -> ../../data/0000000001
```

We have 6 objects in the DB, each of them has a unique ID attribute, and each ID is related to a single object.
Getting an object by its ID is as simple as `cat storage/indices/by_id/<id>`.

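The `cat` one-liner works because an index entry is just a symlink named after the indexed value. The following sketch reproduces that layout with Crystal's standard library only. It does not use DODB itself, and the scratch directory and sample document are made up for illustration.

```crystal
require "json"
require "file_utils"

# Scratch directory so the sketch does not touch real data.
root = File.join(Dir.tempdir, "dodb-sketch-#{Random.rand(1_000_000)}")
Dir.mkdir_p File.join(root, "data")
Dir.mkdir_p File.join(root, "indices", "by_id")

# Store one JSON document under a zero-padded number, as in the tree above.
id = "6e109b82-25de-4250-9c67-e7e8415ad5a7"
data_name = "%010d" % 3
File.write File.join(root, "data", data_name), {id: id, color: "red"}.to_json

# The index entry is a relative symlink named after the indexed value.
link = File.join(root, "indices", "by_id", id)
File.symlink "../../data/#{data_name}", link

# Looking up by id is a single read through the symlink.
content = File.read link
puts content

FileUtils.rm_rf root
```

No directory walk is needed for the lookup: the path is built from the index name and the value, then read.
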
Now we want to sort cars based on their `color` attribute.
This time, we use a `partition`, because the relation between the attribute (color) and the object (car) is `1-n`:
```crystal
cars_by_color = cars.new_partition "color", &.color
```

On the file-system, this translates to:
```sh
$ tree storage/
...
├── partitions
│   └── by_color
│       ├── blue
│       │   ├── 0000000000 -> ../../../data/0000000000
│       │   └── 0000000004 -> ../../../data/0000000004
│       ├── red
│       │   ├── 0000000001 -> ../../../data/0000000001
│       │   ├── 0000000002 -> ../../../data/0000000002
│       │   └── 0000000003 -> ../../../data/0000000003
│       └── violet
│           └── 0000000005 -> ../../../data/0000000005
```

Now each attribute value corresponds to a directory (blue, red, violet, etc.) containing a symlink for each related object.

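With this layout, "all blue cars" amounts to listing one directory and reading through its symlinks. Here is a sketch of that idea with the standard library only; it is not DODB's implementation, and the sample data is hypothetical.

```crystal
require "json"
require "file_utils"

root = File.join(Dir.tempdir, "dodb-sketch-#{Random.rand(1_000_000)}")
Dir.mkdir_p File.join(root, "data")

# Hypothetical sample data: data file number => color.
colors = {0 => "blue", 1 => "red", 2 => "red", 4 => "blue"}

colors.each do |number, color|
  name = "%010d" % number
  File.write File.join(root, "data", name), {color: color}.to_json

  # One directory per attribute value, one symlink per related object.
  dir = File.join(root, "partitions", "by_color", color)
  Dir.mkdir_p dir
  File.symlink "../../../data/#{name}", File.join(dir, name)
end

# All blue cars: list the directory, then read through each symlink.
blue_dir = File.join(root, "partitions", "by_color", "blue")
blue = Dir.children(blue_dir).sort.map { |entry| JSON.parse File.read(File.join(blue_dir, entry)) }
pp! blue.size

FileUtils.rm_rf root
```

Red cars are never opened during the search, which is where the speed-up over scanning every data file comes from.
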
Finally, we want to sort cars based on the `keywords` attribute.
This is an n-n relation: each car may have several keywords, and each keyword may be related to several cars.
```crystal
cars_by_keyword = cars.new_tags "keyword", &.keywords
```

On the file-system, this translates to:
```sh
$ tree storage/
...
└── tags
    └── by_keyword
        ├── elegant
        │   ├── 0000000000 -> ../../../data/0000000000
        │   └── 0000000003 -> ../../../data/0000000003
        ├── impressive
        │   ├── 0000000000 -> ../../../data/0000000000
        │   ├── 0000000001 -> ../../../data/0000000001
        │   └── 0000000003 -> ../../../data/0000000003
...
```
Tags are very similar to partitions and are used the exact same way for search, update and deletion.

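Because the same object appears under each of its tags, the directory listings alone are enough to combine tags. The sketch below finds the objects tagged both "elegant" and "impressive" by intersecting two listings. It uses only the standard library with hypothetical sample data, and is not a DODB feature shown in this document.

```crystal
require "file_utils"

root = File.join(Dir.tempdir, "dodb-sketch-#{Random.rand(1_000_000)}")
Dir.mkdir_p File.join(root, "data")

# Hypothetical sample: data file name => keywords of that object.
keywords = {
  "0000000000" => ["elegant", "impressive"],
  "0000000001" => ["impressive"],
  "0000000003" => ["elegant", "impressive"],
}

keywords.each do |name, tags|
  # Placeholder content; only the file names matter in this sketch.
  File.write File.join(root, "data", name), "{}"

  # One symlink per (tag, object) pair.
  tags.each do |tag|
    dir = File.join(root, "tags", "by_keyword", tag)
    Dir.mkdir_p dir
    File.symlink "../../../data/#{name}", File.join(dir, name)
  end
end

elegant = Dir.children File.join(root, "tags", "by_keyword", "elegant")
impressive = Dir.children File.join(root, "tags", "by_keyword", "impressive")

# Objects carrying both tags: intersect the two listings.
both = (elegant & impressive).sort
pp! both

FileUtils.rm_rf root
```
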
## Updating an object

In our last example we had a `Car` class, we stored its instances in `cars` and we could identify each instance by its `id` with the index `cars_by_id`.
Now, we want to update a car:
```crystal
# we find a car we want to modify
car = cars_by_id.get "86a07924-ab3a-4f46-a975-e9803acba22d"

# we modify it
car.color = "blue"

# update
# simple case: no change in the index
cars_by_id.update car
# otherwise (the indexed value changed): pass the previous value first
car.id = "something-else-than-before"
cars_by_id.update "86a07924-ab3a-4f46-a975-e9803acba22d", car
```

Or, in the case the object may not yet exist:
```crystal
cars_by_id.update_or_create car.id, car

# Search by partitions: all blue cars.
pp! cars_by_color.get "blue"

# Search by tags: all elegant cars.
pp! cars_by_keyword.get "elegant"
```

Changing a value that is related to a partition or a tag will automatically do what you would expect: de-index then re-index.
You won't find yourself with a bunch of invalid symbolic links all over the place.

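On the file-system, "de-index then re-index" boils down to removing the symlink under the old value and creating one under the new value. Here is a sketch of that sequence for a color change, using only the standard library. The `partition_link` helper is hypothetical and this is not DODB's actual code.

```crystal
require "json"
require "file_utils"

root = File.join(Dir.tempdir, "dodb-sketch-#{Random.rand(1_000_000)}")
data = File.join(root, "data", "0000000001")
Dir.mkdir_p File.dirname(data)
File.write data, {color: "red"}.to_json

# Hypothetical helper: path of the partition symlink for a given color.
def partition_link(root, color, name)
  File.join(root, "partitions", "by_color", color, name)
end

# Initial indexing under "red".
old_link = partition_link(root, "red", "0000000001")
Dir.mkdir_p File.dirname(old_link)
File.symlink "../../../data/0000000001", old_link

# Update: rewrite the document, drop the stale link, create the new one.
File.write data, {color: "blue"}.to_json
File.delete old_link
new_link = partition_link(root, "blue", "0000000001")
Dir.mkdir_p File.dirname(new_link)
File.symlink "../../../data/0000000001", new_link

stale = File.symlink?(old_link)
fresh = File.read(new_link)
pp! stale
puts fresh

FileUtils.rm_rf root
```
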
## Removing an object

```crystal
# Remove a value based on an index.
cars_by_id.delete "86a07924-ab3a-4f46-a975-e9803acba22d"

# Remove a value based on a partition.
cars_by_color.delete "red"
cars_by_color.delete "blue" do |car|
  car.keywords.empty?
end

# Remove a value based on a tag.
cars_by_keyword.delete "shiny"
# (this assumes Car also has a `name` property, as in the complete example)
cars_by_keyword.delete "elegant" do |car|
  car.name == "GTI"
end
```

In this code snippet, the block is applied to blue cars only,
and a blue car is removed only if it doesn't have any associated keywords.
Same thing for elegant cars: only the one named "GTI" is removed.
This is a performance boost compared to applying the function to all the cars.

# Complete example

```crystal
require "dodb"

# First, we define what we’ll want to store.
# It *has* to be serializable through JSON, everything in DODB is stored in JSON directly on the file-system.
class Car
  include JSON::Serializable

  property name : String            # unique to each instance (1-1 relations)
  property color : String           # a simple attribute (1-n relations)
  property keywords : Array(String) # tags about a car, example: "shiny" (n-n relations)

  def initialize(@name, @color, @keywords)
  end
end

#####################
# Database creation #
#####################

cars = DODB::DataBase(Car).new "./bin/storage"

##########################
# Database configuration #
##########################

# There are several ways to index things in DODB.

# We give a name to the index, then the code to extract the name from a Car instance
# (1-1 relations: in this example, names are indexes = they are UNIQUE identifiers)
cars_by_name = cars.new_index "name", &.name

# We want quick searches for cars based on their color
# (1-n relations: a car only has one color, but a color may refer to many cars)
cars_by_color = cars.new_partition "color", &.color

# We also want to search cars on their keywords
# (n-n relations: a car may be described with many keywords and a keyword may be applied to many cars)
cars_by_keyword = cars.new_tags "keyword", &.keywords

##########
# Adding #
##########

cars << Car.new "Corvet", "red", [ "shiny", "impressive", "fast", "elegant" ]
cars << Car.new "SUV", "red", [ "solid", "impressive" ]
cars << Car.new "Mustang", "red", [ "shiny", "impressive", "elegant" ]
cars << Car.new "Bullet-GT", "red", [ "shiny", "impressive", "fast", "elegant" ]
cars << Car.new "GTI", "blue", [ "average" ]
cars << Car.new "Deudeuch", "violet", [ "dirty", "slow", "only French will understand" ]

# The DB can be accessed as a simple array
cars.each do |car|
  pp! car
end

################
# Searching... #
################

# based on an index (print the only car named "Corvet")
pp! cars_by_name.get "Corvet"

# based on a partition (print all red cars)
pp! cars_by_color.get "red"

# based on a tag (print all fast cars)
pp! cars_by_keyword.get "fast"

############
# Updating #
############

car = cars_by_name.get "Corvet"
car.color = "blue"
cars_by_name.update car

car = cars_by_name.get "Bullet-GT"
car.name = "Not-So-Fast-Bullet-GT"
cars_by_name.update "Bullet-GT", car # the name changed

# we have a car
# and add it to the DB, not knowing in advance if it was already there
car = Car.new "Mustang", "red", [] of String
cars_by_name.update_or_create car.name, car

# We all know it, elegant cars are also expensive.
cars_by_keyword.get("elegant").each do |car|
  car.keywords << "expensive"
  cars_by_name.update car.name, car
end

###############
# Deleting... #
###############

# based on a name
cars_by_name.delete "Deudeuch"

# based on a color
cars_by_color.delete "red"
# based on a color (but not only)
cars_by_color.delete "blue", &.name.==("GTI")

# based on a keyword
cars_by_keyword.delete "solid"
# based on a keyword (but not only)
cars_by_keyword.delete "fast", &.name.==("Corvet")
```