# dodb.cr
DODB stands for Document Oriented DataBase.
## Objective
The objective is to do without a DBMS when storing simple files directly on the file system is enough.
## Overview
A brief summary:
- no SQL
- objects are serialized (currently in JSON)
- indexes (simple symlinks on the FS) can be created to significantly speed up searches in the db
## Limitations
**TODO**: speed tests, elaborate on the matter.
DODB is not suited to projects:
- having an absolute priority on speed
  (however, DODB is efficient in most cases with the right indexes);
- having relational data.
# Installation
Add the following to your `shard.yml`.
You may want to add version information to avoid unexpected breakage.
```yaml
dependencies:
  dodb:
    git: https://git.baguette.netlib.re/Baguette/dodb.cr
```
# Basic usage
```crystal
# Database creation
db = DODB::DataBase(Thing).new "path/to/storage/directory"

# Adding an element to the db
db << Thing.new

# Reaching all objects in the db
db.each do |thing|
  pp! thing
end
```
# Basic API
## Create the database
Creating the DB simply creates a few directories on the file system.
```crystal
db = DODB::DataBase(Thing).new "path/to/storage/directory"
```
## Adding a new object
```crystal
db << Thing.new
```
## Sorting the objects
To speed up searches in the DB, we can sort the objects, for example based on their attributes.
There are 3 sorting methods:
- index, 1-1 relations, an attribute value is bound to a single object (an identifier)
- partition, 1-n relations, an attribute value may be related to several objects (the color of a car, for instance)
- tags, n-n relations, each object may have several tags, each tag may be related to several objects
Let's take an example.
```Crystal
require "uuid"
require "json"

class Car
  include JSON::Serializable

  property id : String
  property color : String
  property keywords : Array(String)

  def initialize(@color, @keywords)
    @id = UUID.random.to_s
  end
end
```
We want to store `cars` in a database and index them on their `id` attribute:
```Crystal
cars = DODB::DataBase(Car).new "path/to/storage/directory"
# We give a name to the index, then the code to extract the id from a Car instance
cars_by_id = cars.new_index "id", &.id
```
After adding a few objects to the database, here is the index in action on the file system:
```sh
$ tree storage/
storage
├── data
│   ├── 0000000000.json
│   ├── 0000000001.json
│   ├── 0000000002.json
│   ├── 0000000003.json
│   ├── 0000000004.json
│   └── 0000000005.json
├── indices
│   └── by_id
│       ├── 6e109b82-25de-4250-9c67-e7e8415ad5a7.json -> ../../data/0000000003.json
│       ├── 2080131b-97d7-4300-afa9-55b93cdfd124.json -> ../../data/0000000000.json
│       ├── 2118bf1c-e413-4658-b8c1-a08925e20945.json -> ../../data/0000000005.json
│       ├── b53fab8e-f394-49ef-b939-8a670abe278b.json -> ../../data/0000000004.json
│       ├── 7e918680-6bc2-4f29-be7e-3d2e9c8e228c.json -> ../../data/0000000002.json
│       └── 8b4e83e3-ef95-40dc-a6e5-e6e697ce6323.json -> ../../data/0000000001.json
```
We have 6 objects in the DB; each of them has a unique ID attribute, and each ID is related to a single object.
Getting an object by its ID is as simple as `cat storage/indices/by_id/<id>.json`.
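
The layout above can be reproduced with a few standard-library calls. The following is a minimal sketch of the idea, not DODB's actual implementation: the two-field `Car`, the temporary directory, and the hard-coded key `0000000000` are assumptions made for illustration.

```crystal
require "json"
require "file_utils"

# Simplified stand-in for the Car class (assumption: only two fields).
class Car
  include JSON::Serializable

  property id : String
  property color : String

  def initialize(@id, @color)
  end
end

# Throwaway directory mimicking the tree shown above.
root = File.tempname("dodb-index-sketch", nil)
data_dir = File.join(root, "data")
index_dir = File.join(root, "indices", "by_id")
Dir.mkdir_p data_dir
Dir.mkdir_p index_dir

car = Car.new "2080131b-97d7-4300-afa9-55b93cdfd124", "red"

# 1. Store the serialized object under a numeric key.
File.write File.join(data_dir, "0000000000.json"), car.to_json

# 2. Index it: a relative symlink named after the attribute value.
File.symlink "../../data/0000000000.json", File.join(index_dir, "#{car.id}.json")

# 3. Look it up by id: reading the symlink reads the data file.
found = Car.from_json File.read(File.join(index_dir, "#{car.id}.json"))
pp! found.color

FileUtils.rm_rf root
```

A lookup through the index is a single file read, with no scan of the `data` directory.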
Now we want to sort cars based on their `color` attribute.
This time, we use a `partition`, because the relation between the attribute (color) and the object (car) is `1-n`:
```Crystal
cars_by_color = cars.new_partition "color", &.color
```
On the file-system, this translates to:
```sh
$ tree storage/
...
├── partitions
│   └── by_color
│       ├── blue
│       │   ├── 0000000000.json -> ../../../data/0000000000.json
│       │   └── 0000000004.json -> ../../../data/0000000004.json
│       ├── red
│       │   ├── 0000000001.json -> ../../../data/0000000001.json
│       │   ├── 0000000002.json -> ../../../data/0000000002.json
│       │   └── 0000000003.json -> ../../../data/0000000003.json
│       └── violet
│           └── 0000000005.json -> ../../../data/0000000005.json
```
Now the attribute corresponds to a directory (blue, red, violet, etc.) containing a symlink for each related object.
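
With this layout, searching a partition amounts to listing one directory. Here is a rough sketch using only the standard library; it is not DODB's code, and the keys, colors, and temporary directory are made up for the example.

```crystal
require "json"
require "file_utils"

root = File.tempname("dodb-partition-sketch", nil)
data_dir = File.join(root, "data")
Dir.mkdir_p data_dir

# Hypothetical documents: key => color.
colors = {"0000000000" => "blue", "0000000001" => "red", "0000000002" => "red"}

colors.each do |key, color|
  File.write File.join(data_dir, "#{key}.json"), {color: color}.to_json

  # One directory per attribute value, one symlink per related object.
  partition_dir = File.join(root, "partitions", "by_color", color)
  Dir.mkdir_p partition_dir
  File.symlink "../../../data/#{key}.json", File.join(partition_dir, "#{key}.json")
end

# Searching the "red" partition: list its directory and read each symlink.
red_dir = File.join(root, "partitions", "by_color", "red")
red_cars = Dir.children(red_dir).sort.map do |entry|
  JSON.parse File.read(File.join(red_dir, entry))
end
pp! red_cars.size

FileUtils.rm_rf root
```

Only the files under `by_color/red` are opened; the blue car is never read.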
Finally, we want to sort cars based on the `keywords` attribute.
This is an n-n relation: each car may have several keywords, and each keyword may be related to several cars.
```Crystal
cars_by_keyword = cars.new_tags "keyword", &.keywords
```
On the file-system, this translates to:
```sh
$ tree storage/
...
└── tags
    └── by_keyword
        ├── elegant
        │   ├── 0000000000.json -> ../../../data/0000000000.json
        │   └── 0000000003.json -> ../../../data/0000000003.json
        ├── impressive
        │   ├── 0000000000.json -> ../../../data/0000000000.json
        │   ├── 0000000001.json -> ../../../data/0000000001.json
        │   └── 0000000003.json -> ../../../data/0000000003.json
        ...
```
Tags are very similar to partitions and are used the exact same way for search, update and deletion.
## Updating an object
In our last example we had a `Car` class; we stored its instances in `cars` and could identify each instance by its `id` with the index `cars_by_id`.
Now, we want to update a car:
```Crystal
# we find a car we want to modify
car = cars_by_id.get "86a07924-ab3a-4f46-a975-e9803acba22d"

# we modify it
car.color = "Blue"

# update
# simple case: no change in the index
cars_by_id.update car

# otherwise
car.id = "something-else-than-before"
cars_by_id.update "86a07924-ab3a-4f46-a975-e9803acba22d", car
```
Or, if the object may not exist yet:
```Crystal
cars_by_id.update_or_create car.id, car

# Search by partitions: all blue cars.
pp! cars_by_color.get "blue"

# Search by tags: all elegant cars.
pp! cars_by_keyword.get "elegant"
```
Changing a value that is related to a partition or a tag will automatically do what you would expect: de-index then re-index.
You won't find yourself with a bunch of invalid symbolic links all over the place.
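
To make "de-index then re-index" concrete, here is a sketch of what it could look like at the file-system level for a partition. It illustrates the principle under stated assumptions and is not taken from DODB's source: the helper `link_into_partition`, the key, and the temporary directory are hypothetical.

```crystal
require "file_utils"

# Hypothetical helper: create the symlink for `key` under the `color` directory.
def link_into_partition(by_color : String, color : String, key : String)
  Dir.mkdir_p File.join(by_color, color)
  File.symlink "../../../data/#{key}.json", File.join(by_color, color, "#{key}.json")
end

root = File.tempname("dodb-reindex-sketch", nil)
by_color = File.join(root, "partitions", "by_color")
data_file = File.join(root, "data", "0000000000.json")
Dir.mkdir_p File.join(root, "data")

# Initial state: a red car, indexed under "red".
File.write data_file, %({"color":"red"})
link_into_partition by_color, "red", "0000000000"

# The car becomes blue: rewrite the data, drop the stale link, add the new one.
File.write data_file, %({"color":"blue"})
File.delete File.join(by_color, "red", "0000000000.json")
link_into_partition by_color, "blue", "0000000000"

stale = File.symlink?(File.join(by_color, "red", "0000000000.json"))
fresh = File.symlink?(File.join(by_color, "blue", "0000000000.json"))
puts "stale link left: #{stale}, new link present: #{fresh}"

FileUtils.rm_rf root
```

Removing the old link before creating the new one is what keeps dangling or misleading symlinks out of the partition directories.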
## Removing an object
```Crystal
# Remove a value based on an index.
cars_by_id.delete "86a07924-ab3a-4f46-a975-e9803acba22d"

# Remove a value based on a partition.
cars_by_color.delete "red"
cars_by_color.delete "blue" do |car|
  car.keywords.empty?
end

# Remove a value based on a tag.
cars_by_keyword.delete "shiny"
cars_by_keyword.delete "elegant" do |car|
  # `name` is the property used by the Car class of the complete example
  car.name == "GTI"
end
```
In this code snippet, we apply a function to blue cars only;
blue cars are removed only if they have no associated keywords.
The same goes for elegant cars, which are removed only if they are named "GTI".
This is faster than applying the function to all the cars.
# Complete example
```Crystal
require "dodb"

# First, we define what we'll want to store.
# It *has* to be serializable through JSON, everything in DODB is stored in JSON directly on the file-system.
class Car
  include JSON::Serializable

  property name : String            # unique to each instance (1-1 relations)
  property color : String           # a simple attribute (1-n relations)
  property keywords : Array(String) # tags about a car, example: "shiny" (n-n relations)

  def initialize(@name, @color, @keywords)
  end
end

#####################
# Database creation #
#####################
cars = DODB::DataBase(Car).new "./bin/storage"
##########################
# Database configuration #
##########################
# There are several ways to index things in DODB.

# We give a name to the index, then the code to extract the name from a Car instance
# (1-1 relations: in this example, names are indexes = they are UNIQUE identifiers)
cars_by_name = cars.new_index "name", &.name

# We want quick searches for cars based on their color
# (1-n relations: a car only has one color, but a color may refer to many cars)
cars_by_color = cars.new_partition "color", &.color

# We also want to search cars on their keywords
# (n-n relations: a car may be described with many keywords and a keyword may be applied to many cars)
cars_by_keyword = cars.new_tags "keyword", &.keywords

##########
# Adding #
##########
cars << Car.new "Corvet", "red", ["shiny", "impressive", "fast", "elegant"]
cars << Car.new "SUV", "red", ["solid", "impressive"]
cars << Car.new "Mustang", "red", ["shiny", "impressive", "elegant"]
cars << Car.new "Bullet-GT", "red", ["shiny", "impressive", "fast", "elegant"]
cars << Car.new "GTI", "blue", ["average"]
cars << Car.new "Deudeuch", "violet", ["dirty", "slow", "only French will understand"]

# The DB can be accessed as a simple array
cars.each do |car|
  pp! car
end

################
# Searching... #
################
# based on an index (print the only car named "Corvet")
pp! cars_by_name.get "Corvet"
# based on a partition (print all red cars)
pp! cars_by_color.get "red"
# based on a tag (print all fast cars)
pp! cars_by_keyword.get "fast"
############
# Updating #
############
car = cars_by_name.get "Corvet"
car.color = "blue"
cars_by_name.update car
car = cars_by_name.get "Bullet-GT"
car.name = "Not-So-Fast-Bullet-GT"
cars_by_name.update "Bullet-GT", car # the name changed
# we have a car
# and add it to the DB, not knowing in advance if it was already there
car = Car.new "Mustang", "red", [] of String
cars_by_name.update_or_create car.name, car

# We all know it, elegant cars are also expensive.
cars_by_keyword.get("elegant").each do |car|
  car.keywords << "expensive"
  cars_by_name.update car.name, car
end

###############
# Deleting... #
###############
# based on a name
cars_by_name.delete "Deudeuch"

# based on a color
cars_by_color.delete "red"

# based on a color (but not only)
cars_by_color.delete "blue", &.name.==("GTI")

# based on a keyword
cars_by_keyword.delete "solid"

# based on a keyword (but not only)
cars_by_keyword.delete "fast", &.name.==("Corvet")
```