Document-oriented DataBase. A full Crystal database management without SQL, storing data in plain files.

Go to file

Luka Vandervelden 909eb776e3 Fixes an indexing issue in DirectedGraph. The issue was linked to absolute paths being prepended.		2020-10-29 14:27:37 +01:00
spec	Merge branch 'master' + DODB::CachedDataBase.	2020-07-21 13:25:48 +02:00
src	Fixes an indexing issue in DirectedGraph.	2020-10-29 14:27:37 +01:00
README.md	Readme: s/old-git/baguette/	2020-05-01 13:09:54 +02:00
shard.yml	index#update(new_value) : index is automatically retrieved.	2020-04-19 18:18:53 +02:00

README.md

dodb.cr

DODB stands for Document Oriented DataBase.

Objective

The objective is to get rid of DBMS when storing simple files directly on the file-system is enough.

Overview

A brief summary:

no SQL
objects are serialized (currently in JSON)
indexes (simple symlinks on the FS) can be created to improve significantly searches in the db

Limitations

TODO: speed tests, elaborate on the matter.

DODB is not compatible with projects:

having an absolute priority on speed, however, DODB is efficient in most cases with the right indexes.
having relational data

Installation

Add the following to your shard.yml. You may want to add version informations to avoid unexpected breakages.

dependencies:
    dodb:
        git: https://git.baguette.netlib.re/Baguette/dodb.cr

Basic usage

# Database creation
db = DODB::DataBase(Thing).new "path/to/storage/directory"

# Adding an element to the db
db << Thing.new

# Reaching all objects in the db
db.each do |thing|
	pp! thing
end

Basic API

Create the database

The DB creation is simply creating a few directories on the file-system.

db = DODB::DataBase(Thing).new "path/to/storage/directory"

Adding a new object

db << Thing.new

Sorting the objects

To speed-up searches in the DB, we can sort them, based on their attributes for example. There are 3 sorting methods:

index, 1-1 relations, an attribute value is bound to a single object (an identifier)
partition, 1-n relations, an attribute value may be related to several objects (the color of a car, for instance)
tags, n-n relations, each object may have several tags, each tag may be related to several objects

Let's take an example.

require "uuid"

class Car
	include JSON::Serializable
	property id       : String
	property color    : String
	property keywords : Array(String)

	def initialize(@color, @keywords)
		@id = UUID.random.to_s
	end
end

We want to store cars in a database and index them on their id attribute:

cars = DODB::DataBase(Car).new "path/to/storage/directory"

# We give a name to the index, then the code to extract the id from a Car instance
cars_by_id = cars.new_index "id", &.id

After adding a few objects in the database, here the index in action on the file-system:

$ tree storage/
storage
├── data
│   ├── 0000000000.json
│   ├── 0000000001.json
│   ├── 0000000002.json
│   ├── 0000000003.json
│   ├── 0000000004.json
│   └── 0000000005.json
├── indices
│   └── by_id
│       ├── 6e109b82-25de-4250-9c67-e7e8415ad5a7.json -> ../../data/0000000003.json
│       ├── 2080131b-97d7-4300-afa9-55b93cdfd124.json -> ../../data/0000000000.json
│       ├── 2118bf1c-e413-4658-b8c1-a08925e20945.json -> ../../data/0000000005.json
│       ├── b53fab8e-f394-49ef-b939-8a670abe278b.json -> ../../data/0000000004.json
│       ├── 7e918680-6bc2-4f29-be7e-3d2e9c8e228c.json -> ../../data/0000000002.json
│       └── 8b4e83e3-ef95-40dc-a6e5-e6e697ce6323.json -> ../../data/0000000001.json

We have 5 objects in the DB, each of them has a unique ID attribute, each attribute is related to a single object. Getting an object by its ID is as simple as cat storage/indices/by_id/<id>.json.

Now we want to sort cars based on their color attribute. This time, we use a partition, because the relation between the attribute (color) and the object (car) is 1-n:

cars_by_colors = cars.new_partition "color", &.color

On the file-system, this translates to:

$ tree storage/
...
├── partitions
│   └── by_color
│       ├── blue
│       │   ├── 0000000000.json -> ../../../data/0000000000.json
│       │   └── 0000000004.json -> ../../../data/0000000004.json
│       ├── red
│       │   ├── 0000000001.json -> ../../../data/0000000001.json
│       │   ├── 0000000002.json -> ../../../data/0000000002.json
│       │   └── 0000000003.json -> ../../../data/0000000003.json
│       └── violet
│           └── 0000000005.json -> ../../../data/0000000005.json

Now the attribute corresponds to a directory (blue, red, violet, etc.) containing a symlink for each related object.

Finally, we want to sort cars based on the keywords attribute. This is a n-n relation, each car may have several keywords, each keyword may be related to several cars.

cars_by_keyword = cars.new_tags "keyword", &.keywords

On the file-system, this translates to:

$ tree storage/
...
└── tags
    └── by_keyword
        └── other-tags
            ├── average
            │   ├── data
            │   │   └── 0000000004.json -> ../../../../..//data/0000000004.json
...
            ├── dirty
            │   ├── data
            │   │   └── 0000000005.json -> ../../../../..//data/0000000005.json
...
            ├── elegant
            │   ├── data
            │   │   ├── 0000000000.json -> ../../../../..//data/0000000000.json
            │   │   └── 0000000003.json -> ../../../../..//data/0000000003.json
...

This is very similar to partitions, but there is a bit more complexity here since we eventually search for a car matching a combination of keywords.

TODO: explanations about our tag-based search and an example.

Updating an object

In our last example we had a Car class, we stored its instances in cars and we could identify each instance by its id with the index cars_by_id. Now, we want to update a car:

# we find a car we want to modify
car = cars_by_id "86a07924-ab3a-4f46-a975-e9803acba22d"

# we modify it
car.color = "Blue"

# update
# simple case: no change in the index
cars_by_id.update car
# otherwise
car.id = "something-else-than-before"
cars_by_id.update "86a07924-ab3a-4f46-a975-e9803acba22d", car

Or, in the case the object may not yet exist:

cars_by_id.update_or_create car.id, car

Removing an object

cars_by_id.delete "86a07924-ab3a-4f46-a975-e9803acba22d"

cars_by_color.delete "red"
cars_by_color.delete "red", do |car|
	car.keywords.empty
end

In this last example, we apply the function on red cars only. This represents a performance boost compared to applying the function on all the cars.

Complete example

require "dodb"

# First, we define what we’ll want to store.
# It *has* to be serializable through JSON, everything in DODB is stored in JSON directly on the file-system.
class Car
	include JSON::Serializable

	property name     : String        # unique to each instance (1-1 relations)
	property color    : String        # a simple attribute (1-n relations)
	property keywords : Array(String) # tags about a car, example: "shiny" (n-n relations)

	def initialize(@name, @color, @keywords)
	end
end

#####################
# Database creation #
#####################

cars = DODB::DataBase(Car).new "./bin/storage"


##########################
# Database configuration #
##########################

# There are several ways to index things in DODB.

# We give a name to the index, then the code to extract the name from a Car instance
# (1-1 relations: in this example, names are indexes = they are UNIQUE identifiers)
cars_by_name = cars.new_index "name", &.name

# We want quick searches for cars based on their color
# (1-n relations: a car only has one color, but a color may refer to many cars)
cars_by_color = cars.new_partition "color", &.color

# We also want to search cars on their keywords
# (n-n relations: a car may be described with many keywords and a keyword may be applied to many cars)
cars_by_keyword = cars.new_tags "keyword", &.keywords


##########
# Adding #
##########

cars << Car.new "Corvet",    "red",    [ "shiny", "impressive", "fast", "elegant" ]
cars << Car.new "SUV",       "red",    [ "solid", "impressive" ]
cars << Car.new "Mustang",   "red",    [ "shiny", "impressive", "elegant" ]
cars << Car.new "Bullet-GT", "red",    [ "shiny", "impressive", "fast", "elegant" ]
cars << Car.new "GTI",       "blue",   [ "average" ]
cars << Car.new "Deudeuch",  "violet", [ "dirty", "slow", "only French will understand" ]

# The DB can be accessed as a simple array
cars.each do |car|
	pp! car
end


################
# Searching... #
################

# based on an index (print the only car named "Corvet")
pp! cars_by_name.get "Corvet"

# based on a partition (print all red cars)
pp! cars_by_color.get "red"

# based on a tag
pp! cars_by_keyword.get "fast"



############
# Updating #
############

car = cars_by_name.get "Corvet"
car.color = "blue"
cars_by_name.update car

car = cars_by_name.get "Bullet-GT"
car.name = "Not-So-Fast-Bullet-GT"
cars_by_name.update "Bullet-GT", car # the name changed

# we have a car
# and add it to the DB, not knowing in advance if it was already there
car = Car.new "Mustang", "red", [] of String
cars_by_name.update_or_create car.name, car



###############
# Deleting... #
###############

# based on a name
cars_by_name.delete "Deudeuch"

# based on a color
cars_by_color.delete "red"
# based on a color (but not only)
cars_by_color.delete "blue", &.name.==("GTI")

## TAG-based deletion, soon.
# # based on a keyword
# cars_by_keyword.delete "solid"
# # based on a keyword (but not only)
# cars_by_keyword.delete "fast", &.name.==("Corvet")

README.md Unescape Escape