# dodb.cr

DODB stands for Document Oriented DataBase.

## Objective

The objective is to get rid of DBMS when storing simple files directly on the file-system is enough.

## Overview

A brief summary:
- no SQL
- objects are serialized (currently in JSON)
- indexes (simple symlinks on the FS) can be created to significantly speed up searches in the db
- db is fully integrated in the language (basically a simple array with a few more functions)

Also, data can be `cached`.
The entire base will be kept in memory (if it fits), enabling incredible speeds.

## Limitations

DODB doesn't fully handle ACID properties:

- no *atomicity*, you can't chain instructions and roll back if one of them fails;
- no *consistency*, there is currently no mechanism to prevent adding invalid values.

*Isolation* is partially taken into account, with a locking mechanism preventing a few race conditions.
FYI, in my projects the database is fast enough so I don't even need parallelism (like, by far).

*Durability* is taken into account.
Data is written on-disk each time it changes.

**NOTE:** what I need is mostly there.
What DODB doesn't provide, I hack in a few lines in my app.
DODB will provide some form of atomicity and consistency at some point, but nothing fancy nor too advanced.
The whole point is to keep it simple.

## Speed

Since DODB doesn't use SQL and doesn't even try to handle stuff like atomicity or consistency, speed is great.
Reading data from disk takes a few dozen microseconds, and not much more when searching indexed data.

**On my more-than-decade-old, slow-as-fuck machine**, the simplest possible SQL request to Postgres takes about 100 to 900 microseconds.
With DODB, to reach on-disk data: 13 microseconds.
To search then retrieve indexed data: almost the same thing, 16 microseconds on average, since it's just a path to a symlink we have to build.

With the `cached` version of DODB, there is not even deserialization happening, so 7 nanoseconds.
For indexes (indexes, partitions and tags), the speed-up is *"only"* a factor of about 14 compared to the uncached version, because indexes still walk the file-system.
I may develop fully cached indexes at some point, but keep in mind that this costs memory (but yeah, again, insane speeds).

**NOTE:** of course SQL and DODB cannot be fairly compared based on performance since they don't have the same properties.
But still, this is the kind of speed you can get with the tool.

# Installation

Add the following to your `shard.yml`.
You may want to add version information to avoid unexpected breakages.

```yaml
dependencies:
  dodb:
    git: https://git.baguette.netlib.re/Baguette/dodb.cr
```

# Basic usage

```crystal
# Database creation
db = DODB::DataBase(Thing).new "path/to/storage/directory"

# Adding an element to the db
db << Thing.new

# Reaching all objects in the db
db.each do |thing|
  pp! thing
end
```

# Basic API

## Create the database

Creating the DB simply creates a few directories on the file-system.

```crystal
db = DODB::DataBase(Thing).new "path/to/storage/directory"
```

## Adding a new object

```crystal
db << Thing.new
```

## Sorting the objects

To speed up searches in the DB, we can sort the objects, based on their attributes for example.
There are 3 sorting methods:

- index, 1-1 relations, an attribute value is bound to a single object (an identifier)
- partition, 1-n relations, an attribute value may be related to several objects (the color of a car, for instance)
- tags, n-n relations, each object may have several tags, each tag may be related to several objects

Let's take an example.

```Crystal
require "json"
require "uuid"

class Car
  include JSON::Serializable

  property id : String
  property color : String
  property keywords : Array(String)

  def initialize(@color, @keywords)
    @id = UUID.random.to_s
  end
end
```

We want to store `cars` in a database and index them on their `id` attribute:

```Crystal
cars = DODB::DataBase(Car).new "path/to/storage/directory"

# We give a name to the index, then the code to extract the id from a Car instance
cars_by_id = cars.new_index "id", &.id
```

After adding a few objects to the database, here is the index in action on the file-system:

```sh
$ tree storage/
storage
├── data
│   ├── 0000000000
│   ├── 0000000001
│   ├── 0000000002
│   ├── 0000000003
│   ├── 0000000004
│   └── 0000000005
├── indices
│   └── by_id
│       ├── 6e109b82-25de-4250-9c67-e7e8415ad5a7 -> ../../data/0000000003
│       ├── 2080131b-97d7-4300-afa9-55b93cdfd124 -> ../../data/0000000000
│       ├── 2118bf1c-e413-4658-b8c1-a08925e20945 -> ../../data/0000000005
│       ├── b53fab8e-f394-49ef-b939-8a670abe278b -> ../../data/0000000004
│       ├── 7e918680-6bc2-4f29-be7e-3d2e9c8e228c -> ../../data/0000000002
│       └── 8b4e83e3-ef95-40dc-a6e5-e6e697ce6323 -> ../../data/0000000001
```

We have 6 objects in the DB, each of them has a unique ID attribute, and each ID value is related to a single object.
Getting an object by its ID is as simple as `cat storage/indices/by_id/`.

Now we want to sort cars based on their `color` attribute.
This time, we use a `partition`, because the relation between the attribute (color) and the object (car) is `1-n`:

```Crystal
cars_by_color = cars.new_partition "color", &.color
```

On the file-system, this translates to:

```sh
$ tree storage/
...
├── partitions
│   └── by_color
│       ├── blue
│       │   ├── 0000000000 -> ../../../data/0000000000
│       │   └── 0000000004 -> ../../../data/0000000004
│       ├── red
│       │   ├── 0000000001 -> ../../../data/0000000001
│       │   ├── 0000000002 -> ../../../data/0000000002
│       │   └── 0000000003 -> ../../../data/0000000003
│       └── violet
│           └── 0000000005 -> ../../../data/0000000005
```

Now each attribute value corresponds to a directory (blue, red, violet, etc.) containing a symlink for each related object.

Finally, we want to sort cars based on the `keywords` attribute.
This is an n-n relation: each car may have several keywords, and each keyword may be related to several cars.

```Crystal
cars_by_keyword = cars.new_tags "keyword", &.keywords
```

On the file-system, this translates to:

```sh
$ tree storage/
...
└── tags
    └── by_keyword
        ├── elegant
        │   ├── 0000000000 -> ../../../data/0000000000
        │   └── 0000000003 -> ../../../data/0000000003
        ├── impressive
        │   ├── 0000000000 -> ../../../data/0000000000
        │   ├── 0000000001 -> ../../../data/0000000001
        │   └── 0000000003 -> ../../../data/0000000003
...
```

Tags are very similar to partitions and are used the exact same way for search, update and deletion.

## Updating an object

In our last example we had a `Car` class, we stored its instances in `cars` and we could identify each instance by its `id` with the index `cars_by_id`.
Now, we want to update a car:

```Crystal
# we find a car we want to modify
car = cars_by_id.get "86a07924-ab3a-4f46-a975-e9803acba22d"

# we modify it
car.color = "blue"

# update
# simple case: no change in the index
cars_by_id.update car
# otherwise
car.id = "something-else-than-before"
cars_by_id.update "86a07924-ab3a-4f46-a975-e9803acba22d", car
```

Or, in the case the object may not yet exist:

```Crystal
cars_by_id.update_or_create car.id, car

# Search by partitions: all blue cars.
pp! cars_by_color.get "blue"

# Search by tags: all elegant cars.
pp! cars_by_keyword.get "elegant"
```

Changing a value that is related to a partition or a tag will automatically do what you would expect: de-index then re-index.
You won't find yourself with a bunch of invalid symbolic links all over the place.

## Removing an object

```Crystal
# Remove a value based on an index.
cars_by_id.delete "86a07924-ab3a-4f46-a975-e9803acba22d"

# Remove a value based on a partition.
cars_by_color.delete "red"
cars_by_color.delete "blue" do |car|
  car.keywords.empty?
end

# Remove a value based on a tag.
cars_by_keyword.delete "shiny"
cars_by_keyword.delete "elegant" do |car|
  car.color == "red"
end
```

In this code snippet, the block is applied to blue cars only, and a blue car is removed only if it doesn't have any associated keywords.
Same thing for elegant cars: only the red ones are removed.
This represents a performance boost compared to applying the function on all the cars.

# Complete example

```Crystal
require "dodb"

# First, we define what we’ll want to store.
# It *has* to be serializable through JSON, everything in DODB is stored in JSON directly on the file-system.
class Car
  include JSON::Serializable

  property name : String             # unique to each instance (1-1 relations)
  property color : String            # a simple attribute (1-n relations)
  property keywords : Array(String)  # tags about a car, example: "shiny" (n-n relations)

  def initialize(@name, @color, @keywords)
  end
end

#####################
# Database creation #
#####################

cars = DODB::DataBase(Car).new "./bin/storage"

##########################
# Database configuration #
##########################

# There are several ways to index things in DODB.

# We give a name to the index, then the code to extract the name from a Car instance
# (1-1 relations: in this example, names are indexes = they are UNIQUE identifiers)
cars_by_name = cars.new_index "name", &.name

# We want quick searches for cars based on their color
# (1-n relations: a car only has one color, but a color may refer to many cars)
cars_by_color = cars.new_partition "color", &.color

# We also want to search cars on their keywords
# (n-n relations: a car may be described with many keywords and a keyword may be applied to many cars)
cars_by_keyword = cars.new_tags "keyword", &.keywords

##########
# Adding #
##########

cars << Car.new "Corvet",    "red",    [ "shiny", "impressive", "fast", "elegant" ]
cars << Car.new "SUV",       "red",    [ "solid", "impressive" ]
cars << Car.new "Mustang",   "red",    [ "shiny", "impressive", "elegant" ]
cars << Car.new "Bullet-GT", "red",    [ "shiny", "impressive", "fast", "elegant" ]
cars << Car.new "GTI",       "blue",   [ "average" ]
cars << Car.new "Deudeuch",  "violet", [ "dirty", "slow", "only French will understand" ]

# The DB can be accessed as a simple array
cars.each do |car|
  pp! car
end

################
# Searching... #
################

# based on an index (print the only car named "Corvet")
pp! cars_by_name.get "Corvet"

# based on a partition (print all red cars)
pp! cars_by_color.get "red"

# based on a tag (print all fast cars)
pp! cars_by_keyword.get "fast"

############
# Updating #
############

car = cars_by_name.get "Corvet"
car.color = "blue"
cars_by_name.update car

car = cars_by_name.get "Bullet-GT"
car.name = "Not-So-Fast-Bullet-GT"
cars_by_name.update "Bullet-GT", car # the name changed

# we have a car
# and add it to the DB, not knowing in advance if it was already there
car = Car.new "Mustang", "red", [] of String
cars_by_name.update_or_create car.name, car

# We all know it, elegant cars are also expensive.
cars_by_keyword.get("elegant").each do |car|
  car.keywords << "expensive"
  cars_by_name.update car.name, car
end

###############
# Deleting... #
###############

# based on a name
cars_by_name.delete "Deudeuch"

# based on a color
cars_by_color.delete "red"
# based on a color (but not only)
cars_by_color.delete "blue", &.name.==("GTI")

# based on a keyword
cars_by_keyword.delete "solid"
# based on a keyword (but not only)
cars_by_keyword.delete "fast", &.name.==("Corvet")
```
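
To make the symlink layout from *Sorting the objects* more concrete, here is a minimal sketch that rebuilds a 1-1 index by hand, using only the Crystal standard library. This is not DODB's actual implementation, just an illustration of the idea: the two-field `Car` class, the `car-42` identifier and the temporary directory are all assumptions made for the example.

```crystal
require "json"
require "file_utils"

# Hypothetical record type, for illustration only.
class Car
  include JSON::Serializable

  property id : String
  property color : String

  def initialize(@id, @color)
  end
end

# Throwaway storage directory (its location is an assumption of this sketch).
root = File.join Dir.tempdir, "dodb-symlink-sketch"
FileUtils.rm_rf root
Dir.mkdir_p File.join(root, "data")
Dir.mkdir_p File.join(root, "indices", "by_id")

# 1. Serialize the object into a numbered data file.
car = Car.new "car-42", "red"
File.write File.join(root, "data", "0000000000"), car.to_json

# 2. "Index" it: a relative symlink named after the attribute value.
link = File.join(root, "indices", "by_id", car.id)
File.symlink "../../data/0000000000", link

# 3. Lookup by id: build the path, read through the symlink, deserialize.
found = Car.from_json File.read(link)
puts found.color

# Clean up the throwaway directory.
FileUtils.rm_rf root
```

A lookup is nothing more than building a path and reading a file, which is why searching indexed data costs about the same as reading the data file directly.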