# dodb.cr

DODB stands for Document Oriented DataBase.

## Objective

The objective is to get rid of the DBMS when storing simple files directly on the file-system is enough.

## Overview

A brief summary:
- no SQL
- objects are serialized (currently in JSON)
- data is indexed to significantly speed up searches in the db
- db is fully integrated in the language (basically a simple array with a few more functions)
- symlinks on the FS can be generated to enable data searches **outside the application, with UNIX tools**
- configurable data cache size
- RAM-only databases for short-lived data
- triggers can be easily implemented to extend indexes beyond your wildest expectations

## Limitations

DODB doesn't fully handle ACID properties:

- no *atomicity*: you can't chain instructions and roll back if one of them fails;
- no *consistency*: there is currently no mechanism to prevent adding invalid values.

*Isolation* is partially taken into account, with a locking mechanism preventing a few race conditions.
FYI, in my projects the database is fast enough that I don't even need parallelism (like, by far).

*Durability* is taken into account.
Data is written on-disk each time it changes.

**NOTE:** what I need is mostly there.
What DODB doesn't provide, I hack in a few lines in my app.
DODB will provide some form of atomicity and consistency at some point, but nothing fancy or too advanced.
The whole point is to keep it simple.

## Speed

Since DODB doesn't use SQL and doesn't even try to handle stuff like atomicity or consistency, speed is great.
Reading data from disk takes a few dozen microseconds, and not much more when searching indexed data.

**On my more-than-decade-old, slow-as-fuck machine**, the simplest possible SQL request to Postgres takes about 100 to 900 microseconds.
With DODB, reaching on-disk data takes 15 microseconds, and reaching cached data just a few dozen **nanoseconds**,
even when searching for a specific value with an index.

**NOTE:** of course SQL and DODB cannot be fairly compared based on performance since they don't have the same properties.
But still, this is the kind of speed you can get with the tool.

# Installation

Add the following to your `shard.yml`.
You may want to add version information to avoid unexpected breakages.

```yaml
dependencies:
    dodb:
        git: https://git.baguette.netlib.re/Baguette/dodb.cr
```

# Basic usage

```crystal
# Database creation, with a data cache of 100k entries.
db = DODB::Storage::Common(Thing).new "path/to/storage/directory", 100_000

# Adding an element to the db
db << Thing.new

# Reaching all objects in the db
db.each do |thing|
	pp! thing
end
```

# Basic API

## Create the database

Creating the DB simply creates a few directories on the file-system.

```crystal
db = DODB::Storage::Common(Thing).new "path/to/storage/directory", 100_000
```

## Adding a new object

```crystal
db << Thing.new
```

## Sorting the objects

To speed up searches in the DB, we can sort the objects, for example based on their attributes.
There are 3 sorting methods:
- basic indexes, 1-1 relations, an attribute value is bound to a single object (an identifier)
- partitions, 1-n relations, an attribute value may be related to several objects (the color of a car, for instance)
- tags, n-n relations, each object may have several tags, each tag may be related to several objects

Let's take an example.
```crystal
require "json"
require "uuid"

class Car
	include JSON::Serializable
	property id       : String
	property color    : String
	property keywords : Array(String)

	def initialize(@color, @keywords)
		@id = UUID.random.to_s
	end
end
```

We want to store `cars` in a database and index them on their `id` attribute:
```crystal
cars = DODB::Storage::Common(Car).new "path/to/storage/directory", 100_000

# We give a name to the index, then the code to extract the id from a Car instance
cars_by_id = cars.new_index "id", &.id
```

After adding a few objects in the database, here is the index in action on the file-system:

```sh
$ tree storage/
storage
├── data
│   ├── 0000000000
│   ├── 0000000001
│   ├── 0000000002
│   ├── 0000000003
│   ├── 0000000004
│   └── 0000000005
├── indices
│   └── by_id
│       ├── 6e109b82-25de-4250-9c67-e7e8415ad5a7 -> ../../data/0000000003
│       ├── 2080131b-97d7-4300-afa9-55b93cdfd124 -> ../../data/0000000000
│       ├── 2118bf1c-e413-4658-b8c1-a08925e20945 -> ../../data/0000000005
│       ├── b53fab8e-f394-49ef-b939-8a670abe278b -> ../../data/0000000004
│       ├── 7e918680-6bc2-4f29-be7e-3d2e9c8e228c -> ../../data/0000000002
│       └── 8b4e83e3-ef95-40dc-a6e5-e6e697ce6323 -> ../../data/0000000001
```

We have 6 objects in the DB; each of them has a unique `id` attribute, and each `id` value is related to a single object.
Getting an object by its ID is as simple as `cat storage/indices/by_id/<id>`.

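Because an index entry is a plain symlink, the lookup by ID can be reproduced with nothing but a shell. The sketch below does not call DODB: it hand-builds a throwaway directory that mimics the layout shown in the `tree` output, and the JSON content is an assumed stand-in for whatever DODB actually serializes.

```sh
#!/bin/sh
set -eu

# Throwaway directory mimicking the index layout (hand-built, not created by DODB).
root=$(mktemp -d)
mkdir -p "$root/storage/data" "$root/storage/indices/by_id"

# Assumed document content: the real file holds whatever DODB serialized.
id="2080131b-97d7-4300-afa9-55b93cdfd124"
printf '{"id":"%s","color":"blue","keywords":["elegant"]}\n' "$id" \
	> "$root/storage/data/0000000000"

# An index entry is a relative symlink named after the indexed value.
ln -s ../../data/0000000000 "$root/storage/indices/by_id/$id"

# Lookup by id: read the document through the symlink.
found=$(cat "$root/storage/indices/by_id/$id")
echo "$found"

# Ask which data file the id points to.
target=$(readlink "$root/storage/indices/by_id/$id")
echo "$target"

rm -rf "$root"
```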

Now we want to sort cars based on their `color` attribute.
This time, we use a `partition`, because the relation between the attribute (color) and the object (car) is `1-n`:
```crystal
cars_by_color = cars.new_partition "color", &.color
```

On the file-system, this translates to:
```sh
$ tree storage/
...
├── partitions
│   └── by_color
│       ├── blue
│       │   ├── 0000000000 -> ../../../data/0000000000
│       │   └── 0000000004 -> ../../../data/0000000004
│       ├── red
│       │   ├── 0000000001 -> ../../../data/0000000001
│       │   ├── 0000000002 -> ../../../data/0000000002
│       │   └── 0000000003 -> ../../../data/0000000003
│       └── violet
│           └── 0000000005 -> ../../../data/0000000005
```

Now the attribute corresponds to a directory (blue, red, violet, etc.) containing a symlink for each related object.

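Since a partition is one directory per attribute value, ordinary shell tools can answer questions such as "which cars are red?" or "how many cars per color?". This is a minimal sketch on a hand-built copy of the layout; the simplified JSON documents are assumptions and nothing here is produced by DODB itself.

```sh
#!/bin/sh
set -eu

# Hand-built stand-in for the partition layout (not created by DODB).
root=$(mktemp -d)
data="$root/storage/data"
by_color="$root/storage/partitions/by_color"
mkdir -p "$data" "$by_color/blue" "$by_color/red"

# Assumed, simplified document contents.
printf '{"color":"blue"}\n' > "$data/0000000000"
printf '{"color":"red"}\n'  > "$data/0000000001"
printf '{"color":"red"}\n'  > "$data/0000000002"

# One symlink per object, inside the directory named after its color.
ln -s ../../../data/0000000000 "$by_color/blue/0000000000"
ln -s ../../../data/0000000001 "$by_color/red/0000000001"
ln -s ../../../data/0000000002 "$by_color/red/0000000002"

# All red cars: read every entry of the "red" directory.
cat "$by_color"/red/*

# Number of cars per color.
summary=$(
	for dir in "$by_color"/*/; do
		printf '%s %s\n' "$(basename "$dir")" "$(ls "$dir" | wc -l | tr -d ' ')"
	done
)
echo "$summary"

rm -rf "$root"
```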
Finally, we want to sort cars based on the `keywords` attribute.
This is an n-n relation: each car may have several keywords, and each keyword may be related to several cars.
```crystal
cars_by_keyword = cars.new_tags "keyword", &.keywords
```

On the file-system, this translates to:
```sh
$ tree storage/
...
└── tags
    └── by_keyword
        ├── elegant
        │   ├── 0000000000 -> ../../../data/0000000000
        │   └── 0000000003 -> ../../../data/0000000003
        ├── impressive
        │   ├── 0000000000 -> ../../../data/0000000000
        │   ├── 0000000001 -> ../../../data/0000000001
        │   └── 0000000003 -> ../../../data/0000000003
...
```
Tags are very similar to partitions and are used the exact same way for search, update and deletion.

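Searching on several tags at once amounts to intersecting tag directories, which can also be done from a shell. The sketch below rebuilds the `elegant` and `impressive` directories by hand, with empty placeholder documents (an assumption), and keeps the entries present in both; it illustrates the on-disk layout only, not how DODB performs the search internally.

```sh
#!/bin/sh
set -eu

# Hand-built stand-in for the tag layout (not created by DODB).
root=$(mktemp -d)
data="$root/storage/data"
by_keyword="$root/storage/tags/by_keyword"
mkdir -p "$data" "$by_keyword/elegant" "$by_keyword/impressive"

# Assumed, empty placeholder documents, all tagged "impressive".
for n in 0000000000 0000000001 0000000003; do
	printf '{}\n' > "$data/$n"
	ln -s "../../../data/$n" "$by_keyword/impressive/$n"
done
ln -s ../../../data/0000000000 "$by_keyword/elegant/0000000000"
ln -s ../../../data/0000000003 "$by_keyword/elegant/0000000003"

# Keep the entries of "elegant" that also exist under "impressive".
both=""
for entry in "$by_keyword"/elegant/*; do
	name=$(basename "$entry")
	if [ -e "$by_keyword/impressive/$name" ]; then
		both="$both $name"
	fi
done
echo "elegant and impressive:$both"

rm -rf "$root"
```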
## Updating an object

In our last example we had a `Car` class, we stored its instances in `cars` and we could identify each instance by its `id` with the index `cars_by_id`.
Now, we want to update a car:
```crystal
# we find a car we want to modify
car = cars_by_id.get "86a07924-ab3a-4f46-a975-e9803acba22d"

# we modify it
car.color = "blue"

# update, simple case: no change in the index
cars_by_id.update car
# otherwise
car.id = "something-else-than-before"
cars_by_id.update "86a07924-ab3a-4f46-a975-e9803acba22d", car
```

Or, in case the object may not exist yet:
```crystal
cars_by_id.update_or_create car.id, car

# Search by partitions: all blue cars.
pp! cars_by_color.get "blue"

# Search by tags: all elegant cars.
pp! cars_by_keyword.get "elegant"
```

Changing a value that is related to a partition or a tag will automatically do what you would expect: de-index then re-index.
You won't find yourself with a bunch of invalid symbolic links all over the place.

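The absence of invalid symbolic links can be checked from outside the application. The sketch below assumes nothing about DODB beyond the directory layout shown earlier: it hand-builds a tiny tree, deliberately adds one broken link to simulate a stale entry, and lists the symlinks whose target is missing. If re-indexing works as described, the same `find` command run on a real storage directory should print nothing.

```sh
#!/bin/sh
set -eu

# Hand-built tree with one valid and one deliberately broken link (not created by DODB).
root=$(mktemp -d)
mkdir -p "$root/storage/data" "$root/storage/partitions/by_color/blue"
printf '{"color":"blue"}\n' > "$root/storage/data/0000000000"
ln -s ../../../data/0000000000 "$root/storage/partitions/by_color/blue/0000000000"
ln -s ../../../data/0000000009 "$root/storage/partitions/by_color/blue/0000000009"

# List symlinks whose target does not exist (`test -e` follows the link).
dangling=$(cd "$root" && find storage -type l ! -exec test -e {} \; -print)
echo "$dangling"

rm -rf "$root"
```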
## Removing an object

```crystal
# Remove a value based on an index.
cars_by_id.delete "86a07924-ab3a-4f46-a975-e9803acba22d"

# Remove a value based on a partition.
cars_by_color.delete "red"
cars_by_color.delete "blue" do |car|
	car.keywords.empty?
end

# Remove a value based on a tag.
cars_by_keyword.delete "shiny"
cars_by_keyword.delete ["slow", "expensive"] # Remove cars that are both slow and expensive.
cars_by_keyword.delete "elegant" do |car|
	car.name == "GTI" # assumes Car has a `name` property, as in the complete example
end
```

In this code snippet, the block is applied to blue cars only,
and a blue car is removed only if it has no associated keywords.
Same thing for elegant cars: the block only runs on cars tagged "elegant".
This represents a performance boost compared to applying the function on all the cars.

# Complete example

```crystal
require "dodb"

# First, we define what we’ll want to store.
# It *has* to be serializable through JSON, everything in DODB is stored in JSON directly on the file-system.
class Car
	include JSON::Serializable

	property name     : String        # unique to each instance (1-1 relations)
	property color    : String        # a simple attribute (1-n relations)
	property keywords : Array(String) # tags about a car, example: "shiny" (n-n relations)

	def initialize(@name, @color, @keywords)
	end
end

#####################
# Database creation #
#####################

cars = DODB::Storage::Common(Car).new "./db-storage", 100_000


##########################
# Database configuration #
##########################

# There are several ways to index things in DODB.

# We give a name to the index, then the code to extract the name from a Car instance
# (1-1 relations: in this example, names are indexes = they are UNIQUE identifiers)
cars_by_name = cars.new_index "name", &.name

# We want quick searches for cars based on their color
# (1-n relations: a car only has one color, but a color may refer to many cars)
cars_by_color = cars.new_partition "color", &.color

# We also want to search cars on their keywords
# (n-n relations: a car may be described with many keywords and a keyword may be applied to many cars)
cars_by_keyword = cars.new_tags "keyword", &.keywords


##########
# Adding #
##########

cars << Car.new "Corvet",    "red",    [ "shiny", "impressive", "fast", "elegant" ]
cars << Car.new "SUV",       "red",    [ "solid", "impressive" ]
cars << Car.new "Mustang",   "red",    [ "shiny", "impressive", "elegant" ]
cars << Car.new "Bullet-GT", "red",    [ "shiny", "impressive", "fast", "elegant" ]
cars << Car.new "GTI",       "blue",   [ "average" ]
cars << Car.new "Deudeuch",  "violet", [ "dirty", "slow", "only French will understand" ]

# The DB can be accessed as a simple array
cars.each do |car|
	pp! car
end


################
# Searching... #
################

# based on an index (print the only car named "Corvet")
pp! cars_by_name.get "Corvet"

# based on a partition (print all red cars)
pp! cars_by_color.get "red"

# based on a tag (print all fast cars)
pp! cars_by_keyword.get "fast"

# based on several tags (print all cars that are both slow and expensive)
pp! cars_by_keyword.get ["slow", "expensive"]

############
# Updating #
############

car = cars_by_name.get "Corvet"
car.color = "blue"
cars_by_name.update car

car = cars_by_name.get "Bullet-GT"
car.name = "Not-So-Fast-Bullet-GT"
cars_by_name.update "Bullet-GT", car # the name changed

# we have a car
# and add it to the DB, not knowing in advance if it was already there
car = Car.new "Mustang", "red", [] of String
cars_by_name.update_or_create car.name, car

# We all know it, elegant cars are also expensive.
cars_by_keyword.get("elegant").each do |car|
	car.keywords << "expensive"
	cars_by_name.update car
end

###############
# Deleting... #
###############

# based on a name
cars_by_name.delete "Deudeuch"

# based on a color
cars_by_color.delete "red"
# based on a color (but not only)
cars_by_color.delete "blue", &.name.==("GTI")

# based on a keyword
cars_by_keyword.delete "solid"
# based on a few keywords (but not only)
cars_by_keyword.delete ["slow", "expensive"], &.name.==("Corvet")
```