Almost finished!!

2025-04-13 18:31:13 +02:00 · 2025-04-13 18:31:13 +02:00 · bc2fb503f2
commit bc2fb503f2
parent 3d79a94f8f
1 changed file with 58 additions and 6 deletions
--- a/paper/paper.ms
+++ b/paper/paper.ms
@@ -1,5 +1,5 @@
 .ds VERSION 0.5.1
-.ds POINT
+.ds NB_USERS_NETLIBRE 10700
 .so macros.roff
 .de dq
 \[lq]\\$1\[rq]\c
@@ -251,7 +251,7 @@ Section 12 presents a real-world usage of DODB.
 Finally, section 13 provides a conclusion.
 .
 .SECTION How DODB works and basic usage
-DODB is a lookup table using an auto-incremented number as a key and the value is the stored data.
+DODB can be briefly described as a lookup table that uses an auto-incremented number as the key and stores each value in a plain file named after its key.
 This section explains how to use DODB for basic cases, including the few mechanisms added to speed up searches.
 Also, the filesystem representation of the data is presented since it enables easy off-application searches.

@@ -1672,13 +1672,26 @@ A malicious user who successfully took control of the application can now open f
 This section presents all the features I want to see in a future version of the DODB library.
 .
 .SS New types of storage facility
+DODB is focused on providing a document-oriented database, as its name suggests.
+However, the main idea of this library is to implement only the core functionality of this kind of database and leave the rest to the users, an approach that isn't bound to a particular database design.
+Other types of databases could be implemented in the same way.
+
 The Log-Structured Merge-Tree algorithm is interesting for databases with intensive updates.
 Database modifications are deferred to be written sequentially, greatly improving the throughput.
 Adding an alternative storage facility in DODB implementing this algorithm could open new possibilities, bringing DODB to a new class of usage.

+Also, graph databases have their use in social networks, recommendation systems and any case where many JOIN operations would be necessary in a relational database.
+Graph databases are designed to store and query highly interconnected data efficiently.
+They represent data as nodes (entities) and edges (relationships), with both being first-class citizens, enabling fast traversals and complex relationship queries.
+Unlike relational databases, they avoid expensive joins by physically pre-storing connections, trading storage space for faster read performance in networked data scenarios (e.g., social networks, fraud detection).
+As with the Log-Structured Merge-Tree, implementing their core functionality could be an interesting challenge and open new possibilities for DODB.
+
 Storing data in separate files as it's currently done is great in many aspects but becomes cumbersome with large databases.
 One way to enable large databases in DODB could be to add a new storage class which works differently, but would inevitably introduce complexity.
-Another way could be to implement a new file-system dedicated to store a massive number of small files, which ultimately is more interesting than adding complexity to the library and may be useful beyond DODB.
+
+Another way to circumvent the limitation on the number of files could be to implement a new filesystem dedicated to storing a massive number of small files, which ultimately is more interesting than adding complexity to the library and may be useful beyond DODB.
+Moreover, new kinds of databases could emerge from this: if the number of files isn't a problem anymore, why couldn't relational databases themselves be implemented using a filesystem representation?
+What about graph databases?
 .
 .SS New types of triggers
 Some operations are (rightfully!) not handled in DODB, such as text searches.
@@ -1687,6 +1700,9 @@ Triggers could be implemented to provide data to external tools in order to enab
 Also,
 .I "analytical triggers"
 could be implemented to provide statistics about the database usage by adding triggers that are activated on database access, not modification.
+
+Finally, triggers could serve other purposes entirely, such as preventing invalid data from being inserted, sending live notifications based on arbitrary computations on the database, or automatically performing a user-defined operation on access, update or removal of a value.
+.
 .SS Pagination via the indexes: offset and limit
 Right now, browsing the entire database by requesting a limited list at a time is possible, thanks to some functions accepting an
 .I offset
@@ -1695,6 +1711,14 @@ and a
 However, this is not possible with the indexes: when querying a partition, for example, the API provides the entire list of matching values.
 This is not acceptable for databases with large partitions and tags: memory will be over-used and requests will be slow.
 .
+.SS Configurable data format (JSON, CBOR, etc.)
+JSON is currently in use, but CBOR
+.[
+CBOR
+.]
+would be a better fit when performance matters.
+Since both have their pros and cons, and given that other data formats could be preferred for many possible reasons, the serialization format should be a configurable parameter.
+.
 .SECTION Real-world usage: netlibre
 DODB instances have been deployed in a real-world setting by the netlibre service.
 This section presents this service and its use of DODB, showing how this method of handling data can be used in conventional online services.
@@ -1713,7 +1737,7 @@ enabling users to host services on the internet despite having a dynamic IP addr
 .SOURCE Ruby ps=9 vs=10
 wget "https://www.netlib.re/token-update/<token>"
 .SOURCE
-Thus, netlibre is a real-life service providing domains to more than 7500 users to this day.
+Thus, netlibre is a real-life service providing domains to more than \*[NB_USERS_NETLIBRE] users to this day (April 2025).

 .B "The technical parts" .
 The service is split into three components: the user interface (the website), an authentication daemon\*[*] (\fIauthd\f[]) and a daemon handling all the server operations related to the actual service (\fIdnsmanagerd\f[]).
@@ -1760,6 +1784,36 @@ DODB never required to even think much about storage.
 The different structures used in the code to handle requests were used as-is, making DODB the simplest possible tool to store data.
 Since DODB requires only a few lines of code, tests were also very quick to write and run.
 For example, database management only took a few dozen lines of code on a 3 kLOC project (dnsmanagerd), most of them being to set up the different databases (storage and triggers) and the rest to perform CRUD operations (each of them only requiring a single line of code).
+Most (if not all) of the DODB-related code looks like this:
+.
+.LP
+.SOURCE Ruby ps=9 vs=10
+getter domains : DODB::Storage::Common(Domain)
+.SOURCE
+.QP
+All DODB instances are declared this way.
+.QE
+.
+.SOURCE Ruby ps=9 vs=10
+@domains         = DODB::Storage::Common(Domain).new "#{@root}/domains", 5_000
+@domains_by_name = @domains.new_index "name", &.name
+.SOURCE
+.QP
+All DODB instances are initialized this way.
+.QE
+.
+.SOURCE Ruby ps=9 vs=10
+if entry = db_by_id.get? id
+	entry.some_attribute = new_attribute
+	db_by_id.update entry
+else
+	error "blah"
+end
+.SOURCE
+.QP
+This is the way DODB is used in netlibre.
+.QE
+And that's it. You now know how everything database-related works in netlibre.
 .
 .SECTION Conclusion
 Thanks to its unusual design choices, trading most features for simplicity and letting users implement their own solutions around the few (mostly focused on CRUD) operations provided by DODB, the complexity of the library is kept at a minimum.
@@ -2017,5 +2071,3 @@ function returns an empty list in case the search failed.
 The implementation was designed to be simple (7 lines of code), not efficient.
 However, with data and index caches, the search is expected to meet about everyone's requirements, speed-wise, given that the tags are small enough (a few thousand entries).
 .QE
-.
-.