diff --git a/paper/paper.ms b/paper/paper.ms index 40fbb3f..82edf17 100644 --- a/paper/paper.ms +++ b/paper/paper.ms @@ -1,4 +1,11 @@ +.ds VERSION 0.5.1 +.ds POINT .so macros.roff +.de dq +\[lq]\\$1\[rq]\c +.shift +\&\\$1 +.. .de TREE1 .QP .ps -3 @@ -51,7 +58,7 @@ DODB is a document-oriented database library, enabling a very simple way to stor The objective is to avoid complex traditional relational databases and to explore a more straightforward way to handle data, to have a tool anyone can .B "read and understand entirely" . To speed-up searches, attributes of these documents can be used as indexes. -DODB can provide a file-system representation of those indexes to enable off-application data manipulation with the most basic tools, such as +DODB can provide a filesystem representation of those indexes to enable off-application data manipulation with the most basic tools, such as .I ls or even a file explorer. @@ -60,16 +67,22 @@ Limits of such approach are discussed. An experiment is described and analyzed to understand the performance that can be expected. .ABSTRACT2 .SINGLE_COLUMN +.br +.po +11.5c +.nf +Document sync'ed with DODB \*[VERSION] +.fi +.br +.po .SECTION Introduction to DODB A database consists in managing data, enabling queries to add, to retrieve, to modify and to delete a piece of information. These actions are grouped under the acronym CRUD: creation, retrieval, update and deletion. CRUD operations are the foundation for the most basic databases. Yet, almost every single database engine goes far beyond this minimalistic set of features. -Although everyone using the file-system of their computer as some sort of database (based on previous definition) by storing raw data (files) in a hierarchical manner (directories), computer science classes introduce a particularly convoluted way of managing data. 
+Although everyone uses the filesystem of their computer as some sort of database (based on the previous definition) by storing raw data (files) in a hierarchical manner (directories), computer science classes introduce a particularly convoluted way of managing data. Universities all around the world teach about Structured Query Language (SQL) and relational databases. These two concepts are closely interlinked and require a brief explanation. -. .UL "Relational databases" are built around the idea to describe data to a database engine so it can optimize operations and storage. @@ -112,11 +125,11 @@ users to talk directly to the database so they can access the data without bothe This has value for many companies and organizations. .FOOTNOTE2 -Many tools were used or even developed over the years specifically to aleviate the inherent complexity and limitations of SQL. +Many tools were used or even developed over the years specifically to alleviate the inherent complexity and limitations of traditional SQL databases. For example, designing databases becomes difficult when the list of tables grows; Unified Modeling Language (UML) is then used to provide a graphical overview of the relations between tables. SQL databases may be fast to retrieve data despite complicated operations, but when multiple sequential operations are required they become slow because of all the back-and-forths with the application; -thus, SQL databases can be scripted to automate operations and provide a massive speed up +thus, SQL databases can be scripted to automate operations and to provide a massive speed up .I "stored procedures" , ( see .I "PL/SQL" ). @@ -154,10 +167,23 @@ Document-oriented databases are a sub-class of key-value stores, where metadata And that's exactly what is being done in Document Oriented DataBase (DODB). .UL "The stated goal of DODB" -is to provide a simple library for developers to handle data for basic projects.
+is to provide a simple and easy-to-use +.UL library +for developers to perform CRUD operations on documents (undescribed data structures). +DODB targets basic to medium-sized projects, up to a few million entries\*[*]. +.FOOTNOTE1 +See the section +.dq "Limits of DODB" . +.FOOTNOTE2 +Code simplicity implies hackability. Traditional SQL relational databases have a snowballing effect on code complexity, including for applications with basic requirements. However, DODB may be a great starting point to implement more sophisticated features for creative minds. -Code simplicity implies hackability. + +.UL "The non-goals of DODB" +are: +.STARTBULLET +.BULLET to provide a generic library w +.ENDBULLET .UL "Contrary to SQL" , DODB has a very narrow scope: to provide a library enabling to store, to retrieve, to modify and to delete data. @@ -196,7 +222,7 @@ Finally, section 12 provides a conclusion. DODB is a hash table. The key of the hash is an auto-incremented number and the value is the stored data. The following section will explain how to use DODB for basic cases including the few added mechanisms to speed-up searches. -Also, the file-system representation of the data will be presented since it enables easy off-application searches. +Also, the filesystem representation of the data will be presented since it enables easy off-application searches. The presented code is in Crystal such as the DODB library. Keep in mind that this document is all about the method more than the current implementation. @@ -339,7 +365,7 @@ An index also requires a callback, a procedure to extract the value used for ind In this case, the procedure takes a car as a parameter and returns its "name" attribute\*[*]. .FOOTNOTE1 This procedure can be arbitrarily complex and include any necessary data transformation. -For example, the netlibre project (discussed later in the papper) indexes their users' email, but emails are first encoded in base64 to avoid messing around with the file-system.
+For example, the netlibre project (discussed later in the paper) indexes their users' email, but emails are first encoded in base64 to avoid messing around with the filesystem. .FOOTNOTE2 Once the index has been created, every inserted or modified entry in the database will be indexed. @@ -377,7 +403,7 @@ A car can now be searched, modified or deleted based on its name. .QE . . -On the file-system, indexes are represented as symbolic links. +On the filesystem, indexes are represented as symbolic links. .TREE1 storage +-- data @@ -503,7 +529,7 @@ Also, this can be as easily hidden in a very nice user-friendly command. . .SSS Side note about triggers DODB presents a few possible triggers (basic indexes, partitions and tags) which respond to an obvious need for fast searches and retrevial. -Though, the implementation involving an heavy use of the file-system via the creation of symlinks comes from a certain vision about how a database could behave to provide a practical way for users to query the database +Though, the implementation involving a heavy use of the filesystem via the creation of symlinks comes from a certain vision about how a database could behave to provide a practical way for users to query the database .UL "outside the application" . Other kinds of triggers could @@ -511,24 +537,23 @@ Other kinds of triggers could be implemented in addition of those presented. These new triggers may have completely different objectives\*[*], methods and performance\*[*]. .FOOTNOTE1 -Providing a file-system representation of the data is a fun experiment; +Providing a filesystem representation of the data is a fun experiment; sysadmins can have a playful relation with the database thanks to an unconventional representation of the data. .FOOTNOTE2 .FOOTNOTE1 New triggers could seek to improve performance by any means necessary including the gazillion ways which already exist.
.FOOTNOTE2 -For example, a new kind of triggers could implement a way to accelerate searches for an attribute. -.TBD -The following sections will precisely cover this aspect. +For example, a new kind of trigger could provide a way to accelerate searches based on an attribute, replicate data, send notifications to an external tool, etc. +The following sections will present indexing triggers with improved performance. . . .SECTION DODB, slow? Nope. Let's talk about caches -The file-system representation (of data and indexes) is convenient for the administrator, but input-output operations on a file-system are slow. +The filesystem representation (of data and indexes) is convenient for the administrator, but input-output operations on a filesystem are slow. Storing the data on a storage device is required to protect it from crashes and application restarts. But data can be kept in memory for faster processing of requests. The DODB library has an API close to a hash table. -Having a data cache is as simple as keeping a hash table in memory besides providing a file-system storage, the retrieval becomes incredibly fast\*[*]. +Having a data cache is as simple as keeping a hash table in memory alongside the filesystem storage; retrieval then becomes incredibly fast\*[*]. .FOOTNOTE1 Several hundred times faster, see the experiment section. .FOOTNOTE2 @@ -634,7 +659,7 @@ the application. Since DODB is a library and not a separate application, providing a way to handle this usage of the database can be relevant. Having the same API to handle both long and short-lived data can be useful. Moreover, the previously mentioned triggers (basic indexes, partitions and tags) would also work the same way for these short-lived data. -Of course, in this case, the file-system representation may be completely irrelevant. +Of course, in this case, the filesystem representation may be completely irrelevant.
Therefore, the .I RAM-only database and the @@ -643,7 +668,7 @@ triggers were created. Let's recap the advantages of the RAM-only database. The DODB API is the same for short-lived (read: temporary) and long-lived data. -This includes the same triggers too, so a file-system representation of the current state of the application is possible. +This includes the same triggers too, so a filesystem representation of the current state of the application is possible. .I RAM-only also means incredible performances since DODB only is a .I very @@ -684,7 +709,7 @@ The API of the is exactly the same as the others. .QE As for the database API itself, changing from a version of an index to another is painless. -This way, one can opt for a cached index and, after some time not using the file-system representation, decide to change for its RAM-only version; a 4-character modification and nothing else. +This way, one can opt for a cached index and, after some time not using the filesystem representation, decide to change for its RAM-only version; a 4-character modification and nothing else. . . . @@ -750,7 +775,7 @@ Loop and repeat. Five instances of DODB are tested: .STARTBULLET .BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory); -.BULLET \fIuncached database but cached index\f[] shows the improvement you can expect by having a cache on indexes; +.BULLET \fIuncached database but cached index\f[] shows the improvement to expect with an index cache alone; .BULLET \fIcommon database\f[] shows the most basic use of DODB, with a limited cache (100k entries)\*[*]; .BULLET \fIcached database\f[] represents a database will all the entries in cache (no eviction mechanism); .BULLET \fIRAM only\f[], the database doesn't have a representation on disk (no data is written on it). 
@@ -762,7 +787,7 @@ The data cache can be fine-tuned with the "common database", enabling the use of The computer on which this test is performed\*[*] is a AMD PRO A10-8770E R7 (4 cores), 2.8 GHz.When mentioned, the .I disk is actually a -.I "temporary file-system (tmpfs)" +.I "temporary filesystem (tmpfs)" to enable maximum efficiency. .FOOTNOTE1 A very simple $50 PC, buyed online. @@ -804,7 +829,7 @@ About 110 to 120 ns for RAM-only and cached database. This is slightly more (about 200 ns) for Common database since there is a few more steps due to the inner structure to maintain. .FOOTNOTE2 In case the value is on the disk, deserialization takes about 15 µs (see \f[CW]Uncached db\f[]). -The request is a little longer when the index isn't cached (see \f[CW]Uncached db and index\f[]); in this case DODB walks the file-system to find the right symlink to follow, thus slowing the process even more, up to 20%. +The request is a little longer when the index isn't cached (see \f[CW]Uncached db and index\f[]); in this case DODB walks the filesystem to find the right symlink to follow, thus slowing the process even more, up to 20%. The logarithmic scale version of this figure shows that \fIRAM-only\f[] and \fIcached\f[] databases have exactly the same performance. The \fIcommon\f[] database spends 80 ns for its LRU caching eviction policy\*[*], making this database about 67% slower than the previous ones to retrieve a value. @@ -920,7 +945,7 @@ The eviction policy implies some operations leading to poorer performances, howe is essentially a debug mode and is not expected to run in most real-life scenarii. The purpose is to produce a control sample (involving only raw IO operations) to compare it to other (more realistic) implementations. -Cached indexes should be considered for most applications, and even more their RAM-only version in case the file-system representation isn't necessary. 
+Cached indexes should be considered for most applications, and even more their RAM-only version in case the filesystem representation isn't necessary. . .\" .ps -2 .\" .TS @@ -972,11 +997,12 @@ With Postgres, the request duration of a single value varies from 0.1 to 2 ms on .SECTION Limits of DODB DODB provides basic database operations such as storing, retrieving, modifying and removing data. However, DODB doesn't fully handle ACID properties\*[*]: atomicity, consistency, isolation and durability. -This section presents the limits of DODB, whether the current implementation or the approach, and some suggestions to fill the gaps. +This section presents the limits of DODB, whether the current implementation or the approach, and presents some suggestions to fill the gaps. .FOOTNOTE1 Traditional SQL databases handle ACID properties and may have created some "expectations" towards databases from a general public standpoint. .FOOTNOTE2 +.SS "Current state of DODB regarding ACID properties" .STARTBULLET .BULLET .B Atomicity @@ -1020,7 +1046,7 @@ for most applications. .ENDBULLET A future improvement could be to write a checksum for every written data, to easily remove corrupt data from a database. -.B "Discussion on ACID properties" . +.SS "Discussion on ACID properties" First and foremost, both atomicity and isolation properties are inherently related to parallelism, whether through concurrent threads or applications. Traditional SQL databases require both atomicity and isolation properties because they cannot afford not to have parallelism. Since DODB is a library (and not a separate application) and is kept simple (no intermediary language to interpret, no complicated algorithm), it doesn't suffer from any communication latency or long processing delaying requests. @@ -1055,13 +1081,87 @@ Not handling these properties isn't a limitation of the DODB approach but a choi Which also results from a lack of time. .FOOTNOTE2 -.B "Beyond ACID properties" . 
+.SS "Beyond ACID properties \[en] modern databases' features" Most current databases (traditional relational databases, some key-value databases and so on) provide additional features. These features may include for example high availability toolsets (replication, clustering, etc.), some forms of modularity (several storage backends, specific interfaces with other tools, etc.), interactive command lines or shells, user and authorization management, administration of databases, and so on. -Because DODB is a library and doesn't support the SQL language, because DODB +Because DODB is a library and doesn't support an intermediary language for generic requests, .TBD . +.SS "The state of filesystems, their limitations and useful features for DODB instances" +A +.dq filesystem +is the code responsible for the way data and meta-data will be written on a storage device, which basically amounts to low-level CRUD operations. +This code links the user interface (files and directories) with the device drivers, which finally write bytes on a hard drive for example. +The next paragraphs will give an idea of how filesystems work, the implied limitations regarding DODB\*[*] as it uses filesystems in an overtly naive way and the filesystems' features DODB instances could use for better data management. +.FOOTNOTE1 +Explaining the way filesystems work and their design is out of the scope of this document, so this part will be kept short for readability reasons. +.FOOTNOTE2 + +Besides filesystems designed for specific constraints, such as writing data on a compact disk\*[*] or providing a network filesystem, most +.dq generic +filesystems share a (loosely) common set of objectives. +.FOOTNOTE1 +A compact disk has specific constraints since the device will then only provide read-only access to the data, obviating the need for most of the complexity revolving around fragmentation, inode management and so on.
+All storage devices have their own particularities, but regular hard drives and solid-state drives are the important ones for this discussion since filesystems have mostly been designed for them. +.FOOTNOTE2 +These objectives can be summarized in a few points. + +.STARTBULLET +.KS +.BULLET +.B "CRUD operations" . +Above all, as already established, filesystems enable CRUD operations on a storage device through the concepts of directories and files; this is how users have been directly interacting with their computer to store data for decades. +.KE + +.BULLET +.KS +.B "Reliability and safety" . +.TBD +Since computers do not run in a vacuum, many problems can occur during operation, including a power loss. +Filesystems try to mitigate damage by keeping a journal of operations (journaling filesystems). +Advanced filesystems may also detect file corruption with automated checksums. +.KE + +.BULLET +.KS +.B "Security" . +.KE +File access should be limited in a number of cases. +For example, several applications with networking features might run on a computer. +If one of these applications is successfully attacked, the attacker shouldn't be able to access other services' data or user data. +The same goes for shared computers: one user shouldn't be able to see other users' data. +Therefore, the most widespread form of security comes from filesystem permissions, enabling a user (or a group of users) to access (or to be denied from accessing) specific data (files and directories). +Those permissions include the right to read, to modify or to execute a file, to list or to remove files from a directory, to create or remove directories and a few other permissions. +Extended permissions and attributes exist but are out of scope. + +Besides permissions, encryption also brings some kind of security. +In this case, the point is to prevent attackers from accessing protected data despite retrieving files.
+Some advanced filesystems can encrypt files individually, others provide the encryption of a whole partition, both methods having their pros and cons. + +.BULLET +.KS +.B "Performance and capacity" . +Many filesystems were developed over the years to circumvent contemporary limitations on file or partition sizes, the number of possible files, the limitation on path name lengths, etc. +While storage devices mostly impose physical limitations, a filesystem may be wasting resources because of a simplistic or inadequate design. +.KE + +Depending on the scenario, the filesystem might become wasteful or slow. +Some filesystems cannot handle a huge number of small files (from hundreds of millions to billions) without wasting a lot of space, such as ext4 which doesn't have block suballocation: once a file has at least one byte in it, it occupies a whole 4kB block; for a one-byte file, the remaining 4k-1 bytes are wasted. +So, in the worst-case scenario, the ratio of useful data is +.FRAC 1 4000 +(huge waste), meaning that 1GB of data would require an entire 4TB hard drive (without even taking the inodes' size into account). + +.BULLET +.KS +.B "Miscellaneous and advanced features" . +A few other features need to be mentioned, such as block suballocation, file content included in the inode, etc. +More than a decade ago, some filesystems added then under-explored features such as snapshots, compression and transactions. +.KE +.ENDBULLET + +In conclusion, no current filesystem has been designed to be used the way DODB uses them. +However, having a few million entries is fine on most filesystems. . . .SECTION Alternatives @@ -1182,7 +1282,7 @@ Therefore, an application can access whatever it needs for its initialization ph For example, a web server can read its configuration file to learn the path to the files to serve, then prevents itself from accessing any other file (including its own configuration file) before serving the files. In-app mechanisms such as these greatly simplifies the configuration.
-Security parameters related to the file-system don't require to be sync with the configuration of the application. +Security parameters related to the filesystem don't need to be kept in sync with the configuration of the application. Also, any syscall that is irrelevent for the .I running phase can be disallowed without fuss, which makes pledge+unveil inherently safer than AppArmor and the like. @@ -1251,7 +1351,7 @@ It's almost as the application intentionally avoids any possible optimization. .FOOTNOTE1 Especially given that the number of actual requests is expected to be around 10 requests per second on busy days. .FOOTNOTE2 -Indexes with file-system representation enables quick debugging sessions and to perform a few basic tasks (such as listing all the domains of a user) which, in practice, is great to have at our fingertips with simple unix tools. +Indexes with a filesystem representation enable quick debugging sessions and make it easy to perform a few basic tasks (such as listing all the domains of a user), which, in practice, is great to have at our fingertips with simple unix tools. .SECTION Conclusion The