netlibre explanation and data leak prevention methods.

Philippe Pittoli 2025-01-23 23:40:26 +01:00
parent 3109ee3bba
commit 9cc958c026

@ -49,10 +49,9 @@ DODB is a database-as-library, enabling a very simple way to store applications'
.I documents .I documents
(basically any data type) in plain files. (basically any data type) in plain files.
To speed-up searches, attributes of these documents can be used as indexes. To speed-up searches, attributes of these documents can be used as indexes.
DODB can provide a file-system representation of those indexes through a few symbolic links DODB can provide a file-system representation of those indexes through symbolic links
.I symlinks ) ( .I symlinks ). (
on the disk. This enables administrators to search for data outside the application with the most basic tools, such as
This enables administrators to search for data outside the application with the most basic tools, like
.I ls . .I ls .
This document briefly presents DODB and its main differences with other database engines. This document briefly presents DODB and its main differences with other database engines.
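The symlink-based index described in this hunk can be illustrated with a minimal sketch. It is not DODB's code: the directory layout (data/ and indexes/by_<attribute>/), the function names store and find, and the JSON serialization are all assumptions made for the example.

```python
import json
import os
import tempfile


def store(root, key, document, index_attr):
    """Write one document to a plain file and index it with a symlink.

    Hypothetical layout (not DODB's actual one):
      <root>/data/<key>.json                      the document itself
      <root>/indexes/by_<attr>/<value>.json  ->   symlink to the document
    """
    data_dir = os.path.join(root, "data")
    index_dir = os.path.join(root, "indexes", f"by_{index_attr}")
    os.makedirs(data_dir, exist_ok=True)
    os.makedirs(index_dir, exist_ok=True)

    data_path = os.path.join(data_dir, f"{key}.json")
    with open(data_path, "w", encoding="utf-8") as f:
        json.dump(document, f)

    # The index entry is named after the attribute value and points to the
    # document; a relative target keeps the tree relocatable.
    link_path = os.path.join(index_dir, f"{document[index_attr]}.json")
    os.symlink(os.path.relpath(data_path, index_dir), link_path)
    return link_path


def find(root, index_attr, value):
    """Follow the symlink of an index entry and load the document."""
    link_path = os.path.join(root, "indexes", f"by_{index_attr}", f"{value}.json")
    with open(link_path, encoding="utf-8") as f:
        return json.load(f)


root = tempfile.mkdtemp()
store(root, 0, {"name": "corvet", "color": "red"}, "name")
store(root, 1, {"name": "impala", "color": "blue"}, "name")

# What an administrator would see with "ls <root>/indexes/by_name".
listing = sorted(os.listdir(os.path.join(root, "indexes", "by_name")))
found = find(root, "name", "impala")
```

With such a layout, listing the index directory is enough to search by attribute outside the application, which is the property the paragraph above relies on.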
@ -72,6 +71,7 @@ Universities all around the world teach about Structured Query Language (SQL) an
are built around the idea to put data into are built around the idea to put data into
.I tables , .I tables ,
with typed columns so the database can optimize operations and storage. with typed columns so the database can optimize operations and storage.
Data are thus described.
A database is a list of tables with relations between them. A database is a list of tables with relations between them.
For example, let's imagine a database of a movie theater. For example, let's imagine a database of a movie theater.
The database will have a The database will have a
@ -93,7 +93,7 @@ which points to entries in the "movie" table.
.UL "The SQL language" .UL "The SQL language"
enables arbitrary operations on databases: add, search, modify and delete entries. enables arbitrary operations on databases: add, search, modify and delete entries.
Furthermore, SQL also enables to manage administrative operations of the databases themselves: creating databases and tables, managing users with fine-grained authorizations, etc. SQL also enables administrative operations of the databases themselves: creating databases and tables, managing users with fine-grained authorizations, etc.
SQL is used between the application and the database, to perform operations and to provide results when due. SQL is used between the application and the database, to perform operations and to provide results when due.
SQL is also used SQL is also used
.UL outside .UL outside
@ -115,7 +115,7 @@ thus, SQL databases can be scripted to automate operations and provide a massive
.I "stored procedures" , ( .I "stored procedures" , (
see see
.I "PL/SQL" ). .I "PL/SQL" ).
Writing SQL requests requires a lot of boilerplate since there is no integration in the programming languages, leading to multiple function calls for any operation on the database; Furthermore, writing SQL requests requires a lot of boilerplate since there is no integration in the programming languages, leading to multiple function calls for any operation on the database;
thus, object-relational mapping (ORM) libraries were created to reduce the massive code duplication. thus, object-relational mapping (ORM) libraries were created to reduce the massive code duplication.
And so on. And so on.
@ -749,6 +749,7 @@ The scenario is simple: adding values to a database with indexes (basic, partiti
Loop and repeat. Loop and repeat.
Five instances of DODB are tested: Five instances of DODB are tested:
.STARTBULLET
.BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory); .BULLET \fIuncached database\f[] shows the achievable performance with a strong memory constraint (nothing can be kept in-memory);
.BULLET \fIuncached database but cached index\f[] shows the improvement you can expect by having a cache on indexes; .BULLET \fIuncached database but cached index\f[] shows the improvement you can expect by having a cache on indexes;
.BULLET \fIcommon database\f[] shows the most basic use of DODB, with a limited cache (100k entries)\*[*]; .BULLET \fIcommon database\f[] shows the most basic use of DODB, with a limited cache (100k entries)\*[*];
@ -973,11 +974,33 @@ with SQL (varies from 0.1 to 2 ms on my machine for a single value without a sea
This should help put things into perspective. This should help put things into perspective.
. .
.SECTION Alternatives .SECTION Alternatives
Other approaches have been used beside SQL. Other approaches have been used to store data over the years, including but not limited to SQL and key-value stores.
This section briefly presents some of them and how they differ from DODB.
.STARTBULLET
.BULLET
.B "Traditional DBMS" .
This category includes all SQL database management systems with a dedicated application handling databases and the operations upon them.
The best-known DBMSs are MSSQL, PostgreSQL, Oracle and MariaDB.
These applications are inherently complex for several reasons.
.STARTBULLET
.BULLET They require a description of the data;
.BULLET They require queries written in a dedicated language (SQL);
.BULLET They implement many sophisticated algorithms for performance reasons;
.BULLET Data is written in unconventional formats.
.ENDBULLET
.BULLET
.B "Key-value stores."
.B "Memcached" .B "Memcached"
.B "Redis"
.B "MongoDB"
.B "duckdb" .B "duckdb"
.ENDBULLET
.TBD .TBD
. .
@ -1043,71 +1066,102 @@ This obviously goes beyond the scope of this document, but let's mention a few w
Actual statistics on the use of the different security mechanisms is rather hard to obtain. Actual statistics on the use of the different security mechanisms is rather hard to obtain.
Both presented mechanisms are supported by a wealthy company or by the operating system itself. Both presented mechanisms are supported by a wealthy company or by the operating system itself.
.FOOTNOTE2 .FOOTNOTE2
.STARTBULLET
.BULLET "\fBAppArmor\f[] (linux):" .BULLET "\fBAppArmor\f[] (linux):"
Linux has many mechanisms to handle software security; one of the best known is AppArmor (supported since 2009 by Canonical). Linux has many mechanisms to handle software security; one of the best known is AppArmor (supported since 2009 by Canonical).
AppArmor can limit the use of syscalls and the access to files and directories. AppArmor can limit the use of syscalls and the access to files and directories.
A configuration file is used to describe what an application (defined by the path to its executable) can or cannot do; the source code of the application is left untouched. A configuration file is used to describe what an application (defined by the path to its executable) can or cannot do; the source code of the application is left untouched.
.BULLET "\fBpledge and unveil\f[] (openbsd):" .BULLET "\fBpledge and unveil\f[] (openbsd):"
The OpenBSD operating system includes two complementary syscalls: pledge (to allow or to deny use of syscalls) and unveil (to allow or to deny the access of directories and files). The OpenBSD operating system includes two complementary syscalls to handle permissions: pledge (syscalls) and unveil (directories and files).
The design of these functions is simple: applications often have an The design of these functions is simple: applications often have an
.I init .I initialization
phase during which the connections are made or files are opened (including configuration files), phase during which the connections are made or files are opened (including configuration files),
then comes the then comes the
.I running .I running
phase, during which the application needs fewer privileges. phase, during which the application needs fewer privileges.
Therefore, an application can access whatever it needs for its initialization phase, then restricts its own rights over syscalls and files. Therefore, an application can access whatever it needs for its initialization phase, which is less prone to attacks, then restricts its own rights over syscalls and files before accepting connections from the internet.
For example, a web server can read its configuration file to learn where are the files to serve, then prevents itself from accessing any other file (including its own configuration file). For example, a web server can read its configuration file to learn the path to the files to serve, then prevents itself from accessing any other file (including its own configuration file) before serving the files.
Having in-app mechanisms such as these greatly simplifies the configuration; and they are even inherently safer than AppArmor and the like. In-app mechanisms such as these greatly simplify the configuration.
Security parameters related to the file-system do not need to be kept in sync with the configuration of the application.
Also, any syscall that is irrelevant for the
.I running
phase can be disallowed without fuss, which makes pledge+unveil inherently safer than AppArmor and the like.
.ENDBULLET .ENDBULLET
.
.
.SECTION Real-world usage: netlib.re
By default, and without taking a deep dive into software architecture, none of the above mechanisms (AppArmor and pledge+unveil) prevents a user from accessing the entirety of the database.
A malicious user who successfully took control of the application can now open files (at least in the DODB directory) and read the application's memory (including cached data).
.
.
.SECTION Real-world usage: netlibre
DODB instances have been deployed in a real-world setting by the netlibre service.
This section presents this service and its use of DODB, showing how this method of handling data can be used in conventional online services.
.B Presentation .
Netlibre Netlibre
.[ .[
netlibre netlibre
.] .]
is a service providing free domain names and a website to manage DNS zones. is a service providing free domain names and a website to manage DNS zones.
Domains can be shared and transferred between users, so organizations do not have to rely on a single person. Domains can be shared and transferred between users, so organizations do not have to rely on a single person.
Resource records are managed with dedicated interfaces, users are helped as much as possible through many automatic zone verifications. Users are helped as much as possible through dedicated user interfaces for complex resource records, many automatic zone verifications, documentation and so on.
Resource records can be automatically updated via Resource records can be automatically updated via
.I tokens , .I tokens ,
enabling users to host a service despite having a dynamic IP address. enabling users to host services on the internet despite having a dynamic IP address; those tokens are used with a trivial command:
The resource can be updated with a trivial command:
.SOURCE Ruby ps=9 vs=10 .SOURCE Ruby ps=9 vs=10
wget "https://www.netlib.re/token-update/<token>" wget "https://www.netlib.re/token-update/<token>"
.SOURCE .SOURCE
Thus, netlibre is a real-life service providing domains to more than 7500 users to this day.
.B "The technical parts" . .B "The technical parts" .
The service is split into two three components: a user interface (the website, written in purescript), an authentication daemon (\fIauthd\f[]) and a daemon handling all the server operations related to the actual service (\fIdnsmanagerd\f[]). The service is split into three components: the user interface (the website), an authentication daemon\*[*] (\fIauthd\f[]) and a daemon handling all the server operations related to the actual service (\fIdnsmanagerd\f[]).
.FOOTNOTE1
The authentication daemon is separated from the service-specific code in order to factor out authentication for future applications.
Thus, besides factoring code, this enables users to register only once for multiple services.
.FOOTNOTE2
Several "databases" are maintained: Several "databases" are maintained:
.ENUM \fIusers\f[], .ENUM \fIusers\f[] (authd), authentication and user preference data.
.ENUM \fIdomains\f[], .ENUM \fIdomains\f[] (dnsmanagerd), ownership management.
.ENUM \fItokens\f[], .ENUM \fIzones\f[] (dnsmanagerd), actual zone content management.
.ENUM \fIconnected users\f[] for collaborative work .ENUM \fItokens\f[] (dnsmanagerd), enabling automatic updates of records, each token being related to a single record of a single domain.
.ENUM \fIhub\f[] (dnsmanagerd), for data synchronization between users in case of collaborative work (several owners working on a domain at the same time).
.ENDENUM .ENDENUM
.BULLET Common and RAM-only databases Apart from
.ENDBULLET .I "hub"
which is an instance of
.I RAM-only
database since it contains inherently volatile data, the others are instances of the
.I Common
database.
This way, the application start-up phase is practically instantaneous, while data for busy entries still gets cached.
Performance-wise, netlibre handles between 2 to 3k req/s with a single core, without any optimization. Performance-wise, netlibre handles between 2 to 3k req/s with a single core, without any optimization.
Code is written in an almost naive\*[*] way and still performing fine. Code is written in one of the most naive ways possible\*[*] and still performs fine\*[*].
.FOOTNOTE1 .FOOTNOTE1
Keep in mind that netlibre (through Keep in mind that netlibre (through
.B libipc ) .B libipc )
uses poll(2), a very old syscall (from the 80's!), to handle its event loop, rather than newer and much faster event facilities such as epoll(2) and the like. uses poll(2), a very old syscall (from the 80's!), to handle its event loop, rather than newer and much faster event facilities such as epoll(2) and the like.
Also, JSON is used for data storage and queries between the service and its users instead of a more efficient format such as CBOR.
Furthermore, each entry in the logs is written by opening a file, appending the string to the end of the file then closing the file.
Finally, any modification to the content of a zone triggers the on-disk write of its representation in the Bind9 zone file format.
.br
It's almost as if the application intentionally avoided any possible optimization.
.FOOTNOTE2 .FOOTNOTE2
.FOOTNOTE1
Indexes with file-system representation enables quick debugging sessions and to perform a few basic tasks such as listing all the domains of a user which, in practice, is great to have at our fingertips with simple unix tools. Especially given that the number of actual requests is expected to be around 10 requests per second on busy days.
.FOOTNOTE2
Indexes with a file-system representation enable quick debugging sessions and make it possible to perform a few basic tasks (such as listing all the domains of a user) with simple unix tools which, in practice, is great to have at our fingertips.
.SECTION Conclusion .SECTION Conclusion
The The
.I common .I common
database should be an acceptable choice for most applications. database should be an acceptable choice for most applications.
.STARTBULLET
.BULLET it is possible to write other triggers to replace the way index, partition and tags are working, and store their data differently, possibly in a flat file for example.
.BULLET talk about netlib.re .BULLET talk about netlib.re
.BULLET triggers are available and can be adapted to do anything, indexes are a simple use of triggers .BULLET triggers are available and can be adapted to do anything, indexes are a simple use of triggers
.BULLET common db is great for most applications .BULLET common db is great for most applications