Notes about a distributed database design

In my spare time, I am designing a simple distributed database. Here are some of my notes.

The conceptual model

  • A database is a lexicographically sorted list of rows
  • A row consists of fields (the number of them may vary)
  • Rows are append-only, though a timestamp field or similar may tell which data is current and which is deprecated, and deprecated data may be freed. This allows for easy distribution.
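As a rough illustration of the model above, here is a minimal in-memory sketch in Python. The class name `Table`, the methods `append`, `prefix` and `current`, and the convention that a zero-padded timestamp string follows the key fields are all assumptions made for this example, not part of the design.

```python
import bisect
from typing import List, Optional, Tuple

Row = Tuple[str, ...]  # a row is a tuple of fields; the number of fields may vary


class Table:
    """Append-only table kept as a lexicographically sorted list of rows."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def append(self, row: Row) -> None:
        # Insert at the sorted position; existing rows are never modified.
        i = bisect.bisect_left(self._rows, row)
        if i == len(self._rows) or self._rows[i] != row:
            self._rows.insert(i, row)

    def prefix(self, key: Row) -> List[Row]:
        # Rows sharing the leading fields `key` are contiguous in sorted order.
        start = bisect.bisect_left(self._rows, key)
        end = start
        while end < len(self._rows) and self._rows[end][:len(key)] == key:
            end += 1
        return self._rows[start:end]

    def current(self, key: Row) -> Optional[Row]:
        # Assumption: the field after `key` is a zero-padded timestamp,
        # so the last row with this prefix is the most recent one.
        matches = self.prefix(key)
        return matches[-1] if matches else None
```

With this layout, older rows for the same key are the deprecated ones and could be freed once they are no longer needed.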



In my opinion, security should be integrated with the data.

The idea is that each table has a public/private key pair.
Every change/snapshot of the database is signed,
and the private key is the key to modifying it.
Having the write key gives write access, and also read access.
The hash of the private key is the skeleton key for read access.
A named encryption key can be derived from the skeleton key,
as the hash of the name and the skeleton key.
Parts of a table can link to a second table,
where the second table determines the access pattern.

To summarise, there are three kinds of keys:

  • Write keys, which give write (and read) access to a table.
  • Skeleton keys, which give full read access to all protected parts of a table.
  • Read keys, which give read access to one part of a protected table.

All of this can be implemented with encryption: write keys work essentially like IPNS, and the read keys are just ordinary encryption keys.
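The key hierarchy described above could be derived roughly as follows. This is only a sketch: SHA-256 is an assumed choice of hash, and HMAC keyed with the skeleton key is swapped in for a plain "hash of the name and the skeleton key" to avoid ambiguity when concatenating the two. The function names are hypothetical, and the private key is treated as opaque bytes rather than a real signing key.

```python
import hashlib
import hmac


def skeleton_key(private_key: bytes) -> bytes:
    # The skeleton key is the hash of the table's private (write) key.
    return hashlib.sha256(private_key).digest()


def read_key(skeleton: bytes, name: str) -> bytes:
    # A named read key is derived from the name and the skeleton key.
    # Assumption: HMAC-SHA256 stands in for "hash of name and skeleton key".
    return hmac.new(skeleton, name.encode("utf-8"), hashlib.sha256).digest()
```

Derivation only goes one way: a holder of the write key can compute the skeleton key and every read key, a holder of the skeleton key can compute every read key, and a holder of a single read key cannot recover the others.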

Implementation design

The plan is to implement this as a persistent trie data structure on top of IPFS. This eliminates the cost of common prefixes for the rows.
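To make the persistent trie idea concrete, here is a small in-memory sketch with one trie level per field. It does not talk to IPFS; the names `Node`, `insert` and `rows` are hypothetical, and the assumption is that on IPFS each node would become an immutable, content-addressed object, so unchanged subtrees are shared between versions.

```python
from typing import Dict, Iterator, Optional, Tuple

Row = Tuple[str, ...]


class Node:
    """Immutable trie node: one level per field of a row."""

    def __init__(self, children: Optional[Dict[str, "Node"]] = None,
                 terminal: bool = False) -> None:
        self.children: Dict[str, "Node"] = children or {}
        self.terminal = terminal  # True if a row ends at this node


def insert(root: Node, row: Row) -> Node:
    """Return a new root containing `row`; the old root is left untouched."""
    if not row:
        return Node(root.children, True)
    head, rest = row[0], row[1:]
    child = root.children.get(head, Node())
    new_children = dict(root.children)  # copy only the nodes along the path
    new_children[head] = insert(child, rest)
    return Node(new_children, root.terminal)


def rows(node: Node, prefix: Row = ()) -> Iterator[Row]:
    """Yield all stored rows in lexicographic order."""
    if node.terminal:
        yield prefix
    for field in sorted(node.children):
        yield from rows(node.children[field], prefix + (field,))
```

Each insert copies only the nodes on the path to the new row, so every earlier version of the table stays readable, and a common prefix such as a shared first field is stored once.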

The raw data is public, and security is achieved through encryption.

Nodes with the write key of a table communicate and perform a distributed merge of changes to the table. The table has a commit cycle, corresponding to the latency between all the nodes with write access. New write nodes can be added to the table simply by giving them the private key.
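One way the merge step might look, given that rows are append-only: merging is then just a sorted union of the rows each write node has seen. This is an assumption about how the merge could work, not a settled part of the design, and the function name `merge` is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, ...]


def merge(*replicas: Iterable[Row]) -> List[Row]:
    """Merge append-only replicas into one lexicographically sorted table."""
    merged: Set[Row] = set()
    for replica in replicas:
        merged.update(replica)  # rows are never edited, so union is enough
    return sorted(merged)
```

Because a set union is commutative, associative and idempotent, write nodes can exchange changes in any order, and repeat them, and still end up with the same table at the end of a commit cycle.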