Top Related Projects
A library that provides an embeddable, persistent key-value store for fast storage.
FoundationDB - the open source, distributed, transactional key-value store
🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
RocksDB/LevelDB inspired key-value database in Go
Quick Overview
Sophia is a lightweight, embeddable, transactional key-value database written in C. It is designed to be fast, reliable, and easy to integrate into applications that require a persistent storage solution.
Pros
- Lightweight and Embeddable: Sophia is a self-contained, single-file library that can be easily integrated into applications without the need for a separate server or external dependencies.
- Transactional and Durable: Sophia provides ACID (Atomicity, Consistency, Isolation, Durability) guarantees, ensuring data integrity and reliability.
- High Performance: Sophia is designed to be fast, with low latency and high throughput, making it suitable for applications that require efficient data storage and retrieval.
- Cross-Platform: Sophia is written in portable C code and can be compiled and used on a variety of platforms, including Windows, Linux, and macOS.
Cons
- Limited Query Capabilities: Sophia is primarily a key-value store and does not provide advanced querying capabilities like SQL databases or document-oriented databases.
- Lack of Replication and Clustering: Sophia is a single-node database and does not currently support replication or clustering, which may limit its scalability for large-scale applications.
- Limited Documentation: The project's documentation could be more comprehensive, which may make it more challenging for new users to get started with Sophia.
- Relatively Niche: Sophia is a relatively niche database solution, and it may not have the same level of community support and ecosystem as more popular database technologies.
Code Examples
Here are a few code examples demonstrating the usage of Sophia:
- Initializing and Opening a Database:
#include <sophia.h>
int main() {
sp_env *env = sp_env();
sp_open(env);
// Use the database
sp_close(env);
sp_destroy(env);
return 0;
}
- Inserting and Retrieving Data:
#include <sophia.h>
int main() {
sp_env *env = sp_env();
sp_open(env);
// Insert a key-value pair
sp_key *key = sp_document(env);
sp_setstring(key, "key", "hello", 5);
sp_setstring(key, "value", "world", 5);
sp_set(env, key);
// Retrieve the value
sp_key *get_key = sp_document(env);
sp_setstring(get_key, "key", "hello", 5);
sp_get(env, get_key);
const char *value = sp_getstring(get_key, "value", NULL);
printf("Value: %s\n", value);
sp_destroy(key);
sp_destroy(get_key);
sp_close(env);
sp_destroy(env);
return 0;
}
- Performing a Transactional Operation:
#include <sophia.h>
int main() {
sp_env *env = sp_env();
sp_open(env);
// Begin a transaction
sp_tx *tx = sp_begin(env);
// Insert a key-value pair
sp_key *key = sp_document(env);
sp_setstring(key, "key", "hello", 5);
sp_setstring(key, "value", "world", 5);
sp_set(tx, key);
// Commit the transaction
sp_commit(tx);
sp_destroy(key);
sp_close(env);
sp_destroy(env);
return 0;
}
Getting Started
To get started with Sophia, follow these steps:
- Download the latest version of Sophia from the GitHub repository.
- Extract the downloaded archive and navigate to the
sophia
directory. - Compile the Sophia library by running the following commands:
$ make $ sudo make install
- In your C project, include the
Competitor Comparisons
A library that provides an embeddable, persistent key-value store for fast storage.
Pros of RocksDB
- Widely adopted and battle-tested in production environments
- Extensive documentation and community support
- Rich feature set, including column families and transactions
Cons of RocksDB
- Higher memory usage and larger disk footprint
- More complex API and configuration options
- Slower write performance in some scenarios
Code Comparison
RocksDB:
rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;
rocksdb::Status status = rocksdb::DB::Open(options, "/path/to/db", &db);
Sophia:
void *env = sp_env();
void *db = sp_open(env);
int rc = sp_set(db, "key", 3, "value", 5);
RocksDB offers a more feature-rich API with additional options and configurations, while Sophia provides a simpler, more straightforward interface. RocksDB's code is more verbose but allows for fine-tuned control over database behavior. Sophia's API is more compact and easier to use for basic operations, but may lack some advanced features found in RocksDB.
Both databases have their strengths and weaknesses, and the choice between them depends on specific project requirements, performance needs, and development preferences.
FoundationDB - the open source, distributed, transactional key-value store
Pros of FoundationDB
- Highly scalable and distributed architecture, supporting multi-node clusters
- Strong ACID guarantees with support for multi-key transactions
- Backed by Apple, with a large and active community
Cons of FoundationDB
- More complex setup and configuration compared to Sophia
- Steeper learning curve due to its distributed nature
- Potentially higher resource requirements for small-scale applications
Code Comparison
FoundationDB (Python):
@fdb.transactional
def add_user(tr, user_id, name):
tr[f'users:{user_id}:name'] = name
fdb.open()
add_user(db, '123', 'John Doe')
Sophia (C):
void *db = sp_open("users.db");
sp_set(db, "users:123:name", "John Doe");
sp_close(db);
Summary
FoundationDB is a distributed database system offering strong consistency and scalability, ideal for large-scale applications. Sophia, on the other hand, is a simpler key-value store focused on embedded use and high performance. FoundationDB provides more advanced features but requires more setup, while Sophia offers ease of use and lower overhead for simpler use cases.
🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
Pros of ArangoDB
- Multi-model database supporting key/value, document, and graph data
- Highly scalable with clustering and sharding capabilities
- Rich query language (AQL) for complex data operations
Cons of ArangoDB
- Higher resource consumption due to its comprehensive feature set
- Steeper learning curve for new users
- More complex setup and configuration compared to simpler databases
Code Comparison
Sophia (key-value storage):
sophia_env *env = sophia_env();
sophia_set(env, "sophia.path", "/tmp/sophia");
sophia_open(env);
int rc = sophia_set(db, "key", "value", 0);
ArangoDB (document storage):
const db = require('arangojs')();
const collection = db.collection('myCollection');
await collection.save({ key: 'value' });
const doc = await collection.document('myKey');
Summary
Sophia is a lightweight key-value store focused on simplicity and performance, while ArangoDB is a feature-rich multi-model database suitable for complex data scenarios. Sophia offers easier integration for basic storage needs, whereas ArangoDB provides more advanced querying and data modeling capabilities at the cost of increased complexity.
RocksDB/LevelDB inspired key-value database in Go
Pros of Pebble
- More actively maintained with frequent updates and contributions
- Better documentation and community support
- Designed specifically for CockroachDB, optimized for distributed systems
Cons of Pebble
- Less flexible for general-purpose use cases
- Potentially more complex to set up and use outside of CockroachDB
- May have higher resource requirements
Code Comparison
Sophia (key-value storage):
sophia_t *env = sp_env();
sp_setstring(env, "sophia.path", "/tmp/sophia", 0);
sp_open(env);
void *db = sp_getobject(env, "db.test");
sp_set(db, "key", 3, "value", 5);
sp_destroy(env);
Pebble (key-value storage):
db, err := pebble.Open("path/to/db", &pebble.Options{})
if err != nil {
log.Fatal(err)
}
err = db.Set([]byte("key"), []byte("value"), pebble.Sync)
if err != nil {
log.Fatal(err)
}
defer db.Close()
Both Sophia and Pebble are key-value storage engines, but they have different focuses and use cases. Sophia is a more general-purpose embedded database, while Pebble is tailored for use in CockroachDB. Pebble offers better performance in distributed environments, while Sophia may be easier to integrate into smaller projects. The code examples show that both provide simple key-value operations, but with different APIs and language preferences.
Convert
designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual CopilotREADME
Sophia is advanced transactional MVCC key-value/row storage library.
Sophia is RAM-Disk hybrid storage. It is designed to provide best possible on-disk performance without degradation in time. It has guaranteed O(1) worst case complexity for read, write and range scan operations.
Features
- Full ACID compliancy
- MVCC engine
- Optimistic, non-blocking concurrency with N-writers and M-readers
- Pure Append-Only
- Unique data storage architecture
- Fast: O(1) worst for read, write and range scan operations
- Multi-threaded compaction
- Multi-databases support (sharing a single write-ahead log)
- Multi-Statement and Single-Statement Transactions (cross-database)
- Serialized Snapshot Isolation (SSI)
- Optimized storage schema (numeric types has zero-cost storage)
- Can be used to build Secondary Indexes
- Upsert (fast write-only 'update or insert' operation)
- Consistent Cursors
- Prefix search
- Automatic garbage-collection
- Automatic key-expire
- Hot Backup
- Compression (no fixed-size blocks, no-holes, supported: lz4, zstd)
- Direct IO support
- Use mmap or pread access methods
- Simple and easy to use (minimalistic API, FFI-friendly, amalgamated)
- Implemented as small C-written library with zero dependencies
- Carefully tested
- Open Source Software, BSD
The project is no longer supported, but it is still interesting in terms of the storage engine design evolution and architectural choices.
Version 1.1 (first design approach)
Original Sophia's architecture combines a region in-memory index with an in-memory key index.
A region index is represented by an ordered range of regions with their min and max keys and latest on-disk reference. Regions never overlap.
These regions have the same semantical meaning as the B-Tree pages, but designed differently. They do not have a tree structure or any internal page-to-page relationships, thus no meta-data overhead (specifically to append-only B-Tree).
A single region on-disk holds keys with values. As a B-tree page, region has its maximum key count. Regions are uniquely identified by region id number that makes them trackable in future.
Key index is very similar to LSM zero-level (memtable), but has a different key lifecycle. All modifications first get inserted into the index and then hold up until they are explicitly removed by merger.
Lifecycle
The database update lifecycle is organized in terms of epochs. Epoch lifetime is set in terms of key updates. When the update counter reaches an epoch's watermark number then the Rotation event happen.
Each epoch, depending on its state, is associated with a single log file or database file. Before getting added to the in-memory index, modifications are first written to the epoch's write-ahead log.
On each rotation event:
- current epoch, so called 'live', is marked as 'transfer' which leads to a new 'live' epoch creation (new log file)
- create new in-memory key index and swap it with current one
- merger thread is being woken up
The merger thread is the core part that is responsible for region merging and garbage collecting of the old regions and older epochs. On the wakeup, the merger thread iterates through list of epochs marked as 'transfer' and starts the merge procedure.
The merge procedure has the following steps:
- create new database file for the latest 'transfer' epoch
- fetch any keys from in-memory index that associated with a single destination region
- start the merge for each fetched key and origin region, then write a new region to the database file
- on each completed region (current merged key count is less or equal to max region key count):
- allocate new split region for region index, set min and max
- first region always has id of origin destination region
- link region and schedule for future commit
- on origin region update completion:
- update destination region index file reference to the current epoch and insert split regions
- remove keys from key index
- start step (2) until there is no updates left
- start garbage collector
- sync database with a disk drive, then, if everything went well, remove all 'transfer' epochs (log files) and gc'ed databases
- free index memory
Garbage Collection
The garbage collector has a simple design.
All that you need is to track an epoch's total region count and the count of transfered regions during merge procedure. Thus, if some older epoch database has fewer than 70% (or any other changeable factor) live regions, they just get copied to the current epoch database file while the old one is being deleted.
On database recovery, Sophia tracks and builds an index of pages from the earliest epochs (biggest numbers) down to the oldest. Log files then are being replayed and epochs become marked as 'transfer'.
Algorithmic Complexity
Sophia has been evaluated as having the following complexity (in terms of disk accesses):
set worst case is O(1) append-only key write + in-memory index insert
delete worst case is O(1) (delete operation is equal to set)
get worst case is O(1) random region read, which itself does amortized O(log region_key_count) key compares + in-memory key index search + in-memory region search
range range queries are very fast due to the fact that each iteration needs to compare no more than two keys without a search, and access through mmaped database. Roughly complexity can be equally evaluated as sequential reading of the mmaped file.
Version 1.2 (latest design)
There have been major changes in storage architecture since version 1.1.
Version 1.1 defines strict 2-level storage model between in-memory and disk. It gives worst-case guarantee O(1) for any ordered key read in terms of disk access.
This approach had its limitations, since it was unable to efficiently maintain memory limit with required performance. Additionally there was a need for multi-threaded compaction.
Sophia has evolved in a way that expands original ideas. Architecture has been designed to efficiently work with memory and large amount of keys.
In fact, it became even simplier.
Design
Sophia's main storage unit is a Node.
Each Node represents a single file with associated in-memory region index and two in-memory key indexes. Node file consists of Branches.
Each Branch consists of a set of sorted Regions and Region Index.
A single Region holds keys with values. It has the same semantical meaning as a B-Tree page, but organized in a different way. It does not have a tree structure or any internal page-to-page relationships and thus no meta-data overhead.
A Region Index is represented by an ordered range of regions with their min and max keys and on-disk reference. Regions never overlap.
A Key Index is very similar to LSM zero-level (memtable), but has a different key lifecycle. All modifications first get into the index and hold up until they are explicitly removed by the merger.
Before getting added to the in-memory Key Index, modifications are first written to the Write-Ahead Log.
Lifecycle
Database lifecycle is organized in terms of two major operations: Branch and Compaction.
When a Node's in-memory Key Index size reaches a special watermark value or global memory limit, Branch operation is scheduled.
When some Node branch counter reaches a special watermark value, Compaction operation is scheduled.
Branch operation
- rotate in-memory Key Index (use second one empty) (Node updates during Branch goes to second index)
- create new Regions and Region Index from Key Index
- create new Node Branch
- sequentially write Branch to the end of Node file
- sync
- free index and rotate
Compaction operation
- sequentially read Node file into memory
- create iterator for each Branch
- merge and split key-stream:
- create one or more Nodes
- delete Node
- sequentially write new Node or Nodes
- sync
- redistribute online updates between new Nodes
- remove old Node
- rename new Node or Nodes when completed
Optimization Game
All background operations are planned by special scheduler.
There is a game between available memory, a number of Branches and Search times.
Each additional branch says that there is a possible additional disk access during the search. During the search, only branch regions that have min >= key <= max are examined. In the same time, it is unable to maintain memory limits without branching, because compaction times are greater than possible rate of incoming data.
Sophia is designed to be read optimized. There is a high possibility that latest created Branches (hot data) are stored in the file system cache. Scheduler is aware about nodes which have largest in-memory Key Index and biggest number of Branches. These are processed first.
Additionally all operations are planned taking current system state in account, like memory usage statistics, current load profiler (redzone), operations priorities, checkpoint, backup, etc.
Sophia compaction is multi-threaded. Each worker (thread) requests scheduler for a new task. Basic unit of a background task is an operation on a single Node.
Sophia is designed to efficiently utilize available memory. If there is more memory available, then branch/compaction operations become more infrequent and system becomes more disk-efficient. Best performance can be obtained with no memory limit set. Sophia is Hard-Drive (and Flash) friendly, since all operations are delayed and executed in large sequential reads and writes, without overwrite.
Garbage Collection
Garbage collection (MVCC related) is executed automatically by Compaction task.
Also, scheduler periodically checks if there are any nodes which have large percentage of transactional versions (duplicates) stored per node.
Algorithmic Complexity
Sophia has following algorithmic complexity (in terms of disk access):
set worst case is O(1) write-ahead-log append-only key write + in-memory node index search + in-memory index insert
delete worst case is O(1) (delete operation is equal to set)
get worst case is amortized O(max_branch_count_per_node) random region read from a single node file, which itself does in-memory key index search + in-memory region search
range worst case of full database traversal is amortized O(total_region_count) + in-memory key-index searches for each node
Top Related Projects
A library that provides an embeddable, persistent key-value store for fast storage.
FoundationDB - the open source, distributed, transactional key-value store
🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
RocksDB/LevelDB inspired key-value database in Go
Convert
designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual Copilot