scylladb

NoSQL data store using the Seastar framework, compatible with Apache Cassandra and Amazon DynamoDB

14,262

1,350

14,262

3,540

Top Related Projects

cockroach

30,793

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.

elasticsearch

72,500

Free and Open Source, Distributed, RESTful Search Engine

tidb

38,344

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.

Quick Overview

ScyllaDB is an open-source distributed NoSQL database that is designed to be a drop-in replacement for Apache Cassandra. It is written in C++ and aims to provide better performance and lower latency than Cassandra while maintaining compatibility with its ecosystem.

Pros

High performance and low latency due to its C++ implementation and optimized architecture
Seamless compatibility with Apache Cassandra, allowing easy migration and use of existing tools
Automatic sharding and replication for improved scalability and fault tolerance
Support for both on-premises and cloud deployments

Cons

Relatively smaller community compared to more established databases like Cassandra or MongoDB
Limited support for advanced features found in some other NoSQL databases
Steeper learning curve for developers not familiar with Cassandra-like systems
Fewer third-party integrations and tools compared to more popular databases

Code Examples

Here are a few examples of using ScyllaDB with the Python driver:

Connecting to ScyllaDB:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

Creating a keyspace and table:

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS example_keyspace
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

session.execute("""
    CREATE TABLE IF NOT EXISTS example_keyspace.users (
        id UUID PRIMARY KEY,
        name TEXT,
        age INT
    )
""")

Inserting and querying data:

import uuid  # standard-library module; the driver accepts uuid.UUID values

user_id = uuid.uuid4()
session.execute(
    "INSERT INTO example_keyspace.users (id, name, age) VALUES (%s, %s, %s)",
    (user_id, "John Doe", 30)
)

rows = session.execute("SELECT * FROM example_keyspace.users WHERE id = %s", [user_id])
for row in rows:
    print(f"User: {row.name}, Age: {row.age}")
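
For statements that run many times, the driver's prepared statements let the server parse the query once and reuse it. The following is a minimal sketch, assuming the `session` and the `example_keyspace.users` table created above. The `UserStore` class and its method names are hypothetical helpers for illustration, not part of the driver; only `session.prepare` and `session.execute` are driver calls.

```python
import uuid


class UserStore:
    """Thin wrapper around a driver session (hypothetical helper, not a driver class)."""

    def __init__(self, session):
        self._session = session
        # Prepared statements use '?' placeholders instead of '%s'.
        self._insert = session.prepare(
            "INSERT INTO example_keyspace.users (id, name, age) VALUES (?, ?, ?)"
        )
        self._select = session.prepare(
            "SELECT id, name, age FROM example_keyspace.users WHERE id = ?"
        )

    def add_user(self, name, age):
        """Insert a user and return the generated id."""
        user_id = uuid.uuid4()
        self._session.execute(self._insert, (user_id, name, age))
        return user_id

    def get_user(self, user_id):
        """Return the matching row, or None if the id is unknown."""
        rows = self._session.execute(self._select, (user_id,))
        return next(iter(rows), None)
```

With a connected session, usage would look like `store = UserStore(session)` followed by `store.add_user("John Doe", 30)`.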

Getting Started

To get started with ScyllaDB:

Install ScyllaDB using Docker:

docker pull scylladb/scylla
docker run --name scylla-node -p 9042:9042 -d scylladb/scylla

Install the Python driver:

pip install cassandra-driver

Use the code examples above to connect, create a keyspace and table, and perform basic operations.
For more advanced usage, refer to the ScyllaDB documentation and the Python driver documentation.
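
A freshly started container usually needs some time before it accepts connections, so a connection attempt made right after `docker run` may fail. The sketch below polls the CQL port until it opens. It assumes the container's CQL port 9042 is reachable from the host (for example, published with `-p 9042:9042`); the function name `wait_for_cql_port` and its defaults are illustrative. An open port is only a rough signal, and the node may need a little longer before it serves queries.

```python
import socket
import time


def wait_for_cql_port(host="127.0.0.1", port=9042, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A script could call `wait_for_cql_port()` before creating the `Cluster` object and exit with an error if it returns `False`.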

Competitor Comparisons

cassandra

9,160

Apache Cassandra®

Pros of Cassandra

Mature and battle-tested with a large community and extensive documentation
Highly scalable and fault-tolerant, designed for large-scale distributed systems
Rich ecosystem of tools and integrations

Cons of Cassandra

Written in Java, which can lead to higher memory usage and longer garbage collection pauses
Generally slower performance compared to ScyllaDB, especially for read-heavy workloads
More complex configuration and tuning required for optimal performance

Code Comparison

Cassandra (CQL):

CREATE TABLE users (
  user_id uuid PRIMARY KEY,
  username text,
  email text
);

ScyllaDB (CQL):

CREATE TABLE users (
  user_id uuid PRIMARY KEY,
  username text,
  email text
) WITH compression = { 'sstable_compression' : 'LZ4Compressor' };

Both ScyllaDB and Cassandra use CQL (Cassandra Query Language) for data definition and manipulation, so the same statements generally run on either system. Table options such as the compression setting shown above are not unique to ScyllaDB: Cassandra also accepts per-table compression options, although the exact option names differ between versions.

ScyllaDB aims to be a drop-in replacement for Cassandra, focusing on improved performance and reduced operational complexity. While Cassandra has a longer history and wider adoption, ScyllaDB leverages its C++ implementation and shard-per-core architecture to achieve better resource utilization and lower latencies, especially for larger datasets and read-intensive workloads.
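
The idea behind a shard-per-core design is that every partition key is owned by exactly one CPU core, so each core can work on its own slice of the data without coordinating with the others. The toy sketch below illustrates only that ownership idea. It is not ScyllaDB's actual token or sharding algorithm; the hash function, the modulo mapping, and the names `owning_shard` and `route` are all assumptions made for illustration.

```python
import hashlib


def owning_shard(partition_key: bytes, shard_count: int) -> int:
    """Map a partition key to one shard (illustrative hash, NOT ScyllaDB's real algorithm)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(partition_key).digest()
    token = int.from_bytes(digest[:8], "big")
    return token % shard_count


def route(keys, shard_count):
    """Group keys into one queue per shard, so each shard sees only its own keys."""
    queues = {shard: [] for shard in range(shard_count)}
    for key in keys:
        queues[owning_shard(key, shard_count)].append(key)
    return queues
```

Because the mapping is deterministic, a given key always lands on the same shard, which is what lets a core handle its requests without sharing state with other cores.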

cockroach

30,793

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.

Pros of CockroachDB

Stronger consistency model with serializable isolation
Built-in multi-region support for global deployments
More mature SQL support with advanced features

Cons of CockroachDB

Higher resource consumption and overhead
Steeper learning curve for operations and tuning
Less predictable performance under high concurrency

Code Comparison

ScyllaDB (CQL):

CREATE TABLE users (
  id UUID PRIMARY KEY,
  name TEXT,
  email TEXT
);

CockroachDB (SQL):

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name STRING,
  email STRING
);

Both databases use similar syntax for basic operations, but CockroachDB follows standard SQL more closely. ScyllaDB uses CQL (Cassandra Query Language), which is similar to SQL but with some differences in data types and features.

CockroachDB offers relational features such as foreign keys and joins, which CQL does not provide, while ScyllaDB focuses on high-performance, low-latency operations for simpler, query-driven data models.

ScyllaDB's approach is generally more suitable for write-heavy workloads and time-series data, while CockroachDB excels in scenarios requiring strong consistency and complex queries across distributed data.

mongo

27,075

The MongoDB Database

Pros of MongoDB

More mature and widely adopted ecosystem with extensive documentation and community support
Flexible schema design allows for easier adaptation to changing data structures
Rich query language and aggregation framework for complex data operations

Cons of MongoDB

Generally slower performance compared to ScyllaDB, especially for write-heavy workloads
Less efficient use of system resources, potentially requiring more hardware
Scaling can be more complex and costly, particularly for large datasets

Code Comparison

MongoDB query example:

db.users.find({
  age: { $gte: 18 },
  status: "active"
}).sort({ name: 1 })

ScyllaDB query example (using CQL):

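-- Filtering on non-key columns needs ALLOW FILTERING (or a secondary index).
-- ORDER BY applies only to clustering columns, so sorting by name is left out.
SELECT * FROM users
WHERE age >= 18 AND status = 'active'
ALLOW FILTERING;

The two databases use different query languages: MongoDB uses a JSON-like syntax, while ScyllaDB uses CQL, which resembles SQL. MongoDB's query language is often considered more flexible. CQL is more familiar to those with SQL experience, but it is more restrictive than SQL: the query above only approximates the MongoDB one, because CQL cannot sort by an arbitrary column.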
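
Since CQL restricts ORDER BY to clustering columns, sorting by a regular column such as `name` is typically done in the application or designed into the table's primary key. The sketch below shows the application-side variant for small result sets. The `users` columns `name`, `age` and `status` follow the example above; the function name and the `User` namedtuple are stand-ins for illustration (the Python driver returns rows as named tuples by default).

```python
from collections import namedtuple
from operator import attrgetter


def active_adults_by_name(rows, min_age=18):
    """Filter and sort rows in the application, mirroring the MongoDB query above."""
    matching = [r for r in rows if r.age >= min_age and r.status == "active"]
    return sorted(matching, key=attrgetter("name"))


# Stand-in for driver rows, which expose columns as attributes.
User = namedtuple("User", ["name", "age", "status"])
sample = [
    User("Zoe", 34, "active"),
    User("Adam", 17, "active"),
    User("Bob", 52, "inactive"),
    User("Ann", 21, "active"),
]
print([u.name for u in active_adults_by_name(sample)])  # ['Ann', 'Zoe']
```

For large tables this approach does not scale, and a table keyed for the query is usually the better choice.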

ScyllaDB is designed for high performance and scalability, particularly suited for large-scale, write-intensive applications. MongoDB, on the other hand, offers more flexibility in data modeling and querying, making it a popular choice for a wide range of applications, especially those with evolving schemas.

elasticsearch

72,500

Free and Open Source, Distributed, RESTful Search Engine

Pros of Elasticsearch

More mature and widely adopted, with a larger ecosystem and community support
Powerful full-text search capabilities and advanced querying options
Extensive documentation and learning resources available

Cons of Elasticsearch

Higher resource consumption and slower performance for large-scale deployments
More complex setup and configuration process
Licensing changes have caused concerns in the open-source community

Code Comparison

Elasticsearch query example:

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}

ScyllaDB query example:

SELECT * FROM my_table
WHERE title LIKE '%scylladb%'
ALLOW FILTERING;

ScyllaDB focuses on high-performance, low-latency operations for large datasets, while Elasticsearch excels in full-text search and complex querying. ScyllaDB uses CQL, whose syntax resembles SQL, making it more familiar for developers with relational database experience. Note that LIKE in CQL is a pattern match applied while filtering rows, not an indexed full-text search, so it is not a substitute for the Elasticsearch query. Elasticsearch uses a JSON-based query DSL, which is powerful but has a steeper learning curve.

Both databases have their strengths and are suited for different use cases. Elasticsearch is ideal for search-heavy applications, while ScyllaDB is better for high-throughput, low-latency workloads with large amounts of data.

tidb

38,344

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.

Pros of TiDB

SQL compatibility: TiDB offers MySQL compatibility, making it easier for users familiar with SQL databases
Horizontal scalability: Designed for distributed scaling across multiple nodes
HTAP capabilities: Supports both OLTP and OLAP workloads in a single system

Cons of TiDB

Higher resource consumption: Generally requires more resources compared to ScyllaDB
Complexity: More complex architecture and setup process than ScyllaDB
Learning curve: Steeper learning curve for optimization and management

Code Comparison

TiDB SQL query example:

SELECT * FROM users
WHERE age > 25 AND city = 'New York'
ORDER BY name
LIMIT 10;

ScyllaDB CQL query example:

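SELECT * FROM users
WHERE age > 25 AND city = 'New York'
LIMIT 10
ALLOW FILTERING;

While the syntax looks similar, CQL is more restricted: the TiDB query has no exact CQL equivalent, because ORDER BY is limited to clustering columns and filtering on non-key columns requires ALLOW FILTERING or a secondary index. TiDB supports a wider range of SQL features and functions than ScyllaDB's CQL. ScyllaDB focuses on high-performance, low-latency operations for specific use cases, while TiDB aims to provide a more comprehensive SQL-compatible distributed database solution.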

README

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring a very recent C++23 compiler and very recent versions of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its API - CQL. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of ScyllaDB.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.
