cassandra

Apache Cassandra®

9,303

3,728

9,303

539

Top Related Projects

scylladb

14,641

NoSQL data store using the Seastar framework, compatible with Apache Cassandra and Amazon DynamoDB

cockroach

31,141

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.

elasticsearch

73,408

Free and Open Source, Distributed, RESTful Search Engine

tidb

38,817

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.

Quick Overview

Apache Cassandra is a highly scalable, distributed NoSQL database management system designed to handle large amounts of structured data across many commodity servers. It provides high availability with no single point of failure, and is capable of handling massive amounts of data across multiple data centers and cloud availability zones.

Pros

Highly scalable and can handle petabytes of data
Offers tunable consistency and high availability
Supports fast writes and good read performance
Flexible data model with support for structured, semi-structured, and unstructured data
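
The "tunable consistency" point can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the usual quorum rule (a majority of replicas) and the common guideline that a read is guaranteed to overlap the latest acknowledged write when the read and write replica counts together exceed the replication factor. The function names are illustrative and not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)  # majority of 3 replicas
print("QUORUM/QUORUM overlaps:", reads_see_latest_write(q, q, rf))
print("ONE/ONE overlaps:", reads_see_latest_write(1, 1, rf))
```

Lower levels such as ONE trade that overlap guarantee for lower latency and higher availability, which is the tuning knob the list refers to.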

Cons

Complex setup and maintenance compared to traditional databases
Limited support for ad-hoc queries and joins
Eventual consistency model can be challenging for some use cases
Requires careful data modeling to achieve optimal performance

Code Examples

Creating a keyspace and table:

CREATE KEYSPACE example_keyspace
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

USE example_keyspace;

CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  username TEXT,
  email TEXT
);

Inserting data:

INSERT INTO users (user_id, username, email)
VALUES (uuid(), 'johndoe', 'john@example.com');

Querying data:

SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

Getting Started

Install Cassandra:

# For Ubuntu/Debian
echo "deb https://downloads.apache.org/cassandra/debian 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra

Start Cassandra:
```
sudo service cassandra start
```
Connect to Cassandra:
```
cqlsh
```

Create a keyspace and table:

CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE mykeyspace;
CREATE TABLE mytable (id UUID PRIMARY KEY, name TEXT);

Insert and query data:

INSERT INTO mytable (id, name) VALUES (uuid(), 'John Doe');
SELECT * FROM mytable;

Competitor Comparisons

scylladb

14,641

NoSQL data store using the Seastar framework, compatible with Apache Cassandra and Amazon DynamoDB

Pros of ScyllaDB

Higher performance and throughput due to its C++ implementation and shared-nothing architecture
Lower latency and more efficient resource utilization
Better support for large-scale deployments and multi-core processors

Cons of ScyllaDB

Smaller community and ecosystem compared to Cassandra
Less mature and potentially less stable in certain scenarios
Limited compatibility with some Cassandra features and tools

Code Comparison

ScyllaDB (C++):

class sstable {
    std::unique_ptr<sstable_writer> get_writer() {
        return std::make_unique<sstable_writer>(*this);
    }
};

Cassandra (Java):

public class SSTableWriter implements Closeable {
    public static SSTableWriter create(Descriptor descriptor, long keyCount) {
        return new SSTableWriter(descriptor, keyCount, CFMetaData.DEFAULT_COMPRESSION_PARAMETERS);
    }
}

Both projects aim to provide distributed NoSQL database solutions, but ScyllaDB focuses on performance optimization and hardware efficiency. Cassandra, being older and more established, has a larger community and broader adoption. The code comparison highlights the language difference (C++ vs. Java) and the slightly different approaches to creating writers for SSTables (Sorted String Tables) in each system.
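
To illustrate what a "sorted string table" is for, here is a toy log-structured store in Python. It is a simplified sketch of the general idea (writes go to an in-memory memtable, which is flushed to immutable sorted runs, and reads prefer the newest data); it is not how either ScyllaDB or Cassandra implements SSTables, and the class and parameter names are hypothetical.

```python
class ToyLSM:
    """Toy log-structured store: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}      # most recent writes, mutable
        self.sstables = []      # flushed runs, oldest first; never modified
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out as a sorted, immutable run of (key, value) pairs.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            for k, v in run:    # a real store would use an index, not a linear scan
                if k == key:
                    return v
        return None


store = ToyLSM()
store.put("a", "1")
store.put("b", "2")   # memtable reaches the limit and is flushed
store.put("a", "3")   # newer value for "a" stays in the memtable
print(store.get("a"), store.get("b"), len(store.sstables))
```

Because flushed runs are never modified, writes are sequential appends, which is one reason this family of designs handles write-heavy workloads well.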

cockroach

31,141

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.

Pros of CockroachDB

Automatic sharding and rebalancing for easier scalability
Strong consistency model with distributed ACID transactions
SQL-compatible interface, making migration easier for traditional RDBMS users

Cons of CockroachDB

Higher resource consumption, especially for smaller datasets
Steeper learning curve for operations and maintenance
Less mature ecosystem and community support

Code Comparison

CockroachDB SQL syntax:

CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name STRING,
  created_at TIMESTAMP DEFAULT current_timestamp()
);

Cassandra CQL syntax:

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  created_at timestamp
);

Both databases use similar syntax for basic operations, but CockroachDB offers more SQL features, such as column DEFAULT expressions. CQL has no DEFAULT clause, so values like the ID and creation time are supplied at insert time, for example with uuid() and toTimestamp(now()).

CockroachDB is designed for global, distributed SQL databases with strong consistency, while Cassandra excels in high-throughput, eventually consistent workloads. CockroachDB may be easier for teams familiar with traditional SQL databases, whereas Cassandra might be more suitable for large-scale, write-heavy applications that can tolerate eventual consistency.
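
One way to picture eventual consistency: replicas can briefly hold different versions of the same column, and Cassandra reconciles them by write timestamp (last write wins). The sketch below shows only that rule; the `Cell` type and `reconcile` function are hypothetical names for illustration, and tie-breaking between equal timestamps is not modeled.

```python
from typing import NamedTuple, Optional, Sequence


class Cell(NamedTuple):
    value: str
    timestamp: int  # write time; the unit is illustrative


def reconcile(replica_cells: Sequence[Optional[Cell]]) -> Optional[Cell]:
    """Return the cell with the highest write timestamp; None means a replica has no data."""
    present = [c for c in replica_cells if c is not None]
    return max(present, key=lambda c: c.timestamp, default=None)


# Three replicas of one column: one missed the latest update, one has nothing yet.
replicas = [Cell("old@example.com", 1_000), Cell("new@example.com", 2_000), None]
print(reconcile(replicas))
```

A strongly consistent system such as CockroachDB avoids exposing this divergence to clients by coordinating writes through distributed transactions instead.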

mongo

27,382

The MongoDB Database

Pros of MongoDB

Flexible document-based schema allows for easier data modeling and schema evolution
Rich query language with support for complex queries and aggregations
Better performance for read-heavy workloads and single-server deployments

Cons of MongoDB

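Multi-document ACID transactions carry a performance cost compared with single-document writes, whereas Cassandra sidesteps that overhead by offering tunable, per-operation consistency instead of general-purpose transactions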
May struggle with write-heavy workloads in large-scale distributed environments
Limited support for complex joins and relationships between collections

Code Comparison

MongoDB query example:

db.users.find({
  age: { $gte: 18 },
  interests: "programming"
}).sort({ name: 1 })

Cassandra query example:

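-- Assumes interests is a collection column (e.g. set<text>).
-- Filtering on non-key columns requires ALLOW FILTERING (or suitable indexes).
-- ORDER BY works only on clustering columns within a restricted partition,
-- so sorting by name is left to the client here.
SELECT * FROM users
WHERE age >= 18
AND interests CONTAINS 'programming'
ALLOW FILTERING;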

Both databases have different query languages and data models. MongoDB uses a JSON-like syntax for queries, while Cassandra uses a SQL-like language called CQL. MongoDB's flexible document model allows for more dynamic querying, while Cassandra's model is optimized for specific query patterns defined by the table structure.
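
The phrase "optimized for specific query patterns defined by the table structure" can be sketched with a toy model: rows are grouped by partition key and ordered by clustering key, so a lookup by partition key touches one group, while any other predicate has to visit everything. This is a simplified illustration under those assumptions; `ToyTable` and its methods are hypothetical, not a Cassandra API.

```python
from collections import defaultdict


class ToyTable:
    """Rows grouped by partition key and returned in clustering-key order."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # partition key -> {clustering key: row}

    def insert(self, partition_key, clustering_key, row: dict) -> None:
        # Like a CQL INSERT, this overwrites any row with the same primary key.
        self._partitions[partition_key][clustering_key] = row

    def by_partition(self, partition_key) -> list:
        """Cheap: touches a single partition; rows come back in clustering order."""
        rows = self._partitions.get(partition_key, {})
        return [rows[c] for c in sorted(rows)]

    def scan(self, predicate) -> list:
        """Expensive: visits every partition, like a query that needs ALLOW FILTERING."""
        return [row for rows in self._partitions.values()
                for row in rows.values() if predicate(row)]


events = ToyTable()
events.insert("sensor-1", 20, {"sensor": "sensor-1", "ts": 20, "temp": 21.5})
events.insert("sensor-1", 10, {"sensor": "sensor-1", "ts": 10, "temp": 20.0})
events.insert("sensor-2", 15, {"sensor": "sensor-2", "ts": 15, "temp": 30.1})

print([r["ts"] for r in events.by_partition("sensor-1")])
print(len(events.scan(lambda r: r["temp"] > 21)))
```

In practice this is why Cassandra tables are usually designed around the queries they must serve, often with one table per access pattern.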

The choice between MongoDB and Cassandra depends on specific use cases, scalability requirements, and data consistency needs. MongoDB excels in flexibility and ease of use, while Cassandra offers better write scalability and tunable consistency for distributed systems.

elasticsearch

73,408

Free and Open Source, Distributed, RESTful Search Engine

Pros of Elasticsearch

Powerful full-text search capabilities with advanced querying and analytics
Real-time indexing and search results
Highly scalable and distributed architecture

Cons of Elasticsearch

Higher memory consumption compared to Cassandra
Less efficient for write-heavy workloads
Steeper learning curve for complex configurations

Code Comparison

Elasticsearch query example:

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}

Cassandra query example:

SELECT * FROM my_table
WHERE title = 'cassandra'
ALLOW FILTERING;

Key Differences

Elasticsearch excels in full-text search and real-time analytics, while Cassandra is optimized for high-volume write operations and linear scalability
Elasticsearch uses a document-based data model, whereas Cassandra uses a wide-column store
Elasticsearch provides a RESTful API and JSON-based queries, while Cassandra uses CQL (Cassandra Query Language)

Use Cases

Elasticsearch: Log analysis, content search, and real-time analytics
Cassandra: Time-series data, IoT sensor data, and large-scale distributed systems

Both databases have their strengths and are suited for different scenarios. The choice between them depends on specific project requirements and data access patterns.

tidb

38,817

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.

Pros of TiDB

SQL support: TiDB offers SQL compatibility, making it easier for developers familiar with traditional relational databases
Horizontal scalability: TiDB provides better horizontal scaling capabilities, allowing for easier cluster expansion
HTAP (Hybrid Transactional/Analytical Processing) support: TiDB can handle both OLTP and OLAP workloads efficiently

Cons of TiDB

Maturity: TiDB is newer than Cassandra, which may mean fewer community resources and fewer battle-tested deployments
Learning curve: TiDB's architecture and features can be more complex to understand and manage for teams new to distributed databases

Code Comparison

Cassandra CQL query:

SELECT * FROM users WHERE user_id = 123;

TiDB SQL query:

SELECT * FROM users WHERE user_id = 123;

While the basic query syntax is similar, TiDB supports a wider range of SQL features and functions compared to Cassandra's CQL. TiDB's SQL compatibility allows for more complex queries and joins, which may not be possible or efficient in Cassandra.

Both databases have their strengths and are suited for different use cases. Cassandra excels in write-heavy workloads and high availability, while TiDB offers a more familiar SQL interface and better support for complex queries and transactions.

README

Apache Cassandra

Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key.

https://cwiki.apache.org/confluence/display/CASSANDRA2/Partitioners[Partitioning] means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster.
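
As a rough illustration of partitioning, the sketch below places nodes and keys on a hash ring and assigns each key to the first node at or after its position. It is a toy model under stated assumptions: it uses MD5 purely as a convenient stand-in hash and one token per node, whereas real clusters use a different default partitioner and typically several tokens per node. The `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on the ring (MD5 here, purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy token ring: each node owns the range that ends at its token."""

    def __init__(self, nodes):
        self._tokens = []   # sorted node tokens
        self._owners = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def owner(self, key: str) -> str:
        # First node token at or after the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._owners[self._tokens[i % len(self._tokens)]]


keys = [f"user-{n}" for n in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved; all moved keys went to node-d:",
      all(after[k] == "node-d" for k in moved))
```

The point of the ring layout is that adding a node only takes over part of one neighbor's range, so the rest of the data stays where it is.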

https://cwiki.apache.org/confluence/display/CASSANDRA2/DataModel[Row store] means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

For more information, see http://cassandra.apache.org/[the Apache Cassandra web site].

Issues should be reported on https://issues.apache.org/jira/projects/CASSANDRA/issues/[The Cassandra Jira].

Requirements

Java: see supported versions in build.xml (search for property "java.supported").
Python: for cqlsh, see bin/cqlsh (search for function "is_supported_version").

Getting started

This short guide will walk you through getting a basic one-node cluster up and running, and demonstrate some simple reads and writes. For a more complete guide, please see the Apache Cassandra website's https://cassandra.apache.org/doc/latest/cassandra/getting_started/index.html[Getting Started Guide].

First, we'll unpack our archive:

$ tar -zxvf apache-cassandra-$VERSION.tar.gz
$ cd apache-cassandra-$VERSION

After that we start the server. Running the startup script with the -f argument will cause Cassandra to remain in the foreground and log to standard out; it can be stopped with ctrl-C.

$ bin/cassandra -f

Now let's try to read and write some data using the Cassandra Query Language:

$ bin/cqlsh

The command line client is interactive so if everything worked you should be sitting in front of a prompt:

Connected to Test Cluster at localhost:9160.
[cqlsh 6.3.0 | Cassandra 5.0-SNAPSHOT | CQL spec 3.4.8 | Native protocol v5]
Use HELP for help.
cqlsh>

As the banner says, you can use 'help;' or '?' to see what CQL has to offer, and 'quit;' or 'exit;' when you've had enough fun. But let's try something slightly more interesting:

cqlsh> CREATE KEYSPACE schema1
       WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE schema1;
cqlsh:schema1> CREATE TABLE users (
                 user_id varchar PRIMARY KEY,
                 first varchar,
                 last varchar,
                 age int
               );
cqlsh:schema1> INSERT INTO users (user_id, first, last, age)
               VALUES ('jsmith', 'John', 'Smith', 42);
cqlsh:schema1> SELECT * FROM users;
 user_id | age | first | last
---------+-----+-------+-------
  jsmith |  42 |  John | Smith
cqlsh:schema1>

If your session looks similar to what's above, congrats, your single node cluster is operational!

For more on what commands are supported by CQL, see http://cassandra.apache.org/doc/latest/cql/[the CQL reference]. A reasonable way to think of it is as, "SQL minus joins and subqueries, plus collections."

Wondering where to go from here?

Join us in #cassandra on the https://s.apache.org/slack-invite[ASF Slack] and ask questions.
Subscribe to the Users mailing list by sending a mail to user-subscribe@cassandra.apache.org.
Subscribe to the Developer mailing list by sending a mail to dev-subscribe@cassandra.apache.org.
Visit the http://cassandra.apache.org/community/[community section] of the Cassandra website for more information on getting involved.
Visit the http://cassandra.apache.org/doc/latest/development/index.html[development section] of the Cassandra website for more information on how to contribute.
