starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

9,730

1,944

9,730

1,182


Top Related Projects

doris

13,866

Apache Doris is an easy-to-use, high performance and unified analytics database.

ClickHouse

40,390

ClickHouse® is a real-time analytics database management system

druid

13,683

Apache Druid: a high performance real-time analytics database.

trino

11,217

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

pinot

5,817

Apache Pinot - A realtime distributed OLAP datastore

Quick Overview

StarRocks is an open-source, high-performance analytical database designed for real-time analytics on massive-scale data. It combines the benefits of MPP databases and columnar storage engines, offering fast query performance and high concurrency for both multi-dimensional analytics and real-time data analysis.

Pros

Excellent query performance, especially for complex analytical queries
High concurrency support, allowing multiple users to query simultaneously
Flexible data ingestion methods, including batch and real-time options
Compatibility with various data ecosystems and BI tools

Cons

Relatively new project, still maturing compared to some established alternatives
Limited documentation and community resources compared to more established databases
Steeper learning curve for users unfamiliar with MPP databases
May require more hardware resources for optimal performance compared to some alternatives

Getting Started

To get started with StarRocks:

Download and install StarRocks:

wget https://github.com/StarRocks/starrocks/releases/download/2.5.4/StarRocks-2.5.4.tar.gz
tar -xzf StarRocks-2.5.4.tar.gz
cd StarRocks-2.5.4

Start the StarRocks cluster:

./fe/bin/start_fe.sh --daemon
./be/bin/start_be.sh --daemon

Connect to StarRocks using the MySQL client:

mysql -h127.0.0.1 -P9030 -uroot

Create a database and table:

CREATE DATABASE example_db;
USE example_db;
CREATE TABLE example_table (
    id INT,
    name VARCHAR(50),
    value DOUBLE
) ENGINE=OLAP
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES ("replication_num" = "1");  -- single-BE test setup; the default is 3 replicas
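To make the DISTRIBUTED BY HASH(id) BUCKETS 10 clause more concrete, here is a minimal Python sketch of hash bucketing in general. It is only an illustration: the function names are hypothetical, and zlib.crc32 is an assumed stand-in for whatever hash function StarRocks uses internally.

```python
import zlib

NUM_BUCKETS = 10  # mirrors BUCKETS 10 in the CREATE TABLE statement


def bucket_for(row_id, num_buckets=NUM_BUCKETS):
    """Map a distribution-key value to a bucket index.

    crc32 is an illustrative stand-in, not the hash StarRocks actually uses.
    """
    digest = zlib.crc32(str(row_id).encode("utf-8"))
    return digest % num_buckets


def distribute(ids, num_buckets=NUM_BUCKETS):
    """Group row ids by the bucket they hash to."""
    buckets = {b: [] for b in range(num_buckets)}
    for row_id in ids:
        buckets[bucket_for(row_id, num_buckets)].append(row_id)
    return buckets


if __name__ == "__main__":
    # Show how many of 1,000 sequential ids land in each bucket.
    for bucket, members in distribute(range(1000)).items():
        print(bucket, len(members))
```

The point of the sketch is that the same id always lands in the same bucket, so rows sharing a key value are stored together and work can be spread across buckets.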

Insert data and run queries:

INSERT INTO example_table VALUES (1, 'John', 10.5), (2, 'Jane', 20.3);
SELECT * FROM example_table WHERE value > 15;

For more detailed instructions and advanced features, refer to the official StarRocks documentation.

Competitor Comparisons

doris

13,866

Apache Doris is an easy-to-use, high performance and unified analytics database.

Pros of Doris

Mature Apache project with a larger community and longer history
Better documentation and more comprehensive user guides
Stronger support for SQL compliance and compatibility

Cons of Doris

Generally slower query performance, especially for complex analytical queries
Less flexible storage architecture, limiting scalability for very large datasets
Fewer advanced features for real-time analytics and streaming data ingestion

Code Comparison

Doris query example:

SELECT user_id, SUM(order_amount) AS total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY user_id
HAVING total_amount > 1000
ORDER BY total_amount DESC
LIMIT 10;

StarRocks query example:

SELECT user_id, SUM(order_amount) AS total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY user_id
HAVING total_amount > 1000
ORDER BY total_amount DESC
LIMIT 10;

The SQL syntax for both systems is very similar, as they both aim for MySQL compatibility. However, StarRocks may offer better performance for this type of analytical query due to its optimized storage and query engine.
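To see what this query computes without a running cluster, the sketch below executes the same statement against an in-memory SQLite database. SQLite is only a stand-in here, chosen because it ships with Python; it says nothing about Doris or StarRocks performance, and the sample rows and the helper name are made up for illustration.

```python
import sqlite3

# Same statement as in the comparison above.
QUERY = """
SELECT user_id, SUM(order_amount) AS total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY user_id
HAVING total_amount > 1000
ORDER BY total_amount DESC
LIMIT 10;
"""

# Hypothetical sample data: (user_id, order_amount, order_date).
SAMPLE_ROWS = [
    (1, 800.0, "2023-02-01"),
    (1, 400.0, "2023-06-15"),   # user 1 totals 1200 within 2023
    (2, 999.0, "2023-03-10"),   # user 2 stays below the HAVING threshold
    (3, 5000.0, "2022-12-31"),  # outside the date range, ignored
    (3, 1500.0, "2023-01-01"),  # boundary date, included by BETWEEN
]


def run_top_spenders(rows):
    """Load rows into an in-memory SQLite table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (user_id INTEGER, order_amount REAL, order_date TEXT)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for user_id, total in run_top_spenders(SAMPLE_ROWS):
        print(user_id, total)
```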

ClickHouse

40,390

ClickHouse® is a real-time analytics database management system

Pros of ClickHouse

Mature project with a larger community and more extensive documentation
Wider range of data types and functions supported
Better performance for certain types of analytical queries

Cons of ClickHouse

Steeper learning curve and more complex configuration
Less optimized for real-time analytics and updates
Limited support for distributed transactions

Code Comparison

ClickHouse query example:

SELECT
    toYear(date) AS year,
    sum(amount) AS total_amount
FROM sales
GROUP BY year
ORDER BY year

StarRocks query example:

SELECT
    year(date) AS year,
    sum(amount) AS total_amount
FROM sales
GROUP BY year
ORDER BY year

Both systems use SQL-like syntax. ClickHouse's native function names follow its own conventions (e.g., toYear), although it also accepts aliases such as YEAR for some of them. StarRocks generally aims for a more familiar, MySQL-style SQL experience, which can be easier for users transitioning from traditional databases.

While both systems excel in analytical processing, StarRocks focuses more on real-time analytics and easier integration with big data ecosystems. ClickHouse, on the other hand, offers more flexibility and power for complex analytical queries, albeit with a steeper learning curve.

druid

13,683

Apache Druid: a high performance real-time analytics database.

Pros of Druid

Mature project with a large community and extensive documentation
Excellent for real-time analytics and time-series data
Highly scalable and fault-tolerant architecture

Cons of Druid

Steeper learning curve and more complex setup compared to StarRocks
Less efficient for ad-hoc queries on large datasets
Limited support for complex joins and subqueries

Code Comparison

Druid query example:

SELECT COUNT(*) AS count
FROM my_datasource
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY time_floor(__time, 'PT1H')

StarRocks query example:

SELECT COUNT(*) AS count
FROM my_table
WHERE timestamp >= DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY DATE_TRUNC('HOUR', timestamp)

Both systems use SQL-like syntax, but Druid has some specific functions like time_floor, while StarRocks uses more standard SQL functions like DATE_TRUNC. StarRocks generally offers a more familiar SQL experience for users coming from traditional database backgrounds.
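Both queries do the same thing: keep events from the last day, truncate each timestamp to the start of its hour, and count events per hour. A minimal Python sketch of that logic, with hypothetical function names and no relation to either engine's implementation, might look like this:

```python
from collections import Counter
from datetime import datetime, timedelta


def truncate_to_hour(ts):
    """Equivalent in spirit to DATE_TRUNC('HOUR', ts) or a one-hour time floor."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_counts(timestamps, now):
    """Count events per hour bucket, keeping only events from the last day."""
    cutoff = now - timedelta(days=1)
    counts = Counter(
        truncate_to_hour(ts) for ts in timestamps if ts >= cutoff
    )
    # Sort by bucket start so the output reads chronologically.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    events = [
        datetime(2024, 1, 2, 11, 5),
        datetime(2024, 1, 2, 11, 55),
        datetime(2024, 1, 2, 10, 30),
        datetime(2023, 12, 31, 9, 0),  # older than one day, filtered out
    ]
    for bucket, count in hourly_counts(events, now).items():
        print(bucket.isoformat(), count)
```

Note that the SQL examples select only the count. Adding the truncated hour to the SELECT list would make each output row identifiable, as the dictionary keys do here.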

trino

11,217

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Pros of Trino

More mature project with a larger community and ecosystem
Supports a wider range of data sources and connectors
Better suited for complex, federated queries across multiple data sources

Cons of Trino

Generally slower query performance for analytical workloads
Higher resource consumption, especially for memory-intensive operations
More complex setup and configuration process

Code Comparison

Trino SQL query:

SELECT
  customer.name,
  SUM(orders.total_price) AS total_spent
FROM
  hive.sales.customer
  JOIN mysql.ecommerce.orders ON customer.id = orders.customer_id
GROUP BY
  customer.name

StarRocks SQL query:

SELECT
  customer.name,
  SUM(orders.total_price) AS total_spent
FROM
  customer
  JOIN orders ON customer.id = orders.customer_id
GROUP BY
  customer.name

The main difference in these examples is that Trino explicitly qualifies each table with its catalog and schema (hive.sales and mysql.ecommerce), while the StarRocks query reads both tables from the current database. This highlights Trino's strength in federated queries across multiple data sources. StarRocks focuses on optimized performance within a single analytical database, although its README also lists direct querying of data lake tables (Hive, Iceberg, Delta Lake, Hudi) as a feature.

pinot

5,817

Apache Pinot - A realtime distributed OLAP datastore

Pros of Pinot

More mature project with a larger community and ecosystem
Supports real-time ingestion and near real-time query processing
Offers flexible schema design and automatic schema inference

Cons of Pinot

Higher complexity in setup and configuration
Steeper learning curve for beginners
May require more resources for optimal performance

Code Comparison

Pinot query example:

SELECT COUNT(*) FROM myTable
WHERE timeColumn BETWEEN 1000 AND 2000
GROUP BY dimension1, dimension2
TOP 50

StarRocks query example:

SELECT COUNT(*) FROM myTable
WHERE timeColumn BETWEEN 1000 AND 2000
GROUP BY dimension1, dimension2
ORDER BY COUNT(*) DESC
LIMIT 50

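Both systems use SQL-like syntax for querying, with minor differences for certain operations. The Pinot example uses TOP, which comes from Pinot's older PQL dialect, to return the groups with the highest aggregate values; more recent Pinot releases use standard SQL with LIMIT. StarRocks expresses the same thing with the standard ORDER BY and LIMIT clauses.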
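As a rough illustration of what the "top 50 groups by count" query computes, here is a small Python sketch. The row layout (dictionaries keyed by column name) and the function name are assumptions made for the example, not part of either system.

```python
from collections import Counter


def top_groups(rows, lower, upper, limit=50):
    """Count rows per (dimension1, dimension2) and return the largest groups.

    Mirrors: WHERE timeColumn BETWEEN lower AND upper
             GROUP BY dimension1, dimension2
             ORDER BY COUNT(*) DESC LIMIT limit
    """
    counts = Counter(
        (row["dimension1"], row["dimension2"])
        for row in rows
        if lower <= row["timeColumn"] <= upper  # BETWEEN is inclusive
    )
    return counts.most_common(limit)


if __name__ == "__main__":
    rows = [
        {"dimension1": "a", "dimension2": "x", "timeColumn": 1000},
        {"dimension1": "a", "dimension2": "x", "timeColumn": 1500},
        {"dimension1": "a", "dimension2": "x", "timeColumn": 2000},
        {"dimension1": "b", "dimension2": "y", "timeColumn": 1200},
        {"dimension1": "c", "dimension2": "z", "timeColumn": 2500},  # out of range
    ]
    for group, count in top_groups(rows, 1000, 2000):
        print(group, count)
```

Without the explicit ordering step, a plain LIMIT would return an arbitrary 50 groups, which is why the StarRocks version includes ORDER BY COUNT(*) DESC.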

StarRocks generally offers a simpler setup process and easier management, making it more suitable for users who prioritize ease of use. Pinot, on the other hand, provides more advanced features and flexibility, which can be beneficial for complex use cases and larger-scale deployments.


README

Download | Docs | Benchmarks | Demo


StarRocks is the world's fastest open query engine for sub-second, ad-hoc analytics both on and off the data lakehouse. With average query performance 3x faster than other popular alternatives, StarRocks is a query engine that eliminates the need for denormalization and adapts to your use cases, without having to move your data or rewrite SQL. A Linux Foundation project.

Learn more: What Is StarRocks: Features and Use Cases

Features

Native vectorized SQL engine: StarRocks adopts vectorization technology to make full use of the parallel computing power of the CPU, achieving sub-second query returns in multi-dimensional analyses, 5 to 10 times faster than previous systems.
Standard SQL: StarRocks supports ANSI SQL syntax (fully supporting TPC-H and TPC-DS). It is also compatible with the MySQL protocol, so various clients and BI software can be used to access StarRocks.
Smart query optimization: StarRocks optimizes complex queries through its CBO (Cost Based Optimizer). A better execution plan greatly improves data analysis efficiency.
Real-time update: The update model of StarRocks performs upsert and delete operations by primary key and keeps queries efficient while updates run concurrently.
Intelligent materialized view: Materialized views in StarRocks are updated automatically during data import and selected automatically when a query is executed.
Querying data in data lakes directly: StarRocks allows direct access to data from Apache Hive™, Apache Iceberg™, Delta Lake™ and Apache Hudi™ without importing.
Resource management: StarRocks can limit the resources that queries consume and can isolate tenants in the same cluster while using resources efficiently.
Easy to maintain: A simple architecture makes StarRocks easy to deploy, maintain and scale out. StarRocks tunes its query plans, rebalances resources when the cluster is scaled in or out, and automatically recovers data replicas after node failure.
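The real-time update feature above describes upsert and delete by primary key. The following toy Python model sketches those semantics only. The class and method names are hypothetical, and an in-memory dictionary is of course not how StarRocks stores data.

```python
class PrimaryKeyTable:
    """Toy in-memory table keyed by a single primary-key column."""

    def __init__(self, key_column):
        self.key_column = key_column
        self._rows = {}  # primary key value -> row dict

    def upsert(self, row):
        """Insert the row, or replace the existing row with the same key."""
        self._rows[row[self.key_column]] = dict(row)

    def delete(self, key):
        """Remove the row with this key; a missing key is ignored."""
        self._rows.pop(key, None)

    def scan(self):
        """Return all current rows ordered by primary key."""
        return [self._rows[k] for k in sorted(self._rows)]


if __name__ == "__main__":
    table = PrimaryKeyTable("id")
    table.upsert({"id": 1, "name": "John", "value": 10.5})
    table.upsert({"id": 2, "name": "Jane", "value": 20.3})
    table.upsert({"id": 1, "name": "John", "value": 11.0})  # replaces id 1
    table.delete(2)
    print(table.scan())
```

The property to notice is that a query sees at most one row per key, always the most recently written version.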

Architecture Overview

StarRocks's streamlined architecture is mainly composed of two modules: Frontend (FE) and Backend (BE). The entire system eliminates single points of failure through seamless and horizontal scaling of FE and BE, as well as replication of metadata and data.

Starting from version 3.0, StarRocks supports a new shared-data architecture, which can provide better scalability and lower costs.

Resources

Read the docs

Section	Description
Quick Starts	How-tos and Tutorials.
Deploy	Learn how to run and configure StarRocks.
Docs	Full documentation.
Blogs	StarRocks deep dive and user stories.

Get support

Slack community: join technical discussions, ask questions, and meet other users!
YouTube channel: subscribe to the latest video tutorials and webcasts.
GitHub issues: report an issue with StarRocks.

Contributing to StarRocks

We welcome all kinds of contributions from the community, individuals and partners. We owe our success to your active involvement.

See Contributing.md to get started.
Set up StarRocks development environment:

Understand our GitHub workflow for opening a pull request; use this PR Template when submitting a pull request.
Pick a good first issue and start contributing.

License: StarRocks is licensed under Apache License 2.0.

Community Membership: Learn more about different contributor roles in the StarRocks community.

Developer Group: Please join our Google Groups to discuss StarRocks features, project directions, issues, pull requests, or share suggestions.

Used By

This project is used by the following companies. Learn more about their use cases:
