starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
Top Related Projects
Apache Doris is an easy-to-use, high performance and unified analytics database.
ClickHouse® is a real-time analytics DBMS
Apache Druid: a high performance real-time analytics database.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache Pinot - A realtime distributed OLAP datastore
Quick Overview
StarRocks is an open-source, high-performance analytical database designed for real-time analytics on massive-scale data. It combines the benefits of MPP databases and columnar storage engines, offering fast query performance and high concurrency for both multi-dimensional analytics and real-time data analysis.
Pros
- Excellent query performance, especially for complex analytical queries
- High concurrency support, allowing multiple users to query simultaneously
- Flexible data ingestion methods, including batch and real-time options
- Compatibility with various data ecosystems and BI tools
Cons
- Relatively new project, still maturing compared to some established alternatives
- Limited documentation and community resources compared to more established databases
- Steeper learning curve for users unfamiliar with MPP databases
- May require more hardware resources for optimal performance compared to some alternatives
Getting Started
To get started with StarRocks:
- Download and install StarRocks:
wget https://github.com/StarRocks/starrocks/releases/download/2.5.4/StarRocks-2.5.4.tar.gz
tar -xzf StarRocks-2.5.4.tar.gz
cd StarRocks-2.5.4
- Start the StarRocks cluster:
./fe/bin/start_fe.sh --daemon
./be/bin/start_be.sh --daemon
- Connect to StarRocks using the MySQL client:
mysql -h127.0.0.1 -P9030 -uroot
- Create a database and table:
CREATE DATABASE example_db;
USE example_db;
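CREATE TABLE example_table (
    id INT,
    name VARCHAR(50),
    value DOUBLE
) ENGINE=OLAP
DISTRIBUTED BY HASH(id) BUCKETS 10
-- A single-BE test cluster cannot hold the default three replicas.
PROPERTIES ("replication_num" = "1");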
- Insert data and run queries:
INSERT INTO example_table VALUES (1, 'John', 10.5), (2, 'Jane', 20.3);
SELECT * FROM example_table WHERE value > 15;
For more detailed instructions and advanced features, refer to the official StarRocks documentation.
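On a freshly unpacked install, the BE typically has to be registered with the FE before tables can be created. The following is a minimal sketch of that step, not a complete deployment procedure. It assumes the BE runs on the same host as the FE and keeps the default heartbeat port 9050 (`heartbeat_service_port` in be.conf); adjust the address if your configuration differs.

```sql
-- Run from the MySQL client session opened with: mysql -h127.0.0.1 -P9030 -uroot
-- Assumption: the BE listens on 127.0.0.1 with the default heartbeat_service_port (9050).
ALTER SYSTEM ADD BACKEND "127.0.0.1:9050";

-- List the cluster nodes; each row should report the node as alive.
SHOW FRONTENDS;
SHOW BACKENDS;
```

Table creation fails until at least one BE is reported as alive, so it is worth running the two SHOW commands before the CREATE TABLE step.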
Competitor Comparisons
Apache Doris is an easy-to-use, high performance and unified analytics database.
Pros of Doris
- Mature Apache project with a larger community and longer history
- Better documentation and more comprehensive user guides
- Stronger support for SQL compliance and compatibility
Cons of Doris
- Generally slower query performance, especially for complex analytical queries
- Less flexible storage architecture, limiting scalability for very large datasets
- Fewer advanced features for real-time analytics and streaming data ingestion
Code Comparison
Doris query example:
SELECT user_id, SUM(order_amount) AS total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY user_id
HAVING total_amount > 1000
ORDER BY total_amount DESC
LIMIT 10;
StarRocks query example:
SELECT user_id, SUM(order_amount) AS total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY user_id
HAVING total_amount > 1000
ORDER BY total_amount DESC
LIMIT 10;
The SQL syntax for both systems is very similar, as they both aim for MySQL compatibility. However, StarRocks may offer better performance for this type of analytical query due to its optimized storage and query engine.
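Performance claims like this depend on the workload, so it is worth inspecting the plan on your own data. As a sketch, both systems accept EXPLAIN in front of a query. The `orders` table and its columns are the hypothetical ones from the examples above.

```sql
-- Print the execution plan instead of running the query.
EXPLAIN
SELECT user_id, SUM(order_amount) AS total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY user_id
HAVING total_amount > 1000
ORDER BY total_amount DESC
LIMIT 10;
```

The plan output should make it easier to see how the scan, aggregation, and sort steps are arranged, and whether the date predicate is applied at the scan.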
ClickHouse® is a real-time analytics DBMS
Pros of ClickHouse
- Mature project with a larger community and more extensive documentation
- Wider range of data types and functions supported
- Better performance for certain types of analytical queries
Cons of ClickHouse
- Steeper learning curve and more complex configuration
- Less optimized for real-time analytics and updates
- Limited support for distributed transactions
Code Comparison
ClickHouse query example:
SELECT
toYear(date) AS year,
sum(amount) AS total_amount
FROM sales
GROUP BY year
ORDER BY year
StarRocks query example:
SELECT
year(date) AS year,
sum(amount) AS total_amount
FROM sales
GROUP BY year
ORDER BY year
Both systems use SQL-like syntax, but ClickHouse often requires more specific function names (e.g., toYear instead of year). StarRocks generally aims for a more familiar SQL experience, which can be easier for users transitioning from traditional databases.
While both systems excel in analytical processing, StarRocks focuses more on real-time analytics and easier integration with big data ecosystems. ClickHouse, on the other hand, offers more flexibility and power for complex analytical queries, albeit with a steeper learning curve.
Apache Druid: a high performance real-time analytics database.
Pros of Druid
- Mature project with a large community and extensive documentation
- Excellent for real-time analytics and time-series data
- Highly scalable and fault-tolerant architecture
Cons of Druid
- Steeper learning curve and more complex setup compared to StarRocks
- Less efficient for ad-hoc queries on large datasets
- Limited support for complex joins and subqueries
Code Comparison
Druid query example:
SELECT COUNT(*) AS count
FROM my_datasource
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY time_floor(__time, 'PT1H')
StarRocks query example:
SELECT COUNT(*) AS count
FROM my_table
WHERE timestamp >= DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY DATE_TRUNC('HOUR', timestamp)
Both systems use SQL-like syntax, but Druid has some specific functions like time_floor, while StarRocks uses more standard SQL functions like DATE_TRUNC. StarRocks generally offers a more familiar SQL experience for users coming from traditional database backgrounds.
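The queries above count rows per hour but do not return the hour itself, so the result rows cannot be told apart. A possible StarRocks variant that also returns the bucket is sketched below. The table `my_table` and the DATETIME column `event_time` are hypothetical names chosen for illustration.

```sql
-- Hourly event counts for the last day, with the hour bucket in the result.
-- Assumption: my_table has a DATETIME column named event_time.
SELECT
    DATE_TRUNC('hour', event_time) AS hour_bucket,
    COUNT(*) AS event_count
FROM my_table
WHERE event_time >= DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY DATE_TRUNC('hour', event_time)
ORDER BY hour_bucket;
```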
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Pros of Trino
- More mature project with a larger community and ecosystem
- Supports a wider range of data sources and connectors
- Better suited for complex, federated queries across multiple data sources
Cons of Trino
- Generally slower query performance for analytical workloads
- Higher resource consumption, especially for memory-intensive operations
- More complex setup and configuration process
Code Comparison
Trino SQL query:
SELECT
customer.name,
SUM(orders.total_price) AS total_spent
FROM
hive.sales.customer
JOIN mysql.ecommerce.orders ON customer.id = orders.customer_id
GROUP BY
customer.name
StarRocks SQL query:
SELECT
customer.name,
SUM(orders.total_price) AS total_spent
FROM
customer
JOIN orders ON customer.id = orders.customer_id
GROUP BY
customer.name
The main difference in these examples is that Trino qualifies each table with its catalog and schema (hive.sales and mysql.ecommerce), while the StarRocks query uses unqualified names that resolve against the current database. This highlights Trino's strength in federated queries across multiple data sources, while StarRocks focuses on optimized performance for data it manages itself, although it can also query external tables in data lakes without importing them.
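StarRocks can also address tables by catalog when an external catalog has been defined. The sketch below assumes a Hive metastore reachable at a placeholder URI, and it assumes the `orders` table lives in a StarRocks database named `ecommerce`. The catalog name `hive_sales` and all table and column names are hypothetical.

```sql
-- Register a Hive metastore as an external catalog.
-- Assumption: the thrift URI is a placeholder; replace it with your metastore address.
CREATE EXTERNAL CATALOG hive_sales
PROPERTIES (
    "type" = "hive",
    "hive.metastore.uris" = "thrift://metastore.example.internal:9083"
);

-- Join a Hive table with a table stored in StarRocks itself.
-- default_catalog is the built-in catalog for internal tables.
SELECT
    c.name,
    SUM(o.total_price) AS total_spent
FROM hive_sales.sales.customer AS c
JOIN default_catalog.ecommerce.orders AS o ON c.id = o.customer_id
GROUP BY c.name;
```

Fully qualified names take the form catalog.database.table, which is close to how Trino addresses its sources.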
Apache Pinot - A realtime distributed OLAP datastore
Pros of Pinot
- More mature project with a larger community and ecosystem
- Supports real-time ingestion and near real-time query processing
- Offers flexible schema design and automatic schema inference
Cons of Pinot
- Higher complexity in setup and configuration
- Steeper learning curve for beginners
- May require more resources for optimal performance
Code Comparison
Pinot query example:
SELECT COUNT(*) FROM myTable
WHERE timeColumn BETWEEN 1000 AND 2000
GROUP BY dimension1, dimension2
TOP 50
StarRocks query example:
SELECT COUNT(*) FROM myTable
WHERE timeColumn BETWEEN 1000 AND 2000
GROUP BY dimension1, dimension2
ORDER BY COUNT(*) DESC
LIMIT 50
Both systems use SQL-like syntax for querying, with minor differences for certain operations. The Pinot example uses TOP, a clause from Pinot's legacy PQL dialect; Pinot's current SQL interface uses LIMIT, as StarRocks does.
StarRocks generally offers a simpler setup process and easier management, making it more suitable for users who prioritize ease of use. Pinot, on the other hand, provides more advanced features and flexibility, which can be beneficial for complex use cases and larger-scale deployments.
README
Download | Docs | Benchmarks | Demo
Learn more: What Is StarRocks: Features and Use Cases
Features
- Native vectorized SQL engine: StarRocks uses vectorized execution to make full use of the CPU's parallel computing power, returning multi-dimensional analysis queries in under a second, which is 5 to 10 times faster than previous systems.
- Standard SQL: StarRocks supports ANSI SQL syntax (with full support for the TPC-H and TPC-DS queries) and is compatible with the MySQL protocol, so a wide range of clients and BI tools can connect to it.
- Smart query optimization: StarRocks optimizes complex queries with a cost-based optimizer (CBO). A better execution plan greatly improves data analysis efficiency.
- Real-time update: StarRocks's update model can perform upsert and delete operations by primary key while keeping queries efficient under concurrent updates.
- Intelligent materialized view: Materialized views in StarRocks are updated automatically during data import and selected automatically when a query is executed.
- Querying data in data lakes directly: StarRocks allows direct access to data from Apache Hive™, Apache Iceberg™, Delta Lake™ and Apache Hudi™ without importing.
- Resource management: StarRocks can limit the resources that queries consume, isolating tenants in the same cluster while using resources efficiently.
- Easy to maintain: A simple architecture makes StarRocks easy to deploy, maintain, and scale out. StarRocks adjusts its query plans dynamically, rebalances resources when the cluster is scaled in or out, and automatically recovers data replicas after node failures.
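Two of the features above, real-time updates and materialized views, can be sketched in a few statements. This is an illustration under stated assumptions rather than a recommended schema: all table, column, and view names are hypothetical, and `replication_num = 1` assumes a single-BE test cluster.

```sql
-- Real-time update: a Primary Key table.
-- Key columns are listed first and must be NOT NULL.
CREATE TABLE orders_pk (
    order_id BIGINT NOT NULL,
    status   VARCHAR(20),
    amount   DOUBLE
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 4
PROPERTIES ("replication_num" = "1");

INSERT INTO orders_pk VALUES (1, 'new', 10.5);
-- Same key again: the existing row is replaced (upsert), not duplicated.
INSERT INTO orders_pk VALUES (1, 'paid', 10.5);
-- Delete by primary key.
DELETE FROM orders_pk WHERE order_id = 1;

-- Materialized view: pre-aggregate a Duplicate Key table.
CREATE TABLE sales (
    store_id  INT,
    sale_date DATE,
    amount    BIGINT
)
DUPLICATE KEY (store_id)
DISTRIBUTED BY HASH(store_id) BUCKETS 4
PROPERTIES ("replication_num" = "1");

CREATE MATERIALIZED VIEW sales_by_store AS
SELECT store_id, SUM(amount)
FROM sales
GROUP BY store_id;

-- The query names only the base table; the optimizer may choose
-- to answer it from sales_by_store.
SELECT store_id, SUM(amount) FROM sales GROUP BY store_id;
```

Running EXPLAIN on the final SELECT is one way to check whether the materialized view was actually selected.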
Architecture Overview
StarRocks’s streamlined architecture is mainly composed of two modules: Frontend (FE) and Backend (BE). The entire system eliminates single points of failure through seamless and horizontal scaling of FE and BE, as well as replication of metadata and data.
Starting from version 3.0, StarRocks supports a new shared-data architecture, which can provide better scalability and lower costs.
Resources
Read the docs
Section | Description |
---|---|
Quick Starts | How-tos and Tutorials. |
Deploy | Learn how to run and configure StarRocks. |
Docs | Full documentation. |
Blogs | StarRocks deep dive and user stories. |
Get support
- Slack community: join technical discussions, ask questions, and meet other users!
- YouTube channel: subscribe to the latest video tutorials and webcasts.
- GitHub issues: report an issue with StarRocks.
Contributing to StarRocks
We welcome all kinds of contributions from the community, individuals and partners. We owe our success to your active involvement.
- See Contributing.md to get started.
- Set up StarRocks development environment:
- Understand our GitHub workflow for opening a pull request; use this PR Template when submitting a pull request.
- Pick a good first issue and start contributing.
License: StarRocks is licensed under Apache License 2.0.
Community Membership: Learn more about different contributor roles in StarRocks community.
Used By
This project is used by the following companies. Learn more about their use cases: