Top Related Projects
Apache Druid: a high performance real-time analytics database.
Apache Pinot - A realtime distributed OLAP datastore
An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
Distributed PostgreSQL as an extension
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Quick Overview
ClickHouse is an open-source column-oriented database management system (DBMS) designed for online analytical processing (OLAP) of large datasets. It offers high performance for complex queries on massive amounts of data, making it particularly suitable for data warehousing and real-time analytics applications.
Pros
- Exceptional query performance, especially for analytical workloads
- Highly scalable and can handle petabytes of data
- Supports a wide range of data types and storage engines
- Offers both SQL and HTTP interfaces for easy integration
Cons
- Steeper learning curve compared to traditional relational databases
- Limited support for transactions and updates
- Requires careful schema design for optimal performance
- Resource-intensive, especially in terms of memory usage
Code Examples
- Creating a table and inserting data:
CREATE TABLE events (
event_date Date,
user_id UInt32,
event_type String
) ENGINE = MergeTree()
ORDER BY (event_date, user_id);
INSERT INTO events VALUES
('2023-04-01', 1, 'login'),
('2023-04-01', 2, 'purchase'),
('2023-04-02', 1, 'logout');
- Performing a simple aggregation query:
SELECT
event_date,
event_type,
count() AS event_count
FROM events
GROUP BY event_date, event_type
ORDER BY event_date, event_count DESC;
- Using a window function for advanced analytics:
SELECT
event_date,
user_id,
event_type,
row_number() OVER (PARTITION BY user_id ORDER BY event_date) AS event_sequence
FROM events
ORDER BY user_id, event_date;
Getting Started
To get started with ClickHouse:
-
Install ClickHouse:
sudo apt-get install clickhouse-server clickhouse-client
-
Start the ClickHouse server:
sudo service clickhouse-server start
-
Connect to ClickHouse using the command-line client:
clickhouse-client
-
Create a database and table:
CREATE DATABASE mydb; USE mydb; CREATE TABLE mytable (id UInt32, name String) ENGINE = MergeTree() ORDER BY id;
-
Insert and query data:
INSERT INTO mytable VALUES (1, 'Alice'), (2, 'Bob'); SELECT * FROM mytable;
Competitor Comparisons
Apache Druid: a high performance real-time analytics database.
Pros of Druid
- Better support for real-time data ingestion and streaming
- More flexible query language (SQL-like DruidSQL)
- Built-in support for rollup and data summarization
Cons of Druid
- Higher system requirements and more complex setup
- Less efficient for batch processing of large datasets
- Steeper learning curve for configuration and optimization
Code Comparison
Druid query example:
SELECT COUNT(*) AS count,
SUM(price) AS total_price
FROM transactions
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY product_id
HAVING count > 100
ORDER BY total_price DESC
LIMIT 10
ClickHouse query example:
SELECT product_id,
COUNT(*) AS count,
SUM(price) AS total_price
FROM transactions
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY product_id
HAVING count > 100
ORDER BY total_price DESC
LIMIT 10
Both systems support SQL-like queries, but Druid's DruidSQL offers more flexibility in terms of time-based operations and built-in functions. ClickHouse's SQL syntax is more straightforward and closely resembles standard SQL. ClickHouse generally provides better performance for analytical queries on large datasets, while Druid excels in real-time data ingestion and querying scenarios.
Apache Pinot - A realtime distributed OLAP datastore
Pros of Pinot
- Better support for real-time streaming ingestion and analysis
- More flexible schema design with support for nested fields
- Stronger support for multi-tenancy and resource isolation
Cons of Pinot
- Steeper learning curve and more complex setup compared to ClickHouse
- Less mature ecosystem and fewer integrations with other tools
- Generally slower query performance for large-scale batch processing
Code Comparison
ClickHouse query example:
SELECT date, count() AS count
FROM events
WHERE type = 'click'
GROUP BY date
ORDER BY date DESC
LIMIT 10
Pinot query example:
SELECT dateTime, COUNT(*) AS count
FROM events
WHERE type = 'click'
GROUP BY dateTime
ORDER BY dateTime DESC
LIMIT 10
Both systems use SQL-like syntax for querying, but Pinot's query language (PQL) has some unique features and limitations compared to ClickHouse's more standard SQL dialect. ClickHouse generally offers more advanced SQL functionality, while Pinot focuses on optimizing for specific real-time analytics use cases.
An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
Pros of TimescaleDB
- Built on PostgreSQL, leveraging its robust ecosystem and SQL compatibility
- Optimized for time-series data with automatic partitioning and indexing
- Supports both relational and time-series data in a single database
Cons of TimescaleDB
- Generally slower for large-scale analytics compared to ClickHouse
- Limited support for distributed deployments and horizontal scaling
- Higher storage overhead due to its row-based storage model
Code Comparison
TimescaleDB (SQL-based):
CREATE TABLE sensors (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT time_bucket('1 hour', time) AS hour,
AVG(temperature) AS avg_temp
FROM sensors
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
ClickHouse (SQL-like):
CREATE TABLE sensors (
time DateTime,
sensor_id UInt32,
temperature Float64,
humidity Float64
) ENGINE = MergeTree()
ORDER BY (time, sensor_id);
SELECT toStartOfHour(time) AS hour,
avg(temperature) AS avg_temp
FROM sensors
WHERE time > now() - INTERVAL 1 DAY
GROUP BY hour
ORDER BY hour;
Both systems use SQL-like syntax, but ClickHouse has some unique functions and a different table creation process with explicit engine specification.
Distributed PostgreSQL as an extension
Pros of Citus
- Built on PostgreSQL, leveraging its robust ecosystem and features
- Supports both distributed and local tables, offering flexibility
- Seamless horizontal scaling for PostgreSQL workloads
Cons of Citus
- Limited to PostgreSQL-compatible queries and syntax
- May require more complex setup and configuration compared to ClickHouse
- Performance can be less optimal for certain analytical workloads
Code Comparison
ClickHouse query example:
SELECT date, count() AS events
FROM events
WHERE type = 'click'
GROUP BY date
ORDER BY date
Citus (PostgreSQL) query example:
SELECT date_trunc('day', event_time) AS date, COUNT(*) AS events
FROM events
WHERE type = 'click'
GROUP BY date
ORDER BY date
Both ClickHouse and Citus are designed for distributed data processing, but they take different approaches. ClickHouse is a column-oriented DBMS optimized for analytical queries, while Citus extends PostgreSQL for distributed workloads. ClickHouse often provides better performance for large-scale analytics, whereas Citus offers more flexibility and compatibility with existing PostgreSQL applications.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Pros of Trino
- Supports a wide variety of data sources and formats
- Designed for distributed query processing across large datasets
- Highly scalable and can handle petabyte-scale data warehouses
Cons of Trino
- Generally slower for OLAP workloads compared to ClickHouse
- More complex setup and configuration
- Higher resource requirements for optimal performance
Code Comparison
ClickHouse SQL example:
SELECT sum(number) FROM numbers(1000000000)
Trino SQL example:
SELECT sum(cast(n AS bigint))
FROM UNNEST(sequence(1, 1000000000)) AS t(n)
Both examples perform a sum operation on a large sequence of numbers, but ClickHouse's syntax is more concise and optimized for analytical queries. Trino's query uses a more standard SQL approach with the UNNEST function to generate the sequence.
ClickHouse is known for its exceptional performance in OLAP scenarios, while Trino excels in federated queries across diverse data sources. ClickHouse is often faster for large-scale aggregations, while Trino offers more flexibility in connecting to various data systems.
Convert designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual CopilotREADME
ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.
How To Install (Linux, macOS, FreeBSD)
curl https://clickhouse.com/ | sh
Useful Links
- Official website has a quick high-level overview of ClickHouse on the main page.
- ClickHouse Cloud ClickHouse as a service, built by the creators and maintainers.
- Tutorial shows how to set up and query a small ClickHouse cluster.
- Documentation provides more in-depth information.
- YouTube channel has a lot of content about ClickHouse in video format.
- Slack and Telegram allow chatting with ClickHouse users in real-time.
- Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
- Code Browser (github.dev) with syntax highlighting, powered by github.dev.
- Contacts can help to get your questions answered if there are any.
Monthly Release & Community Call
Every month we get together with the community (users, contributors, customers, those interested in learning more about ClickHouse) to discuss what is coming in the latest release. If you are interested in sharing what you've built on ClickHouse, let us know.
- v24.9 Community Call - September 26
Upcoming Events
Keep an eye out for upcoming meetups and events around the world. Somewhere else you want us to be? Please feel free to reach out to tyler <at>
clickhouse <dot>
com. You can also peruse ClickHouse Events for a list of all upcoming trainings, meetups, speaking engagements, etc.
The following upcoming meetups are featuring creator of ClickHouse & CTO, Alexey Milovidov:
- Raleigh Meetup (Deutsche Bank) - September 9
- New York Meetup (Rokt) - September 10
- Chicago Meetup (Jump Capital) - September 12
Other upcoming meetups
- Toronto Meetup (Shopify) - September 10
- Austin Meetup - September 17
- London Meetup - September 17
- Bangalore Meetup - September 18
- Tel Aviv Meetup - September 22
- Jakarta Meetup - October 1
- Singapore Meetup - October 3
- Madrid Meetup - October 22
- Barcelona Meetup - October 29
- Oslo Meetup - October 31
- Ghent Meetup - November 19
- Dubai Meetup - November 21
- Paris Meetup - November 26
Recently completed events
- ClickHouse Guangzhou User Group Meetup - August 25
- Seattle Meetup (Statsig) - August 27
- Melbourne Meetup - August 27
- Sydney Meetup - September 5
- Zurich Meetup - September 5
- San Francisco Meetup (Cloudflare) - September 5
Recent Recordings
- Recent Meetup Videos: Meetup Playlist Whenever possible recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Current featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments"
- Recording available: v24.8 LTS Release Call All the features of 24.8 LTS, one convenient video! Watch it now!
Interested in joining ClickHouse and making it your full-time job?
We are a globally diverse and distributed team, united behind a common goal of creating industry-leading, real-time analytics. Here, you will have an opportunity to solve some of the most cutting-edge technical challenges and have direct ownership of your work and vision. If you are a contributor by nature, a thinker and a doer - weâll definitely click!
Check out our current openings here: https://clickhouse.com/company/careers
Can't find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!
Top Related Projects
Apache Druid: a high performance real-time analytics database.
Apache Pinot - A realtime distributed OLAP datastore
An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
Distributed PostgreSQL as an extension
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Convert designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual Copilot