Top Related Projects
Mirror of Apache Kafka
Apache Spark - A unified analytics engine for large-scale data processing
Scalable datastore for metrics, events, and real-time analytics
A time-series database for high-performance real-time analytics packaged as a Postgres extension
Apache Druid: a high performance real-time analytics database.
Apache Flink
Quick Overview
ClickHouse-java is a Java client library for interacting with the ClickHouse distributed column-oriented DBMS. It provides a simple and efficient way to execute SQL queries, manage data, and interact with ClickHouse clusters from Java applications.
Pros
- High Performance: ClickHouse-java is designed to leverage the high-performance capabilities of the ClickHouse database, allowing for efficient data processing and querying.
- Comprehensive Functionality: The library supports a wide range of ClickHouse features, including data manipulation, query execution, and cluster management.
- Asynchronous Support: ClickHouse-java provides asynchronous API support, enabling non-blocking and scalable data processing.
- Flexible Configuration: The library offers flexible configuration options, allowing users to customize connection settings, query parameters, and more.
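To make the non-blocking idea concrete, here is a minimal sketch that uses only the JDK. `AsyncQuerySketch` and `countRowsAsync` are hypothetical names, and the supplier stands in for any blocking query call; this is not the library's own asynchronous API, only an illustration of the pattern.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical illustration; not part of clickhouse-java. */
final class AsyncQuerySketch {
    private AsyncQuerySketch() {}

    /**
     * Runs a blocking query on a background thread and completes
     * with the number of rows it returned.
     */
    static CompletableFuture<Integer> countRowsAsync(Supplier<List<String>> blockingQuery) {
        return CompletableFuture.supplyAsync(blockingQuery) // runs on the common pool
                .thenApply(List::size);                     // transforms the result without blocking the caller
    }
}
```

A caller can chain further stages on the returned future, or call `join()` when the result is finally needed.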
Cons
- Limited Documentation: The project's documentation could be more comprehensive, making it challenging for new users to get started quickly.
- Dependency on ClickHouse: As a client library, ClickHouse-java is inherently dependent on the ClickHouse database, which may be a limitation for users not already familiar with or using ClickHouse.
- Lack of Widespread Adoption: Compared to other Java database clients, ClickHouse-java may have a smaller user base and community, which could impact the availability of resources and support.
- Potential Performance Overhead: While the library is designed for high performance, there may be some overhead associated with the abstraction layer between the Java application and the ClickHouse database.
Code Examples
Executing a Simple Query
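// Uses the JDBC driver (clickhouse-jdbc); assumes a server on localhost:8123.
// Imports: java.sql.Connection, java.sql.DriverManager, java.sql.ResultSet, java.sql.Statement
try (Connection connection = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/default");
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT database, name FROM system.tables LIMIT 10")) {
    while (resultSet.next()) {
        // Print each table as database.name
        System.out.println(resultSet.getString("database") + "." + resultSet.getString("name"));
    }
}
This code executes a simple SQL query through the library's JDBC driver and prints each returned row. It assumes a ClickHouse server is listening on localhost:8123.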
Inserting Data
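// Batch insert through the JDBC driver.
// Assumption: my_database.my_table has the columns (name String, age Int32).
// Imports: java.sql.Connection, java.sql.DriverManager, java.sql.PreparedStatement
try (Connection connection = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/my_database");
     PreparedStatement insert = connection.prepareStatement("INSERT INTO my_table (name, age) VALUES (?, ?)")) {
    Object[][] rows = {{"John Doe", 30}, {"Jane Smith", 25}, {"Bob Johnson", 40}};
    for (Object[] row : rows) {
        insert.setString(1, (String) row[0]);
        insert.setInt(2, (Integer) row[1]);
        insert.addBatch(); // queue the row
    }
    insert.executeBatch(); // send the queued rows
}
This code shows how to insert several rows into a ClickHouse table as a single JDBC batch. The column names name and age are assumed for illustration.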
Executing a Parameterized Query
// Parameterized query through the JDBC driver; the ? placeholder is bound with setString.
// Imports: java.sql.Connection, java.sql.DriverManager, java.sql.PreparedStatement, java.sql.ResultSet
try (Connection connection = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/default");
     PreparedStatement statement = connection.prepareStatement("SELECT * FROM my_table WHERE name = ?")) {
    statement.setString(1, "John Doe");
    try (ResultSet resultSet = statement.executeQuery()) {
        while (resultSet.next()) {
            System.out.println(resultSet.getString("name"));
        }
    }
}
This code demonstrates how to execute a parameterized SQL query with a JDBC PreparedStatement, which keeps the parameter value separate from the SQL text.
Getting Started
To get started with ClickHouse-java, follow these steps:
- Add the ClickHouse-java dependency to your project's build configuration. For example, in a Maven project, add the following to your pom.xml file:
<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.3.2</version>
</dependency>
- Create a ClickHouseClient instance and use it to interact with the ClickHouse database:
try (ClickHouseClient client = ClickHouseClient.newInstance()) {
    // Execute queries, insert data, and perform other operations
}
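The JDBC snippets on this page connect with URLs of the form jdbc:clickhouse://host:port/database. As a minimal sketch, the helper below assembles such a URL and opens a connection with the standard `DriverManager`. The class `ClickHouseJdbcUrls` and its methods are hypothetical names introduced for illustration and are not part of the library.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper; not part of clickhouse-java. */
final class ClickHouseJdbcUrls {
    private ClickHouseJdbcUrls() {}

    /** Builds a URL of the form jdbc:clickhouse://host:port/database. */
    static String jdbcUrl(String host, int port, String database) {
        if (host == null || host.isBlank()) {
            throw new IllegalArgumentException("host must not be blank");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:clickhouse://" + host + ":" + port + "/" + database;
    }

    /** Opens a connection; requires the clickhouse-jdbc driver on the classpath. */
    static Connection open(String host, int port, String database) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(host, port, database));
    }
}
```

For a local server, `ClickHouseJdbcUrls.open("localhost", 8123, "default")` would target the same endpoint as the comparison examples on this page.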
Competitor Comparisons
Mirror of Apache Kafka
Pros of Apache Kafka
- Scalable and distributed architecture, allowing for high-throughput and fault-tolerant message processing.
- Supports a wide range of programming languages and platforms, making it a versatile choice for various use cases.
- Provides a rich set of features, including partitioning, replication, and consumer groups, which enhance the reliability and performance of message processing.
Cons of Apache Kafka
- Complexity in setup and configuration, which can be challenging for smaller projects or teams with limited resources.
- Steep learning curve, especially for developers who are new to distributed systems and message queuing.
- Potential for higher operational overhead compared to simpler message queue solutions.
Code Comparison
Apache Kafka
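import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// send() is asynchronous; closing the producer flushes any buffered records
try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("my-topic", "key", "value"));
}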
ClickHouse/clickhouse-java
// Imports: java.sql.Connection, java.sql.ResultSet, java.sql.Statement, com.clickhouse.jdbc.ClickHouseDataSource
ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT * FROM table_name")) {
    while (resultSet.next()) {
        // Process the result set
    }
}
Apache Spark - A unified analytics engine for large-scale data processing
Pros of Spark
- Spark provides a more comprehensive set of features and APIs for distributed data processing, including support for batch, streaming, and machine learning workloads.
- Spark has a larger and more active community, with more third-party libraries and integrations available.
- Spark's performance is generally better than ClickHouse for certain types of workloads, especially those involving complex transformations and iterative algorithms.
Cons of Spark
- Spark has a steeper learning curve and requires more configuration and setup compared to ClickHouse.
- Spark is generally more resource-intensive, requiring more memory and CPU to achieve high performance.
- Spark's focus on general-purpose data processing may make it less optimized for specific use cases, such as real-time analytics, where ClickHouse excels.
Code Comparison
Spark (Scala):
val df = spark.read.json("data.json")
val result = df.select("name", "age")
  .where("age > 30")
  .orderBy("age")
  .limit(10)
result.show()
ClickHouse (Java):
// Standard JDBC types are used here because executeQuery returns a java.sql.ResultSet.
ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123");
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT name, age FROM data WHERE age > 30 ORDER BY age LIMIT 10")) {
    while (resultSet.next()) {
        System.out.println(resultSet.getString("name") + ", " + resultSet.getInt("age"));
    }
}
Scalable datastore for metrics, events, and real-time analytics
Pros of InfluxDB
- Time-Series Data Storage: InfluxDB is designed specifically for storing and querying time-series data, making it well-suited for use cases such as monitoring, IoT, and analytics.
- Query Language: InfluxDB has its own query language, InfluxQL, which is similar to SQL and provides a powerful way to interact with the database.
- Scalability: InfluxDB is designed to be highly scalable, with the ability to handle large amounts of data and high write and read throughput.
Cons of InfluxDB
- Limited Language Support: InfluxDB has official client libraries for a limited number of programming languages, such as Go, Python, and JavaScript.
- Complexity: InfluxDB can be more complex to set up and configure compared to ClickHouse, which has a simpler deployment process.
Code Comparison
InfluxDB (Python):
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086)
client.create_database('my_database')
# Select the database; write_points needs a target database
client.switch_database('my_database')
data = [
    {
        "measurement": "cpu_load_short",
        "tags": {
            "host": "server01",
            "region": "us-west"
        },
        "time": "2009-11-10T23:00:00Z",
        "fields": {
            "value": 0.64
        }
    }
]
client.write_points(data)
ClickHouse-Java:
ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
try (Connection connection = dataSource.getConnection()) {
    try (Statement statement = connection.createStatement()) {
        statement.execute("CREATE TABLE IF NOT EXISTS example (id Int32, name String) ENGINE = MergeTree ORDER BY id");
        statement.execute("INSERT INTO example (id, name) VALUES (1, 'John'), (2, 'Jane')");
    }
}
A time-series database for high-performance real-time analytics packaged as a Postgres extension
Pros of TimescaleDB
- Time-Series Optimizations: TimescaleDB is designed specifically for time-series data, with optimizations for storing and querying this type of data efficiently.
- SQL Compatibility: TimescaleDB is built on top of PostgreSQL, allowing users to leverage the full power of SQL and the PostgreSQL ecosystem.
- Scalability: TimescaleDB can scale to handle large amounts of time-series data, making it suitable for high-volume use cases.
Cons of TimescaleDB
- Limited Language Support: TimescaleDB can be used from Java through the PostgreSQL JDBC driver, but the project's primary focus is on the PostgreSQL ecosystem, with less emphasis on dedicated clients for other programming languages.
- Complexity: As a PostgreSQL extension, TimescaleDB adds an additional layer of complexity compared to a standalone column-oriented database like ClickHouse.
Code Comparison
ClickHouse/clickhouse-java:
// Standard JDBC types are used here because executeQuery returns a java.sql.ResultSet.
ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123");
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT * FROM my_table")) {
    // Process the result set
}
TimescaleDB (using the PostgreSQL JDBC driver):
PGSimpleDataSource dataSource = new PGSimpleDataSource();
dataSource.setUrl("jdbc:postgresql://localhost:5432/my_database");
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT * FROM my_hypertable")) {
    // Process the result set
}
Apache Druid: a high performance real-time analytics database.
Pros of Druid
- Druid is a highly scalable and fault-tolerant real-time analytics data store, making it well-suited for handling large volumes of data.
- Druid provides a rich set of features, including support for ad-hoc queries, real-time ingestion, and high-performance aggregations.
- Druid's architecture is designed to be highly available and resilient, with features like automatic data replication and failover.
Cons of Druid
- Druid has a steeper learning curve compared to ClickHouse/clickhouse-java, as it requires a more complex setup and configuration process.
- Druid may have higher resource requirements, especially for large-scale deployments, which can make it more challenging to manage and operate.
Code Comparison
ClickHouse/clickhouse-java:
// Standard JDBC types are used here because executeQuery returns a java.sql.ResultSet.
ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT * FROM table_name LIMIT 10")) {
    while (resultSet.next()) {
        // Process the result set
    }
}
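Druid (SQL over the Avatica JDBC driver):
// Assumes the Avatica JDBC client is on the classpath and a Broker is reachable at druid-broker:8082.
// Imports: java.sql.Connection, java.sql.DriverManager, java.sql.ResultSet, java.sql.Statement, java.util.Properties
String url = "jdbc:avatica:remote:url=http://druid-broker:8082/druid/v2/sql/avatica/";
String sql = "SELECT FLOOR(__time TO DAY) AS \"day\", COUNT(*) AS \"count\" "
        + "FROM data_source_name "
        + "WHERE __time >= TIMESTAMP '2023-01-01 00:00:00' AND __time < TIMESTAMP '2023-12-31 00:00:00' "
        + "GROUP BY 1";
try (Connection connection = DriverManager.getConnection(url, new Properties());
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery(sql)) {
    while (resultSet.next()) {
        // Process the query result: one row per day with its event count
    }
}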
Apache Flink
Pros of Flink
- Flink is a powerful and versatile stream processing framework that can handle both batch and streaming data.
- Flink provides a rich set of APIs and libraries for various data processing tasks, including SQL, machine learning, and graph processing.
- Flink has a strong focus on fault tolerance and high availability, making it suitable for mission-critical applications.
Cons of Flink
- Flink has a steeper learning curve compared to ClickHouse/clickhouse-java, which may be a barrier for some users.
- Flink's deployment and configuration can be more complex, especially in large-scale distributed environments.
- Flink's performance may not be as optimized for certain use cases as ClickHouse/clickhouse-java, which is designed specifically for analytical workloads.
Code Comparison
ClickHouse/clickhouse-java:
// Standard JDBC types are used here because executeQuery returns a java.sql.ResultSet.
ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT * FROM table_name LIMIT 10")) {
    while (resultSet.next()) {
        // Process the result set
    }
}
Flink:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> stream = env.readTextFile("input/path");
stream.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // Emit (word, 1) for every whitespace-separated token
            for (String word : value.split(" ")) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    })
    // Key by the word itself; index-based keyBy(0) is deprecated
    .keyBy(value -> value.f0)
    .sum(1)
    .print();
env.execute("Flink Streaming Java API Skeleton");
README
About the Project
This is a repo of the Java Client and JDBC Driver for ClickHouse Database (https://github.com/ClickHouse/Clickhouse) supported by the ClickHouse team. The Java Client is the core component that provides an API to interact with the database via HTTP Protocol.
The JDBC driver component implements the JDBC specification and communicates with ClickHouse using the Java Client API.
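To illustrate what communicating over the HTTP protocol involves, the sketch below sends a query to a ClickHouse HTTP endpoint using only the JDK's `java.net.http` classes. It is not how the Java Client is implemented. `RawHttpQuery`, `buildRequest`, and `run` are hypothetical names, and the endpoint `http://localhost:8123/` is an assumed local server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; not part of clickhouse-java. */
final class RawHttpQuery {
    private RawHttpQuery() {}

    /** Builds a POST request whose body is the SQL text. */
    static HttpRequest buildRequest(String baseUrl, String sql) {
        return HttpRequest.newBuilder(URI.create(baseUrl))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(sql, StandardCharsets.UTF_8))
                .build();
    }

    /** Sends the query and returns the raw response body as text. */
    static String run(String baseUrl, String sql) throws IOException, InterruptedException {
        HttpClient http = HttpClient.newHttpClient();
        HttpResponse<String> response =
                http.send(buildRequest(baseUrl, sql), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // The server reports query errors in the response body
            throw new IllegalStateException("HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }
}
```

Calling `RawHttpQuery.run("http://localhost:8123/", "SELECT 1")` against a running server would return the result as plain text. The client library adds typed results, compression, connection pooling, and the other features listed in the table below on top of this kind of exchange.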
Historically, there are two versions of both components. The previous version of the Java client required a significant rewrite, so we decided to create a new one, client-v2, so as not to disturb anyone's work and to give time for migration. The JDBC driver also required changes to be compatible with the new client and to comply more closely with the JDBC specification, so we created jdbc-v2. This component will replace the old version (keeping the artifact name).
Client Features
Name | Client V2 | Client V1 | Comments |
---|---|---|---|
Http Connection | â | â | |
Http Compression (LZ4) | â | â | |
Server Response Compression - LZ4 | â | â | |
Client Request Compression - LZ4 | â | â | |
HTTPS | â | â | |
Client SSL Cert (mTLS) | â | â | |
Http Proxy with Authentication | â | â | |
Java Object SerDe | â | â | |
Connection Pool | â | â | Apache HTTP Client only |
Named Parameters | â | â | |
Retry on failure | â | â | |
Failover | â | â | |
Load-balancing | â | â | |
Server auto-discovery | â | â | |
Log Comment | â | â | |
Session Roles | â | â | |
SSL Client Authentication | â | â | |
Session timezone | â | â |
Important
Artifact Changes in 0.9.0 (June)
We are going to retire some JDBC artifacts (actually only classifiers) in 0.9.0. Here is the list:
Artifact | Classifier | Comments |
---|---|---|
clickhouse-jdbc | shaded | Use the artifact with the all classifier instead |
clickhouse-jdbc | http | |
clickhouse-jdbc | shaded-all | Use the artifact with the all classifier instead |
Artifact com.clickhouse:clickhouse-jdbc remains untouched.
Artifact com.clickhouse:clickhouse-jdbc:0.9.0:all will contain all required classes.
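For Maven users, selecting the all classifier might look like the fragment below. This is a sketch derived from the coordinates com.clickhouse:clickhouse-jdbc:0.9.0:all given here; adjust the version to the release you actually use.

```xml
<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.9.0</version>
    <!-- Selects the artifact variant that bundles all required classes -->
    <classifier>all</classifier>
</dependency>
```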
Upcoming deprecations:
Component | Version | Comment |
---|---|---|
ClickHouse Java v1 | TBC | We'll be deprecating Java v1 in 2025 |
Installation
Releases: Maven Central (web site https://mvnrepository.com/artifact/com.clickhouse)
Nightly Builds: https://s01.oss.sonatype.org/content/repositories/snapshots/com/clickhouse/
Client V2
Artifacts
Component | Maven Central Link | Javadoc Link | Documentation Link |
---|---|---|---|
ClickHouse Java Client V2 | docs |
Examples
JDBC Driver
Artifacts
Component | Maven Central Link | Javadoc Link | Documentation Link |
---|---|---|---|
ClickHouse JDBC Driver | docs |
Examples
See JDBC examples
R2DBC Driver
Artifacts
Component | Maven Central Link | Javadoc Link | Documentation Link |
---|---|---|---|
ClickHouse R2DBC Driver | docs |
Misc Artifacts
Component | Maven Central Link | Javadoc Link |
---|---|---|
ClickHouse Java Unified Client | ||
ClickHouse Java HTTP Client |
Compatibility
- All projects in this repo are tested with all active LTS versions of ClickHouse.
- Support policy
- We recommend upgrading the client regularly so that you do not miss security fixes and new improvements.
- If you have an issue with migration, create an issue and we will respond!
Contributing
Please see our contributing guide.