clickhouse-java

ClickHouse Java Clients & JDBC Driver

1,548

588

240

Top Related Projects

spark

41,366

Apache Spark - A unified analytics engine for large-scale data processing

influxdb

30,400

Scalable datastore for metrics, events, and real-time analytics

timescaledb

19,738

A time-series database for high-performance real-time analytics packaged as a Postgres extension

druid

13,782

Apache Druid: a high performance real-time analytics database.

Quick Overview

ClickHouse-java is a Java client library for interacting with the ClickHouse distributed column-oriented DBMS. It provides a simple and efficient way to execute SQL queries, manage data, and interact with ClickHouse clusters from Java applications.

Pros

High Performance: ClickHouse-java is designed to leverage the high-performance capabilities of the ClickHouse database, allowing for efficient data processing and querying.
Comprehensive Functionality: The library supports a wide range of ClickHouse features, including data manipulation, query execution, and cluster management.
Asynchronous Support: ClickHouse-java provides asynchronous API support, enabling non-blocking and scalable data processing.
Flexible Configuration: The library offers flexible configuration options, allowing users to customize connection settings, query parameters, and more.
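
The asynchronous style mentioned in the list above can be pictured with plain java.util.concurrent classes. This is a minimal sketch and not the library's API: runQuery is a hypothetical stand-in for a blocking client call, and AsyncQuerySketch and countRowsAsync are names invented for this illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class AsyncQuerySketch {
    // Hypothetical stand-in for a blocking database call; it returns one fake row.
    static List<String> runQuery(String sql) {
        return List.of("result of: " + sql);
    }

    // Runs the query on a background thread and maps the rows to a row count,
    // so the calling thread is never blocked while the query runs.
    static CompletableFuture<Integer> countRowsAsync(String sql) {
        return CompletableFuture.supplyAsync(() -> runQuery(sql))
                .thenApply(List::size);
    }

    public static void main(String[] args) {
        // join() waits for completion here only so the demo can print the value.
        System.out.println(countRowsAsync("SELECT 1").join());
    }
}
```

Callers would normally chain further stages (thenApply, thenAccept) instead of calling join(), which blocks.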

Cons

Limited Documentation: The project's documentation could be more comprehensive, making it challenging for new users to get started quickly.
Dependency on ClickHouse: As a client library, ClickHouse-java is inherently dependent on the ClickHouse database, which may be a limitation for users not already familiar with or using ClickHouse.
Lack of Widespread Adoption: Compared to other Java database clients, ClickHouse-java may have a smaller user base and community, which could impact the availability of resources and support.
Potential Performance Overhead: While the library is designed for high performance, there may be some overhead associated with the abstraction layer between the Java application and the ClickHouse database.

Code Examples

Executing a Simple Query

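import com.clickhouse.client.ClickHouseClient;
import com.clickhouse.client.ClickHouseNode;
import com.clickhouse.client.ClickHouseResponse;
import com.clickhouse.data.ClickHouseFormat;
import com.clickhouse.data.ClickHouseRecord;

// V1 client API (0.4.x package layout); the endpoint is an assumption.
ClickHouseNode server = ClickHouseNode.of("http://localhost:8123/default");
try (ClickHouseClient client = ClickHouseClient.newInstance(server.getProtocol());
     ClickHouseResponse response = client.read(server)
             .format(ClickHouseFormat.RowBinaryWithNamesAndTypes)
             .query("SELECT * FROM system.tables LIMIT 10")
             .executeAndWait()) {
    // Print the first column of each returned row
    for (ClickHouseRecord row : response.records()) {
        System.out.println(row.getValue(0).asString());
    }
}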

This code demonstrates how to execute a simple SQL query using the ClickHouse-java library and retrieve the results.

Inserting Data

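import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// JDBC batch insert; assumes my_table has columns (name String, age Int32).
String url = "jdbc:clickhouse://localhost:8123/my_database";
try (Connection connection = DriverManager.getConnection(url);
     PreparedStatement insert = connection.prepareStatement(
             "INSERT INTO my_table (name, age) VALUES (?, ?)")) {
    Object[][] rows = {{"John Doe", 30}, {"Jane Smith", 25}, {"Bob Johnson", 40}};
    for (Object[] row : rows) {
        insert.setString(1, (String) row[0]);
        insert.setInt(2, (Integer) row[1]);
        insert.addBatch();   // queue the row
    }
    insert.executeBatch();   // send all queued rows together
}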

This code shows how to insert data into a ClickHouse table using the ClickHouse-java library.

Executing a Parameterized Query

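import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// JDBC prepared statement; assumes my_table has a String column called name.
String url = "jdbc:clickhouse://localhost:8123/default";
try (Connection connection = DriverManager.getConnection(url);
     PreparedStatement statement = connection.prepareStatement(
             "SELECT * FROM my_table WHERE name = ?")) {
    statement.setString(1, "John Doe");   // bind the first placeholder
    try (ResultSet rows = statement.executeQuery()) {
        while (rows.next()) {
            System.out.println(rows.getString("name"));
        }
    }
}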

This code demonstrates how to execute a parameterized SQL query using the ClickHouse-java library.

Getting Started

To get started with ClickHouse-java, follow these steps:

Add the ClickHouse-java dependency to your project's build configuration. For example, in a Maven project, add the following to your pom.xml file:

<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.3.2</version>
</dependency>

Create a ClickHouseClient instance and use it to interact with the ClickHouse database:

try (ClickHouseClient client = ClickHouseClient.newInstance()) {
    // Execute queries, insert data, and perform other operations
}
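
Connection settings are commonly passed as query parameters on the JDBC URL. As a hedged sketch, the helper below assembles such a URL from a map of options. JdbcUrlSketch and buildUrl are hypothetical names, and the socket_timeout option in the demo is used only for illustration, so check the driver documentation for the settings your version accepts.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class JdbcUrlSketch {
    // Builds jdbc:clickhouse://host:port/database and appends the options,
    // URL-encoded, as a query string when any are given.
    static String buildUrl(String host, int port, String database, Map<String, String> options) {
        String base = "jdbc:clickhouse://" + host + ":" + port + "/" + database;
        if (options.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        for (Map.Entry<String, String> option : options.entrySet()) {
            query.add(URLEncoder.encode(option.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(option.getValue(), StandardCharsets.UTF_8));
        }
        return base + query;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("socket_timeout", "30000"); // illustrative option name and value
        System.out.println(buildUrl("localhost", 8123, "default", options));
    }
}
```

The resulting string can be handed to DriverManager.getConnection or to a data source constructor.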

Competitor Comparisons

kafka

30,603

Mirror of Apache Kafka

Pros of Apache Kafka

Scalable and distributed architecture, allowing for high-throughput and fault-tolerant message processing.
Supports a wide range of programming languages and platforms, making it a versatile choice for various use cases.
Provides a rich set of features, including partitioning, replication, and consumer groups, which enhance the reliability and performance of message processing.

Cons of Apache Kafka

Complexity in setup and configuration, which can be challenging for smaller projects or teams with limited resources.
Steep learning curve, especially for developers who are new to distributed systems and message queuing.
Potential for higher operational overhead compared to simpler message queue solutions.

Code Comparison

Apache Kafka

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key", "value");
producer.send(record);

ClickHouse/clickhouse-java

ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
try (Connection connection = dataSource.getConnection()) {
    Statement statement = connection.createStatement();
    ResultSet resultSet = statement.executeQuery("SELECT * FROM table_name");
    while (resultSet.next()) {
        // Process the result set
    }
}

spark

41,366

Apache Spark - A unified analytics engine for large-scale data processing

Pros of Spark

Spark provides a more comprehensive set of features and APIs for distributed data processing, including support for batch, streaming, and machine learning workloads.
Spark has a larger and more active community, with more third-party libraries and integrations available.
Spark's performance is generally better than ClickHouse for certain types of workloads, especially those involving complex transformations and iterative algorithms.

Cons of Spark

Spark has a steeper learning curve and requires more configuration and setup compared to ClickHouse.
Spark is generally more resource-intensive, requiring more memory and CPU to achieve high performance.
Spark's focus on general-purpose data processing may make it less optimized for specific use cases, such as real-time analytics, where ClickHouse excels.

Code Comparison

Spark (Scala):

val df = spark.read.json("data.json")
val result = df.select("name", "age")
                .where("age > 30")
                .orderBy("age")
                .limit(10)
result.show()

ClickHouse (Java):

ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123");
// executeQuery returns java.sql.ResultSet, so the standard JDBC interfaces are used here
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT name, age FROM data WHERE age > 30 ORDER BY age LIMIT 10")) {
    while (resultSet.next()) {
        System.out.println(resultSet.getString("name") + ", " + resultSet.getInt("age"));
    }
}

influxdb

30,400

Scalable datastore for metrics, events, and real-time analytics

Pros of InfluxDB

Time-Series Data Storage: InfluxDB is designed specifically for storing and querying time-series data, making it well-suited for use cases such as monitoring, IoT, and analytics.
Query Language: InfluxDB has its own query language, InfluxQL, which is similar to SQL and provides a powerful way to interact with the database.
Scalability: InfluxDB is designed to be highly scalable, with the ability to handle large amounts of data and high write and read throughput.

Cons of InfluxDB

Limited Language Support: InfluxDB's official client libraries cover a fixed set of programming languages, such as Go, Python, and JavaScript. ClickHouse-Java itself targets only Java and other JVM languages, so this point reflects a difference between the two ecosystems and not a broader set of bindings in this library.
Complexity: InfluxDB can be more complex to set up and configure compared to ClickHouse, which has a simpler deployment process.

Code Comparison

InfluxDB (Python):

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086)
client.create_database('my_database')

data = [
    {
        "measurement": "cpu_load_short",
        "tags": {
            "host": "server01",
            "region": "us-west"
        },
        "time": "2009-11-10T23:00:00Z",
        "fields": {
            "value": 0.64
        }
    }
]

client.write_points(data)

ClickHouse-Java:

ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
try (Connection connection = dataSource.getConnection()) {
    try (Statement statement = connection.createStatement()) {
        statement.execute("CREATE TABLE IF NOT EXISTS example (id Int32, name String) ENGINE = MergeTree ORDER BY id");
        statement.execute("INSERT INTO example (id, name) VALUES (1, 'John'), (2, 'Jane')");
    }
}

timescaledb

19,738

A time-series database for high-performance real-time analytics packaged as a Postgres extension

Pros of TimescaleDB

Time-Series Optimizations: TimescaleDB is designed specifically for time-series data, with optimizations for storing and querying this type of data efficiently.
SQL Compatibility: TimescaleDB is built on top of PostgreSQL, allowing users to leverage the full power of SQL and the PostgreSQL ecosystem.
Scalability: TimescaleDB can scale to handle large amounts of time-series data, making it suitable for high-volume use cases.

Cons of TimescaleDB

Limited Language Support: TimescaleDB does not ship its own Java client. Java applications connect through the standard PostgreSQL JDBC driver, and the project's primary focus is the PostgreSQL ecosystem.
Complexity: As a PostgreSQL extension, TimescaleDB adds an additional layer of complexity compared to a standalone analytical database like ClickHouse.

Code Comparison

ClickHouse/clickhouse-java:

ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123");
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT * FROM my_table")) {
    // Process the result set
}

TimescaleDB (using the PostgreSQL JDBC driver):

PGSimpleDataSource dataSource = new PGSimpleDataSource();
dataSource.setUrl("jdbc:postgresql://localhost:5432/my_database");
try (Connection connection = dataSource.getConnection()) {
    Statement statement = connection.createStatement();
    ResultSet resultSet = statement.executeQuery("SELECT * FROM my_hypertable");
    // Process the result set
}

druid

13,782

Apache Druid: a high performance real-time analytics database.

Pros of Druid

Druid is a highly scalable and fault-tolerant real-time analytics data store, making it well-suited for handling large volumes of data.
Druid provides a rich set of features, including support for ad-hoc queries, real-time ingestion, and high-performance aggregations.
Druid's architecture is designed to be highly available and resilient, with features like automatic data replication and failover.

Cons of Druid

Druid has a steeper learning curve compared to ClickHouse/clickhouse-java, as it requires a more complex setup and configuration process.
Druid may have higher resource requirements, especially for large-scale deployments, which can make it more challenging to manage and operate.

Code Comparison

ClickHouse/clickhouse-java:

ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT * FROM table_name LIMIT 10")) {
    while (resultSet.next()) {
        // Process the result set
    }
}

Druid:

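// Druid SQL over JDBC via the Avatica driver; the broker address is an assumption.
String url = "jdbc:avatica:remote:url=http://druid-broker:8082/druid/v2/sql/avatica/";
String sql = "SELECT FLOOR(__time TO DAY) AS \"day\", COUNT(*) AS \"count\" "
        + "FROM data_source_name "
        + "WHERE __time >= TIMESTAMP '2023-01-01 00:00:00' "
        + "AND __time < TIMESTAMP '2023-12-31 00:00:00' "
        + "GROUP BY 1";
try (Connection connection = DriverManager.getConnection(url);
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery(sql)) {
    while (resultSet.next()) {
        // Process the query result
    }
}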

flink

25,110

Apache Flink

Pros of Flink

Flink is a powerful and versatile stream processing framework that can handle both batch and streaming data.
Flink provides a rich set of APIs and libraries for various data processing tasks, including SQL, machine learning, and graph processing.
Flink has a strong focus on fault tolerance and high availability, making it suitable for mission-critical applications.

Cons of Flink

Flink has a steeper learning curve compared to ClickHouse/clickhouse-java, which may be a barrier for some users.
Flink's deployment and configuration can be more complex, especially in large-scale distributed environments.
Flink's performance may not be as optimized for certain use cases as ClickHouse/clickhouse-java, which is designed specifically for analytical workloads.

Code Comparison

ClickHouse/clickhouse-java:

ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
try (Connection connection = dataSource.getConnection();
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT * FROM table_name LIMIT 10")) {
    while (resultSet.next()) {
        // Process the result set
    }
}

Flink:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> stream = env.readTextFile("input/path");
stream.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
        for (String word : value.split(" ")) {
            out.collect(new Tuple2<>(word, 1));
        }
    }
})
.keyBy(0)
.sum(1)
.print();
env.execute("Flink Streaming Java API Skeleton");

README

ClickHouse Java Client & JDBC Driver

Table of Contents

About the Project
Client Features
Important
Installation
Client V2
- Artifacts
- Examples
Client V1
- Artifacts
- Examples
Contributing

About the Project

This is a repo of the Java Client and JDBC Driver for ClickHouse Database (https://github.com/ClickHouse/Clickhouse) supported by the ClickHouse team. The Java Client is the core component that provides an API to interact with the database via HTTP Protocol.
The JDBC driver component implements the JDBC specification and communicates with ClickHouse using the Java Client API. Historically, there are two versions of both components. The previous version of the Java client required a significant rewrite, so we decided to create a new one, client-v2, so as not to disturb anyone's work and to give time for migration. The JDBC driver also required changes to be compatible with the new client and to comply more closely with the JDBC specification, so we created jdbc-v2. This component will replace the old version and keep the existing artifact name.
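
Because the Java Client talks to the server over HTTP, the underlying exchange can be sketched with the JDK's own java.net.http classes. This illustrates the transport only and is not how the client is implemented. The endpoint http://localhost:8123 and the default database are assumptions, and HttpQuerySketch and buildRequest are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class HttpQuerySketch {
    // Builds a POST request that carries the SQL text in the request body
    // and selects the database through a URL parameter.
    static HttpRequest buildRequest(String baseUrl, String database, String sql) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/?database=" + database))
                .POST(HttpRequest.BodyPublishers.ofString(sql))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://localhost:8123", "default", "SELECT 1");
        // Sending requires a reachable server; the address above is an assumption.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The client library layers compression, serialization, and connection handling on top of this kind of request, which is why using it is usually preferable to raw HTTP calls.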

Client Features

Name	Client V2	Client V1	Comments
Http Connection	â	â
Http Compression (LZ4)	â	â
Server Response Compression - LZ4	â	â
Client Request Compression - LZ4	â	â
HTTPS	â	â
Client SSL Cert (mTLS)	â	â
Http Proxy with Authentication	â	â
Java Object SerDe	â	â
Connection Pool	â	â	Apache HTTP Client only
Named Parameters	â	â
Retry on failure	â	â
Failover	â	â
Load-balancing	â	â
Server auto-discovery	â	â
Log Comment	â	â
Session Roles	â	â
SSL Client Authentication	â	â
Session timezone	â	â
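
For the retry and failover rows in the table above, the general idea can be sketched in plain Java. This is not the client's implementation: FailoverSketch, firstSuccessful, and the endpoint strings are hypothetical, and a real client would also add health checks and backoff.

```java
import java.util.List;
import java.util.function.Function;

final class FailoverSketch {
    // Tries each endpoint in order and returns the first successful result.
    // If every endpoint fails, the last failure is rethrown with the earlier
    // failures attached as suppressed exceptions.
    static <T> T firstSuccessful(List<String> endpoints, Function<String, T> call) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("no endpoints");
        }
        RuntimeException last = null;
        for (String endpoint : endpoints) {
            try {
                return call.apply(endpoint);
            } catch (RuntimeException e) {
                if (last != null) {
                    e.addSuppressed(last);
                }
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // The first endpoint always fails in this demo, so the second one answers.
        String answer = firstSuccessful(List.of("http://node-a:8123", "http://node-b:8123"), endpoint -> {
            if (endpoint.contains("node-a")) {
                throw new IllegalStateException("node-a is down");
            }
            return "answered by " + endpoint;
        });
        System.out.println(answer);
    }
}
```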

Important

Artifact Changes in 0.9.0 (June)

We are going to retire some JDBC artifacts (actually only classifiers) in 0.9.0. Here is the list:

Artifact	Classifier	Comments
clickhouse-jdbc	shaded	Use one with `all` classifier instead
clickhouse-jdbc	http
clickhouse-jdbc	shaded-all	Use one with `all` classifier instead

Artifact com.clickhouse:clickhouse-jdbc remains untouched. Artifact com.clickhouse:clickhouse-jdbc:0.9.0:all will contain all required classes.
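
The classifier retirement above amounts to a small mapping. As a hedged sketch, the helper below parses a group:artifact:version:classifier coordinate, the form used in the sentence above, and suggests the replacement listed in the table. ClassifierMigrationSketch and migrate are hypothetical names, and the http classifier yields no suggestion because the table lists none.

```java
import java.util.Map;
import java.util.Optional;

final class ClassifierMigrationSketch {
    // Retired classifier -> replacement classifier, taken from the table above.
    private static final Map<String, String> REPLACEMENTS =
            Map.of("shaded", "all", "shaded-all", "all");

    // Returns the migrated coordinate, or empty when the input is not a
    // four-part clickhouse-jdbc coordinate or has no listed replacement.
    static Optional<String> migrate(String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length != 4 || !"clickhouse-jdbc".equals(parts[1])) {
            return Optional.empty();
        }
        String replacement = REPLACEMENTS.get(parts[3]);
        if (replacement == null) {
            return Optional.empty();
        }
        return Optional.of(parts[0] + ":" + parts[1] + ":" + parts[2] + ":" + replacement);
    }

    public static void main(String[] args) {
        System.out.println(migrate("com.clickhouse:clickhouse-jdbc:0.9.0:shaded"));
    }
}
```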

Upcoming deprecations:

Component	Version	Comment
ClickHouse Java v1	TBC	We'll be deprecating Java v1 in 2025

Installation

Releases: Maven Central (web site https://mvnrepository.com/artifact/com.clickhouse)

Nightly Builds: https://s01.oss.sonatype.org/content/repositories/snapshots/com/clickhouse/

Client V2

Artifacts

Component	Maven Central Link	Javadoc Link	Documentation Link
ClickHouse Java Client V2			docs

Examples

Begin-with Usage Examples

Spring Demo Service

JDBC Driver

Artifacts

Component	Maven Central Link	Javadoc Link	Documentation Link
ClickHouse JDBC Driver			docs

Examples

See JDBC examples

R2DBC Driver

Artifacts

Component	Maven Central Link	Javadoc Link	Documentation Link
ClickHouse R2DBC Driver			docs

Misc Artifacts

Component	Maven Central Link	Javadoc Link
ClickHouse Java Unified Client
ClickHouse Java HTTP Client

Compatibility

All projects in this repo are tested with all active LTS versions of ClickHouse.
Support policy
We recommend upgrading the client continuously so that you do not miss security fixes and new improvements.
- If you have an issue with migration, create an issue and we will respond!

Contributing

Please see our contributing guide.
