druid
A database connection pool built for monitoring, produced by the Alibaba Cloud DataWorks team (https://help.aliyun.com/document_detail/137663.html)
Top Related Projects
Quick Overview
Alibaba Druid is a high-performance, feature-rich JDBC connection pool and monitoring solution. It provides powerful monitoring and extensions for both open-source and commercial databases, making it an essential tool for database connection management and performance optimization in Java applications.
Pros
- High performance and efficient connection pooling
- Comprehensive monitoring capabilities, including SQL execution statistics and performance analysis
- Extensive support for various databases, including MySQL, Oracle, PostgreSQL, and more
- Rich set of features, including SQL parsing, SQL firewall, and data source encryption
Cons
- Steeper learning curve compared to simpler connection pool libraries
- Configuration can be complex for advanced features
- Some users report occasional stability issues in certain environments
- Documentation is primarily in Chinese, which may be challenging for non-Chinese speakers
Code Examples
- Creating a Druid DataSource:
import com.alibaba.druid.pool.DruidDataSource;

DruidDataSource dataSource = new DruidDataSource();
dataSource.setUrl("jdbc:mysql://localhost:3306/test");
dataSource.setUsername("root");
dataSource.setPassword("password");
dataSource.setInitialSize(5); // connections created at startup
dataSource.setMaxActive(10);  // upper bound on pooled connections
- Executing a query using Druid:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

try (Connection conn = dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id = ?")) {
    ps.setInt(1, 1);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            System.out.println(rs.getString("name"));
        }
    }
}
- Enabling SQL statistics:
// setFilters is declared to throw SQLException; call it before the pool hands out connections
dataSource.setFilters("stat");
dataSource.setConnectionProperties("druid.stat.mergeSql=true;druid.stat.slowSqlMillis=5000");
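The statistics collected by the stat filter can be browsed in Druid's built-in web console, served by com.alibaba.druid.support.http.StatViewServlet. A minimal web.xml sketch (the servlet name and URL pattern below are illustrative choices, not fixed by Druid):

```xml
<!-- Exposes Druid's built-in monitoring console at /druid/* -->
<servlet>
    <servlet-name>DruidStatView</servlet-name>
    <servlet-class>com.alibaba.druid.support.http.StatViewServlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>DruidStatView</servlet-name>
    <url-pattern>/druid/*</url-pattern>
</servlet-mapping>
```

After deployment, opening /druid/ in a browser shows SQL execution statistics, slow queries, and pool status.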
Getting Started
To use Alibaba Druid in your Java project, follow these steps:
- Add the Druid dependency to your project's pom.xml:
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid</artifactId>
<version>1.2.9</version>
</dependency>
- Create a Druid DataSource in your application:
DruidDataSource dataSource = new DruidDataSource();
dataSource.setUrl("jdbc:mysql://localhost:3306/your_database");
dataSource.setUsername("your_username");
dataSource.setPassword("your_password");
dataSource.setInitialSize(5);
dataSource.setMaxActive(10);
dataSource.setMinIdle(5);
dataSource.setMaxWait(60000); // max milliseconds to wait for a free connection
dataSource.setValidationQuery("SELECT 1");
dataSource.setTestOnBorrow(false);
dataSource.setTestOnReturn(false);
dataSource.setTestWhileIdle(true); // validate idle connections in the background
dataSource.setPoolPreparedStatements(true);
dataSource.setMaxPoolPreparedStatementPerConnectionSize(20);
- Use the DataSource to obtain connections and execute queries as shown in the code examples above.
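The pool can also be configured from a properties file and created with DruidDataSourceFactory.createDataSource(Properties). A minimal sketch mirroring the setters above; the key names follow DruidDataSourceFactory's conventions, so verify them against the Druid version you use:

```properties
# druid.properties — illustrative values; keys mirror the DruidDataSource setters
url=jdbc:mysql://localhost:3306/your_database
username=your_username
password=your_password
initialSize=5
maxActive=10
minIdle=5
maxWait=60000
validationQuery=SELECT 1
testWhileIdle=true
filters=stat
```

Load the file into a java.util.Properties instance and pass it to DruidDataSourceFactory.createDataSource(...) to obtain a fully configured DataSource.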
Competitor Comparisons
Apache Calcite
Pros of Calcite
- More comprehensive SQL support, including advanced features like window functions and complex joins
- Extensible architecture allowing integration with various data sources and processing engines
- Active open-source community with regular updates and contributions
Cons of Calcite
- Steeper learning curve due to its more complex architecture
- May require more configuration and setup compared to Druid's out-of-the-box functionality
- Performance can be slower for certain types of queries, especially those involving large datasets
Code Comparison
Calcite query optimization example:
RelNode logicalPlan = sqlToRelConverter.convertQuery(sqlNode, false, true).rel;
HepProgram program = HepProgram.builder().addRuleInstance(FilterJoinRule.FILTER_ON_JOIN).build();
HepPlanner planner = new HepPlanner(program);
planner.setRoot(logicalPlan);
RelNode optimizedPlan = planner.findBestExp();
Druid query example:
GroupByQuery query = GroupByQuery.builder()
    .setDataSource("sample_datasource")
    .setInterval("2019-01-01/2020-01-01")
    .setGranularity(Granularities.DAY)
    .setDimensions(Collections.singletonList(DefaultDimensionSpec.of("dimension1")))
    .setAggregators(Collections.singletonList(new CountAggregatorFactory("count")))
    .build();
Calcite and Druid serve different purposes in data processing and querying: Calcite focuses on SQL parsing and optimization across various data sources, while Druid specializes in real-time analytics on large datasets. The choice between them depends on the specific use case and requirements.
Apache Flink
Pros of Flink
- Designed for distributed stream and batch processing, offering real-time data processing capabilities
- Supports both stream and batch processing with a unified API
- Provides advanced features like exactly-once processing semantics and event time processing
Cons of Flink
- Steeper learning curve due to its complex architecture and concepts
- Requires more system resources and configuration compared to Druid
- Less optimized for OLAP queries on large datasets
Code Comparison
Flink (Stream Processing):
DataStream<String> stream = env.addSource(
    new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), properties));
stream.flatMap(new Splitter())
    .keyBy("word")
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .sum("count")
    .print();
Druid (OLAP Query):
SELECT
time_column,
SUM(metric1) AS total_metric1,
COUNT(DISTINCT user_id) AS unique_users
FROM datasource
WHERE time_column BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY time_column
ORDER BY time_column ASC
While Flink excels in stream processing and complex event processing scenarios, Druid is more focused on real-time analytics and OLAP queries. Flink offers greater flexibility for various data processing tasks, whereas Druid provides optimized performance for analytical queries on large datasets.
Apache Hive
Pros of Hive
- Mature ecosystem with extensive community support and integration with Hadoop
- Powerful SQL-like query language (HiveQL) for data warehousing
- Supports a wide range of data formats and storage systems
Cons of Hive
- Slower query performance compared to Druid, especially for real-time analytics
- Less efficient for handling high-concurrency workloads
- Limited support for real-time data ingestion and streaming
Code Comparison
Hive query example:
SELECT customer_id, SUM(order_total)
FROM orders
WHERE order_date >= '2023-01-01'
GROUP BY customer_id;
Druid query example:
{
"queryType": "groupBy",
"dataSource": "orders",
"intervals": ["2023-01-01/2023-12-31"],
"granularity": "all",
"dimensions": ["customer_id"],
"aggregations": [
{ "type": "longSum", "name": "total_orders", "fieldName": "order_total" }
]
}
While Hive uses a SQL-like syntax, Druid employs a JSON-based query language. Hive's syntax is more familiar to SQL users, but Druid's approach allows for more flexible and complex queries, especially for real-time analytics scenarios.
Apache Kylin
Pros of Kylin
- Designed specifically for OLAP and multidimensional analysis
- Supports pre-calculation of cubes for faster query performance
- Integrates well with Hadoop ecosystem components
Cons of Kylin
- More complex setup and maintenance compared to Druid
- Limited support for real-time data ingestion
- Steeper learning curve for users unfamiliar with OLAP concepts
Code Comparison
Kylin query example:
SELECT year, SUM(price) AS total_price
FROM sales_fact
GROUP BY year
Druid query example:
SELECT __time AS year, SUM(price) AS total_price
FROM sales
GROUP BY 1
Both Kylin and Druid use SQL-like syntax for querying data, but Kylin's queries are more oriented towards multidimensional analysis, while Druid's queries are more focused on time-series data.
Kylin is better suited for complex OLAP scenarios with pre-calculated cubes, while Druid excels in real-time analytics and time-series data processing. Kylin offers stronger integration with the Hadoop ecosystem, whereas Druid provides more flexibility for real-time data ingestion and ad-hoc queries.
Presto
Pros of Presto
- Designed for interactive analytics queries, offering faster query execution for large-scale data processing
- Supports a wide range of data sources, including Hadoop, Cassandra, and relational databases
- Highly scalable and can handle petabytes of data across distributed systems
Cons of Presto
- Requires more memory resources compared to Druid
- Less efficient for real-time data ingestion and streaming analytics
- May have higher latency for certain types of queries, especially on smaller datasets
Code Comparison
Presto SQL query:
SELECT region, SUM(sales) AS total_sales
FROM sales_data
GROUP BY region
HAVING SUM(sales) > 1000000
ORDER BY total_sales DESC
LIMIT 10;
Druid query:
{
"queryType": "groupBy",
"dataSource": "sales_data",
"dimensions": ["region"],
"aggregations": [{"type": "longSum", "name": "total_sales", "fieldName": "sales"}],
"having": {"type": "greaterThan", "aggregation": "total_sales", "value": 1000000},
"granularity": "all",
"limitSpec": {"type": "default", "limit": 10, "columns": [{"dimension": "total_sales", "direction": "descending"}]}
}
While both systems can perform similar analytics, Presto uses standard SQL syntax, making it more familiar to SQL users. Druid uses a JSON-based query language, which may require additional learning but offers fine-grained control over query execution.
Apache Pinot
Pros of Pinot
- Designed for real-time analytics with low latency querying on large datasets
- Supports multi-tenancy and horizontal scalability
- Offers a wide range of indexing techniques for optimized query performance
Cons of Pinot
- Steeper learning curve and more complex setup compared to Druid
- Less mature ecosystem and smaller community than Druid
- Limited support for complex joins and subqueries
Code Comparison
Pinot query example:
SELECT COUNT(*) FROM myTable
WHERE timeColumn BETWEEN 1589662000000 AND 1589748400000
GROUP BY dimension1, dimension2
Druid query example:
SELECT COUNT(*) FROM myDataSource
WHERE __time BETWEEN TIMESTAMP '2020-05-17' AND TIMESTAMP '2020-05-18'
GROUP BY dimension1, dimension2
Both systems use SQL-like syntax for querying, but Pinot uses millisecond timestamps while Druid uses ISO 8601 format. Pinot's query structure is generally simpler, while Druid offers more advanced features in its query language.
README
druid
Introduction
- git clone https://github.com/alibaba/druid.git
- cd druid && mvn install
- have fun.
Related Alibaba Cloud Products
Documentation