druid
A database connection pool built for monitoring, produced by the Alibaba Cloud DataWorks team (https://help.aliyun.com/document_detail/137663.html)
Top Related Projects
Quick Overview
Alibaba Druid is a high-performance, feature-rich JDBC connection pool and monitoring solution. It provides powerful monitoring and extensions for both open-source and commercial databases, making it an essential tool for database connection management and performance optimization in Java applications.
Pros
- High performance and efficient connection pooling
- Comprehensive monitoring capabilities, including SQL execution statistics and performance analysis
- Extensive support for various databases, including MySQL, Oracle, PostgreSQL, and more
- Rich set of features, including SQL parsing, SQL firewall, and data source encryption
Cons
- Steeper learning curve compared to simpler connection pool libraries
- Configuration can be complex for advanced features
- Some users report occasional stability issues in certain environments
- Documentation is primarily in Chinese, which may be challenging for non-Chinese speakers
Code Examples
- Creating a Druid DataSource:
import com.alibaba.druid.pool.DruidDataSource;

DruidDataSource dataSource = new DruidDataSource();
dataSource.setUrl("jdbc:mysql://localhost:3306/test");
dataSource.setUsername("root");
dataSource.setPassword("password");
dataSource.setInitialSize(5); // connections created at startup
dataSource.setMaxActive(10);  // upper bound on pooled connections
- Executing a query using Druid:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

try (Connection conn = dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id = ?")) {
    ps.setInt(1, 1);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            System.out.println(rs.getString("name"));
        }
    }
}
- Enabling SQL statistics:
// setFilters is declared to throw SQLException; call it before the pool hands out connections
dataSource.setFilters("stat");
dataSource.setConnectionProperties("druid.stat.mergeSql=true;druid.stat.slowSqlMillis=5000");
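The statistics collected by the stat filter can be browsed in Druid's built-in web console, served by com.alibaba.druid.support.http.StatViewServlet. A minimal web.xml sketch (the servlet name and URL pattern below are illustrative choices, not fixed by Druid):

```xml
<!-- Exposes Druid's built-in monitoring console at /druid/* -->
<servlet>
    <servlet-name>DruidStatView</servlet-name>
    <servlet-class>com.alibaba.druid.support.http.StatViewServlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>DruidStatView</servlet-name>
    <url-pattern>/druid/*</url-pattern>
</servlet-mapping>
```

After deployment, opening /druid/ in a browser shows SQL execution statistics, slow queries, and pool status.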
Getting Started
To use Alibaba Druid in your Java project, follow these steps:
- Add the Druid dependency to your project's pom.xml:
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid</artifactId>
<version>1.2.9</version>
</dependency>
- Create a Druid DataSource in your application:
DruidDataSource dataSource = new DruidDataSource();
dataSource.setUrl("jdbc:mysql://localhost:3306/your_database");
dataSource.setUsername("your_username");
dataSource.setPassword("your_password");
dataSource.setInitialSize(5);
dataSource.setMaxActive(10);
dataSource.setMinIdle(5);
dataSource.setMaxWait(60000); // max milliseconds to wait for a free connection
dataSource.setValidationQuery("SELECT 1");
dataSource.setTestOnBorrow(false);
dataSource.setTestOnReturn(false);
dataSource.setTestWhileIdle(true); // validate idle connections in the background
dataSource.setPoolPreparedStatements(true);
dataSource.setMaxPoolPreparedStatementPerConnectionSize(20);
- Use the DataSource to obtain connections and execute queries as shown in the code examples above.
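The pool can also be configured from a properties file and created with DruidDataSourceFactory.createDataSource(Properties). A minimal sketch mirroring the setters above; the key names follow DruidDataSourceFactory's conventions, so verify them against the Druid version you use:

```properties
# druid.properties — illustrative values; keys mirror the DruidDataSource setters
url=jdbc:mysql://localhost:3306/your_database
username=your_username
password=your_password
initialSize=5
maxActive=10
minIdle=5
maxWait=60000
validationQuery=SELECT 1
testWhileIdle=true
filters=stat
```

Load the file into a java.util.Properties instance and pass it to DruidDataSourceFactory.createDataSource(...) to obtain a fully configured DataSource.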
Competitor Comparisons
Apache Calcite
Pros of Calcite
- More comprehensive SQL support, including advanced features like window functions and complex joins
- Extensible architecture allowing integration with various data sources and processing engines
- Active open-source community with regular updates and contributions
Cons of Calcite
- Steeper learning curve due to its more complex architecture
- May require more configuration and setup compared to Druid's out-of-the-box functionality
- Performance can be slower for certain types of queries, especially those involving large datasets
Code Comparison
Calcite query optimization example:
RelNode logicalPlan = sqlToRelConverter.convertQuery(sqlNode, false, true).rel;
HepProgram program = HepProgram.builder().addRuleInstance(FilterJoinRule.FILTER_ON_JOIN).build();
HepPlanner planner = new HepPlanner(program);
planner.setRoot(logicalPlan);
RelNode optimizedPlan = planner.findBestExp();
Druid query example:
GroupByQuery query = GroupByQuery.builder()
    .setDataSource("sample_datasource")
    .setInterval("2019-01-01/2020-01-01")
    .setGranularity(Granularities.DAY)
    .setDimensions(Collections.singletonList(DefaultDimensionSpec.of("dimension1")))
    .setAggregators(Collections.singletonList(new CountAggregatorFactory("count")))
    .build();
Calcite and Druid serve different purposes in data processing and querying: Calcite focuses on SQL parsing and optimization across various data sources, while Druid specializes in real-time analytics on large datasets. The choice between them depends on the specific use case and requirements.
Apache Flink
Pros of Flink
- Designed for distributed stream and batch processing, offering real-time data processing capabilities
- Supports both stream and batch processing with a unified API
- Provides advanced features like exactly-once processing semantics and event time processing
Cons of Flink
- Steeper learning curve due to its complex architecture and concepts
- Requires more system resources and configuration compared to Druid
- Less optimized for OLAP queries on large datasets
Code Comparison
Flink (Stream Processing):
DataStream<String> stream = env.addSource(
    new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), properties));
stream.flatMap(new Splitter())
    .keyBy("word")
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .sum("count")
    .print();
Druid (OLAP Query):
SELECT
time_column,
SUM(metric1) AS total_metric1,
COUNT(DISTINCT user_id) AS unique_users
FROM datasource
WHERE time_column BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY time_column
ORDER BY time_column ASC
While Flink excels in stream processing and complex event processing scenarios, Druid is more focused on real-time analytics and OLAP queries. Flink offers greater flexibility for various data processing tasks, whereas Druid provides optimized performance for analytical queries on large datasets.
Apache Hive
Pros of Hive
- Mature ecosystem with extensive community support and integration with Hadoop
- Powerful SQL-like query language (HiveQL) for data warehousing
- Supports a wide range of data formats and storage systems
Cons of Hive
- Slower query performance compared to Druid, especially for real-time analytics
- Less efficient for handling high-concurrency workloads
- Limited support for real-time data ingestion and streaming
Code Comparison
Hive query example:
SELECT customer_id, SUM(order_total)
FROM orders
WHERE order_date >= '2023-01-01'
GROUP BY customer_id;
Druid query example:
{
"queryType": "groupBy",
"dataSource": "orders",
"intervals": ["2023-01-01/2023-12-31"],
"granularity": "all",
"dimensions": ["customer_id"],
"aggregations": [
{ "type": "longSum", "name": "total_orders", "fieldName": "order_total" }
]
}
While Hive uses a SQL-like syntax, Druid employs a JSON-based query language. Hive's syntax is more familiar to SQL users, but Druid's approach allows for more flexible and complex queries, especially for real-time analytics scenarios.
Apache Kylin
Pros of Kylin
- Designed specifically for OLAP and multidimensional analysis
- Supports pre-calculation of cubes for faster query performance
- Integrates well with Hadoop ecosystem components
Cons of Kylin
- More complex setup and maintenance compared to Druid
- Limited support for real-time data ingestion
- Steeper learning curve for users unfamiliar with OLAP concepts
Code Comparison
Kylin query example:
SELECT year, SUM(price) AS total_price
FROM sales_fact
GROUP BY year
Druid query example:
SELECT __time AS year, SUM(price) AS total_price
FROM sales
GROUP BY 1
Both Kylin and Druid use SQL-like syntax for querying data, but Kylin's queries are more oriented towards multidimensional analysis, while Druid's queries are more focused on time-series data.
Kylin is better suited for complex OLAP scenarios with pre-calculated cubes, while Druid excels in real-time analytics and time-series data processing. Kylin offers stronger integration with the Hadoop ecosystem, whereas Druid provides more flexibility for real-time data ingestion and ad-hoc queries.
Presto
Pros of Presto
- Designed for interactive analytics queries, offering faster query execution for large-scale data processing
- Supports a wide range of data sources, including Hadoop, Cassandra, and relational databases
- Highly scalable and can handle petabytes of data across distributed systems
Cons of Presto
- Requires more memory resources compared to Druid
- Less efficient for real-time data ingestion and streaming analytics
- May have higher latency for certain types of queries, especially on smaller datasets
Code Comparison
Presto SQL query:
SELECT region, SUM(sales) AS total_sales
FROM sales_data
GROUP BY region
HAVING SUM(sales) > 1000000
ORDER BY total_sales DESC
LIMIT 10;
Druid query:
{
"queryType": "groupBy",
"dataSource": "sales_data",
"dimensions": ["region"],
"aggregations": [{"type": "longSum", "name": "total_sales", "fieldName": "sales"}],
"having": {"type": "greaterThan", "aggregation": "total_sales", "value": 1000000},
"granularity": "all",
"limitSpec": {"type": "default", "limit": 10, "columns": [{"dimension": "total_sales", "direction": "descending"}]}
}
While both systems can perform similar analytics, Presto uses standard SQL syntax, making it more familiar to SQL users. Druid uses a JSON-based query language, which may require additional learning but offers fine-grained control over query execution.
Apache Pinot
Pros of Pinot
- Designed for real-time analytics with low latency querying on large datasets
- Supports multi-tenancy and horizontal scalability
- Offers a wide range of indexing techniques for optimized query performance
Cons of Pinot
- Steeper learning curve and more complex setup compared to Druid
- Less mature ecosystem and smaller community than Druid
- Limited support for complex joins and subqueries
Code Comparison
Pinot query example:
SELECT COUNT(*) FROM myTable
WHERE timeColumn BETWEEN 1589662000000 AND 1589748400000
GROUP BY dimension1, dimension2
Druid query example:
SELECT COUNT(*) FROM myDataSource
WHERE __time BETWEEN TIMESTAMP '2020-05-17' AND TIMESTAMP '2020-05-18'
GROUP BY dimension1, dimension2
Both systems use SQL-like syntax for querying, but Pinot uses millisecond timestamps while Druid uses ISO 8601 format. Pinot's query structure is generally simpler, while Druid offers more advanced features in its query language.
README
druid
Introduction
- git clone https://github.com/alibaba/druid.git
- cd druid && mvn install
- have fun.
Related Alibaba Cloud Products
Documentation