
alibaba/canal

Alibaba MySQL binlog incremental subscription & consumption component


Top Related Projects

  • Maxwell: Maxwell's daemon, a mysql-to-json kafka producer
  • Debezium: change data capture for a variety of databases (please log issues at https://issues.redhat.com/browse/DBZ)
  • mysql-binlog-connector-java: MySQL Binary Log connector
  • Databus: source-agnostic distributed change data capture system
  • Apache InLong: a one-stop, full-scenario integration framework for massive data
  • Flink CDC: a streaming data integration tool

Quick Overview

Canal is an open-source project developed by Alibaba that provides a high-performance data synchronization solution. It parses the MySQL binlog and delivers data change events in real time, enabling efficient incremental data replication and integration between systems.

Pros

  • High performance and low latency for real-time data synchronization
  • Delivers change events to multiple targets, including Kafka and RocketMQ, in addition to direct client consumption
  • Provides strong consistency and reliability in data replication
  • Offers flexible deployment options and easy scalability

Cons

  • Primarily focused on MySQL, with limited support for other databases
  • Requires additional setup and maintenance compared to built-in replication tools
  • Learning curve for configuration and optimization
  • Documentation is primarily in Chinese, which may be challenging for non-Chinese speakers

Code Examples

  1. Configuring a Canal client:
CanalConnector connector = CanalConnectors.newSingleConnector(
    new InetSocketAddress("127.0.0.1", 11111),
    "example",
    "canal",
    "canal"
);
connector.connect();
connector.subscribe(".*\\..*");
  2. Consuming data change events:
Message message = connector.getWithoutAck(100);
long batchId = message.getId();
List<CanalEntry.Entry> entries = message.getEntries();
for (CanalEntry.Entry entry : entries) {
    if (entry.getEntryType() == CanalEntry.EntryType.ROWDATA) {
        CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
        for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
            // Process row data
        }
    }
}
connector.ack(batchId);
  3. Filtering specific tables:
connector.subscribe("test\\..*"); // subscribe only to tables in the "test" database

Getting Started

  1. Download and install Canal:

    wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
    tar zxvf canal.deployer-1.1.5.tar.gz
    
  2. Configure Canal:

    • Edit conf/example/instance.properties
    • Set canal.instance.master.address=x.x.x.x:3306
    • Set canal.instance.dbUsername=canal
    • Set canal.instance.dbPassword=canal
  3. Start Canal:

    sh bin/startup.sh
    
  4. Implement a Canal client using the provided Java SDK or one of the other available clients; a minimal sketch is shown below.
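
As an illustration of step 4, here is a minimal polling client sketch built on the Java SDK. It assumes the canal server from step 3 is running locally on port 11111 with the default example destination and the canal/canal credentials configured above; adjust these values for your environment.

import java.net.InetSocketAddress;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

public class SimpleCanalClient {
    public static void main(String[] args) throws Exception {
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "canal", "canal");
        try {
            connector.connect();
            connector.subscribe(".*\\..*"); // all databases and tables
            connector.rollback();           // re-deliver any batch that was fetched but never acked

            while (true) {
                Message message = connector.getWithoutAck(100); // fetch up to 100 entries
                long batchId = message.getId();
                if (batchId == -1 || message.getEntries().isEmpty()) {
                    Thread.sleep(1000);     // nothing new yet, back off briefly
                    continue;
                }
                for (CanalEntry.Entry entry : message.getEntries()) {
                    if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
                        continue;           // skip transaction begin/end entries
                    }
                    CanalEntry.RowChange rowChange =
                            CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                    System.out.printf("%s.%s %s, %d row(s)%n",
                            entry.getHeader().getSchemaName(),
                            entry.getHeader().getTableName(),
                            rowChange.getEventType(),
                            rowChange.getRowDatasCount());
                }
                connector.ack(batchId); // confirm the batch was processed
            }
        } finally {
            connector.disconnect();
        }
    }
}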

Competitor Comparisons


Maxwell's daemon, a mysql-to-json kafka producer

Pros of Maxwell

  • Simpler setup and configuration process
  • Better support for non-MySQL databases (e.g., PostgreSQL)
  • More straightforward integration with Kafka and other streaming platforms

Cons of Maxwell

  • Less feature-rich compared to Canal
  • Limited support for complex data transformations
  • Smaller community and fewer enterprise-level deployments

Code Comparison

Maxwell:

public class Maxwell {
    public static void main(String[] args) throws Exception {
        Maxwell maxwell = new Maxwell(new MaxwellConfig(args));
        maxwell.run();
    }
}

Canal:

public class CanalLauncher {
    public static void main(String args[]) throws Throwable {
        CanalLauncher launcher = new CanalLauncher();
        launcher.start();
    }
}

Both projects aim to capture and stream database changes, but they differ in their approach and feature set. Maxwell focuses on simplicity and ease of use, making it a good choice for smaller projects or those new to change data capture. Canal, on the other hand, offers more advanced features and is better suited for large-scale, enterprise-level deployments, particularly those heavily invested in the Alibaba ecosystem.

Maxwell's strength lies in its straightforward setup and broader database support, while Canal excels in handling complex scenarios and providing more extensive customization options. The choice between the two depends on the specific requirements of your project, the scale of your operations, and your familiarity with each ecosystem.


Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

Pros of Debezium

  • Supports a wider range of databases, including MySQL, PostgreSQL, MongoDB, and more
  • Offers built-in Kafka Connect integration for seamless data streaming
  • Provides robust schema evolution support and handling of complex data types

Cons of Debezium

  • Can be more complex to set up and configure compared to Canal
  • May have higher resource requirements, especially for large-scale deployments

Code Comparison

Canal:

CanalConnector connector = CanalConnectors.newSingleConnector(
    new InetSocketAddress(AddressUtils.getHostIp(), 11111),
    "example", "", "");
connector.connect();
connector.subscribe(".*\\..*");

Debezium:

{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.whitelist": "inventory"
  }
}

Both Canal and Debezium are powerful tools for change data capture (CDC), but they have different strengths and use cases. Canal is primarily focused on MySQL and is often used in Alibaba's ecosystem, while Debezium offers broader database support and tighter integration with Apache Kafka.

MySQL Binary Log connector

Pros of mysql-binlog-connector-java

  • Lightweight and focused solely on MySQL binlog parsing
  • Easy to integrate into existing Java applications
  • More flexible for custom implementations

Cons of mysql-binlog-connector-java

  • Less feature-rich compared to Canal
  • Requires more manual configuration and setup
  • Limited built-in support for data transformation and filtering

Code Comparison

mysql-binlog-connector-java:

BinaryLogClient client = new BinaryLogClient("hostname", 3306, "username", "password");
client.registerEventListener(event -> {
    // Process event
});
client.connect();

Canal:

CanalConnector connector = CanalConnectors.newSingleConnector(
    new InetSocketAddress("hostname", 11111), "destination", "username", "password");
connector.connect();
connector.subscribe(".*\\..*");
Message message = connector.getWithoutAck(batchSize);
// Process message
connector.ack(message.getId());

Both repositories provide tools for parsing MySQL binlog events, but they differ in scope and implementation. Canal offers a more comprehensive solution with built-in features for data synchronization and transformation, while mysql-binlog-connector-java provides a lightweight library focused on binlog parsing. Canal is better suited for large-scale data synchronization projects, while mysql-binlog-connector-java is more appropriate for developers who need fine-grained control over binlog processing in their Java applications.


Source-agnostic distributed change data capture system

Pros of Databus

  • More mature project with longer development history
  • Supports multiple databases beyond MySQL (e.g., Oracle)
  • Better documentation and community support

Cons of Databus

  • Less active development in recent years
  • More complex setup and configuration process
  • Limited support for newer database versions

Code Comparison

Canal:

CanalConnector connector = CanalConnectors.newSingleConnector(
    new InetSocketAddress(AddressUtils.getHostIp(), 11111),
    "example", "", "");
connector.connect();
connector.subscribe(".*\\..*");

Databus:

DbusEventBuffer eventBuffer = new DbusEventBuffer(config);
PhysicalSourceStaticConfig physicalSourceConfig = 
    new PhysicalSourceStaticConfig("source1", "localhost", "user", "pass");
DatabusSourcesConnection sourcesConn = 
    new DatabusSourcesConnection(physicalSourceConfig, eventBuffer);

Both projects aim to provide change data capture (CDC) solutions, but they differ in their approach and target databases. Canal focuses primarily on MySQL and is actively maintained by Alibaba, while Databus supports multiple databases but has seen less recent development. Canal offers a simpler setup process and better support for newer MySQL versions, making it a preferred choice for MySQL-centric environments. Databus, on the other hand, provides more flexibility for multi-database setups and has a more established history, which may be beneficial for complex enterprise environments.


Apache InLong - a one-stop, full-scenario integration framework for massive data

Pros of InLong

  • Broader data integration capabilities, supporting multiple data sources and sinks
  • More comprehensive data processing features, including data transformation and streaming
  • Active Apache project with a larger community and regular updates

Cons of InLong

  • Steeper learning curve due to its more complex architecture
  • Potentially higher resource requirements for deployment and operation
  • Less focused on MySQL-specific replication compared to Canal

Code Comparison

InLong (Java):

public class MySQLExtractor extends AbstractExtractor {
    @Override
    public void extract(TaskContext context) throws Exception {
        // MySQL extraction logic
    }
}

Canal (Java):

public class SimpleCanalClientExample {
    public static void main(String args[]) {
        CanalConnector connector = CanalConnectors.newSingleConnector(
            new InetSocketAddress(AddressUtils.getHostIp(), 11111),
            "example", "", "");
        connector.connect();
        connector.subscribe(".*\\..*");
        // Process data
    }
}

Both projects use Java, but InLong's code structure is more modular and extensible, while Canal's example shows a more straightforward approach for MySQL-specific replication. InLong's architecture allows for easier integration of various data sources and processing steps, whereas Canal focuses primarily on MySQL binlog parsing and replication.

Flink CDC is a streaming data integration tool

Pros of Flink CDC

  • Built on Apache Flink, offering robust stream processing capabilities
  • Supports a wider range of databases, including MySQL, PostgreSQL, Oracle, and SQL Server
  • Provides seamless integration with Flink's ecosystem and APIs

Cons of Flink CDC

  • Steeper learning curve due to Flink's complexity
  • Requires more resources to run compared to Canal's lightweight design

Code Comparison

Canal:

CanalConnector connector = CanalConnectors.newSingleConnector(
    new InetSocketAddress(AddressUtils.getHostIp(), 11111),
    "example", "", "");
connector.connect();
connector.subscribe(".*\\..*");

Flink CDC:

MySqlSource<String> mySqlSource = MySqlSource.<String>builder()
    .hostname("localhost")
    .port(3306)
    .databaseList("mydatabase")
    .tableList("mydatabase.users")
    .username("root")
    .password("password")
    .deserializer(new JsonDebeziumDeserializationSchema())
    .build();

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.addSource(mySqlSource)
    .print().setParallelism(1);
env.execute();

The code snippets demonstrate the setup process for each tool. Canal focuses on a simple connector setup, while Flink CDC showcases its integration with Flink's streaming environment and more detailed configuration options.


README


Introduction

**canal [kə'næl]**, meaning waterway/pipeline/channel, is primarily used to parse the MySQL incremental log (binlog) and provide incremental data subscription and consumption.

In its early days, Alibaba ran dual data centers in Hangzhou and the United States, which created a need for cross-datacenter synchronization; this was initially implemented with business-level triggers to capture incremental changes. Starting in 2010, the teams gradually switched to parsing database logs to obtain incremental changes for synchronization, which gave rise to a large number of incremental database subscription and consumption use cases.

Use cases built on log-based incremental subscription and consumption include:

  • Database mirroring
  • Real-time database backup
  • Index building and real-time maintenance (split heterogeneous indexes, inverted indexes, etc.)
  • Business cache refresh (see the sketch after this list)
  • Incremental data processing with custom business logic
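
As one illustration of the cache-refresh use case, the sketch below evicts cache entries whenever canal reports a row change. The in-memory map stands in for a real cache (e.g. Redis), and the "table:primaryKey" key scheme is an assumption; the handler would be called with each Message fetched by a client loop like the one in the Getting Started section.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

public class CacheRefreshHandler {

    // Stand-in for a real cache; the "table:primaryKey" key scheme is an assumption.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    public void handle(Message message) throws Exception {
        for (CanalEntry.Entry entry : message.getEntries()) {
            if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
                continue; // skip transaction begin/end markers
            }
            CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
            String table = entry.getHeader().getTableName();
            for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                // Updates and deletes carry the old row image in beforeColumns; inserts only have afterColumns.
                List<CanalEntry.Column> columns =
                        rowChange.getEventType() == CanalEntry.EventType.INSERT
                                ? rowData.getAfterColumnsList()
                                : rowData.getBeforeColumnsList();
                for (CanalEntry.Column column : columns) {
                    if (column.getIsKey()) {
                        cache.remove(table + ":" + column.getValue()); // evict the stale entry, e.g. "user:42"
                    }
                }
            }
        }
    }
}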

canal currently supports source MySQL versions 5.1.x, 5.5.x, 5.6.x, 5.7.x, and 8.0.x.

How It Works

How MySQL master-slave replication works

  • The MySQL master writes data changes to its binary log (the records are called binary log events and can be inspected with show binlog events)
  • The MySQL slave copies the master's binary log events into its relay log
  • The MySQL slave replays the events in the relay log, applying the data changes to its own data

How canal works

  • canal emulates the MySQL slave interaction protocol: it masquerades as a MySQL slave and sends a dump request to the MySQL master
  • On receiving the dump request, the MySQL master starts pushing the binary log to the slave (i.e., canal)
  • canal parses the binary log objects (originally a raw byte stream)
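
Because canal consumes the binary log like a replica, the source MySQL instance must have binlog enabled in ROW format and a server id that does not collide with canal's. A minimal my.cnf sketch (the log file name and server id are placeholders to adapt to your environment):

[mysqld]
log-bin=mysql-bin   # enable binlog
binlog-format=ROW   # canal requires row-based binlog
server_id=1         # must be unique among the master, replicas, and canal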

Important Release Notes

  1. canal 1.1.x (release_note) delivers major improvements in performance and functionality, including:
  • Overall performance testing & optimization, an improvement of about 150% #726 See: Performance
  • Native Prometheus monitoring support #765 Prometheus QuickStart
  • Native Kafka message delivery #695 Canal Kafka/RocketMQ QuickStart
  • Native support for subscribing to Aliyun RDS binlogs (handles automatic master/standby switchover and offline parsing of binlogs archived to OSS) See: Aliyun RDS QuickStart
  • Native Docker image #801 See: Docker QuickStart
  2. canal 1.1.4 introduces the long-awaited WebUI capability via the new canal-admin module, which supports dynamic management of canal through a WebUI, including online configuration, task, and log operations. See: Canal Admin Guide

Documentation

Multi-language Support

canal is deliberately designed around a client-server model whose interaction protocol uses protobuf 3.0, so clients can be implemented in different languages with their own consumption logic. Pull requests are welcome.

As a tool for incrementally fetching and parsing the MySQL binlog, canal can also deliver change records to MQ systems such as Kafka/RocketMQ, letting you leverage the multi-language ecosystems of those MQs.
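
For example, once canal is configured to publish to Kafka, any Kafka client can consume the change records. A minimal Java sketch, assuming a local broker, a topic named example, and canal's flat-message (JSON) serialization enabled:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CanalKafkaConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumption: local broker
        props.put("group.id", "canal-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example")); // assumption: canal.mq.topic=example
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // With flat messages enabled, each value is a JSON document describing a row change.
                    System.out.println(record.value());
                }
            }
        }
    }
}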

Tools Built on canal

Related Open Source Projects & Products

Issue Reporting

Issues filed for this project are synchronized to the Alibaba Cloud developer community.