Top Related Projects
Alibaba's MySQL binlog incremental subscription & consumption component
A powerful MySQL toolset written in Go
Maxwell's daemon, a MySQL-to-JSON Kafka producer
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
A high-performance MySQL proxy
Quick Overview
go-mysql-elasticsearch is an open-source project that synchronizes data from MySQL to Elasticsearch in real-time. It uses MySQL binlog to capture data changes and then sends these changes to Elasticsearch, allowing for efficient and up-to-date full-text search capabilities on MySQL data.
Pros
- Real-time synchronization between MySQL and Elasticsearch
- Supports various data types and complex MySQL schemas
- Minimal impact on MySQL performance due to binlog-based approach
- Configurable and extensible for different use cases
Cons
- Requires binlog enabled on MySQL, which may not be feasible for all setups
- Limited support for complex MySQL operations (e.g., multi-table transactions)
- May require additional maintenance and monitoring in production environments
- Learning curve for proper configuration and optimization
Code Examples
- Basic configuration setup:
```go
cfg := river.NewConfig()
cfg.MyAddr = "127.0.0.1:3306"
cfg.MyUser = "root"
cfg.MyPassword = "password"
cfg.ESAddr = "127.0.0.1:9200"
```
- Defining rules for synchronization:
```go
cfg.Rules = []*river.Rule{
	{
		Schema: "test",
		Table:  "t1",
		Index:  "t1",
		Type:   "t1",
	},
}
```
- Starting the river (synchronization process):
```go
r, err := river.NewRiver(cfg)
if err != nil {
	// Handle error
}
if err = r.Start(); err != nil {
	// Handle error
}
```
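For a long-running service you will also want a clean shutdown path. Below is a minimal sketch, assuming the `Run` and `Close` methods exposed by the upstream `river` package; the `river.NewConfigWithFile` loader is also an assumption:

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/go-mysql-org/go-mysql-elasticsearch/river"
)

func main() {
	// Assumption: NewConfigWithFile loads the TOML config from disk.
	cfg, err := river.NewConfigWithFile("river.toml")
	if err != nil {
		log.Fatal(err)
	}

	r, err := river.NewRiver(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Run the sync loop in the background.
	done := make(chan struct{})
	go func() {
		if err := r.Run(); err != nil {
			log.Print(err)
		}
		close(done)
	}()

	// Stop cleanly on SIGINT/SIGTERM before exiting.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	<-sig

	r.Close()
	<-done
}
```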
Getting Started
- Install the library:

```bash
go get github.com/go-mysql-org/go-mysql-elasticsearch
```

- Create a configuration file (e.g., river.toml):

```toml
[mysql]
addr = "127.0.0.1:3306"
user = "root"
password = "password"

[elasticsearch]
addr = "127.0.0.1:9200"

[[rule]]
schema = "test"
table = "t1"
index = "t1"
type = "t1"
```

- Run the synchronization:

```bash
go-mysql-elasticsearch -config river.toml
```
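Once the process is running, a quick sanity check is to query the target index directly. A minimal sketch, assuming the Elasticsearch address and the `t1` index from the sample config above:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// Assumption: ES runs at 127.0.0.1:9200 and the rule targets index "t1".
	resp, err := http.Get("http://127.0.0.1:9200/t1/_search?q=*:*")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // synced documents should appear under "hits"
}
```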
Competitor Comparisons
Alibaba's MySQL binlog incremental subscription & consumption component
Pros of Canal
- More comprehensive data synchronization solution, supporting multiple targets (e.g., Kafka, RocketMQ, Redis)
- Better performance and scalability for large-scale data processing
- Extensive documentation and active community support
Cons of Canal
- Steeper learning curve due to more complex architecture
- Primarily designed for Java ecosystems, potentially limiting integration options
- Requires more setup and configuration compared to go-mysql-elasticsearch
Code Comparison
Canal (Java):
```java
CanalConnector connector = CanalConnectors.newSingleConnector(
        new InetSocketAddress(AddressUtils.getHostIp(), 11111),
        "example", "", "");
connector.connect();
connector.subscribe(".*\\..*");
```
go-mysql-elasticsearch (Go):
```go
cfg := replication.BinlogSyncerConfig{
	ServerID: 100,
	Flavor:   "mysql",
	Host:     "127.0.0.1",
	Port:     3306,
	User:     "root",
	Password: "",
}
syncer := replication.NewBinlogSyncer(cfg)
```
Both projects aim to synchronize MySQL data, but Canal offers a more robust solution for enterprise-level applications, while go-mysql-elasticsearch provides a simpler, Go-based approach focused on Elasticsearch integration. Canal is better suited for complex, large-scale deployments, whereas go-mysql-elasticsearch is ideal for smaller projects or those specifically targeting Elasticsearch as the primary destination.
A powerful MySQL toolset written in Go
Pros of go-mysql
- More comprehensive MySQL toolset, including replication, binlog parsing, and general MySQL utilities
- Broader scope, allowing for various MySQL-related operations beyond just data synchronization
- Potentially more flexible for custom MySQL-related projects and integrations
Cons of go-mysql
- Lacks built-in Elasticsearch integration, requiring additional development for data synchronization
- May require more setup and configuration for specific use cases like MySQL to Elasticsearch syncing
- Potentially steeper learning curve due to its broader feature set
Code Comparison
go-mysql:
```go
cfg := replication.BinlogSyncerConfig{
	ServerID: 100,
	Flavor:   "mysql",
	Host:     "127.0.0.1",
	Port:     3306,
	User:     "root",
	Password: "",
}
syncer := replication.NewBinlogSyncer(cfg)
```
go-mysql-elasticsearch:
```go
r, err := river.NewRiver(cfg)
if err != nil {
	panic(err)
}
r.Run()
```
The go-mysql code snippet shows configuration for binlog syncing, while go-mysql-elasticsearch demonstrates the simplicity of setting up a MySQL to Elasticsearch river. go-mysql-elasticsearch abstracts much of the complexity, making it easier to use for its specific purpose.
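To make that difference concrete, here is a rough sketch of what consuming raw binlog events with go-mysql looks like; the import paths and the starting position are assumptions, and a real consumer would persist and resume from a saved position:

```go
package main

import (
	"context"
	"fmt"

	"github.com/go-mysql-org/go-mysql/mysql"
	"github.com/go-mysql-org/go-mysql/replication"
)

func main() {
	cfg := replication.BinlogSyncerConfig{
		ServerID: 100,
		Flavor:   "mysql",
		Host:     "127.0.0.1",
		Port:     3306,
		User:     "root",
		Password: "",
	}
	syncer := replication.NewBinlogSyncer(cfg)

	// Start from the beginning of the current binlog file (assumption).
	streamer, err := syncer.StartSync(mysql.Position{Name: "", Pos: 4})
	if err != nil {
		panic(err)
	}

	for {
		ev, err := streamer.GetEvent(context.Background())
		if err != nil {
			panic(err)
		}
		// Transforming and shipping each event to Elasticsearch is left
		// to the caller; this is the work river does for you.
		fmt.Println(ev.Header.EventType)
	}
}
```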
Maxwell's daemon, a MySQL-to-JSON Kafka producer
Pros of Maxwell
- Written in Java, which may be more familiar to some developers
- Supports multiple output formats (JSON, Avro, Protobuf) and destinations (Kafka, Kinesis, RabbitMQ, Redis, etc.)
- Includes a built-in web interface for monitoring and management
Cons of Maxwell
- Limited to MySQL and MariaDB databases
- May have higher resource usage due to Java runtime
- Less flexible for custom transformations compared to Go-based solutions
Code Comparison
Maxwell (Java):
```java
public class Maxwell {
	public static void main(String[] args) throws Exception {
		Maxwell maxwell = new Maxwell(new MaxwellConfig(args));
		maxwell.run();
	}
}
```
go-mysql-elasticsearch (Go):
```go
func main() {
	var cfg *Config
	cfg, err := NewConfigWithFile(*configFile)
	if err != nil {
		println(err.Error())
		return
	}
	river := NewRiver(cfg)
	river.Run()
}
```
Both projects aim to replicate MySQL data to other systems, but they differ in implementation language and features. Maxwell offers broader output options and a web interface, while go-mysql-elasticsearch focuses specifically on Elasticsearch integration with potentially lower resource usage. The choice between them depends on specific use cases, existing infrastructure, and developer preferences.
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
Pros of Debezium
- Supports multiple databases (MySQL, PostgreSQL, MongoDB, etc.) vs. go-mysql-elasticsearch's MySQL-only focus
- Provides real-time CDC (Change Data Capture) with low latency
- Integrates seamlessly with Apache Kafka for scalable event streaming
Cons of Debezium
- More complex setup and configuration compared to go-mysql-elasticsearch
- Higher resource consumption due to its broader scope and features
- Steeper learning curve for developers new to CDC and event streaming
Code Comparison
Debezium (Java):
```java
@Entity
@Table(name = "customers")
public class Customer {
	@Id
	private Long id;
	private String name;
	// ...
}
```
go-mysql-elasticsearch (Go):
```go
type Customer struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}
```
Both projects use different approaches for defining data structures. Debezium leverages Java annotations for ORM mapping, while go-mysql-elasticsearch uses Go struct tags for JSON serialization. This reflects their distinct focuses: Debezium on broader database support and CDC, and go-mysql-elasticsearch on specific MySQL to Elasticsearch synchronization.
A high-performance MySQL proxy
Pros of kingshard
- Provides MySQL proxy functionality with features like read/write splitting and sharding
- Offers SQL parsing and rewriting capabilities for query optimization
- Includes built-in connection pooling for improved performance
Cons of kingshard
- More complex setup and configuration compared to go-mysql-elasticsearch
- Limited to MySQL databases, while go-mysql-elasticsearch supports Elasticsearch integration
- May require more maintenance and updates due to its broader feature set
Code Comparison
kingshard (SQL parsing and rewriting):
```go
func (c *ClientConn) handleQuery(sql string) (err error) {
	// SQL parsing and rewriting logic
	stmt, err := sqlparser.Parse(sql)
	if err != nil {
		return err
	}
	// Further processing and execution
}
```
go-mysql-elasticsearch (Elasticsearch synchronization):
```go
func (r *River) syncRow(binlog *replication.BinlogEvent) error {
	// Extract row data from MySQL binlog event
	rowData := binlog.Rows[0]
	// Convert and index data in Elasticsearch
	return r.es.Index(r.esIndex, r.esType, rowData)
}
```
The code snippets highlight the different focus areas of each project. kingshard emphasizes SQL handling and proxy functionality, while go-mysql-elasticsearch concentrates on data synchronization between MySQL and Elasticsearch.
README
go-mysql-elasticsearch is a service syncing your MySQL data into Elasticsearch automatically.
It first uses mysqldump to fetch the original data, then syncs data incrementally with the binlog.
Call for Committer/Maintainer
Sorry, I don't have enough time to maintain this project fully. If you like this project and want to help me improve it continuously, please contact me through email (siddontang@gmail.com).
Requirement: in the email, you should list some things (including but not limited to the following) to convince me that we can work together:
- Your GitHub ID
- Your previous contributions to go-mysql-elasticsearch, including PRs or issues
- The reason why you can improve go-mysql-elasticsearch
Install
- Install Go (1.9+) and set your GOPATH.
- Run `go get github.com/siddontang/go-mysql-elasticsearch` (it will print some messages in the console; you can skip them :-) ).
- `cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch`
- Run `make`.
How to use?
- Create a table in MySQL.
- Create the associated Elasticsearch index, document type, and mappings if possible; if not, Elasticsearch will create them automatically.
- Configure the base settings; see the example config river.toml.
- Set MySQL source in config file, see Source below.
- Customize MySQL and Elasticsearch mapping rule in config file, see Rule below.
- Start `./bin/go-mysql-elasticsearch -config=./etc/river.toml` and enjoy it.
Notice
- MySQL supported version < 8.0
- ES supported version < 6.0
- binlog format must be row.
- binlog row image must be full for MySQL; you may lose some field data if you update PK data in MySQL with a minimal or noblob binlog row image. MariaDB only supports the full row image.
- The table format cannot be altered at runtime.
- A MySQL table which will be synced must have a PK (primary key); a multi-column PK is allowed, e.g. if the PK is (a, b), we will use "a:b" as the key (see the sketch after this list). The PK data will be used as the "id" in Elasticsearch. You can also configure other columns as constituent parts of the id.
- You should create the associated mappings in Elasticsearch first; I don't think using the default mapping is a wise decision, and you must know how to search accurately.
- mysqldump must exist on the same node as go-mysql-elasticsearch; if not, go-mysql-elasticsearch will try to sync the binlog only.
- Don't change too many rows at the same time in one SQL statement.
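As an illustration of the composite-PK note above, here is a simplified sketch of how a multi-column primary key could map to a single Elasticsearch document id; `esDocID` is a hypothetical helper, not the project's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// esDocID is a hypothetical helper: it joins the PK column values with
// ":" the way the documented "a:b" key format describes.
func esDocID(pkValues ...string) string {
	return strings.Join(pkValues, ":")
}

func main() {
	fmt.Println(esDocID("1", "foo")) // "1:foo" becomes the ES "id"
}
```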
Source
In go-mysql-elasticsearch, you must decide which tables you want to sync into Elasticsearch in the source config.
The format in the config file is as follows:
```toml
[[source]]
schema = "test"
tables = ["t1", "t2"]

[[source]]
schema = "test_1"
tables = ["t3", "t4"]
```
`schema` is the database name, and `tables` lists the tables that need to be synced.
If you want to sync all tables in a database, you can use the asterisk (*):
```toml
[[source]]
schema = "test"
tables = ["*"]
# When using an asterisk, it is not allowed to sync multiple tables
# tables = ["*", "table"]
```
Rule
By default, go-mysql-elasticsearch will use the MySQL table name as the Elasticsearch index and type name, and use the MySQL table field name as the Elasticsearch field name. E.g., for a table named blog, the default index and type in Elasticsearch are both named blog; for a table field named title, the default field name is also title.
Notice: go-mysql-elasticsearch will use the lower-case name for the ES index and type. E.g., if your table is named BLOG, the ES index and type are both named blog.
Rules let you change this name mapping. The rule format in the config file is as follows:
```toml
[[rule]]
schema = "test"
table = "t1"
index = "t"
type = "t"
parent = "parent_id"
id = ["id"]

[rule.field]
mysql = "title"
elastic = "my_title"
```
In the example above, we will use a new index and type both named "t" instead of the default "t1", and use "my_title" instead of the field name "title".
Rule field types
In order to map a MySQL column to a different Elasticsearch type, you can define the field type as follows:
```toml
[[rule]]
schema = "test"
table = "t1"
index = "t"
type = "t"

[rule.field]
# This will map column title to Elasticsearch field my_title
title = "my_title"
# This will map column title to Elasticsearch field my_title and use the array type
title = "my_title,list"
# This will map column title to Elasticsearch field title and use the array type
title = ",list"
# If the created_time field type is "int" and you want to convert it to the "date" type in ES, you can do it as below
created_time = ",date"
```
Modifier "list" will translates a mysql string field like "a,b,c" on an elastic array type '{"a", "b", "c"}' this is specially useful if you need to use those fields on filtering on elasticsearch.
Wildcard table
go-mysql-elasticsearch only lets you determine which tables to sync, but sometimes, if you split a big table into many sub-tables, say 1024 of them (table_0000, table_0001, ..., table_1023), it is very hard to write a rule for every table.
go-mysql-elasticsearch supports wildcard tables, e.g.:
```toml
[[source]]
schema = "test"
tables = ["test_river_[0-9]{4}"]

[[rule]]
schema = "test"
table = "test_river_[0-9]{4}"
index = "river"
type = "river"
```
"test_river_[0-9]{4}" is a wildcard table definition, which represents "test_river_0000" to "test_river_9999", at the same time, the table in the rule must be same as it.
At the above example, if you have 1024 sub tables, all tables will be synced into Elasticsearch with index "river" and type "river".
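Conceptually, a wildcard table definition is matched like a regular expression anchored to the whole table name. A minimal sketch (the project's actual matcher may differ):

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	pattern := regexp.MustCompile("^test_river_[0-9]{4}$")
	fmt.Println(pattern.MatchString("test_river_0042")) // true
	fmt.Println(pattern.MatchString("test_river_big"))  // false
}
```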
Parent-Child Relationship
One-to-many join (the parent-child relationship in Elasticsearch) is supported. Simply specify the field name for the `parent` property:
```toml
[[rule]]
schema = "test"
table = "t1"
index = "t"
type = "t"
parent = "parent_id"
```
Note: you should set up the relationship by creating the mapping manually.
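Since the mapping must be created by hand, here is one way to declare the relationship for the ES 5.x series this project targets; the index name `t`, child type `t`, and parent type `parent_type` are placeholders for your own names:

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// In ES 5.x a parent-child relationship is declared via a "_parent"
	// entry in the child type's mapping ("parent_type" is a placeholder).
	mapping := []byte(`{
		"mappings": {
			"parent_type": {},
			"t": {"_parent": {"type": "parent_type"}}
		}
	}`)

	req, err := http.NewRequest("PUT", "http://127.0.0.1:9200/t", bytes.NewReader(mapping))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```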
Filter fields
You can use `filter` to sync specified fields, like:
```toml
[[rule]]
schema = "test"
table = "tfilter"
index = "test"
type = "tfilter"
# Only sync the following columns
filter = ["id", "name"]
```
In the above example, we will only sync the MySQL table tfilter's columns `id` and `name` to Elasticsearch.
Ignore table without a primary key
When you sync a table without a primary key, you will see the error message below:
```
schema.table must have a PK for a column
```
You can ignore these tables in the configuration like:
```toml
# Ignore tables without a primary key
skip_no_pk_table = true
```
Elasticsearch Pipeline
You can use an Ingest Node Pipeline to pre-process documents before indexing, e.g. to decode JSON strings, merge fields, and more.
```toml
[[rule]]
schema = "test"
table = "t1"
index = "t"
type = "_doc"
# pipeline id
pipeline = "my-pipeline-id"
```
Note: you should create the pipeline manually, and this requires Elasticsearch >= 5.0.
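A sketch of registering the pipeline referenced above via the ingest API; the rename processor is just an example, and `my-pipeline-id` matches the rule's `pipeline` value:

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// Example processor: rename "title" to "my_title" before indexing.
	pipeline := []byte(`{
		"description": "example pipeline for go-mysql-elasticsearch",
		"processors": [
			{"rename": {"field": "title", "target_field": "my_title"}}
		]
	}`)

	req, err := http.NewRequest("PUT",
		"http://127.0.0.1:9200/_ingest/pipeline/my-pipeline-id",
		bytes.NewReader(pipeline))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```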
Why not other rivers?
Although there are some other MySQL rivers for Elasticsearch, like elasticsearch-river-jdbc and elasticsearch-river-mysql, I still wanted to build a new one with Go. Why?
- Customization: I want to decide which tables to sync, the associated index and type names, and even the field names in Elasticsearch.
- Incremental updates with binlog, with the ability to resume from the last sync position when the service starts again.
- A common sync framework not only for Elasticsearch but also for others, like memcached, Redis, etc.
- Wildcard table support: we have many sub-tables like table_0000 - table_1023 but want to use a single Elasticsearch index and type.
Todo
- MySQL 8
- ES 6
- Statistics
Donate
If you like the project and want to buy me a cola, you can do so via PayPal or 微信 (WeChat).
Feedback
go-mysql-elasticsearch is still in development, and we will try to use it in production later. Any feedback is very welcome.
Email: siddontang@gmail.com