inlong
Apache InLong - a one-stop, full-scenario integration framework for massive data
Top Related Projects
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-like data
Apache NiFi
Apache Flink
Mirror of Apache Kafka
Apache Pulsar - distributed pub-sub messaging system
Apache Beam is a unified programming model for Batch and Streaming data processing.
Quick Overview
Apache InLong is a one-stop integration framework for massive data that provides automatic, secure, and reliable data transmission capabilities. It offers various data integration methods, including pub/sub model, SQL, and dataflow, supporting both batch and stream data processing scenarios.
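The topic-based pub/sub model mentioned above can be illustrated with a minimal in-memory sketch. This is plain Java, not the InLong SDK — `TopicBus` and its methods are invented here purely for illustration; the real SDK adds transport, persistence, and delivery guarantees on top of this idea:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy illustration of topic-based pub/sub; names are invented for this sketch.
class TopicBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // A subscriber registers interest in a topic.
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // A publisher sends a message to every subscriber of the topic.
    void publish(String topic, String message) {
        subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(message));
    }
}

public class PubSubDemo {
    public static void main(String[] args) {
        TopicBus bus = new TopicBus();
        List<String> received = new ArrayList<>();
        bus.subscribe("orders", received::add);
        bus.publish("orders", "order-1");
        bus.publish("payments", "pay-1"); // no subscriber for this topic
        System.out.println(received);     // [order-1]
    }
}
```

Publishers and subscribers are decoupled by the topic name, which is the property InLong builds on when routing data streams between sources and sinks.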
Pros
- Supports multiple data sources and sinks, including MySQL, Kafka, Hive, and more
- Provides a unified management console for easy data pipeline configuration and monitoring
- Offers high performance and scalability for handling large-scale data processing
- Includes built-in data quality control and data governance features
Cons
- Steep learning curve for new users due to its comprehensive feature set
- Documentation can be improved, especially for advanced use cases
- Limited community support compared to some other data integration tools
- May be overkill for simple data integration scenarios
Code Examples
- Creating a data stream using InLong Client:
InlongClient client = InlongClient.create(clientConfig);
InlongStream stream = client.createStream(streamConfig);
stream.send("Hello, InLong!");
- Configuring a MySQL source in InLong:
MySQLSourceConfig sourceConfig = new MySQLSourceConfig();
sourceConfig.setHostname("localhost");
sourceConfig.setPort(3306);
sourceConfig.setUsername("root");
sourceConfig.setPassword("password");
sourceConfig.setDatabaseName("mydb");
sourceConfig.setTableName("mytable");
- Setting up a Kafka sink:
KafkaSinkConfig sinkConfig = new KafkaSinkConfig();
sinkConfig.setBootstrapServers("localhost:9092");
sinkConfig.setTopic("my-topic");
sinkConfig.setAcks("all");
Getting Started
To get started with Apache InLong, follow these steps:
- Download and install InLong from the official website.
- Configure the InLong Manager and start the service:
cd inlong-manager
vim conf/application.properties
./bin/startup.sh
- Set up InLong Agent for data collection:
cd inlong-agent
vim conf/agent.properties
./bin/agent.sh start
- Use the InLong Dashboard to create and manage data streams:
cd inlong-dashboard
npm install
npm run dev
- Access the dashboard at http://localhost:8080 to configure your data pipelines.
Competitor Comparisons
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-like data
Pros of Flume
- Simpler architecture and easier to set up for basic log collection scenarios
- More mature project with a larger community and extensive documentation
- Better suited for small to medium-scale deployments
Cons of Flume
- Limited support for real-time data processing and complex transformations
- Less flexible in terms of data routing and multi-tenancy support
- Fewer built-in connectors for modern data sources and sinks
Code Comparison
Flume configuration example:
agent.sources = s1
agent.channels = c1
agent.sinks = k1
agent.sources.s1.type = netcat
agent.sources.s1.bind = localhost
agent.sources.s1.port = 44444
agent.channels.c1.type = memory
agent.sinks.k1.type = logger
InLong configuration example:
inlong:
  group:
    id: test_group
    inlongGroupId: test_group
  stream:
    id: test_stream
    fieldList:
      - {name: name, type: string}
      - {name: age, type: int}
  sink:
    type: pulsar
    topic: test_topic
Both projects aim to facilitate data ingestion and processing, but InLong offers a more comprehensive and scalable solution for modern data integration scenarios, while Flume excels in simpler log collection use cases.
Apache NiFi
Pros of NiFi
- More mature and widely adopted project with a larger community
- Extensive documentation and robust ecosystem of processors
- Highly configurable with a user-friendly web-based interface
Cons of NiFi
- Can be resource-intensive for large-scale data processing
- Steeper learning curve due to its extensive feature set
- Less focused on real-time streaming compared to InLong
Code Comparison
NiFi (Java):
public class MyProcessor extends AbstractProcessor {
    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // Process the FlowFile
    }
}
InLong (Java):
public class MySource implements Source<String> {
    @Override
    public void open(Configuration parameters) throws Exception {
        // Initialize the source
    }

    @Override
    public String collect() throws Exception {
        // Collect and return data
        return null;
    }
}
Both projects use Java, but NiFi focuses on a processor-based architecture, while InLong uses a source-sink model for data processing. NiFi's code structure is more complex due to its extensive features, while InLong's code tends to be more straightforward for basic data ingestion tasks.
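The source-sink contrast can be made concrete with a schematic, self-contained sketch. `SimpleSource`, `SimpleSink`, and the classes below are stand-ins invented for this example, not InLong's actual interfaces; real InLong sources and sinks additionally handle batching, retries, and serialization:

```java
import java.util.ArrayList;
import java.util.List;

// Schematic stand-ins for the source-sink model; invented for illustration.
interface SimpleSource<T> {
    boolean hasMore();
    T collect(); // pull one record from the upstream system
}

interface SimpleSink<T> {
    void write(T record); // push one record to the downstream system
}

class InMemorySource implements SimpleSource<String> {
    private final List<String> data;
    private int pos = 0;

    InMemorySource(List<String> data) { this.data = data; }

    public boolean hasMore() { return pos < data.size(); }
    public String collect() { return data.get(pos++); }
}

class InMemorySink implements SimpleSink<String> {
    final List<String> written = new ArrayList<>();
    public void write(String record) { written.add(record); }
}

public class PipelineDemo {
    // A pipeline simply drains the source into the sink.
    static <T> void run(SimpleSource<T> source, SimpleSink<T> sink) {
        while (source.hasMore()) {
            sink.write(source.collect());
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        run(new InMemorySource(List.of("a", "b", "c")), sink);
        System.out.println(sink.written); // [a, b, c]
    }
}
```

In this model the pipeline is just a loop moving records from source to sink, which is why simple ingestion tasks stay short; NiFi's processor graph trades that simplicity for richer routing.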
Apache Flink
Pros of Flink
- More mature and widely adopted in the industry
- Extensive ecosystem with numerous connectors and libraries
- Advanced stream processing capabilities with event time semantics
Cons of Flink
- Steeper learning curve for beginners
- Higher resource requirements for small-scale applications
- More complex configuration and deployment process
Code Comparison
Flink:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.readTextFile("input.txt");
DataStream<Tuple2<String, Integer>> counts = text
        .flatMap(new Tokenizer())
        .keyBy(value -> value.f0)
        .sum(1);
counts.print();
InLong:
InlongClient client = InlongClient.create(clientConfig);
PushRequest request = new PushRequest();
request.setGroupId("test_group");
request.setStreamId("test_stream");
request.setDataTime(System.currentTimeMillis());
request.addAttribute("key", "value");
client.send(request);
The code snippets demonstrate basic usage of each framework. Flink focuses on stream processing and data transformation, while InLong emphasizes data ingestion and integration.
Mirror of Apache Kafka
Pros of Kafka
- Highly scalable and fault-tolerant distributed streaming platform
- Excellent performance with high throughput and low latency
- Large ecosystem and extensive community support
Cons of Kafka
- Steeper learning curve and more complex setup
- Requires additional components for end-to-end data integration
- Limited built-in data transformation capabilities
Code Comparison
Kafka producer example:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("my-topic", "key", "value"));
producer.close();
InLong producer example:
PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://localhost:6650").build();
Producer<byte[]> producer = client.newProducer()
        .topic("persistent://public/default/my-topic")
        .create();
producer.send("Hello, InLong!".getBytes());
Both Kafka and InLong are Apache projects focused on data streaming and integration. Kafka is a more established and widely-used platform, offering high scalability and performance. InLong, on the other hand, provides a more comprehensive end-to-end data integration solution with built-in ETL capabilities. While Kafka may be preferred for pure streaming use cases, InLong offers a more complete data pipeline solution out of the box.
Apache Pulsar - distributed pub-sub messaging system
Pros of Pulsar
- More mature and widely adopted project with a larger community
- Supports multi-tenancy and geo-replication out of the box
- Offers both streaming and queuing models in a single system
Cons of Pulsar
- More complex architecture, potentially harder to set up and maintain
- Higher resource requirements for optimal performance
- Steeper learning curve for developers new to the system
Code Comparison
Pulsar producer example:
Producer<byte[]> producer = client.newProducer()
        .topic("my-topic")
        .create();
producer.send("Hello, Pulsar!".getBytes());
InLong producer example:
InLongMsg msg = InLongMsg.newInLongMsg();
msg.addBody("Hello, InLong!".getBytes());
producer.send("my-topic", msg);
Both systems use similar concepts for producing messages, but Pulsar's API is more straightforward and type-safe. InLong's API requires creating a specific message object before sending.
Apache Beam is a unified programming model for Batch and Streaming data processing.
Pros of Beam
- More mature and widely adopted project with a larger community
- Supports multiple programming languages (Java, Python, Go)
- Offers a unified programming model for batch and streaming data processing
Cons of Beam
- Steeper learning curve due to its comprehensive feature set
- Can be overkill for simpler data processing tasks
- Requires more setup and configuration compared to InLong
Code Comparison
Beam (Java):
PCollection<String> input = p.apply(TextIO.read().from("input.txt"));
PCollection<String> output = input.apply(MapElements.via(
    new SimpleFunction<String, String>() {
        public String apply(String input) {
            return input.toUpperCase();
        }
    }));
output.apply(TextIO.write().to("output.txt"));
InLong (Java):
TubeClientConfig config = new TubeClientConfig(masterHostAndPorts);
TubeClient client = new TubeClient(config);
Message message = new Message(topicName, "Hello, InLong!".getBytes());
client.sendMessage(message);
While both projects deal with data processing, Beam focuses on providing a unified programming model for various data processing scenarios, whereas InLong is more specialized in data integration and ingestion. Beam offers more flexibility and language support, but InLong may be easier to set up for specific use cases.
README
A one-stop, full-scenario integration framework for massive data
- What is Apache InLong?
- Features
- When should I use InLong?
- Build InLong
- Deploy InLong
- Contribute to InLong
- Contact Us
- Documentation
- License
What is Apache InLong?
Charts: Stargazers Over Time | Contributors Over Time
Apache InLong is a one-stop, full-scenario integration framework for massive data that supports Data Ingestion, Data Synchronization, and Data Subscription, and it provides automatic, secure, and reliable data transmission capabilities. InLong also supports both batch and stream data processing at the same time, which makes it a powerful foundation for building data analysis, modeling, and other real-time applications on top of streaming data.
InLong (应龙) is a divine beast in Chinese mythology who guides the river into the sea, and it serves as a metaphor for the InLong system channeling streams of reported data to their destination.
InLong was originally built at Tencent, where it served online businesses for more than 8 years, supporting massive data reporting (more than 80 trillion records per day) in big data scenarios. The platform integrates five modules: Ingestion, Convergence, Caching, Sorting, and Management. A business only needs to provide the data source, data service quality requirements, target cluster, and landing format; data is then continuously pushed from the source to the target cluster, which largely covers the data reporting needs of big data scenarios.
For more information, please visit our project documentation at https://inlong.apache.org/.
Features
Apache InLong offers a variety of features:
- Ease of Use: a SaaS-based service platform. Users can easily and quickly report, transfer, and distribute data by publishing and subscribing to data based on topics.
- Stability & Reliability: derived from the actual online production environment. It delivers high-performance processing capabilities for 10 trillion-level data streams and highly reliable services for 100 billion-level data streams.
- Comprehensive Features: supports various types of data access methods and can be integrated with different types of Message Queue (MQ). It also provides real-time data extract, transform, and load (ETL) and sorting capabilities based on rules. InLong also allows users to plug features to extend system capabilities.
- Service Integration: provides unified system monitoring and alert services. It provides fine-grained metrics to facilitate data visualization. Users can view the running status of queues and topic-based data statistics in a unified data metric platform. Users can also configure the alert service based on their business requirements so that users can be alerted when errors occur.
- Scalability: adopts a pluggable architecture that allows you to plug modules into the system based on specific protocols. Users can replace components and add features based on their business requirements.
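The pluggable pattern behind this scalability claim can be sketched as a small type registry: components are registered under a protocol or node-type name and created on demand. The names here (`DataNode`, `PluginRegistry`) are invented for illustration and do not match InLong's actual plugin SPI:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy sketch of a pluggable component registry; names invented for illustration.
interface DataNode {
    String describe();
}

class PluginRegistry {
    private final Map<String, Supplier<DataNode>> factories = new HashMap<>();

    // A plugin contributes a factory under its type name.
    void register(String type, Supplier<DataNode> factory) {
        factories.put(type, factory);
    }

    // The framework instantiates nodes by type without compile-time coupling.
    DataNode create(String type) {
        Supplier<DataNode> factory = factories.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("unknown node type: " + type);
        }
        return factory.get();
    }
}

public class PluginDemo {
    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        registry.register("kafka", () -> () -> "kafka load node");
        registry.register("hive", () -> () -> "hive load node");
        System.out.println(registry.create("kafka").describe()); // kafka load node
    }
}
```

Because the framework only depends on the `DataNode` contract, new node types can be added or swapped without touching the core, which is the essence of the plug-in extensibility described above.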
When should I use InLong?
InLong aims to provide a one-stop, full-scenario integration framework for massive data so that users can easily build stream-based data applications. It supports Data Ingestion, Data Synchronization, and Data Subscription at the same time, and it is well suited for teams that need to quickly build a data reporting platform, for ultra-large-scale data reporting environments, and for scenarios where reported data must be automatically sorted and landed.
You can use InLong in the following ways:
- Integrate InLong, manage data streams through SDK.
- Use the InLong command-line tool to view and create data streams.
- Visualize your operations on InLong dashboard.
Supported Data Nodes (Updating)
Type | Name | Version |
---|---|---|
Extract Node | Auto Push | None |
 | File | None |
 | Kafka | 2.x |
 | MongoDB | >= 3.6 |
 | MQTT | >= 3.1 |
 | MySQL | 5.6, 5.7, 8.0.x |
 | Oracle | 11, 12, 19 |
 | PostgreSQL | 9.6, 10, 11, 12 |
 | Pulsar | 2.8.x |
 | Redis | 2.6.x |
 | SQLServer | 2012, 2014, 2016, 2017, 2019 |
Load Node | Auto Consumption | None |
 | ClickHouse | 20.7+ |
 | Elasticsearch | 6.x, 7.x |
 | Greenplum | 4.x, 5.x, 6.x |
 | HBase | 2.2.x |
 | HDFS | 2.x, 3.x |
 | Hive | 1.x, 2.x, 3.x |
 | Iceberg | 0.12.x |
 | Hudi | 0.12.x |
 | Kafka | 2.x |
 | MySQL | 5.6, 5.7, 8.0.x |
 | Oracle | 11, 12, 19 |
 | PostgreSQL | 9.6, 10, 11, 12 |
 | SQLServer | 2012, 2014, 2016, 2017, 2019 |
 | TDSQL-PostgreSQL | 10.17 |
 | Doris | >= 0.13 |
 | StarRocks | >= 2.0 |
 | Kudu | >= 1.12.0 |
 | Redis | >= 3.0 |
 | OceanBase | >= 1.0 |
Build InLong
More detailed instructions can be found in the Quick Start section of the documentation.
Requirements:
- Java JDK 8
- Apache Maven 3.6.1+
CodeStyle:
mvn spotless:apply
Compile and install:
mvn clean install -DskipTests
(Optional) Compile using docker image:
docker pull maven:3.6-openjdk-8
docker run -v `pwd`:/inlong -w /inlong maven:3.6-openjdk-8 mvn clean install -DskipTests
After a successful compile, you can find the distribution file at inlong-distribution/target.
Deploy InLong
Develop InLong
- Agent Plugin extends an Extract Data Node
- Sort Plugin extends a Data Node
- Manager Plugin extends a Data Node
- Dashboard Plugin extends a Data Node page
Contribute to InLong
- Report any issue on GitHub Issue
- Code pull request according to How to contribute.
Contact Us
- Join Apache InLong mailing lists:
dev@inlong.apache.org for development-related discussions (Subscribe / Unsubscribe / Archives)
- Ask questions on the Apache InLong Slack
Documentation
- Home page: https://inlong.apache.org/
- Issues: https://github.com/apache/inlong/issues
License
© Contributors. Licensed under the Apache License 2.0.