
apache/logging-flume

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-like data


Top Related Projects

  • Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-like data
  • Logstash - transport and process your logs, events, or other data
  • Fluentd: Unified Logging Layer (project under CNCF)
  • A high-performance observability data pipeline.
  • Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
  • Like Prometheus, but for logs.

Quick Overview

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. It is designed to handle high-volume streaming data and can be used for log aggregation, event processing, and data ingestion into systems like Hadoop.

Pros

  • Highly scalable and fault-tolerant architecture
  • Flexible and customizable with a wide range of built-in and custom components
  • Supports multiple sources and sinks, allowing integration with various data systems
  • Provides reliable data delivery through a transaction-based channel model

Cons

  • Can be complex to set up and configure for advanced use cases
  • Limited built-in support for data transformation and processing
  • May require significant resources for high-volume data streams
  • Learning curve for newcomers to understand the concepts and architecture
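One partial mitigation for the limited transformation support: Flume ships with a set of built-in interceptors for lightweight event decoration. A sketch using the timestamp and host interceptors (interceptor aliases per the Flume User Guide; agent and source names assumed from the examples below):

```properties
# Attach a timestamp and the agent's hostname to each event's headers
a1.sources.r1.interceptors = ts host
a1.sources.r1.interceptors.ts.type = timestamp
a1.sources.r1.interceptors.host.type = host
```

Heavier transformation is typically done downstream, or via a custom interceptor.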

Code Examples

  1. Basic Flume configuration for reading from a file and writing to HDFS:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/myapp.log

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
  2. Configuring a custom interceptor:
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.example.CustomInterceptor$Builder
a1.sources.r1.interceptors.i1.someProperty = someValue
  3. Setting up a load balancing sink processor:
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinkgroups.g1.processor.backoff = true
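A failover sink processor is the other common grouping strategy: events go to the highest-priority available sink, falling back on failure. A hedged sketch (property names per the Flume User Guide; k1 and k2 assumed to be sinks defined elsewhere in the configuration):

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# Higher priority wins; a failed sink is penalized before retry
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000
```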

Getting Started

  1. Download and install Apache Flume from the official website.
  2. Create a configuration file (e.g., example.conf) with your desired setup.
  3. Start the Flume agent:
$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
  4. Flume will now start collecting and processing data according to your configuration.
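For a first run, the canonical smoke test from the Flume User Guide pairs a netcat source with a logger sink; a minimal example.conf sketch:

```properties
# A single agent: netcat source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Lines sent to port 44444 (for example with telnet localhost 44444) should then appear in the agent's console log.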

Competitor Comparisons

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-like data

Pros of logging-flume

  • Identical repository, so all features and functionality are the same
  • No differences in performance or capabilities
  • Consistent development and maintenance as they are the same project

Cons of logging-flume

  • No unique advantages over the other repository
  • Potential confusion for users due to duplicate repositories
  • Redundant maintenance efforts if both repositories are actively managed

Code comparison

As both repositories are identical, there are no code differences to compare. Here's a sample of the code structure found in both repositories:

import java.util.Map;
import java.util.Properties;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Skeleton of the configuration holder found in both repositories
public class FlumeConfiguration {
  private static final Logger LOGGER = LoggerFactory.getLogger(FlumeConfiguration.class);

  private final Map<String, AgentConfiguration> agentConfigMap;
  private final Properties properties;
}

Both repositories contain the same codebase for Apache Flume, a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store.


Logstash - transport and process your logs, events, or other data

Pros of Logstash

  • More extensive plugin ecosystem, allowing for greater flexibility in data processing and output
  • Tighter integration with the Elastic Stack, providing seamless compatibility with Elasticsearch and Kibana
  • Better support for real-time data processing and analytics

Cons of Logstash

  • Higher resource consumption, especially in terms of memory usage
  • Steeper learning curve for configuration and setup compared to Flume
  • Less suitable for high-volume, high-throughput scenarios without additional tuning

Code Comparison

Flume configuration example:

agent.sources = s1
agent.channels = c1
agent.sinks = k1

agent.sources.s1.type = netcat
agent.sources.s1.bind = localhost
agent.sources.s1.port = 44444

Logstash configuration example:

input {
  tcp {
    port => 44444
    type => "example"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

Both examples show basic configurations for receiving data on port 44444. Flume uses a more structured approach with explicit source, channel, and sink definitions, while Logstash uses a more streamlined input-output pipeline configuration.


Fluentd: Unified Logging Layer (project under CNCF)

Pros of Fluentd

  • More flexible plugin system with over 500 community-contributed plugins
  • Better performance and scalability, especially for high-volume data streams
  • Easier configuration with a unified logging layer

Cons of Fluentd

  • Steeper learning curve due to more complex architecture
  • Requires more system resources, especially memory

Code Comparison

Flume configuration example:

agent.sources = s1
agent.channels = c1
agent.sinks = k1

agent.sources.s1.type = netcat
agent.sources.s1.bind = localhost
agent.sources.s1.port = 44444

Fluentd configuration example:

<source>
  @type tcp
  port 24224
  tag myapp.access
</source>

<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>

Both Flume and Fluentd are popular log collection and aggregation tools, but they differ in their approach and capabilities. Flume is designed specifically for Hadoop ecosystems, while Fluentd is more versatile and can be used in various environments. Fluentd's plugin system and unified logging layer make it more adaptable to different use cases, but it may require more resources and have a steeper learning curve compared to Flume's simpler architecture.


A high-performance observability data pipeline.

Pros of Vector

  • More modern and actively maintained, with frequent updates and releases
  • Supports a wider range of data sources and sinks, including cloud-native integrations
  • Written in Rust, offering better performance and lower resource usage

Cons of Vector

  • Younger project with a smaller community compared to Flume
  • Less documentation and fewer third-party resources available
  • Steeper learning curve for users familiar with Java-based tools like Flume

Code Comparison

Vector configuration example:

[sources.apache_logs]
type = "file"
include = ["/var/log/apache2/*.log"]

[transforms.parse_apache]
type = "regex_parser"
inputs = ["apache_logs"]
pattern = '%{COMBINEDAPACHELOG}'

[sinks.elasticsearch]
type = "elasticsearch"
inputs = ["parse_apache"]
host = "http://localhost:9200"

Flume configuration example:

agent.sources = webserver
agent.channels = memoryChannel
agent.sinks = elasticsearch

agent.sources.webserver.type = exec
agent.sources.webserver.command = tail -F /var/log/apache2/access.log

agent.channels.memoryChannel.type = memory

agent.sinks.elasticsearch.type = elasticsearch
agent.sinks.elasticsearch.hostNames = 127.0.0.1:9200

Both examples show basic log collection and forwarding to Elasticsearch, but Vector's configuration is more concise and offers built-in parsing capabilities.


Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.

Pros of Telegraf

  • More versatile: Supports a wider range of input plugins and data sources
  • Better performance: Designed for high-throughput data collection
  • Active development: More frequent updates and community contributions

Cons of Telegraf

  • Steeper learning curve: More complex configuration due to its extensive features
  • Resource intensive: Can consume more system resources, especially with many plugins

Code Comparison

Flume configuration example:

agent.sources = netcat-source
agent.sinks = logger-sink
agent.channels = memory-channel

agent.sources.netcat-source.type = netcat
agent.sources.netcat-source.bind = localhost
agent.sources.netcat-source.port = 44444

Telegraf configuration example:

[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false

[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"

Both Flume and Telegraf are data collection and forwarding tools, but they have different strengths. Flume is primarily designed for log data collection and aggregation in Hadoop environments, while Telegraf is a more general-purpose metrics collection agent with broader application support.

Telegraf offers greater flexibility and a wider range of integrations, making it suitable for various monitoring scenarios. However, this versatility comes at the cost of increased complexity and potentially higher resource usage.

Flume's configuration tends to be simpler and more focused on log data, while Telegraf's configuration allows for more detailed customization of metrics collection and processing.


Like Prometheus, but for logs.

Pros of Loki

  • Designed for cloud-native environments and Kubernetes
  • Efficient storage and indexing optimized for logs
  • Seamless integration with Grafana for visualization

Cons of Loki

  • Less mature compared to Flume
  • Built around Prometheus-style labels, which fits less naturally outside that ecosystem
  • Steeper learning curve for those unfamiliar with Prometheus ecosystem

Code Comparison

Flume configuration example:

agent.sources = r1
agent.channels = c1
agent.sinks = k1

agent.sources.r1.type = netcat
agent.sources.r1.bind = localhost
agent.sources.r1.port = 44444

Loki configuration example:

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

Both Flume and Loki serve as log collection and aggregation tools, but they cater to different use cases and environments. Flume is more established and versatile for traditional data center setups, while Loki is tailored for modern cloud-native architectures. The code examples showcase the configuration differences, with Flume using a properties-based format and Loki utilizing YAML.


README

Project status

[!WARNING] This project is not maintained anymore! It has been marked as dormant by Apache Logging Services consensus on 2024-10-10. Users are advised to migrate to alternatives. For other inquiries, see the support policy.

Welcome to Apache Flume!

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic application.

The Apache Flume 1.x (NG) code line is a refactoring of the first generation Flume to solve certain known issues and limitations of the original design.

Apache Flume is open-sourced under the Apache Software Foundation License v2.0.

Documentation

Documentation is included in the binary distribution under the docs directory. In source form, it can be found in the flume-ng-doc directory.

The Flume 1.x User Guide and FAQ are available on the Apache Flume website.

Contact us!

Bug and Issue tracker.

Compiling Flume

Compiling Flume requires the following tools:

  • Oracle Java JDK 1.8
  • Apache Maven 3.x

Note: The Apache Flume build requires more memory than the default configuration. We recommend you set the following Maven options:

export MAVEN_OPTS="-Xms512m -Xmx1024m"

To compile Flume and build a distribution tarball, run mvn install from the top level directory. The artifacts will be placed under flume-ng-dist/target/.