incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

Quick Overview

Apache Heron (incubating) is a real-time, distributed, fault-tolerant stream processing engine developed by Twitter. It is designed to be highly scalable, efficient, and easy to deploy, making it suitable for large-scale data processing applications. Heron is API-compatible with Apache Storm, allowing for easy migration of existing Storm topologies.

Pros

High performance and low latency, with better throughput than Apache Storm
Improved resource isolation and management through containerization
Easy to deploy and integrate with modern cluster management systems
Backwards compatibility with Apache Storm topologies

Cons

Still in incubation status, which may concern some enterprise users
Smaller community compared to more established stream processing frameworks
Limited ecosystem of connectors and integrations compared to Apache Flink or Spark Streaming
Steeper learning curve for users not familiar with Storm-like topologies

Code Examples

Defining a simple topology:

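import org.apache.heron.api.Config;
import org.apache.heron.api.HeronSubmitter;
import org.apache.heron.api.topology.TopologyBuilder;
import org.apache.heron.api.tuple.Fields;

// WordSpout and CountBolt are assumed to live in the same package.
public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Two spout instances; the spout is assumed to emit a field named "word".
        builder.setSpout("word-spout", new WordSpout(), 2);
        // Fields grouping sends every occurrence of a word to the same bolt
        // instance, so each per-instance count is complete.
        builder.setBolt("count-bolt", new CountBolt(), 4)
               .fieldsGrouping("word-spout", new Fields("word"));

        Config conf = new Config();
        // Heron's own Config sizes a topology by stream managers (containers);
        // setNumWorkers belongs to the Storm compatibility Config.
        conf.setNumStmgrs(2);

        HeronSubmitter.submitTopology("word-count-topology", conf, builder.createTopology());
    }
}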

Implementing a custom bolt:

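import java.util.HashMap;
import java.util.Map;

import org.apache.heron.api.bolt.BaseBasicBolt;
import org.apache.heron.api.bolt.BasicOutputCollector;
import org.apache.heron.api.topology.OutputFieldsDeclarer;
import org.apache.heron.api.tuple.Fields;
import org.apache.heron.api.tuple.Tuple;
import org.apache.heron.api.tuple.Values;

public class CountBolt extends BaseBasicBolt {
    // Running totals for the words routed to this bolt instance only.
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        // Increment the count and emit the updated (word, count) pair downstream.
        int count = counts.merge(word, 1, Integer::sum);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}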
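
CountBolt keeps its totals in a per-instance map, so the counts are only complete if every occurrence of a word reaches the same bolt instance. That is what a fields grouping on "word" provides. The following is a minimal sketch of that routing idea in plain Java with no Heron dependency. The class name, method names, and the hash-modulo routing rule are illustrative assumptions, not Heron's actual grouping code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of key-based routing; not part of the Heron API.
final class FieldsGroupingSketch {
    // Pick a task index from the key's hash, so equal words always map to the same task.
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Route each word to one task and keep a separate count map per task,
    // mirroring the per-instance HashMap inside CountBolt.
    static List<Map<String, Integer>> countByTask(List<String> words, int numTasks) {
        List<Map<String, Integer>> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            perTask.add(new HashMap<>());
        }
        for (String word : words) {
            perTask.get(taskFor(word, numTasks)).merge(word, 1, Integer::sum);
        }
        return perTask;
    }

    public static void main(String[] args) {
        List<String> words = List.of("heron", "storm", "heron", "flink", "heron");
        List<Map<String, Integer>> perTask = countByTask(words, 4);
        for (int task = 0; task < perTask.size(); task++) {
            System.out.println("task " + task + " -> " + perTask.get(task));
        }
    }
}
```

With a shuffle grouping, by contrast, occurrences of the same word could land on different instances, and each instance would report only a partial count.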

Configuring a topology with custom options:

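Config conf = new Config();
conf.setNumStmgrs(2);                 // number of stream managers (containers)
conf.setMaxSpoutPending(1000);        // cap on un-acked tuples per spout instance
conf.setMessageTimeoutSecs(30);       // un-acked tuples are failed after 30 seconds
conf.setTopologyReliabilityMode(Config.TopologyReliabilityMode.EFFECTIVELY_ONCE);
// JVM options are passed through the TOPOLOGY_WORKER_CHILDOPTS key.
conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-XX:+UseG1GC");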
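
The max-spout-pending and message-timeout settings work together when tuples are tracked for acknowledgement: the first caps how many emitted tuples may be awaiting an ack, and the second bounds how long a tuple may wait before it is treated as failed. The sketch below models that bookkeeping in plain Java as a conceptual aid. PendingTracker and all of its methods are hypothetical names, and the logic is an assumption about the general idea rather than Heron's implementation.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a spout's in-flight bookkeeping; not Heron's actual code.
final class PendingTracker {
    private final int maxPending;
    private final long timeoutMillis;
    // Tuple id -> emit timestamp, kept in emit order.
    private final Map<Long, Long> pending = new LinkedHashMap<>();

    PendingTracker(int maxPending, long timeoutMillis) {
        this.maxPending = maxPending;
        this.timeoutMillis = timeoutMillis;
    }

    // Emit only while fewer than maxPending tuples are awaiting an ack.
    boolean tryEmit(long tupleId, long nowMillis) {
        if (pending.size() >= maxPending) {
            return false;
        }
        pending.put(tupleId, nowMillis);
        return true;
    }

    // An ack frees one slot.
    void ack(long tupleId) {
        pending.remove(tupleId);
    }

    // Drop tuples that have waited at least timeoutMillis and return how many
    // were dropped; a real spout would be notified that these tuples failed.
    int expire(long nowMillis) {
        int expired = 0;
        Iterator<Map.Entry<Long, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() >= timeoutMillis) {
                it.remove();
                expired++;
            }
        }
        return expired;
    }

    int inFlight() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingTracker tracker = new PendingTracker(2, 30_000);
        tracker.tryEmit(1, 0);
        tracker.tryEmit(2, 0);
        System.out.println("third emit allowed: " + tracker.tryEmit(3, 0));
        tracker.ack(1);
        System.out.println("after ack, emit allowed: " + tracker.tryEmit(3, 1_000));
        System.out.println("expired at 30s: " + tracker.expire(30_000));
    }
}
```

A small cap throttles the spout when downstream bolts fall behind, while a larger cap trades memory and replay cost for throughput.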

Getting Started

Install Heron:

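# closer.lua serves a mirror-selection page; if wget saves HTML, use the direct link that page lists.
wget https://apache.org/dyn/closer.lua/incubator/heron/heron-0.20.3/heron-0.20.3-debian10.tar.gz
tar -xvf heron-0.20.3-debian10.tar.gz
export PATH="$PATH:$(pwd)/heron-0.20.3/bin"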

Create a new Maven project and add Heron dependencies to your pom.xml:

<dependency>
  <groupId>org.apache.heron</groupId>
  <artifactId>heron-api</artifactId>
  <version>0.20.3-incubating</version>
</dependency>

Implement your topology and submit it:

mvn clean package
heron submit local /path/to/your/topology.jar com.example.WordCountTopology WordCount

Competitor Comparisons

incubator-heron

3,629

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

Pros of incubator-heron

Efficient and scalable distributed stream processing system
Low latency and high throughput for real-time analytics
Compatibility with Apache Storm topologies

Cons of incubator-heron

Steeper learning curve compared to some other stream processing systems
Limited ecosystem and third-party integrations

Code Comparison

Both repositories contain the same codebase for Apache Heron, so there isn't a direct code comparison to make. However, here's a sample of Heron's topology definition:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word", new TestWordSpout(), 5);
builder.setBolt("exclaim1", new ExclamationBolt(), 4)
        .shuffleGrouping("word");
builder.setBolt("exclaim2", new ExclamationBolt(), 4)
        .shuffleGrouping("exclaim1");

This code demonstrates how to create a simple topology in Heron, which is similar to Apache Storm's API.
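
To make the data flow concrete, the chain "word" -> "exclaim1" -> "exclaim2" can be modelled in plain Java as two function applications in sequence. This is only a sketch: it assumes ExclamationBolt appends "!!!" to each incoming word, as in the well-known Storm starter example, and it ignores parallelism and grouping entirely. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical single-threaded model of the two-bolt chain; not Heron code.
final class ExclaimPipelineSketch {
    // Assumed behavior of ExclamationBolt: append "!!!" to the incoming word.
    static final UnaryOperator<String> EXCLAIM = word -> word + "!!!";

    // Each stage consumes the output of the stage before it.
    static List<String> run(List<String> words) {
        return words.stream()
                .map(EXCLAIM)   // stands in for "exclaim1"
                .map(EXCLAIM)   // stands in for "exclaim2"
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("nathan", "mike")));
    }
}
```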

Summary

incubator-heron is a powerful distributed stream processing system that offers high performance and compatibility with Apache Storm. While it may have a steeper learning curve and a smaller ecosystem, its efficiency and scalability make it a strong choice for real-time analytics applications.

storm

6,618

Apache Storm

Pros of Storm

More mature and widely adopted in production environments
Extensive ecosystem with a large community and numerous connectors
Supports multiple programming languages (Java, Python, etc.)

Cons of Storm

Higher latency compared to Heron
Less efficient resource utilization
More complex configuration and tuning process

Code Comparison

Storm topology definition:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

Heron topology definition:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

The code for defining a topology is nearly identical in Storm and Heron, which makes it easier for developers to migrate between the two systems. Heron's developers report better performance and resource efficiency than Storm while keeping API compatibility with it.

flink

24,808

Apache Flink

Pros of Flink

More mature and widely adopted in production environments
Extensive ecosystem with a wide range of connectors and libraries
Strong support for both stream and batch processing

Cons of Flink

Steeper learning curve due to its comprehensive feature set
Higher resource consumption, especially for smaller workloads
More complex configuration and deployment process

Code Comparison

Flink:

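StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.socketTextStream("localhost", 9999);
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new Tokenizer())   // Tokenizer: user-defined FlatMapFunction emitting (word, 1)
    .keyBy(value -> value.f0)   // key by the word; positional keyBy(0) is deprecated
    .sum(1);                    // sum the count field
counts.print();
env.execute("Socket Word Count");  // the job only runs once execute() is called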

Heron:

Config conf = new Config();
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word", new TestWordSpout(), 2);
builder.setBolt("count", new WordCountBolt(), 2)
       .fieldsGrouping("word", new Fields("word"));

Summary

Flink is a more established and feature-rich stream processing framework, offering a comprehensive ecosystem and strong support for both stream and batch processing. However, it comes with a steeper learning curve and higher resource requirements. Heron, being a newer project, focuses on simplicity and efficiency, making it easier to learn and deploy, especially for smaller-scale applications. The choice between the two depends on specific project requirements, scale, and team expertise.

spark

40,785

Apache Spark - A unified analytics engine for large-scale data processing

Pros of Spark

Mature ecosystem with extensive libraries and integrations
Supports both batch and stream processing
Highly scalable and efficient for large-scale data processing

Cons of Spark

Higher memory consumption
Steeper learning curve for beginners
Can be slower for real-time processing compared to dedicated streaming systems

Code Comparison

Spark (Scala):

val lines = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "host1:port1,host2:port2").option("subscribe", "topic1").load()
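// Kafka rows carry binary key/value columns, so cast value to a string first (requires import spark.implicits._).
val words = lines.selectExpr("CAST(value AS STRING)").as[String].flatMap(_.split(" "))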
val wordCounts = words.groupBy("value").count()

Heron (Java):

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-spout", new WordSpout(), 2);
builder.setBolt("count-bolt", new CountBolt(), 4).fieldsGrouping("word-spout", new Fields("word"));
Config conf = new Config();
HeronSubmitter.submitTopology("word-count-topology", conf, builder.createTopology());

Key Differences

Spark offers a unified engine for various data processing tasks, while Heron focuses on real-time stream processing
Heron provides lower latency for streaming applications
Spark has a larger community and more extensive documentation
Heron offers better resource isolation and easier debugging capabilities

beam

8,082

Apache Beam is a unified programming model for Batch and Streaming data processing.

Pros of Beam

Broader ecosystem support with runners for multiple processing engines (Flink, Spark, Dataflow, etc.)
More mature and widely adopted in production environments
Unified programming model for batch and streaming processing

Cons of Beam

Steeper learning curve due to more complex abstractions
Can be overkill for simpler streaming use cases
Potentially higher latency compared to Heron for certain streaming scenarios

Code Comparison

Heron (Java):

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word", new TestWordSpout(), 2);
builder.setBolt("count", new TestWordCounter(), 4)
        .shuffleGrouping("word");

Beam (Java):

Pipeline p = Pipeline.create();
p.apply(TextIO.read().from("input.txt"))
 .apply(FlatMapElements.into(TypeDescriptors.strings())
        .via((String line) -> Arrays.asList(line.split("\\s+"))))
 .apply(Count.<String>perElement())
 .apply(MapElements.into(TypeDescriptors.strings())
        .via((KV<String, Long> wordCount) ->
            wordCount.getKey() + ": " + wordCount.getValue()))
 .apply(TextIO.write().to("output.txt"));
p.run().waitUntilFinish();  // the calls above only build the pipeline; run() executes it
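
For readers unfamiliar with Beam's transforms, the split, count, and format steps of the pipeline above can be sketched with plain Java collections. This is an in-memory illustration under assumed names (WordCountSketch, countWords), not a Beam program: it reads from a list instead of a file, sorts the output for readability, and skips empty tokens, which the Beam snippet does not do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory equivalent of the split/count/format steps; not Beam code.
final class WordCountSketch {
    static List<String> countWords(List<String> lines) {
        // Split each line on whitespace and count occurrences per word.
        Map<String, Long> counts = lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> !word.isEmpty())   // added here: drop empty tokens
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
        // Format each entry as "word: count", in sorted key order.
        return counts.entrySet().stream()
                .map(entry -> entry.getKey() + ": " + entry.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        countWords(List.of("to be or not", "to be")).forEach(System.out::println);
    }
}
```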

README

Heron is a realtime analytics platform developed by Twitter. It has a wide array of architectural improvements over its predecessor.

Heron in Apache Incubation

Documentation

https://heron.incubator.apache.org/
Confluence: https://cwiki.apache.org/confluence/display/HERON

Heron Requirements:

Java 11
Python 3.6
Bazel 6.0.0

Contact

Mailing lists

Name	Scope	Links
user@heron.incubator.apache.org	User-related discussions	Subscribe, Unsubscribe, Archives
dev@heron.incubator.apache.org	Development-related discussions	Subscribe, Unsubscribe, Archives

Slack

Self-Register to our Heron Slack Workspace

Meetup Group

Bay Area Heron Meetup: we meet on the third Monday of every month in Palo Alto.

For more information:

Official Heron documentation is located at https://heron.apache.org/
Official Heron resources, including conference and journal papers, videos, blog posts, and selected press, are located at Heron Resources
Twitter Heron: Stream Processing at Scale (academic paper)
Twitter Heron: Stream Processing at Scale (YouTube video)
Flying Faster with Twitter Heron (blog post)

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
