storm

Apache Storm

6,645

4,062

Top Related Projects

storm

8,815

Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

incubator-heron

3,619

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

spark

42,015

Apache Spark - A unified analytics engine for large-scale data processing

Quick Overview

Apache Storm is a distributed real-time computation system that processes unbounded streams of data with fault tolerance and guaranteed message processing. Typical uses include real-time analytics, online machine learning, and continuous computation.

Pros

Scalable and fault-tolerant architecture
Low latency processing with high throughput
Supports multiple programming languages
Easy to set up and operate

Cons

Steep learning curve for beginners
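Limited built-in support for exactly-once processing (the core API guarantees at-least-once delivery; exactly-once semantics require the higher-level Trident API)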
Can be resource-intensive for large-scale deployments
Requires careful tuning for optimal performance
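The exactly-once limitation above comes from replay: when a tuple is not fully acknowledged within the message timeout, the spout is notified and typically re-emits it, so a downstream bolt can see the same data twice. The following is a minimal plain-Java sketch of that effect, not Storm code. The class `ReplayDemo`, its methods, and the message IDs are hypothetical names chosen for illustration. It shows how a replayed message inflates a naive counter and how remembering processed IDs makes the update idempotent.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReplayDemo {
    /** Hypothetical stand-in for a tuple that carries a unique message ID. */
    static final class Message {
        final long id;
        final String word;

        Message(long id, String word) {
            this.id = id;
            this.word = word;
        }
    }

    /** Counts every delivery, so a replayed message is counted twice. */
    static Map<String, Integer> naiveCount(List<Message> deliveries) {
        Map<String, Integer> counts = new HashMap<>();
        for (Message m : deliveries) {
            counts.merge(m.word, 1, Integer::sum);
        }
        return counts;
    }

    /** Skips deliveries whose ID was already processed, so replays are harmless. */
    static Map<String, Integer> idempotentCount(List<Message> deliveries) {
        Map<String, Integer> counts = new HashMap<>();
        Set<Long> seen = new HashSet<>();
        for (Message m : deliveries) {
            if (seen.add(m.id)) { // add() returns false for an ID seen before
                counts.merge(m.word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Message 2 arrives twice, as it might after a timeout and replay.
        List<Message> deliveries = Arrays.asList(
            new Message(1, "storm"), new Message(2, "bolt"), new Message(2, "bolt"));
        System.out.println("naive:      " + naiveCount(deliveries));
        System.out.println("idempotent: " + idempotentCount(deliveries));
    }
}
```

In a real deployment the set of seen IDs would have to be bounded and stored durably, which is part of why exactly-once processing takes extra machinery.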

Code Examples

Creating a basic topology:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-spout", new RandomSentenceSpout());
builder.setBolt("word-splitter", new SplitSentenceBolt()).shuffleGrouping("word-spout");
builder.setBolt("word-counter", new WordCountBolt()).fieldsGrouping("word-splitter", new Fields("word"));

This code sets up a simple topology with a spout that generates random sentences, a bolt that splits sentences into words, and another bolt that counts word occurrences.
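The two groupings in this topology route tuples differently. shuffleGrouping spreads tuples evenly across the target bolt's tasks. fieldsGrouping sends all tuples with the same value in the named field to the same task, which is what lets each counter task hold the complete count for its words. The sketch below illustrates the idea in plain Java. It is not Storm's routing code: the hash-modulo and round-robin strategies are simplifying assumptions, and `GroupingSketch`, `fieldsTask`, and `shuffleTask` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class GroupingSketch {
    private int next = 0; // round-robin cursor used by the shuffle stand-in

    /** Stand-in for fields grouping: equal keys always map to the same task index. */
    static int fieldsTask(String key, int numTasks) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numTasks);
    }

    /** Stand-in for shuffle grouping: spreads tuples evenly regardless of content. */
    int shuffleTask(int numTasks) {
        int task = next;
        next = (next + 1) % numTasks;
        return task;
    }

    public static void main(String[] args) {
        int numTasks = 3;
        GroupingSketch shuffle = new GroupingSketch();
        List<String> words = Arrays.asList("storm", "bolt", "storm", "spout", "storm");
        for (String word : words) {
            System.out.printf("%-6s shuffle -> task %d, fields -> task %d%n",
                word, shuffle.shuffleTask(numTasks), fieldsTask(word, numTasks));
        }
    }
}
```

Every occurrence of "storm" lands on the same task under the fields stand-in, while the shuffle stand-in moves it around. That is why the counting bolt uses fieldsGrouping and the splitting bolt, which keeps no per-word state, can use shuffleGrouping.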

Defining a custom bolt:

public class WordCountBolt extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null)
            count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}

This code defines a custom bolt that counts word occurrences and emits the word and its count.
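To see what the bolt emits without starting a cluster, the same counting logic can be exercised in plain Java. This is a sketch under stated assumptions: `RunningCount`, `process`, and `emissions` are hypothetical names, no Storm classes are involved, and each emission is rendered as a "word=count" string. It uses Map.merge as a compact equivalent of the null check in the bolt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RunningCount {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Mirrors execute(): bump the count for the word and return the new value. */
    int process(String word) {
        // merge() stores 1 for a new word, otherwise adds 1 to the stored count.
        return counts.merge(word, 1, Integer::sum);
    }

    /** Feeds words through process() and records each (word, count) emission in order. */
    static List<String> emissions(List<String> words) {
        RunningCount bolt = new RunningCount();
        List<String> out = new ArrayList<>();
        for (String word : words) {
            out.add(word + "=" + bolt.process(word));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(emissions(Arrays.asList("a", "b", "a")));
        // prints [a=1, b=1, a=2]
    }
}
```

The bolt emits a running count on every tuple rather than a final total. As in the bolt above, the map lives only in memory, so the counts are lost if the task restarts.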

Submitting a topology to a Storm cluster:

Config conf = new Config();
conf.setDebug(true);
conf.setNumWorkers(3);

StormSubmitter.submitTopology("word-count-topology", conf, builder.createTopology());

This code configures and submits the topology to a Storm cluster for execution.

Getting Started

Install Apache Storm:

wget https://archive.apache.org/dist/storm/apache-storm-2.4.0/apache-storm-2.4.0.tar.gz
tar -xzf apache-storm-2.4.0.tar.gz
cd apache-storm-2.4.0

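Start the Storm daemons (Storm needs a running ZooKeeper ensemble, configured through storm.zookeeper.servers in conf/storm.yaml, before Nimbus and the supervisors can start):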

bin/storm nimbus &
bin/storm supervisor &
bin/storm ui &

Create a Maven project and add the Storm dependency (topologies for Storm 2.x normally compile against storm-client):

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-client</artifactId>
    <version>2.4.0</version>
    <scope>provided</scope>
</dependency>

Implement your topology and submit it to the cluster:

StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());

Competitor Comparisons

storm

8,815

Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

Pros of Storm (nathanmarz)

Historical significance as the original Storm project
Simpler codebase, potentially easier for beginners to understand
May contain experimental features not present in the Apache version

Cons of Storm (nathanmarz)

No longer actively maintained or updated
Lacks many improvements and optimizations found in the Apache version
Limited community support and contributions

Code Comparison

Storm (nathanmarz):

public class ExclamationBolt extends BaseRichBolt {
    OutputCollector _collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    // execute(...) and declareOutputFields(...) omitted for brevity
}

Storm (Apache):

public class ExclamationBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    // execute(...) and declareOutputFields(...) omitted for brevity
}

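The structure is the same in both. The Apache version types the configuration map as Map<String, Object> instead of a raw Map and keeps the collector in a private field without the underscore prefix. Its classes also live under org.apache.storm rather than the original backtype.storm packages. The Apache version tends to have more extensive documentation and additional features not shown in this brief example.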

jstorm

3,891

Enterprise Stream Process Engine

Pros of JStorm

Better performance and lower latency compared to Storm
Enhanced fault tolerance and reliability
Improved resource utilization and scheduling

Cons of JStorm

Less community support and fewer third-party integrations
Limited documentation and resources in English
Potential compatibility issues with some Storm topologies

Code Comparison

Storm topology example:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

JStorm topology example:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Config conf = new Config();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());

The topology definition is identical in both snippets. The JStorm example additionally shows a local-mode submission through LocalCluster, which Storm supports as well. JStorm provides additional features for performance optimization and resource management, which may require extra configuration in the topology setup.

incubator-heron

3,619

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

Pros of Heron

Better performance and lower latency compared to Storm
Improved resource isolation and easier debugging
Backwards compatibility with Storm topologies

Cons of Heron

Smaller community and ecosystem
Less mature and fewer production deployments
Steeper learning curve for new users

Code Comparison

Storm topology definition:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

Heron topology definition:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

The code structure for defining topologies is very similar between Storm and Heron, which aligns with Heron's goal of maintaining backwards compatibility with Storm topologies. The main differences lie in the underlying architecture and execution model rather than the API.

flink

25,110

Apache Flink

Pros of Flink

Better performance and lower latency for large-scale data processing
More comprehensive ecosystem with built-in support for complex event processing and machine learning
Stronger exactly-once processing semantics and fault tolerance mechanisms

Cons of Flink

Steeper learning curve due to more complex API and concepts
Higher memory requirements, especially for large state management
Less mature ecosystem for certain integrations compared to Storm

Code Comparison

Flink:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.socketTextStream("localhost", 9999);
DataStream<Tuple2<String, Integer>> counts = text
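    .flatMap(new Tokenizer())   // Tokenizer: assumed user-defined function emitting (word, 1)
    .keyBy(value -> value.f0)   // key by the word; the positional keyBy(0) form is deprecated
    .sum(1);                    // sum the count field
counts.print();
env.execute("Socket Word Count"); // the job only runs once execute() is called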

Storm:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-reader", new WordReader());
builder.setBolt("word-normalizer", new WordNormalizer())
    .shuffleGrouping("word-reader");
builder.setBolt("word-counter", new WordCounter())
    .fieldsGrouping("word-normalizer", new Fields("word"));

Both examples show basic stream processing setups, but Flink's API is more declarative and offers built-in operations like sum, while Storm requires more explicit bolt implementations for similar functionality.

spark

42,015

Apache Spark - A unified analytics engine for large-scale data processing

Pros of Spark

Faster processing speed for large-scale data analytics
More versatile with support for batch processing, interactive queries, and machine learning
Easier to use with high-level APIs in Java, Scala, Python, and R

Cons of Spark

Higher memory requirements, which can be costly for large datasets
Steeper learning curve for beginners due to more complex architecture
Less suitable for real-time, sub-second latency processing compared to Storm

Code Comparison

Storm example (Java):

public class WordCountBolt extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null)
            count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
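    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}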

Spark example (Scala):

val wordCounts = lines.flatMap(line => line.split(" "))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)

wordCounts.saveAsTextFile("hdfs://...")

The Spark example demonstrates its more concise syntax and higher-level abstractions for data processing tasks, while the Storm example shows its focus on real-time stream processing with fine-grained control over data flow.

README

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use!

The Rationale page explains what Storm is and why it was built. This presentation is also a good introduction to the project.

Storm has a website at storm.apache.org.

Documentation

Documentation and tutorials can be found on the Storm website.

Developers and contributors should also take a look at our Developer documentation.

Getting help

Storm Users

Storm users should send messages and subscribe to user@storm.apache.org.

You can subscribe to this list by sending an email to user-subscribe@storm.apache.org. Likewise, you can cancel a subscription by sending an email to user-unsubscribe@storm.apache.org.

You can also browse the archives of the storm-user mailing list.

Storm Developers

Storm developers should send messages and subscribe to dev@storm.apache.org.

You can subscribe to this list by sending an email to dev-subscribe@storm.apache.org. Likewise, you can cancel a subscription by sending an email to dev-unsubscribe@storm.apache.org.

You can also browse the archives of the storm-dev mailing list.

Storm developers who want to track issues should subscribe to issues@storm.apache.org.

You can subscribe to this list by sending an email to issues-subscribe@storm.apache.org. Likewise, you can cancel a subscription by sending an email to issues-unsubscribe@storm.apache.org.

You can view the archives of the mailing list here.

Issue tracker

To report a bug, request a feature, or propose an idea, please use GitHub Issues. If you do not have a GitHub account, you will need to create one.

Which list should I send/subscribe to?

If you are using a pre-built binary distribution of Storm, then you should send questions, comments, Storm-related announcements, etc. to user@storm.apache.org.

If you are building Storm from source, developing new features, or otherwise hacking on the Storm source code, then dev@storm.apache.org is more appropriate.

If you are a committer or PMC member, or a contributor who wants to follow and take part in Storm's development, you should also subscribe to issues@storm.apache.org in addition to dev@storm.apache.org.

What happened with storm-user@googlegroups.com?

All existing messages will remain archived there, and can be accessed/searched here.

License

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The LICENSE and NOTICE files cover the source distributions. The LICENSE-binary and NOTICE-binary files cover the binary distributions. The DEPENDENCY-LICENSES file lists the licenses of all dependencies of Storm, including those not packaged in the source or binary distributions, such as dependencies of optional connector modules.

Acknowledgements

YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.
