Apache Storm

Top Related Projects

8,839

Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

3,915

Enterprise Stream Process Engine

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

23,783

Apache Flink

39,274

Apache Spark - A unified analytics engine for large-scale data processing

Quick Overview

Apache Storm is a distributed real-time computation system for processing large volumes of data with high fault tolerance and guaranteed data processing. It is designed to handle unbounded streams of data and can be used for real-time analytics, online machine learning, continuous computation, and more.

Pros

  • Scalable and fault-tolerant architecture
  • Low latency processing with high throughput
  • Supports multiple programming languages
  • Easy to set up and operate

Cons

  • Steep learning curve for beginners
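  • Limited built-in support for exactly-once processing: the core API guarantees at-least-once delivery, and exactly-once semantics require the higher-level Trident API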
  • Can be resource-intensive for large-scale deployments
  • Requires careful tuning for optimal performance
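
To make the exactly-once limitation concrete: with at-least-once delivery, a tuple that times out or fails can be replayed, so a counting bolt may see the same message twice. The following is a minimal plain-Java sketch of one common workaround, deduplicating by message ID. It does not use the Storm API, and the class and method names (`DedupingWordCounter`, `process`, `countOf`) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Storm: counts words while ignoring replayed messages. */
class DedupingWordCounter {
    private final Set<Long> seenIds = new HashSet<>();
    private final Map<String, Integer> counts = new HashMap<>();

    /** Applies the message unless its ID was already processed; returns true if it was applied. */
    boolean process(long messageId, String word) {
        if (!seenIds.add(messageId)) {
            return false; // replayed duplicate, skip it
        }
        counts.merge(word, 1, Integer::sum);
        return true;
    }

    /** Returns the current count for a word, or 0 if it has not been seen. */
    int countOf(String word) {
        return counts.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        DedupingWordCounter counter = new DedupingWordCounter();
        counter.process(1L, "storm");
        counter.process(2L, "storm");
        counter.process(2L, "storm"); // replay of message 2 after a simulated timeout
        System.out.println(counter.countOf("storm")); // prints 2
    }
}
```

In this sketch the set of seen IDs grows without bound; a real deployment would need to expire old IDs or keep the state in an external store.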

Code Examples

  1. Creating a basic topology:
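// Wire a three-stage pipeline: sentence spout -> splitter bolt -> counter bolt.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-spout", new RandomSentenceSpout());
// Shuffle grouping spreads sentences evenly across the splitter's tasks.
builder.setBolt("word-splitter", new SplitSentenceBolt()).shuffleGrouping("word-spout");
// Fields grouping sends every occurrence of a given word to the same counter task.
builder.setBolt("word-counter", new WordCountBolt()).fieldsGrouping("word-splitter", new Fields("word"));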

This code sets up a simple topology with a spout that generates random sentences, a bolt that splits sentences into words, and another bolt that counts word occurrences.
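
The two groupings in this topology behave differently: a shuffle grouping distributes tuples randomly across tasks, while a fields grouping guarantees that tuples with the same field value reach the same task, which is what makes per-word counting correct. Below is a simplified plain-Java model of that second guarantee. It assumes a hash-modulo partitioner, and `FieldsGroupingSketch` and `chooseTask` are illustrative names, not Storm APIs.

```java
import java.util.List;

/** Simplified model of a fields grouping; the names here are illustrative, not Storm APIs. */
class FieldsGroupingSketch {
    /** Picks a task index for a key so that equal keys always map to the same task. */
    static int chooseTask(String key, int numTasks) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3;
        for (String word : List.of("storm", "bolt", "storm", "spout", "bolt")) {
            System.out.println(word + " -> task " + chooseTask(word, numTasks));
        }
    }
}
```

Running it shows both occurrences of "storm" assigned to the same task index, so a single counter instance sees every occurrence of that word.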

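  2. Defining a custom bolt:
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt extends BaseBasicBolt {
    // Per-task running totals; held in memory only, so they are lost if the worker restarts.
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        // merge() stores 1 for a new word or adds 1 to the existing total.
        int count = counts.merge(word, 1, Integer::sum);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}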

This code defines a custom bolt that counts word occurrences and emits the word and its count.

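  3. Submitting a topology to a Storm cluster:
Config conf = new Config();
conf.setDebug(true);    // log every emitted tuple; useful in development, noisy in production
conf.setNumWorkers(3);  // number of worker processes to request for this topology

// submitTopology declares checked exceptions, so the calling method must handle or declare them.
StormSubmitter.submitTopology("word-count-topology", conf, builder.createTopology());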

This code configures and submits the topology to a Storm cluster for execution.

Getting Started

  1. Install Apache Storm:

    wget https://archive.apache.org/dist/storm/apache-storm-2.4.0/apache-storm-2.4.0.tar.gz
    tar -xzf apache-storm-2.4.0.tar.gz
    cd apache-storm-2.4.0
    
  2. Start Storm daemons (a ZooKeeper instance must already be running; by default Storm looks for it on localhost):

    bin/storm nimbus &
    bin/storm supervisor &
    bin/storm ui &
    
  3. Create a Maven project and add Storm dependency:

    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-client</artifactId>
        <version>2.4.0</version>
        <scope>provided</scope>
    </dependency>
    
  4. Implement your topology and submit it to the cluster:

    StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
    

Competitor Comparisons

8,839

Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

Pros of Storm (nathanmarz)

  • Historical significance as the original Storm project
  • Simpler codebase, potentially easier for beginners to understand
  • May contain experimental features not present in the Apache version

Cons of Storm (nathanmarz)

  • No longer actively maintained or updated
  • Lacks many improvements and optimizations found in the Apache version
  • Limited community support and contributions

Code Comparison

Storm (nathanmarz):

public class ExclamationBolt extends BaseRichBolt {
    OutputCollector _collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }
}

Storm (Apache):

public class ExclamationBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }
}

The code structure is similar, but the Apache version uses more specific type parameters and follows modern Java conventions. The Apache version also tends to have more extensive documentation and additional features not shown in this brief example.

3,915

Enterprise Stream Process Engine

Pros of JStorm

  • Better performance and lower latency compared to Storm
  • Enhanced fault tolerance and reliability
  • Improved resource utilization and scheduling

Cons of JStorm

  • Less community support and fewer third-party integrations
  • Limited documentation and resources in English
  • Potential compatibility issues with some Storm topologies

Code Comparison

Storm topology example:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

JStorm topology example:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Config conf = new Config();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());

The code structure for defining topologies is similar between Storm and JStorm, with minor differences in configuration and cluster submission. JStorm provides additional features for performance optimization and resource management, which may require extra configuration in the topology setup.

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

Pros of Heron

  • Better performance and lower latency compared to Storm
  • Improved resource isolation and easier debugging
  • Backwards compatibility with Storm topologies

Cons of Heron

  • Smaller community and ecosystem
  • Less mature and fewer production deployments
  • Steeper learning curve for new users

Code Comparison

Storm topology definition:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

Heron topology definition:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

The code structure for defining topologies is very similar between Storm and Heron, which aligns with Heron's goal of maintaining backwards compatibility with Storm topologies. The main differences lie in the underlying architecture and execution model rather than the API.

23,783

Apache Flink

Pros of Flink

  • Better performance and lower latency for large-scale data processing
  • More comprehensive ecosystem with built-in support for complex event processing and machine learning
  • Stronger exactly-once processing semantics and fault tolerance mechanisms

Cons of Flink

  • Steeper learning curve due to more complex API and concepts
  • Higher memory requirements, especially for large state management
  • Less mature ecosystem for certain integrations compared to Storm

Code Comparison

Flink:

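StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.socketTextStream("localhost", 9999);
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new Tokenizer())      // user-defined FlatMapFunction emitting (word, 1) pairs
    .keyBy(value -> value.f0)      // key by word; positional keyBy(0) is deprecated in recent Flink releases
    .sum(1);                       // rolling sum over the count field
counts.print();
env.execute("Socket Word Count");  // the job only runs once execute() is called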

Storm:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-reader", new WordReader());
builder.setBolt("word-normalizer", new WordNormalizer())
    .shuffleGrouping("word-reader");
builder.setBolt("word-counter", new WordCounter())
    .fieldsGrouping("word-normalizer", new Fields("word"));

Both examples show basic stream processing setups, but Flink's API is more declarative and offers built-in operations like sum, while Storm requires more explicit bolt implementations for similar functionality.
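
As a rough illustration of what a built-in keyed sum saves you from writing, the sketch below keeps a running total per key and returns the updated total for each incoming record, which is the bookkeeping a Storm counting bolt does by hand. It is a plain-Java analogy, not Flink or Storm code, and `RollingKeyedSum` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java analogy for a keyed rolling sum; this is not Flink or Storm API. */
class RollingKeyedSum {
    private final Map<String, Integer> totals = new HashMap<>();

    /** Adds value to the running total for key and returns the updated total. */
    int add(String key, int value) {
        return totals.merge(key, value, Integer::sum);
    }

    public static void main(String[] args) {
        RollingKeyedSum sum = new RollingKeyedSum();
        List<String> updates = new ArrayList<>();
        for (String word : List.of("a", "b", "a")) {
            updates.add(word + "=" + sum.add(word, 1));
        }
        System.out.println(updates); // prints [a=1, b=1, a=2]
    }
}
```

Each input record produces one updated total, so downstream consumers see a stream of revisions rather than a single final count.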

39,274

Apache Spark - A unified analytics engine for large-scale data processing

Pros of Spark

  • Faster processing speed for large-scale data analytics
  • More versatile with support for batch processing, interactive queries, and machine learning
  • Easier to use with high-level APIs in Java, Scala, Python, and R

Cons of Spark

  • Higher memory requirements, which can be costly for large datasets
  • Steeper learning curve for beginners due to more complex architecture
  • Less suitable for real-time, sub-second latency processing compared to Storm

Code Comparison

Storm example (Java):

public class WordCountBolt extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null)
            count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    // Required by BaseBasicBolt: names the fields of the emitted tuples.
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}

Spark example (Scala):

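// Assumes sc is an existing SparkContext, as in spark-shell.
val lines = sc.textFile("hdfs://...")
val wordCounts = lines.flatMap(line => line.split(" "))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)

wordCounts.saveAsTextFile("hdfs://...")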

The Spark example demonstrates its more concise syntax and higher-level abstractions for data processing tasks, while the Storm example shows its focus on real-time stream processing with fine-grained control over data flow.
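
For readers more comfortable with Java than Scala, the same batch-style shape (split into words, pair each with 1, reduce by key) can be approximated with the standard streams API. This is only an in-memory analogy over a bounded list, not Spark code, and `BatchWordCount` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** In-memory analogy for a batch word count; this does not use Spark. */
class BatchWordCount {
    /** Counts words across all lines in one pass over a bounded input. */
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))  // like flatMap in the Scala example
                .filter(word -> !word.isEmpty())
                // Map each word to 1 and sum per key, like map + reduceByKey; TreeMap sorts the keys.
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("to be or", "not to be"))); // prints {be=2, not=1, or=1, to=2}
    }
}
```

Unlike the Storm bolt, which emits an updated count for every tuple, this computes the final counts once after all input has been read.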

README

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use!

The Rationale page explains what Storm is and why it was built. This presentation is also a good introduction to the project.

Storm has a website at storm.apache.org.

Documentation

Documentation and tutorials can be found on the Storm website.

Developers and contributors should also take a look at our Developer documentation.

Getting help

Storm Users

Storm users should send messages and subscribe to user@storm.apache.org.

You can subscribe to this list by sending an email to user-subscribe@storm.apache.org. Likewise, you can cancel a subscription by sending an email to user-unsubscribe@storm.apache.org.

You can also browse the archives of the storm-user mailing list.

Storm Developers

Storm developers should send messages and subscribe to dev@storm.apache.org.

You can subscribe to this list by sending an email to dev-subscribe@storm.apache.org. Likewise, you can cancel a subscription by sending an email to dev-unsubscribe@storm.apache.org.

You can also browse the archives of the storm-dev mailing list.

Storm developers who want to track JIRA issues should subscribe to issues@storm.apache.org.

You can subscribe to this list by sending an email to issues-subscribe@storm.apache.org. Likewise, you can cancel a subscription by sending an email to issues-unsubscribe@storm.apache.org.

You can view the archives of the mailing list here.

Issue tracker

To report a bug, request a feature, or propose an idea, please use Apache Jira. If you do not have an account, you will need to create one.

Which list should I send/subscribe to?

If you are using a pre-built binary distribution of Storm, then you should send questions, comments, Storm-related announcements, etc. to user@storm.apache.org.

If you are building Storm from source, developing new features, or otherwise hacking on the Storm source code, then dev@storm.apache.org is more appropriate.

If you are a committer, a PMC member, or a contributor who wants to follow and participate in Storm's development, then you should also subscribe to issues@storm.apache.org in addition to dev@storm.apache.org.

What happened with storm-user@googlegroups.com?

All existing messages will remain archived there, and can be accessed/searched here.

License

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The LICENSE and NOTICE files cover the source distributions. The LICENSE-binary and NOTICE-binary files cover the binary distributions. The DEPENDENCY-LICENSES file lists the licenses of all dependencies of Storm, including those not packaged in the source or binary distributions, such as dependencies of optional connector modules.

Acknowledgements

YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.