nathanmarz/storm

Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

Top Related Projects

23,929

Apache Flink

40,184

Apache Spark - A unified analytics engine for large-scale data processing

7,828

Apache Beam is a unified programming model for Batch and Streaming data processing.

4,830

Apache NiFi

28,601

Mirror of Apache Kafka

Quick Overview

Apache Storm is a distributed real-time computation system for processing large volumes of data with high fault tolerance and guaranteed data processing. It is designed to handle unbounded streams of data and can be used for real-time analytics, online machine learning, continuous computation, and more.
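
The "guaranteed data processing" mentioned above relies on tracking every tuple derived from a spout tuple until all of them are acknowledged. Storm's documentation describes doing this with a single 64-bit value per spout tuple that is XORed with each tuple ID once when the tuple is created and once when it is acked. The following is a minimal plain-Java sketch of that idea only; the class and method names (AckTrackerSketch, emitted, acked, fullyProcessed) are hypothetical and are not part of the Storm API.

```java
// Hypothetical illustration of XOR-based ack tracking; not Storm's internal code.
class AckTrackerSketch {
    // Running XOR of every tuple ID seen so far (created or acked).
    private long ackVal = 0L;

    // Record that a tuple with this ID was emitted into the tuple tree.
    void emitted(long tupleId) {
        ackVal ^= tupleId;
    }

    // Record that the tuple with this ID was acknowledged by a bolt.
    void acked(long tupleId) {
        ackVal ^= tupleId;
    }

    // Each ID is XORed in twice (emit + ack), so the value returns to zero
    // once every emitted tuple has been acked. This assumes IDs are random
    // 64-bit values, so an accidental zero is extremely unlikely.
    boolean fullyProcessed() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        AckTrackerSketch tracker = new AckTrackerSketch();
        tracker.emitted(0x1234L);
        tracker.emitted(0xBEEFL);
        tracker.acked(0xBEEFL);
        System.out.println(tracker.fullyProcessed()); // one tuple still pending
        tracker.acked(0x1234L);
        System.out.println(tracker.fullyProcessed()); // all tuples acked
    }
}
```

The appeal of this design is constant memory per spout tuple, regardless of how many downstream tuples it fans out into.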

Pros

  • Scalable and fault-tolerant architecture
  • Low latency processing with high throughput
  • Supports multiple programming languages
  • Easy to set up and operate

Cons

  • Steep learning curve for complex topologies
  • Limited built-in support for stateful processing
  • Can be resource-intensive for large-scale deployments
  • Requires careful tuning for optimal performance

Code Examples

  1. Creating a basic topology:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-spout", new RandomSentenceSpout());
builder.setBolt("word-splitter", new SplitSentenceBolt()).shuffleGrouping("word-spout");
builder.setBolt("word-counter", new WordCountBolt()).fieldsGrouping("word-splitter", new Fields("word"));
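  2. Defining a custom bolt:
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt extends BaseBasicBolt {
    // Running counts held by this bolt task; the fields grouping on "word" sends each word to one task.
    private final Map<String, Integer> counts = new HashMap<>();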

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null)
            count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
  3. Submitting a topology to a Storm cluster:
Config conf = new Config();
conf.setDebug(true);
conf.setNumWorkers(3);

StormSubmitter.submitTopology("word-count-topology", conf, builder.createTopology());
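
The fieldsGrouping on "word" in the first example is what lets each WordCountBolt task keep its own counts map: tuples carrying the same word always reach the same task. The following is a minimal plain-Java sketch of that routing idea, assuming a simple hash-modulo rule. The class and method names (FieldsGroupingSketch, taskFor, countPerTask) are hypothetical, and this is an illustration rather than Storm's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of fields grouping; not Storm's internal code.
class FieldsGroupingSketch {

    // Pick a task index for a grouping key. floorMod keeps the result
    // non-negative even when hashCode() is negative.
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Route each word to its task and keep one count map per task,
    // mirroring the per-instance counts map in WordCountBolt.
    static List<Map<String, Integer>> countPerTask(List<String> words, int numTasks) {
        List<Map<String, Integer>> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            perTask.add(new HashMap<>());
        }
        for (String word : words) {
            perTask.get(taskFor(word, numTasks)).merge(word, 1, Integer::sum);
        }
        return perTask;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cow", "jumped", "over", "the", "moon");
        List<Map<String, Integer>> perTask = countPerTask(words, 3);
        for (int task = 0; task < perTask.size(); task++) {
            System.out.println("task " + task + " -> " + perTask.get(task));
        }
    }
}
```

Because every occurrence of a word lands in exactly one map, no cross-task merging is needed. A shuffle grouping would instead spread occurrences of the same word across tasks and produce partial counts.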

Getting Started

  1. Add Storm dependency to your project:
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>2.4.0</version>
</dependency>
  2. Create a topology with spouts and bolts:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout-id", new YourCustomSpout());
builder.setBolt("bolt-id", new YourCustomBolt()).shuffleGrouping("spout-id");
  3. Configure and submit the topology:
Config conf = new Config();
conf.setNumWorkers(2);
StormSubmitter.submitTopology("topology-name", conf, builder.createTopology());

Competitor Comparisons

23,929

Apache Flink

Pros of Flink

  • Higher throughput and lower latency for large-scale data processing
  • Built-in support for stateful computations and exactly-once semantics
  • More flexible windowing operations and event time processing

Cons of Flink

  • Steeper learning curve due to more complex API and concepts
  • Less mature ecosystem compared to Storm's long-standing community
  • Requires more resources and careful tuning for optimal performance

Code Comparison

Storm topology definition:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

Flink job definition:

DataStream<String> text = env.addSource(new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), properties));
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new Tokenizer())
    .keyBy(value -> value.f0)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .sum(1);

Both frameworks offer distributed stream processing capabilities, but Flink provides more advanced features for complex event processing and stateful computations. Storm's simplicity makes it easier to get started, while Flink's power comes with a steeper learning curve. The code examples show the difference in API design, with Flink offering a more declarative approach to stream processing.
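
The Flink snippet above groups counts into 5-second tumbling windows. As a rough illustration of what "tumbling" means, the plain-Java sketch below assigns each timestamp to a fixed, non-overlapping window. It assumes non-negative epoch-millisecond timestamps and no window offset; the names (TumblingWindowSketch, windowStart, countPerWindow) are hypothetical and do not come from the Flink API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of tumbling-window assignment; not Flink code.
class TumblingWindowSketch {

    // Start of the window that contains the timestamp.
    // Assumes timestampMs >= 0 and sizeMs > 0.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Count events per window, keyed by window start (sorted for readability).
    static Map<Long, Integer> countPerWindow(long[] timestampsMs, long sizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            counts.merge(windowStart(ts, sizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] events = {1_000, 4_999, 5_000, 7_500, 12_000};
        // prints {0=2, 5000=2, 10000=1}
        System.out.println(countPerWindow(events, 5_000));
    }
}
```

Each event belongs to exactly one window, which is the property that distinguishes tumbling windows from sliding ones.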

40,184

Apache Spark - A unified analytics engine for large-scale data processing

Pros of Spark

  • Higher-level APIs and better support for batch processing
  • More efficient memory usage and faster performance for large-scale data processing
  • Wider ecosystem with libraries for machine learning, graph processing, and SQL

Cons of Spark

  • Steeper learning curve due to more complex architecture
  • Higher resource requirements, especially for smaller datasets
  • Less suitable for real-time stream processing compared to Storm

Code Comparison

Storm example:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

Spark example:

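// Assumes an existing SparkSession named `spark`; the implicits import is required for .as[String].
import spark.implicits._

val lines = spark.readStream.format("socket").option("host", "localhost").option("port", 9999).load()
val words = lines.as[String].flatMap(_.split(" "))
val wordCounts = words.groupBy("value").count()
val query = wordCounts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()  // block until the streaming query stops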

Both examples demonstrate basic stream processing, but Spark's code is more concise and uses higher-level abstractions. Storm's topology is explicitly defined, while Spark's processing is more declarative.

7,828

Apache Beam is a unified programming model for Batch and Streaming data processing.

Pros of Beam

  • Unified programming model for batch and streaming data processing
  • Supports multiple execution engines (Flink, Spark, Dataflow)
  • Extensive set of built-in transforms and I/O connectors

Cons of Beam

  • Steeper learning curve due to more complex API
  • Less mature ecosystem compared to Storm
  • Potentially higher resource requirements for small-scale applications

Code Comparison

Storm:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

Beam:

Pipeline p = Pipeline.create();
p.apply(TextIO.read().from("input.txt"))
 .apply(ParDo.of(new ExtractWordsFn()))
 .apply(Count.<String>perElement())
 .apply(MapElements.into(TypeDescriptors.strings()).via(kv -> kv.getKey() + ": " + kv.getValue()))
 .apply(TextIO.write().to("output.txt"));
p.run().waitUntilFinish();  // nothing executes until the pipeline is run

Both Storm and Beam offer powerful distributed data processing capabilities, but they cater to different use cases. Storm excels in real-time stream processing with low latency, while Beam provides a more versatile approach for both batch and streaming scenarios across multiple execution engines.

4,830

Apache NiFi

Pros of NiFi

  • More user-friendly with a web-based UI for designing and managing data flows
  • Supports a wider range of data formats and protocols out-of-the-box
  • Better suited for ETL and data integration tasks

Cons of NiFi

  • Generally slower processing speed compared to Storm
  • Less suitable for real-time stream processing at massive scale
  • Steeper learning curve for complex data flow configurations

Code Comparison

NiFi uses a declarative approach with XML-based flow configurations:

<processor>
  <id>abc123</id>
  <name>GenerateFlowFile</name>
  <position x="0" y="0"/>
  <config>
    <properties>
      <entry>
        <key>File Size</key>
        <value>1 MB</value>
      </entry>
    </properties>
  </config>
</processor>

Storm uses Java-based topologies with a more programmatic approach:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8)
       .shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12)
       .fieldsGrouping("split", new Fields("word"));

Both projects have their strengths, with NiFi excelling in data integration and Storm in high-throughput stream processing.

28,601

Mirror of Apache Kafka

Pros of Kafka

  • Higher throughput and better scalability for large-scale data streaming
  • Built-in partitioning and replication for fault tolerance and high availability
  • Longer data retention capabilities, allowing for replay and batch processing

Cons of Kafka

  • More complex setup and configuration compared to Storm
  • Fewer real-time processing capabilities, as it is primarily designed for data streaming
  • Limited built-in processing functionality, often requiring additional tools for data transformation

Code Comparison

Storm (Topology definition):

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

Kafka (Producer example):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("topic", "key", "value"));
producer.close();  // flushes buffered records and releases resources

Both Storm and Kafka are powerful tools for distributed data processing, but they serve different primary purposes. Storm excels in real-time stream processing, while Kafka is optimized for high-throughput data streaming and storage.
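
The replay capability listed among Kafka's advantages comes from keeping records in an append-only log and letting each reader track its own position. The plain-Java sketch below illustrates that idea under simplifying assumptions (a single in-memory list, no partitions, no retention limit). The names (ReplayableLogSketch, append, read) are hypothetical and are not part of the Kafka client API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of an offset-addressed log; not Kafka code.
class ReplayableLogSketch {
    private final List<String> records = new ArrayList<>();

    // Append a record and return the offset it was stored at.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Return up to maxRecords starting at the given offset.
    // Reading never removes data, so any offset can be read again later.
    List<String> read(long offset, int maxRecords) {
        if (offset < 0 || offset >= records.size() || maxRecords <= 0) {
            return Collections.emptyList();
        }
        int from = (int) offset;
        int to = Math.min(records.size(), from + maxRecords);
        return new ArrayList<>(records.subList(from, to));
    }

    public static void main(String[] args) {
        ReplayableLogSketch log = new ReplayableLogSketch();
        log.append("a");
        log.append("b");
        log.append("c");
        System.out.println(log.read(1, 10)); // read from offset 1 onward
        System.out.println(log.read(0, 10)); // rewind to offset 0 and replay everything
    }
}
```

A consumer that fails midway can simply resume from its last committed offset, which is why a stream processor such as Storm is often paired with a log like this as its input.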

README

IMPORTANT NOTE!!!

Storm has moved to Apache. The official Storm git repository is now hosted by Apache and is mirrored on GitHub here:

https://github.com/apache/incubator-storm

Contributing

Source code contributions can be submitted either by submitting a pull request or by creating an issue in JIRA and attaching patches.

Migrating Git Repos from nathanmarz/storm to apache/incubator-storm

If you have an existing fork/clone of nathanmarz/storm, you can migrate to apache/incubator-storm by doing the following:

  1. Create a new fork of apache/incubator-storm

  2. Point your existing clone to the new fork:

     git remote remove origin
     git remote add origin git@github.com:username/incubator-storm.git
    

Issue Tracking

The official issue tracker for Storm is Apache JIRA:

https://issues.apache.org/jira/browse/STORM

User Mailing List

Storm users should send messages and subscribe to user@storm.incubator.apache.org.

You can subscribe to this list by sending an email to user-subscribe@storm.incubator.apache.org. Likewise, you can cancel a subscription by sending an email to user-unsubscribe@storm.incubator.apache.org.

You can view the archives of the mailing list here.

Developer Mailing List

Storm developers should send messages and subscribe to dev@storm.incubator.apache.org.

You can subscribe to this list by sending an email to dev-subscribe@storm.incubator.apache.org. Likewise, you can cancel a subscription by sending an email to dev-unsubscribe@storm.incubator.apache.org.

You can view the archives of the mailing list here.

Which list should I send/subscribe to?

If you are using a pre-built binary distribution of Storm, then chances are you should send questions, comments, Storm-related announcements, etc. to user@storm.incubator.apache.org.

If you are building Storm from source, developing new features, or otherwise hacking on the Storm source code, then dev@storm.incubator.apache.org is more appropriate.

What will happen with storm-user@googlegroups.com?

All existing messages will remain archived there, and can be accessed/searched here.

New messages sent to storm-user@googlegroups.com will either be rejected/bounced or replied to with a message to direct the email to the appropriate Apache-hosted group.