Top Related Projects
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
Enterprise Stream Process Engine
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
Apache Flink
Apache Spark - A unified analytics engine for large-scale data processing
Quick Overview
Apache Storm is a distributed real-time computation system for processing large volumes of data with high fault tolerance and guaranteed message processing (at-least-once, when tuples are anchored and acked). It is designed to handle unbounded streams of data and can be used for real-time analytics, online machine learning, continuous computation, and more.
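Storm's processing guarantee rests on tracking each spout tuple's tree of descendant tuples. Storm's documentation describes an acker that XORs the random 64-bit ids of tuples as they are created and acked, and treats the tree as fully processed once that running value returns to zero. The sketch below is a simplified, hypothetical model of that idea in plain Java. The class `XorAckTracker` and its methods are invented for illustration and are not part of Storm's API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical, simplified model of XOR-based ack tracking (not Storm's acker implementation).
final class XorAckTracker {
    // Spout tuple (root) id -> XOR of every tuple id created or acked in its tree.
    private final Map<Long, Long> ackVals = new HashMap<>();

    // A new tuple with id tupleId was emitted into the tree rooted at rootId.
    void emitted(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // A bolt acked tupleId; XOR-ing the same id a second time cancels it out.
    void acked(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // The tree is complete once every created id has also been acked (XOR back to 0).
    boolean isComplete(long rootId) {
        return ackVals.getOrDefault(rootId, 0L) == 0L;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        XorAckTracker tracker = new XorAckTracker();
        long root = 1L;

        long sentence = random.nextLong(); // tuple emitted by the spout
        tracker.emitted(root, sentence);

        long word1 = random.nextLong();    // word tuples anchored to the sentence
        long word2 = random.nextLong();
        tracker.emitted(root, word1);
        tracker.emitted(root, word2);
        tracker.acked(root, sentence);     // the splitter bolt acks its input

        System.out.println(tracker.isComplete(root)); // false: word tuples are still pending
        tracker.acked(root, word1);
        tracker.acked(root, word2);
        System.out.println(tracker.isComplete(root)); // true
    }
}
```

Because `x ^ x == 0`, each id cancels itself once it has been both created and acked. A spurious zero caused by colliding random ids is possible in principle but extremely unlikely with 64-bit values.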
Pros
- Scalable and fault-tolerant architecture
- Low latency processing with high throughput
- Supports multiple programming languages
- Easy to set up and operate
Cons
- Steep learning curve for beginners
- Core spouts and bolts provide at-least-once processing; exactly-once semantics require the higher-level Trident API
- Can be resource-intensive for large-scale deployments
- Requires careful tuning for optimal performance
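In core Storm, a tuple tree that fails or times out is replayed from the spout, so a naive counter can apply the same input twice. Trident, Storm's higher-level API, achieves exactly-once state updates by processing tuples in batches with transaction ids (txids) and storing the last applied txid next to each stored value. The following plain-Java sketch illustrates only that idea. `TxidCountStore` is a hypothetical class, and the approach assumes a replayed batch always carries the same txid and the same tuples:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "store the txid with the value" idea behind Trident's transactional state.
final class TxidCountStore {
    // A stored count plus the id of the batch (transaction) that last updated it.
    private static final class Entry {
        final long count;
        final long lastTxid;

        Entry(long count, long lastTxid) {
            this.count = count;
            this.lastTxid = lastTxid;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Apply one batch's partial count for a key, skipping the update if this txid was already applied.
    void apply(String key, long delta, long txid) {
        Entry current = store.get(key);
        if (current != null && current.lastTxid == txid) {
            return; // replayed batch: the update is already reflected in the stored count
        }
        long base = (current == null) ? 0L : current.count;
        store.put(key, new Entry(base + delta, txid));
    }

    long count(String key) {
        Entry entry = store.get(key);
        return (entry == null) ? 0L : entry.count;
    }

    public static void main(String[] args) {
        TxidCountStore counts = new TxidCountStore();
        counts.apply("storm", 3, 1);
        counts.apply("storm", 2, 2);
        counts.apply("storm", 2, 2); // batch 2 replayed after a failure: ignored
        System.out.println(counts.count("storm")); // 5
    }
}
```

The check only works because batches are applied in order and a given txid always means the same data; without those guarantees, skipping on a matching txid could drop real updates.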
Code Examples
- Creating a basic topology:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-spout", new RandomSentenceSpout());
builder.setBolt("word-splitter", new SplitSentenceBolt()).shuffleGrouping("word-spout");
builder.setBolt("word-counter", new WordCountBolt()).fieldsGrouping("word-splitter", new Fields("word"));
This code sets up a simple topology with a spout that generates random sentences, a bolt that splits sentences into words, and another bolt that counts word occurrences.
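The two groupings route tuples differently. Storm documents shuffle grouping as a random distribution that gives each task an equal number of tuples regardless of content, while `fieldsGrouping` on `"word"` sends every tuple with the same word to the same task, which is what lets a per-task counter see all occurrences of a word. The sketch below models that routing in plain Java under stated assumptions: the class and methods are hypothetical, fields grouping is approximated as hash-modulo on the key, and shuffle grouping is approximated as round-robin:

```java
import java.util.Objects;

// Hypothetical model of grouping-based routing; not Storm's internal grouping code.
final class GroupingSketch {
    // Fields grouping (approximation): hash the grouping key, so equal keys map to the same task.
    static int fieldsGroupingTask(String key, int numTasks) {
        return Math.floorMod(Objects.hashCode(key), numTasks); // floorMod keeps the index non-negative
    }

    // Shuffle grouping (approximation): deal tuples round-robin, ignoring their content.
    static int shuffleGroupingTask(long tupleSeq, int numTasks) {
        return (int) Math.floorMod(tupleSeq, (long) numTasks);
    }

    public static void main(String[] args) {
        String[] words = "the cow jumped over the moon the end".split(" ");
        int numTasks = 4;
        for (int i = 0; i < words.length; i++) {
            System.out.printf("%-6s fields->task %d, shuffle->task %d%n",
                    words[i], fieldsGroupingTask(words[i], numTasks), shuffleGroupingTask(i, numTasks));
        }
    }
}
```

Every occurrence of `the` is routed to the same fields-grouping task, while the shuffle column simply cycles through tasks 0 to 3.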
- Defining a custom bolt:
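import java.util.HashMap;
import java.util.Map;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt extends BaseBasicBolt {
    // In-memory counts for the words routed to this task
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        // Increment the running count, starting from 0 for unseen words
        int count = counts.merge(word, 1, Integer::sum);
        // BaseBasicBolt acks the input tuple automatically after execute() returns
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}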
This code defines a custom bolt that counts word occurrences and emits the word and its count.
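Because the bolt emits after every input, downstream components receive a stream of running totals rather than one final count per word, and the counts live only in that task's memory. Combined with the fields grouping on `word`, each task's map holds the complete count for the words routed to it. The counting logic can be pulled into a plain class and unit-tested without a cluster; `RunningWordCount` and `onWord` below are hypothetical names for such a refactoring:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java stand-in for the bolt's execute() logic, testable without Storm.
final class RunningWordCount {
    // Per-task, in-memory state: lost if the worker restarts (no checkpointing in this sketch).
    private final Map<String, Integer> counts = new HashMap<>();

    // Mirrors execute(): update the count and return the (word, count) pair the bolt would emit.
    Map.Entry<String, Integer> onWord(String word) {
        int count = counts.merge(word, 1, Integer::sum);
        return new SimpleEntry<>(word, count);
    }

    public static void main(String[] args) {
        RunningWordCount bolt = new RunningWordCount();
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String w : new String[] {"apple", "pear", "apple"}) {
            emitted.add(bolt.onWord(w));
        }
        System.out.println(emitted); // [apple=1, pear=1, apple=2]
    }
}
```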
- Submitting a topology to a Storm cluster:
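Config conf = new Config();
conf.setDebug(true);     // logs every emitted tuple: useful while testing, too verbose for production
conf.setNumWorkers(3);   // number of worker processes (JVMs) for this topology
StormSubmitter.submitTopology("word-count-topology", conf, builder.createTopology());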
This code configures and submits the topology to a Storm cluster for execution.
Getting Started
- Install Apache Storm:
wget https://downloads.apache.org/storm/apache-storm-2.4.0/apache-storm-2.4.0.tar.gz
tar -xzf apache-storm-2.4.0.tar.gz
cd apache-storm-2.4.0
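- Start the Storm daemons (Storm requires a running ZooKeeper ensemble, listed under storm.zookeeper.servers in conf/storm.yaml):
bin/storm nimbus &
bin/storm supervisor &
bin/storm ui &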
- Create a Maven project and add the Storm dependency:
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>2.4.0</version>
  <scope>provided</scope>
</dependency>
- Implement your topology and submit it to the cluster:
StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
Competitor Comparisons
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
Pros of Storm (nathanmarz)
- Historical significance as the original Storm project
- Simpler codebase, potentially easier for beginners to understand
- May contain experimental features not present in the Apache version
Cons of Storm (nathanmarz)
- No longer actively maintained or updated
- Lacks many improvements and optimizations found in the Apache version
- Limited community support and contributions
Code Comparison
Storm (nathanmarz):
public class ExclamationBolt extends BaseRichBolt {
    OutputCollector _collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    // execute() and declareOutputFields() omitted for brevity
}
Storm (Apache):
public class ExclamationBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    // execute() and declareOutputFields() omitted for brevity
}
The code structure is similar, but the Apache version uses a parameterized Map<String, Object> instead of a raw Map and a private field without the underscore prefix, following modern Java conventions. The Apache version also tends to have more extensive documentation and additional features not shown in this brief example.
Enterprise Stream Process Engine
Pros of JStorm
- Better performance and lower latency compared to Storm
- Enhanced fault tolerance and reliability
- Improved resource utilization and scheduling
Cons of JStorm
- Less community support and fewer third-party integrations
- Limited documentation and resources in English
- Potential compatibility issues with some Storm topologies
Code Comparison
Storm topology example:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
JStorm topology example:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Config conf = new Config();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());
The topology-building code is identical because JStorm was designed to be compatible with the Storm API; the extra lines in the JStorm example simply run the topology in an in-process LocalCluster, which Storm also provides. JStorm's additional performance-optimization and resource-management features may require extra configuration in the topology setup.
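The third argument to `setSpout` and `setBolt` in these examples is the parallelism hint, the number of executors (threads) for that component, so the topology asks for 5 + 8 + 12 = 25 executors. How those executors land on worker processes depends on the scheduler. The sketch below assumes 3 workers (matching the `setNumWorkers(3)` call earlier) and a simple round-robin placement, which only approximates even scheduling; `ExecutorSpread` is a hypothetical helper:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical round-robin placement of executors onto worker processes.
final class ExecutorSpread {
    // Returns how many executors each worker receives when executors are dealt out in turn.
    static int[] spread(Map<String, Integer> parallelismHints, int numWorkers) {
        int[] perWorker = new int[numWorkers];
        int next = 0;
        for (int hint : parallelismHints.values()) {
            for (int i = 0; i < hint; i++) {
                perWorker[next] += 1;
                next = (next + 1) % numWorkers;
            }
        }
        return perWorker;
    }

    public static void main(String[] args) {
        Map<String, Integer> hints = new LinkedHashMap<>();
        hints.put("spout", 5);
        hints.put("split", 8);
        hints.put("count", 12);
        // 25 executors over 3 workers
        System.out.println(Arrays.toString(spread(hints, 3))); // [9, 8, 8]
    }
}
```

Raising a hint adds threads but not processes; adding workers spreads the same executors across more JVMs.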
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
Pros of Heron
- Better performance and lower latency compared to Storm
- Improved resource isolation and easier debugging
- Backwards compatibility with Storm topologies
Cons of Heron
- Smaller community and ecosystem
- Less mature and fewer production deployments
- Steeper learning curve for new users
Code Comparison
Storm topology definition:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));
Heron topology definition:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));
The code structure for defining topologies is very similar between Storm and Heron, which aligns with Heron's goal of maintaining backwards compatibility with Storm topologies. The main differences lie in the underlying architecture and execution model rather than the API.
Apache Flink
Pros of Flink
- Better performance and lower latency for large-scale data processing
- More comprehensive ecosystem with built-in support for complex event processing and machine learning
- Stronger exactly-once processing semantics and fault tolerance mechanisms
Cons of Flink
- Steeper learning curve due to more complex API and concepts
- Higher memory requirements, especially for large state management
- Less mature ecosystem for certain integrations compared to Storm
Code Comparison
Flink:
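StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.socketTextStream("localhost", 9999);
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new Tokenizer())       // user-defined FlatMapFunction emitting (word, 1)
    .keyBy(value -> value.f0)       // key by the word (positional keyBy(0) is deprecated)
    .sum(1);                        // running sum of the count field
counts.print();
env.execute("Socket WordCount");    // the pipeline only runs once execute() is called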
Storm:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-reader", new WordReader());
builder.setBolt("word-normalizer", new WordNormalizer())
.shuffleGrouping("word-reader");
builder.setBolt("word-counter", new WordCounter())
.fieldsGrouping("word-normalizer", new Fields("word"));
Both examples show basic stream processing setups, but Flink's API is more declarative and offers built-in operations like sum, while Storm requires more explicit bolt implementations for similar functionality.
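`Tokenizer` is not defined in the Flink snippet. Assuming it behaves like the `Tokenizer` in Flink's bundled WordCount example (a user-defined `FlatMapFunction` that lowercases each line, splits it on non-word characters, and emits a `(word, 1)` pair per token), its logic looks roughly like the plain-Java approximation below. `TokenizerSketch` is hypothetical and uses JDK types instead of Flink's `Tuple2` and `Collector`; in the Storm version, the equivalent work would have to be written by hand inside a bolt.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java approximation of a Tokenizer flatMap: one (word, 1) pair per token.
final class TokenizerSketch {
    static List<Map.Entry<String, Integer>> tokenize(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // Lowercase, then split on runs of non-word characters.
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) { // a leading separator yields an empty first token
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("To be, or not to be!"));
        // [to=1, be=1, or=1, not=1, to=1, be=1]
    }
}
```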
Apache Spark - A unified analytics engine for large-scale data processing
Pros of Spark
- Faster processing speed for large-scale data analytics
- More versatile with support for batch processing, interactive queries, and machine learning
- Easier to use with high-level APIs in Java, Scala, Python, and R
Cons of Spark
- Higher memory requirements, which can be costly for large datasets
- Steeper learning curve for beginners due to more complex architecture
- Less suitable for real-time, sub-second latency processing compared to Storm
Code Comparison
Storm example (Java):
public class WordCountBolt extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null)
            count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    // declareOutputFields() omitted for brevity
}
Spark example (Scala):
// assumes `lines` is an RDD[String], for example loaded with sc.textFile
val wordCounts = lines.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
wordCounts.saveAsTextFile("hdfs://...")
The Spark example demonstrates its more concise syntax and higher-level abstractions for data processing tasks, while the Storm example shows its focus on real-time stream processing with fine-grained control over data flow.
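A practical difference between the two snippets is when results appear. The Spark code is a batch job on an RDD: `reduceByKey` produces one final total per word once the whole input has been processed, whereas the Storm bolt emits an updated count for every incoming word. The batch semantics can be mimicked with Java streams, as in this sketch (`BatchWordCount` is hypothetical, and unlike the Scala snippet it drops empty tokens):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical batch word count with Java streams, mirroring flatMap -> map -> reduceByKey.
final class BatchWordCount {
    static Map<String, Long> count(String... lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(line.split(" ")))  // like flatMap(line => line.split(" "))
                .filter(word -> !word.isEmpty())                   // drop empty tokens from repeated spaces
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting())); // like reduceByKey(_ + _)
    }

    public static void main(String[] args) {
        // One final map after all input is consumed, not a stream of updates.
        System.out.println(count("a b a", "b a")); // {a=3, b=2}
    }
}
```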
README
Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use!
The Rationale page explains what Storm is and why it was built. This presentation is also a good introduction to the project.
Storm has a website at storm.apache.org.
Documentation
Documentation and tutorials can be found on the Storm website.
Developers and contributors should also take a look at our Developer documentation.
Getting help
Storm Users
Storm users should send messages and subscribe to user@storm.apache.org.
You can subscribe to this list by sending an email to user-subscribe@storm.apache.org. Likewise, you can cancel a subscription by sending an email to user-unsubscribe@storm.apache.org.
You can also browse the archives of the storm-user mailing list.
Storm Developers
Storm developers should send messages and subscribe to dev@storm.apache.org.
You can subscribe to this list by sending an email to dev-subscribe@storm.apache.org. Likewise, you can cancel a subscription by sending an email to dev-unsubscribe@storm.apache.org.
You can also browse the archives of the storm-dev mailing list.
Storm developers who want to track JIRA issues should subscribe to issues@storm.apache.org.
You can subscribe to this list by sending an email to issues-subscribe@storm.apache.org. Likewise, you can cancel a subscription by sending an email to issues-unsubscribe@storm.apache.org.
You can view the archives of the mailing list here.
Issue tracker
To report a bug, request a feature, or propose an idea, please use Apache Jira. If you do not have an account, you need to create one.
Which list should I send/subscribe to?
If you are using a pre-built binary distribution of Storm, then you should send questions, comments, storm-related announcements, etc. to user@storm.apache.org.
If you are building Storm from source, developing new features, or otherwise hacking on the Storm source code, then dev@storm.apache.org is more appropriate.
If you are a committer, a PMC member, or a contributor who wants to follow up on and participate in Storm development, you should also subscribe to issues@storm.apache.org in addition to dev@storm.apache.org.
What happened with storm-user@googlegroups.com?
All existing messages will remain archived there, and can be accessed/searched here.
License
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The LICENSE and NOTICE files cover the source distributions. The LICENSE-binary and NOTICE-binary files cover the binary distributions. The DEPENDENCY-LICENSES file lists the licenses of all dependencies of Storm, including those not packaged in the source or binary distributions, such as dependencies of optional connector modules.
Acknowledgements
YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.