Top Related Projects
Apache Storm
Apache Flink
Apache Spark - A unified analytics engine for large-scale data processing
Apache Beam is a unified programming model for Batch and Streaming data processing.
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
Quick Overview
JStorm is an open-source, distributed, and fault-tolerant real-time computation system developed by Alibaba. It is designed to process unbounded streams of data at scale, providing a Java-based alternative to Apache Storm with enhanced performance and easier operability.
Pros
- High performance and low latency for real-time data processing
- Improved stability and easier operability compared to Apache Storm
- Seamless integration with other Alibaba ecosystem tools
- Active development and maintenance by Alibaba
Cons
- Less widespread adoption compared to Apache Storm or Apache Flink
- Documentation and community resources primarily in Chinese
- Steeper learning curve for developers not familiar with Storm-like systems
- Limited ecosystem of third-party connectors and libraries
Code Examples
- Creating a basic topology:
// Wire the spout and bolts together; the numeric arguments are parallelism hints.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

// Configure the topology and submit it to the cluster.
Config conf = new Config();
conf.setNumWorkers(3);
StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
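For local testing before a cluster is available, the same topology can also be run in-process. A minimal sketch using the Storm-compatible LocalCluster API, reusing conf and builder from above inside a main method that declares throws Exception:
// Run the topology in a local, in-process cluster for development and testing.
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());
Thread.sleep(60_000);                 // let the topology run for a minute
cluster.killTopology("word-count");   // then stop and clean up
cluster.shutdown();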
- Implementing a custom spout:
public class MySpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Called repeatedly by the framework; emit one tuple per call.
        String message = generateMessage();
        collector.emit(new Values(message));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }

    private String generateMessage() {
        // Placeholder data source; replace with reads from your real input (e.g. a queue).
        return "message-" + System.currentTimeMillis();
    }
}
- Implementing a custom bolt:
public class MyBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String message = input.getString(0);
        String processedMessage = processMessage(message);
        collector.emit(new Values(processedMessage));
        // Acknowledge the tuple so the upstream spout knows it was processed successfully.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("processed_message"));
    }

    private String processMessage(String message) {
        // Placeholder transformation; replace with your real business logic.
        return message.toUpperCase();
    }
}
Getting Started
- Add the JStorm dependency to your Maven pom.xml:
<dependency>
    <groupId>com.alibaba.jstorm</groupId>
    <artifactId>jstorm-core</artifactId>
    <version>2.4.0</version>
</dependency>
- Create a topology class with spouts and bolts.
- Configure and submit the topology:
Config conf = new Config();
conf.setNumWorkers(3);
StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
- Package your application as a JAR file and submit it to the JStorm cluster using the jstorm command-line tool (see the sketch below).
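Putting these steps together, a minimal entry-point class might look like the following sketch. The class, package, and JAR names are illustrative, and MySpout/MyBolt are the example components defined above:
// Minimal sketch of a topology entry point.
public class MyTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new MySpout(), 2);
        builder.setBolt("process", new MyBolt(), 4).shuffleGrouping("spout");

        Config conf = new Config();
        conf.setNumWorkers(3);
        StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
    }
}
// Package the class into a JAR and submit it, for example:
//   jstorm jar my-topology.jar com.example.MyTopology
// (exact arguments depend on your JStorm installation and main class)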
Competitor Comparisons
Apache Storm
Pros of Storm
- Larger and more active community support
- More extensive documentation and resources
- Better integration with other Apache projects
Cons of Storm
- Generally slower performance for certain workloads
- Less optimized for use cases specific to the Chinese tech ecosystem
- Steeper learning curve for beginners
Code Comparison
JStorm:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new TestWordSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Storm:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
The code structure is very similar between JStorm and Storm, with minor differences in class names and import statements. Both use the TopologyBuilder to construct the topology, set spouts and bolts, and define groupings. The main differences lie in the specific implementations of spouts and bolts, which may be optimized differently for each system.
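As a rough illustration of the import differences: JStorm keeps the pre-1.0 backtype.storm package layout, while current Apache Storm releases ship under org.apache.storm (a sketch, not an exhaustive list):
// JStorm (and Storm 0.9.x-era APIs):
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

// Apache Storm 1.0 and later:
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;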
Apache Flink
Pros of Flink
- More active development and larger community support
- Broader ecosystem with extensive libraries and connectors
- Advanced features like stateful stream processing and event time processing
Cons of Flink
- Steeper learning curve due to more complex API
- Higher resource requirements for small-scale applications
Code Comparison
JStorm example:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new TestWordSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Flink example:
// Set up the execution environment and a Kafka source ("properties" holds the Kafka connection settings).
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.addSource(new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), properties));
DataStream<Tuple2<String, Integer>> wordCounts = text
    .flatMap(new Tokenizer())
    .keyBy(value -> value.f0)
    .sum(1);
env.execute("WordCount");
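The Tokenizer used above is a user-defined flat-map function that is not shown in the snippet; a minimal sketch of what it might look like (the class name is assumed from the example):
// Splits each line into words and emits (word, 1) pairs.
public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
    @Override
    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}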
Both frameworks offer distributed stream processing capabilities, but Flink provides more advanced features and a wider range of use cases. JStorm, being more lightweight, may be easier to set up for simpler applications. Flink's programming model is more flexible, supporting both stream and batch processing, while JStorm is primarily focused on stream processing.
Apache Spark - A unified analytics engine for large-scale data processing
Pros of Spark
- Wider ecosystem and community support
- More extensive documentation and learning resources
- Better performance for large-scale data processing and machine learning tasks
Cons of Spark
- Steeper learning curve, especially for complex use cases
- Higher memory requirements, which can be costly for large datasets
- Slower startup time compared to JStorm
Code Comparison
Spark (Scala):
val conf = new SparkConf().setAppName("WordCount")
val sc = new SparkContext(conf)
val textFile = sc.textFile("input.txt")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
JStorm (Java):
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Both frameworks offer distributed processing capabilities, but Spark provides a more comprehensive ecosystem for big data analytics and machine learning. JStorm, on the other hand, focuses on real-time stream processing with lower latency. The code examples demonstrate the different approaches to defining data processing pipelines in each framework.
Apache Beam is a unified programming model for Batch and Streaming data processing.
Pros of Beam
- Supports multiple programming languages (Java, Python, Go)
- Provides a unified model for batch and stream processing
- Offers a rich set of built-in transforms and connectors
Cons of Beam
- Steeper learning curve due to its abstraction layer
- May have higher resource requirements for simple use cases
- Less focused on real-time processing compared to JStorm
Code Comparison
JStorm (Java):
public class ExampleTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
    }
}
Beam (Java):
public class WordCount {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
        p.apply(TextIO.read().from("input.txt"))
            .apply(FlatMapElements.into(TypeDescriptors.strings())
                .via((String line) -> Arrays.asList(line.split("\\W+"))))
            .apply(Count.<String>perElement())
            .apply(MapElements.into(TypeDescriptors.strings())
                .via((KV<String, Long> wordCount) -> wordCount.getKey() + ": " + wordCount.getValue()))
            .apply(TextIO.write().to("output"));
        p.run().waitUntilFinish();
    }
}
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
Pros of Heron
- Better performance and lower latency compared to JStorm
- More flexible and modular architecture, allowing easier customization
- Stronger community support and active development as an Apache project
Cons of Heron
- Steeper learning curve due to more complex architecture
- Less mature and potentially less stable than JStorm
- Smaller ecosystem of third-party integrations and tools
Code Comparison
JStorm topology definition:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new TestWordSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Heron topology definition:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new TestWordSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", "word");
The code structure is almost identical, with only minor syntactic differences; in this comparison the field grouping is written as a plain string rather than a Fields object.
Both projects aim to provide distributed stream processing capabilities, but Heron offers improved performance and flexibility at the cost of increased complexity. JStorm may be a better choice for simpler use cases or when working with existing Alibaba infrastructure.
README
Alibaba Group has donated JStorm project to the Apache Software Foundation as a subproject of Apache Storm. The improvements and features have been merged into Apache Storm. This is an archived and read-only repository which doesn't accept new issues. Please use Apache Storm instead and report issues there.