Top Related Projects
Apache Storm
Apache Flink
Apache Spark - A unified analytics engine for large-scale data processing
Apache Beam is a unified programming model for Batch and Streaming data processing.
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
Quick Overview
JStorm is an open-source, distributed, and fault-tolerant real-time computation system developed by Alibaba. It is designed to process unbounded streams of data at scale, providing a Java-based alternative to Apache Storm with enhanced performance and easier operability.
Pros
- High performance and low latency for real-time data processing
- Improved stability and easier operability compared to Apache Storm
- Seamless integration with other Alibaba ecosystem tools
- Active development and maintenance by Alibaba
Cons
- Less widespread adoption compared to Apache Storm or Apache Flink
- Documentation and community resources primarily in Chinese
- Steeper learning curve for developers not familiar with Storm-like systems
- Limited ecosystem of third-party connectors and libraries
Code Examples
- Creating a basic topology:
// Wire the spout and bolts together; the numeric arguments are parallelism hints.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

// Configure the topology and submit it to the cluster.
Config conf = new Config();
conf.setNumWorkers(3);
StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
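For local testing before a cluster is available, the same topology can also be run in-process. A minimal sketch using the Storm-compatible LocalCluster API, reusing conf and builder from above inside a main method that declares throws Exception:
// Run the topology in a local, in-process cluster for development and testing.
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());
Thread.sleep(60_000);                 // let the topology run for a minute
cluster.killTopology("word-count");   // then stop and clean up
cluster.shutdown();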
- Implementing a custom spout:
public class MySpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Called repeatedly by the framework; emit one tuple per call.
        String message = generateMessage();
        collector.emit(new Values(message));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }

    private String generateMessage() {
        // Placeholder data source; replace with reads from your real input (e.g. a queue).
        return "message-" + System.currentTimeMillis();
    }
}
- Implementing a custom bolt:
public class MyBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String message = input.getString(0);
        String processedMessage = processMessage(message);
        collector.emit(new Values(processedMessage));
        // Acknowledge the tuple so the upstream spout knows it was processed successfully.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("processed_message"));
    }

    private String processMessage(String message) {
        // Placeholder transformation; replace with your real business logic.
        return message.toUpperCase();
    }
}
Getting Started
- Add the JStorm dependency to your Maven pom.xml:
<dependency>
    <groupId>com.alibaba.jstorm</groupId>
    <artifactId>jstorm-core</artifactId>
    <version>2.4.0</version>
</dependency>
- Create a topology class with spouts and bolts.
- Configure and submit the topology:
Config conf = new Config();
conf.setNumWorkers(3);
StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
- Package your application as a JAR file and submit it to the JStorm cluster using the jstorm command-line tool (see the sketch below).
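Putting these steps together, a minimal entry-point class might look like the following sketch. The class, package, and JAR names are illustrative, and MySpout/MyBolt are the example components defined above:
// Minimal sketch of a topology entry point.
public class MyTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new MySpout(), 2);
        builder.setBolt("process", new MyBolt(), 4).shuffleGrouping("spout");

        Config conf = new Config();
        conf.setNumWorkers(3);
        StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
    }
}
// Package the class into a JAR and submit it, for example:
//   jstorm jar my-topology.jar com.example.MyTopology
// (exact arguments depend on your JStorm installation and main class)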
Competitor Comparisons
Apache Storm
Pros of Storm
- Larger and more active community support
- More extensive documentation and resources
- Better integration with other Apache projects
Cons of Storm
- Generally slower performance for certain workloads
- Less optimized for use cases specific to the Chinese tech ecosystem
- Steeper learning curve for beginners
Code Comparison
JStorm:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new TestWordSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Storm:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
The code structure is very similar between JStorm and Storm, with minor differences in class names and import statements. Both use the TopologyBuilder to construct the topology, set spouts and bolts, and define groupings. The main differences lie in the specific implementations of spouts and bolts, which may be optimized differently for each system.
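As a rough illustration of the import differences: JStorm keeps the pre-1.0 backtype.storm package layout, while current Apache Storm releases ship under org.apache.storm (a sketch, not an exhaustive list):
// JStorm (and Storm 0.9.x-era APIs):
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

// Apache Storm 1.0 and later:
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;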
Apache Flink
Pros of Flink
- More active development and larger community support
- Broader ecosystem with extensive libraries and connectors
- Advanced features like stateful stream processing and event time processing
Cons of Flink
- Steeper learning curve due to more complex API
- Higher resource requirements for small-scale applications
Code Comparison
JStorm example:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new TestWordSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Flink example:
// Set up the execution environment and a Kafka source ("properties" holds the Kafka connection settings).
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.addSource(new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), properties));
DataStream<Tuple2<String, Integer>> wordCounts = text
    .flatMap(new Tokenizer())
    .keyBy(value -> value.f0)
    .sum(1);
env.execute("WordCount");
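The Tokenizer used above is a user-defined flat-map function that is not shown in the snippet; a minimal sketch of what it might look like (the class name is assumed from the example):
// Splits each line into words and emits (word, 1) pairs.
public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
    @Override
    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}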
Both frameworks offer distributed stream processing capabilities, but Flink provides more advanced features and a wider range of use cases. JStorm, being more lightweight, may be easier to set up for simpler applications. Flink's programming model is more flexible, supporting both stream and batch processing, while JStorm is primarily focused on stream processing.
Apache Spark - A unified analytics engine for large-scale data processing
Pros of Spark
- Wider ecosystem and community support
- More extensive documentation and learning resources
- Better performance for large-scale data processing and machine learning tasks
Cons of Spark
- Steeper learning curve, especially for complex use cases
- Higher memory requirements, which can be costly for large datasets
- Slower startup time compared to JStorm
Code Comparison
Spark (Scala):
val conf = new SparkConf().setAppName("WordCount")
val sc = new SparkContext(conf)
val textFile = sc.textFile("input.txt")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
JStorm (Java):
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Both frameworks offer distributed processing capabilities, but Spark provides a more comprehensive ecosystem for big data analytics and machine learning. JStorm, on the other hand, focuses on real-time stream processing with lower latency. The code examples demonstrate the different approaches to defining data processing pipelines in each framework.
Apache Beam is a unified programming model for Batch and Streaming data processing.
Pros of Beam
- Supports multiple programming languages (Java, Python, Go)
- Provides a unified model for batch and stream processing
- Offers a rich set of built-in transforms and connectors
Cons of Beam
- Steeper learning curve due to its abstraction layer
- May have higher resource requirements for simple use cases
- Less focused on real-time processing compared to JStorm
Code Comparison
JStorm (Java):
public class ExampleTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
    }
}
Beam (Java):
public class WordCount {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
        p.apply(TextIO.read().from("input.txt"))
            .apply(FlatMapElements.into(TypeDescriptors.strings())
                .via((String line) -> Arrays.asList(line.split("\\W+"))))
            .apply(Count.<String>perElement())
            .apply(MapElements.into(TypeDescriptors.strings())
                .via((KV<String, Long> wordCount) -> wordCount.getKey() + ": " + wordCount.getValue()))
            .apply(TextIO.write().to("output"));
        p.run().waitUntilFinish();
    }
}
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
Pros of Heron
- Better performance and lower latency compared to JStorm
- More flexible and modular architecture, allowing easier customization
- Stronger community support and active development as an Apache project
Cons of Heron
- Steeper learning curve due to more complex architecture
- Less mature and potentially less stable than JStorm
- Smaller ecosystem of third-party integrations and tools
Code Comparison
JStorm topology definition:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new TestWordSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Heron topology definition:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new TestWordSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", "word");
The code structure is almost identical, with only minor syntactic differences; in this comparison the field grouping is written as a plain string rather than a Fields object.
Both projects aim to provide distributed stream processing capabilities, but Heron offers improved performance and flexibility at the cost of increased complexity. JStorm may be a better choice for simpler use cases or when working with existing Alibaba infrastructure.
README
Alibaba Group has donated JStorm project to the Apache Software Foundation as a subproject of Apache Storm. The improvements and features have been merged into Apache Storm. This is an archived and read-only repository which doesn't accept new issues. Please use Apache Storm instead and report issues there.