nifi

Apache NiFi

5,494

2,837


Top Related Projects

logstash

14,590

Logstash - transport and process your logs, events, or other data

airbyte

19,037

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

airflow

41,350

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

prefect

19,925

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

dagster

13,694

An orchestration platform for the development, production, and observation of data assets.

spring-cloud-dataflow

1,132

A microservices-based Streaming and Batch data processing in Cloud Foundry and Kubernetes

Quick Overview

Apache NiFi is a powerful and scalable data integration and distribution system. It provides a web-based user interface for designing, controlling, and monitoring a dataflow, allowing users to automate the movement of data between disparate systems. NiFi supports a wide range of data formats and protocols, making it versatile for various data processing scenarios.

Pros

Highly scalable and can handle large volumes of data
User-friendly web interface for designing and monitoring dataflows
Supports a wide range of data formats and protocols
Provides data provenance and lineage tracking

Cons

Steep learning curve for complex dataflows
Resource-intensive for large-scale deployments
Limited support for real-time streaming compared to some alternatives
Can be complex to set up and configure in distributed environments

Getting Started

To get started with Apache NiFi:

Download the latest version from the Apache NiFi website.
Extract the archive to your desired location.
Open a terminal and navigate to the NiFi directory.
Run the following command to start NiFi:

bin/nifi.sh start

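Open a web browser and go to https://localhost:8443/nifi to access the NiFi web interface. Current releases default to HTTPS on port 8443 and ask for the single-user credentials generated at startup; only older releases served plain HTTP at http://localhost:8080/nifi.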
Begin designing your dataflow by dragging and dropping processors onto the canvas.
Configure processors and connect them to create your desired data pipeline.

For more detailed instructions and documentation, refer to the official Apache NiFi documentation.

Competitor Comparisons

logstash

14,590

Logstash - transport and process your logs, events, or other data

Pros of Logstash

Tighter integration with Elasticsearch and Kibana (ELK stack)
Simpler setup and configuration for log processing pipelines
Extensive plugin ecosystem for input, filter, and output options

Cons of Logstash

Less flexible for complex data routing and transformation scenarios
Limited support for real-time data processing and streaming
Higher resource consumption, especially for large-scale deployments

Code Comparison

NiFi configuration example (simplified, illustrative XML rather than the exact on-disk flow format):

<processor>
  <id>abc123</id>
  <name>ConvertRecord</name>
  <properties>
    <entry>
      <key>record-reader</key>
      <value>csv-reader</value>
    </entry>
  </properties>
</processor>

Logstash configuration example:

input {
  file {
    path => "/var/log/messages"
    type => "syslog"
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
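To make the grok filter above more concrete, here is a rough Python sketch of the same field extraction. The regular expressions are simplified approximations of the grok patterns (SYSLOGTIMESTAMP, SYSLOGHOST, and so on), not their exact definitions, and the function name `parse_syslog` is made up for illustration.

```python
import re

# Simplified stand-ins for the grok patterns used in the Logstash filter.
# These are assumptions for illustration, not Logstash's exact definitions.
SYSLOG_LINE = re.compile(
    r"(?P<syslog_timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_hostname>\S+) "
    r"(?P<syslog_program>.*?)"
    r"(?:\[(?P<syslog_pid>\d+)\])?: "
    r"(?P<syslog_message>.*)"
)


def parse_syslog(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None
```

Lines without a bracketed process ID still match; in that case `syslog_pid` is `None`, mirroring the optional group in the grok expression.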

Both NiFi and Logstash are powerful data ingestion and processing tools, but they cater to different use cases. NiFi excels in complex data flows and distributed processing, while Logstash shines in log processing and integration with the Elastic stack.

airbyte

19,037

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Pros of Airbyte

More focused on data integration and ETL processes
Larger number of pre-built connectors for various data sources and destinations
User-friendly UI for configuring and managing data pipelines

Cons of Airbyte

Less mature project with a smaller community compared to NiFi
More limited in terms of general-purpose data flow and processing capabilities
Fewer advanced features for data transformation and routing

Code Comparison

NiFi uses a Java-based approach for defining processors:

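import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

public class MyProcessor extends AbstractProcessor {
    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        // Process data: typically session.get() a FlowFile, work on it, then session.transfer() it
    }
}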

Airbyte uses a Python-based approach for defining source connectors:

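from airbyte_cdk.sources import AbstractSource

class SourceMyConnector(AbstractSource):
    def check_connection(self, logger, config):
        # Check connection logic; returns (success, error_or_None)
        return True, None

    def streams(self, config):
        # MyStream is a placeholder for a Stream subclass defined elsewhere
        return [MyStream(config)]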

Both projects use different programming languages and paradigms for extending functionality, reflecting their distinct focuses and architectures.

airflow

41,350

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Pros of Airflow

More flexible and programmable workflow management
Better suited for complex data pipelines and ETL processes
Stronger community and ecosystem with many integrations

Cons of Airflow

Steeper learning curve, especially for non-programmers
Can be more resource-intensive and slower for simple workflows
Less intuitive for real-time data processing

Code Comparison

Airflow DAG example:

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x path; airflow.operators.python_operator is the deprecated 1.x module
from datetime import datetime

def print_hello():
    return 'Hello world!'

dag = DAG('hello_world', description='Simple tutorial DAG',
          schedule_interval='0 12 * * *',
          start_date=datetime(2017, 3, 20), catchup=False)

hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)

NiFi flow example (XML representation):

<processor>
  <id>abc123</id>
  <name>GenerateFlowFile</name>
  <style></style>
  <class>org.apache.nifi.processors.standard.GenerateFlowFile</class>
  <bundle>
    <group>org.apache.nifi</group>
    <artifact>nifi-standard-nar</artifact>
    <version>1.13.2</version>
  </bundle>
</processor>

prefect

19,925

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

Pros of Prefect

More modern, Python-based workflow engine with a focus on data science and ML pipelines
Easier to set up and use, with a more intuitive API and better documentation
Cloud-native architecture with built-in support for distributed computing

Cons of Prefect

Less mature ecosystem compared to NiFi, with fewer connectors and processors
Limited support for non-Python workflows and data processing tasks
Smaller community and fewer enterprise-grade features

Code Comparison

Prefect workflow example:

from prefect import flow, task  # Prefect 2+ API; the 1.x "with Flow(...)" form was removed

@task
def extract():
    return [1, 2, 3]

@task
def transform(data):
    return [i * 2 for i in data]

@flow(name="ETL")
def etl():
    data = extract()
    return transform(data)

NiFi workflow (simplified, illustrative XML; property names are not taken from a real flow file):

<processor>
  <name>GenerateFlowFile</name>
  <config>
    <property name="File Size">1KB</property>
    <property name="Batch Size">1</property>
  </config>
</processor>
<processor>
  <name>UpdateAttribute</name>
  <config>
    <property name="attribute-to-update">filename</property>
    <property name="attribute-value">output.txt</property>
  </config>
</processor>

Both Prefect and NiFi are powerful data workflow tools, but they cater to different use cases and skill sets. Prefect is more suitable for Python-centric, data science workflows, while NiFi excels in enterprise-grade, visual data flow management across various systems and formats.

dagster

13,694

An orchestration platform for the development, production, and observation of data assets.

Pros of Dagster

More developer-friendly with Python-based workflows and integration with modern data tools
Better support for testing and local development of data pipelines
Stronger emphasis on data lineage and observability

Cons of Dagster

Less mature ecosystem compared to NiFi's extensive library of processors
Steeper learning curve for non-developers or those unfamiliar with Python
Limited support for real-time data processing compared to NiFi's flow-based architecture

Code Comparison

Dagster pipeline definition:

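from dagster import job  # @pipeline is the legacy decorator; current releases use @job

@job
def my_pipeline():
    # load_data, process_data, and store_results are assumed to be @op functions defined elsewhere
    data = load_data()
    processed = process_data(data)
    store_results(processed)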

NiFi flow definition (XML snippet):

<processor>
  <name>LoadData</name>
  <class>org.apache.nifi.processors.standard.GetFile</class>
</processor>
<processor>
  <name>ProcessData</name>
  <class>org.apache.nifi.processors.standard.ExecuteScript</class>
</processor>

Dagster focuses on defining pipelines in Python code, while NiFi uses a visual drag-and-drop interface with XML-based configurations. Dagster's approach is more programmatic and version-control friendly, while NiFi's visual approach can be more intuitive for non-developers.

spring-cloud-dataflow

1,132

A microservices-based Streaming and Batch data processing in Cloud Foundry and Kubernetes

Pros of Spring Cloud Data Flow

More lightweight and easier to set up for simple data processing pipelines
Better integration with Spring ecosystem and microservices architecture
Supports both stream and batch processing out of the box

Cons of Spring Cloud Data Flow

Less extensive built-in processor library compared to NiFi
Not as feature-rich for complex data flows and transformations
Limited visual interface for pipeline design and monitoring

Code Comparison

Spring Cloud Data Flow stream definition:

stream create --name httpIngest --definition "http | file"

NiFi custom processor skeleton (Java):

public class HttpProcessor extends AbstractProcessor {
    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        // HTTP ingestion logic
    }
}

Both projects aim to simplify data flow and processing, but they take different approaches. Spring Cloud Data Flow focuses on lightweight, microservice-based pipelines, while NiFi offers a more comprehensive and visually-driven solution for complex data flows. Spring Cloud Data Flow excels in Spring ecosystem integration, while NiFi provides a richer set of built-in processors and a more robust visual interface for pipeline design and monitoring.


README

Apache NiFi

Features

Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data.

NiFi automates cybersecurity, observability, event streams, and generative AI data pipelines and distribution for thousands of companies worldwide across every industry.

- Browser User Interface
  - Seamless experience for design, control, and monitoring
  - Runtime management and versioned pipelines
  - Secure by default with HTTPS
- Scalable Processing
  - Configurable prioritization for throughput and latency
  - Guaranteed delivery with retry and backoff strategies
  - Horizontal scaling with clustering
- Provenance Tracking
  - Searchable history with configurable attributes
  - Graph data lineage from source to destination
  - Metadata and content for each processing decision
- Extensible Design
  - Plugin interface for Processors and Controller Services
  - Support for Processors in native Python
  - REST API for orchestration and monitoring
- Secure Configuration
  - Single sign-on with OpenID Connect or SAML 2
  - Flexible authorization policies for role-based access
  - Encrypted communication with TLS and SFTP

Requirements

NiFi supports modern operating systems and requires recent language versions for developing and running the application.

Platform Requirements

Java 21

Optional Dependencies

Python 3.10 or higher

Projects

The source repository includes several component projects.

Please review individual project documentation for additional details.

Getting Started

Project guides provide extensive documentation for installing and extending the application.

Developing

NiFi uses the Maven Wrapper for project development. The Maven Wrapper provides shell scripts that download and cache a selected version of Apache Maven for running build commands.

Developing on Microsoft Windows requires using mvnw.cmd instead of mvnw to run Maven commands.

Building

Run the following command to build project modules using parallel execution:

./mvnw install -T1C

Run the following command to build project modules using parallel execution with static analysis to confirm compliance with code and licensing requirements:

./mvnw install -T1C -P contrib-check

Run the following command to build the application binaries without building other optional modules:

./mvnw install -T1C -am -pl :nifi-assembly
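The three build variants above differ only in their trailing flags, and the wrapper script differs by operating system. A minimal sketch of how a helper script might assemble the command is shown below; the function names `maven_wrapper` and `build_command` and their parameters are hypothetical, and only the flags come from the commands listed above.

```python
import platform


def maven_wrapper(system=None):
    """Pick the Maven Wrapper script: mvnw.cmd on Windows, ./mvnw elsewhere."""
    system = system or platform.system()
    return "mvnw.cmd" if system == "Windows" else "./mvnw"


def build_command(system=None, contrib_check=False, assembly_only=False):
    """Assemble the argument list for one of the documented build variants."""
    cmd = [maven_wrapper(system), "install", "-T1C"]
    if contrib_check:
        # Static analysis for code and licensing compliance
        cmd += ["-P", "contrib-check"]
    if assembly_only:
        # Build nifi-assembly plus the modules it depends on
        cmd += ["-am", "-pl", ":nifi-assembly"]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.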

Binaries

The nifi-assembly module contains the binary distribution.

ls nifi-assembly/target/nifi-*-bin.zip

The nifi-assembly module includes the binary distribution in a directory for local development and testing.

cd nifi-assembly/target/nifi-*-bin/nifi-*/

Running

NiFi provides shell scripts for starting and stopping the system.

Running on Microsoft Windows requires using nifi.cmd instead of nifi.sh for system commands.

Starting

Run the following command to start NiFi from the distribution directory:

./bin/nifi.sh start

Accessing

The default configuration generates a random username and password on startup. NiFi writes the generated credentials to the application log located in logs/nifi-app.log under the NiFi installation directory.

The following command can be used to find the generated credentials on operating systems with grep installed:

grep Generated logs/nifi-app*log

NiFi logs the generated credentials as follows:

Generated Username [USERNAME]
Generated Password [PASSWORD]

The USERNAME will be a random UUID composed of 36 characters. The PASSWORD will be a random string.
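On systems without grep, the same lookup can be sketched in Python. This is an illustration only: it assumes the log lines follow the `Generated Username [...]` and `Generated Password [...]` format shown above and that neither value contains a closing bracket. The function name `find_generated_credentials` is made up for this example.

```python
import re

# Matches the two log line shapes shown above; assumes values contain no "]".
CREDENTIAL_LINE = re.compile(r"Generated (Username|Password) \[([^\]]+)\]")


def find_generated_credentials(log_text):
    """Return {'username': ..., 'password': ...} for whatever the log contains."""
    found = {}
    for kind, value in CREDENTIAL_LINE.findall(log_text):
        # If the log holds several startups, the most recent entry wins.
        found[kind.lower()] = value
    return found
```

A caller would typically read `logs/nifi-app.log` from the NiFi installation directory and pass its text to the function.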

The username and password can be replaced with custom credentials using the following command:

./bin/nifi.sh set-single-user-credentials <username> <password>

NiFi defaults to running on the localhost address with HTTPS on port 8443 at the following URL:

https://localhost:8443/nifi

Browsers will display a warning message indicating a potential security risk due to the self-signed certificate generated during initialization. Production deployments should provision a certificate from a trusted certificate authority and update the NiFi keystore and truststore configuration.
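The same address also serves the REST API mentioned under Features. As a hedged sketch, the snippet below only builds a login request with the Python standard library; it assumes the token endpoint lives at `/nifi-api/access/token` and accepts form-encoded `username` and `password` fields, which should be checked against the REST API documentation for the installed version. The function name `build_token_request` is hypothetical.

```python
import urllib.parse
import urllib.request


def build_token_request(username, password, base_url="https://localhost:8443"):
    """Build (but do not send) a POST request for an access token."""
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/nifi-api/access/token",  # assumed endpoint path
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
    )
```

Sending the request with `urllib.request.urlopen` fails certificate verification against the default self-signed certificate, so a local test needs an `ssl.SSLContext` that trusts that certificate. Production deployments should use a certificate from a trusted authority, as noted above.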

License

Except as otherwise noted this software is licensed under the Apache License, Version 2.0

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See https://www.wassenaar.org for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache NiFi uses the following libraries and frameworks for encrypted communication and storage of sensitive information:
