Top Related Projects
Logstash - transport and process your logs, events, or other data
Airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Dagster - An orchestration platform for the development, production, and observation of data assets.
Spring Cloud Data Flow - Microservices-based streaming and batch data processing for Cloud Foundry and Kubernetes
Quick Overview
Apache NiFi is a powerful and scalable data integration and distribution system. It provides a web-based user interface for designing, controlling, and monitoring a dataflow, allowing users to automate the movement of data between disparate systems. NiFi supports a wide range of data formats and protocols, making it versatile for various data processing scenarios.
Pros
- Highly scalable and can handle large volumes of data
- User-friendly web interface for designing and monitoring dataflows
- Supports a wide range of data formats and protocols
- Provides data provenance and lineage tracking
Cons
- Steep learning curve for complex dataflows
- Resource-intensive for large-scale deployments
- Limited support for real-time streaming compared to some alternatives
- Can be complex to set up and configure in distributed environments
Getting Started
To get started with Apache NiFi:
- Download the latest version from the Apache NiFi website.
- Extract the archive to your desired location.
- Open a terminal and navigate to the NiFi directory.
- Run the following command to start NiFi:
bin/nifi.sh start
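- Open a web browser and go to https://localhost:8443/nifi to access the NiFi web interface. Since NiFi 1.14.0 the default configuration serves the UI over HTTPS on port 8443 and requires the generated login credentials described under Authenticating; older releases used http://localhost:8080/nifi.
- Begin designing your dataflow by dragging and dropping processors onto the canvas.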
- Configure processors and connect them to create your desired data pipeline.
For more detailed instructions and documentation, refer to the official Apache NiFi documentation.
Competitor Comparisons
Logstash - transport and process your logs, events, or other data
Pros of Logstash
- Tighter integration with Elasticsearch and Kibana (ELK stack)
- Simpler setup and configuration for log processing pipelines
- Extensive plugin ecosystem for input, filter, and output options
Cons of Logstash
- Less flexible for complex data routing and transformation scenarios
- Limited support for real-time data processing and streaming
- Higher resource consumption, especially for large-scale deployments
Code Comparison
NiFi configuration example:
<processor>
<id>abc123</id>
<name>ConvertRecord</name>
<properties>
<entry>
<key>record-reader</key>
<value>csv-reader</value>
</entry>
</properties>
</processor>
Logstash configuration example:
input {
  file {
    path => "/var/log/messages"
    type => "syslog"
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
Both NiFi and Logstash are powerful data ingestion and processing tools, but they cater to different use cases. NiFi excels in complex data flows and distributed processing, while Logstash shines in log processing and integration with the Elastic stack.
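To make the grok filter in the Logstash configuration concrete, here is a rough Python regular-expression approximation of the fields that pattern pulls out of a syslog line. It is a sketch, not Logstash's implementation: the real SYSLOGTIMESTAMP, SYSLOGHOST and POSINT grok definitions are stricter, and `parse_syslog` is a hypothetical helper.

```python
import re

# Hand-written approximation of the grok pattern; the real grok definitions
# (SYSLOGTIMESTAMP, SYSLOGHOST, POSINT, ...) are stricter than this.
SYSLOG_LINE = re.compile(
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_hostname>\S+) "
    r"(?P<syslog_program>.*?)(?:\[(?P<syslog_pid>\d+)\])?: "
    r"(?P<syslog_message>.*)"
)


def parse_syslog(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None


print(parse_syslog("Mar  5 14:03:01 myhost sshd[1234]: Accepted publickey for alice"))
```

As with `%{DATA}` in grok, the program name is matched lazily, so the optional `[pid]` suffix is split off when present and `syslog_pid` is `None` when it is not.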
Airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Pros of Airbyte
- More focused on data integration and ETL processes
- Larger number of pre-built connectors for various data sources and destinations
- User-friendly UI for configuring and managing data pipelines
Cons of Airbyte
- Less mature project with a smaller community compared to NiFi
- More limited in terms of general-purpose data flow and processing capabilities
- Fewer advanced features for data transformation and routing
Code Comparison
NiFi uses a Java-based approach for defining processors:
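import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

public class MyProcessor extends AbstractProcessor {
    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        // Called each time the processor is scheduled: read, transform, and transfer FlowFiles here
    }
}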
Airbyte uses a Python-based approach for defining source connectors:
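from airbyte_cdk.sources import AbstractSource

class SourceMyConnector(AbstractSource):
    def check_connection(self, logger, config):
        # Check connection logic; return (True, None) on success or (False, error) on failure
        return True, None
    def streams(self, config):
        # MyStream is a hypothetical stream class defined elsewhere in the connector
        return [MyStream(config)]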
Both projects use different programming languages and paradigms for extending functionality, reflecting their distinct focuses and architectures.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Pros of Airflow
- More flexible and programmable workflow management
- Better suited for complex data pipelines and ETL processes
- Stronger community and ecosystem with many integrations
Cons of Airflow
- Steeper learning curve, especially for non-programmers
- Can be more resource-intensive and slower for simple workflows
- Less intuitive for real-time data processing
Code Comparison
Airflow DAG example:
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path


def print_hello():
    return 'Hello world!'


# Runs daily at 12:00; `schedule` replaces the deprecated `schedule_interval` (Airflow 2.4+)
dag = DAG('hello_world', description='Simple tutorial DAG',
          schedule='0 12 * * *',
          start_date=datetime(2017, 3, 20), catchup=False)
hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)
NiFi flow example (XML representation):
<processor>
<id>abc123</id>
<name>GenerateFlowFile</name>
<style></style>
<class>org.apache.nifi.processors.standard.GenerateFlowFile</class>
<bundle>
<group>org.apache.nifi</group>
<artifact>nifi-standard-nar</artifact>
<version>1.13.2</version>
</bundle>
</processor>
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Pros of Prefect
- More modern, Python-based workflow engine with a focus on data science and ML pipelines
- Easier to set up and use, with a more intuitive API and better documentation
- Cloud-native architecture with built-in support for distributed computing
Cons of Prefect
- Less mature ecosystem compared to NiFi, with fewer connectors and processors
- Limited support for non-Python workflows and data processing tasks
- Smaller community and fewer enterprise-grade features
Code Comparison
Prefect workflow example:
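from prefect import flow, task


@task
def extract():
    return [1, 2, 3]


@task
def transform(data):
    return [i * 2 for i in data]


# Prefect 2+ replaces the 1.x `with Flow("ETL") as flow:` block with a decorated function
@flow(name="ETL")
def etl():
    data = extract()
    return transform(data)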
NiFi workflow (XML configuration):
<processor>
<name>GenerateFlowFile</name>
<config>
<property name="File Size">1KB</property>
<property name="Batch Size">1</property>
</config>
</processor>
<processor>
<name>UpdateAttribute</name>
<config>
<!-- UpdateAttribute uses dynamic properties: the property name is the attribute to set -->
<property name="filename">output.txt</property>
</config>
</processor>
Both Prefect and NiFi are powerful data workflow tools, but they cater to different use cases and skill sets. Prefect is more suitable for Python-centric, data science workflows, while NiFi excels in enterprise-grade, visual data flow management across various systems and formats.
Dagster - An orchestration platform for the development, production, and observation of data assets.
Pros of Dagster
- More developer-friendly with Python-based workflows and integration with modern data tools
- Better support for testing and local development of data pipelines
- Stronger emphasis on data lineage and observability
Cons of Dagster
- Less mature ecosystem compared to NiFi's extensive library of processors
- Steeper learning curve for non-developers or those unfamiliar with Python
- Limited support for real-time data processing compared to NiFi's flow-based architecture
Code Comparison
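Dagster job definition (jobs replaced the older @pipeline API):
from dagster import job


@job
def my_pipeline():
    # load_data, process_data and store_results are assumed to be @op functions defined elsewhere
    data = load_data()
    processed = process_data(data)
    store_results(processed)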
NiFi flow definition (XML snippet):
<processor>
<name>LoadData</name>
<class>org.apache.nifi.processors.standard.GetFile</class>
</processor>
<processor>
<name>ProcessData</name>
<class>org.apache.nifi.processors.standard.ExecuteScript</class>
</processor>
Dagster focuses on defining pipelines in Python code, while NiFi uses a visual drag-and-drop interface with XML-based configurations. Dagster's approach is more programmatic and version-control friendly, while NiFi's visual approach can be more intuitive for non-developers.
Spring Cloud Data Flow - Microservices-based streaming and batch data processing for Cloud Foundry and Kubernetes
Pros of Spring Cloud Data Flow
- Tighter integration with Spring ecosystem and microservices architecture
- Better support for cloud-native deployments and containerization
- More flexible and extensible for custom development
Cons of Spring Cloud Data Flow
- Steeper learning curve for developers not familiar with Spring
- Fewer out-of-the-box processors and connectors compared to NiFi
- Requires more coding and configuration for complex workflows
Code Comparison
Spring Cloud Data Flow:
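import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;

// A short-lived Spring Cloud Task that Spring Cloud Data Flow can launch
@EnableTask
@SpringBootApplication
public class MyTaskApplication {
    public static void main(String[] args) {
        SpringApplication.run(MyTaskApplication.class, args);
    }
}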
NiFi:
<processor>
  <id>abc123</id>
  <class>org.apache.nifi.processors.standard.ExecuteProcess</class>
  <property name="Command">echo</property>
  <property name="Command Arguments">Hello, NiFi!</property>
</processor>
Spring Cloud Data Flow uses Java annotations and Spring Boot for task definition, while NiFi relies on XML configuration for processor setup. Spring Cloud Data Flow's approach is more code-centric, offering greater flexibility for custom logic, while NiFi's XML configuration is more declarative and easier for non-developers to understand and modify.
README
Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data.
Table of Contents
- Features
- Requirements
- Getting Started
- MiNiFi subproject
- Registry subproject
- Getting Help
- Documentation
- License
- Export Control
Features
Apache NiFi was made for dataflow. It supports highly configurable directed graphs of data routing, transformation, and system mediation logic. Some of its key features include:
- Web-based user interface
    - Seamless experience for design, control, and monitoring
    - Multi-tenant user experience
- Highly configurable
    - Loss tolerant vs guaranteed delivery
    - Low latency vs high throughput
    - Dynamic prioritization
    - Flows can be modified at runtime
    - Back pressure
    - Scales up to leverage full machine capability
    - Scales out with zero-leader clustering model
- Data Provenance
    - Track dataflow from beginning to end
- Designed for extension
    - Build your own processors and more
    - Enables rapid development and effective testing
- Secure
    - SSL, SSH, HTTPS, encrypted content, etc...
    - Pluggable fine-grained role-based authentication/authorization
    - Multiple teams can manage and share specific portions of the flow
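Back pressure, listed above, can be pictured with a small model. The sketch below is a hypothetical Python illustration, not NiFi code: `Connection` and `FlowFile` are made-up classes, and the defaults echo NiFi's usual per-connection thresholds of 10,000 FlowFiles and 1 GB. Once either threshold is reached the queue refuses new work, which in NiFi corresponds to the upstream processor no longer being scheduled until the queue drains.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    # Hypothetical stand-in for a NiFi FlowFile: only its content size matters here.
    size_bytes: int


@dataclass
class Connection:
    # Hypothetical model of a connection queue; defaults echo NiFi's usual
    # thresholds of 10,000 objects and 1 GB.
    object_threshold: int = 10_000
    size_threshold_bytes: int = 1024 ** 3
    queue: deque = field(default_factory=deque)

    def queued_bytes(self):
        return sum(ff.size_bytes for ff in self.queue)

    def is_back_pressured(self):
        # Reaching either threshold is enough to apply back pressure.
        return (len(self.queue) >= self.object_threshold
                or self.queued_bytes() >= self.size_threshold_bytes)

    def offer(self, flowfile):
        """Enqueue a FlowFile; return False (refuse it) while back pressure applies."""
        if self.is_back_pressured():
            return False
        self.queue.append(flowfile)
        return True

    def poll(self):
        """Hand the oldest queued FlowFile downstream, or return None if empty."""
        return self.queue.popleft() if self.queue else None
```

A downstream consumer that keeps calling `poll()` drains the queue, and `offer()` starts succeeding again once both the count and the queued size fall below their thresholds.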
Minimum Requirements
- JDK 21
- Apache Maven 3.9.6
Getting Started
Read through the quickstart guide for development. It will include information on getting a local copy of the source, give pointers on issue tracking, and provide some warnings about common problems with development environments.
For a more comprehensive guide to development and information about contributing to the project read through the NiFi Developer's Guide.
Building
Run the following Maven command to build standard project modules using parallel execution:
./mvnw clean install -T2C
Run the following Maven command to build project modules with static analysis to confirm compliance with code and licensing requirements:
./mvnw clean install -T2C -P contrib-check
Building on Microsoft Windows requires using mvnw.cmd instead of mvnw to run the Maven Wrapper.
Deploying
Change directories to nifi-assembly. The target directory contains binary archives.
laptop:nifi myuser$ cd nifi-assembly
laptop:nifi-assembly myuser$ ls -lhd target/nifi*
drwxr-xr-x 3 myuser mygroup 102B Apr 30 00:29 target/nifi-1.0.0-SNAPSHOT-bin
-rw-r--r-- 1 myuser mygroup 144M Apr 30 00:30 target/nifi-1.0.0-SNAPSHOT-bin.tar.gz
-rw-r--r-- 1 myuser mygroup 144M Apr 30 00:30 target/nifi-1.0.0-SNAPSHOT-bin.zip
Copy the nifi-VERSION-bin.tar.gz or nifi-VERSION-bin.zip to a separate deployment directory. Extracting the distribution will create a new directory named for the version.
laptop:nifi-assembly myuser$ mkdir ~/example-nifi-deploy
laptop:nifi-assembly myuser$ tar xzf target/nifi-*-bin.tar.gz -C ~/example-nifi-deploy
laptop:nifi-assembly myuser$ ls -lh ~/example-nifi-deploy/
total 0
drwxr-xr-x 10 myuser mygroup 340B Apr 30 01:06 nifi-1.0.0-SNAPSHOT
Starting
Change directories to the deployment location and run the following command to start NiFi.
laptop:~ myuser$ cd ~/example-nifi-deploy/nifi-*
laptop:nifi-1.0.0-SNAPSHOT myuser$ ./bin/nifi.sh start
Running bin/nifi.sh start starts NiFi in the background and exits. Use --wait-for-init with an optional timeout in seconds to wait for a complete startup before exiting.
laptop:nifi-1.0.0-SNAPSHOT myuser$ ./bin/nifi.sh start --wait-for-init 120
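Scripts that drive NiFi from the outside sometimes need a similar guarantee of their own. The following Python sketch polls a caller-supplied readiness check until it succeeds or a timeout expires; `wait_until_ready` is a hypothetical helper, not part of NiFi, and how you probe (for example, requesting https://localhost:8443/nifi) is left to the caller. The `clock` and `sleep` parameters exist only so the loop can be tested without real waiting.

```python
import time


def wait_until_ready(probe, timeout_seconds=120, interval_seconds=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `probe()` until it returns True or the timeout expires.

    `probe` is any zero-argument callable that reports whether the server
    answered. Returns True on success, False if the deadline passes first.
    """
    deadline = clock() + timeout_seconds
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused or similar (URLError is an OSError): not up yet
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

Treating `OSError` as "not ready yet" matters because a server that is still starting usually refuses connections rather than returning an error page.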
Authenticating
The default configuration generates a random username and password on startup. NiFi writes the generated credentials to the application log located in logs/nifi-app.log under the NiFi installation directory.
The following command can be used to find the generated credentials on operating systems with grep installed:
laptop:nifi-1.0.0-SNAPSHOT myuser$ grep Generated logs/nifi-app*log
NiFi logs the generated credentials as follows:
Generated Username [USERNAME]
Generated Password [PASSWORD]
The USERNAME will be a random UUID composed of 36 characters. The PASSWORD will be a random string composed of 32 characters. The generated credentials will be stored in conf/login-identity-providers.xml with the password stored using bcrypt hashing. Record these credentials in a secure location for access to NiFi.
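When scripting a first login, the grep step can be done in Python instead. This sketch assumes the log lines contain exactly the `Generated Username [...]` and `Generated Password [...]` text shown above; `find_generated_credentials` is a hypothetical helper.

```python
import re

# Matches the logged lines, e.g. "Generated Username [USERNAME]"; any log prefix is ignored.
CREDENTIAL_LINE = re.compile(r"Generated (Username|Password) \[([^\]]+)\]")


def find_generated_credentials(log_text):
    """Return (username, password) found in nifi-app.log text, or None if absent.

    If several pairs are present, the last occurrence of each wins.
    """
    found = {}
    for kind, value in CREDENTIAL_LINE.findall(log_text):
        found[kind] = value
    if "Username" in found and "Password" in found:
        return found["Username"], found["Password"]
    return None
```

For example, `find_generated_credentials(Path("logs/nifi-app.log").read_text())` (with `from pathlib import Path`), run from the NiFi installation directory, returns the pair, or `None` if startup has not logged it yet.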
The random username and password can be replaced with custom credentials using the following command:
./bin/nifi.sh set-single-user-credentials <username> <password>
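If you would rather choose credentials than keep the generated ones, a sketch like the following creates values of the same shape (a 36-character UUID username and a 32-character random password) and assembles the command above as an argument list. `generate_credentials` and `set_credentials_command` are hypothetical helpers, and the letters-and-digits alphabet is an assumption, not a documented NiFi password rule.

```python
import secrets
import string
import uuid


def generate_credentials(password_length=32):
    """Return (username, password) shaped like the generated credentials described above."""
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(password_length))
    return str(uuid.uuid4()), password  # str(uuid4()) is always 36 characters


def set_credentials_command(username, password, nifi_home="."):
    """Build the argv for `bin/nifi.sh set-single-user-credentials <username> <password>`."""
    return [f"{nifi_home}/bin/nifi.sh", "set-single-user-credentials", username, password]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. As with the shell form, the password appears on the command line, so other users may see it in process listings while the command runs.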
Running
Open the following link in a web browser to access NiFi: https://localhost:8443/nifi
The web browser will display a warning message indicating a potential security risk due to the self-signed certificate NiFi generated during initialization. Accepting the potential security risk and continuing to load the interface is an option for initial development installations. Production deployments should provision a certificate from a trusted certificate authority and update the NiFi keystore and truststore configuration.
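Scripts and API clients face the same certificate decision as the browser. With Python's standard `ssl` module it looks roughly like this; `make_ssl_context` is a hypothetical helper, and `ca_file` is assumed to be a PEM file for the certificate authority that signed NiFi's certificate.

```python
import ssl


def make_ssl_context(ca_file=None, allow_self_signed=False):
    """Build an SSL context for HTTPS calls to NiFi.

    Production: pass `ca_file` so the server certificate is verified normally
    (None falls back to the system's default trusted CAs).
    Local development only: `allow_self_signed=True` disables verification,
    the scripted equivalent of accepting the browser warning.
    """
    if allow_self_signed:
        context = ssl.create_default_context()
        context.check_hostname = False  # must be turned off before CERT_NONE is allowed
        context.verify_mode = ssl.CERT_NONE
        return context
    return ssl.create_default_context(cafile=ca_file)
```

Keeping verification on by default means the insecure path has to be requested explicitly, which makes it harder to ship to production by accident.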
Accessing NiFi after accepting the self-signed certificate will display the login screen.
Using the generated credentials, enter the generated username in the User field and the generated password in the Password field, then select LOG IN to access the system.
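The REST API offers a programmatic counterpart to the login screen: `POST /nifi-api/access/token` accepts the username and password as form fields and returns a JWT that later calls send as a Bearer token. The sketch below only builds the request; `build_token_request` and `bearer_header` are hypothetical helpers, and the base URL assumes the default https://localhost:8443.

```python
import urllib.parse
import urllib.request


def build_token_request(username, password, base_url="https://localhost:8443"):
    """Build (but do not send) a form-encoded login request for the NiFi REST API."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/nifi-api/access/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def bearer_header(token):
    """Header to attach to subsequent REST API requests."""
    return {"Authorization": f"Bearer {token}"}
```

To send it, pass the request to `urllib.request.urlopen` together with an SSL context that trusts NiFi's certificate (or, for a local development install only, one with verification disabled), then read the response body as the token.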
Configuring
The NiFi User Guide describes how to build a data flow.
Stopping
Run the following command to stop NiFi:
laptop:~ myuser$ cd ~/example-nifi-deploy/nifi-*
laptop:nifi-1.0.0-SNAPSHOT myuser$ ./bin/nifi.sh stop
MiNiFi subproject
MiNiFi is a child project effort of Apache NiFi. It is a complementary data collection approach that supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation.
Specific goals for MiNiFi comprise:
- small and lightweight footprint
- central management of agents
- generation of data provenance
- integration with NiFi for follow-on dataflow management and full chain of custody of information
MiNiFi's role is best understood from the perspective of an agent acting immediately at, or directly adjacent to, source sensors, systems, or servers.
To run:
- Change directory to 'minifi-assembly'. In the target directory, there should be a build of minifi.
  $ cd minifi-assembly
  $ ls -lhd target/minifi*
  drwxr-xr-x 3 user staff 102B Jul 6 13:07 minifi-1.14.0-SNAPSHOT-bin
  -rw-r--r-- 1 user staff 39M Jul 6 13:07 minifi-1.14.0-SNAPSHOT-bin.tar.gz
  -rw-r--r-- 1 user staff 39M Jul 6 13:07 minifi-1.14.0-SNAPSHOT-bin.zip
- For testing ongoing development you could use the already unpacked build present in the directory named "minifi-version-bin", where version is the current project version. To deploy in another location make use of either the tarball or zipfile and unpack them wherever you like. The distribution will be within a common parent directory named for the version.
  $ mkdir ~/example-minifi-deploy
  $ tar xzf target/minifi-*-bin.tar.gz -C ~/example-minifi-deploy
  $ ls -lh ~/example-minifi-deploy/
  total 0
  drwxr-xr-x 10 user staff 340B Jul 6 01:06 minifi-1.14.0-SNAPSHOT
To run MiNiFi:
- Change directory to the location where you installed MiNiFi and run it.
  $ cd ~/example-minifi-deploy/minifi-*
  $ ./bin/minifi.sh start
- View the logs located in the logs folder of the installation directory.
  $ tail -F ~/example-minifi-deploy/minifi-*/logs/minifi-app.log
- For help building your first data flow and sending data to a NiFi instance, see the System Admin Guide located in the docs folder or make use of the minifi-toolkit.
- If you are testing ongoing development, you will likely want to stop your instance.
  $ cd ~/example-minifi-deploy/minifi-*
  $ ./bin/minifi.sh stop
Docker Build
To build:
- Run a full NiFi build (see above for instructions). Then, from the minifi/ subdirectory, execute mvn -P docker clean install. This will run the full build, create a Docker image based on it, and run docker-compose integration tests. After it completes successfully, you should have an apacheminifi:${minifi.version} image that can be started with the following command (replacing ${minifi.version} with the current Maven version of your branch):
docker run -d -v YOUR_CONFIG.YML:/opt/minifi/minifi-${minifi.version}/conf/config.yml apacheminifi:${minifi.version}
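Substituting `${minifi.version}` in two places by hand is easy to get wrong, so a script might assemble the command instead. A minimal Python sketch, where `docker_run_command` is a hypothetical helper and the image name and container path are copied from the command above:

```python
import os


def docker_run_command(minifi_version, config_path):
    """Assemble the `docker run` argv above with ${minifi.version} filled in.

    `config_path` is your MiNiFi config.yml on the host (YOUR_CONFIG.YML above).
    """
    # Bind-mount sources are safest as absolute host paths.
    host_config = os.path.abspath(config_path)
    container_config = f"/opt/minifi/minifi-{minifi_version}/conf/config.yml"
    image = f"apacheminifi:{minifi_version}"
    return ["docker", "run", "-d", "-v", f"{host_config}:{container_config}", image]
```

The list can be handed to `subprocess.run(..., check=True)`; keeping the version in one variable guarantees the image tag and the in-container path stay in sync.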
Registry subproject
Registry, a subproject of Apache NiFi, is a complementary application that provides a central location for storage and management of shared resources across one or more instances of NiFi and/or MiNiFi.
Getting Registry Started
1) Build NiFi (see Getting Started for NiFi), or build only the Registry subproject:
cd nifi/nifi-registry
mvn clean install
If you wish to enable style and license checks, specify the contrib-check profile:
mvn clean install -Pcontrib-check
2) Start Registry
cd nifi-registry/nifi-registry-assembly/target/nifi-registry-<VERSION>-bin/nifi-registry-<VERSION>/
./bin/nifi-registry.sh start
Note that the application web server can take a while to load before it is accessible.
- Accessing the application web UI
With the default settings, the application UI will be available at http://localhost:18080/nifi-registry
- Accessing the application REST API
If you wish to test against the application REST API, you can access the REST API directly. With the default settings, the base URL of the REST API will be at http://localhost:18080/nifi-registry-api. A UI for testing the REST API will be available at http://localhost:18080/nifi-registry-api/swagger/ui.html
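For a quick scripted check of the REST API, the sketch below lists buckets using only Python's standard library. It assumes an unsecured Registry with the default settings and the Registry's `GET /buckets` resource under the base URL above; `list_buckets` and `parse_buckets` are hypothetical helpers, and only the `identifier` and `name` fields of each bucket are used.

```python
import json
import urllib.request

REGISTRY_API = "http://localhost:18080/nifi-registry-api"  # default base URL from the text


def parse_buckets(payload):
    """Map bucket identifier -> name from a GET /buckets JSON response body."""
    return {bucket["identifier"]: bucket["name"] for bucket in json.loads(payload)}


def list_buckets(base_url=REGISTRY_API, timeout=10):
    """Fetch all buckets from an unsecured Registry and return {identifier: name}."""
    with urllib.request.urlopen(f"{base_url}/buckets", timeout=timeout) as response:
        return parse_buckets(response.read().decode("utf-8"))
```

Splitting parsing from fetching keeps the JSON handling testable without a running Registry.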
- Accessing the application logs
Logs will be available in logs/nifi-registry-app.log
Database Testing
In order to ensure that NiFi Registry works correctly against different relational databases, the existing integration tests can be run against different databases by leveraging the Testcontainers framework.
Spring profiles are used to control the DataSource factory that will be made available to the Spring application context. DataSource factories are provided that use the Testcontainers framework to start a Docker container for a given database and create a corresponding DataSource. If no profile is specified then an H2 DataSource will be used by default and no Docker containers are required.
Assuming Docker is running on the system where the build is running, then the following commands can be run:
Target Database | Build Command |
---|---|
All supported | mvn verify -Ptest-all-dbs |
H2 (default) | mvn verify |
MariaDB 10.3 | mvn verify -Pcontrib-check -Dspring.profiles.active=mariadb-10-3 |
MySQL 8 | mvn verify -Pcontrib-check -Dspring.profiles.active=mysql-8 |
PostgreSQL 10 | mvn verify -Dspring.profiles.active=postgres-10 |
For a full list of the available DataSource factories, consult the nifi-registry-test module.
Getting Help
If you have questions, you can reach out to our mailing list: dev@nifi.apache.org (archive). For more interactive discussions, community members can often be found in the following locations:
- Apache NiFi Slack Workspace: https://apachenifi.slack.com/
  New users can join the workspace using the workspace invite link.
To submit a feature request or bug report, please file a Jira at https://issues.apache.org/jira/projects/NIFI/issues. If this is a security vulnerability report, please email security@nifi.apache.org directly and review the Apache NiFi Security Vulnerability Disclosure and Apache Software Foundation Security processes first.
Documentation
See https://nifi.apache.org/ for the latest NiFi documentation.
See https://nifi.apache.org/minifi and https://cwiki.apache.org/confluence/display/MINIFI for the latest MiNiFi-specific documentation.
See https://nifi.apache.org/registry for the latest Registry-specific documentation.
License
Except as otherwise noted this software is licensed under the Apache License, Version 2.0
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Export Control
This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See https://www.wassenaar.org/ for more information.
The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.
The following provides more details on the included cryptographic software:
Apache NiFi uses BouncyCastle, JCraft Inc., and the built-in Java cryptography libraries for SSL, SSH, and the protection of sensitive configuration parameters. See
- https://bouncycastle.org/about.html
- http://www.jcraft.com/c-info.html
- https://www.oracle.com/corporate/security-practices/corporate/governance/global-trade-compliance.html
for more details on each of these libraries' cryptography features.