
PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.


Top Related Projects

  • Apache Airflow (36,173 stars): A platform to programmatically author, schedule, and monitor workflows
  • Dagster (11,125 stars): An orchestration platform for the development, production, and observation of data assets
  • Great Expectations: Always know what to expect from your data
  • Kedro (9,823 stars): A toolbox for production-ready data science, applying software engineering best practices to create reproducible, maintainable, and modular data engineering and data science pipelines
  • Kubeflow Pipelines: Machine Learning Pipelines for Kubeflow
  • Luigi (17,681 stars): A Python module for building complex pipelines of batch jobs, with dependency resolution, workflow management, visualization, and built-in Hadoop support

Quick Overview

Prefect is an open-source workflow management system designed to build, run, and monitor data pipelines. It provides a flexible and scalable platform for orchestrating complex workflows, handling failures gracefully, and offering real-time visibility into task execution.

Pros

  • Highly customizable and extensible, allowing users to adapt it to various use cases
  • Robust error handling and retry mechanisms for improved reliability (see the sketch after this list)
  • Supports both local and distributed execution environments
  • Comprehensive dashboard for monitoring and managing workflows
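
For example, retry behavior is configured directly on the task decorator. A minimal sketch (the retry counts and function name here are illustrative):

from prefect import task

@task(retries=3, retry_delay_seconds=10)
def call_flaky_api():
    # Prefect re-runs this task up to 3 times, waiting 10 seconds between attempts
    ...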

Cons

  • Steeper learning curve compared to some simpler workflow tools
  • Documentation can be overwhelming for beginners
  • Some advanced features require the commercial version (Prefect Cloud)

Code Examples

  1. Defining a simple task:
from prefect import task

@task
def add_numbers(x, y):
    return x + y
  2. Creating a flow with multiple tasks:
from prefect import flow, task

@task
def fetch_data():
    return [1, 2, 3, 4, 5]

@task
def process_data(data):
    return [x * 2 for x in data]

@flow
def my_flow():
    data = fetch_data()
    processed = process_data(data)
    print(f"Processed data: {processed}")
  3. Running a flow with parameters:
from prefect import flow

@flow
def greet(name: str):
    print(f"Hello, {name}!")

if __name__ == "__main__":
    greet("Alice")

Getting Started

To get started with Prefect:

  1. Install Prefect:
pip install prefect
  2. Create a simple flow:
from prefect import flow, task

@task
def say_hello(name):
    print(f"Hello, {name}!")

@flow
def hello_flow(name: str):
    say_hello(name)

if __name__ == "__main__":
    hello_flow("World")
  3. Run the flow:
python your_flow_file.py

For more advanced usage, including scheduling and deploying flows, refer to the Prefect documentation.
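
As a preview, a flow can be scheduled by serving it as a deployment. A minimal sketch (the deployment name and cron string are illustrative; the serve API appears again in the README section below):

from prefect import flow

@flow
def hello_flow(name: str = "World"):
    print(f"Hello, {name}!")

if __name__ == "__main__":
    # Serve the flow as a deployment that runs at the top of every hour
    hello_flow.serve(name="hello-hourly", cron="0 * * * *")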

Competitor Comparisons

Apache Airflow (36,173 stars)

A platform to programmatically author, schedule, and monitor workflows.

Pros of Airflow

  • Mature ecosystem with extensive community support and integrations
  • Rich UI for monitoring and managing workflows
  • Robust scheduling capabilities with cron-like syntax

Cons of Airflow

  • Steeper learning curve and more complex setup
  • Less flexibility in task dependencies and flow control
  • Heavier resource requirements, especially for small-scale projects

Code Comparison

Airflow DAG definition:

from airflow import DAG
from airflow.operators.python import PythonOperator  # current import path in Airflow 2+
from datetime import datetime

def hello_world():
    print("Hello, World!")

dag = DAG('hello_world', start_date=datetime(2023, 1, 1))
PythonOperator(task_id='hello_task', python_callable=hello_world, dag=dag)

Prefect flow definition (Prefect 2+ style; the 1.x `with Flow(...)` API has been removed):

from prefect import flow, task

@task
def hello_world():
    print("Hello, World!")

@flow(name="hello-flow")
def hello_flow():
    hello_world()

hello_flow()

Both Airflow and Prefect are powerful workflow orchestration tools, but they differ in their approach and complexity. Airflow offers a more comprehensive solution for large-scale, complex workflows, while Prefect provides a more modern, flexible, and user-friendly experience, especially for smaller projects or those requiring more dynamic task dependencies.

Dagster (11,125 stars)

An orchestration platform for the development, production, and observation of data assets.

Pros of Dagster

  • More comprehensive asset-based orchestration, allowing for better data lineage tracking (sketched after this list)
  • Stronger focus on software engineering practices, with better support for testing and local development
  • More flexible execution engine, supporting various compute environments out of the box
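
To illustrate the first point, a minimal sketch of Dagster's software-defined assets (the asset names are illustrative):

from dagster import asset

@asset
def raw_data():
    return [1, 2, 3]

@asset
def doubled_data(raw_data):
    # Dagster infers the dependency on raw_data from the parameter name
    return [x * 2 for x in raw_data]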

Cons of Dagster

  • Steeper learning curve due to more complex concepts and abstractions
  • Less extensive integration ecosystem compared to Prefect
  • Potentially more verbose code for simple workflows

Code Comparison

Dagster:

from dagster import job, op

@op
def hello():
    return "Hello, World!"

@job
def hello_job():
    hello()

Prefect:

from prefect import flow, task

@task
def hello():
    return "Hello, World!"

@flow(name="hello-flow")
def hello_flow():
    hello()

Both Dagster and Prefect are powerful workflow orchestration tools, but they have different philosophies and strengths. Dagster focuses more on data-aware pipelines and software engineering practices, while Prefect emphasizes simplicity and flexibility. The choice between them depends on specific project requirements and team preferences.

Great Expectations

Always know what to expect from your data.

Pros of Great Expectations

  • Focused on data quality and validation, providing a comprehensive framework for data testing
  • Extensive library of built-in expectations for common data quality checks
  • Generates detailed data quality reports and documentation automatically

Cons of Great Expectations

  • Steeper learning curve due to its specialized focus on data quality
  • Less flexibility for general-purpose workflow orchestration
  • May require additional tools for complete data pipeline management

Code Comparison

Great Expectations:

import great_expectations as ge

# Pandas-style validation API from earlier Great Expectations releases
df = ge.read_csv("my_data.csv")
df.expect_column_values_to_be_between("age", min_value=0, max_value=120)
df.expect_column_values_to_not_be_null("name")

Prefect:

from prefect import flow, task

@task
def process_data():
    # Data processing logic here
    pass

@flow(name="My Flow")
def my_flow():
    process_data()

my_flow()

Great Expectations excels in data validation and quality checks, while Prefect offers a more general-purpose workflow orchestration solution. The choice between them depends on the specific needs of your data pipeline and whether data quality or workflow management is the primary focus.

Kedro (9,823 stars)

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

Pros of Kedro

  • Strong focus on data engineering and pipeline organization
  • Built-in support for data versioning and lineage tracking
  • Modular architecture promoting code reusability and maintainability

Cons of Kedro

  • Steeper learning curve for beginners
  • Less extensive scheduling and monitoring capabilities
  • Smaller community and ecosystem compared to Prefect

Code Comparison

Kedro pipeline definition:

from kedro.pipeline import Pipeline, node

def create_pipeline(**kwargs):
    return Pipeline(
        [
            node(process_data, "raw_data", "processed_data"),
            node(train_model, "processed_data", "model"),
        ]
    )

Prefect flow definition:

@flow
def data_pipeline():
    raw_data = load_data()
    processed_data = process_data(raw_data)
    model = train_model(processed_data)
    return model

Both Kedro and Prefect offer powerful tools for building data pipelines, but they have different strengths. Kedro excels in data engineering and pipeline organization, while Prefect provides more robust scheduling and monitoring features. The choice between them depends on specific project requirements and team expertise.

Kubeflow Pipelines

Machine Learning Pipelines for Kubeflow

Pros of Kubeflow Pipelines

  • Native integration with Kubernetes, ideal for cloud-native and containerized workflows
  • Strong support for machine learning workflows and model deployment
  • Extensive ecosystem with pre-built components and integrations

Cons of Kubeflow Pipelines

  • Steeper learning curve, especially for those unfamiliar with Kubernetes
  • More complex setup and infrastructure requirements
  • Less flexibility for non-ML workflows compared to Prefect

Code Comparison

Kubeflow Pipelines:

from kfp import dsl

# ContainerOp is the KFP v1 SDK API; KFP v2 replaces it with container components
@dsl.pipeline(name='My pipeline')
def my_pipeline():
    task1 = dsl.ContainerOp(name='Task 1', image='image1:latest')
    task2 = dsl.ContainerOp(name='Task 2', image='image2:latest')
    task2.after(task1)

Prefect:

from prefect import flow, task

@task
def task1():
    pass

@task
def task2():
    pass

@flow(name="My pipeline")
def my_pipeline():
    t1 = task1.submit()
    # wait_for creates an explicit ordering dependency on t1
    t2 = task2.submit(wait_for=[t1])

Both Kubeflow Pipelines and Prefect offer powerful workflow orchestration capabilities, but they cater to different use cases and environments. Kubeflow Pipelines excels in Kubernetes-based ML workflows, while Prefect provides more flexibility and ease of use for general data workflows.

Luigi (17,681 stars)

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

Pros of Luigi

  • Mature and battle-tested, with a large user base and extensive documentation
  • Simple and lightweight, with a focus on task dependencies and workflow management
  • Native support for Hadoop and various data processing frameworks

Cons of Luigi

  • Less modern features compared to Prefect (e.g., no native parallelism or distributed execution)
  • Limited built-in visualization and monitoring capabilities
  • Steeper learning curve for complex workflows

Code Comparison

Luigi task example:

import luigi

class MyTask(luigi.Task):
    def requires(self):
        return SomeOtherTask()

    def run(self):
        # Task logic here
        pass

Prefect task example:

from prefect import flow, task

@task
def my_task():
    # Task logic here
    pass

@flow(name="My Flow")
def my_flow():
    task_result = my_task()

Luigi focuses on class-based task definitions with explicit dependencies, while Prefect uses a more functional approach built on task and flow decorators. Prefect's syntax is generally more concise and allows for easier composition of complex workflows.

Both Luigi and Prefect are powerful workflow management tools, but Prefect offers more modern features and a more user-friendly API. Luigi may be preferred for simpler workflows or when working with Hadoop ecosystems, while Prefect shines in complex, distributed scenarios with its advanced scheduling and monitoring capabilities.

README

Prefect

Prefect is a workflow orchestration framework for building data pipelines in Python. It's the simplest way to elevate a script into a resilient production workflow: with Prefect, you can build dynamic data pipelines that react to the world around them and recover from unexpected changes.

With just a few lines of code, data teams can confidently automate any data process with features such as scheduling, caching, retries, and event-based automations.
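
Caching, for instance, is configured directly on the task decorator. A minimal sketch (the cache key function choice and expiration window are illustrative):

from datetime import timedelta

from prefect import task
from prefect.tasks import task_input_hash

@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def transform(x: int) -> int:
    # repeat calls with the same input within an hour return the cached result
    return x * 2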

Workflow activity is tracked and can be monitored with a self-hosted Prefect server instance or managed Prefect Cloud dashboard.

Getting started

Prefect requires Python 3.9 or later. To install or upgrade to the latest version of Prefect, run the following command:

pip install -U prefect

Then create and run a Python file that uses Prefect flow and task decorators to orchestrate and observe your workflow - in this case, a simple script that fetches the number of GitHub stars from a repository:

from prefect import flow, task
import httpx


@task(log_prints=True)
def get_stars(repo: str):
    url = f"https://api.github.com/repos/{repo}"
    count = httpx.get(url).json()["stargazers_count"]
    print(f"{repo} has {count} stars!")


@flow(name="GitHub Stars")
def github_stars(repos: list[str]):
    for repo in repos:
        get_stars(repo)


# run the flow!
if __name__ == "__main__":
    github_stars(["PrefectHQ/Prefect"])

Fire up the Prefect UI to see what happened:

prefect server start
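
By default, the dashboard is served at http://127.0.0.1:4200.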

To run your workflow on a schedule, turn it into a deployment and schedule it to run every minute by changing the last line of your script to the following:

if __name__ == "__main__":
    github_stars.serve(name="first-deployment", cron="* * * * *")

You now have a server running locally that is looking for scheduled deployments. Additionally, you can run your workflow manually from the UI or CLI, and you can even run deployments in response to events.
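
For example, the deployment created above can be triggered from the CLI (the flow and deployment names match the script above):

prefect deployment run 'GitHub Stars/first-deployment'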

Prefect Cloud

Prefect Cloud provides workflow orchestration for the modern data enterprise. By automating over 200 million data tasks monthly, Prefect empowers diverse organizations — from Fortune 50 leaders such as Progressive Insurance to innovative disruptors such as Cash App — to increase engineering productivity, reduce pipeline errors, and cut data workflow compute costs.

Read more about Prefect Cloud here or sign up to try it for yourself.

prefect-client

If your use case is geared towards communicating with Prefect Cloud or a remote Prefect server, check out our prefect-client. It is a lighter-weight option for accessing client-side functionality in the Prefect SDK and is ideal for use in ephemeral execution environments.
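
It is published on PyPI and can be installed the same way:

pip install -U prefect-client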

Next steps