Top Related Projects
- Flux: Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
- Tekton Pipeline: A cloud-native Pipeline resource.
- Apache Airflow: A platform to programmatically author, schedule, and monitor workflows.
- Dagster: An orchestration platform for the development, production, and observation of data assets.
- Prefect: A workflow orchestration framework for building resilient data pipelines in Python.
- Kubeflow Pipelines: Machine Learning Pipelines for Kubeflow.
Quick Overview
Argo Workflows is an open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes. It is designed to handle complex, multi-step workflows and supports both DAG and step-based workflows. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
Pros
- Highly scalable and efficient for running large-scale parallel processing workflows
- Native Kubernetes integration, leveraging existing Kubernetes resources and concepts
- Rich set of features including artifact passing, parameter substitution, and conditional execution
- Extensible architecture with support for custom executors and plugins
Cons
- Steep learning curve for users not familiar with Kubernetes concepts
- Limited built-in support for non-container workloads
- Complexity in setting up and managing for small-scale projects
- Resource-intensive for small clusters or single-node setups
Code Examples
- Basic workflow definition:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
    - name: whalesay
      container:
        image: docker/whalesay:latest
        command: [cowsay]
        args: ["Hello World"]
This example defines a simple workflow that runs a single container to print "Hello World" using the whalesay image.
- DAG-based workflow:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-diamond-
spec:
  entrypoint: diamond
  templates:
    - name: diamond
      dag:
        tasks:
          - name: A
            template: echo
            arguments:
              parameters: [{name: message, value: A}]
          - name: B
            dependencies: [A]
            template: echo
            arguments:
              parameters: [{name: message, value: B}]
          - name: C
            dependencies: [A]
            template: echo
            arguments:
              parameters: [{name: message, value: C}]
          - name: D
            dependencies: [B, C]
            template: echo
            arguments:
              parameters: [{name: message, value: D}]
    - name: echo
      inputs:
        parameters:
          - name: message
      container:
        image: alpine:3.7
        command: [echo, "{{inputs.parameters.message}}"]
This example demonstrates a DAG-based workflow with dependencies between tasks.
- Artifact passing:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
    - name: artifact-example
      steps:
        - - name: generate-artifact
            template: whalesay
        - - name: consume-artifact
            template: print-message
            arguments:
              artifacts:
                - name: message
                  from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
    - name: whalesay
      container:
        image: docker/whalesay:latest
        command: [sh, -c]
        args: ["cowsay hello world | tee /tmp/hello_world.txt"]
      outputs:
        artifacts:
          - name: hello-art
            path: /tmp/hello_world.txt
    - name: print-message
      inputs:
        artifacts:
          - name: message
            path: /tmp/message
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["cat /tmp/message"]
This example shows how to pass artifacts between steps in a workflow.
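The Pros above also mention parameter substitution and conditional execution, which none of the examples show. A minimal sketch of both, assuming a hypothetical should-print workflow parameter; it uses the documented when and {{workflow.parameters.*}} syntax but is not one of the upstream examples:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: conditional-example-
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: should-print        # hypothetical parameter used only for this sketch
        value: "true"
  templates:
    - name: main
      steps:
        - - name: print-hello
            template: echo
            # the step runs only when the substituted parameter equals "true"
            when: "{{workflow.parameters.should-print}} == true"
    - name: echo
      container:
        image: alpine:3.7
        command: [echo, "hello from a conditional step"]
Submitting the same manifest with should-print set to false skips the step entirely.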
Getting Started
To get started with Argo Workflows:
- Install Argo Workflows on your Kubernetes cluster:
  kubectl create namespace argo
  kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.4.3/install.yaml
- Install the Argo CLI:
  curl -sLO https://
Competitor Comparisons
Flux: Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
Pros of Flux
- Native GitOps approach for continuous delivery
- Supports multi-tenancy and hierarchical configurations
- Integrates well with Kubernetes' native features and CRDs
Cons of Flux
- Less flexible for complex, non-GitOps workflows
- Limited support for workflow visualization and monitoring
- Steeper learning curve for teams not familiar with GitOps principles
Code Comparison
Flux (HelmRelease example):
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: podinfo
spec:
  interval: 5m
  chart:
    spec:
      chart: podinfo
      version: '>=5.0.0 <6.0.0'
      sourceRef:
        kind: HelmRepository
        name: podinfo
Argo Workflows (Workflow example):
Argo Workflows (Workflow example):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: hello-world
spec:
  entrypoint: whalesay
  templates:
    - name: whalesay
      container:
        image: docker/whalesay:latest
        command: [cowsay]
        args: ["hello world"]
Flux focuses on declarative GitOps-style deployments, while Argo Workflows excels in defining and managing complex workflows and pipelines. Flux is better suited for teams embracing GitOps practices, whereas Argo Workflows offers more flexibility for various CI/CD and data processing scenarios.
Tekton Pipeline: A cloud-native Pipeline resource.
Pros of Pipeline
- Native Kubernetes CRDs for defining CI/CD workflows
- Modular architecture allowing for custom task implementations
- Strong focus on cloud-native and serverless environments
Cons of Pipeline
- Steeper learning curve due to more complex architecture
- Less mature ecosystem compared to Argo Workflows
- Limited built-in UI capabilities
Code Comparison
Argo Workflows example:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: hello-world
spec:
  entrypoint: whalesay
  templates:
    - name: whalesay
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
Pipeline example:
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: hello-world
spec:
  steps:
    - name: echo
      image: alpine
      command:
        - echo
      args:
        - "Hello World!"
Both Argo Workflows and Pipeline offer powerful workflow orchestration capabilities for Kubernetes environments. Argo Workflows provides a more straightforward approach with a single CRD for defining workflows, while Pipeline offers a more modular and extensible architecture. Argo Workflows has a more mature ecosystem and user-friendly UI, making it easier for beginners. Pipeline, on the other hand, excels in cloud-native and serverless scenarios with its focus on modularity and custom task implementations.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Pros of Airflow
- Mature ecosystem with extensive community support and integrations
- Rich UI for monitoring and managing workflows
- Flexible scheduling options with cron-like syntax
Cons of Airflow
- Steeper learning curve due to complex architecture
- Resource-intensive, especially for small-scale deployments
- Less native support for containerized workflows
Code Comparison
Airflow DAG definition:
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime

def print_hello():
    return 'Hello World'

dag = DAG('hello_world', description='Simple tutorial DAG',
          schedule_interval='0 12 * * *',
          start_date=datetime(2017, 3, 20), catchup=False)

hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)
Argo Workflows definition:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
    - name: whalesay
      container:
        image: docker/whalesay:latest
        command: [cowsay]
        args: ["hello world"]
Dagster: An orchestration platform for the development, production, and observation of data assets.
Pros of Dagster
- More comprehensive data orchestration platform with built-in asset management
- Stronger focus on software engineering practices and type checking
- Better integration with Python ecosystem and development workflows
Cons of Dagster
- Steeper learning curve due to more complex concepts and abstractions
- Less mature and smaller community compared to Argo Workflows
- Primarily Python-focused, which may limit flexibility for some use cases
Code Comparison
Dagster:
import pandas as pd
from dagster import job, op

@op
def load_data() -> pd.DataFrame:
    # upstream op supplying the DataFrame consumed by process_data
    return pd.DataFrame({"value": [1.0, None, 3.0]})

@op
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    return data.dropna()

@job
def my_job():
    process_data(load_data())
Argo Workflows:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: process-data-
spec:
  entrypoint: process-data
  templates:
    - name: process-data
      container:
        image: data-processor:latest
        command: [python, process_data.py]
Dagster focuses on defining ops and jobs in Python, emphasizing type hints and software engineering practices. Argo Workflows uses YAML to define workflows, with a more container-centric approach typical of Kubernetes-native tools.
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Pros of Prefect
- More Pythonic approach, making it easier for Python developers to adopt
- Built-in support for distributed computing and parallel task execution
- Flexible scheduling options, including real-time and event-driven workflows
Cons of Prefect
- Less mature ecosystem compared to Argo Workflows
- Steeper learning curve for users not familiar with Python
- Limited native support for container-based workflows
Code Comparison
Prefect workflow example:
from prefect import task, Flow

@task
def hello_task():
    print("Hello, Prefect!")

with Flow("My First Flow") as flow:
    hello_task()

flow.run()
Argo Workflows example:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: hello-world
spec:
  entrypoint: whalesay
  templates:
    - name: whalesay
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["Hello Argo!"]
Both Argo Workflows and Prefect are powerful workflow orchestration tools, but they cater to different use cases and preferences. Argo Workflows is more Kubernetes-native and container-centric, while Prefect offers a more Pythonic approach with built-in support for distributed computing. The choice between the two depends on your specific requirements, infrastructure, and team expertise.
Machine Learning Pipelines for Kubeflow
Pros of Kubeflow Pipelines
- More comprehensive ML-specific features and integrations
- Better support for hyperparameter tuning and experiment tracking
- Tighter integration with other Kubeflow components
Cons of Kubeflow Pipelines
- Steeper learning curve and more complex setup
- Less flexibility for non-ML workflows
- Heavier resource requirements
Code Comparison
Argo Workflows:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
    - name: whalesay
      container:
        image: docker/whalesay:latest
        command: [cowsay]
        args: ["hello world"]
Kubeflow Pipelines:
import kfp
from kfp import dsl

@dsl.pipeline(
    name='Hello World Pipeline',
    description='A simple pipeline that prints "Hello, World!"'
)
def hello_world_pipeline():
    hello_op = dsl.ContainerOp(
        name='hello',
        image='library/bash:4.4.23',
        command=['echo', 'Hello, World!']
    )

kfp.compiler.Compiler().compile(hello_world_pipeline, 'hello_world_pipeline.yaml')
Both Argo Workflows and Kubeflow Pipelines are powerful tools for orchestrating workflows on Kubernetes. Argo Workflows is more general-purpose and lightweight, while Kubeflow Pipelines is tailored for machine learning workflows with additional features specific to ML operations.
README
What is Argo Workflows?
Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
- Define workflows where each step is a container.
- Model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a directed acyclic graph (DAG); a minimal steps-based sketch follows this list.
- Easily run compute intensive jobs for machine learning or data processing in a fraction of the time using Argo Workflows on Kubernetes.
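The code examples earlier on this page cover the DAG form; as a complement, here is a minimal sketch of the sequential steps form (the template names and messages are illustrative, not taken from an upstream example):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-example-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        # each outer list item is a step group; groups run one after another
        - - name: first
            template: echo
            arguments:
              parameters: [{name: message, value: "step one"}]
        - - name: second
            template: echo
            arguments:
              parameters: [{name: message, value: "step two"}]
    - name: echo
      inputs:
        parameters:
          - name: message
      container:
        image: alpine:3.7
        command: [echo, "{{inputs.parameters.message}}"]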
Argo is a Cloud Native Computing Foundation (CNCF) graduated project.
Use Cases
- Machine Learning pipelines
- Data and batch processing
- Infrastructure automation
- CI/CD
- Other use cases
Why Argo Workflows?
- Argo Workflows is the most popular workflow execution engine for Kubernetes.
- Light-weight, scalable, and easier to use.
- Designed from the ground up for containers without the overhead and limitations of legacy VM and server-based environments.
- Cloud agnostic and can run on any Kubernetes cluster.
Read what people said in our latest survey
Try Argo Workflows
You can try Argo Workflows via one of the following:
Who uses Argo Workflows?
More than 200 organizations officially use Argo Workflows.
Ecosystem
Just some of the projects that use or rely on Argo Workflows (complete list here):
- Argo Events
- Couler
- Hera
- Katib
- Kedro
- Kubeflow Pipelines
- Netflix Metaflow
- Onepanel
- Orchest
- Piper
- Ploomber
- Seldon
- SQLFlow
Client Libraries
Check out our Java, Golang and Python clients.
Quickstart
Documentation
Features
An incomplete list of features Argo Workflows provides:
- UI to visualize and manage Workflows
- Artifact support (S3, Artifactory, Alibaba Cloud OSS, Azure Blob Storage, HTTP, Git, GCS, raw)
- Workflow templating to store commonly used Workflows in the cluster
- Archiving Workflows after executing for later access
- Scheduled workflows using cron (see the CronWorkflow sketch after this list)
- Server interface with REST API (HTTP and GRPC)
- DAG or Steps based declaration of workflows
- Step level input & outputs (artifacts/parameters)
- Loops
- Parameterization
- Conditionals
- Timeouts (step & workflow level)
- Retry (step & workflow level)
- Resubmit (memoized)
- Suspend & Resume
- Cancellation
- K8s resource orchestration
- Exit Hooks (notifications, cleanup)
- Garbage collection of completed workflows
- Scheduling (affinity/tolerations/node selectors)
- Volumes (ephemeral/existing)
- Parallelism limits
- Daemoned steps
- DinD (docker-in-docker)
- Script steps
- Event emission
- Prometheus metrics
- Multiple executors
- Multiple pod and workflow garbage collection strategies
- Automatically calculated resource usage per step
- Java/Golang/Python SDKs
- Pod Disruption Budget support
- Single-sign on (OAuth2/OIDC)
- Webhook triggering
- CLI
- Out-of-the-box and custom Prometheus metrics
- Windows container support
- Embedded widgets
- Multiplex log viewer
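Several of the items above (cron scheduling, loops, retries, parameterization) combine naturally in one manifest. A minimal sketch, not an upstream example; the resource name, schedule, and item values are illustrative:
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-cleanup           # hypothetical name for this sketch
spec:
  schedule: "0 2 * * *"           # cron schedule
  workflowSpec:
    entrypoint: main
    templates:
      - name: main
        steps:
          - - name: clean
              template: echo
              arguments:
                parameters: [{name: target, value: "{{item}}"}]
              withItems: [logs, tmp, cache]   # loop: one step per item
      - name: echo
        retryStrategy:
          limit: "2"                          # retry a failed step up to 2 times
        inputs:
          parameters:
            - name: target
        container:
          image: alpine:3.7
          command: [sh, -c]
          args: ["echo cleaning {{inputs.parameters.target}}"]
Each scheduled run fans the clean step out over the three items in parallel, and any failing instance is retried up to twice.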
Community Meetings
We host monthly community meetings where we and the community showcase demos and discuss the current and future state of the project. Feel free to join us! For Community Meeting information, minutes and recordings, please see here.
Participation in Argo Workflows is governed by the CNCF Code of Conduct
Community Blogs and Presentations
- Awesome-Argo: A Curated List of Awesome Projects and Resources Related to Argo
- Automation of Everything - How To Combine Argo Events, Workflows & Pipelines, CD, and Rollouts
- Argo Workflows and Pipelines - CI/CD, Machine Learning, and Other Kubernetes Workflows
- Argo Ansible role: Provisioning Argo Workflows on OpenShift
- Argo Workflows vs Apache Airflow
- CI/CD with Argo on Kubernetes
- Define Your CI/CD Pipeline with Argo Workflows
- Distributed Machine Learning Patterns from Manning Publication
- Running Argo Workflows Across Multiple Kubernetes Clusters
- Open Source Model Management Roundup: Polyaxon, Argo, and Seldon
- Producing 200 OpenStreetMap extracts in 35 minutes using a scalable data workflow
- Argo integration review
- TGI Kubernetes with Joe Beda: Argo workflow system
Project Resources
Security
See SECURITY.md.