kubeflow / pipelines

Machine Learning Pipelines for Kubeflow

Top Related Projects

  • Argo Workflows: Workflow Engine for Kubernetes
  • Apache Airflow (36,684 stars): A platform to programmatically author, schedule, and monitor workflows
  • Prefect (16,099 stars): A workflow orchestration framework for building resilient data pipelines in Python.
  • Dagster (11,381 stars): An orchestration platform for the development, production, and observation of data assets.
  • Metaflow: Open Source Platform for developing, scaling and deploying serious ML, AI, and data science systems
  • MLflow (18,503 stars): Open source platform for the machine learning lifecycle

Quick Overview

Kubeflow Pipelines is an open-source platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. It provides a user interface for managing and tracking experiments, jobs, and runs, making it easier to compose, deploy, and manage complex ML pipelines.

Pros

  • Seamless integration with Kubernetes for scalable and portable ML workflows
  • Supports end-to-end orchestration of ML pipelines, from data preparation to model deployment
  • Provides a user-friendly interface for visualizing and managing pipeline runs
  • Enables easy sharing and reuse of components and pipelines across teams and projects

Cons

  • Steep learning curve for users unfamiliar with Kubernetes and container technologies
  • Complex setup and maintenance, especially for on-premises deployments
  • Limited support for certain ML frameworks and libraries compared to some other platforms
  • Resource-intensive, which may lead to higher costs for small-scale projects

Code Examples

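  1. Defining simple pipeline components:
from kfp.dsl import component

@component
def add_numbers(a: int, b: int) -> int:
    return a + b

@component
def print_result(value: int):
    print(f"Result: {value}")
  2. Creating a pipeline using components:
from kfp.dsl import pipeline

# Lower-case, hyphenated names are the safest choice across SDK versions.
@pipeline(name="simple-addition-pipeline")
def addition_pipeline(a: int, b: int):
    # Component arguments must be passed by keyword.
    add_op = add_numbers(a=a, b=b)
    print_op = print_result(value=add_op.output)
  3. Compiling and running a pipeline:
import kfp
from kfp import compiler

# Compile the pipeline to a YAML package that can be uploaded or submitted.
compiler.Compiler().compile(addition_pipeline, "addition_pipeline.yaml")

# Submit a run directly; Client() needs a reachable Kubeflow Pipelines endpoint.
client = kfp.Client()
client.create_run_from_pipeline_func(addition_pipeline, arguments={"a": 5, "b": 7})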
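
Passing add_op.output into print_result is what tells Kubeflow Pipelines that the second step depends on the first. As a rough mental model only (this is not how the SDK or the backend is implemented), the standard-library sketch below derives an execution order from the same kind of dependency map. The step names mirror the example above, and the execution_order helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it consumes.
# The names mirror the example pipeline; the structure is illustrative only.
dependencies = {
    "add_numbers": set(),
    "print_result": {"add_numbers"},
}


def execution_order(deps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A step with no dependencies can run as soon as the pipeline starts; every other step waits until the steps it consumes from have finished.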

Getting Started

To get started with Kubeflow Pipelines:

  1. Install the Kubeflow Pipelines SDK:

    pip install kfp
    
  2. Set up a Kubernetes cluster and install Kubeflow Pipelines:

    kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=1.8.5"
    kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
    kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=1.8.5"
    
  3. Port-forward the Kubeflow Pipelines UI:

    kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
    
  4. Access the Kubeflow Pipelines UI at http://localhost:8080
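
Once the port-forward from step 3 is running, the SDK can talk to the same address. A minimal sketch, assuming a compiled addition_pipeline.yaml file is in the working directory and that the installed SDK version is compatible with the deployed backend; the submit_compiled_pipeline helper name is made up for illustration.

```python
KFP_ENDPOINT = "http://localhost:8080"  # address exposed by the port-forward


def submit_compiled_pipeline(package_path="addition_pipeline.yaml",
                             host=KFP_ENDPOINT):
    """Submit a compiled pipeline file to the port-forwarded endpoint."""
    import kfp  # imported here so this module loads even where kfp is absent

    client = kfp.Client(host=host)
    return client.create_run_from_pipeline_package(
        package_path,
        arguments={"a": 5, "b": 7},  # values for the pipeline's parameters
    )
```

Calling submit_compiled_pipeline() should create a run that then appears in the UI at the same address.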

Competitor Comparisons

Argo Workflows - Workflow Engine for Kubernetes

Pros of Argo Workflows

  • Simpler and more lightweight, focusing solely on workflow orchestration
  • More flexible and customizable, allowing for complex workflow patterns
  • Better support for GitOps practices and CI/CD integration

Cons of Argo Workflows

  • Less integrated with other ML-specific tools and frameworks
  • Requires more manual setup and configuration for ML-specific tasks
  • Steeper learning curve for data scientists without DevOps experience

Code Comparison

Argo Workflows:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]

Kubeflow Pipelines:

from kfp import compiler, dsl

# dsl.ContainerOp was removed in SDK v2; a container component replaces it.
@dsl.container_component
def hello():
    return dsl.ContainerSpec(
        image='library/bash:4.4.23',
        command=['echo'],
        args=['Hello, World!']
    )

@dsl.pipeline(
    name='hello-world-pipeline',
    description='A simple pipeline that prints "Hello, World!"'
)
def hello_world_pipeline():
    hello()

if __name__ == '__main__':
    compiler.Compiler().compile(hello_world_pipeline, 'hello_world_pipeline.yaml')

Both Argo Workflows and Kubeflow Pipelines are powerful tools for orchestrating workflows on Kubernetes. Argo Workflows offers more flexibility and is better suited for general-purpose workflow orchestration, while Kubeflow Pipelines is more tailored for machine learning workflows with integrated ML-specific features.
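
One practical difference is where the workflow definition is authored: Argo Workflows are written as Kubernetes manifests, while Kubeflow Pipelines are written in Python and then compiled. To make the contrast concrete, the sketch below builds the same hello-world Argo manifest shown above as a plain Python dictionary and serializes it to JSON, which Kubernetes tooling generally accepts alongside YAML. It uses only the standard library, and the hello_world_workflow function name is illustrative.

```python
import json


def hello_world_workflow(message="hello world"):
    """Build the Argo hello-world manifest from the example above as a dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "hello-world-"},
        "spec": {
            "entrypoint": "whalesay",
            "templates": [
                {
                    "name": "whalesay",
                    "container": {
                        "image": "docker/whalesay:latest",
                        "command": ["cowsay"],
                        "args": [message],
                    },
                }
            ],
        },
    }


manifest = hello_world_workflow()
print(json.dumps(manifest, indent=2))
```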

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Pros of Airflow

  • More mature and widely adopted in the industry
  • Extensive ecosystem with a large number of integrations and plugins
  • Flexible scheduling capabilities with cron-like expressions

Cons of Airflow

  • Steeper learning curve, especially for complex workflows
  • Less native support for machine learning and data science workflows
  • Can be resource-intensive for large-scale deployments

Code Comparison

Airflow DAG definition:

from datetime import datetime

from airflow import DAG
# Airflow 2.x import path; airflow.operators.python_operator is the deprecated 1.x path.
from airflow.operators.python import PythonOperator

def my_function():
    print("Hello from Airflow!")

# Airflow 2.4+ prefers the `schedule` argument over `schedule_interval`.
dag = DAG('example_dag', start_date=datetime(2023, 1, 1), schedule_interval='@daily')
task = PythonOperator(task_id='example_task', python_callable=my_function, dag=dag)

Kubeflow Pipelines component definition:

from kfp import dsl

@dsl.component
def my_component():
    print("Hello from Kubeflow Pipelines!")

@dsl.pipeline
def my_pipeline():
    my_component()

Both Airflow and Kubeflow Pipelines offer workflow orchestration capabilities, but Kubeflow Pipelines is more focused on machine learning workflows and Kubernetes integration. Airflow provides a more general-purpose solution for data pipeline orchestration across various environments.

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

Pros of Prefect

  • More lightweight and flexible, easier to set up and use for smaller projects
  • Better support for local development and testing
  • More intuitive Python-based workflow definition

Cons of Prefect

  • Less integrated with Kubernetes and cloud-native ecosystems
  • Fewer built-in components for machine learning workflows
  • Smaller community and ecosystem compared to Kubeflow Pipelines

Code Comparison

Prefect workflow definition:

from prefect import flow, task

@task
def process_data(data):
    return data.upper()

@flow
def my_flow(input_data):
    result = process_data(input_data)
    return result

Kubeflow Pipelines workflow definition:

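from kfp import dsl

# Container component (SDK v2); dsl.ContainerOp exists only in SDK v1.
@dsl.container_component
def process_data(input_data: str):
    return dsl.ContainerSpec(
        image='my-image:latest',
        command=['python', 'process.py'],
        args=[input_data]
    )

@dsl.pipeline(
    name='my-pipeline',
    description='A simple pipeline'
)
def my_pipeline(input_data: str):
    process_data(input_data=input_data)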

Both Prefect and Kubeflow Pipelines are powerful workflow orchestration tools, but they cater to different use cases. Prefect is more suitable for general-purpose data workflows and easier to get started with, while Kubeflow Pipelines is better integrated with Kubernetes and machine learning ecosystems, offering more robust features for large-scale, production ML pipelines.

Dagster - An orchestration platform for the development, production, and observation of data assets.

Pros of Dagster

  • More flexible and lightweight, suitable for various environments (local, cloud, etc.)
  • Better support for testing and local development
  • Stronger focus on data quality and observability

Cons of Dagster

  • Less integrated with Kubernetes and cloud-native ecosystems
  • Smaller community and ecosystem compared to Kubeflow Pipelines
  • Steeper learning curve for complex workflows

Code Comparison

Dagster:

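from dagster import job, op

# @solid and @pipeline are legacy Dagster APIs; @op and @job replace them.
@op
def load_data():
    return "hello"

@op
def process_data(data):
    return data.upper()

@job
def my_job():
    process_data(load_data())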

Kubeflow Pipelines:

from kfp import dsl

# Container component (SDK v2); dsl.ContainerOp exists only in SDK v1.
@dsl.container_component
def process_data_op():
    return dsl.ContainerSpec(
        image='my-image:latest',
        command=['python', 'process.py']
    )

@dsl.pipeline(name='my-pipeline')
def my_pipeline():
    process_data_op()

Both Dagster and Kubeflow Pipelines are powerful tools for building data pipelines, but they have different strengths and use cases. Dagster is more flexible and focuses on data quality, while Kubeflow Pipelines is better integrated with Kubernetes and cloud-native environments. The choice between them depends on your specific requirements and infrastructure preferences.

Metaflow - Open Source Platform for developing, scaling and deploying serious ML, AI, and data science systems

Pros of Metaflow

  • Simpler setup and easier to get started
  • More flexible and language-agnostic (supports Python, R, and more)
  • Better suited for data scientists with less DevOps experience

Cons of Metaflow

  • Less comprehensive ecosystem and integrations
  • Not as scalable for large, complex workflows
  • Limited built-in support for distributed training

Code Comparison

Metaflow:

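from metaflow import FlowSpec, step

class MyFlow(FlowSpec):
    @step
    def start(self):
        self.data = 'Hello, World!'
        self.next(self.end)

    @step
    def end(self):
        print(self.data)

# Instantiating the flow hands control to the Metaflow command-line runner.
if __name__ == '__main__':
    MyFlow()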

Kubeflow Pipelines:

from kfp import compiler, dsl

# Container component (SDK v2); dsl.ContainerOp exists only in SDK v1.
@dsl.container_component
def print_data():
    return dsl.ContainerSpec(
        image='python:3.7',
        command=['python', '-c'],
        args=['print("Hello, World!")']
    )

@dsl.pipeline(name='my-pipeline')
def my_pipeline():
    print_data()

compiler.Compiler().compile(my_pipeline, 'pipeline.yaml')

Both Metaflow and Kubeflow Pipelines are powerful tools for building and managing machine learning workflows. Metaflow offers a more user-friendly approach, making it easier for data scientists to get started quickly. Kubeflow Pipelines, on the other hand, provides a more comprehensive ecosystem and better scalability for complex, production-grade workflows.

MLflow - Open source platform for the machine learning lifecycle

Pros of MLflow

  • Lightweight and easy to set up, with minimal dependencies
  • Language-agnostic, supporting Python, R, Java, and more
  • Flexible deployment options (local, cloud, or on-premise)

Cons of MLflow

  • Less comprehensive end-to-end ML workflow management
  • Limited native support for distributed training and hyperparameter tuning
  • Fewer built-in integrations with cloud services and ML frameworks

Code Comparison

MLflow:

import mlflow

# The context manager ends the run automatically, even if an error occurs.
with mlflow.start_run():
    mlflow.log_param("param1", 5)
    mlflow.log_metric("accuracy", 0.85)

Kubeflow Pipelines:

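from kfp import dsl

@dsl.container_component
def task_1():
    return dsl.ContainerSpec(image='image1')

@dsl.container_component
def task_2():
    return dsl.ContainerSpec(image='image2')

@dsl.pipeline(name='my-pipeline')
def my_pipeline():
    task1 = task_1()
    task_2().after(task1)  # run the second task only after the first finishes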

MLflow focuses on experiment tracking and model management, while Kubeflow Pipelines emphasizes defining and orchestrating complex ML workflows. MLflow's code is simpler for logging experiments, while Kubeflow Pipelines requires more setup but offers greater control over pipeline structure and execution.

README

Overview of the Kubeflow pipelines service

Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable.

Kubeflow pipelines are reusable end-to-end ML workflows built using the Kubeflow Pipelines SDK.

The Kubeflow pipelines service has the following goals:

  • End-to-end orchestration: enabling and simplifying the orchestration of end-to-end machine learning pipelines.
  • Easy experimentation: making it easy for you to try numerous ideas and techniques, and to manage your various trials and experiments.
  • Easy re-use: enabling you to re-use components and pipelines to quickly assemble end-to-end solutions without rebuilding them each time.
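
The re-use goal can be illustrated with a small sketch: one function is wrapped as a component once and then used in two different pipelines. This assumes the KFP SDK v2 dsl.component and dsl.pipeline decorators; the function, the pipeline names, and the build_pipelines helper are invented for the example.

```python
def normalize(text: str) -> str:
    """Plain Python logic; it can be unit-tested without a cluster."""
    return text.strip().lower()


def build_pipelines():
    """Wrap `normalize` as a component and reuse it in two pipelines."""
    from kfp import dsl  # imported lazily so `normalize` stays testable alone

    normalize_op = dsl.component(normalize)

    @dsl.pipeline(name="clean-titles")
    def clean_titles(title: str):
        normalize_op(text=title)

    @dsl.pipeline(name="clean-tags")
    def clean_tags(tag: str):
        # The same component is reused here, twice in sequence.
        first = normalize_op(text=tag)
        normalize_op(text=first.output)

    return clean_titles, clean_tags


print(normalize("  Kubeflow Pipelines  "))
```

Keeping the logic in an ordinary function means it can be checked locally before any pipeline is compiled or submitted.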

Installation

  • Kubeflow Pipelines can be installed as part of the Kubeflow Platform. Alternatively, you can deploy Kubeflow Pipelines as a standalone service.

  • The Docker container runtime has been deprecated on Kubernetes 1.20+. Starting with version 1.8, Kubeflow Pipelines uses the Emissary executor by default. The Emissary executor is container-runtime agnostic, so you can run Kubeflow Pipelines on a Kubernetes cluster with any container runtime.

Documentation

Get started with your first pipeline and read further information in the Kubeflow Pipelines overview.

See the various ways you can use the Kubeflow Pipelines SDK.

See the Kubeflow Pipelines API doc for API specification.

Consult the Python SDK reference docs when writing pipelines using the Python SDK.

Contributing to Kubeflow Pipelines

Before you start contributing to Kubeflow Pipelines, read the guidelines in How to Contribute. To learn how to build and deploy Kubeflow Pipelines from source code, read the developer guide.

Kubeflow Pipelines Community Meeting

The community meeting is held every other Wednesday, 10-11 AM (PST). See the calendar invite, or join the meeting directly.

Meeting notes

Kubeflow Pipelines Slack Channel

#kubeflow-pipelines

Blog posts

Acknowledgments

Kubeflow Pipelines uses Argo Workflows by default under the hood to orchestrate Kubernetes resources. The Argo community has been very supportive, and we are very grateful. A Tekton backend is also available; to use it, refer to the Kubeflow Pipelines with Tekton repository.