Top Related Projects
- TensorFlow: An Open Source Machine Learning Framework for Everyone
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- MLflow: The open source developer platform to build AI/LLM applications and models with confidence, with end-to-end tracking, observability, and evaluations in one integrated platform
- Ray: An AI compute engine consisting of a core distributed runtime and a set of AI libraries for accelerating ML workloads
- Apache Spark: A unified analytics engine for large-scale data processing
Quick Overview
Kubeflow is an open-source machine learning platform designed to deploy, scale, and manage ML workflows on Kubernetes. It provides a comprehensive suite of tools for data scientists, ML engineers, and DevOps teams to streamline the entire machine learning lifecycle, from experimentation to production.
Pros
- Seamless integration with Kubernetes for scalable and portable ML workflows
- Comprehensive ecosystem of tools for various ML tasks (e.g., TensorFlow, PyTorch, Jupyter notebooks)
- Supports multi-tenancy and collaboration features for team-based ML projects
- Extensible architecture allowing for custom components and integrations (see the sketch after this list)
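As a rough illustration of that extensibility, the Kubeflow Pipelines SDK can turn an ordinary Python function into a reusable pipeline component. This is a minimal sketch using the KFP v1-style SDK; the function name, base image, and parameters are illustrative, not taken from the Kubeflow docs:
from kfp import components, dsl

def normalize(value: float, scale: float = 1.0) -> float:
    # Illustrative custom logic; any plain Python function can become a component
    return value / scale

# Wrap the function as a pipeline component that runs in its own container
normalize_op = components.create_component_from_func(normalize, base_image='python:3.9')

@dsl.pipeline(name='Custom component example')
def my_pipeline():
    normalize_op(value=10.0, scale=2.0)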
Cons
- Steep learning curve, especially for those unfamiliar with Kubernetes
- Complex setup and configuration process
- Resource-intensive, requiring significant computational resources
- Limited support for on-premises deployments compared to cloud-based solutions
Getting Started
To get started with Kubeflow, follow these steps:
- Install kubectl and kustomize:
# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Install kustomize
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
sudo mv kustomize /usr/local/bin/
- Clone the Kubeflow manifests repository:
git clone https://github.com/kubeflow/manifests.git
cd manifests
- Install Kubeflow:
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
- Access the Kubeflow dashboard:
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
Visit http://localhost:8080 in your browser to access the Kubeflow dashboard. Note that this is a basic setup, and you may need to configure additional components based on your specific requirements.
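If you prefer to check the deployment from Python rather than kubectl, the official kubernetes client can list the pods in the kubeflow namespace. This is a rough sketch; it assumes the kubernetes package is installed and your kubeconfig points at the cluster:
from kubernetes import client, config

config.load_kube_config()  # uses your local kubeconfig
v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod(namespace="kubeflow").items:
    # roughly equivalent to `kubectl get pods -n kubeflow`
    print(pod.metadata.name, pod.status.phase)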
Competitor Comparisons
TensorFlow: An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- More mature and established project with a larger community
- Broader scope, covering various machine learning tasks beyond just deployment
- Extensive documentation and tutorials for beginners and advanced users
Cons of TensorFlow
- Steeper learning curve for newcomers to machine learning
- Less focused on end-to-end ML workflows and production deployment
- Requires additional tools for orchestration and scaling in production environments
Code Comparison
TensorFlow example:
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
Kubeflow example:
apiVersion: "kubeflow.org/v1"
kind: TFJob
metadata:
  name: mnist-train
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 1
      template:
        spec:
          containers:
            - name: tensorflow
              image: mnist-model:v1
The TensorFlow example shows model definition and compilation, while the Kubeflow example demonstrates job deployment configuration for distributed training in Kubernetes.
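To connect the two snippets: the TFJob's container image would carry a training script, and the training operator injects a TF_CONFIG environment variable into each replica, so the script only needs to opt into a distribution strategy. A rough sketch of what such a script might contain (the model and the mnist-model:v1 image above are placeholders):
import tensorflow as tf

# TF_CONFIG is set by the TFJob controller for each Worker replica
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')

# model.fit(dataset) would then run synchronously across all Worker replicas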
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- More focused on deep learning and neural networks
- Easier to debug with dynamic computational graphs
- Larger community and more extensive ecosystem of tools and libraries
Cons of PyTorch
- Less integrated with cloud-native and distributed computing environments
- Narrower scope, primarily for deep learning rather than end-to-end ML workflows
- Steeper learning curve for beginners compared to high-level frameworks
Code Comparison
PyTorch example:
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.add(x, y)
Kubeflow example:
from kfp import dsl
@dsl.pipeline(name='My pipeline')
def my_pipeline():
    preprocess_op = dsl.ContainerOp(
        name='Preprocess',
        image='preprocess-image:latest'
    )
While PyTorch focuses on tensor operations and neural network building blocks, Kubeflow provides a higher-level abstraction for defining and managing machine learning workflows in a distributed environment. PyTorch is more suitable for researchers and developers working directly with deep learning models, while Kubeflow is better suited for teams deploying and scaling machine learning pipelines in production environments.
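The two are complementary rather than competing: a common pattern is to write the model and training loop in PyTorch, package it into a container image, and let Kubeflow schedule it (for example as a pipeline step or a PyTorchJob). A minimal sketch of such a training script, with synthetic data standing in for a real dataset:
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(64, 4), torch.randn(64, 1)  # placeholder data
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# In a Kubeflow setup, this script would be baked into the image referenced
# by a pipeline step or PyTorchJob rather than run locally.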
MLflow: The open source developer platform to build AI/LLM applications and models with confidence, with end-to-end tracking, observability, and evaluations in one integrated platform
Pros of MLflow
- Lightweight and easy to set up, with minimal dependencies
- Language-agnostic, supporting Python, R, Java, and more
- Flexible integration with various ML frameworks and tools
Cons of MLflow
- Less comprehensive end-to-end ML platform compared to Kubeflow
- Limited native support for distributed training and serving
- Fewer built-in components for advanced ML workflows
Code Comparison
MLflow tracking example:
import mlflow
mlflow.start_run()
mlflow.log_param("learning_rate", 0.01)
mlflow.log_metric("accuracy", 0.85)
mlflow.end_run()
Kubeflow pipeline example:
import kfp
from kfp import dsl
@dsl.pipeline(name='My pipeline')
def my_pipeline():
    train_op = dsl.ContainerOp(
        name='Train Model',
        image='my-image:latest',
        command=['python', 'train.py']
    )
kfp.compiler.Compiler().compile(my_pipeline, 'pipeline.yaml')
MLflow focuses on experiment tracking and model management, while Kubeflow provides a more comprehensive platform for building end-to-end ML workflows. MLflow is easier to adopt for smaller projects, while Kubeflow offers more scalability and integration with Kubernetes for larger, production-grade deployments.
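The two can also be combined: training code running inside a Kubeflow pipeline step can log its parameters and metrics to an MLflow tracking server. A rough sketch, where the tracking URI is a placeholder for wherever your MLflow server runs:
import mlflow

mlflow.set_tracking_uri("http://mlflow-server:5000")  # placeholder address

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    # ... train the model inside the pipeline step ...
    mlflow.log_metric("accuracy", 0.85)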
Ray: An AI compute engine consisting of a core distributed runtime and a set of AI libraries for accelerating ML workloads
Pros of Ray
- Simpler setup and deployment process
- More flexible and adaptable for general-purpose distributed computing
- Better support for reinforcement learning and other AI/ML tasks
Cons of Ray
- Less integrated with Kubernetes ecosystem
- Fewer built-in tools for ML workflow management
- Less extensive enterprise support options
Code Comparison
Kubeflow example:
from kubeflow.fairing import TrainJob

def train_fn():
    # Training logic here
    pass

job = TrainJob(entry_point=train_fn, docker_registry='gcr.io/my-project')
job.submit()
Ray example:
import ray
@ray.remote
def train_fn():
    # Training logic here
    pass

ray.init()
result = ray.get(train_fn.remote())
Both Kubeflow and Ray are powerful frameworks for distributed machine learning, but they have different focuses. Kubeflow is more tightly integrated with Kubernetes and provides a comprehensive ML platform, while Ray offers a more flexible distributed computing framework that can be applied to various tasks beyond ML. The choice between them depends on specific project requirements and existing infrastructure.
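As a small illustration of Ray's general-purpose side, remote tasks can be fanned out across a cluster with very little ceremony, for example to run several hyperparameter trials in parallel. The values and scoring below are purely illustrative:
import ray

ray.init()

@ray.remote
def evaluate(learning_rate):
    # Stand-in for a real training-and-evaluation run
    return {"lr": learning_rate, "score": 1.0 - learning_rate}

results = ray.get([evaluate.remote(lr) for lr in (0.1, 0.01, 0.001)])
print(results)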
Apache Spark: A unified analytics engine for large-scale data processing
Pros of Spark
- Mature and widely adopted distributed computing framework
- Supports a broader range of data processing tasks beyond machine learning
- Extensive ecosystem with numerous libraries and integrations
Cons of Spark
- Steeper learning curve for beginners in data processing
- Less focused on end-to-end machine learning workflows
- Requires more manual configuration for ML-specific tasks
Code Comparison
Spark (PySpark):
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
spark = SparkSession.builder.getOrCreate()
# df stands in for a real DataFrame with feature columns and a binary label
df = spark.createDataFrame([(0.0, 1.0, 0.0), (1.0, 0.0, 1.0)], ["feature1", "feature2", "label"])
# Prepare data
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
data = assembler.transform(df)
# Train model
lr = LogisticRegression(maxIter=10)
model = lr.fit(data)
Kubeflow (TensorFlow):
import tensorflow as tf
import numpy as np
# Define model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# Compile and train (synthetic data stands in for a real dataset)
model.compile(optimizer='adam', loss='binary_crossentropy')
x_train = np.random.rand(100, 2).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 1))
model.fit(x_train, y_train, epochs=10)
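Spark tends to shine upstream of the model code shown above: a common split is to do large-scale feature preparation in Spark and hand the result to a Kubeflow training step via shared storage. A rough sketch of the Spark side, with placeholder bucket paths:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("feature-prep").getOrCreate()
raw = spark.read.csv("s3a://my-bucket/raw-data.csv", header=True, inferSchema=True)
raw.write.mode("overwrite").parquet("s3a://my-bucket/features/")
# A Kubeflow pipeline step or TFJob would then read the Parquet features for training.
In short, Spark is a strong choice for heavy data processing, while Kubeflow is oriented toward orchestrating the training and deployment stages that follow.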
README
Kubeflow
About Kubeflow
Kubeflow makes artificial intelligence and machine learning simple, portable, and scalable.
We are an ecosystem of Kubernetes-based components for each stage in the AI/ML lifecycle, with support for best-in-class open source tools and frameworks. Please refer to the official documentation for more information.
Kubeflow Components
The Kubeflow Ecosystem is composed of several projects known as Kubeflow Components.
Each component is developed in its own source code repository; see the official Kubeflow documentation for the full list.
Kubeflow Platform
The Kubeflow Platform refers to the full suite of Kubeflow Components bundled together with additional integration and management tools.
The following table lists the platform components and their respective source code repositories:
Component | Source Code |
---|---|
Central Dashboard | kubeflow/dashboard |
Profile Controller | kubeflow/dashboard |
Kubeflow Manifests | kubeflow/manifests |
Kubeflow Community & Contributing
Kubeflow is a community-led project maintained by the Kubeflow Working Groups under the guidance of the Kubeflow Steering Committee.
We encourage you to learn about the Kubeflow Community and how to contribute to the project!