determined
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. It works with PyTorch and TensorFlow.
Top Related Projects
- Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- MLflow: Open source platform for the machine learning lifecycle.
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration.
- TensorFlow: An Open Source Machine Learning Framework for Everyone.
- 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
- Weights & Biases (wandb): The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Quick Overview
Determined AI is an open-source deep learning training platform that simplifies the process of training, experimenting with, and deploying machine learning models. It provides a comprehensive set of tools for distributed training, hyperparameter tuning, and experiment management, making it easier for data scientists and machine learning engineers to build and scale their ML workflows.
Pros
- Seamless distributed training and hyperparameter tuning
- Built-in experiment tracking and visualization
- Support for popular deep learning frameworks like PyTorch and TensorFlow
- Easy-to-use CLI and web interface for managing experiments
Cons
- Steeper learning curve compared to simpler ML tools
- Requires cluster setup for full distributed capabilities
- Limited support for non-deep learning ML algorithms
- Smaller community compared to some other ML platforms
Code Examples
- Defining a model using Determined AI's PyTorch interface (data loaders and an evaluation method are sketched after this list):
```python
import torch
import torch.nn as nn

from determined.pytorch import PyTorchTrial


class MyModel(PyTorchTrial):
    def __init__(self, context):
        super().__init__(context)
        self.context = context
        # Wrap the model and optimizer so Determined can manage distributed training.
        self.model = self.context.wrap_model(
            nn.Sequential(
                nn.Linear(10, 64),
                nn.ReLU(),
                nn.Linear(64, 1),
            )
        )
        self.optimizer = self.context.wrap_optimizer(
            torch.optim.Adam(self.model.parameters())
        )

    def train_batch(self, batch, epoch_idx, batch_idx):
        inputs, labels = batch
        outputs = self.model(inputs)
        loss = nn.MSELoss()(outputs, labels)
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer)
        return {"loss": loss.item()}
```
- Configuring hyperparameter search:
```yaml
hyperparameters:
  learning_rate:
    type: log
    minval: -5.0
    maxval: 0.0
    base: 10
  batch_size:
    type: categorical
    vals: [32, 64, 128]
searcher:
  name: adaptive_asha
  metric: validation_loss
  smaller_is_better: true
  max_trials: 50
```
- Launching an experiment using the Determined AI CLI:

```bash
det experiment create config.yaml model_def.py
```
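The trial in the first example above is not runnable on its own: a PyTorchTrial also needs data loaders and an evaluation method. Below is a minimal sketch of the remaining methods for the MyModel class, using synthetic data purely for illustration:

```python
import torch
from torch.utils.data import TensorDataset

from determined.pytorch import DataLoader

# Additional methods to add inside the MyModel class defined above.

def build_training_data_loader(self):
    # Synthetic regression data; replace with a real dataset.
    data = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    return DataLoader(data, batch_size=self.context.get_per_slot_batch_size())

def build_validation_data_loader(self):
    data = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
    return DataLoader(data, batch_size=self.context.get_per_slot_batch_size())

def evaluate_batch(self, batch):
    inputs, labels = batch
    outputs = self.model(inputs)
    return {"validation_loss": torch.nn.MSELoss()(outputs, labels).item()}
```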
Getting Started
- Install Determined AI:

```bash
pip install determined
```

- Create a simple model definition (e.g., model_def.py) and a configuration file (e.g., config.yaml); a minimal config.yaml is sketched below.
- Start a local Determined cluster:

```bash
det deploy local cluster-up
```

- Launch an experiment:

```bash
det experiment create config.yaml model_def.py
```

- Monitor your experiment using the web UI or CLI:

```bash
det experiment list
```
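A minimal config.yaml for a single (non-searching) training run might look like the sketch below; the experiment name, entrypoint class, and training length are illustrative and depend on what you put in model_def.py:

```yaml
name: my-first-experiment        # illustrative name
entrypoint: model_def:MyModel    # module:TrialClass defined in model_def.py
hyperparameters:
  global_batch_size: 64
searcher:
  name: single
  metric: validation_loss
  smaller_is_better: true
  max_length:
    batches: 1000                # how long to train; illustrative value
```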
Competitor Comparisons
Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Pros of Ray
- More extensive ecosystem with libraries for various tasks (e.g., RLlib, Ray Serve)
- Better suited for distributed computing and large-scale machine learning
- Larger community and more frequent updates
Cons of Ray
- Steeper learning curve due to its broader scope
- Can be overkill for smaller projects or simpler machine learning tasks
- Less focus on experiment tracking and reproducibility
Code Comparison
Ray:

```python
import ray

@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))
```
Determined:

```python
from determined.experimental import Determined

# Connect to a Determined master and submit an experiment defined by a
# config file plus a directory of model code (paths and call pattern are
# illustrative; see the Determined Python SDK docs).
det = Determined(master="http://localhost:8080")
det.create_experiment(config="config.yaml", model_dir=".")
```
Ray focuses on distributed computing primitives, while Determined emphasizes experiment management and reproducibility. Ray's code shows remote function execution, whereas Determined's code demonstrates experiment creation and management.
MLflow: Open source platform for the machine learning lifecycle.
Pros of MLflow
- Broader ecosystem support and integrations with various ML frameworks
- Lightweight and easy to set up for small to medium-sized projects
- Flexible experiment tracking and model versioning capabilities
Cons of MLflow
- Less focus on distributed training and resource management
- Limited built-in hyperparameter tuning capabilities
- Requires more manual configuration for advanced use cases
Code Comparison
MLflow:

```python
import mlflow

mlflow.start_run()
mlflow.log_param("learning_rate", 0.01)
mlflow.log_metric("accuracy", 0.85)
mlflow.end_run()
```
Determined:

```python
from determined.experimental import Determined

# Hyperparameters live in the experiment config rather than being logged
# ad hoc; the call pattern below is illustrative (see the Determined SDK docs).
det = Determined()
experiment = det.create_experiment(
    config={"hyperparameters": {"learning_rate": 0.01}},
    model_dir=".",
)
experiment.wait()
```
Summary
MLflow is a versatile ML lifecycle management tool suitable for various project sizes, offering easy setup and flexible experiment tracking. However, it may require more manual configuration for advanced scenarios. Determined, on the other hand, provides stronger support for distributed training and resource management, making it more suitable for large-scale ML projects with complex infrastructure requirements.
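As one concrete illustration of that difference, scaling each Determined trial across multiple GPUs is a configuration change rather than new code (the slot count below is illustrative):

```yaml
resources:
  slots_per_trial: 8   # run each trial on 8 GPUs
```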
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration.
Pros of PyTorch
- Larger community and ecosystem, with more resources and third-party libraries
- More flexible and customizable for low-level research and experimentation
- Wider industry adoption and support
Cons of PyTorch
- Steeper learning curve for beginners
- Requires more boilerplate code for training and evaluation loops
- Less integrated with cloud and distributed computing platforms
Code Comparison
PyTorch:

```python
import torch

# Toy regression data for illustration.
input_data = torch.randn(100, 10)
target = torch.randn(100, 1)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    output = model(input_data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
```
Determined:

```python
import torch
from determined.pytorch import PyTorchTrial

class MyTrial(PyTorchTrial):
    def __init__(self, context):
        super().__init__(context)
        self.context = context
        # Wrap model and optimizer so Determined can handle distribution.
        self.model = self.context.wrap_model(torch.nn.Linear(10, 1))
        self.optimizer = self.context.wrap_optimizer(
            torch.optim.SGD(self.model.parameters(), lr=0.01))
        self.criterion = torch.nn.MSELoss()

    def train_batch(self, batch, epoch_idx, batch_idx):
        output = self.model(batch[0])
        loss = self.criterion(output, batch[1])
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer)
        return {"loss": loss}
```
TensorFlow: An Open Source Machine Learning Framework for Everyone.
Pros of TensorFlow
- Larger ecosystem with extensive libraries and tools
- Broader industry adoption and community support
- More comprehensive documentation and learning resources
Cons of TensorFlow
- Steeper learning curve for beginners
- Can be more complex to set up and configure
- Less focus on distributed training out-of-the-box
Code Comparison
TensorFlow:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```
Determined:

```python
import tensorflow as tf

from determined.keras import TFKerasTrial

class MyTrial(TFKerasTrial):
    def __init__(self, context):
        self.context = context

    def build_model(self):
        # Wrap the model so Determined can manage distributed training.
        model = self.context.wrap_model(tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(10, activation='softmax'),
        ]))
        model.compile(optimizer='adam', loss='categorical_crossentropy')
        return model
```
Determined integrates with TensorFlow, providing a higher-level abstraction for distributed training and experiment management. While TensorFlow offers more flexibility and a wider range of features, Determined simplifies the process of scaling machine learning workflows and managing experiments across multiple GPUs or machines.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Pros of Transformers
- Extensive library of pre-trained models for various NLP tasks
- Active community with frequent updates and contributions
- Comprehensive documentation and tutorials
Cons of Transformers
- Focused primarily on NLP, limiting its use for other ML domains
- Can be resource-intensive for large models and datasets
- Steeper learning curve for beginners due to its extensive features
Code Comparison
Transformers:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
```

Determined:

```python
from determined.experimental import Determined

# Fetch the best checkpoint from experiment 1 on a local Determined master.
checkpoint = Determined(master="http://localhost:8080").get_experiment(1).top_checkpoint()
```
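As a hedged follow-up (assuming you then want the checkpoint files locally), the returned checkpoint object can be downloaded; how you load it back into a model depends on your trial code:

```python
# Download the checkpoint to a local directory and inspect its location.
path = checkpoint.download()
print("checkpoint downloaded to:", path)
```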
Transformers provides a more straightforward approach for loading pre-trained models, while Determined focuses on experiment management and distributed training. Transformers is ideal for NLP tasks, whereas Determined offers a broader platform for various ML workflows and scalable training.
Weights & Biases (wandb): The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Pros of wandb
- More extensive visualization and experiment tracking capabilities
- Larger community and ecosystem with integrations for many ML frameworks
- Easier to set up and use for beginners
Cons of wandb
- Less focus on distributed training and resource management
- Potentially higher costs for large-scale projects or teams
- Limited built-in hyperparameter tuning capabilities
Code Comparison
wandb:

```python
import wandb

wandb.init(project="my-project")
wandb.config.hyperparameters = {...}
model.fit(X, y)
wandb.log({"loss": loss, "accuracy": accuracy})
```
determined:

```python
from determined.experimental import Determined

# Hyperparameters are declared up front in the experiment config; training
# itself runs on the cluster. The call pattern is illustrative (see the SDK docs).
det = Determined()
experiment = det.create_experiment(
    config={"name": "my-experiment", "hyperparameters": {...}},
    model_dir=".",
)
```
The code comparison shows that wandb focuses on logging and tracking, while determined emphasizes experiment configuration and resource management. wandb's API is simpler for basic use cases, while determined provides more control over experiment creation and execution.
README
Determined is an all-in-one deep learning platform, compatible with PyTorch and TensorFlow.
It takes care of:
- Distributed training for faster results.
- Hyperparameter tuning for obtaining the best models.
- Resource management for cutting cloud GPU costs.
- Experiment tracking for analysis and reproducibility.
How Determined Works
The main components of Determined are the Python library, the command line interface (CLI), and the Web UI.
Python Library
Use the Python library to make your existing PyTorch or TensorFlow code compatible with Determined.
You can do this by organizing your code into one of the class-based APIs:
```python
from determined.pytorch import PyTorchTrial

class YourExperiment(PyTorchTrial):
    def __init__(self, context):
        ...
```
Or by using just the functions you want, via the Core API:
```python
import determined as det

with det.core.init() as core_context:
    ...
```
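A slightly fuller sketch of the Core API pattern, assuming a hypothetical train_one_step() stand-in for your existing training code; the context reports metrics back to the Determined master each step:

```python
import determined as det

with det.core.init() as core_context:
    for step in range(100):
        loss = train_one_step()  # hypothetical: your existing training logic
        core_context.train.report_training_metrics(
            steps_completed=step + 1,
            metrics={"loss": loss},
        )
```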
Command Line Interface (CLI)
You can use the CLI to:
- Start a Determined cluster locally:

```bash
det deploy local cluster-up
```

- Launch Determined on cloud services, such as Amazon Web Services (AWS) or Google Cloud Platform (GCP):

```bash
det deploy aws up
```

- Train your models:

```bash
det experiment create gpt.yaml .
```
Configure everything from distributed training to hyperparameter tuning using YAML files:
```yaml
resources:
  slots_per_trial: 8
  priority: 1
hyperparameters:
  learning_rate:
    type: double
    minval: .0001
    maxval: 1.0
searcher:
  name: adaptive_asha
  metric: validation_loss
  smaller_is_better: true
```
Web UI
Use the Web UI to view loss curves, hyperparameter plots, code and configuration snapshots, model registries, cluster utilization, debugging logs, performance profiling reports, and more.
Installation
To install the CLI:

```bash
pip install determined
```

Then use `det deploy` to start the Determined cluster locally, or on cloud services like AWS and GCP.
For installation details, visit the cluster deployment guide for your environment:
Examples
Get familiar with Determined by exploring the 30+ examples in the examples folder and the determined-examples repo.
Documentation
- Documentation
- Quick Start Guide
- Tutorials
- User Guides
Community
If you need help, want to file a bug report, or just want to keep up-to-date with the latest news about Determined, please join the Determined community!
- Slack is the best place to ask questions about Determined and get support. Click here to join our Slack.
- You can also follow us on YouTube and Twitter.
- You can also join the community mailing list to ask questions about the project and receive announcements.
- To report a bug, open an issue on GitHub.
- To report a security issue, email security@determined.ai.
Contributing
License