determined
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. It works with PyTorch and TensorFlow.
Top Related Projects
- Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- MLflow: Open source platform for the machine learning lifecycle.
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration.
- TensorFlow: An Open Source Machine Learning Framework for Everyone.
- 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
- Weights & Biases (wandb): The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Quick Overview
Determined AI is an open-source deep learning training platform that simplifies the process of training, experimenting with, and deploying machine learning models. It provides a comprehensive set of tools for distributed training, hyperparameter tuning, and experiment management, making it easier for data scientists and machine learning engineers to build and scale their ML workflows.
Pros
- Seamless distributed training and hyperparameter tuning
- Built-in experiment tracking and visualization
- Support for popular deep learning frameworks like PyTorch and TensorFlow
- Easy-to-use CLI and web interface for managing experiments
Cons
- Steeper learning curve compared to simpler ML tools
- Requires cluster setup for full distributed capabilities
- Limited support for non-deep learning ML algorithms
- Smaller community compared to some other ML platforms
Code Examples
- Defining a model using Determined AI's PyTorch interface (data loaders and an evaluation method are sketched after this list):
```python
import torch
import torch.nn as nn

from determined.pytorch import PyTorchTrial


class MyModel(PyTorchTrial):
    def __init__(self, context):
        super().__init__(context)
        self.context = context
        # Wrap the model and optimizer so Determined can manage distributed training.
        self.model = self.context.wrap_model(
            nn.Sequential(
                nn.Linear(10, 64),
                nn.ReLU(),
                nn.Linear(64, 1),
            )
        )
        self.optimizer = self.context.wrap_optimizer(
            torch.optim.Adam(self.model.parameters())
        )

    def train_batch(self, batch, epoch_idx, batch_idx):
        inputs, labels = batch
        outputs = self.model(inputs)
        loss = nn.MSELoss()(outputs, labels)
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer)
        return {"loss": loss.item()}
```
- Configuring hyperparameter search:
```yaml
hyperparameters:
  learning_rate:
    type: log
    minval: -5.0
    maxval: 0.0
    base: 10
  batch_size:
    type: categorical
    vals: [32, 64, 128]
searcher:
  name: adaptive_asha
  metric: validation_loss
  smaller_is_better: true
  max_trials: 50
```
- Launching an experiment using the Determined AI CLI:

```bash
det experiment create config.yaml model_def.py
```
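The trial in the first example above is not runnable on its own: a PyTorchTrial also needs data loaders and an evaluation method. Below is a minimal sketch of the remaining methods for the MyModel class, using synthetic data purely for illustration:

```python
import torch
from torch.utils.data import TensorDataset

from determined.pytorch import DataLoader

# Additional methods to add inside the MyModel class defined above.

def build_training_data_loader(self):
    # Synthetic regression data; replace with a real dataset.
    data = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    return DataLoader(data, batch_size=self.context.get_per_slot_batch_size())

def build_validation_data_loader(self):
    data = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
    return DataLoader(data, batch_size=self.context.get_per_slot_batch_size())

def evaluate_batch(self, batch):
    inputs, labels = batch
    outputs = self.model(inputs)
    return {"validation_loss": torch.nn.MSELoss()(outputs, labels).item()}
```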
Getting Started
- Install Determined AI:

```bash
pip install determined
```

- Create a simple model definition (e.g., model_def.py) and a configuration file (e.g., config.yaml); a minimal config.yaml is sketched below.
- Start a local Determined cluster:

```bash
det deploy local cluster-up
```

- Launch an experiment:

```bash
det experiment create config.yaml model_def.py
```

- Monitor your experiment using the web UI or CLI:

```bash
det experiment list
```
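A minimal config.yaml for a single (non-searching) training run might look like the sketch below; the experiment name, entrypoint class, and training length are illustrative and depend on what you put in model_def.py:

```yaml
name: my-first-experiment        # illustrative name
entrypoint: model_def:MyModel    # module:TrialClass defined in model_def.py
hyperparameters:
  global_batch_size: 64
searcher:
  name: single
  metric: validation_loss
  smaller_is_better: true
  max_length:
    batches: 1000                # how long to train; illustrative value
```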
Competitor Comparisons
Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Pros of Ray
- More extensive ecosystem with libraries for various tasks (e.g., RLlib, Ray Serve)
- Better suited for distributed computing and large-scale machine learning
- Larger community and more frequent updates
Cons of Ray
- Steeper learning curve due to its broader scope
- Can be overkill for smaller projects or simpler machine learning tasks
- Less focus on experiment tracking and reproducibility
Code Comparison
Ray:

```python
import ray

@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))
```
Determined:

```python
from determined.experimental import Determined

# Connect to a Determined master and submit an experiment defined by a
# config file plus a directory of model code (paths and call pattern are
# illustrative; see the Determined Python SDK docs).
det = Determined(master="http://localhost:8080")
det.create_experiment(config="config.yaml", model_dir=".")
```
Ray focuses on distributed computing primitives, while Determined emphasizes experiment management and reproducibility. Ray's code shows remote function execution, whereas Determined's code demonstrates experiment creation and management.
MLflow: Open source platform for the machine learning lifecycle.
Pros of MLflow
- Broader ecosystem support and integrations with various ML frameworks
- Lightweight and easy to set up for small to medium-sized projects
- Flexible experiment tracking and model versioning capabilities
Cons of MLflow
- Less focus on distributed training and resource management
- Limited built-in hyperparameter tuning capabilities
- Requires more manual configuration for advanced use cases
Code Comparison
MLflow:

```python
import mlflow

mlflow.start_run()
mlflow.log_param("learning_rate", 0.01)
mlflow.log_metric("accuracy", 0.85)
mlflow.end_run()
```
Determined:

```python
from determined.experimental import Determined

# Hyperparameters live in the experiment config rather than being logged
# ad hoc; the call pattern below is illustrative (see the Determined SDK docs).
det = Determined()
experiment = det.create_experiment(
    config={"hyperparameters": {"learning_rate": 0.01}},
    model_dir=".",
)
experiment.wait()
```
Summary
MLflow is a versatile ML lifecycle management tool suitable for various project sizes, offering easy setup and flexible experiment tracking. However, it may require more manual configuration for advanced scenarios. Determined, on the other hand, provides stronger support for distributed training and resource management, making it more suitable for large-scale ML projects with complex infrastructure requirements.
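As one concrete illustration of that difference, scaling each Determined trial across multiple GPUs is a configuration change rather than new code (the slot count below is illustrative):

```yaml
resources:
  slots_per_trial: 8   # run each trial on 8 GPUs
```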
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration.
Pros of PyTorch
- Larger community and ecosystem, with more resources and third-party libraries
- More flexible and customizable for low-level research and experimentation
- Wider industry adoption and support
Cons of PyTorch
- Steeper learning curve for beginners
- Requires more boilerplate code for training and evaluation loops
- Less integrated with cloud and distributed computing platforms
Code Comparison
PyTorch:

```python
import torch

# Toy regression data for illustration.
input_data = torch.randn(100, 10)
target = torch.randn(100, 1)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    output = model(input_data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
```
Determined:

```python
import torch
from determined.pytorch import PyTorchTrial

class MyTrial(PyTorchTrial):
    def __init__(self, context):
        super().__init__(context)
        self.context = context
        # Wrap model and optimizer so Determined can handle distribution.
        self.model = self.context.wrap_model(torch.nn.Linear(10, 1))
        self.optimizer = self.context.wrap_optimizer(
            torch.optim.SGD(self.model.parameters(), lr=0.01))
        self.criterion = torch.nn.MSELoss()

    def train_batch(self, batch, epoch_idx, batch_idx):
        output = self.model(batch[0])
        loss = self.criterion(output, batch[1])
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer)
        return {"loss": loss}
```
TensorFlow: An Open Source Machine Learning Framework for Everyone.
Pros of TensorFlow
- Larger ecosystem with extensive libraries and tools
- Broader industry adoption and community support
- More comprehensive documentation and learning resources
Cons of TensorFlow
- Steeper learning curve for beginners
- Can be more complex to set up and configure
- Less focus on distributed training out-of-the-box
Code Comparison
TensorFlow:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```
Determined:

```python
import tensorflow as tf

from determined.keras import TFKerasTrial

class MyTrial(TFKerasTrial):
    def __init__(self, context):
        self.context = context

    def build_model(self):
        # Wrap the model so Determined can manage distributed training.
        model = self.context.wrap_model(tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(10, activation='softmax'),
        ]))
        model.compile(optimizer='adam', loss='categorical_crossentropy')
        return model
```
Determined integrates with TensorFlow, providing a higher-level abstraction for distributed training and experiment management. While TensorFlow offers more flexibility and a wider range of features, Determined simplifies the process of scaling machine learning workflows and managing experiments across multiple GPUs or machines.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Pros of Transformers
- Extensive library of pre-trained models for various NLP tasks
- Active community with frequent updates and contributions
- Comprehensive documentation and tutorials
Cons of Transformers
- Focused primarily on NLP, limiting its use for other ML domains
- Can be resource-intensive for large models and datasets
- Steeper learning curve for beginners due to its extensive features
Code Comparison
Transformers:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
```

Determined:

```python
from determined.experimental import Determined

# Fetch the best checkpoint from experiment 1 on a local Determined master.
checkpoint = Determined(master="http://localhost:8080").get_experiment(1).top_checkpoint()
```
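As a hedged follow-up (assuming you then want the checkpoint files locally), the returned checkpoint object can be downloaded; how you load it back into a model depends on your trial code:

```python
# Download the checkpoint to a local directory and inspect its location.
path = checkpoint.download()
print("checkpoint downloaded to:", path)
```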
Transformers provides a more straightforward approach for loading pre-trained models, while Determined focuses on experiment management and distributed training. Transformers is ideal for NLP tasks, whereas Determined offers a broader platform for various ML workflows and scalable training.
Weights & Biases (wandb): The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Pros of wandb
- More extensive visualization and experiment tracking capabilities
- Larger community and ecosystem with integrations for many ML frameworks
- Easier to set up and use for beginners
Cons of wandb
- Less focus on distributed training and resource management
- Potentially higher costs for large-scale projects or teams
- Limited built-in hyperparameter tuning capabilities
Code Comparison
wandb:

```python
import wandb

wandb.init(project="my-project")
wandb.config.hyperparameters = {...}
model.fit(X, y)
wandb.log({"loss": loss, "accuracy": accuracy})
```
determined:

```python
from determined.experimental import Determined

# Hyperparameters are declared up front in the experiment config; training
# itself runs on the cluster. The call pattern is illustrative (see the SDK docs).
det = Determined()
experiment = det.create_experiment(
    config={"name": "my-experiment", "hyperparameters": {...}},
    model_dir=".",
)
```
The code comparison shows that wandb focuses on logging and tracking, while determined emphasizes experiment configuration and resource management. wandb's API is simpler for basic use cases, while determined provides more control over experiment creation and execution.
README
Determined is an all-in-one deep learning platform, compatible with PyTorch and TensorFlow.
It takes care of:
- Distributed training for faster results.
- Hyperparameter tuning for obtaining the best models.
- Resource management for cutting cloud GPU costs.
- Experiment tracking for analysis and reproducibility.
How Determined Works
The main components of Determined are the Python library, the command line interface (CLI), and the Web UI.
Python Library
Use the Python library to make your existing PyTorch or TensorFlow code compatible with Determined.
You can do this by organizing your code into one of the class-based APIs:
```python
from determined.pytorch import PyTorchTrial

class YourExperiment(PyTorchTrial):
    def __init__(self, context):
        ...
```
Or by using just the functions you want, via the Core API:
```python
import determined as det

with det.core.init() as core_context:
    ...
```
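A slightly fuller sketch of the Core API pattern, assuming a hypothetical train_one_step() stand-in for your existing training code; the context reports metrics back to the Determined master each step:

```python
import determined as det

with det.core.init() as core_context:
    for step in range(100):
        loss = train_one_step()  # hypothetical: your existing training logic
        core_context.train.report_training_metrics(
            steps_completed=step + 1,
            metrics={"loss": loss},
        )
```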
Command Line Interface (CLI)
You can use the CLI to:
- Start a Determined cluster locally:

```bash
det deploy local cluster-up
```

- Launch Determined on cloud services, such as Amazon Web Services (AWS) or Google Cloud Platform (GCP):

```bash
det deploy aws up
```

- Train your models:

```bash
det experiment create gpt.yaml .
```
Configure everything from distributed training to hyperparameter tuning using YAML files:
```yaml
resources:
  slots_per_trial: 8
  priority: 1
hyperparameters:
  learning_rate:
    type: double
    minval: .0001
    maxval: 1.0
searcher:
  name: adaptive_asha
  metric: validation_loss
  smaller_is_better: true
```
Web UI
Use the Web UI to view loss curves, hyperparameter plots, code and configuration snapshots, model registries, cluster utilization, debugging logs, performance profiling reports, and more.
Installation
To install the CLI:

```bash
pip install determined
```

Then use `det deploy` to start the Determined cluster locally, or on cloud services like AWS and GCP.
For installation details, visit the cluster deployment guide for your environment:
Examples
Get familiar with Determined by exploring the 30+ examples in the examples folder and the determined-examples repo.
Documentation
- Documentation
- Quick Start Guide
- Tutorials
- User Guides
Community
If you need help, want to file a bug report, or just want to keep up-to-date with the latest news about Determined, please join the Determined community!
- Slack is the best place to ask questions about Determined and get support. Click here to join our Slack.
- You can also follow us on YouTube and Twitter.
- You can also join the community mailing list to ask questions about the project and receive announcements.
- To report a bug, open an issue on GitHub.
- To report a security issue, email security@determined.ai.
Contributing
License