Top Related Projects
- tensorflow/privacy: Library for training machine learning models with privacy for training data
- google/differential-privacy: Google's differential privacy libraries.
- OpenMined/PySyft: Perform data science on data that remains in someone else's server
- microsoft/SEAL: Microsoft SEAL is an easy-to-use and powerful homomorphic encryption library.
- google/jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Quick Overview
Opacus is an open-source library that enables training PyTorch models with differential privacy. It provides a simple and efficient way to add privacy guarantees to machine learning models, protecting individual data points while allowing useful insights to be extracted from datasets.
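The privacy guarantee comes from DP-SGD: each example's gradient is clipped to a maximum norm, and calibrated Gaussian noise is added before the optimizer update, bounding the influence of any single data point. Here is a conceptual sketch of one such step in plain PyTorch; Opacus computes per-sample gradients far more efficiently via hooks, and dp_sgd_step and its default values are illustrative, not Opacus API:
import torch

def dp_sgd_step(model, loss_fn, xs, ys, optimizer, clip_norm=1.0, noise_multiplier=1.0):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        # clip this sample's gradient to `clip_norm` (global norm over all parameters)
        total = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
        scale = min(1.0, clip_norm / (total.item() + 1e-6))
        for s, p in zip(summed, model.parameters()):
            s.add_(p.grad, alpha=scale)
    # add Gaussian noise calibrated to the clipping bound, then average and step
    for s, p in zip(summed, model.parameters()):
        noise = torch.normal(0.0, noise_multiplier * clip_norm, size=s.shape)
        p.grad = (s + noise) / len(xs)
    optimizer.step()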
Pros
- Easy integration with existing PyTorch models and workflows
- Supports a wide range of optimizers and architectures
- Provides built-in privacy accounting and analysis tools
- Actively maintained and supported by Meta AI (formerly Facebook AI Research)
Cons
- May introduce performance overhead due to additional privacy computations
- Can potentially reduce model accuracy, especially with stricter privacy settings
- Requires careful tuning of privacy parameters to balance utility and privacy (see the sketch after this list)
- Limited to PyTorch ecosystem, not applicable to other deep learning frameworks
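On the tuning point: rather than picking noise_multiplier by hand, you can ask Opacus to derive it from a target privacy budget with make_private_with_epsilon. A sketch, assuming model, optimizer, and train_loader are already defined and the values shown are illustrative:
from opacus import PrivacyEngine

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    target_epsilon=3.0,   # illustrative budget
    target_delta=1e-5,
    epochs=10,            # must match the number of epochs you actually train
    max_grad_norm=1.0,
)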
Code Examples
- Basic usage with a simple model:
import torch
from opacus import PrivacyEngine

model = torch.nn.Linear(10, 5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# `dataset` stands in for your own torch Dataset
train_loader = torch.utils.data.DataLoader(dataset, batch_size=64)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)
- Training loop with privacy:
# assumes `criterion` (e.g. torch.nn.CrossEntropyLoss()) and `epochs` are defined
for epoch in range(epochs):
    for batch in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch[0]), batch[1])
        loss.backward()
        optimizer.step()
    epsilon = privacy_engine.get_epsilon(delta=1e-5)
    print(f"Privacy budget spent: (ε = {epsilon:.2f}, δ = 1e-5)")
- Memory-efficient training with BatchMemoryManager, plus final privacy analysis:
from opacus.utils.batch_memory_manager import BatchMemoryManager

with BatchMemoryManager(
    data_loader=train_loader,
    max_physical_batch_size=128,
    optimizer=optimizer,
) as memory_safe_data_loader:
    for epoch in range(epochs):
        for batch in memory_safe_data_loader:
            ...  # training step here, as in the previous example

epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"Privacy guarantee: ε = {epsilon:.2f} at δ = 1e-5")
Getting Started
To get started with Opacus, follow these steps:
- Install Opacus:
pip install opacus
- Import necessary modules and create a model:
import torch
from opacus import PrivacyEngine

model = YourModel()  # any torch.nn.Module
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=64)  # your data
- Initialize PrivacyEngine and make your model private:
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)
- Train your model as usual, and monitor privacy budget:
# Training loop here
epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"Privacy budget spent: (ε = {epsilon:.2f}, δ = 1e-5)")
Competitor Comparisons
Library for training machine learning models with privacy for training data
Pros of TensorFlow Privacy
- More comprehensive privacy-preserving techniques beyond just differential privacy
- Better integration with TensorFlow ecosystem and tools
- More extensive documentation and examples
Cons of TensorFlow Privacy
- Steeper learning curve for those not familiar with TensorFlow
- Less frequent updates and maintenance compared to Opacus
Code Comparison
TensorFlow Privacy:
from tensorflow_privacy.privacy.optimizers import dp_optimizer

optimizer = dp_optimizer.DPAdamGaussianOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=0.1,
    num_microbatches=1,
    learning_rate=0.1,
)
Opacus:
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.1,
    max_grad_norm=1.0,
)
Both libraries provide similar functionality for implementing differential privacy in machine learning models, but with syntax and integration specific to their respective frameworks (TensorFlow and PyTorch).
Google's differential privacy libraries.
Pros of differential-privacy
- Broader language support (C++, Go, Java)
- More comprehensive privacy tools beyond ML/DL
- Extensive documentation and examples
Cons of differential-privacy
- Less focus on deep learning applications
- Steeper learning curve for ML practitioners
- Not integrated with popular ML frameworks
Code Comparison
Opacus (PyTorch-based):
import torch.optim as optim
from opacus import PrivacyEngine

model = Net()  # your torch.nn.Module
optimizer = optim.SGD(model.parameters(), lr=0.1)
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.1,
    max_grad_norm=1.0,
)
differential-privacy (C++):
std::unique_ptr<BoundedMean<int64_t>> mean =
    BoundedMean<int64_t>::Builder()
        .SetEpsilon(1.0)
        .SetLower(0)
        .SetUpper(100)
        .Build()
        .ValueOrDie();
Output result = mean->Result(input.begin(), input.end()).ValueOrDie();
Perform data science on data that remains in someone else's server
Pros of PySyft
- Broader scope: Supports various privacy-preserving techniques beyond differential privacy
- Federated learning capabilities: Enables training on decentralized data
- Multi-framework support: Works with PyTorch, TensorFlow, and other ML frameworks
Cons of PySyft
- Steeper learning curve: More complex due to its broader feature set
- Potentially slower performance: Overhead from supporting multiple frameworks
- Less specialized for differential privacy: May lack some DP-specific optimizations
Code Comparison
PySyft example:
import torch
import syft as sy

hook = sy.TorchHook(torch)
bob = sy.VirtualWorker(hook, id="bob")
x = torch.tensor([1, 2, 3, 4, 5]).send(bob)  # the tensor now lives on bob
y = x + x  # computed remotely on bob's worker
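To retrieve the result computed on bob, this older, hook-based PySyft API exposes a .get() call on pointer tensors (stated here as an assumption about the 0.2-era API):
result = y.get()  # moves the tensor back from bob to the local worker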
Opacus example:
import torch
from opacus import PrivacyEngine

model = YourModel()  # your torch.nn.Module
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=train_loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)
Microsoft SEAL is an easy-to-use and powerful homomorphic encryption library.
Pros of SEAL
- Focuses on homomorphic encryption, providing advanced privacy-preserving computation capabilities
- Supports multiple programming languages (C++, .NET, Python)
- Offers a wider range of cryptographic operations and techniques
Cons of SEAL
- Steeper learning curve due to its focus on complex cryptographic operations
- May have higher computational overhead for certain tasks compared to differential privacy approaches
- Less integrated with machine learning frameworks
Code Comparison
SEAL (C++):
Encryptor encryptor(context, public_key);
encoder.encode(5.0, scale, plain);    // CKKSEncoder writes into a Plaintext destination
encryptor.encrypt(plain, encrypted);  // SEAL encrypts into a Ciphertext, not by return value
Opacus (Python):
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=train_loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)
Summary
SEAL and Opacus serve different purposes in the privacy-preserving computation space. SEAL focuses on homomorphic encryption, offering advanced cryptographic operations across multiple languages. Opacus, on the other hand, specializes in differential privacy for PyTorch models, providing an easier integration with machine learning workflows. While SEAL offers more flexibility in terms of cryptographic operations, Opacus is more tailored for privacy-preserving machine learning tasks with potentially lower computational overhead.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Pros of JAX
- Better performance and optimization capabilities, especially for large-scale machine learning tasks
- More flexible and customizable, allowing for easier implementation of complex algorithms
- Supports automatic differentiation and vectorization, enabling efficient gradient computations
Cons of JAX
- Steeper learning curve compared to PyTorch-based Opacus
- Smaller community and ecosystem, potentially leading to fewer resources and third-party libraries
- Less focus on privacy-preserving machine learning techniques out-of-the-box
Code Comparison
Opacus (PyTorch):
import torch
from opacus import PrivacyEngine

model = YourModel()  # your torch.nn.Module
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=train_loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)
JAX:
import jax
import jax.numpy as jnp

@jax.jit
def train_step(params, batch):
    def loss_fn(params):
        loss = 0.0  # define your loss over `batch` here
        return loss
    grad_fn = jax.value_and_grad(loss_fn)
    loss, grads = grad_fn(params)
    # element-wise gradient clipping (not the per-sample norm clipping DP-SGD uses)
    return loss, jax.tree_map(lambda g: jnp.clip(g, -1.0, 1.0), grads)
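Related to the last con: the per-example gradients at the heart of DP-SGD (which Opacus computes for you in PyTorch) can be built by hand in JAX by composing grad with vmap. A minimal sketch with a toy loss; example_loss and all values are illustrative:
import jax
import jax.numpy as jnp

# toy squared-error loss for a linear model; `params` is a (w, b) tuple
def example_loss(params, x, y):
    w, b = params
    pred = jnp.dot(x, w) + b
    return (pred - y) ** 2

# vmap over the batch axes of x and y yields one gradient per example
per_example_grads = jax.vmap(jax.grad(example_loss), in_axes=(None, 0, 0))

params = (jnp.zeros(3), jnp.array(0.0))
xs, ys = jnp.ones((8, 3)), jnp.zeros(8)
dw, db = per_example_grads(params, xs, ys)  # shapes: (8, 3) and (8,)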
README
Opacus is a library that enables training PyTorch models with differential privacy. It supports training with minimal code changes on the client side, has little impact on training performance, and lets the client track the privacy budget expended at any given moment.
Target audience
This code release is aimed at two target audiences:
- ML practitioners will find this to be a gentle introduction to training a model with differential privacy as it requires minimal code changes.
- Differential Privacy researchers will find this easy to experiment and tinker with, allowing them to focus on what matters.
Installation
The latest release of Opacus can be installed via pip:
pip install opacus
OR, alternatively, via conda:
conda install -c conda-forge opacus
You can also install directly from the source for the latest features (along with its quirks and potentially occasional bugs):
git clone https://github.com/pytorch/opacus.git
cd opacus
pip install -e .
Getting started
To train your model with differential privacy, all you need to do is to instantiate a PrivacyEngine
and pass your model, data_loader, and optimizer to the engine's make_private()
method to obtain their private counterparts.
# define your components as usual
model = Net()
optimizer = SGD(model.parameters(), lr=0.05)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=1024)
# enter PrivacyEngine
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.1,
    max_grad_norm=1.0,
)
# Now it's business as usual
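From here, the usual PyTorch training loop runs unchanged; only the three components were swapped. A sketch, with criterion assumed (e.g. cross-entropy) since the snippet above does not define one:
criterion = torch.nn.CrossEntropyLoss()  # assumed loss for illustration
for data, target in data_loader:
    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    optimizer.step()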
The MNIST example shows an end-to-end run using Opacus. The examples folder contains more such examples.
Migrating to 1.0
Opacus 1.0 introduced many improvements to the library, but also some breaking changes. If you've been using Opacus 0.x and want to update to the latest release, please follow the Migration Guide.
Learn more
Interactive tutorials
We've built a series of IPython-based tutorials as a gentle introduction to training models with privacy and using various Opacus features.
- Building an Image Classifier with Differential Privacy
- Training a differentially private LSTM model for name classification
- Building text classifier with Differential Privacy on BERT
- Opacus Guide: Introduction to advanced features
- Opacus Guide: Grad samplers
- Opacus Guide: Module Validator and Fixer
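A taste of what the Module Validator guide covers: some common layers (e.g. BatchNorm, whose statistics mix information across samples) are incompatible with DP-SGD, and Opacus can detect and replace them:
from opacus.validators import ModuleValidator

errors = ModuleValidator.validate(model, strict=False)  # lists incompatible modules
model = ModuleValidator.fix(model)                      # e.g. replaces BatchNorm with GroupNorm
assert ModuleValidator.validate(model, strict=False) == []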
Technical report and citation
The technical report introducing Opacus, presenting its design principles, mathematical foundations, and benchmarks can be found here.
Consider citing the report if you use Opacus in your papers, as follows:
@article{opacus,
title={Opacus: {U}ser-Friendly Differential Privacy Library in {PyTorch}},
author={Ashkan Yousefpour and Igor Shilov and Alexandre Sablayrolles and Davide Testuggine and Karthik Prasad and Mani Malek and John Nguyen and Sayan Ghosh and Akash Bharadwaj and Jessica Zhao and Graham Cormode and Ilya Mironov},
journal={arXiv preprint arXiv:2109.12298},
year={2021}
}
Blogposts and talks
If you want to learn more about DP-SGD and related topics, check out our series of blogposts and talks:
- Differential Privacy Series Part 1 | DP-SGD Algorithm Explained
- Differential Privacy Series Part 2 | Efficient Per-Sample Gradient Computation in Opacus
- PriCon 2020 Tutorial: Differentially Private Model Training with Opacus
- Differential Privacy on PyTorch | PyTorch Developer Day 2020
- Opacus v1.0 Highlights | PyTorch Developer Day 2021
FAQ
Check out the FAQ page for answers to some of the most frequently asked questions about differential privacy and Opacus.
Contributing
See the CONTRIBUTING file for how to help out. Do also check out the README files inside the repo to learn how the code is organized.
License
This code is released under Apache 2.0, as found in the LICENSE file.