
tensorflow/privacy

Library for training machine learning models with privacy for training data


Top Related Projects

  • differential-privacy: Google's differential privacy libraries.
  • PySyft: Perform data science on data that remains in someone else's server.
  • Opacus: Training PyTorch models with differential privacy.
  • SEAL: Microsoft SEAL is an easy-to-use and powerful homomorphic encryption library.

Quick Overview

TensorFlow Privacy is an open-source Python library that implements privacy-preserving machine learning techniques, specifically differential privacy (DP), for TensorFlow. It provides tools for training and evaluating machine learning models with strong privacy guarantees, helping developers protect sensitive training data while still benefiting from machine learning insights.

Pros

  • Implements state-of-the-art differential privacy techniques for machine learning
  • Seamlessly integrates with TensorFlow, a popular deep learning framework
  • Provides a variety of DP optimizers and privacy accounting methods
  • Actively maintained by Google and the open-source community

Cons

  • May introduce performance overhead due to privacy-preserving computations
  • Requires careful tuning of privacy parameters to balance utility and privacy
  • Limited to TensorFlow ecosystem, not directly applicable to other ML frameworks
  • Learning curve for understanding and implementing differential privacy concepts

Code Examples

  1. Creating a DP-SGD optimizer:
import tensorflow_privacy as tfp

optimizer = tfp.DPKerasSGDOptimizer(
    l2_norm_clip=1.0,       # max L2 norm of each per-example gradient
    noise_multiplier=0.5,   # noise stddev relative to the clipping norm
    num_microbatches=1,     # granularity at which gradients are clipped
    learning_rate=0.1
)
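Here l2_norm_clip bounds the influence any single example can have on a gradient update, noise_multiplier scales the Gaussian noise added to the clipped gradients, and num_microbatches controls how finely each batch is split for clipping. Larger noise and tighter clipping strengthen the privacy guarantee but typically cost some accuracy.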
  2. Training a model with differential privacy:
import tensorflow as tf
import tensorflow_privacy as tfp

model = tf.keras.Sequential([...])  # Define your model architecture

dp_optimizer = tfp.DPKerasSGDOptimizer(...)  # Configure DP optimizer

# The loss must be computed per example (reduction=NONE) so that gradients
# can be clipped per microbatch before averaging.
loss = tf.keras.losses.CategoricalCrossentropy(
    reduction=tf.losses.Reduction.NONE)

model.compile(optimizer=dp_optimizer, loss=loss, metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10, batch_size=32)
  3. Computing privacy guarantees:
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy

# delta is an input, not a return value: the call returns epsilon together
# with the optimal RDP order used to compute it.
eps, opt_order = compute_dp_sgd_privacy.compute_dp_sgd_privacy(
    n=60000, batch_size=256, noise_multiplier=1.3, epochs=60, delta=1e-5)
print(f"epsilon = {eps:.2f} at delta = 1e-5")
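A smaller epsilon corresponds to a stronger privacy guarantee. Increasing noise_multiplier or training for fewer epochs lowers epsilon at some cost in utility, so it is worth recomputing these figures whenever the training configuration changes.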

Getting Started

To get started with TensorFlow Privacy:

  1. Install the library:
pip install tensorflow-privacy
  2. Import the library in your Python script:
import tensorflow_privacy as tfp
  3. Use DP optimizers and tools in your TensorFlow models:
optimizer = tfp.DPKerasSGDOptimizer(...)
model.compile(optimizer=optimizer, ...)
  4. Analyze privacy guarantees:
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy
eps, opt_order = compute_dp_sgd_privacy.compute_dp_sgd_privacy(...)
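
Putting these steps together, here is a minimal end-to-end sketch on synthetic data; the architecture, hyperparameters, and data are illustrative placeholders rather than recommendations:

import numpy as np
import tensorflow as tf
import tensorflow_privacy as tfp

# Synthetic stand-in data: 1,000 examples, 20 features, 10 classes.
x_train = np.random.normal(size=(1000, 20)).astype("float32")
y_train = tf.keras.utils.to_categorical(
    np.random.randint(0, 10, size=1000), num_classes=10)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10),
])

optimizer = tfp.DPKerasSGDOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=1.1,
    num_microbatches=50,  # must evenly divide the batch size
    learning_rate=0.15,
)

# Per-example loss (reduction=NONE) so gradients can be clipped per microbatch.
loss = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=50)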

Competitor Comparisons

differential-privacy: Google's differential privacy libraries.

Pros of differential-privacy

  • Broader language support (C++, Go, Java) compared to TensorFlow-specific implementation
  • More comprehensive set of differential privacy algorithms and tools
  • Easier integration with non-TensorFlow projects and applications

Cons of differential-privacy

  • Less tightly integrated with machine learning workflows
  • May require more manual configuration and setup for ML-specific tasks
  • Potentially steeper learning curve for those already familiar with TensorFlow

Code Comparison

differential-privacy (C++):

std::unique_ptr<BoundedMean<int>> mean = BoundedMean<int>::Builder()
    .SetEpsilon(1.0)
    .SetLower(0)
    .SetUpper(100)
    .Build()
    .ValueOrDie();

TensorFlow Privacy (Python):

from tensorflow_privacy.privacy.dp_query import gaussian_query, normalized_query

# A DP mean: a clipped, noised sum normalized by a fixed denominator.
dp_mean_query = normalized_query.NormalizedQuery(
    numerator_query=gaussian_query.GaussianSumQuery(
        l2_norm_clip=1.0, stddev=1.0),
    denominator=100.0)

Both repositories provide implementations of differential privacy techniques, but they cater to different use cases and ecosystems. differential-privacy offers a general-purpose toolkit across several programming languages, while TensorFlow Privacy focuses on integrating differential privacy into TensorFlow machine learning workflows. The choice between them depends on project requirements, language preferences, and how tightly the solution needs to integrate with TensorFlow or other ML frameworks.


PySyft: Perform data science on data that remains in someone else's server

Pros of PySyft

  • Broader focus on privacy-preserving machine learning techniques, including federated learning and secure multi-party computation
  • More user-friendly API and easier integration with popular ML frameworks like PyTorch
  • Active community and frequent updates

Cons of PySyft

  • Less specialized in differential privacy compared to TensorFlow Privacy
  • May have a steeper learning curve for beginners due to its wider range of features

Code Comparison

PySyft:

import torch
import syft as sy

hook = sy.TorchHook(torch)
bob = sy.VirtualWorker(hook, id="bob")
x = torch.tensor([1, 2, 3, 4, 5]).send(bob)  # the tensor now lives on bob's worker

TensorFlow Privacy:

import tensorflow_privacy as tfp
optimizer = tfp.DPKerasSGDOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=0.1,
    num_microbatches=1,
    learning_rate=0.1
)

Both libraries offer privacy-preserving machine learning capabilities, but PySyft provides a more comprehensive suite of tools for various privacy-preserving techniques, while TensorFlow Privacy focuses primarily on differential privacy within the TensorFlow ecosystem. PySyft's flexibility and broader scope make it suitable for a wider range of privacy-preserving ML applications, but TensorFlow Privacy may be more appropriate for users specifically interested in differential privacy with TensorFlow.


Opacus: Training PyTorch models with differential privacy

Pros of Opacus

  • More user-friendly API, easier to integrate with existing PyTorch models
  • Better documentation and tutorials for beginners
  • Actively maintained with frequent updates and community support

Cons of Opacus

  • Limited to PyTorch ecosystem, less flexible for other frameworks
  • Fewer advanced features compared to TensorFlow Privacy
  • Slightly lower performance in some scenarios due to PyTorch's dynamic graph approach

Code Comparison

Opacus:

import torch
from opacus import PrivacyEngine

model = YourModel()  # any torch.nn.Module
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

TensorFlow Privacy:

from tensorflow_privacy.privacy.optimizers import dp_optimizer

model = YourModel()
optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=1.0,
    num_microbatches=1,
    learning_rate=0.1
)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

Microsoft SEAL is an easy-to-use and powerful homomorphic encryption library.

Pros of SEAL

  • Focuses specifically on homomorphic encryption, providing a more specialized toolset for privacy-preserving computations
  • Offers a lower-level implementation, allowing for more fine-grained control and optimization
  • Supports multiple programming languages, including C++, .NET, and Python

Cons of SEAL

  • Has a steeper learning curve due to its focus on homomorphic encryption concepts
  • Lacks built-in machine learning functionality, requiring additional integration for ML tasks
  • May have lower performance for certain operations compared to TensorFlow Privacy's optimized implementations

Code Comparison

SEAL (C++):

Encryptor encryptor(context, public_key);
Evaluator evaluator(context);

Ciphertext encrypted;
encryptor.encrypt(plaintext, encrypted);      // encrypt writes to an output parameter
evaluator.add_inplace(encrypted, encrypted);  // homomorphically double the value

TensorFlow Privacy (Python):

from tensorflow_privacy.privacy.optimizers.dp_optimizer import (
    DPGradientDescentGaussianOptimizer,
)

dp_optimizer = DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=0.1,
    num_microbatches=1,
    learning_rate=0.1
)
model.compile(optimizer=dp_optimizer, loss='categorical_crossentropy')


README

TensorFlow Privacy

This repository contains the source code for TensorFlow Privacy, a Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. The library comes with tutorials and analysis tools for computing the privacy guarantees provided.

The TensorFlow Privacy library is under continual development, always welcoming contributions. In particular, we always welcome help towards resolving the issues currently open.

Latest Updates

2024-02-14: As of version 0.9.0, the TensorFlow Privacy GitHub repository will be published as two separate PyPI packages. The first will inherit the name tensorflow-privacy and contain the parts related to training DP models. The second, tensorflow-empirical-privacy, will contain the parts related to testing for empirical privacy.

2023-02-21: A new implementation of efficient per-example gradient clipping is now available for DP Keras models consisting only of Dense and Embedding layers. The models use the fast gradient calculation results of this paper. The implementation allows DP training without meaningful memory or runtime overhead, and it removes the need to tune the number of microbatches, since the gradient is clipped with respect to each example.
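
As a rough illustration of this path, here is a sketch that assumes the DPSequential wrapper from tensorflow_privacy.privacy.keras_models.dp_keras_model; the class name, argument names, and loss configuration should be checked against the installed version:

import tensorflow as tf
from tensorflow_privacy.privacy.keras_models.dp_keras_model import DPSequential

# Assumed API: a Dense-only model with built-in per-example clipping, so no
# num_microbatches parameter needs tuning.
model = DPSequential(
    l2_norm_clip=1.0,
    noise_multiplier=1.1,
    layers=[
        tf.keras.layers.InputLayer(input_shape=(20,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ],
)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
    loss=tf.keras.losses.CategoricalCrossentropy(
        from_logits=True, reduction=tf.losses.Reduction.NONE),
)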

Setting up TensorFlow Privacy

Dependencies

This library uses TensorFlow to define machine learning models. Therefore, installing TensorFlow (>= 1.14) is a prerequisite. You can find instructions here. For better performance, it is also recommended to install TensorFlow with GPU support (detailed instructions on how to do this are available in the TensorFlow installation documentation).

Installing TensorFlow Privacy

If you only want to use TensorFlow Privacy as a library, you can simply execute

pip install tensorflow-privacy

Otherwise, you can clone this GitHub repository into a directory of your choice:

git clone https://github.com/tensorflow/privacy

You can then install the local package in "editable" mode in order to add it to your PYTHONPATH:

cd privacy
pip install -e .

If you'd like to make contributions, we recommend first forking the repository and then cloning your fork rather than cloning this repository directly.

Contributing

Contributions are welcome! Bug fixes and new features can be initiated through GitHub pull requests. To speed up the code review process, we ask that:

  • When making code contributions to TensorFlow Privacy, you follow the PEP8-with-two-spaces coding style (the same as the one used by TensorFlow) in your pull requests. In most cases this can be done by running autopep8 -i --indent-size 2 <file> on the files you have edited.

  • You also check your code with pylint and TensorFlow's pylint configuration file by running pylint --rcfile=/path/to/the/tf/rcfile <edited file.py>.

  • When making your first pull request, you sign the Google CLA.

  • You do not add git submodules; we do not accept pull requests that add them because of the problems that arise when maintaining git submodules.

Tutorials directory

To help you get started with the functionality provided by this library, we provide a detailed walkthrough here that teaches you how to wrap existing optimizers (e.g., SGD, Adam) in their differentially private counterparts using TensorFlow (TF) Privacy. You will also learn how to tune the parameters introduced by differentially private optimization and how to measure the privacy guarantees provided, using the analysis tools included in TF Privacy.
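
For a small taste of that walkthrough, wrapping Adam might look like the sketch below (assuming the DPKerasAdamOptimizer wrapper; hyperparameter values are placeholders):

import tensorflow_privacy as tfp

# Adam wrapped with per-example gradient clipping and Gaussian noise.
dp_adam = tfp.DPKerasAdamOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=1.1,
    num_microbatches=32,  # must evenly divide the batch size
    learning_rate=0.001,
)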

In addition, the tutorials/ folder comes with scripts demonstrating how to use the library features. The list of tutorials is described in the README included in the tutorials directory.

NOTE: the tutorials are maintained carefully. However, they are not considered part of the API and they can change at any time without warning. You should not write third-party code that imports the tutorials and expect that the interface will not break.

Research directory

This folder contains code to reproduce results from research papers related to privacy in machine learning. It is not maintained as carefully as the tutorials directory, but rather intended as a convenient archive.

TensorFlow 2.x

TensorFlow Privacy now works with TensorFlow 2! You can use the new Keras-based estimators found in privacy/tensorflow_privacy/privacy/optimizers/dp_optimizer_keras.py.

For this to work with tf.keras.Model and tf.estimator.Estimator, however, you need to install TensorFlow 2.4 or later.
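
For example, the Keras optimizer wrappers can be imported directly from that module:

from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import (
    DPKerasSGDOptimizer,
)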

Remarks

The content of this repository supersedes the corresponding folder in the tensorflow/models repository.

Contacts

If you have any questions that cannot be addressed by raising an issue, feel free to contact:

  • Galen Andrew (@galenmandrew)
  • Steve Chien (@schien1729)
  • Nicolas Papernot (@npapernot)

Copyright

Copyright 2019 - Google LLC