Top Related Projects
Quick Overview
PySyft is an open-source library for secure and private machine learning. It enables data scientists and researchers to train AI models on encrypted data that they never see, preserving privacy and confidentiality. PySyft implements privacy-preserving techniques like federated learning, differential privacy, and encrypted computation.
Pros
- Enables privacy-preserving machine learning on sensitive data
- Supports multiple privacy-enhancing technologies in one framework
- Integrates with popular machine learning libraries like PyTorch and TensorFlow
- Active community and ongoing development
Cons
- Steep learning curve for newcomers to privacy-preserving ML
- Performance overhead compared to traditional ML approaches
- Limited support for certain advanced ML models and techniques
- Requires careful implementation to avoid privacy leaks
Code Examples
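- Creating a virtual worker for secure computations (legacy PySyft 0.2.x API):
import torch
import syft as sy
hook = sy.TorchHook(torch)  # extends PyTorch tensors with methods such as .send()
bob = sy.VirtualWorker(hook, id="bob")  # simulated remote worker in the same process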
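- Encrypting and sending data to a virtual worker:
import torch
x = torch.tensor([1, 2, 3, 4, 5])
# Assumes the installed PySyft build supports the "fv" protocol and that
# the worker exposes a public_key attribute; verify both before relying on this.
x_encrypted = x.encrypt(protocol="fv", public_key=bob.public_key)
x_encrypted_pointer = x_encrypted.send(bob)  # bob now holds the ciphertext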
- Performing computations on encrypted data:
y = torch.tensor([2, 3, 4, 5, 6])
y_encrypted = y.encrypt(protocol="fv", public_key=bob.public_key)
y_encrypted_pointer = y_encrypted.send(bob)
# The addition is requested locally but carried out on bob, on ciphertexts
result = x_encrypted_pointer + y_encrypted_pointer
# Fetch the encrypted sum from bob, then decrypt it locally
decrypted_result = result.get().decrypt()
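The snippets above leave the arithmetic to the library. As rough intuition for how parties can add values that none of them sees in the clear, here is a minimal sketch of additive secret sharing in plain Python. It is not PySyft's implementation or API, and it is a different technique from the "fv" scheme named in the snippets; `share`, `reconstruct`, `add_shared`, and the modulus `Q` are names chosen for this illustration.

```python
import secrets

Q = 2**61 - 1  # illustrative modulus; all share arithmetic is done mod Q


def share(value, n_parties=2):
    """Split an integer into n random shares that sum to value mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def reconstruct(shares):
    """Recombine all shares; fewer than all of them reveal nothing about the value."""
    return sum(shares) % Q


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % Q for a, b in zip(a_shares, b_shares)]


x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
x_shared = [share(v) for v in x]
y_shared = [share(v) for v in y]
z_shared = [add_shared(a, b) for a, b in zip(x_shared, y_shared)]
print([reconstruct(s) for s in z_shared])  # [3, 5, 7, 9, 11]
```

Addition is the easy case because it needs no interaction between the parties; multiplication on shares requires extra protocol steps that this sketch omits.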
Getting Started
To get started with PySyft:
- Install PySyft:
pip install syft
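- Import the library and create a virtual worker (these calls come from the legacy 0.2.x API and are not available in the 0.9 releases described in the README below):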
import syft as sy
import torch
hook = sy.TorchHook(torch)
alice = sy.VirtualWorker(hook, id="alice")
- Perform secure computations:
x = torch.tensor([1, 2, 3, 4, 5]).send(alice)
y = torch.tensor([2, 3, 4, 5, 6]).send(alice)
z = x + y
result = z.get()
print(result)
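In the steps above, `send` hands the data to a worker and returns a pointer, operations on pointers run where the data lives, and `get` brings the result back. The toy model below sketches that idea in plain Python. `ToyWorker` and `ToyPointer` are hypothetical classes written for this illustration and are not part of PySyft.

```python
import itertools


class ToyWorker:
    """Stands in for a remote machine: it owns a private object store."""

    def __init__(self, name):
        self.name = name
        self._store = {}
        self._ids = itertools.count()

    def receive(self, obj):
        """Store an object and hand back a pointer to it."""
        obj_id = next(self._ids)
        self._store[obj_id] = obj
        return ToyPointer(self, obj_id)


class ToyPointer:
    """A handle to data held by a worker; it carries no values itself."""

    def __init__(self, worker, obj_id):
        self.worker = worker
        self.obj_id = obj_id

    def __add__(self, other):
        if other.worker is not self.worker:
            raise ValueError("both operands must live on the same worker")
        store = self.worker._store
        # The addition runs "on the worker"; only a new pointer comes back.
        total = [a + b for a, b in zip(store[self.obj_id], store[other.obj_id])]
        return self.worker.receive(total)

    def get(self):
        """Move the data back to the caller and remove it from the worker."""
        return self.worker._store.pop(self.obj_id)


alice = ToyWorker("alice")
x = alice.receive([1, 2, 3, 4, 5])
y = alice.receive([2, 3, 4, 5, 6])
z = x + y
print(z.get())  # [3, 5, 7, 9, 11]
```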
Competitor Comparisons
Training PyTorch models with differential privacy
Pros of Opacus
- Simpler implementation focused specifically on differential privacy for PyTorch
- Tighter integration with PyTorch, potentially offering better performance
- More active development and frequent updates
Cons of Opacus
- Limited to differential privacy, while PySyft offers a broader range of privacy-preserving techniques
- Less flexibility for advanced use cases or custom privacy implementations
- Smaller community and ecosystem compared to PySyft
Code Comparison
PySyft example:
import torch
import syft as sy
hook = sy.TorchHook(torch)
bob = sy.VirtualWorker(hook, id="bob")
x = torch.tensor([1, 2, 3, 4, 5]).send(bob)
y = x + x
Opacus example:
import torch
from opacus import PrivacyEngine
model = YourModel()  # placeholder: any torch.nn.Module
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
privacy_engine = PrivacyEngine()
# train_loader is assumed to be an existing torch.utils.data.DataLoader
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.1,
    max_grad_norm=1.0,
)
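The `max_grad_norm` and `noise_multiplier` arguments correspond to the two steps of differentially private SGD: clipping each sample's gradient, then adding Gaussian noise to the sum. The following is a minimal plain-Python sketch of one such aggregation step under those assumptions. It is not Opacus's implementation, `clip` and `dp_average_gradients` are hypothetical helpers, and no privacy accounting is performed.

```python
import math
import random


def clip(grad, max_grad_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradients(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_grad_norm) for g in per_sample_grads]
    dim = len(per_sample_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_grad_norm  # noise std scales with the clip bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)  # seeded only to make the demo repeatable
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
print(dp_average_gradients(grads, max_grad_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single sample can move the sum, which is what lets a fixed amount of noise mask each individual contribution.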
Library for training machine learning models with privacy for training data
Pros of TensorFlow Privacy
- Tightly integrated with TensorFlow ecosystem
- Focused specifically on differential privacy for machine learning
- Backed by Google, with potential for long-term support and development
Cons of TensorFlow Privacy
- Limited to TensorFlow framework
- Narrower scope compared to PySyft's broader federated learning capabilities
- Less emphasis on multi-party computation and secure aggregation
Code Comparison
PySyft example:
import torch
import syft as sy
hook = sy.TorchHook(torch)
bob = sy.VirtualWorker(hook, id="bob")
x = torch.tensor([1, 2, 3, 4, 5]).send(bob)
y = x + x
TensorFlow Privacy example:
import tensorflow_privacy as tfp
optimizer = tfp.DPKerasSGDOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=0.1,
    num_microbatches=1,
    learning_rate=0.1,
)
PySyft offers a more general-purpose approach to federated learning and privacy-preserving computations, while TensorFlow Privacy focuses specifically on differential privacy within the TensorFlow ecosystem. PySyft's code emphasizes data ownership and remote execution, whereas TensorFlow Privacy's code centers around privacy-preserving optimizers and model training.
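Federated learning, mentioned above, commonly combines models trained on separate datasets by averaging their parameters, weighted by how much data each participant holds. The sketch below shows that averaging step (often called federated averaging) in plain Python. `federated_average` and the two clients are illustrative and are not taken from PySyft or TensorFlow Privacy.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients that trained the same 2-parameter model locally.
alice_weights, alice_samples = [1.0, 2.0], 100
bob_weights, bob_samples = [3.0, 6.0], 300

global_weights = federated_average(
    [alice_weights, bob_weights], [alice_samples, bob_samples]
)
print(global_weights)  # [2.5, 5.0]
```

Only the parameter vectors and sample counts leave each client in this scheme; the raw training data stays where it was collected.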
A framework for Privacy Preserving Machine Learning
Pros of CrypTen
- Focused specifically on secure multi-party computation (MPC)
- Tighter integration with PyTorch, allowing for easier adoption by existing PyTorch users
- More extensive documentation and tutorials for getting started with privacy-preserving machine learning
Cons of CrypTen
- Limited to MPC, while PySyft offers a broader range of privacy-enhancing technologies
- Smaller community and ecosystem compared to PySyft
- Less flexibility in terms of deployment options and federated learning scenarios
Code Comparison
PySyft example:
import torch
import syft as sy
hook = sy.TorchHook(torch)
bob = sy.VirtualWorker(hook, id="bob")
x = torch.tensor([1, 2, 3, 4, 5]).send(bob)
CrypTen example:
import crypten
crypten.init()
x = crypten.cryptensor([1, 2, 3, 4, 5])
Both libraries aim to provide privacy-preserving machine learning capabilities, but they differ in approach and focus. PySyft offers a broader suite of privacy-enhancing technologies, while CrypTen specializes in MPC with tighter PyTorch integration.
Microsoft SEAL is an easy-to-use and powerful homomorphic encryption library.
Pros of SEAL
- More mature and stable implementation of homomorphic encryption
- Offers lower-level control and flexibility for advanced users
- Supports multiple programming languages through C++ core
Cons of SEAL
- Steeper learning curve due to lower-level implementation
- Less focus on privacy-preserving machine learning workflows
- Requires more manual setup and configuration
Code Comparison
PySyft example (federated learning):
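import torch
from torch import nn
import syft as sy
hook = sy.TorchHook(torch)
alice = sy.VirtualWorker(hook, id="alice")
bob = sy.VirtualWorker(hook, id="bob")
data = torch.tensor([1, 2, 3, 4]).send(alice)  # data now lives on alice
model = nn.Linear(1, 1).send(alice)  # the model is sent to where the data is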
SEAL example (homomorphic encryption):
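#include "seal/seal.h"
using namespace seal;
EncryptionParameters parms(scheme_type::bfv);
parms.set_poly_modulus_degree(4096);
parms.set_coeff_modulus(CoeffModulus::BFVDefault(4096));
parms.set_plain_modulus(1024);
SEALContext context(parms);  // SEAL 3.6+; earlier releases used SEALContext::Create(parms)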
PySyft focuses on high-level privacy-preserving machine learning workflows, while SEAL provides low-level homomorphic encryption primitives. PySyft offers a more user-friendly interface for federated learning and differential privacy, whereas SEAL gives users fine-grained control over encryption parameters and operations. The choice between them depends on the specific use case and required level of abstraction.
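To make "computing on encrypted data" concrete, the sketch below implements a toy version of the Paillier cryptosystem in plain Python and adds two numbers while they are encrypted. Paillier is used here only because it fits in a few lines; it is not one of the schemes SEAL provides (the snippet above selects BFV), and it supports addition of ciphertexts only. The primes are tiny and insecure, the generator is fixed to n + 1 as a common simplification, and all function names are chosen for this illustration.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Build a toy key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n with fresh randomness."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts mod n."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
print(decrypt(n, private_key, c_sum))  # 42
```

Whoever computes `add_encrypted` needs only the public value `n`, so the party doing the arithmetic never holds the key that could reveal the inputs.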
README
Data Science on data you are not allowed to see
PySyft enables a new way to do data science, where you can use non-public information without seeing or obtaining a copy of the data itself. All you need is to connect to a Datasite!
Datasites are like websites, but for data. Designed with the principles of structured transparency, they enable data owners to control how their data is protected and data scientists to use data without obtaining a copy.
PySyft supports any statistical analysis or machine learning, offering support for directly running Python code - even using third-party Python libraries.
Supported on:
Linux, macOS, Windows, Docker, Kubernetes
Quickstart
Try out your first query against a live demo Datasite!
Install Client
pip install -U "syft[data_science]"
More instructions are available here.
Launch Server
Launch a development server directly in your Jupyter Notebook:
import syft as sy
sy.requires(">=0.9.1,<0.9.2")
server = sy.orchestra.launch(
    name="my-datasite",
    port=8080,
    create_producer=True,
    n_consumers=1,
    dev_mode=False,
    reset=True,  # resets database
)
or from the command line:
$ syft launch --name=my-datasite --port=8080 --reset=True
Starting syft-datasite server on 0.0.0.0:8080
Datasite servers can be deployed as a single container using Docker or directly in Kubernetes. Check out our deployment guide.
Launch Client
The main way to use a Datasite is via the Syft client, in a Jupyter Notebook. Check out our PySyft client guide:
import syft as sy
sy.requires(">=0.9.1,<0.9.2")
datasite_client = sy.login(
    port=8080,
    email="info@openmined.org",
    password="changethis",
)
PySyft - Getting started
Learn about PySyft via our getting started guide:
- PySyft from the ground up
- Part 1: Datasets & Assets
- Part 2: Client and Datasite Access
- Part 3: Propose the research study
- Part 4: Review Code Requests
- Part 5: Retrieving Results
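The guide's sequence (upload data, submit a study as code, review the request, retrieve results) can be pictured with a toy model. The sketch below is plain Python and makes no claim about how PySyft implements these steps; `ToyDatasite` and its methods are hypothetical names invented for this illustration.

```python
class ToyDatasite:
    """Holds private data and runs only code the data owner has approved."""

    def __init__(self):
        self._private = {}   # dataset name -> data, never returned directly
        self._requests = {}  # request id -> [function, dataset name, approved?]

    # Data owner side
    def upload_dataset(self, name, data):
        self._private[name] = data

    def approve(self, request_id):
        self._requests[request_id][2] = True

    # Data scientist side
    def submit_code(self, func, dataset_name):
        """Propose a computation; nothing runs until the owner approves it."""
        request_id = len(self._requests)
        self._requests[request_id] = [func, dataset_name, False]
        return request_id

    def get_result(self, request_id):
        """Run the approved function on the private data and return only its output."""
        func, dataset_name, approved = self._requests[request_id]
        if not approved:
            raise PermissionError("request has not been approved by the data owner")
        return func(self._private[dataset_name])


def mean_age(rows):
    return sum(rows) / len(rows)


site = ToyDatasite()
site.upload_dataset("ages", [34, 51, 29, 46])

request_id = site.submit_code(mean_age, "ages")
site.approve(request_id)            # the owner reviews the code before this step
print(site.get_result(request_id))  # 40.0
```

The data scientist receives the output of the approved function and never the rows themselves, which is the property the guide's workflow is built around.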
PySyft In-depth
Check out our docs website.
Quick PySyft components links:
Why use PySyft?
In a variety of domains across society, data owners have valid concerns about the risks associated with sharing their data, such as legal risks, privacy invasion (misusing the data), or intellectual property (copying and redistributing it).
Datasites enable data scientists to answer questions without even seeing or acquiring a copy of the data, within the data owner's definition of acceptable use. We call this process Remote Data Science.
This means that the current risks of sharing information with someone will no longer prevent the vast benefits such as innovation, insights and scientific discovery. With each Datasite, data owners are able to enable 1000x more accessible data in each scientific field and lead, together with data scientists, breakthrough innovation.
Learn more about our work on our website.
Support
For questions about PySyft, reach out via #support on Slack.
Syft Versions
Important: PySyft and Syft Server must use the same version.
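A range such as ">=0.9.1,<0.9.2", as passed to `sy.requires` in the Quickstart, admits only 0.9.1 releases. The sketch below shows how such a range can be evaluated in plain Python. It does not reproduce how `sy.requires` works internally; `parse` and `satisfies` are hypothetical helpers that handle only plain dotted versions such as "0.9.1", not pre-release tags.

```python
import operator

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse(version):
    """Turn '0.9.1' into (0, 9, 1); only plain dotted integers are handled."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, specifier):
    """Check a version against comma-separated clauses such as '>=0.9.1,<0.9.2'."""
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for symbol in sorted(_OPS, key=len, reverse=True):
            if clause.startswith(symbol):
                if not _OPS[symbol](parse(version), parse(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("0.9.1", ">=0.9.1,<0.9.2"))  # True
print(satisfies("0.9.2", ">=0.9.1,<0.9.2"))  # False
```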
Latest Stable
0.9.1 (Stable) - Docs
Install PySyft (Stable):
pip install -U syft
Latest Beta
0.9.2 (Beta) - dev branch
Install PySyft (Beta):
pip install -U syft --pre
Find more about previous releases here.
Community
Supported by the OpenMined Foundation, the OpenMined Community is an online network of over 17,000 technologists, researchers, and industry professionals keen to unlock 1000x more data in every scientific field and industry.
Courses
Contributors
OpenMined and Syft appreciate all contributors. If you would like to fix a bug or suggest a new feature, please reach out via GitHub or Slack!
About OpenMined
OpenMined is a non-profit foundation creating technology infrastructure that helps researchers get answers from data without needing a copy or direct access. Our community of technologists is building Syft.
Supporters
License
Apache License 2.0
Person icons created by Freepik - Flaticon