probability

Probabilistic reasoning and statistical analysis in TensorFlow


Top Related Projects

jax

32,065

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

pyro

8,740

Deep universal probabilistic programming with Python and PyTorch

pytorch

88,135

Tensors and Dynamic neural networks in Python with strong GPU acceleration

stan

2,650

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.

dowhy

7,424

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

Quick Overview

TensorFlow Probability (TFP) is a Python library built on TensorFlow that makes it easy to combine probabilistic models and deep learning on modern hardware (TPU, GPU). It's designed for statistical analysis, providing tools for probabilistic reasoning and statistical inference in machine learning models.

Pros

Seamless integration with TensorFlow ecosystem
Support for both static and dynamic computational graphs
Extensive collection of probability distributions and statistical models
Efficient on GPUs and TPUs for large-scale probabilistic computations

Cons

Steep learning curve, especially for those new to probabilistic programming
Documentation can be complex and sometimes lacking for advanced use cases
Frequent updates may lead to compatibility issues with older code
Performance can be slower compared to some specialized probabilistic programming languages

Code Examples

Creating and sampling from a distribution:

import tensorflow_probability as tfp
import tensorflow as tf

# Create a normal distribution
dist = tfp.distributions.Normal(loc=0., scale=1.)

# Sample from the distribution
samples = dist.sample(1000)

# Compute log probability
log_prob = dist.log_prob(samples)

Bayesian linear regression:

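import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
Root = tfd.JointDistributionCoroutine.Root

# Placeholder data for illustration: 100 observations of one feature
features = tf.random.normal([100, 1])
labels = tf.random.normal([100])

# Define the model as a zero-argument generator. `features` is captured from
# the enclosing scope, and priors without parents are wrapped in Root.
def bayesian_linear_regression():
    w = yield Root(tfd.Sample(tfd.Normal(loc=0., scale=1.), sample_shape=[1], name='w'))
    b = yield Root(tfd.Normal(loc=0., scale=1., name='b'))
    # The ellipsis lets w and b carry leading sample dimensions
    yield tfd.Independent(
        tfd.Normal(loc=tf.einsum('ij,...j->...i', features, w) + b[..., tf.newaxis],
                   scale=0.1),
        reinterpreted_batch_ndims=1,
        name='y')

# Build the joint distribution over (w, b, y)
model = tfd.JointDistributionCoroutine(bayesian_linear_regression)

Variational inference:

# Continues from the regression example above (tf, tfp, tfd, model, labels)

# Define variational families. TransformedVariable keeps each scale positive
# while leaving the underlying variable trainable.
qw = tfd.Independent(
    tfd.Normal(loc=tf.Variable(tf.zeros([1])),
               scale=tfp.util.TransformedVariable(tf.ones([1]), tfp.bijectors.Softplus())),
    reinterpreted_batch_ndims=1)
qb = tfd.Normal(loc=tf.Variable(0.),
                scale=tfp.util.TransformedVariable(1., tfp.bijectors.Softplus()))

# Define surrogate posterior
surrogate_posterior = tfd.JointDistributionNamed({
    'w': qw,
    'b': qb
})

# Fit the surrogate posterior, conditioning the model on the observed labels
losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn=lambda w, b: model.log_prob((w, b, labels)),
    surrogate_posterior=surrogate_posterior,
    optimizer=tf.optimizers.Adam(learning_rate=0.1),
    num_steps=1000)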

Getting Started

To get started with TensorFlow Probability:

Install the library:
```
pip install tensorflow-probability
```

Import the library in your Python script:

import tensorflow_probability as tfp
import tensorflow as tf

Start using TFP's distributions and models:

# Create a normal distribution
dist = tfp.distributions.Normal(loc=0., scale=1.)

# Sample from the distribution
samples = dist.sample(1000)

# Compute statistics
mean = tf.reduce_mean(samples)
variance = tf.math.reduce_variance(samples)

Competitor Comparisons

jax

32,065

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Pros of JAX

Better performance and efficiency, especially on GPUs and TPUs
More flexible and customizable, allowing for easier implementation of custom algorithms
Simpler API with a focus on functional programming principles

Cons of JAX

Smaller ecosystem and fewer pre-built models compared to TensorFlow Probability
Steeper learning curve for those familiar with TensorFlow or PyTorch
Less comprehensive documentation and tutorials

Code Comparison

TensorFlow Probability:

import tensorflow_probability as tfp
import tensorflow as tf

dist = tfp.distributions.Normal(loc=0., scale=1.)
samples = dist.sample(100)
log_prob = dist.log_prob(samples)

JAX:

import jax.numpy as jnp
from jax import random

key = random.PRNGKey(0)
samples = random.normal(key, shape=(100,))
# Standard normal log-density, written in log space to avoid underflow in the tails
log_prob = -0.5 * samples**2 - 0.5 * jnp.log(2 * jnp.pi)

JAX is a general-purpose numerical computing library rather than a probabilistic programming library, so the JAX snippet draws samples and writes out the normal log-density by hand, while TensorFlow Probability supplies both through its distribution objects. JAX offers more flexibility and potentially better performance, at the cost of fewer pre-built probabilistic tools. TensorFlow Probability may be more suitable for those already familiar with TensorFlow or seeking a more comprehensive set of pre-built models and tools.
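
When a normal log-density is written out by hand, as in the JAX snippet above, the order of operations matters numerically. The following is a minimal standard-library Python sketch, not code from either library, and the function names are hypothetical. It compares exponentiating first with working directly in log space:

```python
import math

def naive_log_prob(x):
    """log(exp(-x^2 / 2) / sqrt(2*pi)), evaluated literally."""
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    # Far in the tail the density underflows to 0.0, whose log is -inf.
    return math.log(density) if density > 0.0 else float("-inf")

def stable_log_prob(x):
    """The same quantity after simplifying: -x^2 / 2 - log(2*pi) / 2."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

for x in (0.0, 1.0, 40.0):
    print(x, naive_log_prob(x), stable_log_prob(x))
```

Near the mode the two forms agree to floating-point precision. At x = 40 the naive form loses the value entirely, while the log-space form stays finite.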

pyro

8,740

Deep universal probabilistic programming with Python and PyTorch

Pros of Pyro

Built on PyTorch, offering dynamic computation graphs and easier debugging
More flexible and expressive for complex probabilistic models
Strong support for variational inference and MCMC techniques

Cons of Pyro

Smaller community and ecosystem compared to TensorFlow Probability
Less comprehensive documentation and tutorials
May have slower performance for some large-scale applications

Code Comparison

Pyro:

import pyro
import torch

def model(data):
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(1.0))
    return pyro.sample("obs", pyro.distributions.Normal(loc, scale), obs=data)

TensorFlow Probability:

import tensorflow_probability as tfp
import tensorflow as tf

def model(data):
    loc = tf.Variable(0.0, name="loc")
    scale = tf.Variable(1.0, name="scale")
    return tfp.distributions.Normal(loc, scale).log_prob(data)

Both libraries offer powerful probabilistic programming capabilities, but Pyro excels in flexibility and ease of use for complex models, while TensorFlow Probability benefits from a larger ecosystem and potentially better performance for certain tasks. The choice between them often depends on specific project requirements and personal preferences.

pytorch

88,135

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

More intuitive and Pythonic API, easier for beginners to learn
Dynamic computational graphs allow for more flexible model architectures
Better support for debugging and easier to integrate with Python tools

Cons of PyTorch

Smaller ecosystem compared to TensorFlow, fewer pre-built models and tools
Less comprehensive support for deployment in production environments
Limited support for distributed training on multiple GPUs/machines

Code Comparison

PyTorch example:

import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.add(x, y)

TensorFlow Probability example:

import tensorflow as tf
import tensorflow_probability as tfp

x = tf.constant([1, 2, 3])
y = tf.constant([4, 5, 6])
z = tf.add(x, y)  # TFP relies on core TensorFlow for basic tensor arithmetic

Both examples demonstrate basic tensor operations, but PyTorch's syntax is generally considered more straightforward and Pythonic. TensorFlow Probability, being built on top of TensorFlow, inherits its more verbose syntax but offers additional probabilistic modeling capabilities not shown in this simple example.

stan

2,650

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.

Pros of Stan

More focused on Bayesian inference and probabilistic modeling
Provides a domain-specific language for statistical modeling
Offers advanced MCMC sampling techniques like Hamiltonian Monte Carlo

Cons of Stan

Steeper learning curve for users not familiar with probabilistic programming
Less integration with deep learning frameworks
Smaller ecosystem compared to TensorFlow Probability

Code Comparison

Stan:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}

TensorFlow Probability:

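import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def log_joint(alpha, beta, sigma, x, y):
    # Priors on the parameters (the Stan program above leaves them implicit)
    log_prior = (tfd.Normal(0., 1.).log_prob(alpha)
                 + tfd.Normal(0., 1.).log_prob(beta)
                 + tfd.HalfNormal(1.).log_prob(sigma))
    # Likelihood: y ~ normal(alpha + beta * x, sigma), summed over observations
    log_likelihood = tf.reduce_sum(tfd.Normal(alpha + beta * x, sigma).log_prob(y))
    return log_prior + log_likelihood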

Stan uses its own domain-specific language for model specification, while TensorFlow Probability leverages Python and TensorFlow's ecosystem. Stan's syntax is more declarative, whereas TensorFlow Probability offers a more programmatic approach to model building.

dowhy

7,424

Pros of DoWhy

Focused specifically on causal inference and effect estimation
More accessible for users without deep machine learning expertise
Provides a unified framework for causal inference across various methods

Cons of DoWhy

Smaller community and fewer contributors compared to TensorFlow Probability
Less integration with broader machine learning ecosystems
More limited in scope, focusing primarily on causal inference

Code Comparison

DoWhy:

import dowhy
from dowhy import CausalModel

model = CausalModel(
    data=data,
    treatment=treatment,
    outcome=outcome,
    graph=graph
)

TensorFlow Probability:

import tensorflow_probability as tfp

distribution = tfp.distributions.Normal(loc=0., scale=1.)
samples = distribution.sample(100)

DoWhy is designed for causal inference tasks, with a focus on estimating causal effects. It provides a high-level API for specifying causal models and performing various causal inference methods.

TensorFlow Probability, on the other hand, is a more comprehensive library for probabilistic reasoning and statistical modeling. It offers a wide range of probability distributions, statistical models, and inference algorithms, integrated with the TensorFlow ecosystem.


README

TensorFlow Probability

TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFlow ecosystem, TensorFlow Probability provides integration of probabilistic methods with deep networks, gradient-based inference via automatic differentiation, and scalability to large datasets and models via hardware acceleration (e.g., GPUs) and distributed computation.

TFP also works as "Tensor-friendly Probability" in pure JAX: from tensorflow_probability.substrates import jax as tfp

Our probabilistic machine learning tools are structured as follows.

Layer 0: TensorFlow. Numerical operations. In particular, the LinearOperator class enables matrix-free implementations that can exploit special structure (diagonal, low-rank, etc.) for efficient computation. It is built and maintained by the TensorFlow Probability team and is now part of tf.linalg in core TF.
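
To illustrate what "matrix-free" means here, the following standard-library Python sketch uses a hypothetical `DiagonalOperator` class. It is not TensorFlow's `LinearOperator` API, only an illustration of the idea: a diagonal operator stores n numbers and performs products, solves, and log-determinants in O(n) without ever forming the n-by-n matrix.

```python
import math

class DiagonalOperator:
    """Hypothetical stand-in for a diagonal linear operator (illustration only)."""

    def __init__(self, diag):
        self.diag = list(diag)

    def matvec(self, x):
        # A @ x for diagonal A is an elementwise product: O(n), not O(n^2).
        return [d * xi for d, xi in zip(self.diag, x)]

    def solve(self, b):
        # Solving A x = b reduces to elementwise division.
        return [bi / d for d, bi in zip(self.diag, b)]

    def log_abs_determinant(self):
        # log|det A| is the sum of log|d_i| over the diagonal.
        return sum(math.log(abs(d)) for d in self.diag)

op = DiagonalOperator([2.0, 4.0, 0.5])
y = op.matvec([1.0, 1.0, 2.0])
x = op.solve(y)  # recovers the original vector
print(y, x, op.log_abs_determinant())
```

The same pattern extends to other structures such as low-rank updates, where specialized formulas replace dense linear algebra.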

Layer 1: Statistical Building Blocks

Distributions (tfp.distributions): A large collection of probability distributions and related statistics with batch and broadcasting semantics. See the Distributions Tutorial.
Bijectors (tfp.bijectors): Reversible and composable transformations of random variables. Bijectors provide a rich class of transformed distributions, from classical examples like the log-normal distribution to sophisticated deep learning models such as masked autoregressive flows.
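
The log-normal example mentioned above can be worked through by hand. This is a minimal standard-library Python sketch with hypothetical function names, not the `tfp.bijectors` API. It assumes X ~ Normal(loc, scale) and Y = exp(X), so the change-of-variables formula gives log p_Y(y) = log p_X(log y) - log y.

```python
import math

def normal_log_prob(x, loc=0.0, scale=1.0):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def transformed_log_prob(y, base_log_prob, inverse, inverse_log_det_jacobian):
    """Log-density of Y = forward(X) via the change-of-variables formula."""
    return base_log_prob(inverse(y)) + inverse_log_det_jacobian(y)

def log_normal_log_prob(y, loc=0.0, scale=1.0):
    # forward = exp, inverse = log, and d/dy log(y) = 1/y, so the
    # inverse log-det-Jacobian is -log(y).
    return transformed_log_prob(
        y,
        base_log_prob=lambda x: normal_log_prob(x, loc, scale),
        inverse=math.log,
        inverse_log_det_jacobian=lambda v: -math.log(v),
    )

print(log_normal_log_prob(2.0, loc=0.5, scale=1.5))
```

Composing several such transformations only requires chaining the inverses and summing the log-Jacobian terms, which is what makes bijectors composable.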

Layer 2: Model Building

Joint Distributions (e.g., tfp.distributions.JointDistributionSequential): Joint distributions over one or more possibly-interdependent distributions. For an introduction to modeling with TFP's JointDistributions, check out this colab.
Probabilistic Layers (tfp.layers): Neural network layers with uncertainty over the functions they represent, extending TensorFlow Layers.
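
A joint distribution over interdependent variables factorizes into a chain of conditionals. The sketch below is standard-library Python with hypothetical names, not TFP's JointDistribution classes. It assumes a two-variable model, m ~ Normal(0, 1) and x | m ~ Normal(m, 0.5), and shows the two operations such an object has to provide: ancestral sampling and a joint log-density.

```python
import math
import random

def normal_log_prob(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def sample_joint(rng):
    # Ancestral sampling: draw each variable given its parents, in order.
    m = rng.gauss(0.0, 1.0)
    x = rng.gauss(m, 0.5)
    return m, x

def joint_log_prob(m, x):
    # log p(m, x) = log p(m) + log p(x | m)
    return normal_log_prob(m, 0.0, 1.0) + normal_log_prob(x, m, 0.5)

rng = random.Random(0)
m, x = sample_joint(rng)
print(m, x, joint_log_prob(m, x))
```

Fixing `x` to observed data and treating `joint_log_prob` as a function of `m` alone gives the unnormalized posterior that inference algorithms target.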

Layer 3: Probabilistic Inference

Markov chain Monte Carlo (tfp.mcmc): Algorithms for approximating integrals via sampling. Includes Hamiltonian Monte Carlo, random-walk Metropolis-Hastings, and the ability to build custom transition kernels.
Variational Inference (tfp.vi): Algorithms for approximating integrals via optimization.
Optimizers (tfp.optimizer): Stochastic optimization methods, extending TensorFlow Optimizers. Includes Stochastic Gradient Langevin Dynamics.
Monte Carlo (tfp.monte_carlo): Tools for computing Monte Carlo expectations.
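
As an illustration of approximating an integral via sampling, here is a minimal random-walk Metropolis sketch in standard-library Python. It is not the `tfp.mcmc` implementation, and the function name, step size, and burn-in length are assumptions chosen for the example. The target is an unnormalized Normal(3, 1) log-density, and the chain average serves as a Monte Carlo estimate of its mean.

```python
import math
import random

def random_walk_metropolis(target_log_prob, init, num_steps, step_size, seed=0):
    """Returns the list of visited states and the acceptance rate."""
    rng = random.Random(seed)
    state = init
    log_p = target_log_prob(state)
    chain, accepted = [], 0
    for _ in range(num_steps):
        proposal = state + rng.gauss(0.0, step_size)
        log_p_new = target_log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(state)).
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            state, log_p = proposal, log_p_new
            accepted += 1
        chain.append(state)
    return chain, accepted / num_steps

# Unnormalized log-density of Normal(loc=3, scale=1); the constant cancels
# in the acceptance ratio, so it is not needed.
def target(x):
    return -0.5 * (x - 3.0) ** 2

chain, accept_rate = random_walk_metropolis(target, init=0.0, num_steps=20000, step_size=1.0)
samples = chain[2000:]  # discard burn-in
mean = sum(samples) / len(samples)
print(mean, accept_rate)
```

Only the difference of log-densities enters the acceptance test, which is why MCMC works with unnormalized posteriors.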

TensorFlow Probability is under active development. Interfaces may change at any time.

Examples

See tensorflow_probability/examples/ for end-to-end examples. It includes tutorial notebooks such as:

Linear Mixed Effects Models. A hierarchical linear model for sharing statistical strength across examples.
Eight Schools. A hierarchical normal model for exchangeable treatment effects.
Hierarchical Linear Models. Hierarchical linear models compared among TensorFlow Probability, R, and Stan.
Bayesian Gaussian Mixture Models. Clustering with a probabilistic generative model.
Probabilistic Principal Components Analysis. Dimensionality reduction with latent variables.
Gaussian Copulas. Probability distributions for capturing dependence across random variables.
TensorFlow Distributions: A Gentle Introduction. Introduction to TensorFlow Distributions.
Understanding TensorFlow Distributions Shapes. How to distinguish between samples, batches, and events for arbitrarily shaped probabilistic computations.
TensorFlow Probability Case Study: Covariance Estimation. A user's case study in applying TensorFlow Probability to estimate covariances.

It also includes example scripts such as:

Representation learning with a latent code and variational inference.
Vector-Quantized Autoencoder. Discrete representation learning with vector quantization.
Disentangled Sequential Variational Autoencoder. Disentangled representation learning over sequences with variational inference.
Bayesian Neural Networks. Neural networks with uncertainty over their weights.
Bayesian Logistic Regression. Bayesian inference for binary classification.

Installation

For additional details on installing TensorFlow, guidance installing prerequisites, and (optionally) setting up virtual environments, see the TensorFlow installation guide.

Stable Builds

To install the latest stable version, run the following:

# Notes:
# - The `--upgrade` flag ensures you'll get the latest version.
# - The `--user` flag ensures the packages are installed to your user directory
#   rather than the system directory.
# - TensorFlow 2 packages require a pip >= 19.0
python -m pip install --upgrade --user pip
python -m pip install --upgrade --user tensorflow tensorflow_probability

For CPU-only usage (and a smaller install), install with tensorflow-cpu.

To use a pre-2.0 version of TensorFlow, run:

python -m pip install --upgrade --user "tensorflow<2" "tensorflow_probability<0.9"

Note: Since TensorFlow is not included as a dependency of the TensorFlow Probability package (in setup.py), you must explicitly install the TensorFlow package (tensorflow or tensorflow-cpu). This allows us to maintain one package instead of separate packages for CPU and GPU-enabled TensorFlow. See the TFP release notes for more details about dependencies between TensorFlow and TensorFlow Probability.
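
Because TensorFlow has to be installed separately, it can help to confirm that both packages are present before importing them. The following is a small standard-library sketch and not part of TFP; the helper name `missing_packages` is hypothetical. It looks up each module without importing it.

```python
import importlib.util

def missing_packages(names=("tensorflow", "tensorflow_probability")):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("TensorFlow and TensorFlow Probability are both importable.")
```

The module name to check is `tensorflow` whether the `tensorflow` or the `tensorflow-cpu` package was installed.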

Nightly Builds

There are also nightly builds of TensorFlow Probability under the pip package tfp-nightly, which depends on one of tf-nightly or tf-nightly-cpu. Nightly builds include newer features, but may be less stable than the versioned releases. Both stable and nightly docs are available here.

python -m pip install --upgrade --user tf-nightly tfp-nightly

Installing from Source

You can also install from source. This requires the Bazel build system. It is highly recommended that you install the nightly build of TensorFlow (tf-nightly) before trying to build TensorFlow Probability from source. The most recent version of Bazel that TFP currently supports is 6.4.0; support for 7.0.0+ is WIP.

# sudo apt-get install bazel git python-pip  # Ubuntu; others, see above links.
python -m pip install --upgrade --user tf-nightly
git clone https://github.com/tensorflow/probability.git
cd probability
bazel build --copt=-O3 --copt=-march=native :pip_pkg
PKGDIR=$(mktemp -d)
./bazel-bin/pip_pkg $PKGDIR
python -m pip install --upgrade --user $PKGDIR/*.whl

Community

As part of TensorFlow, we're committed to fostering an open and welcoming environment.

Stack Overflow: Ask or answer technical questions.
GitHub: Report bugs or make feature requests.
TensorFlow Blog: Stay up to date on content from the TensorFlow team and best articles from the community.
YouTube Channel: Follow TensorFlow shows.
tfprobability@tensorflow.org: Open mailing list for discussion and questions.

See the TensorFlow Community page for more details.

Contributing

We're eager to collaborate with you! See CONTRIBUTING.md for a guide on how to contribute. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

References

If you use TensorFlow Probability in a paper, please cite:

TensorFlow Distributions. Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous. arXiv preprint arXiv:1711.10604, 2017.

(We're aware there's a lot more to TensorFlow Probability than Distributions, but the Distributions paper lays out our vision and is a fine thing to cite for now.)
