pyprobml

Python code for the "Probabilistic Machine Learning" book by Kevin Murphy

6,737

1,559

Top Related Projects

pymc

8,983

Bayesian Modeling and Probabilistic Programming in Python

pyro

8,740

Deep universal probabilistic programming with Python and PyTorch

probability

4,330

Probabilistic reasoning and statistical analysis in TensorFlow

stan

2,650

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.

pgmpy

2,880

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Quick Overview

The pyprobml repository collects the Python code that accompanies Kevin Murphy's books Probabilistic Machine Learning: An Introduction and Probabilistic Machine Learning: Advanced Topics. It consists mainly of notebooks that reproduce the books' figures, together with implementations of machine learning algorithms and examples that support the concepts discussed in the books.

Pros

Comprehensive collection of machine learning algorithms and techniques
Well-organized codebase with clear structure and documentation
Provides practical implementations of concepts from a popular ML textbook
Notebooks run on Colab, where most of the required libraries are pre-installed

Cons

May require some background knowledge in machine learning to fully utilize
Some implementations might not be optimized for production use
Dependency management can be challenging due to the wide range of libraries used
Limited community support compared to more established ML libraries

Code Examples

Loading and splitting data (using scikit-learn, which the notebooks rely on):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Training a simple logistic regression model:

from sklearn.linear_model import LogisticRegression

# max_iter is raised so the solver converges on the unscaled features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Plotting results:

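import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegression

X2, y2 = load_iris(return_X_y=True)
X2 = X2[:, :2]  # a 2-D decision boundary needs exactly two features
clf = LogisticRegression(max_iter=1000).fit(X2, y2)
fig, ax = plt.subplots(figsize=(10, 6))
DecisionBoundaryDisplay.from_estimator(clf, X2, response_method="predict", alpha=0.4, ax=ax)
ax.scatter(X2[:, 0], X2[:, 1], c=y2, edgecolor="k")
ax.set_title("Logistic Regression Decision Boundary")
plt.show()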

Getting Started

To get started with pyprobml, follow these steps:

Clone the repository (--depth 1 skips the full history, which is very large):

git clone --depth 1 https://github.com/probml/pyprobml.git

Install the required dependencies:
```
pip install -r requirements.txt
```

Open a notebook from the notebooks directory, or use the same standard libraries the notebooks rely on in your own Python script:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Your code here

Explore the examples in the repository to learn how to use different algorithms and utilities provided by pyprobml.

Competitor Comparisons

pymc

Pros of PyMC

More mature and widely-used probabilistic programming framework
Extensive documentation and community support
Integration with NumPy and PyTensor (the successor to the Theano backend used by PyMC3) for efficient computations

Cons of PyMC

Steeper learning curve for beginners
Less focus on machine learning applications compared to PyProbML

Code Comparison

PyMC example:

import pymc as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=[0, 1, 2])
    trace = pm.sample(1000)

PyProbML-style example (the same model solved in closed form with NumPy):

import numpy as np

x = np.array([0.0, 1.0, 2.0])

# Conjugate update for mu ~ N(0, 1) with x_i ~ N(mu, 1)
post_var = 1.0 / (1.0 + len(x))
post_mean = post_var * x.sum()

PyMC is a probabilistic programming framework: you declare the model and it runs inference for you. That is more verbose for a toy problem but extends to models with no closed-form posterior. PyProbML is a collection of notebooks rather than a modeling library, so its examples tend to work through the computation directly with NumPy, SciPy, or JAX, which keeps the link to the book's equations visible.

pyro

Pros of Pyro

More mature and widely adopted probabilistic programming framework
Extensive documentation and tutorials for easier learning
Seamless integration with PyTorch for deep probabilistic models

Cons of Pyro

Steeper learning curve for beginners in probabilistic programming
Less focus on educational content compared to PyProbML

Code Comparison

PyProbML example (simple Gaussian model):

import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist

def model(x):
    mu = numpyro.sample('mu', dist.Normal(0, 1))
    numpyro.sample('obs', dist.Normal(mu, 1), obs=x)

Pyro example (simple Gaussian model):

import torch
import pyro
import pyro.distributions as dist

def model(x):
    mu = pyro.sample('mu', dist.Normal(0, 1))
    pyro.sample('obs', dist.Normal(mu, 1), obs=x)

Both repositories provide implementations for probabilistic machine learning. PyProbML focuses more on educational content and examples, while Pyro offers a more comprehensive framework for building and deploying probabilistic models. Pyro's integration with PyTorch makes it particularly suitable for deep probabilistic models, whereas PyProbML uses JAX and NumPyro for its implementations.

probability

Pros of TensorFlow Probability

Extensive library with a wide range of probabilistic models and tools
Seamless integration with TensorFlow ecosystem
Well-documented with comprehensive API references

Cons of TensorFlow Probability

Steeper learning curve for beginners
Heavier dependency on TensorFlow framework
May be overkill for simpler probabilistic modeling tasks

Code Comparison

TensorFlow Probability:

import tensorflow_probability as tfp
tfd = tfp.distributions

# Create a normal distribution
normal = tfd.Normal(loc=0., scale=1.)

# Sample from the distribution
samples = normal.sample(100)

PyProbML:

import numpy as np
from scipy import stats

# Create a normal distribution
normal = stats.norm(loc=0, scale=1)

# Sample from the distribution
samples = normal.rvs(100)

TensorFlow Probability offers more advanced features and tight integration with TensorFlow, while the PyProbML snippet relies on SciPy, which is enough for basic tasks such as defining a distribution and drawing samples from it.

stan

Pros of Stan

More mature and widely adopted probabilistic programming language
Highly optimized C++ backend for efficient sampling and inference
Extensive documentation and community support

Cons of Stan

Steeper learning curve, especially for those new to probabilistic programming
Less flexibility in model specification compared to PyProbML's Python-based approach

Code Comparison

Stan:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}

PyProbML:

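import numpy as np
import pymc3 as pm

# Toy data standing in for the x and y passed to the Stan program
x = np.array([0.0, 1.0, 2.0, 3.0])
y_data = np.array([0.1, 0.9, 2.1, 2.9])

with pm.Model() as model:
    # Weakly informative priors on the intercept, slope, and noise scale
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=1)
    y = pm.Normal('y', mu=alpha + beta * x, sd=sigma, observed=y_data)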

Stan uses its own domain-specific language, while PyProbML leverages Python's syntax and existing libraries like PyMC3. Stan's approach may be more efficient for complex models, but PyProbML offers greater flexibility and easier integration with Python ecosystems.
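
To make the shared model concrete, here is a minimal sketch of the least-squares point estimates for alpha and beta in y = alpha + beta * x + noise, the regression both snippets describe. It uses plain Python and made-up data, and the function name is hypothetical. This is not how Stan or PyMC3 performs inference (both sample from the posterior); it is only a sanity check that posterior means should land near when the priors are weak.

```python
def least_squares_line(x, y):
    """Return (alpha, beta) minimizing sum((y - alpha - beta * x) ** 2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the ratio of the x-y covariance to the x variance
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
alpha, beta = least_squares_line(x, y)
print(alpha, beta)
```

On noise-free data like this the estimates recover the intercept and slope of the generating line; with real data, comparing them against the sampled posterior means is a quick way to catch a mis-specified model.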

pgmpy

Pros of pgmpy

More comprehensive library for probabilistic graphical models
Better documentation and API reference
Larger community and more frequent updates

Cons of pgmpy

Steeper learning curve for beginners
Less focus on modern machine learning techniques
More complex installation process

Code Comparison

pgmpy example:

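from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD

# Chain A -> B -> C with binary variables
model = BayesianNetwork([('A', 'B'), ('B', 'C')])
cpd_a = TabularCPD('A', 2, [[0.6], [0.4]])
cpd_b = TabularCPD('B', 2, [[0.7, 0.3], [0.3, 0.7]], evidence=['A'], evidence_card=[2])
# C also needs a CPD before the model is complete (values are illustrative)
cpd_c = TabularCPD('C', 2, [[0.9, 0.2], [0.1, 0.8]], evidence=['B'], evidence_card=[2])
model.add_cpds(cpd_a, cpd_b, cpd_c)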

pyprobml-style example (plain NumPy and matplotlib):

import numpy as np
import matplotlib.pyplot as plt

# Random 2-D points with random binary labels
X = np.random.randn(100, 2)
y = np.random.randint(0, 2, 100)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

Both libraries offer unique features and cater to different aspects of probabilistic machine learning. pgmpy is more focused on graphical models, while pyprobml provides a broader range of machine learning utilities and visualizations.
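
As a worked illustration of what the CPDs in the pgmpy example encode, the sketch below computes the marginal distribution of B by hand in plain Python. It assumes pgmpy's table layout, where rows index the states of the variable and columns index the states of its parent. The variable names are chosen here for illustration only.

```python
# Values copied from the pgmpy example: columns index the parent state A
p_a = [0.6, 0.4]                # P(A=0), P(A=1)
p_b_given_a = [
    [0.7, 0.3],                 # P(B=0 | A=0), P(B=0 | A=1)
    [0.3, 0.7],                 # P(B=1 | A=0), P(B=1 | A=1)
]

# Marginalize A out: P(B=b) = sum over a of P(B=b | A=a) * P(A=a)
p_b = [
    sum(p_b_given_a[b][a] * p_a[a] for a in range(2))
    for b in range(2)
]
print(p_b)
```

The result is roughly 0.54 for B=0 and 0.46 for B=1, which is the value an inference query for B with no evidence should agree with.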

README

pyprobml

Python 3 code to reproduce the figures in the books Probabilistic Machine Learning: An Introduction (aka "book 1") and Probabilistic Machine Learning: Advanced Topics (aka "book 2"). The code uses the standard Python libraries, such as numpy, scipy, matplotlib, sklearn, etc. Some of the code (especially in book 2) also uses JAX, and in some parts of book 1, we also use Tensorflow 2 and a little bit of Torch. See also probml-utils for some utility code that is shared across multiple notebooks.

For the latest status of the code, see Book 1 dashboard and Book 2 dashboard. As of September 2022, this code is now in maintenance mode.

Running the notebooks

The notebooks needed to make all the figures are available at the following locations.

Running notebooks in colab

Colab has most of the libraries you will need (e.g., scikit-learn, JAX) pre-installed, and gives you access to a free GPU and TPU. We have created a Colab intro notebook with more details. To run the notebooks on Colab in any browser, go to a particular notebook on GitHub and change the domain from github.com to githubtocolab.com, as suggested here. If you are using the Google Chrome browser, you can use the "Open in Colab" Chrome extension to do the same with a single click.
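
The domain swap described above is easy to script. The following is a minimal sketch using only the Python standard library. The function name and the example notebook path are hypothetical, chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_colab_url(github_url):
    """Swap the github.com host for githubtocolab.com, keeping the path intact."""
    parts = urlsplit(github_url)
    if parts.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {github_url}")
    return urlunsplit(parts._replace(netloc="githubtocolab.com"))


# Hypothetical notebook path, used only for illustration
example = "https://github.com/probml/pyprobml/blob/master/notebooks/example.ipynb"
print(to_colab_url(example))
```

Parsing the URL instead of doing a plain string replace avoids touching a "github.com" that happens to appear later in the path or query string.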

Running the notebooks locally

We assume you have already installed JAX and Tensorflow and Torch, since the details on how to do this depend on whether you have a CPU, GPU, etc.

You can use any of the following options to install the other requirements.

Option 1

pip install -r https://raw.githubusercontent.com/probml/pyprobml/master/requirements.txt

Option 2

Download requirements.txt locally to your path and run

pip install -r requirements.txt

Option 3

Run the following. (Note that --depth 1 skips downloading the whole history, which is very large.)

git clone --depth 1 https://github.com/probml/pyprobml.git

Then install manually.

If you want to save the figures, you first need to execute something like this

#export FIG_DIR="/teamspace/studios/this_studio/figures"

import os
os.environ["FIG_DIR"] = "/teamspace/studios/this_studio/pyprobml/notebooks/figures"
os.environ["DUAL_SAVE"] = "1" # both pdf and png

The savefig function uses these variables to decide where to store the figures; with DUAL_SAVE set as above, it writes both PDF and PNG files.
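
As a rough illustration of how environment variables like these can drive figure saving, the sketch below resolves the output paths for a named figure. The helper `figure_paths` is hypothetical and its logic is an assumption; it is not the repository's actual savefig implementation.

```python
import os


def figure_paths(name):
    """Return the file paths a figure called `name` would be saved to.

    Hypothetical helper: FIG_DIR and DUAL_SAVE are the variables set above,
    but the logic here is an assumption, not the real savefig function.
    """
    fig_dir = os.environ.get("FIG_DIR")
    if not fig_dir:
        return []  # assume nothing is saved when FIG_DIR is unset
    # Assume PDF is always written and PNG is added when DUAL_SAVE is "1"
    extensions = ["pdf", "png"] if os.environ.get("DUAL_SAVE") == "1" else ["pdf"]
    return [os.path.join(fig_dir, f"{name}.{ext}") for ext in extensions]


os.environ["FIG_DIR"] = "figures"
os.environ["DUAL_SAVE"] = "1"
print(figure_paths("demo"))
```

Reading the configuration from the environment keeps the notebooks free of hard-coded paths, so the same notebook can run on Colab, locally, or in a cloud studio.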

Cloud computing

When you want more power or control than Colab gives you, I recommend you use https://lightning.ai/docs/overview/studios, which makes it very easy to develop using VS Code, running on a VM accessed from your web browser; you can then launch on one or more GPUs when needed with a single button click. Alternatively, if you are a power user, you can try Google Cloud Platform, which supports GPUs and TPUs; see this short tutorial on Colab, GCP and TPUs.

How to contribute

See this guide for how to contribute code. Please follow these guidelines to contribute new notebooks to the notebooks directory.

GSOC

For a summary of some of the contributions to this codebase during Google Summer of Code (GSOC), see these links: 2021 and 2022.

Acknowledgements

For a list of contributors, see this list.
