pyprobml

Python code for the "Probabilistic Machine Learning" books by Kevin Murphy

Top Related Projects

  • PyMC: Bayesian Modeling and Probabilistic Programming in Python
  • Pyro: Deep universal probabilistic programming with Python and PyTorch
  • TensorFlow Probability: Probabilistic reasoning and statistical analysis in TensorFlow
  • Stan: Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
  • pgmpy: Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Quick Overview

The pyprobml repository is a collection of Python code for the books "Probabilistic Machine Learning: An Introduction" and "Probabilistic Machine Learning: Advanced Topics" by Kevin Murphy. It contains the notebooks and supporting code that reproduce the figures in both books, covering a wide range of machine learning algorithms discussed there.

Pros

  • Comprehensive collection of machine learning algorithms and techniques
  • Well-organized codebase with clear structure and documentation
  • Provides practical implementations of concepts from a popular ML textbook
  • Covers both books; note that the code has been in maintenance mode since September 2022

Cons

  • May require some background knowledge in machine learning to fully utilize
  • Some implementations might not be optimized for production use
  • Dependency management can be challenging due to the wide range of libraries used
  • Limited community support compared to more established ML libraries

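Code Examples

The snippets below are illustrative sketches written directly against scikit-learn and matplotlib, two of the libraries the pyprobml notebooks rely on. They are not calls into a pyprobml API.

  1. Loading and preprocessing data:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # Keep the first two features so the decision boundary can be drawn in 2D
    X, y = load_iris(return_X_y=True)
    X = X[:, :2]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

  2. Training a simple logistic regression model:

    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

  3. Plotting results:

    import matplotlib.pyplot as plt
    from sklearn.inspection import DecisionBoundaryDisplay

    fig, ax = plt.subplots(figsize=(10, 6))
    # Shade the predicted class regions, then overlay the data points
    DecisionBoundaryDisplay.from_estimator(model, X, response_method="predict", alpha=0.3, ax=ax)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
    ax.set_title("Logistic Regression Decision Boundary")
    plt.show()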

Getting Started

To get started with pyprobml, follow these steps:

  1. Clone the repository (a shallow clone avoids downloading the full history, which is very large):

    git clone --depth 1 https://github.com/probml/pyprobml.git
    cd pyprobml

  2. Install the required dependencies (JAX, TensorFlow, and Torch are assumed to be installed separately, since the steps depend on your hardware):

    pip install -r requirements.txt

  3. Open a notebook from the notebooks directory, either locally or in Colab, and run it to reproduce the corresponding figure.

  4. Explore the other notebooks in the repository to see how the different algorithms from the books are implemented.

Competitor Comparisons

PyMC

Bayesian Modeling and Probabilistic Programming in Python

Pros of PyMC

  • More mature and widely-used probabilistic programming framework
  • Extensive documentation and community support
  • Seamless integration with NumPy and PyTensor (the successor to the Theano backend used by PyMC3) for efficient computations

Cons of PyMC

  • Steeper learning curve for beginners
  • Less focus on machine learning applications compared to PyProbML

Code Comparison

PyMC example:

import pymc as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=[0, 1, 2])
    trace = pm.sample(1000)

PyProbML-style example (plain NumPy; the repository collects notebooks and does not expose a model-fitting API):

import numpy as np

x = np.array([0.0, 1.0, 2.0])

# Closed-form posterior for mu ~ Normal(0, 1), x_i ~ Normal(mu, 1)
post_var = 1.0 / (1.0 + len(x))
post_mean = post_var * x.sum()

PyMC offers an explicit probabilistic modeling API with general-purpose sampling, so the same few lines extend to models that have no closed-form posterior. PyProbML is a collection of notebooks that reproduce the books' figures, and its code tends to work through specific models directly with NumPy, SciPy, or JAX, as in the conjugate update above.

Pyro

Deep universal probabilistic programming with Python and PyTorch

Pros of Pyro

  • More mature and widely adopted probabilistic programming framework
  • Extensive documentation and tutorials for easier learning
  • Seamless integration with PyTorch for deep probabilistic models

Cons of Pyro

  • Steeper learning curve for beginners in probabilistic programming
  • Less focus on educational content compared to PyProbML

Code Comparison

PyProbML example (simple Gaussian model):

import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist

def model(x):
    mu = numpyro.sample('mu', dist.Normal(0, 1))
    numpyro.sample('obs', dist.Normal(mu, 1), obs=x)

Pyro example (simple Gaussian model):

import torch
import pyro
import pyro.distributions as dist

def model(x):
    mu = pyro.sample('mu', dist.Normal(0, 1))
    pyro.sample('obs', dist.Normal(mu, 1), obs=x)

Both repositories provide implementations for probabilistic machine learning. PyProbML focuses more on educational content and examples, while Pyro offers a more comprehensive framework for building and deploying probabilistic models. Pyro's integration with PyTorch makes it particularly suitable for deep probabilistic models, whereas PyProbML uses JAX and NumPyro for its implementations.
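Both snippets above only declare the model; neither runs inference. As a rough illustration of what an inference engine does with such a declaration, here is a minimal random-walk Metropolis sketch in plain Python. It is not how Pyro or NumPyro sample internally (they provide samplers such as NUTS), and the data values [0, 1, 2] are an assumption borrowed from the PyMC example earlier on this page. All function names are hypothetical.

```python
import math
import random


def log_joint(mu, xs):
    """Unnormalized log posterior: log N(mu | 0, 1) + sum_i log N(x_i | mu, 1)."""
    log_prior = -0.5 * mu ** 2
    log_lik = sum(-0.5 * (x - mu) ** 2 for x in xs)
    return log_prior + log_lik


def metropolis(xs, n_samples=20000, step=1.0, seed=0):
    """Random-walk Metropolis sampler for the single parameter mu."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_joint(mu, xs)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_joint(proposal, xs)
        # Accept with probability min(1, exp(candidate - current))
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)
    return samples


def exact_posterior(xs):
    """Closed-form posterior mean and variance for this conjugate model."""
    var = 1.0 / (1.0 + len(xs))
    return var * sum(xs), var


xs = [0.0, 1.0, 2.0]  # assumed data, for illustration only
samples = metropolis(xs)
burned = samples[2000:]  # discard an initial burn-in segment
mc_mean = sum(burned) / len(burned)
exact_mean, exact_var = exact_posterior(xs)
print(f"Monte Carlo mean {mc_mean:.3f}, exact mean {exact_mean:.3f}")
```

Because this model is conjugate, the sampler's estimate can be checked against the closed-form posterior, which is a useful sanity check before moving to models where no closed form exists.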

TensorFlow Probability

Probabilistic reasoning and statistical analysis in TensorFlow

Pros of TensorFlow Probability

  • Extensive library with a wide range of probabilistic models and tools
  • Seamless integration with TensorFlow ecosystem
  • Well-documented with comprehensive API references

Cons of TensorFlow Probability

  • Steeper learning curve for beginners
  • Heavier dependency on TensorFlow framework
  • May be overkill for simpler probabilistic modeling tasks

Code Comparison

TensorFlow Probability:

import tensorflow_probability as tfp
tfd = tfp.distributions

# Create a normal distribution
normal = tfd.Normal(loc=0., scale=1.)

# Sample from the distribution
samples = normal.sample(100)

PyProbML:

import numpy as np
from scipy import stats

# Create a normal distribution
normal = stats.norm(loc=0, scale=1)

# Sample from the distribution
samples = normal.rvs(100)

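TensorFlow Probability offers more advanced features and tight integration with TensorFlow. PyProbML does not define its own distribution API; its notebooks typically call libraries such as SciPy and NumPy directly, which is simpler for basic probabilistic modeling tasks.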

Stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.

Pros of Stan

  • More mature and widely adopted probabilistic programming language
  • Highly optimized C++ backend for efficient sampling and inference
  • Extensive documentation and community support

Cons of Stan

  • Steeper learning curve, especially for those new to probabilistic programming
  • Less flexibility in model specification compared to PyProbML's Python-based approach

Code Comparison

Stan:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}

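PyProbML (a Python version written with PyMC):

import numpy as np
import pymc as pm

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y_data = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=50)

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)
    y = pm.Normal('y', mu=alpha + beta * x, sigma=sigma, observed=y_data)

Stan uses its own domain-specific language, while Python code in the PyProbML style builds on existing libraries such as PyMC. The two snippets are not identical models: the Stan program declares no priors, so alpha, beta, and sigma receive implicit flat priors, whereas the Python version sets explicit Normal and HalfNormal priors. Stan's compiled C++ backend may be more efficient for complex models, while the Python version integrates more easily with the rest of the Python ecosystem.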
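For intuition about what the likelihood line `y ~ normal(alpha + beta * x, sigma)` in the Stan program encodes, the sketch below computes the maximum-likelihood estimates of alpha, beta, and sigma in closed form using only the Python standard library. This is ordinary least squares, not Stan's sampler, and the function name and data values are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Maximum-likelihood fit of y ~ Normal(alpha + beta * x, sigma)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    # The ML estimate of sigma divides the residual sum of squares by n
    rss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / n)
    return alpha, beta, sigma


# Hypothetical data, for illustration only
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
alpha, beta, sigma = fit_line(xs, ys)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, sigma={sigma:.3f}")
```

A full Bayesian fit returns a distribution over these three parameters rather than a single point, which is what the sampling-based tools above add.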

pgmpy

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Pros of pgmpy

  • More comprehensive library for probabilistic graphical models
  • Better documentation and API reference
  • Larger community and more frequent updates

Cons of pgmpy

  • Steeper learning curve for beginners
  • Less focus on modern machine learning techniques
  • More complex installation process

Code Comparison

pgmpy example:

from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD

model = BayesianNetwork([('A', 'B'), ('B', 'C')])
cpd_a = TabularCPD('A', 2, [[0.6], [0.4]])
cpd_b = TabularCPD('B', 2, [[0.7, 0.3], [0.3, 0.7]], evidence=['A'], evidence_card=[2])
model.add_cpds(cpd_a, cpd_b)

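pyprobml-style example (plain NumPy and matplotlib):

import numpy as np
import matplotlib.pyplot as plt

# Random 2D points with random binary labels, colored by label
X = np.random.randn(100, 2)
y = np.random.randint(0, 2, 100)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()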

Both libraries offer unique features and cater to different aspects of probabilistic machine learning. pgmpy is more focused on graphical models, while pyprobml provides a broader range of machine learning utilities and visualizations.
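To make the pgmpy tables above concrete, the following sketch computes the marginal distribution of B by hand from the two conditional probability tables, using plain Python rather than pgmpy's inference classes. It assumes the usual table layout in which each row of the table for B is a state of B and each column is a state of the parent A. The helper name is hypothetical.

```python
# P(A): probabilities of A = 0 and A = 1, taken from cpd_a above
p_a = [0.6, 0.4]

# P(B | A): rows index states of B, columns index states of A (from cpd_b above)
p_b_given_a = [
    [0.7, 0.3],
    [0.3, 0.7],
]


def marginal_b(p_a, p_b_given_a):
    """Sum out A: P(B = b) = sum_a P(B = b | A = a) * P(A = a)."""
    return [
        sum(row[a] * p_a[a] for a in range(len(p_a)))
        for row in p_b_given_a
    ]


p_b = marginal_b(p_a, p_b_given_a)
print(p_b)
```

Libraries such as pgmpy automate this kind of summation for larger graphs, where doing it by hand quickly becomes impractical.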

README

pyprobml

Python 3 code to reproduce the figures in the books Probabilistic Machine Learning: An Introduction (aka "book 1") and Probabilistic Machine Learning: Advanced Topics (aka "book 2"). The code uses the standard Python libraries, such as numpy, scipy, matplotlib, sklearn, etc. Some of the code (especially in book 2) also uses JAX, and in some parts of book 1, we also use Tensorflow 2 and a little bit of Torch. See also probml-utils for some utility code that is shared across multiple notebooks.

For the latest status of the code, see Book 1 dashboard and Book 2 dashboard. As of September 2022, this code is now in maintenance mode.

Running the notebooks

The notebooks needed to make all the figures are available at the following locations.

Running notebooks in colab

Colab has most of the libraries you will need (e.g., scikit-learn, JAX) pre-installed, and gives you access to a free GPU and TPU. We have created a Colab intro notebook with more details. To run the notebooks on Colab in any browser, go to a particular notebook on GitHub and change the domain from github.com to githubtocolab.com, as suggested here. If you are using the Google Chrome browser, you can use the "Open in Colab" Chrome extension to do the same with a single click.
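The domain swap described above can also be scripted. The following is a small sketch using only the Python standard library; the helper name and the notebook path in the example are hypothetical and only illustrate the URL shape.

```python
from urllib.parse import urlsplit, urlunsplit


def to_colab_url(github_url):
    """Rewrite a github.com notebook URL to the githubtocolab.com domain."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError(f"expected a github.com URL, got {parts.netloc!r}")
    return urlunsplit(parts._replace(netloc="githubtocolab.com"))


# Hypothetical notebook path, for illustration only
example = "https://github.com/probml/pyprobml/blob/master/notebooks/example.ipynb"
print(to_colab_url(example))
```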

Running the notebooks locally

We assume you have already installed JAX and Tensorflow and Torch, since the details on how to do this depend on whether you have a CPU, GPU, etc.

You can use any of the following options to install the other requirements.

  • Option 1

    pip install -r https://raw.githubusercontent.com/probml/pyprobml/master/requirements.txt

  • Option 2

    Download requirements.txt locally to your path and run

    pip install -r requirements.txt

  • Option 3

    Run the following. (Note that --depth 1 prevents downloading the whole history, which is very large.)

    git clone --depth 1 https://github.com/probml/pyprobml.git

    Then install manually.

If you want to save the figures, you first need to execute something like this

#export FIG_DIR="/teamspace/studios/this_studio/figures"

import os
os.environ["FIG_DIR"] = "/teamspace/studios/this_studio/pyprobml/notebooks/figures"
os.environ["DUAL_SAVE"] = "1" # both pdf and png

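These environment variables are read by the savefig function, which stores the figures as pdf files in FIG_DIR (and also as png files when DUAL_SAVE is set).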
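As a rough sketch of how a helper could interpret these two environment variables, the function below works out which files would be written for a given figure name. It is a hypothetical illustration under the assumptions stated in the comments, not the actual savefig implementation.

```python
import os


def figure_paths(name, environ=None):
    """Return the output paths implied by FIG_DIR and DUAL_SAVE.

    Hypothetical helper: assumes figures are skipped when FIG_DIR is unset,
    saved as pdf by default, and saved as both pdf and png when DUAL_SAVE is "1".
    """
    env = os.environ if environ is None else environ
    fig_dir = env.get("FIG_DIR")
    if not fig_dir:
        return []
    stem = os.path.splitext(name)[0]
    extensions = ["pdf", "png"] if env.get("DUAL_SAVE") == "1" else ["pdf"]
    return [os.path.join(fig_dir, f"{stem}.{ext}") for ext in extensions]


# Example with an explicit mapping instead of the real process environment
demo_env = {"FIG_DIR": "figures", "DUAL_SAVE": "1"}
demo_paths = figure_paths("demo_plot", demo_env)
print(demo_paths)
```

Passing the environment as an argument keeps the helper easy to test without modifying os.environ.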

Cloud computing

When you want more power or control than Colab gives you, I recommend you use https://lightning.ai/docs/overview/studios, which makes it very easy to develop using VS Code, running on a VM accessed from your web browser; you can then launch on one or more GPUs when needed with a single button click. Alternatively, if you are a power user, you can try Google Cloud Platform, which supports GPUs and TPUs; see this short tutorial on Colab, GCP and TPUs.

How to contribute

See this guide for how to contribute code. Please follow these guidelines to contribute new notebooks to the notebooks directory.

Metrics

Stargazers over time

GSOC

For a summary of some of the contributions to this codebase during Google Summer of Code (GSOC), see these links: 2021 and 2022.

Acknowledgements

For a list of contributors, see this list.