Top Related Projects
Bayesian Modeling and Probabilistic Programming in Python
Deep universal probabilistic programming with Python and PyTorch
Probabilistic reasoning and statistical analysis in TensorFlow
Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.
Quick Overview
The pyprobml repository contains Python 3 code that reproduces the figures in Kevin Murphy's books "Probabilistic Machine Learning: An Introduction" and "Probabilistic Machine Learning: Advanced Topics". It consists mainly of notebooks built on standard libraries such as numpy, scipy, matplotlib, and sklearn, with JAX used in places.
Pros
- Comprehensive collection of machine learning algorithms and techniques
- Well-organized codebase with clear structure and documentation
- Provides practical implementations of concepts from a popular ML textbook
- Notebooks run on Colab, which has most of the required libraries pre-installed
Cons
- May require some background knowledge in machine learning to fully utilize
- Some implementations might not be optimized for production use
- Dependency management can be challenging due to the wide range of libraries used
- In maintenance mode as of September 2022, so new examples and features are unlikely
Code Examples
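- Loading and preprocessing data (a sketch written directly against scikit-learn, one of the standard libraries the notebooks use, rather than a pyprobml API):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)
X = X[:, :2]  # keep two features so the decision boundary can be drawn in 2D
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
- Training a simple logistic regression model:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
- Plotting results:
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
fig, ax = plt.subplots(figsize=(10, 6))
# Shade the predicted class regions, then overlay the data points
DecisionBoundaryDisplay.from_estimator(model, X, response_method="predict", alpha=0.4, ax=ax)
ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
ax.set_title("Logistic Regression Decision Boundary")
plt.show()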
Getting Started
To get started with pyprobml, follow these steps:
- Clone the repository (--depth 1 avoids downloading the full history, which is very large):
git clone --depth 1 https://github.com/probml/pyprobml.git
- Install the required dependencies from the repository root:
cd pyprobml
pip install -r requirements.txt
- Open a notebook from the notebooks directory and run it locally or on Colab.
- Explore the other notebooks in the repository to see how the algorithms and figures from the books are implemented.
Competitor Comparisons
Bayesian Modeling and Probabilistic Programming in Python
Pros of PyMC
- More mature and widely-used probabilistic programming framework
- Extensive documentation and community support
- Seamless integration with NumPy, with a compiled computational backend (Theano in PyMC3, PyTensor in current PyMC) for efficient computations
Cons of PyMC
- Steeper learning curve for beginners
- Less focus on machine learning applications compared to PyProbML
Code Comparison
PyMC example:
import pymc as pm
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=[0, 1, 2])
    trace = pm.sample(1000)
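PyProbML-style example (illustrative NumPy sketch of the same model, not a pyprobml API):
import numpy as np
x = np.array([0.0, 1.0, 2.0])
# Conjugate update for mu ~ Normal(0, 1) with x_i ~ Normal(mu, 1)
post_var = 1.0 / (1.0 + len(x))  # prior precision 1 plus one unit of precision per observation
post_mean = post_var * x.sum()   # the prior mean is 0, so only the data term remains
PyMC offers an explicit probabilistic modeling language with automatic inference, while pyprobml is a collection of notebooks that reproduce book figures with standard libraries such as NumPy. PyMC's syntax is more verbose but allows greater flexibility in model specification; for a simple conjugate model like this one, a few lines of NumPy give the exact posterior.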
Deep universal probabilistic programming with Python and PyTorch
Pros of Pyro
- More mature and widely adopted probabilistic programming framework
- Extensive documentation and tutorials for easier learning
- Seamless integration with PyTorch for deep probabilistic models
Cons of Pyro
- Steeper learning curve for beginners in probabilistic programming
- Less focus on educational content compared to PyProbML
Code Comparison
PyProbML example (simple Gaussian model):
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
def model(x):
    mu = numpyro.sample('mu', dist.Normal(0, 1))
    numpyro.sample('obs', dist.Normal(mu, 1), obs=x)
Pyro example (simple Gaussian model):
import torch
import pyro
import pyro.distributions as dist
def model(x):
    mu = pyro.sample('mu', dist.Normal(0, 1))
    pyro.sample('obs', dist.Normal(mu, 1), obs=x)
Both repositories provide implementations for probabilistic machine learning. PyProbML focuses more on educational content and examples, while Pyro offers a more comprehensive framework for building and deploying probabilistic models. Pyro's integration with PyTorch makes it particularly suitable for deep probabilistic models, whereas PyProbML uses JAX and NumPyro for its implementations.
Probabilistic reasoning and statistical analysis in TensorFlow
Pros of TensorFlow Probability
- Extensive library with a wide range of probabilistic models and tools
- Seamless integration with TensorFlow ecosystem
- Well-documented with comprehensive API references
Cons of TensorFlow Probability
- Steeper learning curve for beginners
- Heavier dependency on TensorFlow framework
- May be overkill for simpler probabilistic modeling tasks
Code Comparison
TensorFlow Probability:
import tensorflow_probability as tfp
tfd = tfp.distributions
# Create a normal distribution
normal = tfd.Normal(loc=0., scale=1.)
# Sample from the distribution
samples = normal.sample(100)
PyProbML:
import numpy as np
from scipy import stats
# Create a normal distribution
normal = stats.norm(loc=0, scale=1)
# Sample from the distribution
samples = normal.rvs(100)
TensorFlow Probability offers more advanced features and integration with TensorFlow, while the pyprobml notebooks lean on standard libraries such as SciPy for basic probabilistic modeling tasks.
Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
Pros of Stan
- More mature and widely adopted probabilistic programming language
- Highly optimized C++ backend for efficient sampling and inference
- Extensive documentation and community support
Cons of Stan
- Steeper learning curve, especially for those new to probabilistic programming
- Less flexibility in model specification compared to PyProbML's Python-based approach
Code Comparison
Stan:
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
PyProbML:
import pymc3 as pm
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=1)
    y = pm.Normal('y', mu=alpha + beta * x, sd=sigma, observed=y_data)
Stan uses its own domain-specific language, while PyProbML leverages Python's syntax and existing libraries like PyMC3. Stan's approach may be more efficient for complex models, but PyProbML offers greater flexibility and easier integration with Python ecosystems.
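Both snippets above declare the same likelihood, y_i ~ Normal(alpha + beta * x_i, sigma). As a minimal sketch of what that statement evaluates to, the function below computes the log-likelihood in plain Python. The function name and the toy data are illustrative assumptions and come from neither Stan nor pyprobml.

```python
import math


def linear_gaussian_loglik(alpha, beta, sigma, x, y):
    """Log-likelihood of y under y_i ~ Normal(alpha + beta * x_i, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for xi, yi in zip(x, y):
        resid = yi - (alpha + beta * xi)
        # Log density of a Normal(0, sigma) evaluated at the residual
        total += -0.5 * math.log(2 * math.pi * sigma**2) - resid**2 / (2 * sigma**2)
    return total


# Toy data (hypothetical): generated by y = 1 + 2x with no noise
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

exact = linear_gaussian_loglik(1.0, 2.0, 1.0, x, y)    # generating parameters
shifted = linear_gaussian_loglik(0.0, 2.0, 1.0, x, y)  # intercept off by one
print(exact > shifted)
```

A sampler such as the ones behind Stan or PyMC combines this quantity with the log-prior and explores the resulting posterior; the sketch covers only the likelihood term.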
Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.
Pros of pgmpy
- More comprehensive library for probabilistic graphical models
- Better documentation and API reference
- Larger community and more frequent updates
Cons of pgmpy
- Steeper learning curve for beginners
- Less focus on modern machine learning techniques
- More complex installation process
Code Comparison
pgmpy example:
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
model = BayesianNetwork([('A', 'B'), ('B', 'C')])
cpd_a = TabularCPD('A', 2, [[0.6], [0.4]])
cpd_b = TabularCPD('B', 2, [[0.7, 0.3], [0.3, 0.7]], evidence=['A'], evidence_card=[2])
model.add_cpds(cpd_a, cpd_b)
pyprobml example:
import pyprobml_utils as pml
import numpy as np
X = np.random.randn(100, 2)
y = np.random.randint(0, 2, 100)
pml.plot_classifier_boundaries(X, y)
Both libraries offer unique features and cater to different aspects of probabilistic machine learning. pgmpy is more focused on graphical models, while pyprobml provides a broader range of machine learning utilities and visualizations.
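To make the pgmpy snippet concrete, the marginal distribution of B follows from the two CPDs by the sum rule, P(B) = sum over A of P(B | A) P(A). The plain-Python sketch below assumes the TabularCPD layout in which rows index the states of the variable and columns index the states of the evidence; the variable names are illustrative.

```python
# P(A): probabilities of A=0 and A=1, taken from cpd_a
p_a = [0.6, 0.4]

# P(B | A): rows are states of B, columns are states of A, taken from cpd_b
p_b_given_a = [
    [0.7, 0.3],
    [0.3, 0.7],
]

# Sum rule: P(B=b) = sum_a P(B=b | A=a) * P(A=a)
p_b = [sum(p_b_given_a[b][a] * p_a[a] for a in range(2)) for b in range(2)]
print([round(p, 2) for p in p_b])  # [0.54, 0.46]
```

An inference engine automates this kind of marginalization for larger networks, where doing the sums by hand quickly becomes impractical.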
README
pyprobml
Python 3 code to reproduce the figures in the books Probabilistic Machine Learning: An Introduction (aka "book 1") and Probabilistic Machine Learning: Advanced Topics (aka "book 2"). The code uses the standard Python libraries, such as numpy, scipy, matplotlib, sklearn, etc. Some of the code (especially in book 2) also uses JAX, and in some parts of book 1, we also use TensorFlow 2 and a little bit of Torch. See also probml-utils for some utility code that is shared across multiple notebooks.
For the latest status of the code, see Book 1 dashboard and Book 2 dashboard. As of September 2022, this code is now in maintenance mode.
Running the notebooks
The notebooks needed to make all the figures are available at the following locations.
- All notebooks (sorted by filename)
- Book 1 notebooks (sorted by chapter)
- Book 2 notebooks (sorted by chapter).
Running notebooks in colab
Colab has most of the libraries you will need (e.g., scikit-learn, JAX) pre-installed, and gives you access to a free GPU and TPU. We have created a colab intro notebook with more details. To run the notebooks on Colab in any browser, you can go to a particular notebook on GitHub and change the domain from github.com to githubtocolab.com as suggested here. If you are using the Google Chrome browser, you can use the "Open in Colab" Chrome extension to do the same with a single click.
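The domain swap described above can be sketched as a small helper. The function name and the example notebook path are hypothetical, chosen only to illustrate the rewrite from github.com to githubtocolab.com.

```python
from urllib.parse import urlsplit, urlunsplit


def to_colab_url(github_url):
    """Swap the github.com domain for githubtocolab.com, keeping the path intact."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError(f"expected a github.com URL, got {parts.netloc!r}")
    return urlunsplit(parts._replace(netloc="githubtocolab.com"))


# Hypothetical notebook path, used only for illustration
example = "https://github.com/probml/pyprobml/blob/master/notebooks/example.ipynb"
print(to_colab_url(example))
```

Rejecting non-GitHub URLs up front keeps the helper from silently rewriting links that Colab could not open.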
Running the notebooks locally
We assume you have already installed JAX, TensorFlow, and Torch, since the details on how to do this depend on whether you have a CPU, GPU, etc.
You can use any of the following options to install the other requirements.
- Option 1
pip install -r https://raw.githubusercontent.com/probml/pyprobml/master/requirements.txt
- Option 2
Download requirements.txt locally to your path and run
pip install -r requirements.txt
- Option 3
Run the following. (Note that --depth 1 prevents downloading the whole history, which is very large.)
git clone --depth 1 https://github.com/probml/pyprobml.git
Then install manually.
If you want to save the figures, you first need to execute something like this:
#export FIG_DIR="/teamspace/studios/this_studio/figures"
import os
os.environ["FIG_DIR"] = "/teamspace/studios/this_studio/pyprobml/notebooks/figures"
os.environ["DUAL_SAVE"] = "1" # both pdf and png
This is used by the savefig function to store pdf files.
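As a rough illustration of how a helper could consume these two variables, the sketch below builds the output paths for a figure name. It is not the repository's savefig implementation: the function name is hypothetical, and it assumes that nothing is saved when FIG_DIR is unset and that only a pdf is written unless DUAL_SAVE is "1".

```python
import os


def figure_paths(name, environ=None):
    """Hypothetical helper: list the files a savefig-like function might write."""
    env = os.environ if environ is None else environ
    fig_dir = env.get("FIG_DIR")
    if not fig_dir:
        # Assumption: with no FIG_DIR set, figures are shown but not saved
        return []
    # Assumption: DUAL_SAVE="1" adds a png next to the default pdf
    extensions = ["pdf", "png"] if env.get("DUAL_SAVE") == "1" else ["pdf"]
    return [os.path.join(fig_dir, f"{name}.{ext}") for ext in extensions]


paths = figure_paths("demo", {"FIG_DIR": "figures", "DUAL_SAVE": "1"})
print(paths)
```

Passing the environment as an argument keeps the sketch easy to test without touching the real process environment.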
Cloud computing
When you want more power or control than Colab gives you, I recommend you use https://lightning.ai/docs/overview/studios, which makes it very easy to develop using VS Code, running on a VM accessed from your web browser; you can then launch on one or more GPUs when needed with a single button click. Alternatively, if you are a power user, you can try Google Cloud Platform, which supports GPUs and TPUs; see this short tutorial on Colab, GCP and TPUs.
How to contribute
See this guide for how to contribute code. Please follow these guidelines to contribute new notebooks to the notebooks directory.
GSOC
For a summary of some of the contributions to this codebase during Google Summer of Code (GSOC), see these links: 2021 and 2022.
Acknowledgements
For a list of contributors, see this list.