Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

Top Related Projects

pymc

9,146

Bayesian Modeling and Probabilistic Programming in Python

pomegranate

3,467

Fast, flexible and easy to use probabilistic modelling in Python.

stan

2,679

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.

probability

4,355

Probabilistic reasoning and statistical analysis in TensorFlow

pyro

8,833

Deep universal probabilistic programming with Python and PyTorch

pyprobml

6,854

Python code for "Probabilistic Machine learning" book by Kevin Murphy

Quick Overview

Probabilistic Programming and Bayesian Methods for Hackers is an open-source book and educational resource that introduces Bayesian methods and probabilistic programming using a code-first, computation-centric approach. It aims to make these complex topics accessible to programmers and data scientists through practical examples and hands-on coding exercises.

Pros

Practical, code-focused approach to learning Bayesian methods
Free and open-source, making it accessible to a wide audience
Covers a range of topics from basic probability to advanced Bayesian techniques
Includes interactive Jupyter notebooks for hands-on learning

Cons

May be challenging for those without a strong programming background
Some examples and libraries used may become outdated over time
Focuses primarily on PyMC (the notebooks come in PyMC2 and PyMC3 versions), which may limit exposure to other probabilistic programming frameworks
Advanced mathematical concepts may still be difficult for some readers to grasp

Code Examples

Basic PyMC3 model:

import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    obs = pm.Normal('obs', mu=mu, sd=1, observed=[0, 1, 2])
    trace = pm.sample(1000)

pm.plot_posterior(trace)

This code creates a simple Bayesian model using PyMC3, defining priors and likelihood, then sampling from the posterior distribution.
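
For this particular model the posterior is also available in closed form, which makes a handy sanity check on the sampler output. The sketch below is not taken from the book: it assumes the standard normal-normal conjugate update with a known observation standard deviation, and the function name `normal_posterior` is made up for illustration.

```python
import math

def normal_posterior(data, prior_mu=0.0, prior_sd=1.0, obs_sd=1.0):
    """Closed-form posterior for a normal mean with known observation sd."""
    prior_prec = 1.0 / prior_sd ** 2          # precision of the prior
    obs_prec = len(data) / obs_sd ** 2        # total precision of the data
    post_prec = prior_prec + obs_prec
    post_mu = (prior_prec * prior_mu + sum(data) / obs_sd ** 2) / post_prec
    return post_mu, math.sqrt(1.0 / post_prec)

# Same prior and observations as the PyMC3 model above
post_mu, post_sd = normal_posterior([0, 1, 2])
print(post_mu, post_sd)  # 0.75 0.5
```

The posterior mean and standard deviation reported for `mu` by the sampler should land close to these values, up to Monte Carlo error.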

Bayesian A/B testing:

import pymc3 as pm
import numpy as np

data = np.array([0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1])

with pm.Model() as model:
    p = pm.Beta('p', alpha=1, beta=1)
    y = pm.Bernoulli('y', p=p, observed=data)
    trace = pm.sample(1000)

pm.plot_posterior(trace)

This example fits a Beta-Bernoulli model to binary outcomes from a single variant using PyMC3. It is the building block of a Bayesian A/B test, which fits one such rate per variant and then compares the posteriors.
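
Because the Beta prior is conjugate to the Bernoulli likelihood, a comparison between two variants can also be sketched without MCMC. The following uses only the standard library. `data_a` is the array from the example above, while `data_b`, the random seed, and the helper name `beta_posterior` are invented for illustration.

```python
import random

def beta_posterior(data, alpha=1.0, beta=1.0):
    """Return the Beta posterior parameters after observing 0/1 outcomes."""
    successes = sum(data)
    return alpha + successes, beta + len(data) - successes

data_a = [0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1]   # from the example above
data_b = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]   # hypothetical second variant

a_alpha, a_beta = beta_posterior(data_a)      # Beta(9, 4)
b_alpha, b_beta = beta_posterior(data_b)      # Beta(6, 7)

rng = random.Random(0)
draws = 20000
# Fraction of paired posterior draws in which variant A has the higher rate
a_wins = sum(
    rng.betavariate(a_alpha, a_beta) > rng.betavariate(b_alpha, b_beta)
    for _ in range(draws)
)
prob_a_better = a_wins / draws
print(f"P(rate_A > rate_B) is roughly {prob_a_better:.2f}")
```

The quantity of interest in an A/B test is usually this probability, or the posterior of the difference in rates, rather than either rate on its own.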

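Hierarchical model:

import pymc3 as pm
import numpy as np

# 100 observations for each of 3 groups (one group per column)
data = np.random.randn(100, 3)

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)            # population-level mean
    tau = pm.HalfNormal('tau', sd=1)            # spread of group means around mu
    group_mu = pm.Normal('group_mu', mu=mu, sd=tau, shape=3)  # one mean per group
    sigma = pm.HalfNormal('sigma', sd=1)        # within-group noise
    y = pm.Normal('y', mu=group_mu, sd=sigma, observed=data)
    trace = pm.sample(1000)

pm.summary(trace)

This code shows a hierarchical model in PyMC3: each of the three groups gets its own mean, and those means share a common prior whose location and spread are themselves inferred. This structure is useful for modeling grouped or nested data.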
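
What a hierarchical model buys you with grouped data is partial pooling: each group's estimate is pulled toward the overall mean. The sketch below shows this for the simplest case, under the assumption that the within-group standard deviation `sigma`, the population mean `mu`, and the between-group standard deviation `tau` are all known (a full model would infer them). The group means in `raw` and the function name are hypothetical.

```python
def partial_pool(group_means, n_per_group, sigma, mu, tau):
    """Posterior mean of each group mean when sigma, mu and tau are known."""
    # Weight on the group's own data: grows with more data or larger tau
    weight = tau ** 2 / (tau ** 2 + sigma ** 2 / n_per_group)
    return [weight * m + (1 - weight) * mu for m in group_means]

raw = [-0.4, 0.1, 0.6]   # hypothetical per-group sample means
shrunk = partial_pool(raw, n_per_group=100, sigma=1.0, mu=0.0, tau=0.1)
print(shrunk)            # each value is pulled halfway toward mu = 0
```

With these numbers the weight is one half, so every group mean moves halfway toward the population mean. Larger groups or a larger `tau` would shrink less.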

Getting Started

To get started with the book and examples:

Clone the repository:

git clone https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers.git

Install required dependencies:
```
pip install -r requirements.txt
```
Launch Jupyter Notebook:
```
jupyter notebook
```
Open the desired chapter's notebook and start exploring the content and running the code examples.

Competitor Comparisons

pymc

9,146

Bayesian Modeling and Probabilistic Programming in Python

Pros of PyMC

Comprehensive probabilistic programming library with extensive features
Active development and maintenance by a dedicated team
Robust documentation and extensive examples for various use cases

Cons of PyMC

Steeper learning curve for beginners in Bayesian methods
Less focus on intuitive explanations compared to the "Hackers" approach
May require more setup and dependencies for complex models

Code Comparison

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers:

import pymc3 as pm

with pm.Model() as model:
    theta = pm.Beta('theta', alpha=1, beta=1)
    y = pm.Bernoulli('y', p=theta, observed=[1, 1, 1, 0, 1, 1])
    trace = pm.sample(1000)

PyMC:

import pymc as pm

with pm.Model() as model:
    theta = pm.Beta('theta', alpha=1, beta=1)
    y = pm.Bernoulli('y', p=theta, observed=[1, 1, 1, 0, 1, 1])
    idata = pm.sample(1000)

The code examples are very similar, with PyMC using updated syntax and returning an InferenceData object instead of a trace.

pomegranate

3,467

Fast, flexible and easy to use probabilistic modelling in Python.

Pros of pomegranate

Comprehensive library for probabilistic modeling with a wide range of algorithms
Efficient implementation using Cython for improved performance
Extensive documentation and examples for ease of use

Cons of pomegranate

Steeper learning curve for beginners compared to Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
Less focus on educational content and explanations of underlying concepts
May require more setup and configuration for complex models

Code Comparison

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers:

import pymc3 as pm

with pm.Model() as model:
    theta = pm.Beta('theta', alpha=1, beta=1)
    y = pm.Bernoulli('y', p=theta, observed=[1, 1, 1, 0, 1, 1])
    trace = pm.sample(1000)

pomegranate:

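from pomegranate import BayesianNetwork, ConditionalProbabilityTable, DiscreteDistribution, State

# pomegranate < 1.0 API: distributions are wrapped in State objects before being added
theta_dist = DiscreteDistribution({'1': 0.5, '0': 0.5})
# The table's parents are given as distribution objects, not as names
y_dist = ConditionalProbabilityTable([['1', '1', 0.5], ['1', '0', 0.5], ['0', '1', 0.5], ['0', '0', 0.5]], [theta_dist])
theta = State(theta_dist, name="theta")
y = State(y_dist, name="y")

model = BayesianNetwork()
model.add_states(theta, y)
model.add_edge(theta, y)
model.bake()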

Both repositories offer valuable resources for probabilistic programming and Bayesian methods. Probabilistic-Programming-and-Bayesian-Methods-for-Hackers is more focused on education and understanding concepts, while pomegranate provides a robust library for practical implementation of probabilistic models.

stan

2,679

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.

Pros of Stan

Highly optimized and efficient probabilistic programming language
Extensive documentation and active community support
Supports a wide range of statistical models and inference methods

Cons of Stan

Steeper learning curve for beginners
Requires compilation, which can slow down development process
Less focus on interactive exploration compared to Probabilistic Programming and Bayesian Methods for Hackers

Code Comparison

Stan:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}

Probabilistic Programming and Bayesian Methods for Hackers:

import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=1)
    y = pm.Normal('y', mu=mu, sd=sigma, observed=data)

Stan provides a more explicit and statically-typed approach, while Probabilistic Programming and Bayesian Methods for Hackers uses Python with PyMC3, offering a more familiar syntax for Python users and easier integration with data science workflows.

probability

4,355

Probabilistic reasoning and statistical analysis in TensorFlow

Pros of TensorFlow Probability

Comprehensive library with a wide range of probabilistic models and inference algorithms
Seamless integration with TensorFlow ecosystem for scalable machine learning
Active development and support from Google and the open-source community

Cons of TensorFlow Probability

Steeper learning curve, especially for those new to probabilistic programming
More complex setup and installation process
Less focus on intuitive explanations and practical examples for beginners

Code Comparison

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers:

import pymc3 as pm

with pm.Model() as model:
    theta = pm.Beta('theta', alpha=1, beta=1)
    y = pm.Bernoulli('y', p=theta, observed=[1, 1, 1, 0, 1, 1])
    trace = pm.sample(1000)

TensorFlow Probability:

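import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

data = tf.constant([1., 1., 1., 0., 1., 1.])

def target_log_prob(theta):
    # Beta(1, 1) log prior plus Bernoulli log likelihood of the observations
    return tfd.Beta(1., 1.).log_prob(theta) + tf.reduce_sum(tfd.Bernoulli(probs=theta, dtype=tf.float32).log_prob(data))

kernel = tfp.mcmc.TransformedTransitionKernel(
    tfp.mcmc.HamiltonianMonteCarlo(target_log_prob, step_size=0.1, num_leapfrog_steps=3),
    bijector=tfp.bijectors.Sigmoid())  # maps the unconstrained HMC state into (0, 1)
samples = tfp.mcmc.sample_chain(num_results=1000, num_burnin_steps=500, current_state=tf.constant(0.5), kernel=kernel, trace_fn=None)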

Both repositories offer valuable resources for probabilistic programming, but they cater to different audiences. Probabilistic-Programming-and-Bayesian-Methods-for-Hackers is more suitable for beginners and focuses on practical examples, while TensorFlow Probability provides a comprehensive toolkit for advanced users and large-scale applications.

pyro

8,833

Deep universal probabilistic programming with Python and PyTorch

Pros of Pyro

Powerful probabilistic programming library built on PyTorch
Supports deep learning and scalable inference algorithms
Actively maintained with regular updates and extensive documentation

Cons of Pyro

Steeper learning curve for beginners in probabilistic programming
More complex setup and dependencies compared to the Jupyter notebook approach
Focuses on advanced techniques, which may be overwhelming for newcomers

Code Comparison

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers:

import pymc3 as pm

with pm.Model() as model:
    theta = pm.Beta('theta', alpha=1, beta=1)
    y = pm.Bernoulli('y', p=theta, observed=[1, 1, 1, 0, 1, 1])
    trace = pm.sample(1000, tune=1000)

Pyro:

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def model(data):
    theta = pyro.sample('theta', dist.Beta(1, 1))
    # One Bernoulli observation per data point, all sharing theta
    with pyro.plate('data', len(data)):
        pyro.sample('obs', dist.Bernoulli(theta), obs=data)

nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=1000, warmup_steps=1000)
mcmc.run(torch.tensor([1., 1., 1., 0., 1., 1.]))

The code comparison shows that Pyro requires more setup and is more verbose, but offers greater flexibility and integration with PyTorch. Probabilistic-Programming-and-Bayesian-Methods-for-Hackers uses PyMC3, which provides a more concise syntax for simple models but may be less scalable for complex problems.

pyprobml

6,854

Python code for "Probabilistic Machine learning" book by Kevin Murphy

Pros of pyprobml

More comprehensive coverage of machine learning topics beyond just Bayesian methods
Regularly updated with new content and examples
Includes implementations in multiple frameworks (PyTorch, TensorFlow, JAX)

Cons of pyprobml

Less focused on practical, hands-on examples for beginners
May be overwhelming for those new to probabilistic programming
Lacks the narrative structure found in Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

Code Comparison

pyprobml example (using PyTorch):

import torch
import torch.distributions as dist

def beta_binomial(n, a, b):
    p = dist.Beta(a, b).sample()
    return dist.Binomial(n, p).sample()

result = beta_binomial(10, 2, 2)

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers example (using PyMC3):

import pymc3 as pm

with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=10, p=p, observed=7)
    trace = pm.sample(1000)
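
The two snippets do different jobs: the PyTorch function draws one forward sample from the model, while the PyMC3 model conditions on an observed count of 7 and samples the posterior of `p`. For a one-parameter model like this, the posterior can also be approximated on a grid, which shows what the sampler is estimating. This is a sketch, not code from either repository, and the grid size and variable names are arbitrary choices.

```python
import math

n, y = 10, 7            # trials and observed successes, as in the PyMC3 model
a, b = 2.0, 2.0         # Beta prior parameters

grid = [(i + 0.5) / 1000 for i in range(1000)]   # midpoints in (0, 1)

def log_unnormalized_posterior(p):
    """Log of Beta(a, b) prior times Binomial(n, p) likelihood, up to a constant."""
    log_prior = (a - 1) * math.log(p) + (b - 1) * math.log(1 - p)
    log_lik = y * math.log(p) + (n - y) * math.log(1 - p)
    return log_prior + log_lik

weights = [math.exp(log_unnormalized_posterior(p)) for p in grid]
total = sum(weights)
posterior = [w / total for w in weights]         # normalized grid probabilities

grid_mean = sum(p * w for p, w in zip(grid, posterior))
exact_mean = (a + y) / (a + b + n)               # conjugate result: Beta(9, 5)
print(grid_mean, exact_mean)
```

The grid estimate and the conjugate answer should agree to several decimal places. Grids stop being practical beyond a handful of parameters, which is where MCMC takes over.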

README

Bayesian Methods for Hackers

Using Python and PyMC

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference spends two to three chapters on probability theory, then turns to what Bayesian inference is. Unfortunately, due to the mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the reader with a so-what feeling about Bayesian inference. In fact, this was the author's own prior opinion.

After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight days of reading examples and trying to put the pieces together to understand the methods. There was simply not enough literature bridging theory to practice. The root of my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming. That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap.

If Bayesian inference is the destination, then mathematical analysis is a particular path towards it. On the other hand, computing power is cheap enough that we can afford to take an alternate route via probabilistic programming. The latter path is much more useful, as it removes the need for mathematical intervention at each step: often-intractable mathematical analysis is no longer a prerequisite to Bayesian inference. Simply put, this computational path proceeds via small intermediate jumps from beginning to end, whereas the first path proceeds by enormous leaps, often landing far away from our target. Furthermore, without a strong mathematical background, the analysis required by the first path cannot even take place.

Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course, as an introductory book, we can only leave it at that: an introductory book. The mathematically trained may satisfy the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with less mathematical background, or one who is not interested in the mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining.

The reasons for choosing PyMC as the probabilistic programming language are two-fold. First, as of this writing, there is no central resource for examples and explanations in the PyMC universe. The official documentation assumes prior knowledge of Bayesian inference and probabilistic programming. We hope this book encourages users at every level to look at PyMC. Second, with recent core developments and the popularity of the scientific stack in Python, PyMC is likely to become a core component soon enough.

PyMC does have dependencies to run, namely NumPy and (optionally) SciPy. To not limit the user, the examples in this book will rely only on PyMC, NumPy, SciPy and Matplotlib.

Printed Version by Addison-Wesley

Bayesian Methods for Hackers is now available as a printed book! You can pick up a copy on Amazon. What are the differences between the online version and the printed version?

Additional Chapter on Bayesian A/B testing
Updated examples
Answers to the end of chapter questions
Additional explanation, and rewritten sections to aid the reader.

See the project homepage here for examples, too.

The chapters below are rendered via nbviewer at nbviewer.jupyter.org; they are read-only and rendered in real time. Interactive notebooks and examples can be downloaded by cloning the repository!

PyMC2

Prologue: Why we do it.
Chapter 1: Introduction to Bayesian Methods. Introduction to the philosophy and practice of Bayesian methods and answering the question, "What is probabilistic programming?" Examples include:
- Inferring human behaviour changes from text message rates
Chapter 2: A little more on PyMC. We explore modeling Bayesian problems using Python's PyMC library through examples. How do we create Bayesian models? Examples include:
- Detecting the frequency of cheating students, while avoiding liars
- Calculating probabilities of the Challenger space-shuttle disaster
Chapter 3: Opening the Black Box of MCMC. We discuss how MCMC operates and diagnostic tools. Examples include:
- Bayesian clustering with mixture models
Chapter 4: The Greatest Theorem Never Told. We explore an incredibly useful, and dangerous, theorem: The Law of Large Numbers. Examples include:
- Exploring a Kaggle dataset and the pitfalls of naive analysis
- How to sort Reddit comments from best to worst (not as easy as you think)
Chapter 5: Would you rather lose an arm or a leg? The introduction of loss functions and their (awesome) use in Bayesian methods. Examples include:
- Solving the Price is Right's Showdown
- Optimizing financial predictions
- Winning solution to the Kaggle Dark World's competition
Chapter 6: Getting our prior-ities straight. Probably the most important chapter. We draw on expert opinions to answer questions. Examples include:
- Multi-Armed Bandits and the Bayesian Bandit solution
- What is the relationship between data sample size and prior?
- Estimating financial unknowns using expert priors
We explore useful tips to be objective in analysis as well as common pitfalls of priors.

PyMC3

Prologue: Why we do it.
Chapter 1: Introduction to Bayesian Methods. Introduction to the philosophy and practice of Bayesian methods and answering the question, "What is probabilistic programming?" Examples include:
- Inferring human behaviour changes from text message rates
Chapter 2: A little more on PyMC. We explore modeling Bayesian problems using Python's PyMC library through examples. How do we create Bayesian models? Examples include:
- Detecting the frequency of cheating students, while avoiding liars
- Calculating probabilities of the Challenger space-shuttle disaster
Chapter 3: Opening the Black Box of MCMC. We discuss how MCMC operates and diagnostic tools. Examples include:
- Bayesian clustering with mixture models
Chapter 4: The Greatest Theorem Never Told. We explore an incredibly useful, and dangerous, theorem: The Law of Large Numbers. Examples include:
- Exploring a Kaggle dataset and the pitfalls of naive analysis
- How to sort Reddit comments from best to worst (not as easy as you think)
Chapter 5: Would you rather lose an arm or a leg? The introduction of loss functions and their (awesome) use in Bayesian methods. Examples include:
- Solving the Price is Right's Showdown
- Optimizing financial predictions
- Winning solution to the Kaggle Dark World's competition
Chapter 6: Getting our prior-ities straight. Probably the most important chapter. We draw on expert opinions to answer questions. Examples include:
- Multi-Armed Bandits and the Bayesian Bandit solution
- What is the relationship between data sample size and prior?
- Estimating financial unknowns using expert priors
We explore useful tips to be objective in analysis as well as common pitfalls of priors.

More questions about PyMC? Please post your modeling, convergence, or any other PyMC question on cross-validated, the statistics stack-exchange.

Using the book

The book can be read in three different ways, starting from most recommended to least recommended:

1. The most recommended option is to clone the repository to download the .ipynb files to your local machine. If you have Jupyter installed, you can view the chapters in your browser, edit and run the code provided, and try some practice questions. This is the preferred way to read this book, though it comes with some dependencies.
- Jupyter is a requirement to view the ipynb files. It can be downloaded here. Jupyter notebooks can be run by (your-virtualenv) ~/path/to/the/book/Chapter1_Introduction $ jupyter notebook
- For Linux users, you should not have a problem installing NumPy, SciPy, Matplotlib and PyMC. For Windows users, check out pre-compiled versions if you have difficulty.
- In the styles/ directory are a number of files (.matplotlibrc) that are used to make things pretty. These are not only designed for the book, but they offer many improvements over the default settings of matplotlib.
2. The second option is to use the nbviewer.jupyter.org site, which displays Jupyter notebooks in the browser (example). The contents are updated synchronously as commits are made to the book. You can use the Contents section above to link to the chapters.
3. PDFs are the least-preferred method to read the book, as PDFs are static and non-interactive. If PDFs are desired, they can be created dynamically using the nbconvert utility.

Installation and configuration

If you would like to run the Jupyter notebooks locally (option 1 above), you'll need to install the following:

Jupyter is a requirement to view the ipynb files. It can be downloaded here
Necessary packages are PyMC, NumPy, SciPy and Matplotlib.
- For Linux/OSX users, you should not have a problem installing the above, except for Matplotlib on OSX.
- For Windows users, check out pre-compiled versions if you have difficulty.
- also recommended, for data-mining exercises, are PRAW and requests.
New to Python or Jupyter, and need help with the namespaces? Check out this answer.
In the styles/ directory are a number of files that are customized for the notebook. These are not only designed for the book, but they offer many improvements over the default settings of matplotlib and the Jupyter notebook. The in-notebook style has not been finalized yet.

Development

This book has an unusual development design. The content is open-sourced, meaning anyone can be an author. Authors submit content or revisions using the GitHub interface.

How to contribute

What to contribute?

The current chapter list is not finalized. If you see something that is missing (MCMC, MAP, Bayesian networks, good prior choices, Potential classes etc.), feel free to start there.
Cleaning up Python code and making code more PyMC-esque
Giving better explanations
Spelling/grammar mistakes
Suggestions
Contributing to the Jupyter notebook styles

Committing

All commits are welcome, even if they are minor ;)
If you are unfamiliar with GitHub, you can send me contributions at the email address below.

Reviews

These are satirical, but real.

"No, but it looks good" - John D. Cook

"I ... read this book ... I like it!" - Andrew Gelman

"This book is a godsend, and a direct refutation to that 'hmph! you don't know maths, piss off!' school of thought... The publishing model is so unusual. Not only is it open source but it relies on pull requests from anyone in order to progress the book. This is ingenious and heartening" - excited Reddit user

Contributions and Thanks

Thanks to all our contributing authors, including (in chronological order):

Authors
Cameron Davidson-Pilon	Stef Gibson	Vincent Ohprecio	Lars Buitinck
Paul Magwene	Matthias Bussonnier	Jens Rantil	y-p
Ethan Brown	Jonathan Whitmore	Mattia Rigotti	Colby Lemon
Gustav W Delius	Matthew Conlen	Jim Radford	Vannessa Sabino
Thomas Bratt	Nisan Haramati	Robert Grant	Matthew Wampler-Doty
Yaroslav Halchenko	Alex Garel	Oleksandr Lysenko	liori
ducky427	Pablo de Oliveira Castro	sergeyfogelson	Mattia Rigotti
Matt Bauman	Andrew Duberstein	Carsten Brandt	Bob Jansen
ugurthemaster	William Scott	Min RK	Bulwersator
elpres	Augusto Hack	Michael Feldmann	Youki
Jens Rantil	Kyle Meyer	Eric Martin	Inconditus
Kleptine	Stuart Layton	Antonino Ingargiola	vsl9
Tom Christie	bclow	Simon Potter	Garth Snyder
Daniel Beauchamp	Philipp Singer	gbenmartin	Peadar Coyle

We would like to thank the Python community for building an amazing architecture. We would like to thank the statistics community for building an amazing architecture.

Similarly, the book is only possible because of the PyMC library. A big thanks to the core devs of PyMC: Chris Fonnesbeck, Anand Patil, David Huard and John Salvatier.

One final thanks. This book was generated by Jupyter Notebook, a wonderful tool for developing in Python. We thank the IPython/Jupyter community for developing the Notebook interface. All Jupyter notebook files are available for download on the GitHub repository.

Contact

Contact the main author, Cam Davidson-Pilon at cam.davidson.pilon@gmail.com or @cmrndp
