
pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python


Top Related Projects

  • Stan (2,570 stars): Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
  • TensorFlow Probability: Probabilistic reasoning and statistical analysis in TensorFlow
  • Pyro (8,493 stars): Deep universal probabilistic programming with Python and PyTorch
  • Orbit (1,853 stars): A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.
  • DoWhy (6,997 stars): DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

Quick Overview

PyMC is a probabilistic programming library for Python that allows users to build and analyze Bayesian models. It provides a high-level interface for defining models, performing inference, and analyzing results, making it accessible to both beginners and advanced users in the field of Bayesian statistics.

Pros

  • Intuitive and expressive syntax for model specification
  • Supports a wide range of statistical distributions and inference algorithms
  • Excellent documentation and active community support
  • Seamless integration with popular Python scientific libraries like NumPy and Pandas

Cons

  • Can be computationally intensive for large or complex models
  • Learning curve for users new to Bayesian statistics
  • Limited support for certain specialized models compared to some other statistical packages

Code Examples

  1. Simple linear regression:
import pymc as pm
import numpy as np

# Generate synthetic data
X = np.random.randn(100)
y = 2 * X + 1 + np.random.randn(100) * 0.5

with pm.Model() as model:
    # Priors
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)
    
    # Linear model
    mu = alpha + beta * X
    
    # Likelihood
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=y)
    
    # Inference
    trace = pm.sample(1000, tune=1000)

pm.plot_posterior(trace)
  2. Logistic regression:
import pymc as pm
import numpy as np

# Generate synthetic data
X = np.random.randn(100, 2)
y = np.random.binomial(1, 1 / (1 + np.exp(-X.sum(axis=1))))

with pm.Model() as model:
    # Priors
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10, shape=2)
    
    # Logistic model
    p = pm.math.invlogit(alpha + pm.math.dot(X, beta))
    
    # Likelihood
    Y_obs = pm.Bernoulli('Y_obs', p=p, observed=y)
    
    # Inference
    trace = pm.sample(1000, tune=1000)

pm.plot_forest(trace)
  3. Hierarchical model:
import pymc as pm
import numpy as np

# Generate synthetic data
groups = 3
n_per_group = 30
group_means = np.random.normal(0, 1, groups)
data = np.random.normal(np.repeat(group_means, n_per_group), 1)
group = np.repeat(np.arange(groups), n_per_group)

with pm.Model() as model:
    # Hyperpriors
    mu = pm.Normal('mu', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=10)
    
    # Group-level effects
    group_effects = pm.Normal('group_effects', mu=0, sigma=sigma, shape=groups)
    
    # Model
    y = pm.Normal('y', mu=mu + group_effects[group], sigma=1, observed=data)
    
    # Inference
    trace = pm.sample(1000, tune=1000)

pm.plot_forest(trace, var_names=['group_effects'])

Getting Started

To get started with PyMC:

  1. Install PyMC:
pip install pymc
  2. Import the library:
import pymc as pm
  3. Define your model using a context manager:
with pm.Model() as model:
    # Define priors, likelihood, and observed data
    ...
    
    # Perform inference
    trace = pm.sample(1000, tune=1000)

# Analyze results
pm.plot_posterior(trace)

Competitor Comparisons

Stan

Pros of Stan

  • Generally faster execution for complex models
  • More extensive documentation and user guides
  • Wider adoption in certain scientific fields (e.g., physics, ecology)

Cons of Stan

  • Steeper learning curve, especially for those new to probabilistic programming
  • Less seamless integration with Python ecosystem
  • Requires compilation step, which can slow down iterative development

Code Comparison

Stan:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}

PyMC:

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)
    mu = alpha + beta * x
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)

Both Stan and PyMC are powerful probabilistic programming frameworks, but they cater to slightly different user bases and use cases. Stan excels in performance and has a more extensive ecosystem, while PyMC offers a more Pythonic approach and easier integration with the broader Python data science stack.

TensorFlow Probability

Pros of TensorFlow Probability

  • Seamless integration with TensorFlow ecosystem
  • Highly scalable for large datasets and distributed computing
  • Extensive support for deep learning models

Cons of TensorFlow Probability

  • Steeper learning curve, especially for those new to TensorFlow
  • Less intuitive for traditional statistical modeling compared to PyMC
  • Requires more boilerplate code for simple models

Code Comparison

PyMC example:

import pymc as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=data)
    trace = pm.sample(1000)

TensorFlow Probability example:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

data = tfd.Normal(loc=0., scale=1.).sample(100)

def target_log_prob(mu):
    # Joint log-density: standard normal prior plus likelihood of the data
    return (tfd.Normal(loc=0., scale=1.).log_prob(mu)
            + tf.reduce_sum(tfd.Normal(loc=mu, scale=1.).log_prob(data)))

kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target_log_prob, step_size=0.1, num_leapfrog_steps=3)
samples, _ = tfp.mcmc.sample_chain(
    num_results=1000, current_state=tf.constant(0.), kernel=kernel)

Both libraries offer powerful probabilistic programming capabilities, but PyMC is generally more user-friendly for traditional statistical modeling, while TensorFlow Probability excels in scalability and integration with deep learning workflows.

Pyro

Pros of Pyro

  • Built on PyTorch, offering better GPU acceleration and deep learning integration
  • More flexible and expressive for complex probabilistic models
  • Supports dynamic computational graphs, allowing for models with varying structure

Cons of Pyro

  • Steeper learning curve, especially for those not familiar with PyTorch
  • Less extensive documentation and smaller community compared to PyMC
  • Fewer built-in statistical distributions and inference algorithms

Code Comparison

PyMC example:

import pymc as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=data)
    trace = pm.sample(1000)

Pyro example:

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def model(data):
    mu = pyro.sample('mu', dist.Normal(0., 1.))
    with pyro.plate('data', len(data)):
        pyro.sample('obs', dist.Normal(mu, 1.), obs=data)

# data: a 1-D torch tensor of observations
mcmc = MCMC(NUTS(model), num_samples=1000)
mcmc.run(data)

Both PyMC and Pyro are powerful probabilistic programming libraries, with PyMC being more accessible for beginners and Pyro offering greater flexibility for complex models and deep learning integration.

Orbit

Pros of Orbit

  • Specialized for time series forecasting and causal impact analysis
  • Offers a high-level API for easier model building and forecasting
  • Includes built-in visualization tools for time series analysis

Cons of Orbit

  • More limited in scope compared to PyMC's general-purpose probabilistic programming
  • Smaller community and ecosystem
  • Less extensive documentation and tutorials

Code Comparison

PyMC example:

import pymc as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=data)
    trace = pm.sample(1000)

Orbit example:

from orbit.models import DLT

model = DLT(response_col='y', date_col='ds')
model.fit(df)
prediction = model.predict(df)

Both PyMC and Orbit are powerful libraries for probabilistic modeling, but they serve different purposes. PyMC is a more general-purpose probabilistic programming framework, while Orbit focuses specifically on time series forecasting and causal impact analysis. Orbit provides a higher-level API that may be easier for beginners to use, especially for time series tasks, but PyMC offers more flexibility for a wider range of probabilistic modeling problems.

DoWhy

Pros of DoWhy

  • Focused specifically on causal inference and effect estimation
  • Provides a unified framework for various causal inference methods
  • Includes tools for sensitivity analysis and robustness checks

Cons of DoWhy

  • More limited scope compared to PyMC's broader statistical modeling capabilities
  • Smaller community and ecosystem of extensions/plugins
  • Less comprehensive documentation and tutorials

Code Comparison

DoWhy example:

from dowhy import CausalModel

# df, the treatment/outcome column names, and causal_graph are placeholders
model = CausalModel(data=df, treatment="t", outcome="y", graph=causal_graph)
identified_estimand = model.identify_effect()
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.linear_regression")

PyMC example:

import pymc as pm
with pm.Model() as model:
    treatment = pm.Normal('treatment', mu=0, sigma=1)
    outcome = pm.Normal('outcome', mu=treatment, sigma=1)
    trace = pm.sample()

Both libraries offer different approaches to causal inference. DoWhy provides a more structured framework specifically for causal analysis, while PyMC offers a flexible probabilistic programming environment for a wider range of statistical modeling tasks. The choice between them depends on the specific requirements of your project and your familiarity with Bayesian methods versus potential outcomes frameworks.


README

.. image:: https://cdn.rawgit.com/pymc-devs/pymc/main/docs/logos/svg/PyMC_banner.svg
   :height: 100px
   :alt: PyMC logo
   :align: center

|Build Status| |Coverage| |NumFOCUS_badge| |Binder| |Dockerhub| |DOIzenodo| |Conda Downloads|

PyMC (formerly PyMC3) is a Python package for Bayesian statistical modeling focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.

Check out the PyMC overview <https://docs.pymc.io/en/latest/learn/core_notebooks/pymc_overview.html>, or one of the many examples <https://www.pymc.io/projects/examples/en/latest/gallery.html>! For questions on PyMC, head on over to our PyMC Discourse <https://discourse.pymc.io/>__ forum.

Features

  • Intuitive model specification syntax, for example, x ~ N(0,1) translates to x = Normal('x',0,1)
  • Powerful sampling algorithms, such as the No U-Turn Sampler <http://www.jmlr.org/papers/v15/hoffman14a.html>__, allow fitting complex models with thousands of parameters with little specialized knowledge of fitting algorithms.
  • Variational inference: ADVI <http://www.jmlr.org/papers/v18/16-107.html>__ for fast approximate posterior estimation as well as mini-batch ADVI for large data sets.
  • Relies on PyTensor <https://pytensor.readthedocs.io/en/latest/>__ which provides:
    • Computation optimization and dynamic C or JAX compilation
    • NumPy broadcasting and advanced indexing
    • Linear algebra operators
    • Simple extensibility
  • Transparent support for missing value imputation

Linear Regression Example

Plant growth can be influenced by multiple factors, and understanding these relationships is crucial for optimizing agricultural practices.

Imagine we conduct an experiment to predict the growth of a plant based on different environmental variables.

.. code-block:: python

    import pymc as pm

    # Taking draws from a normal distribution
    seed = 42
    x_dist = pm.Normal.dist(shape=(100, 3))
    x_data = pm.draw(x_dist, random_seed=seed)

    # Independent Variables:
    # Sunlight Hours: Number of hours the plant is exposed to sunlight daily.
    # Water Amount: Daily water amount given to the plant (in milliliters).
    # Soil Nitrogen Content: Percentage of nitrogen content in the soil.
    #
    # Dependent Variable:
    # Plant Growth (y): Measured as the increase in plant height (in centimeters)
    # over a certain period.

    # Define coordinate values for all dimensions of the data
    coords = {
        "trial": range(100),
        "features": ["sunlight hours", "water amount", "soil nitrogen"],
    }

    # Define generative model
    with pm.Model(coords=coords) as generative_model:
        x = pm.Data("x", x_data, dims=["trial", "features"])

        # Model parameters
        betas = pm.Normal("betas", dims="features")
        sigma = pm.HalfNormal("sigma")

        # Linear model
        mu = x @ betas

        # Likelihood
        # Assuming we measure deviation of each plant from baseline
        plant_growth = pm.Normal("plant growth", mu, sigma, dims="trial")

    # Generating data from model by fixing parameters
    fixed_parameters = {
        "betas": [5, 20, 2],
        "sigma": 0.5,
    }
    with pm.do(generative_model, fixed_parameters) as synthetic_model:
        idata = pm.sample_prior_predictive(random_seed=seed)  # Sample from prior predictive distribution.
        synthetic_y = idata.prior["plant growth"].sel(draw=0, chain=0)

    # Infer parameters conditioned on observed data
    with pm.observe(generative_model, {"plant growth": synthetic_y}) as inference_model:
        idata = pm.sample(random_seed=seed)

    summary = pm.stats.summary(idata, var_names=["betas", "sigma"])
    print(summary)

From the summary, we can see that the means of the inferred parameters are very close to the fixed parameters:

=====================  ======  =====  ======  =======  =========  =======  ========  ========  =====
Params                 mean    sd     hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
=====================  ======  =====  ======  =======  =========  =======  ========  ========  =====
betas[sunlight hours]  4.972   0.054  4.866   5.066    0.001      0.001    3003      1257      1
betas[water amount]    19.963  0.051  19.872  20.062   0.001      0.001    3112      1658      1
betas[soil nitrogen]   1.994   0.055  1.899   2.107    0.001      0.001    3221      1559      1
sigma                  0.511   0.037  0.438   0.575    0.001      0        2945      1522      1
=====================  ======  =====  ======  =======  =========  =======  ========  ========  =====

.. code-block:: python

    # Simulate new data conditioned on inferred parameters
    new_x_data = pm.draw(
        pm.Normal.dist(shape=(3, 3)),
        random_seed=seed,
    )
    new_coords = coords | {"trial": [0, 1, 2]}

    with inference_model:
        pm.set_data({"x": new_x_data}, coords=new_coords)
        pm.sample_posterior_predictive(
            idata,
            predictions=True,
            extend_inferencedata=True,
            random_seed=seed,
        )

    pm.stats.summary(idata.predictions, kind="stats")

The new data conditioned on inferred parameters would look like:

===============  ======  =====  ======  =======
Output           mean    sd     hdi_3%  hdi_97%
===============  ======  =====  ======  =======
plant growth[0]  14.229  0.515  13.325  15.272
plant growth[1]  24.418  0.511  23.428  25.326
plant growth[2]  -6.747  0.511  -7.740  -5.797
===============  ======  =====  ======  =======

.. code-block:: python

    # Simulate new data, under a scenario where the first beta is zero
    with pm.do(
        inference_model,
        {inference_model["betas"]: inference_model["betas"] * [0, 1, 1]},
    ) as plant_growth_model:
        new_predictions = pm.sample_posterior_predictive(
            idata,
            predictions=True,
            random_seed=seed,
        )

    pm.stats.summary(new_predictions, kind="stats")

The new data, under the above scenario would look like:

===============  ======  =====  ======  =======
Output           mean    sd     hdi_3%  hdi_97%
===============  ======  =====  ======  =======
plant growth[0]  12.149  0.515  11.193  13.135
plant growth[1]  29.809  0.508  28.832  30.717
plant growth[2]  -0.131  0.507  -1.121  0.791
===============  ======  =====  ======  =======

Getting started

If you already know about Bayesian statistics:

  • API quickstart guide <https://www.pymc.io/projects/examples/en/latest/introductory/api_quickstart.html>__
  • The PyMC tutorial <https://docs.pymc.io/en/latest/learn/core_notebooks/pymc_overview.html>__
  • PyMC examples <https://www.pymc.io/projects/examples/en/latest/gallery.html>__ and the API reference <https://docs.pymc.io/en/stable/api.html>__

Learn Bayesian statistics with a book together with PyMC

  • Bayesian Analysis with Python <http://bap.com.ar/>__ (third edition) by Osvaldo Martin: Great introductory book.
  • Probabilistic Programming and Bayesian Methods for Hackers <https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers>__: Fantastic book with many applied code examples.
  • PyMC port of the book "Doing Bayesian Data Analysis" by John Kruschke <https://github.com/cluhmann/DBDA-python>__ as well as the first edition <https://github.com/aloctavodia/Doing_bayesian_data_analysis>__.
  • PyMC port of the book "Statistical Rethinking A Bayesian Course with Examples in R and Stan" by Richard McElreath <https://github.com/pymc-devs/resources/tree/master/Rethinking>__
  • PyMC port of the book "Bayesian Cognitive Modeling" by Michael Lee and EJ Wagenmakers <https://github.com/pymc-devs/resources/tree/master/BCM>__: Focused on using Bayesian statistics in cognitive modeling.

Audio & Video

  • Here is a YouTube playlist <https://www.youtube.com/playlist?list=PL1Ma_1DBbE82OVW8Fz_6Ts1oOeyOAiovy>__ gathering several talks on PyMC.
  • You can also find all the talks given at PyMCon 2020 here <https://discourse.pymc.io/c/pymcon/2020talks/15>__.
  • The "Learning Bayesian Statistics" podcast <https://www.learnbayesstats.com/>__ helps you discover and stay up-to-date with the vast Bayesian community. Bonus: it's hosted by Alex Andorra, one of the PyMC core devs!

Installation

To install PyMC on your system, follow the instructions on the installation guide <https://www.pymc.io/projects/docs/en/latest/installation.html>__.

Citing PyMC

Please choose from the following:

  • |DOIpaper| PyMC: A Modern and Comprehensive Probabilistic Programming Framework in Python, Abril-Pla O, Andreani V, Carroll C, Dong L, Fonnesbeck CJ, Kochurov M, Kumar R, Lao J, Luhmann CC, Martin OA, Osthege M, Vieira R, Wiecki T, Zinkov R. (2023)
  • |DOIzenodo| A DOI for all versions.
  • DOIs for specific versions are shown on Zenodo and under Releases <https://github.com/pymc-devs/pymc/releases>_

.. |DOIpaper| image:: https://img.shields.io/badge/DOI-10.7717%2Fpeerj--cs.1516-blue.svg
   :target: https://doi.org/10.7717/peerj-cs.1516
.. |DOIzenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.4603970.svg
   :target: https://doi.org/10.5281/zenodo.4603970

Contact

We are using discourse.pymc.io <https://discourse.pymc.io/>__ as our main communication channel.

To ask a question regarding modeling or usage of PyMC, we encourage posting to our Discourse forum under the “Questions” Category <https://discourse.pymc.io/c/questions>. You can also suggest features in the “Development” Category <https://discourse.pymc.io/c/development>.

You can also follow us on these social media platforms for updates and other announcements:

  • LinkedIn @pymc <https://www.linkedin.com/company/pymc/>__
  • YouTube @PyMCDevelopers <https://www.youtube.com/c/PyMCDevelopers>__
  • X @pymc_devs <https://x.com/pymc_devs>__
  • Mastodon @pymc@bayes.club <https://bayes.club/@pymc>__

To report an issue with PyMC please use the issue tracker <https://github.com/pymc-devs/pymc/issues>__.

Finally, if you need to get in touch for non-technical information about the project, send us an e-mail <info@pymc-devs.org>__.

License

Apache License, Version 2.0 <https://github.com/pymc-devs/pymc/blob/main/LICENSE>__

Software using PyMC

General purpose

  • Bambi <https://github.com/bambinos/bambi>__: BAyesian Model-Building Interface (BAMBI) in Python.
  • calibr8 <https://calibr8.readthedocs.io>__: A toolbox for constructing detailed observation models to be used as likelihoods in PyMC.
  • gumbi <https://github.com/JohnGoertz/Gumbi>__: A high-level interface for building GP models.
  • SunODE <https://github.com/aseyboldt/sunode>__: Fast ODE solver, much faster than the one that comes with PyMC.
  • pymc-learn <https://github.com/pymc-learn/pymc-learn>__: Custom PyMC models built on top of pymc3_models/scikit-learn API

Domain specific

  • Exoplanet <https://github.com/dfm/exoplanet>__: a toolkit for modeling of transit and/or radial velocity observations of exoplanets and other astronomical time series.
  • beat <https://github.com/hvasbath/beat>__: Bayesian Earthquake Analysis Tool.
  • CausalPy <https://github.com/pymc-labs/CausalPy>__: A package focussing on causal inference in quasi-experimental settings.

Please contact us if your software is not listed here.

Papers citing PyMC

See Google Scholar here <https://scholar.google.com/scholar?cites=6357998555684300962>__ and here <https://scholar.google.com/scholar?cites=6936955228135731011>__ for a continuously updated list.

Contributors

See the GitHub contributor page <https://github.com/pymc-devs/pymc/graphs/contributors>. Also read our Code of Conduct <https://github.com/pymc-devs/pymc/blob/main/CODE_OF_CONDUCT.md> guidelines for a better contributing experience.

Support

PyMC is a non-profit project under NumFOCUS umbrella. If you want to support PyMC financially, you can donate here <https://numfocus.salsalabs.org/donate-to-pymc3/index.html>__.

Professional Consulting Support

You can get professional consulting support from PyMC Labs <https://www.pymc-labs.io>__.

Sponsors

|NumFOCUS|

|PyMCLabs|

|Mistplay|

|ODSC|

Thanks to our contributors

|contributors|

.. |Binder| image:: https://mybinder.org/badge_logo.svg
   :target: https://mybinder.org/v2/gh/pymc-devs/pymc/main?filepath=%2Fdocs%2Fsource%2Fnotebooks
.. |Build Status| image:: https://github.com/pymc-devs/pymc/workflows/pytest/badge.svg
   :target: https://github.com/pymc-devs/pymc/actions
.. |Coverage| image:: https://codecov.io/gh/pymc-devs/pymc/branch/main/graph/badge.svg
   :target: https://codecov.io/gh/pymc-devs/pymc
.. |Dockerhub| image:: https://img.shields.io/docker/automated/pymc/pymc.svg
   :target: https://hub.docker.com/r/pymc/pymc
.. |NumFOCUS_badge| image:: https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A
   :target: http://www.numfocus.org/
.. |NumFOCUS| image:: https://github.com/pymc-devs/brand/blob/main/sponsors/sponsor_logos/sponsor_numfocus.png?raw=true
   :target: http://www.numfocus.org/
.. |PyMCLabs| image:: https://github.com/pymc-devs/brand/blob/main/sponsors/sponsor_logos/sponsor_pymc_labs.png?raw=true
   :target: https://pymc-labs.io
.. |Mistplay| image:: https://github.com/pymc-devs/brand/blob/main/sponsors/sponsor_logos/sponsor_mistplay.png?raw=true
   :target: https://www.mistplay.com/
.. |ODSC| image:: https://github.com/pymc-devs/brand/blob/main/sponsors/sponsor_logos/odsc/sponsor_odsc.png?raw=true
   :target: https://odsc.com/california/?utm_source=pymc&utm_medium=referral
.. |contributors| image:: https://contrib.rocks/image?repo=pymc-devs/pymc
   :target: https://github.com/pymc-devs/pymc/graphs/contributors
.. |Conda Downloads| image:: https://anaconda.org/conda-forge/pymc/badges/downloads.svg
   :target: https://anaconda.org/conda-forge/pymc