stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.

2,679


Top Related Projects

pymc

9,146

Bayesian Modeling and Probabilistic Programming in Python

probability

4,355

Probabilistic reasoning and statistical analysis in TensorFlow

pyro

8,830

Deep universal probabilistic programming with Python and PyTorch

dowhy

7,619

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

Quick Overview

Stan is an open-source probabilistic programming language and platform for statistical modeling, data analysis, and prediction. Users specify a statistical model in the Stan language, and Stan estimates its parameters by Bayesian inference, using Hamiltonian Monte Carlo sampling (NUTS), variational inference (ADVI), or L-BFGS optimization.

Pros

Highly expressive language for specifying complex statistical models
Efficient and robust sampling algorithms, including Hamiltonian Monte Carlo (HMC) and No-U-Turn Sampler (NUTS)
Extensive documentation, tutorials, and community support
Interfaces available for multiple programming languages (R, Python, Julia, MATLAB, etc.)
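
To make the sampling idea concrete, here is a minimal sketch of plain Hamiltonian Monte Carlo in pure Python for a one-dimensional target. It is an illustration under simplifying assumptions (fixed step size, fixed number of leapfrog steps, unit mass), not NUTS and not Stan's implementation. The function name hmc_sample and all of its parameters are hypothetical.

```python
import math
import random


def hmc_sample(log_prob, grad_log_prob, q0, n_samples,
               step_size=0.2, n_leapfrog=10, seed=0):
    """Draw samples from a 1-D density using basic HMC with a Metropolis correction."""
    rng = random.Random(seed)
    q = q0
    samples = []
    for _ in range(n_samples):
        # Resample the momentum from a standard normal at every iteration.
        p = rng.gauss(0.0, 1.0)
        q_new, p_new = q, p

        # Leapfrog integration: half step in momentum, alternating full steps,
        # then a final half step in momentum.
        p_new += 0.5 * step_size * grad_log_prob(q_new)
        for i in range(n_leapfrog):
            q_new += step_size * p_new
            if i < n_leapfrog - 1:
                p_new += step_size * grad_log_prob(q_new)
        p_new += 0.5 * step_size * grad_log_prob(q_new)

        # Hamiltonian = potential energy (-log density) + kinetic energy.
        h_old = -log_prob(q) + 0.5 * p * p
        h_new = -log_prob(q_new) + 0.5 * p_new * p_new

        # Accept with probability min(1, exp(h_old - h_new)).
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


# Target: standard normal, written up to an additive constant.
draws = hmc_sample(lambda q: -0.5 * q * q, lambda q: -q, q0=0.0, n_samples=2000)
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

The sample mean and variance should land near 0 and 1. NUTS removes the need to hand-pick the number of leapfrog steps, which this sketch leaves as a fixed argument.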

Cons

Steep learning curve for users new to Bayesian statistics or probabilistic programming
Can be computationally intensive for large or complex models
Limited support for certain types of models (e.g., some discrete parameter spaces)
Debugging can be challenging due to the nature of probabilistic programming

Code Examples

Simple linear regression model:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}

model {
  y ~ normal(alpha + beta * x, sigma);
}
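
As a rough illustration of what the sampling statement in the model block contributes, the following Python sketch evaluates the same normal log likelihood by hand. The function names and the toy data are assumptions made for this example. Stan itself may drop constant terms, so only differences between values are comparable.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def linear_regression_log_lik(alpha, beta, sigma, x, y):
    """Sum of per-observation log densities for y[i] ~ Normal(alpha + beta * x[i], sigma)."""
    return sum(normal_lpdf(yi, alpha + beta * xi, sigma) for xi, yi in zip(x, y))


# Toy data lying exactly on the line y = 1 + 2x (hypothetical values).
x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0]

perfect = linear_regression_log_lik(1.0, 2.0, 1.0, x, y)  # parameters that fit exactly
worse = linear_regression_log_lik(0.0, 0.0, 1.0, x, y)    # parameters that ignore x
```

Parameter values that fit the data better give a higher log likelihood, which is the quantity the sampler explores, combined with any priors.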

Logistic regression model:

data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  array[N] int<lower=0,upper=1> y;
}

parameters {
  vector[K] beta;
}

model {
  y ~ bernoulli_logit(X * beta);
}
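
The bernoulli_logit distribution takes the linear predictor on the log-odds scale. A minimal Python sketch of the corresponding log likelihood, assuming a numerically stable log-sigmoid, might look like this. The helper names are hypothetical and this is not Stan's internal code.

```python
import math


def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))


def bernoulli_logit_lpmf(y, eta):
    """Log probability of outcome y in {0, 1} given log-odds eta."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)


def logistic_log_lik(beta, X, y):
    """Sum of Bernoulli log probabilities with eta[i] = dot(X[i], beta)."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, row))
        total += bernoulli_logit_lpmf(yi, eta)
    return total


# With all coefficients at zero, every outcome has probability 0.5.
baseline = logistic_log_lik([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], [0, 1])
```

Working on the log-odds scale avoids computing probabilities that round to exactly 0 or 1 for large linear predictors.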

Hierarchical model:

data {
  int<lower=0> N;
  int<lower=0> J;
  array[N] int<lower=1,upper=J> group;
  vector[N] y;
}

parameters {
  vector[J] mu;
  real<lower=0> sigma;
  real<lower=0> tau;
  real mu_0;
}

model {
  mu ~ normal(mu_0, tau);
  y ~ normal(mu[group], sigma);
}
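
One way to read the hierarchical model is as a recipe for generating data: draw a mean for each group around mu_0, then draw observations around each group mean. The Python sketch below simulates that process. The function name, its arguments, and the balanced group sizes are assumptions made for illustration.

```python
import random


def simulate_hierarchical(J, n_per_group, mu_0, tau, sigma, seed=0):
    """Simulate data matching the model's data block, plus the true group means."""
    rng = random.Random(seed)
    # Group-level means: mu[j] ~ Normal(mu_0, tau).
    mu = [rng.gauss(mu_0, tau) for _ in range(J)]
    group, y = [], []
    for j in range(J):
        for _ in range(n_per_group):
            group.append(j + 1)  # Stan indexes from 1
            # Observation-level draws: y ~ Normal(mu[group], sigma).
            y.append(rng.gauss(mu[j], sigma))
    data = {"N": len(y), "J": J, "group": group, "y": y}
    return data, mu


data, true_mu = simulate_hierarchical(J=3, n_per_group=4, mu_0=0.0, tau=1.0, sigma=0.5)
```

Fitting the Stan model to simulated data like this and checking that the estimates recover true_mu is a common sanity check.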

Getting Started

Install Stan (instructions vary by platform and interface)
Write your Stan model in a .stan file
Use an interface (e.g., RStan, PyStan) to compile and run the model:

# R example using RStan
library(rstan)

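# Compile and fit the model
# stan_data is a named list whose entries match the model's data block,
# e.g. list(N = length(y), x = x, y = y) for the linear regression above
fit <- stan(file = "model.stan", data = stan_data)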

# Examine results
print(fit)
plot(fit)
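
The stan_data object must supply every variable declared in the model's data block. For Python users, a minimal sketch of preparing the same input for the linear regression model could look like the following. The helper name make_stan_data and the sample values are hypothetical. The sketch writes a JSON file, a format that CmdStan-based interfaces accept for data.

```python
import json
import os
import tempfile


def make_stan_data(x, y):
    """Build a dict whose keys match the data block: N, x, y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return {
        "N": len(x),
        "x": [float(v) for v in x],
        "y": [float(v) for v in y],
    }


stan_data = make_stan_data([1, 2, 3], [2.1, 3.9, 6.2])

# Write the data to a JSON file in a temporary directory.
data_path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(data_path, "w") as f:
    json.dump(stan_data, f)
```

Checking the lengths up front catches a mismatch between N and the vectors before the model is compiled and run.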

Competitor Comparisons

pymc

9,146

Bayesian Modeling and Probabilistic Programming in Python

Pros of PyMC

Python-based, integrating seamlessly with the Python ecosystem
User-friendly API, making it more accessible for beginners
Extensive documentation and tutorials available

Cons of PyMC

Generally slower performance compared to Stan
Less flexibility in model specification for complex hierarchical models

Code Comparison

PyMC example:

import pymc as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=data)
    trace = pm.sample(1000)

Stan example:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
}
model {
  mu ~ normal(0, 1);
  y ~ normal(mu, 1);
}

Both Stan and PyMC are powerful probabilistic programming frameworks, but they cater to different user bases and have distinct strengths. Stan offers superior performance and flexibility for complex models, while PyMC provides a more accessible entry point for Python users and integrates well with the broader Python data science ecosystem.
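
The model used in both examples (a Normal(0, 1) prior on mu and unit-variance normal observations) is conjugate, so its posterior has a closed form that can serve as a check on either sampler's output. The sketch below computes it in Python. The function name and the sample data are assumptions for illustration.

```python
import math


def posterior_for_mu(y, prior_mean=0.0, prior_sd=1.0, obs_sd=1.0):
    """Posterior mean and sd of mu for a normal prior and normal likelihood with known sd."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(y) / obs_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + sum(y) / obs_sd ** 2) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Three hypothetical observations: posterior mean is sum(y) / (n + 1).
post_mean, post_sd = posterior_for_mu([1.0, 2.0, 3.0])
```

With these three observations the posterior mean is 1.5 and the posterior standard deviation is 0.5, so draws from either framework should concentrate around those values.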

probability

4,355

Probabilistic reasoning and statistical analysis in TensorFlow

Pros of TensorFlow Probability

Seamless integration with TensorFlow ecosystem for deep learning and neural networks
Supports GPU acceleration for faster computations on large datasets
Offers a wider range of probabilistic models and distributions

Cons of TensorFlow Probability

Steeper learning curve, especially for those not familiar with TensorFlow
Less focus on Bayesian inference compared to Stan
May be overkill for simpler statistical modeling tasks

Code Comparison

Stan:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}

TensorFlow Probability:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# y is assumed to be a tensor or array of observations defined elsewhere
model = tfd.Normal(loc=tf.Variable(0.), scale=tf.Variable(1.))
loss = -model.log_prob(y)

Both examples show a simple normal distribution model, but TensorFlow Probability integrates more closely with TensorFlow's computational graph and automatic differentiation system.

pyro

8,830

Deep universal probabilistic programming with Python and PyTorch

Pros of Pyro

Built on PyTorch, allowing seamless integration with deep learning models
Supports dynamic computation graphs, enabling more flexible model structures
Offers a wide range of inference algorithms, including variational inference and MCMC

Cons of Pyro

Less mature and potentially less stable compared to Stan
Smaller community and fewer resources for learning and troubleshooting
May be less efficient for traditional statistical models

Code Comparison

Stan (example of a simple linear regression):

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}

Pyro (comparable linear regression, with explicit priors on alpha, beta, and sigma):

import pyro
import torch

def model(x, y):
    alpha = pyro.sample("alpha", pyro.distributions.Normal(0, 10))
    beta = pyro.sample("beta", pyro.distributions.Normal(0, 10))
    sigma = pyro.sample("sigma", pyro.distributions.HalfNormal(10))
    mean = alpha + beta * x
    with pyro.plate("data", len(x)):
        pyro.sample("obs", pyro.distributions.Normal(mean, sigma), obs=y)

dowhy

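7,619

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions.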

Pros of DoWhy

Focused specifically on causal inference and effect estimation
Python-based, making it more accessible for data scientists familiar with Python ecosystems
Provides a unified interface for various causal inference methods

Cons of DoWhy

Less mature and comprehensive compared to Stan's broader statistical modeling capabilities
Smaller community and ecosystem of extensions/tools
More limited in terms of advanced probabilistic modeling features

Code Comparison

Stan (probabilistic modeling):

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}

DoWhy (causal inference):

from dowhy import CausalModel
import pandas as pd

data = pd.read_csv("data.csv")
model = CausalModel(
    data=data,
    treatment='treatment',
    outcome='outcome',
    common_causes=['confounding_variable']
)
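identified_estimand = model.identify_effect()
# An estimation method must be named; linear regression is used here as an example
estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression"
)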


README

Stan is a C++ package providing

full Bayesian inference using the No-U-Turn sampler (NUTS), a variant of Hamiltonian Monte Carlo (HMC),
approximate Bayesian inference using automatic differentiation variational inference (ADVI), and
penalized maximum likelihood estimation (MLE) using L-BFGS optimization.

It is built on top of the Stan Math library, which provides

a full first- and higher-order automatic differentiation library based on C++ template overloads, and
a supporting fully-templated matrix, linear algebra, and probability special function library.
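
As a loose analogy for automatic differentiation through overloading, the Python sketch below implements forward-mode differentiation with dual numbers. It is only an illustration of the general idea: the class and function names are hypothetical, it supports just addition, multiplication, and exp, and it does not reflect how the Stan Math library is designed.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, value, deriv=0.0):
        self.value = float(value)
        self.deriv = float(deriv)

    @staticmethod
    def lift(other):
        """Treat plain numbers as constants (derivative zero)."""
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def dual_exp(d):
    """exp with the chain rule applied to the derivative part."""
    d = Dual.lift(d)
    e = math.exp(d.value)
    return Dual(e, e * d.deriv)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(x, 1.0)).deriv


# d/dt (3t^2 + 2t) = 6t + 2, evaluated at t = 2.
slope = derivative(lambda t: 3 * t * t + 2 * t, 2.0)
```

Because the arithmetic operators are overloaded, an ordinary function written for numbers yields its derivative without any symbolic manipulation. Gradient-based methods such as HMC and L-BFGS rely on derivatives computed in this automatic fashion.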

There are interfaces available in R, Python, MATLAB, Julia, Stata, Mathematica, and for the command line.

Home Page

Stan's home page, with links to everything you'll need to use Stan, is:

http://mc-stan.org/

Interfaces

There are separate repositories in the stan-dev GitHub organization for the interfaces, higher-level libraries and lower-level libraries.

Source Repository

Stan's source-code repository is hosted here on GitHub.

Licensing

The Stan math library, core Stan code, and CmdStan are licensed under new BSD. RStan and PyStan are licensed under GPLv3, with other interfaces having other open-source licenses.

Note that the Stan math library depends on the Intel TBB library, which is licensed under the Apache 2.0 license. This dependency implies an additional restriction as compared to the new BSD license alone. The Apache 2.0 license is incompatible with GPL-2 licensed code if distributed as a unitary binary. For details, refer to the Licensing page on the Stan wiki.
