stan
Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
Top Related Projects
PyMC: Bayesian Modeling and Probabilistic Programming in Python
TensorFlow Probability: Probabilistic reasoning and statistical analysis in TensorFlow
Pyro: Deep universal probabilistic programming with Python and PyTorch
DoWhy: a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
Quick Overview
Stan is an open-source probabilistic programming language and platform for statistical modeling, data analysis, and prediction. It provides a powerful and flexible framework for Bayesian inference, allowing users to define complex statistical models and perform efficient parameter estimation using state-of-the-art algorithms.
Pros
- Highly expressive language for specifying complex statistical models
- Efficient and robust sampling algorithms, including Hamiltonian Monte Carlo (HMC) and No-U-Turn Sampler (NUTS)
- Extensive documentation, tutorials, and community support
- Interfaces available for multiple programming languages (R, Python, Julia, MATLAB, etc.)
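The Hamiltonian Monte Carlo idea named in the list above can be illustrated with a minimal sketch. It assumes a one-dimensional standard normal target, a fixed step size, and a fixed number of leapfrog steps; all function names are hypothetical. It is not NUTS (which chooses the trajectory length adaptively) and not Stan's implementation.

```python
import math
import random


def log_density(q):
    """Log density of a standard normal target, up to an additive constant."""
    return -0.5 * q * q


def grad_log_density(q):
    """Gradient of log_density with respect to q."""
    return -q


def hmc_step(q, step_size, n_leapfrog, rng):
    """One HMC transition: draw a momentum, run leapfrog, accept or reject."""
    p = rng.gauss(0.0, 1.0)
    q_new, p_new = q, p

    # Leapfrog integrator: half step in momentum, alternating full steps,
    # and a final half step in momentum.
    p_new += 0.5 * step_size * grad_log_density(q_new)
    for i in range(n_leapfrog):
        q_new += step_size * p_new
        if i < n_leapfrog - 1:
            p_new += step_size * grad_log_density(q_new)
    p_new += 0.5 * step_size * grad_log_density(q_new)

    # Hamiltonian = potential energy (negative log density) + kinetic energy.
    h_old = -log_density(q) + 0.5 * p * p
    h_new = -log_density(q_new) + 0.5 * p_new * p_new

    # Metropolis correction for the integrator's discretization error.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q


def sample(n_draws, step_size=0.2, n_leapfrog=10, seed=0):
    """Run a chain of HMC transitions starting from q = 0."""
    rng = random.Random(seed)
    q = 0.0
    draws = []
    for _ in range(n_draws):
        q = hmc_step(q, step_size, n_leapfrog, rng)
        draws.append(q)
    return draws


draws = sample(2000)
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, variance)
```

For this target the sample mean and variance should land near 0 and 1. In Stan the step size and trajectory length are tuned automatically rather than fixed by hand as they are here.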
Cons
- Steep learning curve for users new to Bayesian statistics or probabilistic programming
- Can be computationally intensive for large or complex models
- Limited support for certain types of models (e.g., some discrete parameter spaces)
- Debugging can be challenging due to the nature of probabilistic programming
Code Examples
- Simple linear regression model:
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
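To make the sampling statement in the linear regression model concrete, here is a minimal Python sketch of the log density it contributes. It assumes flat priors on alpha, beta, and sigma, since the model declares none. The function name `linear_regression_log_density` and the toy data are hypothetical, and Stan itself evaluates this on a transformed scale for sigma.

```python
import math


def linear_regression_log_density(alpha, beta, sigma, x, y):
    """Sum of normal log densities for y[n] ~ normal(alpha + beta * x[n], sigma)."""
    if sigma <= 0:
        # sigma is declared with lower=0 in the Stan model
        return -math.inf
    total = 0.0
    for x_n, y_n in zip(x, y):
        mean = alpha + beta * x_n
        z = (y_n - mean) / sigma
        total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return total


# Toy data lying exactly on the line y = 1 + 2 * x (illustrative only)
x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0]

print(linear_regression_log_density(1.0, 2.0, 1.0, x, y))
print(linear_regression_log_density(0.0, 0.0, 1.0, x, y))
```

The first call uses the coefficients that generated the toy data, so it returns a higher log density than the second.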
- Logistic regression model:
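data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  // Current array syntax; the older "int y[N]" form is no longer accepted by recent Stan releases
  array[N] int<lower=0, upper=1> y;
}
parameters {
  vector[K] beta;
}
model {
  y ~ bernoulli_logit(X * beta);
}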
- Hierarchical model:
data {
  int<lower=0> N;
  int<lower=0> J;
  // Group index for each observation (1-based), in current array syntax
  array[N] int<lower=1, upper=J> group;
  vector[N] y;
}
parameters {
  vector[J] mu;
  real<lower=0> sigma;
  real<lower=0> tau;
  real mu_0;
}
model {
  mu ~ normal(mu_0, tau);
  y ~ normal(mu[group], sigma);
}
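One way to check a hierarchical model like the one above is to fit it to simulated data with known parameters. The sketch below generates data in the shape the data block expects, using only the Python standard library. The function name `simulate_hierarchical_data` and the default parameter values are hypothetical choices for illustration.

```python
import random


def simulate_hierarchical_data(n_obs, n_groups, mu_0=0.0, tau=1.0, sigma=0.5, seed=1):
    """Simulate data matching the data block: N, J, group, and y."""
    rng = random.Random(seed)
    # Group-level means: mu[j] ~ normal(mu_0, tau)
    mu = [rng.gauss(mu_0, tau) for _ in range(n_groups)]
    # Stan indexes from 1, so group labels run from 1 to J
    group = [rng.randint(1, n_groups) for _ in range(n_obs)]
    # Observations: y[n] ~ normal(mu[group[n]], sigma)
    y = [rng.gauss(mu[g - 1], sigma) for g in group]
    return {"N": n_obs, "J": n_groups, "group": group, "y": y}


stan_data = simulate_hierarchical_data(n_obs=20, n_groups=4)
print(stan_data["group"])
```

The returned dictionary uses the same names as the data block, so it can be passed to an interface that accepts a mapping of data variables.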
Getting Started
- Install Stan (instructions vary by platform and interface)
- Write your Stan model in a .stan file
- Use an interface (e.g., RStan, PyStan) to compile and run the model:
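# R example using RStan
library(rstan)
# stan_data must be a named list matching the model's data block,
# e.g. list(N = length(y), x = x, y = y) for the linear regression above
# Compile and fit the model
fit <- stan(file = "model.stan", data = stan_data)
# Examine results
print(fit)
plot(fit)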
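For Python users, the first two steps can be sketched with the standard library alone: write the Stan program to a .stan file and the data to a JSON file. The helper name `write_model_and_data`, the file names, and the sample values are hypothetical. The commented lines at the end assume CmdStanPy and CmdStan are installed, which this sketch does not check.

```python
import json
import tempfile
from pathlib import Path

# A small Stan program: normal observations with unknown mean mu
STAN_PROGRAM = """
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
}
model {
  mu ~ normal(0, 1);
  y ~ normal(mu, 1);
}
"""


def write_model_and_data(directory, y):
    """Write model.stan and data.json into directory and return both paths."""
    directory = Path(directory)
    model_path = directory / "model.stan"
    data_path = directory / "data.json"
    model_path.write_text(STAN_PROGRAM.strip() + "\n")
    # Keys must match the names declared in the data block
    data_path.write_text(json.dumps({"N": len(y), "y": list(y)}))
    return model_path, data_path


workdir = tempfile.mkdtemp()
model_path, data_path = write_model_and_data(workdir, [0.3, -0.1, 1.2])
print(model_path, data_path)

# With CmdStanPy and CmdStan installed (an assumption), the files could be used as:
#   from cmdstanpy import CmdStanModel
#   fit = CmdStanModel(stan_file=str(model_path)).sample(data=str(data_path))
#   print(fit.summary())
```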
Competitor Comparisons
Bayesian Modeling and Probabilistic Programming in Python
Pros of PyMC
- Python-based, integrating seamlessly with the Python ecosystem
- User-friendly API, making it more accessible for beginners
- Extensive documentation and tutorials available
Cons of PyMC
- Generally slower performance compared to Stan
- Less flexibility in model specification for complex hierarchical models
Code Comparison
PyMC example:
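import numpy as np
import pymc as pm
# Placeholder observations for illustration; replace with your own data
data = np.array([0.2, -0.5, 1.1, 0.4])
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=data)
    trace = pm.sample(1000)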
Stan example:
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
}
model {
  mu ~ normal(0, 1);
  y ~ normal(mu, 1);
}
Both Stan and PyMC are powerful probabilistic programming frameworks, but they cater to different user bases and have distinct strengths. Stan offers superior performance and flexibility for complex models, while PyMC provides a more accessible entry point for Python users and integrates well with the broader Python data science ecosystem.
Probabilistic reasoning and statistical analysis in TensorFlow
Pros of TensorFlow Probability
- Seamless integration with TensorFlow ecosystem for deep learning and neural networks
- Supports GPU acceleration for faster computations on large datasets
- Offers a wider range of probabilistic models and distributions
Cons of TensorFlow Probability
- Steeper learning curve, especially for those not familiar with TensorFlow
- Less focus on Bayesian inference compared to Stan
- May be overkill for simpler statistical modeling tasks
Code Comparison
Stan:
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}
TensorFlow Probability:
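import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
# y is assumed to be a tensor of observations defined elsewhere
model = tfd.Normal(loc=tf.Variable(0.), scale=tf.Variable(1.))
# Negative log-likelihood, summed over observations to give a scalar loss
loss = -tf.reduce_sum(model.log_prob(y))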
Both examples show a simple normal distribution model, but TensorFlow Probability integrates more closely with TensorFlow's computational graph and automatic differentiation system.
Deep universal probabilistic programming with Python and PyTorch
Pros of Pyro
- Built on PyTorch, allowing seamless integration with deep learning models
- Supports dynamic computation graphs, enabling more flexible model structures
- Offers a wide range of inference algorithms, including variational inference and MCMC
Cons of Pyro
- Less mature and potentially less stable compared to Stan
- Smaller community and fewer resources for learning and troubleshooting
- May be less efficient for traditional statistical models
Code Comparison
Stan (example of a simple linear regression):
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
Pyro (equivalent linear regression):
import pyro
import pyro.distributions as dist
import torch

def model(x, y):
    # x and y are assumed to be 1-D torch tensors of equal length
    alpha = pyro.sample("alpha", dist.Normal(0., 10.))
    beta = pyro.sample("beta", dist.Normal(0., 10.))
    sigma = pyro.sample("sigma", dist.HalfNormal(10.))
    mean = alpha + beta * x
    with pyro.plate("data", len(x)):
        pyro.sample("obs", dist.Normal(mean, sigma), obs=y)
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
Pros of DoWhy
- Focused specifically on causal inference and effect estimation
- Python-based, making it more accessible for data scientists familiar with Python ecosystems
- Provides a unified interface for various causal inference methods
Cons of DoWhy
- Less mature and comprehensive compared to Stan's broader statistical modeling capabilities
- Smaller community and ecosystem of extensions/tools
- More limited in terms of advanced probabilistic modeling features
Code Comparison
Stan (probabilistic modeling):
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
DoWhy (causal inference):
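from dowhy import CausalModel
import pandas as pd

# data.csv is assumed to contain the columns named below
data = pd.read_csv("data.csv")
model = CausalModel(
    data=data,
    treatment='treatment',
    outcome='outcome',
    common_causes=['confounding_variable']
)
identified_estimand = model.identify_effect()
# An estimation method must be named; linear regression on the backdoor set is one option
estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression"
)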
README
Stan is a C++ package providing
- full Bayesian inference using the No-U-Turn sampler (NUTS), a variant of Hamiltonian Monte Carlo (HMC),
- approximate Bayesian inference using automatic differentiation variational inference (ADVI), and
- penalized maximum likelihood estimation (MLE) using L-BFGS optimization.
It is built on top of the Stan Math library, which provides
- a full first- and higher-order automatic differentiation library based on C++ template overloads, and
- a supporting fully-templated matrix, linear algebra, and probability special function library.
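The idea of automatic differentiation through overloaded operators can be illustrated with a minimal forward-mode sketch using dual numbers. This is a Python illustration of the general technique only, not a description of Stan Math's C++ internals. The class `Dual` and the helpers `exp`, `derivative`, and `normal_kernel` are hypothetical names.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(other):
        """Wrap plain numbers as constants (derivative zero)."""
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def exp(x):
    """Exponential with the chain rule applied to the derivative part."""
    x = Dual.lift(x)
    e = math.exp(x.value)
    return Dual(e, e * x.deriv)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the derivative part with 1."""
    return f(Dual(x, 1.0)).deriv


def normal_kernel(mu):
    """Log density kernel of the observation 1.5 under normal(mu, 1), constants dropped."""
    diff = 1.5 - mu
    return -0.5 * diff * diff


print(derivative(normal_kernel, 0.5))
print(derivative(lambda t: exp(2 * t), 0.0))
```

Because every arithmetic operation carries its derivative along, the function `normal_kernel` is written as ordinary arithmetic and needs no hand-coded gradient.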
There are interfaces available in R, Python, MATLAB, Julia, Stata, Mathematica, and for the command line.
Home Page
Stan's home page has links to everything you'll need to use Stan.
Interfaces
There are separate repositories in the stan-dev GitHub organization for the interfaces, higher-level libraries and lower-level libraries.
Source Repository
Stan's source-code repository is hosted here on GitHub.
Licensing
The Stan math library, core Stan code, and CmdStan are licensed under new BSD. RStan and PyStan are licensed under GPLv3, with other interfaces having other open-source licenses.
Note that the Stan math library depends on the Intel TBB library, which is licensed under the Apache 2.0 license. This dependency implies an additional restriction compared to the new BSD license alone. The Apache 2.0 license is incompatible with GPL-2 licensed code if distributed as a unitary binary. For details, refer to the Licensing page on the Stan wiki.