causalml

Uplift modeling and causal inference with machine learning algorithms

5,453

819


Top Related Projects

EconML

4,178

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.

Ax

2,476

Adaptive Experimentation Platform

luigi

18,399

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

probability

4,330

Probabilistic reasoning and statistical analysis in TensorFlow

pymc

8,983

Bayesian Modeling and Probabilistic Programming in Python

scikit-learn

62,466

scikit-learn: machine learning in Python

Quick Overview

CausalML is an open-source Python library developed by Uber for causal inference and uplift modeling. It provides a suite of tools for estimating causal effects, including various machine learning-based methods for heterogeneous treatment effect estimation and uplift modeling.

Pros

Comprehensive collection of causal inference methods
Well-documented with extensive examples and tutorials
Actively maintained and regularly updated
Integrates seamlessly with popular data science libraries like pandas and scikit-learn

Cons

Steep learning curve for users unfamiliar with causal inference concepts
Some advanced features may require additional dependencies
Performance can be slow for very large datasets
Limited support for time-series causal inference

Code Examples

Estimating Average Treatment Effect (ATE) using S-Learner:

from causalml.inference.meta import LRSRegressor
from causalml.dataset import synthetic_data

# Generate synthetic data (synthetic_data returns six arrays; the last is the propensity score)
y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

# Initialize the S-Learner (LRSRegressor is an S-Learner backed by linear regression)
sl = LRSRegressor()

# Estimate ATE with its lower and upper confidence bounds
te, lb, ub = sl.estimate_ate(X, treatment, y)
print(f"Estimated ATE: {te[0]:.4f} ({lb[0]:.4f}, {ub[0]:.4f})")

Uplift modeling with Two Models approach:

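import pandas as pd
from causalml.inference.meta import XGBTRegressor
from causalml.metrics import plot_gain

# Initialize the T-Learner (one XGBoost model per treatment arm)
tl = XGBTRegressor(random_state=42)

# Fit and estimate CATE for every row
cate = tl.fit_predict(X, treatment, y)

# Plot uplift curve; plot_gain expects a DataFrame with outcome, treatment, and score columns
df = pd.DataFrame({"y": y, "w": treatment, "cate": cate.flatten()})
plot_gain(df, outcome_col="y", treatment_col="w", normalize=True)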
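
To make the uplift curve less of a black box, here is a minimal sketch of one common way to compute cumulative gain, written in plain Python. It is an illustration only and is not taken from CausalML's source; the function name `cumulative_gain` and the toy data are assumptions made for this example.

```python
def cumulative_gain(y, treatment, score):
    """Return the cumulative gain after targeting the top-k units, for each k.

    Units are ranked by predicted uplift (highest first). At each cutoff the
    gain is (treated mean outcome - control mean outcome) * k, using only the
    units inside the cutoff. Cutoffs with an empty arm get a gain of 0.0.
    """
    order = sorted(range(len(score)), key=lambda i: score[i], reverse=True)
    sum_t = sum_c = 0.0   # running outcome totals per arm
    n_t = n_c = 0         # running unit counts per arm
    gains = []
    for k, i in enumerate(order, start=1):
        if treatment[i] == 1:
            sum_t += y[i]
            n_t += 1
        else:
            sum_c += y[i]
            n_c += 1
        if n_t == 0 or n_c == 0:
            gains.append(0.0)
        else:
            gains.append((sum_t / n_t - sum_c / n_c) * k)
    return gains

# Toy data (hypothetical): highly scored treated units have positive outcomes.
y         = [1, 0, 1, 0, 0, 0]
treatment = [1, 0, 1, 0, 1, 0]
score     = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
gains = cumulative_gain(y, treatment, score)
print(gains)
```

A model whose scores rank responsive units first produces a curve that rises faster than a random ordering would.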

Interpreting causal effects using SHAP values:

from xgboost import XGBRegressor
from causalml.inference.meta import BaseXRegressor

# Initialize the X-Learner with a base regressor and estimate CATE
xl = BaseXRegressor(learner=XGBRegressor(random_state=42))
xl_tau = xl.fit_predict(X, treatment, y)

# Plot SHAP values for a model fitted to the estimated CATE
xl.plot_shap_values(X=X, tau=xl_tau)

Getting Started

To get started with CausalML, follow these steps:

Install the library:

pip install causalml

Import necessary modules and generate sample data:

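from sklearn.linear_model import LinearRegression
from causalml.inference.meta import BaseXRegressor
from causalml.dataset import synthetic_data

# synthetic_data returns six arrays; the last one (e) is the propensity score
y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

Initialize a model, fit it, and estimate treatment effects:

# BaseXRegressor needs a base learner; estimate_ate returns the estimate and its bounds
xr = BaseXRegressor(learner=LinearRegression())
te, lb, ub = xr.estimate_ate(X, treatment, y, e)
print(f"Estimated ATE: {te[0]:.4f} ({lb[0]:.4f}, {ub[0]:.4f})")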
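
Several examples on this page unpack the result of `estimate_ate` into a point estimate with lower and upper bounds (`te, lb, ub`). As a rough guide to what such a triple means, the sketch below computes a difference-in-means ATE with a normal-approximation interval in plain Python. It assumes a randomized treatment and simulated data with a true effect of 0.5; the helper `ate_with_ci` is hypothetical, and CausalML's learners adjust for covariates and may compute their intervals differently.

```python
import random
from math import sqrt
from statistics import mean, variance

def ate_with_ci(y, treatment, z=1.96):
    """Difference-in-means ATE with a normal-approximation confidence interval."""
    treated = [yi for yi, ti in zip(y, treatment) if ti == 1]
    control = [yi for yi, ti in zip(y, treatment) if ti == 0]
    ate = mean(treated) - mean(control)
    # Standard error of a difference between two independent sample means
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return ate, ate - z * se, ate + z * se

# Simulated randomized experiment (assumed true ATE: 0.5)
rng = random.Random(42)
treatment = [rng.randint(0, 1) for _ in range(2000)]
y = [0.5 * t + rng.gauss(0, 1) for t in treatment]

ate, lb, ub = ate_with_ci(y, treatment)
print(f"ATE: {ate:.3f} (95% CI {lb:.3f} to {ub:.3f})")
```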

For more detailed examples and usage, refer to the official documentation and tutorials on the CausalML GitHub repository.

Competitor Comparisons

EconML

4,178

Pros of EconML

More comprehensive set of econometric and machine learning methods for causal inference
Better integration with scikit-learn's API and ecosystem
More extensive documentation and examples

Cons of EconML

Steeper learning curve due to more complex API
Less focus on visualization tools compared to CausalML
May be overkill for simpler causal inference tasks

Code Comparison

EconML example:

from econml.dml import LinearDML

model = LinearDML()
model.fit(Y, T, X=X, W=W)  # X and W must be passed as keyword arguments
treatment_effects = model.effect(X_test)

CausalML example:

from causalml.inference.meta import LRSRegressor

learner = LRSRegressor()
te, lb, ub = learner.estimate_ate(X, w, y)  # argument order: features, treatment, outcome

Both libraries offer powerful tools for causal inference, but EconML provides a wider range of methods and integrates better with scikit-learn. CausalML, on the other hand, offers a simpler API and more built-in visualization tools, making it easier to get started with basic causal inference tasks. The choice between the two depends on the complexity of your project and your familiarity with econometric methods.
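
For readers unfamiliar with the double machine learning idea behind estimators such as `LinearDML`, the following is a simplified sketch of its partialling-out step, not EconML's implementation. It assumes a single confounder, linear nuisance models, and no cross-fitting, all of which a real DML estimator would relax; the helpers `ols` and `residuals` and the simulated data are hypothetical.

```python
import random
from statistics import mean

def ols(x, y):
    """Return (slope, intercept) of a one-variable least-squares fit."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(x, y):
    """Return y minus its linear prediction from x."""
    slope, intercept = ols(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical data: x confounds both the treatment t and the outcome y.
rng = random.Random(0)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
t = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [1.5 * ti + 2.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]  # true effect: 1.5

naive_effect, _ = ols(t, y)                             # ignores the confounder
dml_effect, _ = ols(residuals(x, t), residuals(x, y))   # residual-on-residual

print(f"naive: {naive_effect:.2f}, partialled out: {dml_effect:.2f}")
```

Regressing the outcome residuals on the treatment residuals removes the part of both variables explained by the confounder, which is why the second estimate lands near the simulated effect while the naive slope does not.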

Ax

2,476

Adaptive Experimentation Platform

Pros of Ax

More comprehensive platform for adaptive experimentation and optimization
Supports Bayesian optimization and multi-armed bandits
Integrates well with PyTorch for machine learning experiments

Cons of Ax

Steeper learning curve due to more complex features
Less focused on causal inference specifically
May be overkill for simpler A/B testing scenarios

Code Comparison

CausalML example:

from causalml.inference.meta import LRSRegressor
X, y, w = load_data()  # load_data() is a placeholder for your own data loader
learner = LRSRegressor()
te, lb, ub = learner.estimate_ate(X, w, y)

Ax example:

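from ax.service.ax_client import AxClient, ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="experiment",
    parameters=[{"name": "x1", "type": "range", "bounds": [-5.0, 10.0]}],
    # Maximize a metric called "metric"
    objectives={"metric": ObjectiveProperties(minimize=False)},
)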

Summary

CausalML is more focused on causal inference and uplift modeling, while Ax offers a broader suite of tools for experimentation and optimization. CausalML may be easier to use for specific causal inference tasks, but Ax provides more flexibility for complex experimental designs and optimization problems. The choice between them depends on the specific use case and the level of complexity required in the experimentation process.

luigi

18,399

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Pros of Luigi

Designed for building complex data pipelines and workflows
Supports a wide range of data sources and targets
Large and active community with extensive documentation

Cons of Luigi

Steeper learning curve due to its comprehensive feature set
May be overkill for simpler data processing tasks
Less focused on causal inference and machine learning

Code Comparison

Luigi example:

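import luigi

class MyTask(luigi.Task):
    def requires(self):
        # SomeOtherTask is a placeholder for an upstream task
        return SomeOtherTask()

    def run(self):
        # Process data (processed_data is a placeholder for the result)
        with self.output().open('w') as f:
            f.write(processed_data)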

    def output(self):
        return luigi.LocalTarget('output.txt')

CausalML example:

from causalml.inference.meta import LRSRegressor

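X, y, w = load_data()  # load_data() is a placeholder for your own data loader
lr = LRSRegressor()
# estimate_ate returns the estimate and its lower and upper bounds as arrays
te, lb, ub = lr.estimate_ate(X, w, y)
print(f'Average Treatment Effect: {te[0]:.4f}')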

Summary

Luigi is a powerful tool for building data pipelines and managing complex workflows, while CausalML focuses specifically on causal inference and machine learning tasks. Luigi offers more flexibility for general data processing but may be more complex to set up. CausalML provides a streamlined approach for causal inference but has a narrower scope. The choice between the two depends on the specific requirements of your project and the complexity of your data processing needs.

probability

4,330

Probabilistic reasoning and statistical analysis in TensorFlow

Pros of TensorFlow Probability

Broader scope for probabilistic modeling and statistical inference
Seamless integration with TensorFlow ecosystem
Extensive documentation and community support

Cons of TensorFlow Probability

Steeper learning curve for users not familiar with TensorFlow
Less focused on causal inference specifically
May be overkill for simpler causal inference tasks

Code Comparison

CausalML example:

from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor

lr = LRSRegressor()
xgb = XGBTRegressor()

TensorFlow Probability example:

import tensorflow_probability as tfp
import tensorflow as tf

tfd = tfp.distributions
normal = tfd.Normal(loc=0., scale=1.)

While CausalML focuses on specific causal inference methods, TensorFlow Probability provides a more general framework for probabilistic modeling. CausalML's API is more straightforward for causal inference tasks, while TensorFlow Probability offers greater flexibility but requires more setup for similar tasks.

pymc

8,983

Bayesian Modeling and Probabilistic Programming in Python

Pros of PyMC

More comprehensive probabilistic programming framework, supporting a wider range of statistical models
Extensive documentation and tutorials, making it easier for beginners to get started
Larger and more active community, providing better support and more frequent updates

Cons of PyMC

Steeper learning curve for users not familiar with Bayesian statistics
Can be slower for simpler causal inference tasks compared to CausalML's specialized algorithms
Less focused on causal inference specifically, requiring more user knowledge to implement causal models

Code Comparison

PyMC example (a simple two-variable model with a linear dependence, no observed data):

import pymc as pm

with pm.Model() as model:
    x = pm.Normal('x', mu=0, sigma=1)
    y = pm.Normal('y', mu=2 * x + 1, sigma=0.5)
    trace = pm.sample(1000)

CausalML example (uplift modeling):

from causalml.inference.meta import XGBTRegressor

uplift_model = XGBTRegressor()
uplift_model.fit(X, treatment, y)
uplift_scores = uplift_model.predict(X)

scikit-learn

62,466

scikit-learn: machine learning in Python

Pros of scikit-learn

Broader scope, covering a wide range of machine learning tasks
Larger community and more extensive documentation
More mature and stable, with frequent updates and improvements

Cons of scikit-learn

Lacks specialized tools for causal inference and uplift modeling
May require more custom code for specific causal analysis tasks
Less focus on interpretability for causal effects

Code Comparison

scikit-learn (general machine learning):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

causalml (causal inference):

from causalml.inference.meta import LRSRegressor

learner = LRSRegressor()
# estimate_ate returns the estimate and its lower and upper bounds as arrays
te, lb, ub = learner.estimate_ate(X, treatment, y)
print(f'ATE: {te[0]:.4f} ({lb[0]:.4f}, {ub[0]:.4f})')

While scikit-learn excels in general machine learning tasks, causalml provides specialized tools for causal inference and uplift modeling. scikit-learn offers a broader range of algorithms and techniques, while causalml focuses on estimating treatment effects and causal relationships.


README

Disclaimer

This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to change.

Causal ML: A Python Package for Uplift Modeling and Causal Inference with ML

Causal ML is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research [1]. It provides a standard interface that allows users to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. Essentially, it estimates the causal impact of intervention T on outcome Y for users with observed features X, without strong assumptions on the model form. Typical use cases include:

Campaign targeting optimization: An important lever to increase ROI in an advertising campaign is to target the ad to the set of customers who will have a favorable response in a given KPI such as engagement or sales. CATE identifies these customers by estimating the effect of ad exposure on the KPI at the individual level from A/B experiments or historical observational data.
Personalized engagement: A company has multiple options to interact with its customers such as different product choices in up-sell or messaging channels for communications. One can use CATE to estimate the heterogeneous treatment effect for each customer and treatment option combination for an optimal personalized recommendation system.
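
To illustrate what estimating CATE means in these use cases, here is a minimal T-learner-style sketch in plain Python. It is not how Causal ML implements its learners: the "model" for each arm is just a per-segment mean, the data are simulated, and the assumed true effects (2.0 for segment 1, 0.5 for segment 0) and the function names are hypothetical.

```python
import random
from statistics import mean

def simulate(n=4000, seed=0):
    """Simulate a randomized experiment with one binary feature.

    Assumed (hypothetical) ground truth: the treatment lifts the outcome
    by 2.0 for segment x=1 and by 0.5 for segment x=0.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)          # observed feature X
        t = rng.randint(0, 1)          # randomized treatment T
        true_effect = 2.0 if x == 1 else 0.5
        y = 1.0 + x + true_effect * t + rng.gauss(0, 1)  # outcome Y
        rows.append((x, t, y))
    return rows

def t_learner_cate(rows):
    """Fit one 'model' per arm (here: a per-segment mean) and subtract."""
    cate = {}
    for segment in (0, 1):
        treated = [y for x, t, y in rows if x == segment and t == 1]
        control = [y for x, t, y in rows if x == segment and t == 0]
        cate[segment] = mean(treated) - mean(control)
    return cate

rows = simulate()
cate = t_learner_cate(rows)
print({segment: round(effect, 2) for segment, effect in cate.items()})
```

In a targeting setting, the segment with the larger estimated effect would be prioritized for the intervention.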

Documentation

Documentation is available at:

https://causalml.readthedocs.io/en/latest/about.html

Installation

Installation instructions are available at:

https://causalml.readthedocs.io/en/latest/installation.html

Quickstart

Quickstarts with code-snippets are available at:

https://causalml.readthedocs.io/en/latest/quickstart.html

Example Notebooks

Example notebooks are available at:

https://causalml.readthedocs.io/en/latest/examples.html

Contributing

We welcome community contributors to the project. Before you start, please read our code of conduct and the contributing guidelines.

Versioning

We document versions and changes in our changelog.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

References

Documentation

Causal ML API documentation

Workshops, Talks, and Publications

(Workshop) 3rd Workshop on Causal Inference and Machine Learning in Practice at KDD 2025
(Workshop) 2nd Workshop on Causal Inference and Machine Learning in Practice at KDD 2024
(Workshop) Causal Inference and Machine Learning in Practice: Use cases for Product, Brand, Policy and Beyond at KDD 2023
(Talk) Introduction to CausalML at Causal Data Science Meeting 2021
(Talk) Introduction to CausalML at 2021 Conference on Digital Experimentation @ MIT (CODE@MIT)
(Tutorial) Causal Inference and Machine Learning in Practice with EconML and CausalML: Industrial Use Cases at Microsoft, TripAdvisor, Uber at KDD 2021
(Publication) CausalML: Python package for causal machine learning
(Publication) Uplift Modeling for Multiple Treatments with Cost Optimization at 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
(Publication) Feature Selection Methods for Uplift Modeling

Citation

To cite CausalML in publications, you can refer to the following sources:

Whitepaper: CausalML: Python Package for Causal Machine Learning

Bibtex:

@misc{chen2020causalml,
  title={CausalML: Python Package for Causal Machine Learning},
  author={Huigang Chen and Totte Harinen and Jeong-Yoon Lee and Mike Yung and Zhenyu Zhao},
  year={2020},
  eprint={2002.11631},
  archivePrefix={arXiv},
  primaryClass={cs.CY}
}

Literature

Chen, Huigang, Totte Harinen, Jeong-Yoon Lee, Mike Yung, and Zhenyu Zhao. "Causalml: Python package for causal machine learning." arXiv preprint arXiv:2002.11631 (2020).
Radcliffe, Nicholas J., and Patrick D. Surry. "Real-world uplift modelling with significance-based uplift trees." White Paper TR-2011-1, Stochastic Solutions (2011): 1-33.
Zhao, Yan, Xiao Fang, and David Simchi-Levi. "Uplift modeling with multiple treatments and general response types." Proceedings of the 2017 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2017.
Hansotia, Behram, and Brad Rukstales. "Incremental value modeling." Journal of Interactive Marketing 16.3 (2002): 35-46.
Jannik Rößler, Richard Guse, and Detlef Schoder. "The Best of Two Worlds: Using Recent Advances from Uplift Modeling and Heterogeneous Treatment Effects to Optimize Targeting Policies." International Conference on Information Systems (2022).
Su, Xiaogang, et al. "Subgroup analysis via recursive partitioning." Journal of Machine Learning Research 10.2 (2009).
Su, Xiaogang, et al. "Facilitating score and causal inference trees for large observational studies." Journal of Machine Learning Research 13 (2012): 2955.
Athey, Susan, and Guido Imbens. "Recursive partitioning for heterogeneous causal effects." Proceedings of the National Academy of Sciences 113.27 (2016): 7353-7360.
Künzel, Sören R., et al. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the National Academy of Sciences 116.10 (2019): 4156-4165.
Nie, Xinkun, and Stefan Wager. "Quasi-oracle estimation of heterogeneous treatment effects." arXiv preprint arXiv:1712.04912 (2017).
Bang, Heejung, and James M. Robins. "Doubly robust estimation in missing data and causal inference models." Biometrics 61.4 (2005): 962-973.
Van Der Laan, Mark J., and Daniel Rubin. "Targeted maximum likelihood learning." The international journal of biostatistics 2.1 (2006).
Kennedy, Edward H. "Optimal doubly robust estimation of heterogeneous causal effects." arXiv preprint arXiv:2004.14497 (2020).
Louizos, Christos, et al. "Causal effect inference with deep latent-variable models." arXiv preprint arXiv:1705.08821 (2017).
Shi, Claudia, David M. Blei, and Victor Veitch. "Adapting neural networks for the estimation of treatment effects." 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 2019.
Zhao, Zhenyu, Yumin Zhang, Totte Harinen, and Mike Yung. "Feature Selection Methods for Uplift Modeling." arXiv preprint arXiv:2005.03447 (2020).
Zhao, Zhenyu, and Totte Harinen. "Uplift modeling for multiple treatments with cost optimization." In 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 422-431. IEEE, 2019.

Related projects

uplift: uplift models in R
grf: generalized random forests that include heterogeneous treatment effect estimation in R
rlearner: A R package that implements R-Learner
DoWhy: Causal inference in Python based on Judea Pearl's do-calculus
EconML: A Python package that implements heterogeneous treatment effect estimators from econometrics and machine learning methods
