uber/causalml

Uplift modeling and causal inference with machine learning algorithms

Top Related Projects

  • EconML: ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
  • Ax: Adaptive Experimentation Platform
  • Luigi: Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
  • TensorFlow Probability: Probabilistic reasoning and statistical analysis in TensorFlow
  • PyMC: Bayesian Modeling and Probabilistic Programming in Python
  • scikit-learn: machine learning in Python

Quick Overview

CausalML is an open-source Python library developed by Uber for causal inference and uplift modeling. It provides a suite of tools for estimating causal effects, including various machine learning-based methods for heterogeneous treatment effect estimation and uplift modeling.
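
For orientation, the two quantities this page keeps referring to can be written in standard potential-outcomes notation. These are the textbook definitions rather than anything quoted from the CausalML documentation; Y(1) and Y(0) denote the outcomes a unit would have with and without treatment, and X its observed features:

```latex
\begin{aligned}
\text{ATE} &= \mathbb{E}\left[\,Y(1) - Y(0)\,\right] \\
\tau(x) = \text{CATE}(x) &= \mathbb{E}\left[\,Y(1) - Y(0) \mid X = x\,\right] \\
\text{ATE} &= \mathbb{E}_{X}\left[\,\tau(X)\,\right]
\end{aligned}
```

Uplift modeling is the task of estimating τ(x) well enough to rank or select units by their expected response to treatment.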

Pros

  • Comprehensive collection of causal inference methods
  • Well-documented with extensive examples and tutorials
  • Actively maintained and regularly updated
  • Integrates seamlessly with popular data science libraries like pandas and scikit-learn

Cons

  • Steep learning curve for users unfamiliar with causal inference concepts
  • Some advanced features may require additional dependencies
  • Performance can be slow for very large datasets
  • Limited support for time-series causal inference

Code Examples

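  1. Estimating Average Treatment Effect (ATE) using an S-Learner:
from causalml.inference.meta import LRSRegressor
from causalml.dataset import synthetic_data

# Generate synthetic data; synthetic_data returns six arrays:
# outcome, features, treatment, true effect, expected outcome, propensity
y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

# Initialize an S-Learner with linear regression as the base learner
sl = LRSRegressor()

# Estimate ATE; estimate_ate fits the model and returns the estimate with its bounds
te, lb, ub = sl.estimate_ate(X, treatment, y)
print(f"Estimated ATE: {te[0]:.4f} ({lb[0]:.4f}, {ub[0]:.4f})")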
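  2. Uplift modeling with the two-model approach (T-Learner):
import pandas as pd
from causalml.inference.meta import XGBTRegressor
from causalml.metrics import plot_gain

# Initialize a T-Learner with XGBoost base learners
tl = XGBTRegressor(random_state=42)

# Fit and estimate CATE for every row
cate = tl.fit_predict(X, treatment, y)

# Plot the cumulative gain (uplift) curve; plot_gain expects a DataFrame
# holding the outcome, the treatment flag, and one column per model score
df = pd.DataFrame({"y": y, "w": treatment, "T-Learner": cate.flatten()})
plot_gain(df, outcome_col="y", treatment_col="w", normalize=True)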
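  3. Interpreting causal effects using SHAP values:
from xgboost import XGBRegressor
from causalml.inference.meta import BaseXRegressor

# Initialize an X-Learner with an XGBoost base learner
xl = BaseXRegressor(learner=XGBRegressor(random_state=42))

# Fit and estimate CATE (propensity scores are estimated internally when p is omitted)
cate = xl.fit_predict(X, treatment, y)

# Plot SHAP values for a model that explains the estimated CATE from the features
xl.plot_shap_values(X=X, tau=cate)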
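
To make the two-model (T-Learner) idea in the examples above concrete, here is a minimal sketch that does not use CausalML at all. It assumes a binary treatment and swaps in ordinary least squares (via NumPy) as the base learner; the helper names and the toy data are illustrative only.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict outcomes from coefficients produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def t_learner_cate(X, treatment, y):
    """Fit one outcome model per arm, then difference their predictions."""
    treated = treatment == 1
    coef_treated = fit_linear(X[treated], y[treated])
    coef_control = fit_linear(X[~treated], y[~treated])
    return predict_linear(coef_treated, X) - predict_linear(coef_control, X)


# Hypothetical toy data: the true effect is 1 + 2 * x0, so it varies by row
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
treatment = rng.integers(0, 2, size=2000)
true_effect = 1.0 + 2.0 * X[:, 0]
y = X[:, 1] + treatment * true_effect + rng.normal(scale=0.1, size=2000)

cate = t_learner_cate(X, treatment, y)
print(f"Mean estimated CATE: {cate.mean():.2f}")
```

Averaging the per-row estimates gives an ATE estimate, while the individual values are what an uplift curve ranks on. A library learner replaces the linear models with more flexible regressors but keeps the same fit-two-models-and-subtract structure.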

Getting Started

To get started with CausalML, follow these steps:

  1. Install the library:
pip install causalml
  2. Import necessary modules and generate sample data:
from xgboost import XGBRegressor
from causalml.inference.meta import BaseXRegressor
from causalml.dataset import synthetic_data

# synthetic_data returns six arrays; e holds the propensity scores
y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)
  3. Initialize a model and estimate treatment effects:
# BaseXRegressor needs a base learner; XGBRegressor is one common choice
xr = BaseXRegressor(learner=XGBRegressor(random_state=42))
te, lb, ub = xr.estimate_ate(X, treatment, y, e)
print(f"Estimated ATE: {te[0]:.4f} ({lb[0]:.4f}, {ub[0]:.4f})")

For more detailed examples and usage, refer to the official documentation and tutorials on the CausalML GitHub repository.
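
When the data come from a randomized experiment, a plain difference in means is a useful sanity check against whatever a model-based learner reports. The sketch below uses only the standard library and a normal-approximation interval; the function name, the simulated data, and the true effect of 0.5 are assumptions made for illustration, not part of CausalML.

```python
import math
import random
from statistics import mean, variance


def difference_in_means(y, treatment, z=1.96):
    """Return (ate, lower, upper) with a normal-approximation 95% interval."""
    treated = [yi for yi, ti in zip(y, treatment) if ti == 1]
    control = [yi for yi, ti in zip(y, treatment) if ti == 0]
    ate = mean(treated) - mean(control)
    # Standard error of a difference between two independent sample means
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return ate, ate - z * se, ate + z * se


# Hypothetical randomized experiment with a true effect of 0.5
random.seed(42)
treatment = [random.randint(0, 1) for _ in range(5000)]
y = [random.gauss(0.0, 1.0) + 0.5 * t for t in treatment]

ate, lb, ub = difference_in_means(y, treatment)
print(f"Naive ATE: {ate:.4f} ({lb:.4f}, {ub:.4f})")
```

On observational data this naive estimate is generally biased by confounding, which is the situation the meta-learners and propensity adjustments are meant to handle.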

Competitor Comparisons

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.

Pros of EconML

  • More comprehensive set of econometric and machine learning methods for causal inference
  • Better integration with scikit-learn's API and ecosystem
  • More extensive documentation and examples

Cons of EconML

  • Steeper learning curve due to more complex API
  • Less focus on visualization tools compared to CausalML
  • May be overkill for simpler causal inference tasks

Code Comparison

EconML example:

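from econml.dml import LinearDML

# Y: outcome, T: treatment, X: effect modifiers, W: controls (placeholders)
model = LinearDML()
model.fit(Y, T, X=X, W=W)  # X and W are keyword-only arguments
treatment_effects = model.effect(X_test)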

CausalML example:

from causalml.inference.meta import LRSRegressor

learner = LRSRegressor()
# Argument order is features, treatment, outcome
te, lb, ub = learner.estimate_ate(X, treatment, y)

Both libraries offer powerful tools for causal inference, but EconML provides a wider range of methods and integrates better with scikit-learn. CausalML, on the other hand, offers a simpler API and more built-in visualization tools, making it easier to get started with basic causal inference tasks. The choice between the two depends on the complexity of your project and your familiarity with econometric methods.

Adaptive Experimentation Platform

Pros of Ax

  • More comprehensive platform for adaptive experimentation and optimization
  • Supports Bayesian optimization and multi-armed bandits
  • Integrates well with PyTorch for machine learning experiments

Cons of Ax

  • Steeper learning curve due to more complex features
  • Less focused on causal inference specifically
  • May be overkill for simpler A/B testing scenarios

Code Comparison

CausalML example:

from causalml.inference.meta import LRSRegressor
X, y, w = load_data()  # placeholder loader: features, outcome, treatment indicator
learner = LRSRegressor()
te, lb, ub = learner.estimate_ate(X, w, y)

Ax example:

from ax.service.ax_client import AxClient, ObjectiveProperties
ax_client = AxClient()
ax_client.create_experiment(
    name="experiment",
    parameters=[{"name": "x1", "type": "range", "bounds": [-5.0, 10.0]}],
    objectives={"metric": ObjectiveProperties(minimize=False)},
)

Summary

CausalML is more focused on causal inference and uplift modeling, while Ax offers a broader suite of tools for experimentation and optimization. CausalML may be easier to use for specific causal inference tasks, but Ax provides more flexibility for complex experimental designs and optimization problems. The choice between them depends on the specific use case and the level of complexity required in the experimentation process.

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Pros of Luigi

  • Designed for building complex data pipelines and workflows
  • Supports a wide range of data sources and targets
  • Large and active community with extensive documentation

Cons of Luigi

  • Steeper learning curve due to its comprehensive feature set
  • May be overkill for simpler data processing tasks
  • Less focused on causal inference and machine learning

Code Comparison

Luigi example:

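import luigi

class MyTask(luigi.Task):
    def requires(self):
        return SomeOtherTask()  # placeholder for an upstream task defined elsewhere

    def run(self):
        # Read the upstream output, process it, and write the result
        with self.input().open('r') as f:
            processed_data = f.read().upper()
        with self.output().open('w') as f:
            f.write(processed_data)

    def output(self):
        return luigi.LocalTarget('output.txt')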

CausalML example:

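from causalml.inference.meta import LRSRegressor

X, y, w = load_data()  # placeholder loader: features, outcome, treatment indicator
lr = LRSRegressor()
# estimate_ate returns the point estimate and its bounds as arrays
te, lb, ub = lr.estimate_ate(X, w, y)
print(f'Average Treatment Effect: {te[0]:.4f} ({lb[0]:.4f}, {ub[0]:.4f})')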

Summary

Luigi is a powerful tool for building data pipelines and managing complex workflows, while CausalML focuses specifically on causal inference and machine learning tasks. Luigi offers more flexibility for general data processing but may be more complex to set up. CausalML provides a streamlined approach for causal inference but has a narrower scope. The choice between the two depends on the specific requirements of your project and the complexity of your data processing needs.

Probabilistic reasoning and statistical analysis in TensorFlow

Pros of TensorFlow Probability

  • Broader scope for probabilistic modeling and statistical inference
  • Seamless integration with TensorFlow ecosystem
  • Extensive documentation and community support

Cons of TensorFlow Probability

  • Steeper learning curve for users not familiar with TensorFlow
  • Less focused on causal inference specifically
  • May be overkill for simpler causal inference tasks

Code Comparison

CausalML example:

from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor

lr = LRSRegressor()
xgb = XGBTRegressor()

TensorFlow Probability example:

import tensorflow_probability as tfp
import tensorflow as tf

tfd = tfp.distributions
normal = tfd.Normal(loc=0., scale=1.)

While CausalML focuses on specific causal inference methods, TensorFlow Probability provides a more general framework for probabilistic modeling. CausalML's API is more straightforward for causal inference tasks, while TensorFlow Probability offers greater flexibility but requires more setup for similar tasks.

Bayesian Modeling and Probabilistic Programming in Python

Pros of PyMC

  • More comprehensive probabilistic programming framework, supporting a wider range of statistical models
  • Extensive documentation and tutorials, making it easier for beginners to get started
  • Larger and more active community, providing better support and more frequent updates

Cons of PyMC

  • Steeper learning curve for users not familiar with Bayesian statistics
  • Can be slower for simpler causal inference tasks compared to CausalML's specialized algorithms
  • Less focused on causal inference specifically, requiring more user knowledge to implement causal models

Code Comparison

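PyMC example (basic linear regression):

import numpy as np
import pymc as pm

# Toy data generated from y = 2x + 1 plus noise
x = np.random.normal(size=100)
y = 2 * x + 1 + np.random.normal(scale=0.5, size=100)

with pm.Model() as model:
    slope = pm.Normal('slope', mu=0, sigma=1)
    intercept = pm.Normal('intercept', mu=0, sigma=1)
    pm.Normal('y', mu=slope * x + intercept, sigma=0.5, observed=y)
    trace = pm.sample(1000)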

CausalML example (uplift modeling):

from causalml.inference.meta import XGBTRegressor

uplift_model = XGBTRegressor()
uplift_model.fit(X, treatment, y)
uplift_scores = uplift_model.predict(X)

scikit-learn: machine learning in Python

Pros of scikit-learn

  • Broader scope, covering a wide range of machine learning tasks
  • Larger community and more extensive documentation
  • More mature and stable, with frequent updates and improvements

Cons of scikit-learn

  • Lacks specialized tools for causal inference and uplift modeling
  • May require more custom code for specific causal analysis tasks
  • Less focus on interpretability for causal effects

Code Comparison

scikit-learn (general machine learning):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

causalml (causal inference):

from causalml.inference.meta import LRSRegressor

learner = LRSRegressor()
# estimate_ate returns the point estimate and its confidence bounds as arrays
te, lb, ub = learner.estimate_ate(X, treatment, y)
print(f'ATE: {te[0]:.4f} ({lb[0]:.4f}, {ub[0]:.4f})')

While scikit-learn excels in general machine learning tasks, causalml provides specialized tools for causal inference and uplift modeling. scikit-learn offers a broader range of algorithms and techniques, while causalml focuses on estimating treatment effects and causal relationships.

README

Disclaimer

This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to change.

Causal ML: A Python Package for Uplift Modeling and Causal Inference with ML

Causal ML is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research [1]. It provides a standard interface that allows users to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. Essentially, it estimates the causal impact of intervention T on outcome Y for users with observed features X, without strong assumptions on the model form. Typical use cases include:

  • Campaign targeting optimization: An important lever to increase ROI in an advertising campaign is to target the ad to the set of customers who will have a favorable response in a given KPI such as engagement or sales. CATE identifies these customers by estimating the effect of ad exposure on the KPI at the individual level, from A/B experiment or historical observational data.

  • Personalized engagement: A company has multiple options to interact with its customers such as different product choices in up-sell or messaging channels for communications. One can use CATE to estimate the heterogeneous treatment effect for each customer and treatment option combination for an optimal personalized recommendation system.
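
Both use cases reduce to simple decisions once CATE estimates are available. The sketch below assumes the estimates already exist as plain dictionaries; the function names, customer IDs, channel names, and numbers are all hypothetical, and producing the estimates themselves is the job of the library's learners.

```python
def select_targets(cate_by_customer, top_k):
    """Campaign targeting: keep the top_k customers by estimated uplift, dropping non-positive effects."""
    ranked = sorted(cate_by_customer.items(), key=lambda item: item[1], reverse=True)
    return [customer for customer, uplift in ranked[:top_k] if uplift > 0]


def best_treatment(cate_by_option):
    """Personalized engagement: pick the option with the highest estimated effect, or None if none helps."""
    option, uplift = max(cate_by_option.items(), key=lambda item: item[1])
    return option if uplift > 0 else None


# Hypothetical CATE estimates for ten customers (effect of ad exposure on the KPI)
cate_scores = {
    "a": 0.12, "b": -0.03, "c": 0.40, "d": 0.05, "e": 0.00,
    "f": 0.22, "g": -0.10, "h": 0.31, "i": 0.02, "j": 0.09,
}
targets = select_targets(cate_scores, top_k=3)
print("Customers to target:", targets)

# Hypothetical per-channel CATE estimates for a single customer
channel = best_treatment({"email": 0.02, "push": 0.08, "sms": -0.01})
print("Recommended channel:", channel)
```

Filtering out non-positive estimates matters in practice: customers with a negative estimated effect are better left untreated even when budget remains.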

Documentation

Documentation is available at:

https://causalml.readthedocs.io/en/latest/about.html

Installation

Installation instructions are available at:

https://causalml.readthedocs.io/en/latest/installation.html

Quickstart

Quickstarts with code-snippets are available at:

https://causalml.readthedocs.io/en/latest/quickstart.html

Example Notebooks

Example notebooks are available at:

https://causalml.readthedocs.io/en/latest/examples.html

Contributing

We welcome community contributors to the project. Before you start, please read our code of conduct and check out contributing guidelines first.

Versioning

We document versions and changes in our changelog.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

References

Documentation

Conference Talks and Publications by CausalML Team

Citation

To cite CausalML in publications, you can refer to the following sources:

Whitepaper: CausalML: Python Package for Causal Machine Learning

Bibtex:

@misc{chen2020causalml,
    title={CausalML: Python Package for Causal Machine Learning},
    author={Huigang Chen and Totte Harinen and Jeong-Yoon Lee and Mike Yung and Zhenyu Zhao},
    year={2020},
    eprint={2002.11631},
    archivePrefix={arXiv},
    primaryClass={cs.CY}
}

Literature

  1. Chen, Huigang, Totte Harinen, Jeong-Yoon Lee, Mike Yung, and Zhenyu Zhao. "Causalml: Python package for causal machine learning." arXiv preprint arXiv:2002.11631 (2020).
  2. Radcliffe, Nicholas J., and Patrick D. Surry. "Real-world uplift modelling with significance-based uplift trees." White Paper TR-2011-1, Stochastic Solutions (2011): 1-33.
  3. Zhao, Yan, Xiao Fang, and David Simchi-Levi. "Uplift modeling with multiple treatments and general response types." Proceedings of the 2017 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2017.
  4. Hansotia, Behram, and Brad Rukstales. "Incremental value modeling." Journal of Interactive Marketing 16.3 (2002): 35-46.
  5. Rößler, Jannik, Richard Guse, and Detlef Schoder. "The Best of Two Worlds: Using Recent Advances from Uplift Modeling and Heterogeneous Treatment Effects to Optimize Targeting Policies." International Conference on Information Systems (2022).
  6. Su, Xiaogang, et al. "Subgroup analysis via recursive partitioning." Journal of Machine Learning Research 10.2 (2009).
  7. Su, Xiaogang, et al. "Facilitating score and causal inference trees for large observational studies." Journal of Machine Learning Research 13 (2012): 2955.
  8. Athey, Susan, and Guido Imbens. "Recursive partitioning for heterogeneous causal effects." Proceedings of the National Academy of Sciences 113.27 (2016): 7353-7360.
  9. Künzel, Sören R., et al. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the National Academy of Sciences 116.10 (2019): 4156-4165.
  10. Nie, Xinkun, and Stefan Wager. "Quasi-oracle estimation of heterogeneous treatment effects." arXiv preprint arXiv:1712.04912 (2017).
  11. Bang, Heejung, and James M. Robins. "Doubly robust estimation in missing data and causal inference models." Biometrics 61.4 (2005): 962-973.
  12. Van Der Laan, Mark J., and Daniel Rubin. "Targeted maximum likelihood learning." The international journal of biostatistics 2.1 (2006).
  13. Kennedy, Edward H. "Optimal doubly robust estimation of heterogeneous causal effects." arXiv preprint arXiv:2004.14497 (2020).
  14. Louizos, Christos, et al. "Causal effect inference with deep latent-variable models." arXiv preprint arXiv:1705.08821 (2017).
  15. Shi, Claudia, David M. Blei, and Victor Veitch. "Adapting neural networks for the estimation of treatment effects." 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 2019.
  16. Zhao, Zhenyu, Yumin Zhang, Totte Harinen, and Mike Yung. "Feature Selection Methods for Uplift Modeling." arXiv preprint arXiv:2005.03447 (2020).
  17. Zhao, Zhenyu, and Totte Harinen. "Uplift modeling for multiple treatments with cost optimization." In 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 422-431. IEEE, 2019.

Related projects

  • uplift: uplift models in R
  • grf: generalized random forests that include heterogeneous treatment effect estimation in R
  • rlearner: A R package that implements R-Learner
  • DoWhy: Causal inference in Python based on Judea Pearl's do-calculus
  • EconML: A Python package that implements heterogeneous treatment effect estimators from econometrics and machine learning methods