causalnex

A Python library that helps data scientists to infer causation rather than observing correlation.

2,344

267

Top Related Projects

luigi

18,399

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

dowhy

7,619

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

causalml

5,453

Uplift modeling and causal inference with machine learning algorithms

pymc

8,983

Bayesian Modeling and Probabilistic Programming in Python

Quick Overview

CausalNex is an open-source Python library for causal reasoning and "what-if" analysis using Bayesian Networks. It provides tools for structure learning, inference, and counterfactual analysis, making it easier for data scientists and researchers to perform causal inference tasks.

Pros

Comprehensive toolkit for causal inference and Bayesian Network analysis
Supports both structure learning and inference tasks
Integrates well with popular data science libraries like pandas and scikit-learn
Provides visualizations for better understanding of causal relationships

Cons

Steep learning curve for users new to causal inference concepts
Limited documentation and examples compared to more established libraries
Performance may be slower for very large datasets
Requires careful interpretation of results, as causal inference is complex

Code Examples

Creating a Bayesian Network structure:

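import pandas as pd
from causalnex.structure.notears import from_pandas

# df: a pandas DataFrame of numeric columns (placeholder file name)
df = pd.read_csv("your_data.csv")
# Learn a weighted StructureModel from the data with NOTEARS
sm = from_pandas(df)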

Fitting a Bayesian Network:

from causalnex.network import BayesianNetwork

bn = BayesianNetwork(sm)
bn = bn.fit_node_states(df)
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")
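
To make the bayes_prior="K2" argument above more concrete: a K2 prior is commonly described as a Dirichlet prior with a pseudo-count of 1 for every state, so no conditional probability is ever exactly zero. The sketch below reproduces that idea in plain Python for one child node with one parent. The column names (rain, wet) and the helper fit_cpd_k2 are hypothetical, and this is not CausalNex's internal implementation.

```python
from collections import Counter


def fit_cpd_k2(rows, child, parent, child_states, parent_states):
    """Estimate P(child | parent) with a pseudo-count of 1 per cell (K2-style)."""
    counts = Counter((row[parent], row[child]) for row in rows)
    cpd = {}
    for p in parent_states:
        # Each child state gets its observed count plus one pseudo-count.
        total = sum(counts[(p, c)] + 1 for c in child_states)
        for c in child_states:
            cpd[(c, p)] = (counts[(p, c)] + 1) / total
    return cpd


# Hypothetical toy data: did it rain, and was the grass wet?
rows = [
    {"rain": "yes", "wet": "yes"},
    {"rain": "yes", "wet": "yes"},
    {"rain": "yes", "wet": "yes"},
    {"rain": "no", "wet": "no"},
]
cpd = fit_cpd_k2(
    rows,
    child="wet",
    parent="rain",
    child_states=["yes", "no"],
    parent_states=["yes", "no"],
)
# P(wet=no | rain=yes) was never observed, yet it stays non-zero.
print(cpd[("no", "yes")])
```

With a plain maximum-likelihood estimate the unseen combination would get probability zero, which can make later queries fail or return degenerate answers.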

Performing inference:

from causalnex.inference import InferenceEngine

ie = InferenceEngine(bn)
posterior = ie.query({"feature1": "value1"})
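
Conceptually, a query with observations returns the marginal distribution of each node after conditioning on the evidence. Below is a minimal sketch of that computation by brute-force enumeration over a two-node network rain -> wet. The probabilities and the query helper are made up for illustration, and a real inference engine will use a far more efficient algorithm.

```python
# Hypothetical network: rain -> wet, with hand-written probability tables.
P_RAIN = {"yes": 0.5, "no": 0.5}
P_WET_GIVEN_RAIN = {  # keyed by (wet, rain)
    ("yes", "yes"): 0.9, ("no", "yes"): 0.1,
    ("yes", "no"): 0.1, ("no", "no"): 0.9,
}


def query(observations):
    """Return the marginals of both nodes given observed states, by enumeration."""
    joint = {}
    for rain, p_rain in P_RAIN.items():
        for wet in ("yes", "no"):
            state = {"rain": rain, "wet": wet}
            # Keep only joint assignments consistent with the evidence.
            if all(state[k] == v for k, v in observations.items()):
                joint[(rain, wet)] = p_rain * P_WET_GIVEN_RAIN[(wet, rain)]
    norm = sum(joint.values())
    marginals = {
        "rain": {"yes": 0.0, "no": 0.0},
        "wet": {"yes": 0.0, "no": 0.0},
    }
    for (rain, wet), p in joint.items():
        marginals["rain"][rain] += p / norm
        marginals["wet"][wet] += p / norm
    return marginals


posterior = query({"wet": "yes"})
print(posterior["rain"])  # belief about rain after seeing wet grass
```

Calling query({}) with no evidence returns the prior marginals, which is a useful baseline to compare against.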

Visualizing the network:

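from causalnex.plots import plot_structure

# The return type depends on the installed version. Recent releases return a
# pyvis network that is written to an HTML file; older releases returned a
# pygraphviz graph that was rendered with viz.draw(...).
viz = plot_structure(sm)
viz.show("structure.html")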

Getting Started

To get started with CausalNex, follow these steps:

Install CausalNex:

pip install causalnex

Import necessary modules:

import pandas as pd
from causalnex.structure import StructureModel
from causalnex.structure.notears import from_pandas
from causalnex.network import BayesianNetwork
from causalnex.inference import InferenceEngine

Load your data and create a Bayesian Network:

df = pd.read_csv("your_data.csv")
sm = from_pandas(df)
bn = BayesianNetwork(sm)
bn = bn.fit_node_states(df)
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")

Perform inference:

ie = InferenceEngine(bn)
posterior = ie.query({"feature1": "value1"})
print(posterior)
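
One practical caveat, stated here as an assumption about the library rather than something this page documents: conditional probability tables are defined over discrete states, so continuous columns generally need to be binned before the fit_node_states and fit_cpds calls above. A plain-Python sketch of fixed-threshold binning, with hypothetical values, thresholds and labels:

```python
from bisect import bisect_right


def discretise(values, thresholds, labels):
    """Map each numeric value to the label of the bin it falls into."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return [labels[bisect_right(thresholds, v)] for v in values]


# Hypothetical ages binned into three states.
ages = [12, 17, 18, 35, 70]
states = discretise(ages, thresholds=[18, 65], labels=["minor", "adult", "senior"])
print(states)  # ['minor', 'minor', 'adult', 'adult', 'senior']
```

The thresholds are sorted ascending, and a value equal to a threshold falls into the higher bin.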

Competitor Comparisons

luigi

18,399

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Pros of Luigi

Robust workflow management system for complex data pipelines
Extensive ecosystem with many built-in task types and integrations
Scalable for large-scale data processing tasks

Cons of Luigi

Steeper learning curve due to its comprehensive feature set
Less focused on causal inference and probabilistic modeling
May be overkill for simpler data processing tasks

Code Comparison

Luigi example:

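import luigi

class MyTask(luigi.Task):
    def requires(self):
        return SomeOtherTask()  # an upstream luigi.Task defined elsewhere

    def output(self):
        # Placeholder target; run() writes its result here
        return luigi.LocalTarget("processed.txt")

    def run(self):
        # Read the upstream output, process it, and write the result
        with self.input().open() as src, self.output().open('w') as dst:
            dst.write(src.read().upper())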

CausalNex example:

from causalnex.structure.notears import from_pandas

# df: a pandas DataFrame of numeric columns
sm = from_pandas(df)

Key Differences

Luigi focuses on workflow management and task scheduling
CausalNex specializes in causal inference and Bayesian network modeling
Luigi is more suitable for general-purpose data pipelines
CausalNex is tailored for causal discovery and probabilistic reasoning

dowhy

7,619

Pros of DoWhy

More comprehensive causal inference framework, supporting multiple estimation methods
Better documentation and tutorials for beginners
Active community and frequent updates

Cons of DoWhy

Steeper learning curve due to more complex API
Less focus on Bayesian networks compared to CausalNex

Code Comparison

DoWhy:

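from dowhy import CausalModel

# data: DataFrame; treatment, outcome: column names; graph: the causal graph
model = CausalModel(
    data=data,
    treatment=treatment,
    outcome=outcome,
    graph=graph
)
identified_estimand = model.identify_effect()
# An estimation method must be chosen; linear regression is one built-in option
estimate = model.estimate_effect(
    identified_estimand, method_name="backdoor.linear_regression"
)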

CausalNex:

from causalnex.structure.notears import from_pandas

sm = from_pandas(data)            # learn a structure from the data
sm.add_edge(treatment, outcome)   # add a known causal edge by hand

Both libraries offer powerful causal inference capabilities, but DoWhy provides a more comprehensive framework with multiple estimation methods, while CausalNex focuses more on Bayesian networks and structure learning. DoWhy has better documentation and a more active community, but it may have a steeper learning curve. CausalNex offers a simpler API for certain tasks, particularly related to Bayesian networks. The choice between the two depends on the specific requirements of your causal inference project and your familiarity with causal concepts.

causalml

5,453

Uplift modeling and causal inference with machine learning algorithms

Pros of CausalML

More comprehensive set of causal inference methods, including meta-learners and uplift modeling techniques
Better documentation and examples, making it easier for new users to get started
Active development with frequent updates and contributions from the community

Cons of CausalML

Steeper learning curve due to the wider range of methods and options available
Less focus on graphical models and Bayesian networks compared to CausalNex
Potentially more complex setup and dependencies for some advanced features

Code Comparison

CausalML example:

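from causalml.dataset import synthetic_data
from causalml.inference.meta import LRSRegressor

# synthetic_data returns outcome, features, treatment flag, true effect,
# expected outcome and propensity score, in that order
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)
lr = LRSRegressor()
# estimate_ate returns the point estimate with lower and upper bounds
te, lb, ub = lr.estimate_ate(X, treatment, y)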

CausalNex example:

from causalnex.structure.notears import from_pandas

# df: a pandas DataFrame of numeric columns
sm = from_pandas(df)
sm.remove_edges_below_threshold(0.8)
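
Structure learning of this kind typically returns a weighted graph, and thresholding prunes weak edges before the graph is treated as a causal structure. The sketch below shows the idea on a plain dictionary of edge weights, under the assumption that an edge is dropped when the absolute value of its weight is below the threshold. The edge names, weights and the remove_weak_edges helper are invented for illustration.

```python
def remove_weak_edges(weights, threshold):
    """Keep only edges whose absolute weight is at least `threshold`."""
    return {edge: w for edge, w in weights.items() if abs(w) >= threshold}


# Hypothetical learned weights: (source, target) -> weight.
learned = {
    ("study_time", "grade"): 1.3,
    ("absences", "grade"): -0.9,      # strong negative effect is kept
    ("grade", "study_time"): 0.05,    # weak, likely spurious reverse edge
    ("age", "absences"): 0.4,
}
pruned = remove_weak_edges(learned, threshold=0.8)
print(sorted(pruned))
```

Using the absolute value matters: a strongly negative weight is as informative as a strongly positive one.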

Both libraries offer powerful tools for causal inference, but CausalML provides a broader range of methods and more extensive documentation. CausalNex, on the other hand, excels in graphical modeling and Bayesian network analysis. The choice between them depends on the specific requirements of your causal inference project and your familiarity with different approaches.

pymc

8,983

Bayesian Modeling and Probabilistic Programming in Python

Pros of PyMC

More comprehensive probabilistic programming framework
Larger community and ecosystem with extensive documentation
Supports a wider range of statistical models and inference methods

Cons of PyMC

Steeper learning curve for beginners
Can be slower for simpler causal inference tasks
Less focused on causal inference specifically

Code Comparison

PyMC example:

import pymc as pm

# data: a 1-D array of observed values
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=data)
    trace = pm.sample(1000)

CausalNex example:

from causalnex.structure.notears import from_pandas

# data: a pandas DataFrame of numeric columns
sm = from_pandas(data)

PyMC offers a more flexible and powerful probabilistic modeling approach, while CausalNex provides a more streamlined experience for causal inference tasks. PyMC's code is more verbose but allows for greater customization, whereas CausalNex's code is more concise and focused on causal structure learning.

README

CausalNex

What is CausalNex?

"A toolkit for causal reasoning with Bayesian Networks."

CausalNex aims to become one of the leading libraries for causal reasoning and "what-if" analysis using Bayesian Networks. It helps to simplify the steps:

To learn causal structures,
To allow domain experts to augment the relationships,
To estimate the effects of potential interventions using data.
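
The second step, letting domain experts add or remove relationships, only works if the edited graph remains a directed acyclic graph, since a Bayesian Network cannot contain cycles. A minimal sketch using Python's standard graphlib module (Python 3.9+) to check an edited edge list. The variable names and the is_dag helper are hypothetical and independent of CausalNex.

```python
from graphlib import CycleError, TopologicalSorter


def is_dag(edges):
    """Return True if the directed edge list (source, target) has no cycle."""
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(target, set()).add(source)
        predecessors.setdefault(source, set())
    try:
        # static_order() raises CycleError when a cycle exists.
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


learned = [("smoking", "tar"), ("tar", "cancer")]
# An expert adds a direct edge that the algorithm missed: still acyclic.
augmented = learned + [("smoking", "cancer")]
# A mistaken edit that points back upstream creates a cycle.
broken = augmented + [("cancer", "smoking")]

print(is_dag(augmented), is_dag(broken))  # True False
```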

Why CausalNex?

CausalNex is built on our collective experience to leverage Bayesian Networks to identify causal relationships in data so that we can develop the right interventions from analytics. We developed CausalNex because:

We believe Bayesian Networks describe causality more intuitively than traditional machine learning methodologies, which are built on pattern recognition and correlation analysis.
Causal relationships are more accurate if we can easily encode or augment domain expertise in the graph model.
We can then use the graph model to assess the impact from changes to underlying features, i.e. counterfactual analysis, and identify the right intervention.

In our experience, a data scientist generally has to use at least 3-4 different open-source libraries before arriving at the final step of finding the right intervention. CausalNex aims to simplify this end-to-end process for causality and counterfactual analysis.

What are the main features of CausalNex?

The main features of this library are:

Use state-of-the-art structure learning methods to understand conditional dependencies between variables
Allow domain knowledge to augment model relationships
Build predictive models based on structural relationships
Fit probability distributions of the Bayesian Networks
Evaluate model quality with standard statistical checks
Simplify how causality is understood in Bayesian Networks through visualisation
Analyse the impact of interventions using Do-calculus
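
The last feature is what separates causal analysis from ordinary conditioning. In a network with a confounder (Z -> X, Z -> Y, X -> Y), observing X = 1 also changes what we believe about Z, whereas intervening with do(X = 1) cuts the Z -> X edge and leaves the distribution of Z untouched. Here is a small sketch with made-up probabilities, computed by hand with the backdoor adjustment formula rather than through CausalNex:

```python
P_Z = {1: 0.5, 0: 0.5}            # P(Z=z), the confounder
P_X1_GIVEN_Z = {1: 0.8, 0: 0.2}   # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {                 # P(Y=1 | X=x, Z=z), keyed by (x, z)
    (1, 1): 0.9, (1, 0): 0.5,
    (0, 1): 0.6, (0, 0): 0.2,
}


def p_y1_observing_x1():
    """P(Y=1 | X=1): conditioning on X also updates the belief about Z."""
    p_x1 = sum(P_X1_GIVEN_Z[z] * P_Z[z] for z in P_Z)
    return sum(
        P_Y1_GIVEN_XZ[(1, z)] * P_X1_GIVEN_Z[z] * P_Z[z] / p_x1 for z in P_Z
    )


def p_y1_do_x1():
    """P(Y=1 | do(X=1)): backdoor adjustment, Z keeps its prior distribution."""
    return sum(P_Y1_GIVEN_XZ[(1, z)] * P_Z[z] for z in P_Z)


print(round(p_y1_observing_x1(), 2), round(p_y1_do_x1(), 2))  # 0.82 0.7
```

The gap between the two numbers is the part of the observed association that comes from the confounder rather than from X itself.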

How do I install CausalNex?

CausalNex is a Python package. To install it, simply run:

pip install causalnex

Use the `all` extra for a full installation of dependencies:

pip install "causalnex[all]"

See more detailed installation instructions, including how to set up Python virtual environments, in our installation guide and get started with our tutorial.

How do I use CausalNex?

You can find the documentation for the latest stable release here. It explains:

An end-to-end tutorial on how to use CausalNex
The main concepts and methods in using Bayesian Networks for Causal Inference

Note: You can find the notebook and markdown files used to build the docs in docs/source.

Can I contribute?

Yes! We'd love you to join us and help us build CausalNex. Check out our contributing documentation.

How do I upgrade CausalNex?

We use SemVer for versioning. The best way to upgrade safely is to check our release notes for any notable breaking changes.

How do I cite CausalNex?

You may click "Cite this repository" under the "About" section of this repository to get the citation information in APA and BibTeX formats.

What licence do you use?

See our LICENSE for more detail.

We're hiring!

Do you want to be part of the team that builds CausalNex and other great products at QuantumBlack? If so, you're in luck! QuantumBlack is currently hiring Machine Learning Engineers who love using data to drive their decisions. Take a look at our open positions and see if you're a fit.
