mckinsey / causalnex

A Python library that helps data scientists to infer causation rather than observing correlation.

Quick Overview

CausalNex is an open-source Python library for causal reasoning and "what-if" analysis using Bayesian Networks. It provides tools for structure learning, inference, and counterfactual analysis, making it easier for data scientists and researchers to perform causal inference tasks.

Pros

  • Comprehensive toolkit for causal inference and Bayesian Network analysis
  • Supports both structure learning and inference tasks
  • Integrates well with popular data science libraries like pandas and scikit-learn
  • Provides visualizations for better understanding of causal relationships

Cons

  • Steep learning curve for users new to causal inference concepts
  • Limited documentation and examples compared to more established libraries
  • Can be slow on very large datasets
  • Requires careful interpretation of results, as causal inference is complex

Code Examples

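  1. Creating a Bayesian Network structure:
import pandas as pd
from causalnex.structure.notears import from_pandas

# "your_data.csv" is a placeholder path; from_pandas expects numeric columns
df = pd.read_csv("your_data.csv")
sm = from_pandas(df)  # returns a StructureModel learned with NOTEARS
  2. Fitting a Bayesian Network:
from causalnex.network import BayesianNetwork

# BayesianNetwork expects an acyclic structure and discretised data
bn = BayesianNetwork(sm)
bn = bn.fit_node_states(df)
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")
  3. Performing inference:
from causalnex.inference import InferenceEngine

ie = InferenceEngine(bn)
# "feature1" and "value1" are placeholders for a node name and one of its states
posterior = ie.query({"feature1": "value1"})
  4. Visualizing the network:
from causalnex.plots import plot_structure

viz = plot_structure(sm, graph_attributes={"scale": "2"})
# Assumes a release where plot_structure returns a pygraphviz AGraph;
# newer releases return a different object, so check the docs for your version
viz.draw("structure.png", format="png")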
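
To make the `bayes_prior="K2"` option in the fitting example above more concrete: a K2 prior corresponds to adding one pseudo-count to every state before normalising the counts. The sketch below is plain Python and is not CausalNex's implementation; the helper `fit_cpd_k2` and the toy records are invented for illustration.

```python
from collections import Counter


def fit_cpd_k2(records, child, parent, child_states, parent_states):
    """Estimate P(child | parent) with one pseudo-count per cell (K2 prior)."""
    counts = Counter((row[parent], row[child]) for row in records)
    cpd = {}
    for p in parent_states:
        # Every child state starts from a pseudo-count of 1.
        total = sum(counts[(p, c)] + 1 for c in child_states)
        for c in child_states:
            cpd[(c, p)] = (counts[(p, c)] + 1) / total
    return cpd


# Hypothetical toy data: does rain make the grass wet?
records = [
    {"rain": "yes", "wet": "yes"},
    {"rain": "yes", "wet": "yes"},
    {"rain": "yes", "wet": "no"},
    {"rain": "no", "wet": "no"},
]
cpd = fit_cpd_k2(records, "wet", "rain", ["yes", "no"], ["yes", "no"])
print(cpd[("yes", "yes")])  # P(wet=yes | rain=yes) = (2 + 1) / (3 + 2)
```

Because of the pseudo-counts, a state that never appears in the data still receives a non-zero probability.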

Getting Started

To get started with CausalNex, follow these steps:

  1. Install CausalNex:
pip install causalnex
  2. Import necessary modules:
import pandas as pd
from causalnex.structure.notears import from_pandas
from causalnex.network import BayesianNetwork
from causalnex.inference import InferenceEngine
  3. Load your data and create a Bayesian Network:
# "your_data.csv" is a placeholder; columns must be numeric and discretised
df = pd.read_csv("your_data.csv")
sm = from_pandas(df)
bn = BayesianNetwork(sm)
bn = bn.fit_node_states(df)
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")
  4. Perform inference:
ie = InferenceEngine(bn)
# replace "feature1" and "value1" with a node name and one of its states
posterior = ie.query({"feature1": "value1"})
print(posterior)
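
Conceptually, a query like the one in the last step conditions the network on the observed states and returns updated distributions for the other nodes. The following is a minimal sketch of that idea for a two-node network (rain -> wet), computed by brute-force enumeration. The CPD values and the helper `query_rain` are hypothetical, and this is not how CausalNex computes its results internally.

```python
# Hypothetical CPDs for a two-node network: rain -> wet.
p_rain = {"yes": 0.2, "no": 0.8}
p_wet_given_rain = {
    "yes": {"yes": 0.9, "no": 0.1},  # P(wet | rain=yes)
    "no": {"yes": 0.1, "no": 0.9},   # P(wet | rain=no)
}


def query_rain(observed_wet):
    """Return P(rain | wet=observed_wet) by enumerating the joint distribution."""
    joint = {r: p_rain[r] * p_wet_given_rain[r][observed_wet] for r in p_rain}
    evidence = sum(joint.values())
    return {r: joint[r] / evidence for r in joint}


posterior_rain = query_rain("yes")
print(posterior_rain)
```

Observing wet grass raises the probability of rain from the prior 0.2 to 0.18 / 0.26, roughly 0.69.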

Competitor Comparisons

Luigi (17,681 stars)

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Pros of Luigi

  • Robust workflow management system for complex data pipelines
  • Extensive ecosystem with many built-in task types and integrations
  • Scalable for large-scale data processing tasks

Cons of Luigi

  • Steeper learning curve due to its comprehensive feature set
  • Less focused on causal inference and probabilistic modeling
  • May be overkill for simpler data processing tasks

Code Comparison

Luigi example:

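import luigi

class MyTask(luigi.Task):
    def requires(self):
        return SomeOtherTask()  # placeholder for an upstream task defined elsewhere

    def output(self):
        return luigi.LocalTarget("processed.txt")  # placeholder output path

    def run(self):
        processed_data = "..."  # placeholder for the real processing result
        with self.output().open('w') as f:
            f.write(processed_data)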

CausalNex example:

from causalnex.structure.notears import from_pandas

sm = from_pandas(df)  # df: a numeric pandas DataFrame; returns a StructureModel

Key Differences

  • Luigi focuses on workflow management and task scheduling
  • CausalNex specializes in causal inference and Bayesian network modeling
  • Luigi is more suitable for general-purpose data pipelines
  • CausalNex is tailored for causal discovery and probabilistic reasoning

DoWhy (6,997 stars)

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

Pros of DoWhy

  • More comprehensive causal inference framework, supporting multiple estimation methods
  • Better documentation and tutorials for beginners
  • Active community and frequent updates

Cons of DoWhy

  • Steeper learning curve due to more complex API
  • Less focus on Bayesian networks compared to CausalNex

Code Comparison

DoWhy:

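from dowhy import CausalModel

# data, treatment, outcome and graph are placeholders supplied by the caller
model = CausalModel(
    data=data,
    treatment=treatment,
    outcome=outcome,
    graph=graph
)
identified_estimand = model.identify_effect()
# name the estimator explicitly; backdoor linear regression is one built-in option
estimate = model.estimate_effect(
    identified_estimand, method_name="backdoor.linear_regression"
)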

CausalNex:

from causalnex.structure.notears import from_pandas

sm = from_pandas(data)
sm.add_edge(treatment, outcome)  # encode a known treatment -> outcome relationship

Both libraries offer powerful causal inference capabilities, but DoWhy provides a more comprehensive framework with multiple estimation methods, while CausalNex focuses more on Bayesian networks and structure learning. DoWhy has better documentation and a more active community, but it may have a steeper learning curve. CausalNex offers a simpler API for certain tasks, particularly related to Bayesian networks. The choice between the two depends on the specific requirements of your causal inference project and your familiarity with causal concepts.
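
Neither snippet shows what an effect estimate actually involves. As a rough, library-free illustration, the sketch below estimates an average treatment effect by backdoor adjustment over a single binary confounder `z` and compares it with the naive difference in means. The rows and the helpers `adjusted_ate` and `naive_difference` are invented for this example and belong to neither DoWhy nor CausalNex.

```python
from collections import defaultdict


def adjusted_ate(rows):
    """Backdoor adjustment: sum_z P(z) * (E[y | t=1, z] - E[y | t=0, z])."""
    by_stratum = defaultdict(lambda: {0: [], 1: []})
    for z, t, y in rows:
        by_stratum[z][t].append(y)
    n = len(rows)
    ate = 0.0
    for groups in by_stratum.values():
        # Assumes every stratum contains both treated and untreated rows.
        weight = (len(groups[0]) + len(groups[1])) / n  # P(z)
        mean_treated = sum(groups[1]) / len(groups[1])
        mean_control = sum(groups[0]) / len(groups[0])
        ate += weight * (mean_treated - mean_control)
    return ate


def naive_difference(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical (z, t, y) rows: z raises both the chance of treatment and y.
rows = (
    [(1, 1, 3)] * 3 + [(1, 0, 2)] * 1 +
    [(0, 1, 1)] * 1 + [(0, 0, 0)] * 3
)
print(naive_difference(rows), adjusted_ate(rows))
```

On this toy data the naive difference is 2.0 while the adjusted estimate is 1.0, because the confounder inflates the unadjusted comparison.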

CausalML

Uplift modeling and causal inference with machine learning algorithms

Pros of CausalML

  • More comprehensive set of causal inference methods, including meta-learners and uplift modeling techniques
  • Better documentation and examples, making it easier for new users to get started
  • Active development with frequent updates and contributions from the community

Cons of CausalML

  • Steeper learning curve due to the wider range of methods and options available
  • Less focus on graphical models and Bayesian networks compared to CausalNex
  • Potentially more complex setup and dependencies for some advanced features

Code Comparison

CausalML example:

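from causalml.dataset import synthetic_data
from causalml.inference.meta import LRSRegressor

# synthetic_data returns the outcome first, followed by features, treatment,
# true effect, expected outcome and propensity
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)
lr = LRSRegressor()
# estimate_ate returns the point estimate with lower and upper bounds
ate, lb, ub = lr.estimate_ate(X, treatment, y)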

CausalNex example:

from causalnex.structure.notears import from_pandas

sm = from_pandas(df)
sm.remove_edges_below_threshold(0.8)  # prune weak edges from the learned graph

Both libraries offer powerful tools for causal inference, but CausalML provides a broader range of methods and more extensive documentation. CausalNex, on the other hand, excels in graphical modeling and Bayesian network analysis. The choice between them depends on the specific requirements of your causal inference project and your familiarity with different approaches.
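
The `remove_edges_below_threshold(0.8)` call in the CausalNex example prunes weak edges from the learned graph. A rough plain-Python sketch of the idea follows, assuming edges are compared by the absolute value of their weight. The `prune_edges` helper, the node names and the weights are hypothetical.

```python
def prune_edges(weighted_edges, threshold):
    """Keep only edges whose absolute weight is at least `threshold`."""
    return {
        edge: weight
        for edge, weight in weighted_edges.items()
        if abs(weight) >= threshold
    }


# Hypothetical learned weights, keyed by (source, target).
weighted_edges = {
    ("study_time", "grade"): 1.4,
    ("absences", "grade"): -0.9,
    ("address", "grade"): 0.05,
}
pruned = prune_edges(weighted_edges, threshold=0.8)
print(sorted(pruned))
```

Using the absolute value keeps strong negative relationships, such as the hypothetical absences edge, in the graph.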

PyMC (8,623 stars)

Bayesian Modeling and Probabilistic Programming in Python

Pros of PyMC

  • More comprehensive probabilistic programming framework
  • Larger community and ecosystem with extensive documentation
  • Supports a wider range of statistical models and inference methods

Cons of PyMC

  • Steeper learning curve for beginners
  • Can be slower for simpler causal inference tasks
  • Less focused on causal inference specifically

Code Comparison

PyMC example:

import pymc as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=data)
    trace = pm.sample(1000)

CausalNex example:

from causalnex.structure.notears import from_pandas

sm = from_pandas(data)  # data: a numeric pandas DataFrame

PyMC offers a more flexible and powerful probabilistic modeling approach, while CausalNex provides a more streamlined experience for causal inference tasks. PyMC's code is more verbose but allows for greater customization, whereas CausalNex's code is more concise and focused on causal structure learning.
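
For the specific PyMC model shown above, the posterior of `mu` has a closed form, which can serve as a sanity check on sampler output. With a Normal(0, 1) prior and a unit-variance Normal likelihood, the posterior is Normal with precision 1 + n and mean sum(data) / (1 + n). The sketch below computes it in plain Python; the observations and the helper `normal_posterior` are made up for illustration.

```python
import math


def normal_posterior(data, prior_mu=0.0, prior_sigma=1.0, obs_sigma=1.0):
    """Conjugate update for the mean of a Normal with known observation sigma."""
    prior_precision = 1.0 / prior_sigma**2
    data_precision = len(data) / obs_sigma**2
    post_precision = prior_precision + data_precision
    post_mean = (
        prior_mu * prior_precision + sum(data) / obs_sigma**2
    ) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


observations = [0.8, 1.2, 1.0]  # hypothetical data
post_mean, post_sd = normal_posterior(observations)
print(post_mean, post_sd)
```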

README

CausalNex



What is CausalNex?

"A toolkit for causal reasoning with Bayesian Networks."

CausalNex aims to become one of the leading libraries for causal reasoning and "what-if" analysis using Bayesian Networks. It helps to simplify the steps:

  • To learn causal structures,
  • To allow domain experts to augment the relationships,
  • To estimate the effects of potential interventions using data.
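
The second step, letting domain experts adjust a learned structure, amounts to adding or removing directed edges while keeping the graph acyclic. Below is a minimal standard-library sketch of that workflow (Python 3.9+ for `graphlib`). The edge names and the `is_acyclic` helper are hypothetical and are not part of CausalNex's API.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """Return True if the directed (source, target) edges form a DAG."""
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(target, set()).add(source)
        predecessors.setdefault(source, set())
    try:
        TopologicalSorter(predecessors).prepare()
    except CycleError:
        return False
    return True


# Hypothetical learned structure containing an implausible edge.
edges = {("rain", "wet_grass"), ("sprinkler", "wet_grass"), ("wet_grass", "rain")}

# Domain knowledge: wet grass cannot cause rain, so drop that edge.
edges.discard(("wet_grass", "rain"))
# Domain knowledge: rain influences whether the sprinkler runs.
edges.add(("rain", "sprinkler"))

print(is_acyclic(edges))
```

Checking acyclicity after each manual change matters because a Bayesian Network can only be built on a directed acyclic graph.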

Why CausalNex?

CausalNex builds on our collective experience of using Bayesian Networks to identify causal relationships in data, so that we can develop the right interventions from analytics. We developed CausalNex because:

  • We believe Bayesian Networks describe causality more intuitively than traditional machine learning methodologies, which are built on pattern recognition and correlation analysis.
  • Causal relationships are more accurate if we can easily encode or augment domain expertise in the graph model.
  • We can then use the graph model to assess the impact of changes to underlying features, i.e. counterfactual analysis, and identify the right intervention.

In our experience, a data scientist generally has to use at least 3-4 different open-source libraries before arriving at the final step of finding the right intervention. CausalNex aims to simplify this end-to-end process for causality and counterfactual analysis.

What are the main features of CausalNex?

The main features of this library are:

  • Use state-of-the-art structure learning methods to understand conditional dependencies between variables
  • Allow domain knowledge to augment model relationships
  • Build predictive models based on structural relationships
  • Fit the probability distributions of Bayesian Networks
  • Evaluate model quality with standard statistical checks
  • Simplify how causality is understood in Bayesian Networks through visualisation
  • Analyse the impact of interventions using Do-calculus
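
The last feature, analysing interventions, differs from ordinary conditioning: an intervention forces a node to a value and cuts its dependence on its parents, instead of filtering on what was observed. The sketch below contrasts the two on a three-node network (Z -> T, Z -> Y, T -> Y). All CPD values and function names are hypothetical, and the code is plain Python rather than CausalNex's inference engine.

```python
# Hypothetical CPDs for Z -> T, Z -> Y and T -> Y (all nodes binary).
p_z = {0: 0.5, 1: 0.5}
p_t1_given_z = {0: 0.2, 1: 0.8}  # P(T=1 | Z=z)
p_y1_given_tz = {                # P(Y=1 | T=t, Z=z), keyed by (t, z)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.3, (1, 1): 0.7,
}


def p_y1_observing_t1():
    """P(Y=1 | T=1): conditioning also shifts the belief about Z."""
    joint_t1 = {z: p_z[z] * p_t1_given_z[z] for z in p_z}
    evidence = sum(joint_t1.values())
    return sum(joint_t1[z] / evidence * p_y1_given_tz[(1, z)] for z in p_z)


def p_y1_do_t1():
    """P(Y=1 | do(T=1)): T is forced, so Z keeps its prior distribution."""
    return sum(p_z[z] * p_y1_given_tz[(1, z)] for z in p_z)


print(p_y1_observing_t1(), p_y1_do_t1())
```

With these numbers, observing T=1 gives 0.62 while intervening gives 0.5; the gap is the part of the association that flows through the confounder Z.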

How do I install CausalNex?

CausalNex is a Python package. To install it, simply run:

pip install causalnex

Use the all extra for a full installation of dependencies:

pip install "causalnex[all]"
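
To confirm which version ended up in the active environment, the standard library can read the installed package metadata. This is a small sketch; the helper name `installed_version` is our own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("causalnex"))
```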

See more detailed installation instructions, including how to set up Python virtual environments, in our installation guide and get started with our tutorial.

How do I use CausalNex?

You can find the documentation for the latest stable release here. It explains:

Note: You can find the notebook and markdown files used to build the docs in docs/source.

Can I contribute?

Yes! We'd love you to join us and help us build CausalNex. Check out our contributing documentation.

How do I upgrade CausalNex?

We use SemVer for versioning. The best way to upgrade safely is to check our release notes for any notable breaking changes.
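
As a rough aid for deciding when the release notes deserve a close read, the sketch below flags upgrades that cross a major version. It also flags minor bumps within 0.x, which is an assumption on our part: SemVer itself only says that anything may change during major version zero. The helper names and version strings are hypothetical.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def may_break(current, target):
    """Flag upgrades that deserve a look at the release notes."""
    cur, new = parse_version(current), parse_version(target)
    if cur[0] != new[0]:
        return True  # major bump: breaking changes are expected
    # Assumption: for 0.x releases, treat a minor bump as potentially breaking.
    return cur[0] == 0 and cur[1] != new[1]


print(may_break("0.11.0", "0.12.1"))
print(may_break("1.2.0", "1.3.0"))
```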

How do I cite CausalNex?

You may click "Cite this repository" under the "About" section of this repository to get the citation information in APA and BibTeX formats.

What licence do you use?

See our LICENSE for more detail.

We're hiring!

Do you want to be part of the team that builds CausalNex and other great products at QuantumBlack? If so, you're in luck! QuantumBlack is currently hiring Machine Learning Engineers who love using data to drive their decisions. Take a look at our open positions and see if you're a fit.