causalnex
A Python library that helps data scientists to infer causation rather than observing correlation.
Top Related Projects
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
CausalML: Uplift modeling and causal inference with machine learning algorithms
PyMC: Bayesian Modeling and Probabilistic Programming in Python
Quick Overview
CausalNex is an open-source Python library for causal reasoning and "what-if" analysis using Bayesian Networks. It provides tools for structure learning, inference, and counterfactual analysis, making it easier for data scientists and researchers to perform causal inference tasks.
Pros
- Comprehensive toolkit for causal inference and Bayesian Network analysis
- Supports both structure learning and inference tasks
- Integrates well with popular data science libraries like pandas and scikit-learn
- Provides visualizations for better understanding of causal relationships
Cons
- Steep learning curve for users new to causal inference concepts
- Limited documentation and examples compared to more established libraries
- Performance may be slower for very large datasets
- Requires careful interpretation of results, as causal inference is complex
Code Examples
- Creating a Bayesian Network structure:
from causalnex.structure.notears import from_pandas
# from_pandas learns the structure with NOTEARS and returns a StructureModel;
# df is assumed to contain numeric columns only
sm = from_pandas(df)
- Fitting a Bayesian Network:
from causalnex.network import BayesianNetwork
bn = BayesianNetwork(sm)
bn = bn.fit_node_states(df)
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")
- Performing inference:
from causalnex.inference import InferenceEngine
ie = InferenceEngine(bn)
posterior = ie.query({"feature1": "value1"})
- Visualizing the network:
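from causalnex.plots import plot_structure
# graph_attributes belongs to the older pygraphviz-based plotting API (assumption:
# newer releases return a pyvis network instead and are rendered differently)
viz = plot_structure(sm, graph_attributes={"scale": "2"})
viz.draw("structure.png", format="png")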
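The fitting example above passes bayes_prior="K2". As a rough illustration of what a K2-style prior does, the sketch below estimates a conditional probability table by adding one pseudo-count to every parent/child state combination before normalising. It uses only the standard library; the function name fit_cpd_k2 and the rain/wet data are hypothetical, and this is not CausalNex's own implementation.

```python
from collections import Counter
from itertools import product


def fit_cpd_k2(rows, child, parents, states):
    """Estimate P(child | parents) with add-one (K2-style) pseudo-counts."""
    # Count how often each (parent values, child value) pair occurs.
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    cpd = {}
    for parent_values in product(*(states[p] for p in parents)):
        # Add one pseudo-count per child state so unseen combinations
        # never receive probability zero.
        totals = {s: counts[(parent_values, s)] + 1 for s in states[child]}
        norm = sum(totals.values())
        cpd[parent_values] = {s: c / norm for s, c in totals.items()}
    return cpd


# Hypothetical toy data: four observations of two discrete variables.
rows = [
    {"rain": "yes", "wet": "yes"},
    {"rain": "yes", "wet": "yes"},
    {"rain": "yes", "wet": "no"},
    {"rain": "no", "wet": "no"},
]
states = {"rain": ["yes", "no"], "wet": ["yes", "no"]}
cpd = fit_cpd_k2(rows, child="wet", parents=["rain"], states=states)
print(cpd[("yes",)])  # P(wet | rain = yes)
print(cpd[("no",)])   # P(wet | rain = no)
```

With only one observation of rain = no, the pseudo-counts pull the estimate away from a hard 0/1 split, which is the practical reason to use a prior on small datasets.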
Getting Started
To get started with CausalNex, follow these steps:
- Install CausalNex:
pip install causalnex
- Import necessary modules:
import pandas as pd
from causalnex.structure import StructureModel
from causalnex.structure.notears import from_pandas
from causalnex.network import BayesianNetwork
from causalnex.inference import InferenceEngine
- Load your data and create a Bayesian Network:
df = pd.read_csv("your_data.csv")
sm = from_pandas(df)
bn = BayesianNetwork(sm)
bn = bn.fit_node_states(df)
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")
- Perform inference:
ie = InferenceEngine(bn)
posterior = ie.query({"feature1": "value1"})
print(posterior)
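The steps above fit node states and conditional probability tables, which assumes each column holds a small set of discrete states. If your data contains continuous columns, one common preparation step is to bin them first. The sketch below shows a minimal way to do that with the standard library; the discretise helper, the split points and the labels are all hypothetical and not part of the CausalNex API.

```python
from bisect import bisect_right


def discretise(values, split_points, labels):
    """Map each numeric value to the label of the bucket it falls in."""
    if len(labels) != len(split_points) + 1:
        raise ValueError("need exactly one more label than split points")
    # bisect_right returns how many split points are <= the value,
    # which is the index of the bucket the value belongs to.
    return [labels[bisect_right(split_points, v)] for v in values]


# Hypothetical example: turn ages into three bands.
ages = [3, 17, 18, 42, 65, 80]
age_bands = discretise(ages, split_points=[18, 65], labels=["child", "adult", "senior"])
print(age_bands)
```

A value equal to a split point lands in the upper bucket here; choose the convention that matches your domain before fitting.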
Competitor Comparisons
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Pros of Luigi
- Robust workflow management system for complex data pipelines
- Extensive ecosystem with many built-in task types and integrations
- Scalable for large-scale data processing tasks
Cons of Luigi
- Steeper learning curve due to its comprehensive feature set
- Less focused on causal inference and probabilistic modeling
- May be overkill for simpler data processing tasks
Code Comparison
Luigi example:
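import luigi

class MyTask(luigi.Task):
    def requires(self):
        return SomeOtherTask()  # placeholder upstream task
    def run(self):
        # Process data, then write the result to this task's output target
        with self.output().open('w') as f:
            f.write(processed_data)  # placeholder result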
CausalNex example:
from causalnex.structure.notears import from_pandas
# from_pandas already returns a StructureModel, so none needs to be created first
sm = from_pandas(df)
Key Differences
- Luigi focuses on workflow management and task scheduling
- CausalNex specializes in causal inference and Bayesian network modeling
- Luigi is more suitable for general-purpose data pipelines
- CausalNex is tailored for causal discovery and probabilistic reasoning
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
Pros of DoWhy
- More comprehensive causal inference framework, supporting multiple estimation methods
- Better documentation and tutorials for beginners
- Active community and frequent updates
Cons of DoWhy
- Steeper learning curve due to more complex API
- Less focus on Bayesian networks compared to CausalNex
Code Comparison
DoWhy:
from dowhy import CausalModel
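model = CausalModel(
    data=data,
    treatment=treatment,
    outcome=outcome,
    graph=graph,
)
identified_estimand = model.identify_effect()
# An estimator has to be named explicitly; linear regression is one backdoor option
estimate = model.estimate_effect(identified_estimand, method_name="backdoor.linear_regression")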
CausalNex:
from causalnex.structure import StructureModel
from causalnex.structure.notears import from_pandas
sm = from_pandas(data)
sm.add_edge(treatment, outcome)
Both libraries offer powerful causal inference capabilities, but DoWhy provides a more comprehensive framework with multiple estimation methods, while CausalNex focuses more on Bayesian networks and structure learning. DoWhy has better documentation and a more active community, but it may have a steeper learning curve. CausalNex offers a simpler API for certain tasks, particularly related to Bayesian networks. The choice between the two depends on the specific requirements of your causal inference project and your familiarity with causal concepts.
CausalML: Uplift modeling and causal inference with machine learning algorithms
Pros of CausalML
- More comprehensive set of causal inference methods, including meta-learners and uplift modeling techniques
- Better documentation and examples, making it easier for new users to get started
- Active development with frequent updates and contributions from the community
Cons of CausalML
- Steeper learning curve due to the wider range of methods and options available
- Less focus on graphical models and Bayesian networks compared to CausalNex
- Potentially more complex setup and dependencies for some advanced features
Code Comparison
CausalML example:
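from causalml.inference.meta import LRSRegressor
from causalml.dataset import synthetic_data
# synthetic_data returns the outcome first, followed by the features and treatment flag
y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)
lr = LRSRegressor()
# estimate_ate returns the point estimate with its lower and upper bounds
te, lb, ub = lr.estimate_ate(X, treatment, y)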
CausalNex example:
from causalnex.structure.notears import from_pandas
sm = from_pandas(df)
# Drop weakly supported edges from the learned structure
sm.remove_edges_below_threshold(0.8)
Both libraries offer powerful tools for causal inference, but CausalML provides a broader range of methods and more extensive documentation. CausalNex, on the other hand, excels in graphical modeling and Bayesian network analysis. The choice between them depends on the specific requirements of your causal inference project and your familiarity with different approaches.
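The CausalNex example above prunes the learned structure with remove_edges_below_threshold(0.8). The sketch below illustrates the idea on a plain dictionary of weighted edges, assuming edges are compared by absolute weight so that strong negative relationships survive. The edge names, weights and the remove_edges_below helper are hypothetical, and this is not the library's implementation.

```python
def remove_edges_below(weights, threshold):
    """Keep only edges whose absolute weight reaches the threshold."""
    return {edge: w for edge, w in weights.items() if abs(w) >= threshold}


# Hypothetical learned edge weights: (parent, child) -> weight.
learned = {
    ("age", "income"): 1.4,
    ("income", "spend"): -0.95,
    ("age", "spend"): 0.12,
}
pruned = remove_edges_below(learned, 0.8)
print(sorted(pruned))
```

Raising the threshold yields a sparser graph that is easier for domain experts to review, at the cost of possibly discarding weak but real relationships.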
PyMC: Bayesian Modeling and Probabilistic Programming in Python
Pros of PyMC
- More comprehensive probabilistic programming framework
- Larger community and ecosystem with extensive documentation
- Supports a wider range of statistical models and inference methods
Cons of PyMC
- Steeper learning curve for beginners
- Can be slower for simpler causal inference tasks
- Less focused on causal inference specifically
Code Comparison
PyMC example:
import pymc as pm
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = pm.Normal('obs', mu=mu, sigma=1, observed=data)
    trace = pm.sample(1000)
CausalNex example:
from causalnex.structure.notears import from_pandas
# from_pandas already returns a StructureModel, so none needs to be created first
sm = from_pandas(data)
PyMC offers a more flexible and powerful probabilistic modeling approach, while CausalNex provides a more streamlined experience for causal inference tasks. PyMC's code is more verbose but allows for greater customization, whereas CausalNex's code is more concise and focused on causal structure learning.
README
What is CausalNex?
"A toolkit for causal reasoning with Bayesian Networks."
CausalNex aims to become one of the leading libraries for causal reasoning and "what-if" analysis using Bayesian Networks. It helps to simplify the steps:
- To learn causal structures,
- To allow domain experts to augment the relationships,
- To estimate the effects of potential interventions using data.
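When a domain expert augments a learned structure, the result still has to be a directed acyclic graph for it to serve as a Bayesian Network. The sketch below shows one way to guard manual edits, using the standard library's graphlib (Python 3.9+) to reject any edge that would create a cycle. The helper name add_edge_if_acyclic and the smoking/tar/cancer edges are hypothetical and do not come from CausalNex.

```python
from graphlib import CycleError, TopologicalSorter


def add_edge_if_acyclic(edges, new_edge):
    """Return (edges, accepted); the edge is added only if no cycle results."""
    candidate = edges | {new_edge}
    # graphlib expects a mapping of node -> set of predecessors.
    predecessors = {}
    for parent, child in candidate:
        predecessors.setdefault(child, set()).add(parent)
        predecessors.setdefault(parent, set())
    try:
        # Consuming the iterator forces the cycle check.
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return edges, False
    return candidate, True


# Hypothetical learned structure as a set of (parent, child) edges.
learned = {("smoking", "tar"), ("tar", "cancer")}
learned, ok_forward = add_edge_if_acyclic(learned, ("smoking", "cancer"))
learned, ok_backward = add_edge_if_acyclic(learned, ("cancer", "smoking"))
print(ok_forward, ok_backward)
```

The first edit is accepted because it only adds a shortcut along an existing direction; the second is rejected because it would make smoking both a cause and an effect of cancer.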
Why CausalNex?
CausalNex is built on our collective experience to leverage Bayesian Networks to identify causal relationships in data so that we can develop the right interventions from analytics. We developed CausalNex because:
- We believe Bayesian Networks offer a more intuitive way to describe causality than traditional machine learning methodologies, which are built on pattern recognition and correlation analysis.
- Causal relationships are more accurate if we can easily encode or augment domain expertise in the graph model.
- We can then use the graph model to assess the impact from changes to underlying features, i.e. counterfactual analysis, and identify the right intervention.
In our experience, a data scientist generally has to use at least 3-4 different open-source libraries before arriving at the final step of finding the right intervention. CausalNex aims to simplify this end-to-end process for causality and counterfactual analysis.
What are the main features of CausalNex?
The main features of this library are:
- Use state-of-the-art structure learning methods to understand conditional dependencies between variables
- Allow domain knowledge to augment model relationship
- Build predictive models based on structural relationships
- Fit probability distribution of the Bayesian Networks
- Evaluate model quality with standard statistical checks
- Simplify how causality is understood in Bayesian Networks through visualisation
- Analyse the impact of interventions using Do-calculus
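To make the last feature concrete, the sketch below contrasts observing a variable with intervening on it in a tiny hand-specified network where Z confounds both X and Y. Conditioning on X = 1 also changes what we believe about Z, whereas do(X = 1) cuts the Z to X edge and averages over Z's prior (the backdoor adjustment). All probabilities and names are hypothetical, and the code is plain Python rather than the CausalNex API.

```python
# Hypothetical network: Z -> X, Z -> Y, X -> Y, all variables binary.
p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
p_y1_given_xz = {                           # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.2, (0, 1): 0.6,
    (1, 0): 0.5, (1, 1): 0.9,
}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]


def observe(x):
    """P(Y = 1 | X = x): conditioning, so Z is re-weighted by how likely it makes x."""
    num = sum(p_z[z] * p_x_given_z(x, z) * p_y1_given_xz[(x, z)] for z in p_z)
    den = sum(p_z[z] * p_x_given_z(x, z) for z in p_z)
    return num / den


def intervene(x):
    """P(Y = 1 | do(X = x)): Z keeps its prior because X no longer depends on it."""
    return sum(p_z[z] * p_y1_given_xz[(x, z)] for z in p_z)


print(round(observe(1), 2), round(intervene(1), 2))
```

With these numbers, observing X = 1 gives 0.82 while intervening gives 0.70; the gap is the part of the association that comes from the confounder rather than from X itself.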
How do I install CausalNex?
CausalNex is a Python package. To install it, simply run:
pip install causalnex
Use the all extra for a full installation of dependencies:
pip install "causalnex[all]"
See more detailed installation instructions, including how to set up Python virtual environments, in our installation guide, and get started with our tutorial.
How do I use CausalNex?
You can find the documentation for the latest stable release here. It explains:
- An end-to-end tutorial on how to use CausalNex
- The main concepts and methods in using Bayesian Networks for Causal Inference
Note: You can find the notebook and markdown files used to build the docs in docs/source.
Can I contribute?
Yes! We'd love you to join us and help us build CausalNex. Check out our contributing documentation.
How do I upgrade CausalNex?
We use SemVer for versioning. The best way to upgrade safely is to check our release notes for any notable breaking changes.
How do I cite CausalNex?
You may click "Cite this repository" under the "About" section of this repository to get the citation information in APA and BibTeX formats.
What licence do you use?
See our LICENSE for more detail.
We're hiring!
Do you want to be part of the team that builds CausalNex and other great products at QuantumBlack? If so, you're in luck! QuantumBlack is currently hiring Machine Learning Engineers who love using data to drive their decisions. Take a look at our open positions and see if you're a fit.