deepchecks

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

3,830

272

3,830

253


Top Related Projects

alibi

2,540

Algorithms for explaining machine learning models

responsible-ai-toolbox

1,593

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

fairlearn

2,107

A Python package to assess and improve fairness of machine learning models.

AIF360

2,630

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.

xai

1,180

XAI - An eXplainability toolbox for machine learning

interpret

6,630

Fit interpretable models. Explain blackbox machine learning.

Quick Overview

Deepchecks is an open-source Python library for testing and validating machine learning models and data. It provides a comprehensive suite of tests to detect issues in data quality, model performance, and model behavior across various stages of the ML lifecycle, from data validation to production monitoring.

Pros

Comprehensive testing: Covers a wide range of checks for data integrity, model performance, and model behavior
Easy integration: Can be easily incorporated into existing ML pipelines and workflows
Customizable: Allows users to create custom checks and suites tailored to specific use cases
Supports multiple ML frameworks: Compatible with popular libraries like scikit-learn, TensorFlow, and PyTorch

Cons

Learning curve: May require some time to understand and effectively utilize all available checks
Performance overhead: Running extensive checks might impact processing time for large datasets
Limited to Python: Not directly usable with other programming languages or environments
Evolving project: As an active open-source project, it may undergo frequent changes and updates

Code Examples

Basic data integrity check:

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import DataDuplicates

# Load your data into a Deepchecks Dataset
dataset = Dataset(df, label='target_column')

# Run the DataDuplicates check
check = DataDuplicates()
result = check.run(dataset)
result.show()
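
To make concrete what a duplicate-sample check measures, the following library-free sketch computes the share of rows that repeat an earlier, identical row. It is an illustration only: the function name `duplicate_ratio` is hypothetical, and the value that DataDuplicates reports may be defined differently.

```python
from collections import Counter


def duplicate_ratio(rows):
    """Return the fraction of rows that repeat an earlier, identical row."""
    if not rows:
        return 0.0
    counts = Counter(tuple(row) for row in rows)
    # Every occurrence beyond the first of each distinct row is a duplicate.
    n_duplicates = sum(count - 1 for count in counts.values())
    return n_duplicates / len(rows)


rows = [
    (1, "a"),
    (2, "b"),
    (1, "a"),  # repeats the first row
    (3, "c"),
    (1, "a"),  # repeats the first row again
]
print(duplicate_ratio(rows))  # 0.4
```

A ratio near zero suggests the data is mostly unique; a high ratio is usually worth investigating before training.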

Model performance check:

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import TrainTestPerformance

# Wrap the train and test DataFrames; `model` is an already fitted estimator
train_dataset = Dataset(train_df, label='target_column')
test_dataset = Dataset(test_df, label='target_column')

# Run the TrainTestPerformance check (named PerformanceReport in older releases)
check = TrainTestPerformance()
result = check.run(train_dataset, test_dataset, model)
result.show()

Custom check creation:

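from deepchecks.core import CheckResult
from deepchecks.tabular import Context, SingleDatasetCheck

class MyCustomCheck(SingleDatasetCheck):
    """Report the number of samples in the dataset."""

    def run_logic(self, context: Context, dataset_kind) -> CheckResult:
        # Fetch the dataset (train or test) that this run applies to
        dataset = context.get_data_by_kind(dataset_kind)
        # Implement your custom logic here, then return a CheckResult
        n_samples = dataset.n_samples
        return CheckResult(value=n_samples, display=f'Dataset has {n_samples} samples')

# Use the custom check
custom_check = MyCustomCheck()
result = custom_check.run(dataset)
result.show()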

Getting Started

To get started with Deepchecks, follow these steps:

Install the library:
```
pip install deepchecks
```

Import and use Deepchecks in your Python script:

from deepchecks.tabular import Dataset, Suite
from deepchecks.tabular.checks import DataDuplicates, FeatureFeatureCorrelation

# Load your data
dataset = Dataset(df, label='target_column')

# Create a suite of checks (the first argument is the suite's name)
suite = Suite(
    'My Data Suite',
    DataDuplicates(),
    FeatureFeatureCorrelation()
)

# Run the suite
result = suite.run(dataset)
result.show()

This basic example demonstrates how to create a dataset, define a suite of checks, and run them on your data.

Competitor Comparisons

alibi

2,540

Algorithms for explaining machine learning models

Pros of Alibi

Focuses on model interpretability and explanation techniques
Provides advanced algorithms for counterfactual explanations and anchor explanations
Supports a wider range of model types, including black-box models

Cons of Alibi

Less comprehensive in terms of data validation and model performance checks
May require more expertise to use effectively due to its focus on advanced explanation techniques
Smaller community and fewer contributors compared to Deepchecks

Code Comparison

Alibi example (counterfactual explanation):

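from alibi.explainers import Counterfactual

# predict_fn returns class probabilities; shape includes the batch dimension
instance = X[0:1]
explainer = Counterfactual(predict_fn, shape=instance.shape)
explanation = explainer.explain(instance)
print(explanation.cf)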

Deepchecks example (data integrity check):

from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import data_integrity

ds = Dataset(df, label='target')
suite = data_integrity()
result = suite.run(ds)
result.show()

Both libraries offer valuable tools for machine learning practitioners, but they serve different purposes. Alibi excels in model interpretability and explanation, while Deepchecks focuses on comprehensive data and model validation checks throughout the ML lifecycle.

responsible-ai-toolbox

1,593

Pros of Responsible AI Toolbox

Comprehensive suite of tools for responsible AI, including interpretability, fairness, and error analysis
Strong integration with Azure Machine Learning and other Microsoft services
Extensive documentation and tutorials for ease of use

Cons of Responsible AI Toolbox

Primarily focused on tabular data, with limited support for other data types
Steeper learning curve due to its broad scope and integration with Microsoft ecosystem
Less frequent updates compared to Deepchecks

Code Comparison

Responsible AI Toolbox:

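from raiwidgets import ResponsibleAIDashboard
from responsibleai import RAIInsights

# train_df and test_df are DataFrames that include the 'target' column
rai_insights = RAIInsights(model, train_df, test_df, 'target', 'classification',
                           categorical_features=['category'])
rai_insights.compute()
ResponsibleAIDashboard(rai_insights)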

Deepchecks:

from deepchecks.tabular import Dataset, Suite
from deepchecks.tabular.checks import DataDuplicates
dataset = Dataset(df, label='target', cat_features=['category'])
suite = Suite('Data Validation', DataDuplicates())
suite.run(dataset)

Both libraries offer comprehensive tools for responsible AI, but Responsible AI Toolbox is more tightly integrated with Microsoft services and offers a broader range of features. Deepchecks, on the other hand, is more focused on data validation and model evaluation, with a simpler API and faster iteration cycles. The choice between them depends on specific project requirements and ecosystem preferences.

fairlearn

2,107

A Python package to assess and improve fairness of machine learning models.

Pros of Fairlearn

Focused specifically on fairness and bias mitigation in machine learning
Provides a comprehensive set of fairness metrics and algorithms
Integrates well with popular ML frameworks like scikit-learn

Cons of Fairlearn

Limited scope compared to Deepchecks' broader testing capabilities
Less emphasis on data validation and integrity checks
Smaller community and fewer contributors

Code Comparison

Fairlearn example:

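from fairlearn.metrics import demographic_parity_difference
from fairlearn.reductions import DemographicParity, ExponentiatedGradient

# DemographicParity is a constraint; ExponentiatedGradient applies it to an unfitted estimator
mitigator = ExponentiatedGradient(estimator, constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=A)
y_pred_mitigated = mitigator.predict(X)
print(demographic_parity_difference(y, y_pred_mitigated, sensitive_features=A))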

Deepchecks example:

from deepchecks.tabular import Dataset, Suite
from deepchecks.tabular.checks import FeatureDrift

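train_dataset = Dataset(train_df, label='target', cat_features=['category'])
test_dataset = Dataset(test_df, label='target', cat_features=['category'])
suite = Suite('Drift Suite', FeatureDrift())
result = suite.run(train_dataset=train_dataset, test_dataset=test_dataset)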

Both libraries offer valuable tools for ensuring responsible AI development. Fairlearn excels in fairness-specific metrics and algorithms, while Deepchecks provides a broader range of data and model validation checks. The choice between them depends on the specific needs of the project and the depth of fairness analysis required.

AIF360

2,630

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.

Pros of AIF360

Comprehensive toolkit for bias detection and mitigation across the entire machine learning pipeline
Extensive documentation and educational resources, including tutorials and real-world use cases
Strong focus on fairness metrics and algorithms specifically designed for bias mitigation

Cons of AIF360

Steeper learning curve due to its more complex architecture and specialized algorithms
Less emphasis on general model performance and data quality checks compared to Deepchecks
Primarily focused on tabular data, with limited support for other data types

Code Comparison

AIF360:

from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

dataset = BinaryLabelDataset(df=df, label_names=['label'], protected_attribute_names=['race'])
metric = BinaryLabelDatasetMetric(dataset, unprivileged_groups=[{'race': 0}], privileged_groups=[{'race': 1}])
print(metric.statistical_parity_difference())
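
For readers unfamiliar with the metric printed above, statistical parity difference is the favorable-outcome rate of the unprivileged group minus that of the privileged group. The sketch below computes it from plain lists. The function and its parameters are illustrative and not part of AIF360.

```python
def statistical_parity_difference(labels, groups, unprivileged, privileged, favorable=1):
    """P(label = favorable | unprivileged) - P(label = favorable | privileged)."""

    def favorable_rate(group_value):
        group_labels = [y for y, g in zip(labels, groups) if g == group_value]
        if not group_labels:
            raise ValueError(f"no samples for group {group_value!r}")
        return sum(1 for y in group_labels if y == favorable) / len(group_labels)

    return favorable_rate(unprivileged) - favorable_rate(privileged)


labels = [1, 0, 0, 0, 1, 1, 1, 0]
race = [0, 0, 0, 0, 1, 1, 1, 1]
print(statistical_parity_difference(labels, race, unprivileged=0, privileged=1))  # -0.5
```

A value of 0 means both groups receive the favorable label at the same rate; a negative value means the unprivileged group receives it less often.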

Deepchecks:

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import FeatureDrift

ds = Dataset(df, label='label', cat_features=['race'])
ds_test = Dataset(df_test, label='label', cat_features=['race'])
check = FeatureDrift()
result = check.run(train_dataset=ds, test_dataset=ds_test)
result.show()
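
A drift check compares how a feature is distributed in the training data versus the test data. As a rough illustration, the sketch below computes the Population Stability Index (PSI), one common drift score for categorical features. It is a stand-in chosen for simplicity: the statistic that FeatureDrift actually uses is not stated here, and the function name and `epsilon` floor are assumptions.

```python
import math
from collections import Counter


def population_stability_index(train_values, test_values, epsilon=1e-6):
    """PSI between two samples of a categorical feature (0.0 means identical)."""
    if not train_values or not test_values:
        raise ValueError("both samples must be non-empty")
    train_counts = Counter(train_values)
    test_counts = Counter(test_values)
    psi = 0.0
    for category in set(train_counts) | set(test_counts):
        # Floor each proportion at epsilon so that a category missing from
        # one sample does not cause log(0) or a division by zero.
        p_train = max(train_counts[category] / len(train_values), epsilon)
        p_test = max(test_counts[category] / len(test_values), epsilon)
        psi += (p_test - p_train) * math.log(p_test / p_train)
    return psi


train = ["a"] * 50 + ["b"] * 50
shifted = ["a"] * 90 + ["b"] * 10
print(population_stability_index(train, train))               # 0.0
print(round(population_stability_index(train, shifted), 2))   # 0.88
```

Larger values indicate a bigger shift between the two samples; what threshold counts as problematic drift is a judgment call for each project.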

xai

1,180

XAI - An eXplainability toolbox for machine learning

Pros of xai

Focuses specifically on ethical AI and responsible machine learning
Provides tools for bias detection and mitigation
Includes a comprehensive framework for explainable AI

Cons of xai

Less active development and community support
More limited in scope compared to deepchecks' broader testing capabilities
Fewer integrations with popular ML frameworks

Code Comparison

xai:

import xai

# Plot class imbalance for a protected column, then feature correlations
xai.imbalance_plot(df, "gender", categorical_cols=categorical_cols)
xai.correlations(df, include_categorical=True)

deepchecks:

from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import full_suite

train_ds = Dataset(train_df, label='target', cat_features=['category'])
test_ds = Dataset(test_df, label='target', cat_features=['category'])
suite = full_suite()
result = suite.run(train_dataset=train_ds, test_dataset=test_ds)

xai focuses on explainability and ethical considerations, providing tools specifically for these purposes. deepchecks offers a more comprehensive suite of tests for data and model validation, covering a broader range of potential issues. While xai is tailored for responsible AI practices, deepchecks provides a more general-purpose testing framework for machine learning pipelines.

interpret

6,630

Fit interpretable models. Explain blackbox machine learning.

Pros of Interpret

Broader scope of interpretability techniques, including global and local explanations
More extensive documentation and tutorials
Stronger focus on model-agnostic interpretability methods

Cons of Interpret

Less emphasis on data validation and integrity checks
Fewer automated testing and monitoring features
May require more manual configuration for specific use cases

Code Comparison

Interpret example:

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())

from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)

ebm_global = ebm.explain_global()
ebm_global.visualize()

Deepchecks example:

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import FeatureDrift
import pandas as pd

train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

train_ds = Dataset(train_df, label='target')
test_ds = Dataset(test_df, label='target')

check = FeatureDrift()
result = check.run(train_dataset=train_ds, test_dataset=test_ds)
result.show()

Both libraries offer valuable tools for model interpretation and validation, but they focus on different aspects of the machine learning pipeline. Interpret provides a wider range of interpretability techniques, while Deepchecks emphasizes data validation and model monitoring. The choice between them depends on specific project requirements and the stage of the ML lifecycle being addressed.


README

Deepchecks - Continuous Validation for AI & ML: Testing, CI & Monitoring

Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

Join Slack | Documentation | Blog | Twitter

Components

Deepchecks includes:

Deepchecks Testing (Quickstart, docs):
- Running built-in & your own custom Checks and Suites for Tabular, NLP & CV validation (open source).
CI & Testing Management (Quickstart, docs):
- Collaborating over test results and iterating efficiently until the model is production-ready and can be deployed (open source & managed offering).
Deepchecks Monitoring (Quickstart, docs):
- Tracking and validating your deployed model's behavior in production (open source & managed offering).

This repo is our main repo as all components use the deepchecks checks in their core. See the Getting Started section for more information about installation and quickstarts for each of the components. If you want to see deepchecks monitoring's code, you can check out the deepchecks/monitoring repo.

Installation

Deepchecks Testing (and CI) Installation

pip install deepchecks -U --user

For installing the nlp / vision submodules or with conda:

For NLP: Replace deepchecks with "deepchecks[nlp]", and optionally also install "deepchecks[nlp-properties]".
For Computer Vision: Replace deepchecks with "deepchecks[vision]".
For installing with conda, similarly use: conda install -c conda-forge deepchecks.

Check out the full installation instructions for deepchecks testing here.

Deepchecks Monitoring Installation

To use deepchecks for production monitoring, you can either use our SaaS service, or deploy a local instance in one line on Linux/MacOS (Windows is WIP!) with Docker. Create a new directory for the installation files, open a terminal within that directory and run the following:

pip install deepchecks-installer
deepchecks-installer install-monitoring

This will automatically download the necessary dependencies, run the installation process and then start the application locally.

The installation will take a few minutes. Then you can open the deployment url (default is http://localhost), and start the system onboarding. Check out the full monitoring open source installation & quickstart.

Note that the open source product is built such that each deployment supports monitoring of a single model.

Quickstarts

Deepchecks Testing Quickstart

Jump right into the respective quickstart docs (Tabular, NLP, or Vision) to have it up and running on your data.

Inside the quickstarts, you'll see how to create the relevant deepchecks object for holding your data and metadata (Dataset, TextData or VisionData, corresponding to the data type), and run a Suite or Check. The code snippet for running it will look something like the following, depending on the chosen Suite or Check.

from deepchecks.tabular.suites import model_evaluation
suite = model_evaluation()
suite_result = suite.run(train_dataset=train_dataset, test_dataset=test_dataset, model=model)
suite_result.save_as_html() # replace this with suite_result.show() or suite_result.show_in_window() to see results inline or in window
# or suite_result.results[0].value with the relevant check index to process the check result's values in python

The output will be a report that enables you to inspect the status and results of the chosen checks.

Deepchecks Monitoring Quickstart

Jump right into the open source monitoring quickstart docs to have it up and running on your data. You'll then be able to see the check results over time, set alerts, and interact with the dynamic deepchecks UI.

Deepchecks CI & Testing Management Quickstart

Deepchecks managed CI & Testing management is currently in closed preview. Book a demo for more information about the offering.

For building and maintaining your own CI process while utilizing Deepchecks Testing for it, check out our docs for Using Deepchecks in CI/CD.
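
When wiring validation into your own CI process, the usual pattern is to turn check outcomes into a process exit code so the pipeline can block a merge or deployment. The sketch below shows that pattern in plain Python. The status strings, the `ci_exit_code` function, and the dictionary layout are all hypothetical; they are not Deepchecks constants or output formats, so you would map the real results of your validation step into this shape yourself.

```python
import sys

# Hypothetical status strings; they are not Deepchecks constants.
PASS, WARN, FAIL = "pass", "warn", "fail"


def ci_exit_code(statuses, fail_on_warning=False):
    """Return 0 if the build may proceed, 1 if it should be blocked."""
    blocking = {FAIL, WARN} if fail_on_warning else {FAIL}
    failed = sorted(name for name, status in statuses.items() if status in blocking)
    for name in failed:
        # Report each blocking check on stderr so it shows up in CI logs.
        print(f"blocking check: {name}", file=sys.stderr)
    return 1 if failed else 0


example = {"data duplicates": PASS, "feature drift": WARN}
print(ci_exit_code(example))                        # 0
print(ci_exit_code(example, fail_on_warning=True))  # 1
```

In a real CI job, the returned value would be passed to `sys.exit` as the last step of the validation script.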

How does it work?

At its core, deepchecks includes a wide variety of built-in Checks, for testing all types of data and model related issues. These checks are implemented for various models and data types (Tabular, NLP, Vision), and can easily be customized and expanded.

The check results can be used to automatically make informed decisions about your model's production-readiness, and for monitoring it over time in production. The check results can be examined with visual reports (by saving them to an HTML file, or seeing them in Jupyter), processed with code (using their pythonic / json output), and inspected and collaborated on with Deepchecks' dynamic UI (for examining test results and for production monitoring).

Deepchecks' Core: The Checks

All of the Checks and the framework for customizing them are implemented inside the Deepchecks Testing Python package (this repo).
Each check tests for a specific potential problem. Deepchecks has many pre-implemented checks for finding issues with the model's performance (e.g. identifying weak segments), data distribution (e.g. detect drifts or leakages) and data integrity (e.g. find conflicting labels).
Customizable: each check has many configurable parameters, and custom checks can easily be implemented.
Can be run manually (during research) or triggered automatically (in CI processes or production monitoring)
Check results can be consumed by:
- Visual output report - Saving to HTML (result.save_as_html('output_report_name.html')) or viewing them in Jupyter (result.show()).
- Processing with code - with python using the check result's value attribute, or saving a JSON output
- Deepchecks' UI - for dynamic inspection and collaboration (of test results and production monitoring)
Optional conditions can be added and customized, to automatically validate check results, with a pass ✓, fail ✖ or warning ! status
An ordered list of checks (with optional conditions) can be run together in a "Suite" (and the output is a concluding report of all checks that ran)
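
The relationship between checks, conditions, and suites described above can be sketched in a few lines of plain Python. This is a conceptual model only: every class and function name below is illustrative and none of it is the Deepchecks API or implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

# All names below are illustrative, not Deepchecks API.


@dataclass
class SketchCheck:
    name: str
    compute: Callable[[List[Any]], float]
    # Each condition is a (description, predicate on the computed value) pair.
    conditions: List[Tuple[str, Callable[[float], bool]]] = field(default_factory=list)

    def add_condition(self, description, predicate):
        self.conditions.append((description, predicate))
        return self  # returning self allows chaining

    def run(self, data):
        value = self.compute(data)
        outcomes = [(desc, predicate(value)) for desc, predicate in self.conditions]
        return {"check": self.name, "value": value, "conditions": outcomes}


def run_suite(checks, data):
    """Run the checks in order and collect their results into one report."""
    return [check.run(data) for check in checks]


def missing_ratio(values):
    return sum(v is None for v in values) / len(values)


def distinct_count(values):
    return float(len(set(values)))


suite = [
    SketchCheck("missing values", missing_ratio).add_condition(
        "missing ratio <= 10%", lambda v: v <= 0.10
    ),
    SketchCheck("distinct values", distinct_count),  # no condition attached
]

for result in run_suite(suite, [1, 2, None, 4, 4]):
    print(result)
```

Here the first check computes a value and its condition evaluates to False (20% of values are missing), while the second check only reports a value because no condition was attached.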

Open Source vs Paid

Deepchecks' projects (deepchecks/deepchecks & deepchecks/monitoring) are open source and are released under AGPL 3.0.

The only exceptions are the Deepchecks Monitoring components (in the deepchecks/monitoring repo) under the backend/deepchecks_monitoring/ee directory, which are subject to a commercial license (see the license here). That directory isn't used by default, and is packaged as part of the deepchecks monitoring repository simply to support upgrading to the commercial edition without downtime.

Enabling premium features (contained in the backend/deepchecks_monitoring/ee directory) with a self-hosted instance requires a Deepchecks license. To learn more, book a demo or see our pricing page.

Looking for a 100% open-source solution for Deepchecks monitoring? Check out the Monitoring OSS repository, which is purged of all proprietary code and features.

Community, Contributing, Docs & Support

Deepchecks is an open source solution. We are committed to a transparent development process and highly appreciate any contributions. Whether you are helping us fix bugs, propose new features, improve our documentation or spread the word, we would love to have you as part of our community.

Give us a GitHub star at the top of this page to support what we're doing; it means a lot for open source projects!
Read our docs for more info about how to use and customize deepchecks, and for step-by-step tutorials.
Post a GitHub issue to submit a bug report, feature request, or suggest an improvement.
To contribute to the package, check out our first good issues and contribution guidelines, and open a PR.

Join our Slack to give us feedback, connect with the maintainers and fellow users, ask questions, get help for package usage or contributions, or engage in discussions about ML testing!

Contributors

Thanks go to these wonderful people:

Itay Gabbay, matanper, JKL98ISR, Yurii Romanyshyn, Noam Bressler, Nir Hutnik, Nadav-Barak
Sol, DanArlowski, DBI, OrlyShmorly, shir22, yaronzo1, ptannor
avitzd, DanBasson, S.Kishore, Shay Palachy-Affek, Cemal GURPINAR, David de la Iglesia Castro, Levi Bard
Julien Schuermans, Nir Ben-Zvi, Shiv Shankar Dayal, RonItay, Jeroen Van Goey, idow09, Ikko Ashimine
Jason Wohlgemuth, Lokin Sethia, Ingo Marquart, Oscar, Richard W, Bernardo, Olivier Binette
陈鼎彦, Andres Vargas, Michael Marien, OrdoAbChao, Matt Chan, Harsh Jain, arterm-sedov
AIT ALI YAHIA Rayane, Chris Santiago