evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline, from tabular data to Gen AI, with 100+ metrics.

Top Related Projects

  • Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.
  • MLflow: Open source platform for the machine learning lifecycle
  • Alibi Detect: Algorithms for outlier, adversarial and drift detection
  • SHAP: A game theoretic approach to explain the output of any machine learning model.
  • Interpret: Fit interpretable models. Explain blackbox machine learning.
  • AIX360: Interpretability and explainability of data and machine learning models

Quick Overview

Evidently AI is an open-source library that provides a set of tools for monitoring and evaluating machine learning models in production. It helps data scientists and machine learning engineers to automatically generate reports and dashboards to monitor model performance, data drift, and other key metrics.

Pros

  • Comprehensive Monitoring: Evidently AI offers a wide range of monitoring capabilities, including model performance, data drift, and data quality checks.
  • Automated Reporting: The library can automatically generate detailed reports and dashboards, saving time and effort for data teams.
  • Customizable Checks: Users can define custom checks and metrics to suit their specific needs.
  • Easy Integration: Evidently AI can be easily integrated into existing machine learning workflows and pipelines.

Cons

  • Python-Centric Usage: Evidently is used through its Python package and the accompanying monitoring UI, which may not suit teams working outside the Python ecosystem.
  • Steep Learning Curve: Evidently AI has a relatively complex API and may require some time to get familiar with.
  • Dependency on Other Libraries: The library relies on several other Python packages, which can increase the complexity of the setup process.
  • Limited Community Support: Compared to some other popular machine learning libraries, Evidently AI has a smaller community and may have fewer resources available.

Code Examples

Here are a few examples of how to use Evidently AI:

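  1. Generating a Model Performance Report:
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import ClassificationPreset

# ref_data and prod_data are pandas DataFrames with "target" and "prediction" columns
column_mapping = ColumnMapping(target="target", prediction="prediction")

report = Report(metrics=[ClassificationPreset()])
report.run(reference_data=ref_data, current_data=prod_data, column_mapping=column_mapping)
report.save_html("performance_report.html")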

This code generates a comprehensive report on the performance of a machine learning model, including metrics like accuracy, precision, recall, and F1-score.
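To make the metrics named above concrete, here is a minimal pure-Python sketch of accuracy, precision, recall, and F1 for a binary classifier. It is an illustration only, not Evidently's implementation; the function name `classification_metrics` and the sample labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    tp = pairs[(positive, positive)]
    fp = sum(n for (t, p), n in pairs.items() if p == positive and t != positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in pairs.items() if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Comparing these values between the reference and production data is what a performance report automates.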

  2. Detecting Data Drift:
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# X_ref and X_prod are pandas DataFrames with the same columns
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=X_ref, current_data=X_prod)
report.save_html("data_drift_report.html")

This code generates a report on the data drift between the reference and production datasets, which can help identify potential issues with model performance.
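As a rough illustration of what a drift check on one numerical column involves, the sketch below computes the two-sample Kolmogorov-Smirnov statistic in pure Python. This is one of many possible tests and is not claimed to be what Evidently selects for a given column; `ks_statistic`, `DRIFT_THRESHOLD`, and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(current)
    points = sorted(set(ref) | set(cur))
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in points
    )


reference = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
similar = [1.1, 1.3, 1.5, 1.7, 1.9, 2.1]   # roughly the same distribution
shifted = [3.0, 3.2, 3.4, 3.6, 3.8, 4.0]   # clearly moved to the right

DRIFT_THRESHOLD = 0.5  # hypothetical cut-off chosen for this illustration

for name, sample in [("similar", similar), ("shifted", shifted)]:
    stat = ks_statistic(reference, sample)
    print(name, round(stat, 3), "drift" if stat > DRIFT_THRESHOLD else "no drift")
```

In practice a p-value derived from the statistic and the sample sizes is usually compared against a significance level rather than thresholding the raw statistic.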

  3. Customizing Checks:
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnValueMean, TestShareOfDriftedColumns

# "feature1" and the thresholds are placeholders; adjust them to your data
suite = TestSuite(tests=[
    TestColumnValueMean(column_name="feature1", gte=0.9, lte=1.1),
    TestShareOfDriftedColumns(lt=0.1),
])
suite.run(reference_data=X_ref, current_data=X_prod)
suite.save_html("custom_report.html")

This code defines custom test conditions in a Test Suite: the mean of one column must stay within a fixed range, and fewer than 10% of columns may show drift between the reference and production datasets.

Getting Started

To get started with Evidently AI, follow these steps:

  1. Install the library using pip:
pip install evidently
  2. Import the necessary modules and load your reference and production datasets:
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import ClassificationPreset, DataDriftPreset

# get_reference_data() and get_production_data() are placeholders for your own loaders;
# each should return a pandas DataFrame with "target" and "prediction" columns
ref_data = get_reference_data()
prod_data = get_production_data()
  3. Generate a report and save it to an HTML file:
report = Report(metrics=[ClassificationPreset(), DataDriftPreset()])
report.run(reference_data=ref_data, current_data=prod_data,
           column_mapping=ColumnMapping(target="target", prediction="prediction"))
report.save_html("report.html")
  4. Open the generated HTML report in a web browser to view the results.

That's it! You can now start using Evidently AI to monitor and evaluate your machine learning models.

Competitor Comparisons

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

Pros of Responsible AI Toolbox

  • Comprehensive suite of tools for responsible AI, including interpretability, fairness, and error analysis
  • Integrates well with Azure Machine Learning and other Microsoft services
  • Offers both GUI and programmatic interfaces for accessibility

Cons of Responsible AI Toolbox

  • Primarily focused on tabular data, with limited support for other data types
  • Steeper learning curve due to its extensive feature set
  • Requires more setup and configuration compared to Evidently

Code Comparison

Responsible AI Toolbox:

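from responsibleai import RAIInsights
from raiwidgets import ResponsibleAIDashboard

# train_df and test_df are pandas DataFrames that include the 'target' column
rai_insights = RAIInsights(model, train_df, test_df, target_column='target',
                           task_type='classification',
                           categorical_features=['category'])
ResponsibleAIDashboard(rai_insights)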

Evidently:

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import ClassificationPreset

report = Report(metrics=[ClassificationPreset()])
report.run(reference_data=ref_data, current_data=current_data,
           column_mapping=ColumnMapping(target='target', prediction='prediction',
                                        numerical_features=['feature1']))

Both tools offer powerful capabilities for analyzing and improving AI models, but they cater to different use cases. Responsible AI Toolbox provides a more comprehensive suite of tools with tighter integration into the Microsoft ecosystem, while Evidently offers a more lightweight and flexible approach to model monitoring and analysis.

Open source platform for the machine learning lifecycle

Pros of MLflow

  • Comprehensive end-to-end ML lifecycle management, including experiment tracking, model packaging, and deployment
  • Integrates well with various ML frameworks and tools, offering a unified platform for diverse ML workflows
  • Provides a user-friendly UI for experiment tracking and model comparison

Cons of MLflow

  • Steeper learning curve due to its broader scope and feature set
  • May be overkill for smaller projects or teams focused primarily on model monitoring

Code Comparison

MLflow:

import mlflow

mlflow.start_run()
mlflow.log_param("param1", value1)
mlflow.log_metric("metric1", value2)
mlflow.end_run()

Evidently:

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref_data, current_data=curr_data, column_mapping=column_mapping)

MLflow offers a more general-purpose logging API for tracking experiments, while Evidently focuses specifically on generating data quality and model performance reports. MLflow is better suited for managing the entire ML lifecycle, whereas Evidently excels in detailed model monitoring and data drift detection.

Algorithms for outlier, adversarial and drift detection

Pros of Alibi Detect

  • Focuses on drift detection and outlier detection in machine learning models
  • Provides advanced algorithms for detecting concept drift and data drift
  • Supports both batch and online drift detection scenarios

Cons of Alibi Detect

  • Steeper learning curve due to more complex algorithms and concepts
  • Less emphasis on data quality and model performance monitoring
  • Requires more setup and configuration for basic use cases

Code Comparison

Alibi Detect:

from alibi_detect.cd import TabularDrift

cd = TabularDrift(X_ref, p_val=.05, categories_per_feature=categories_per_feature)
preds = cd.predict(X)

Evidently:

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=X_ref, current_data=X, column_mapping=column_mapping)

Both libraries offer drift detection capabilities, but Alibi Detect provides more advanced algorithms and configurations, while Evidently focuses on simplicity and ease of use for basic monitoring tasks.

A game theoretic approach to explain the output of any machine learning model.

Pros of SHAP

  • Focuses on model interpretability with advanced techniques like Shapley values
  • Provides detailed feature importance and impact analysis
  • Supports a wide range of machine learning models and frameworks

Cons of SHAP

  • Primarily centered on model explanation, not comprehensive ML monitoring
  • Can be computationally intensive for large datasets or complex models
  • Steeper learning curve for users new to Shapley values and model interpretation

Code Comparison

SHAP example:

import shap
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)

Evidently example:

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset

report = Report(metrics=[DataDriftPreset(), TargetDriftPreset()])
report.run(reference_data=reference_data, current_data=production_data, column_mapping=column_mapping)
report.save_html("drift_report.html")

SHAP excels in detailed model interpretation, while Evidently offers a broader suite of ML monitoring tools, including data drift detection and model performance tracking. SHAP is ideal for in-depth feature analysis, whereas Evidently provides a more comprehensive approach to ML model monitoring and reporting.

Fit interpretable models. Explain blackbox machine learning.

Pros of Interpret

  • Broader scope of interpretability techniques, including global and local explanations
  • Supports a wider range of machine learning models, including tree-based models and neural networks
  • Offers interactive visualizations for exploring model behavior

Cons of Interpret

  • Steeper learning curve due to its more comprehensive feature set
  • Less focused on data drift and model monitoring compared to Evidently
  • May require more computational resources for complex models and large datasets

Code Comparison

Interpret:

from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)
ebm_global = ebm.explain_global()
show(ebm_global)

Evidently:

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current, column_mapping=column_mapping)
report.save_html("data_drift_report.html")

Both libraries offer valuable tools for model interpretation and monitoring, with Interpret providing a more comprehensive set of interpretability techniques, while Evidently focuses on data drift and model performance monitoring.

Interpretability and explainability of data and machine learning models

Pros of AIX360

  • Comprehensive suite of explainability algorithms for various AI models
  • Supports both local and global explanations for model interpretability
  • Includes educational resources and tutorials for understanding AI explainability

Cons of AIX360

  • Steeper learning curve due to its broader scope and complexity
  • Less focused on data drift and model performance monitoring
  • Requires more setup and configuration for specific use cases

Code Comparison

AIX360:

from aix360.algorithms.protodash import ProtodashExplainer

explainer = ProtodashExplainer()
# Select m=5 prototypes from X_test that best represent X_train
weights, prototype_indices, _ = explainer.explain(X_train, X_test, m=5)

Evidently:

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref_data, current_data=cur_data, column_mapping=column_mapping)

AIX360 focuses on generating explanations for AI models, while Evidently specializes in data and model monitoring, including drift detection. AIX360 offers a wider range of explainability techniques, but Evidently provides more straightforward tools for ongoing model performance evaluation and data quality checks.

README

Evidently

An open-source framework to evaluate, test and monitor ML and LLM-powered systems.

Documentation | Discord Community | Blog | Twitter | Evidently Cloud

:new: New release

Evidently 0.4.25. LLM evaluation -> Tutorial

:bar_chart: What is Evidently?

Evidently is an open-source Python library for ML and LLM evaluation and observability. It helps evaluate, test, and monitor AI-powered systems and data pipelines from experimentation to production. 

  • 🔡 Works with tabular, text data, and embeddings.
  • ✨ Supports predictive and generative systems, from classification to RAG.
  • 📚 100+ built-in metrics from data drift detection to LLM judges.
  • 🛠️ Python interface for custom metrics and tests. 
  • 🚦 Both offline evals and live monitoring.
  • 💻 Open architecture: easily export data and integrate with existing tools. 

Evidently is very modular. You can start with one-off evaluations using Reports or Test Suites in Python or get a real-time monitoring Dashboard service.

1. Reports

Reports compute various data, ML and LLM quality metrics. You can start with Presets or customize.

  • Out-of-the-box interactive visuals.
  • Best for exploratory analysis and debugging.
  • Get results in Python, export as JSON, Python dictionary, HTML, DataFrame, or view in monitoring UI.

2. Test Suites

Test Suites check for defined conditions on metric values and return a pass or fail result.

  • Best for regression testing, CI/CD checks, or data validation pipelines.
  • Zero setup option: auto-generate test conditions from the reference dataset.
  • Simple syntax to set custom test conditions as gt (greater than), lt (less than), etc.
  • Get results in Python, export as JSON, Python dictionary, HTML, DataFrame, or view in monitoring UI.
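The pass/fail idea behind test conditions such as gt and lt can be sketched in a few lines of plain Python. This is a conceptual illustration, not Evidently's API: `check_metric`, the `CONDITIONS` table, the status strings, and the metric values are all hypothetical.

```python
import operator

# Hypothetical condition names in the gt/lt style described above
CONDITIONS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
}


def check_metric(name, value, **conditions):
    """Return a pass/fail record for one metric value against its conditions."""
    failed = [
        f"{op} {bound}"
        for op, bound in conditions.items()
        if not CONDITIONS[op](value, bound)
    ]
    return {
        "test": name,
        "value": value,
        "status": "FAIL" if failed else "SUCCESS",
        "failed_conditions": failed,
    }


results = [
    check_metric("accuracy", 0.91, gte=0.85),
    check_metric("share_of_missing_values", 0.12, lt=0.05),
]
suite_passed = all(r["status"] == "SUCCESS" for r in results)

for r in results:
    print(r["test"], r["status"], r["failed_conditions"])
print("suite passed:", suite_passed)
```

A single failing test fails the whole suite, which is what makes this pattern convenient for CI/CD gates.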

3. Monitoring Dashboard

The monitoring UI service helps visualize metrics and test results over time.

You can choose between self-hosting the open-source version and using Evidently Cloud. Evidently Cloud offers a generous free tier and extra features like user management, alerting, and no-code evals.


:woman_technologist: Install Evidently

Evidently is available as a PyPI package. To install it using pip package manager, run:

pip install evidently

To install Evidently using conda installer, run:

conda install -c conda-forge evidently

:arrow_forward: Getting started

Option 1: Test Suites

This is a simple Hello World. Check the Tutorials for more: Tabular data or LLM evaluation.

Import the Test Suite, evaluation Preset and toy tabular dataset.

import pandas as pd

from sklearn import datasets

from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

iris_data = datasets.load_iris(as_frame=True)
iris_frame = iris_data.frame

Split the DataFrame into reference and current. Run the Data Stability Test Suite that will automatically generate checks on column value ranges, missing values, etc. from the reference. Get the output in Jupyter notebook:

data_stability = TestSuite(tests=[
    DataStabilityTestPreset(),
])
data_stability.run(current_data=iris_frame.iloc[:60], reference_data=iris_frame.iloc[60:], column_mapping=None)
data_stability

You can also save an HTML file. You'll need to open it from the destination folder.

data_stability.save_html("file.html")

To get the output as JSON:

data_stability.json()

You can choose other Presets, individual Tests and set conditions.
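To show what "generating checks from the reference" can mean, the following pure-Python sketch derives value ranges and an allowed share of missing values from reference rows, then applies them to current rows. It is a simplified assumption about the idea, not the logic of `DataStabilityTestPreset`; `derive_conditions`, `run_checks`, and the sample rows are hypothetical.

```python
def derive_conditions(reference_rows):
    """Build per-column expectations (value range, missing share) from reference data."""
    conditions = {}
    for col in reference_rows[0].keys():
        values = [row[col] for row in reference_rows]
        present = [v for v in values if v is not None]
        conditions[col] = {
            "min": min(present),
            "max": max(present),
            "max_missing_share": (len(values) - len(present)) / len(values),
        }
    return conditions


def run_checks(current_rows, conditions):
    """Return True per column when current data stays within the derived expectations."""
    results = {}
    for col, cond in conditions.items():
        values = [row[col] for row in current_rows]
        present = [v for v in values if v is not None]
        in_range = all(cond["min"] <= v <= cond["max"] for v in present)
        missing_share = (len(values) - len(present)) / len(values)
        results[col] = in_range and missing_share <= cond["max_missing_share"]
    return results


reference = [
    {"petal_length": 1.4, "petal_width": 0.2},
    {"petal_length": 4.7, "petal_width": 1.4},
    {"petal_length": 6.0, "petal_width": 2.5},
]
current = [
    {"petal_length": 5.1, "petal_width": 1.8},
    {"petal_length": 6.9, "petal_width": 2.1},  # 6.9 is outside the reference range
]

check_results = run_checks(current, derive_conditions(reference))
print(check_results)
```

A real preset would typically allow some tolerance around the reference values instead of using the exact observed minimum and maximum.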

Option 2: Reports

Import the Report, evaluation Preset and toy tabular dataset.

import pandas as pd

from sklearn import datasets

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

iris_data = datasets.load_iris(as_frame=True)
iris_frame = iris_data.frame

Run the Data Drift Report that will compare column distributions between current and reference:

data_drift_report = Report(metrics=[
    DataDriftPreset(),
])

data_drift_report.run(current_data=iris_frame.iloc[:60], reference_data=iris_frame.iloc[60:], column_mapping=None)
data_drift_report

Save the report as HTML. You'll later need to open it from the destination folder.

data_drift_report.save_html("file.html")

To get the output as JSON:

data_drift_report.json()

You can choose other Presets and individual Metrics, including LLM evaluations for text data.

Option 3: ML monitoring dashboard

This launches a demo project in the Evidently UI. Check tutorials for Self-hosting or Evidently Cloud.

Recommended step: create a virtual environment and activate it.

pip install virtualenv
virtualenv venv
source venv/bin/activate

After installing Evidently (pip install evidently), run the Evidently UI with the demo projects:

evidently ui --demo-projects all

Access the Evidently UI service in your browser at localhost:8000.

🚦 What can you evaluate?

Evidently has 100+ built-in evals. You can also add custom ones. Each metric has an optional visualization: you can use it in Reports, Test Suites, or plot on a Dashboard.

Here are examples of things you can check:

  • 🔡 Text descriptors: Length, sentiment, toxicity, language, special symbols, regular expression matches, etc.
  • 📝 LLM outputs: Semantic similarity, retrieval relevance, summarization quality, etc. with model- and LLM-based evals.
  • 🛢 Data quality: Missing values, duplicates, min-max ranges, new categorical values, correlations, etc.
  • 📊 Data distribution drift: 20+ statistical tests and distance metrics to compare shifts in data distribution.
  • 🎯 Classification: Accuracy, precision, recall, ROC AUC, confusion matrix, bias, etc.
  • 📈 Regression: MAE, ME, RMSE, error distribution, error normality, error bias, etc.
  • 🗂 Ranking (inc. RAG): NDCG, MAP, MRR, Hit Rate, etc.
  • 🛒 Recommendations: Serendipity, novelty, diversity, popularity bias, etc.
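For readers less familiar with the ranking metrics listed above, here is a minimal pure-Python sketch of Hit Rate, reciprocal rank (the per-query term of MRR), and NDCG for a single ranked list with binary relevance. It illustrates the standard definitions and is not Evidently's implementation; the function names and sample list are hypothetical.

```python
import math


def hit_rate_at_k(relevance, k):
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(relevance[:k]) else 0.0


def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance, k):
    """DCG of the top-k results divided by the DCG of the ideal ordering."""
    def dcg(scores):
        return sum(s / math.log2(i + 1) for i, s in enumerate(scores, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal else 0.0


# Binary relevance of retrieved items in ranked order (hypothetical query)
ranked = [0, 1, 0, 1]

print("hit rate@1:", hit_rate_at_k(ranked, 1))
print("hit rate@2:", hit_rate_at_k(ranked, 2))
print("reciprocal rank:", reciprocal_rank(ranked))
print("NDCG@4:", round(ndcg_at_k(ranked, 4), 4))
```

MRR is the mean of `reciprocal_rank` over all queries, and the other two metrics are likewise averaged across queries when reported for a dataset.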

:computer: Contributions

We welcome contributions! Read the Guide to learn more.

:books: Documentation

For more information, refer to the complete Documentation. You can start with the tutorials on tabular data or LLM evaluation.

See more examples in the Docs.

How-to guides

Explore the How-to guides to understand specific features in Evidently.

:white_check_mark: Discord Community

If you want to chat and connect, join our Discord community!