
auto-sklearn

Automated Machine Learning with scikit-learn

7,536 stars, 1,274 forks

Top Related Projects

  • mljar-supervised: Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
  • TPOT (9,653 stars): A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
  • NNI (13,973 stars): An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
  • AutoKeras: AutoML library for deep learning
  • PyCaret (8,800 stars): An open-source, low-code machine learning library in Python
  • h2o-3 (6,854 stars): H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Quick Overview

Auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. It automatically searches for the right learning algorithm for a given machine learning task and optimizes its hyperparameters, eliminating the need for manual algorithm selection and hyperparameter tuning.

Pros

  • Automates the machine learning pipeline, including data preprocessing, feature preprocessing, and model selection
  • Saves time and effort in the model development process
  • Often produces strong models that are competitive with, and sometimes better than, manually tuned solutions
  • Integrates seamlessly with scikit-learn, making it easy to use for those familiar with the scikit-learn API

Cons

  • Can be computationally expensive and time-consuming for large datasets or complex problems
  • May not always find the optimal solution, especially for highly specialized or domain-specific problems
  • Limited customization options compared to manual tuning
  • Requires some understanding of machine learning concepts to interpret and use results effectively

Code Examples

  1. Basic usage:
from autosklearn.classification import AutoSklearnClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

# Load data and split into train and test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Create and fit the AutoSklearnClassifier
automl = AutoSklearnClassifier(time_left_for_this_task=120, per_run_time_limit=30)
automl.fit(X_train, y_train)

# Make predictions and print the accuracy score
y_pred = automl.predict(X_test)
print(f"Accuracy score: {automl.score(X_test, y_test)}")
  2. Customizing the search space:
from autosklearn.classification import AutoSklearnClassifier

# Restrict the search to the named classifier components
automl = AutoSklearnClassifier(
    time_left_for_this_task=300,
    per_run_time_limit=60,
    include={"classifier": ["random_forest", "extra_trees"]}
)

# Fit the model (assuming X_train and y_train are defined)
automl.fit(X_train, y_train)
  3. Ensemble building:
from autosklearn.classification import AutoSklearnClassifier

# Create AutoSklearnClassifier with ensemble building
automl = AutoSklearnClassifier(
    time_left_for_this_task=300,
    per_run_time_limit=60,
    ensemble_size=50,
    ensemble_nbest=20
)

# Fit the model and print the final ensemble composition
automl.fit(X_train, y_train)
print(automl.show_models())
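
After fitting, you can inspect what the search actually found. A brief sketch (sprint_statistics() and leaderboard() are part of the auto-sklearn API in recent releases):

# Summary of the search: runs attempted, failures, best validation score
print(automl.sprint_statistics())

# Per-model breakdown of the models considered, ranked by performance
print(automl.leaderboard())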

Getting Started

To get started with auto-sklearn, follow these steps:

  1. Install auto-sklearn (note that it officially supports Linux and needs a C++ compiler and SWIG available at build time):
pip install auto-sklearn
  2. Import and use AutoSklearnClassifier or AutoSklearnRegressor:
from autosklearn.classification import AutoSklearnClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load data and split into train and test sets
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Create and fit the AutoSklearnClassifier
automl = AutoSklearnClassifier(time_left_for_this_task=120, per_run_time_limit=30)
automl.fit(X_train, y_train)

# Evaluate the model
print(f"Accuracy score: {

Competitor Comparisons

mljar-supervised: Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation

Pros of mljar-supervised

  • Easier to use with a simpler API and more intuitive interface
  • Supports both classification and regression tasks out of the box
  • Provides detailed explanations and visualizations of model performance

Cons of mljar-supervised

  • Less extensive algorithm selection compared to auto-sklearn
  • May not perform as well on complex datasets or specialized tasks
  • Smaller community and fewer contributions

Code Comparison

mljar-supervised:

from supervised import AutoML

automl = AutoML(results_path="automl_results")
automl.fit(X, y)
predictions = automl.predict(X_test)

auto-sklearn:

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)

Both libraries aim to automate the machine learning pipeline, but mljar-supervised focuses on simplicity and interpretability, while auto-sklearn offers more advanced features and a wider range of algorithms. The choice between them depends on the specific requirements of your project and your level of expertise in machine learning.


TPOT: A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Pros of TPOT

  • Uses genetic programming to optimize machine learning pipelines
  • Supports a wider range of algorithms and preprocessing steps
  • More flexible and customizable pipeline structure

Cons of TPOT

  • Generally slower than auto-sklearn due to genetic programming approach
  • Less automated hyperparameter tuning compared to auto-sklearn
  • May require more computational resources for large datasets

Code Comparison

TPOT:

from tpot import TPOTClassifier
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_pipeline.py')

auto-sklearn:

import autosklearn.classification
automl = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=120)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))

Both TPOT and auto-sklearn are powerful AutoML libraries, but they differ in their approaches. TPOT offers more flexibility and a wider range of algorithms, while auto-sklearn provides faster results and more automated hyperparameter tuning. The choice between them depends on specific project requirements, available computational resources, and desired level of customization.


NNI: An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Pros of NNI

  • Supports a wider range of ML frameworks (TensorFlow, PyTorch, Keras, etc.)
  • Offers more diverse optimization algorithms and tuning strategies
  • Provides a user-friendly web UI for experiment management and visualization

Cons of NNI

  • Steeper learning curve due to more complex configuration options
  • Less focus on automated feature engineering compared to auto-sklearn
  • May require more manual setup and customization for specific use cases

Code Comparison

NNI configuration example:

authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
trainingServicePlatform: local
searchSpacePath: search_space.json
useAnnotation: false
tuner:
  builtinTunerName: TPE
trial:
  command: python3 mnist.py
  codeDir: .
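
For context, a trial script like the mnist.py referenced above pulls its hyperparameters from NNI and reports the result back to the tuner. A minimal sketch (nni.get_next_parameter and nni.report_final_result are NNI's trial API; train_and_evaluate is a hypothetical stand-in):

import nni

def train_and_evaluate(lr: float) -> float:
    # Hypothetical stand-in for real model training; returns an accuracy
    return 1.0 - abs(lr - 0.01)

# Receive the next hyperparameter set proposed by the tuner (e.g., TPE)
params = nni.get_next_parameter()
lr = params.get("lr", 0.01)  # "lr" is an illustrative search-space key

# Report the final metric so the tuner can steer subsequent trials
nni.report_final_result(train_and_evaluate(lr))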

auto-sklearn usage example:

import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)

Both NNI and auto-sklearn aim to automate machine learning workflows, but NNI offers more flexibility and support for deep learning frameworks, while auto-sklearn focuses on automating traditional machine learning tasks with scikit-learn estimators.

AutoKeras: AutoML library for deep learning

Pros of AutoKeras

  • Built on top of Keras, offering seamless integration with TensorFlow ecosystem
  • Supports image, text, and structured data out-of-the-box
  • User-friendly API, making it accessible for beginners

Cons of AutoKeras

  • Limited customization options compared to auto-sklearn
  • Slower search process, especially for large datasets
  • Less mature and fewer advanced features than auto-sklearn

Code Comparison

AutoKeras:

import autokeras as ak

clf = ak.StructuredDataClassifier(max_trials=10)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

auto-sklearn:

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
y_pred = automl.predict(X_test)

Both libraries aim to automate the machine learning pipeline, but AutoKeras focuses on neural network architectures while auto-sklearn explores a broader range of ML algorithms. AutoKeras provides a more straightforward API, making it easier for beginners to get started with AutoML. However, auto-sklearn offers more advanced features and customization options, making it suitable for more complex tasks and experienced users.


PyCaret: An open-source, low-code machine learning library in Python

Pros of PyCaret

  • Easier to use with a simpler API and more intuitive workflow
  • Supports a wider range of machine learning tasks, including clustering and anomaly detection
  • Offers built-in experiment logging and model interpretation features

Cons of PyCaret

  • Less customizable and flexible compared to auto-sklearn
  • May not perform as well on complex datasets or specialized problems
  • Fewer hyperparameter optimization options

Code Comparison

PyCaret:

from pycaret.classification import *
setup(data, target='target_column')
best_model = compare_models()
predict_model(best_model)

auto-sklearn:

import autosklearn.classification
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)

PyCaret offers a more concise and user-friendly approach, while auto-sklearn provides more granular control over the AutoML process. PyCaret's setup function automatically handles preprocessing and feature engineering, whereas auto-sklearn requires more manual input but allows for finer-tuned configurations.
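
For example, auto-sklearn lets you swap the internal validation scheme. A minimal sketch (resampling_strategy, resampling_strategy_arguments, and refit() are documented auto-sklearn options):

from autosklearn.classification import AutoSklearnClassifier

# Evaluate candidate models with 5-fold cross-validation instead of holdout
automl = AutoSklearnClassifier(
    time_left_for_this_task=300,
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 5}
)
automl.fit(X_train, y_train)

# With cv, refit the ensemble on the full training set before predicting
automl.refit(X_train, y_train)
predictions = automl.predict(X_test)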


h2o-3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Pros of h2o-3

  • Supports a wider range of algorithms and models
  • Offers distributed computing capabilities for handling large datasets
  • Provides a user-friendly web interface for non-programmers

Cons of h2o-3

  • Steeper learning curve due to its extensive feature set
  • Requires more system resources for optimal performance
  • Less focused on automated machine learning compared to auto-sklearn

Code Comparison

h2o-3:

import h2o
from h2o.automl import H2OAutoML

# Start (or connect to) a local H2O cluster
h2o.init()

# predictors, target, and train (an H2OFrame) are assumed to be defined
aml = H2OAutoML(max_models=20, seed=1)
aml.train(x=predictors, y=target, training_frame=train)
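
After training, H2O ranks everything it built on a leaderboard (a brief sketch; leaderboard is a documented H2OAutoML attribute):

# View the top models found by AutoML, ranked by the default metric
lb = aml.leaderboard
print(lb.head(rows=5))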

auto-sklearn:

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)

Both libraries offer automated machine learning capabilities, but h2o-3 provides a more comprehensive suite of tools for data analysis and model building. auto-sklearn focuses specifically on automated machine learning and integrates well with the scikit-learn ecosystem. The choice between the two depends on the specific needs of the project and the user's familiarity with each library's ecosystem.


README

auto-sklearn

auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

Find the documentation at https://automl.github.io/auto-sklearn/.

auto-sklearn in one image

(figure: auto-sklearn overview diagram)

auto-sklearn in four lines of code

import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)

Relevant publications

If you use auto-sklearn in scientific publications, we would appreciate citations.

Efficient and Robust Automated Machine Learning. Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum and Frank Hutter. Advances in Neural Information Processing Systems 28 (2015).

Link to publication.

@inproceedings{feurer-neurips15a,
    title     = {Efficient and Robust Automated Machine Learning},
    author    = {Feurer, Matthias and Klein, Aaron and Eggensperger, Katharina and Springenberg, Jost and Blum, Manuel and Hutter, Frank},
    booktitle = {Advances in Neural Information Processing Systems 28 (2015)},
    pages     = {2962--2970},
    year      = {2015}
}

Auto-Sklearn 2.0: The Next Generation. Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer and Frank Hutter. arXiv:2007.04074 [cs.LG], 2020.

Link to publication: https://arxiv.org/abs/2007.04074.

@article{feurer-arxiv20a,
    title     = {Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning},
    author    = {Feurer, Matthias and Eggensperger, Katharina and Falkner, Stefan and Lindauer, Marius and Hutter, Frank},
    booktitle = {arXiv:2007.04074 [cs.LG]},
    year      = {2020}
}

Also, have a look at the blog on automl.org, where we regularly release blog posts.