Top Related Projects
Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
AutoML library for deep learning
An open-source, low-code machine learning library in Python
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Quick Overview
Auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. It automatically searches for the right learning algorithm for a given machine learning task and optimizes its hyperparameters, eliminating the need for manual algorithm selection and hyperparameter tuning.
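To make the idea of jointly choosing an algorithm and its hyperparameters concrete, here is a minimal sketch that uses plain random search under a time budget. It is not auto-sklearn's implementation: the real search is more sophisticated (Bayesian optimization combined with meta-learning), and random search is only a simple stand-in. The search space, the toy scoring function, and all names below (SEARCH_SPACE, toy_validation_score, search) are assumptions made for illustration.

```python
import random
import time

# Hypothetical search space: each "algorithm" has its own integer hyperparameter ranges.
SEARCH_SPACE = {
    "random_forest": {"n_estimators": (10, 200), "max_depth": (2, 16)},
    "knn": {"n_neighbors": (1, 30)},
}


def toy_validation_score(algorithm, params):
    """Stand-in for training a model and scoring it on held-out data."""
    if algorithm == "random_forest":
        # Arbitrary toy optimum at n_estimators=120, max_depth=8.
        return 1.0 - abs(params["n_estimators"] - 120) / 400 - abs(params["max_depth"] - 8) / 40
    # Arbitrary toy optimum at n_neighbors=7.
    return 0.9 - abs(params["n_neighbors"] - 7) / 60


def sample_configuration(rng):
    """Pick an algorithm, then sample each of its hyperparameters uniformly."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    params = {
        name: rng.randint(low, high)
        for name, (low, high) in SEARCH_SPACE[algorithm].items()
    }
    return algorithm, params


def search(time_budget_s=0.05, max_trials=200, seed=0):
    """Random search over algorithms and hyperparameters under a time budget."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best = None
    for _ in range(max_trials):
        algorithm, params = sample_configuration(rng)
        score = toy_validation_score(algorithm, params)
        if best is None or score > best[0]:
            best = (score, algorithm, params)
        if time.monotonic() >= deadline:
            break  # budget exhausted; keep the best configuration found so far
    return best


if __name__ == "__main__":
    score, algorithm, params = search()
    print(f"best algorithm: {algorithm}, params: {params}, score: {score:.3f}")
```

The time budget plays the same role as time_left_for_this_task in the examples further down: the search stops when the budget runs out and returns the best configuration found so far.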
Pros
- Automates the entire machine learning pipeline, including preprocessing, feature engineering, and model selection
- Saves time and effort in the model development process
- Often produces competitive models that can match or outperform manually tuned solutions
- Integrates seamlessly with scikit-learn, making it easy to use for those familiar with the scikit-learn API
Cons
- Can be computationally expensive and time-consuming for large datasets or complex problems
- May not always find the optimal solution, especially for highly specialized or domain-specific problems
- Limited customization options compared to manual tuning
- Requires some understanding of machine learning concepts to interpret and use results effectively
Code Examples
- Basic usage:
from autosklearn.classification import AutoSklearnClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
# Load data and split into train and test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Create and fit the AutoSklearnClassifier
automl = AutoSklearnClassifier(time_left_for_this_task=120, per_run_time_limit=30)
automl.fit(X_train, y_train)
# Make predictions and print the accuracy score
y_pred = automl.predict(X_test)
print(f"Accuracy score: {automl.score(X_test, y_test)}")
- Customizing the search space:
from autosklearn.classification import AutoSklearnClassifier
# Restrict the search to specific classifiers by component name.
# `include` maps a pipeline step to a list of component names
# (recent releases; older releases used `include_estimators`).
automl = AutoSklearnClassifier(
    time_left_for_this_task=300,
    per_run_time_limit=60,
    include={"classifier": ["random_forest", "extra_trees"]},
)
# Fit the model (assuming X_train and y_train are defined)
automl.fit(X_train, y_train)
- Ensemble building:
from autosklearn.classification import AutoSklearnClassifier
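# Create AutoSklearnClassifier with ensemble building
# (auto-sklearn 0.15 deprecates `ensemble_size` in favor of
# ensemble_kwargs={"ensemble_size": 50})
automl = AutoSklearnClassifier(
    time_left_for_this_task=300,
    per_run_time_limit=60,
    ensemble_size=50,
    ensemble_nbest=20,
)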
# Fit the model and print the final ensemble composition
automl.fit(X_train, y_train)
print(automl.show_models())
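The two ensemble parameters above can be illustrated with a small sketch of greedy ensemble selection: start from the individually best candidates and repeatedly add, with replacement, the model whose inclusion gives the best validation score for the averaged prediction. This is a simplified pure-Python illustration of the general technique, not auto-sklearn's code. The function names and the toy predictions are hypothetical, and mapping ensemble_size to the number of greedy rounds and ensemble_nbest to the candidate pool is an assumption about how the parameters correspond.

```python
def accuracy(probabilities, labels):
    """Fraction of examples where the thresholded probability matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probabilities, labels))
    return hits / len(labels)


def average(predictions):
    """Element-wise mean of several models' predicted probabilities."""
    return [sum(column) / len(column) for column in zip(*predictions)]


def greedy_ensemble_selection(model_predictions, labels, ensemble_size, nbest):
    """Greedily add models (with replacement) that give the best averaged accuracy."""
    # Keep only the nbest individually strongest models as candidates.
    ranked = sorted(
        model_predictions,
        key=lambda name: accuracy(model_predictions[name], labels),
        reverse=True,
    )
    candidates = ranked[:nbest]
    chosen = []
    for _ in range(ensemble_size):
        best_name = max(
            candidates,
            key=lambda name: accuracy(
                average([model_predictions[m] for m in chosen + [name]]), labels
            ),
        )
        chosen.append(best_name)
    # A model's weight is its share of the selected slots.
    return {name: chosen.count(name) / len(chosen) for name in set(chosen)}


if __name__ == "__main__":
    # Toy validation labels and per-model predicted probabilities (made up).
    labels = [1, 0, 1, 1, 0, 0]
    validation_predictions = {
        "model_a": [0.9, 0.2, 0.4, 0.8, 0.3, 0.6],
        "model_b": [0.6, 0.4, 0.7, 0.3, 0.1, 0.2],
        "model_c": [0.1, 0.9, 0.2, 0.3, 0.8, 0.7],
    }
    print(greedy_ensemble_selection(validation_predictions, labels, ensemble_size=5, nbest=2))
```

Because models are picked with replacement, a strong model can occupy several slots and so receive a larger weight in the final average.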
Getting Started
To get started with auto-sklearn, follow these steps:
- Install auto-sklearn:
pip install auto-sklearn
- Import and use AutoSklearnClassifier or AutoSklearnRegressor:
from autosklearn.classification import AutoSklearnClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Load data and split into train and test sets
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Create and fit the AutoSklearnClassifier
automl = AutoSklearnClassifier(time_left_for_this_task=120, per_run_time_limit=30)
automl.fit(X_train, y_train)
# Evaluate the model
print(f"Accuracy score: {automl.score(X_test, y_test)}")
Competitor Comparisons
Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
Pros of mljar-supervised
- Easier to use with a simpler API and more intuitive interface
- Supports both classification and regression tasks out of the box
- Provides detailed explanations and visualizations of model performance
Cons of mljar-supervised
- Less extensive algorithm selection compared to auto-sklearn
- May not perform as well on complex datasets or specialized tasks
- Smaller community and fewer contributions
Code Comparison
mljar-supervised:
from supervised import AutoML
automl = AutoML(results_path="automl_results")
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)
auto-sklearn:
import autosklearn.classification
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)
Both libraries aim to automate the machine learning pipeline, but mljar-supervised focuses on simplicity and interpretability, while auto-sklearn offers more advanced features and a wider range of algorithms. The choice between them depends on the specific requirements of your project and your level of expertise in machine learning.
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Pros of TPOT
- Uses genetic programming to optimize machine learning pipelines
- Supports a wider range of algorithms and preprocessing steps
- More flexible and customizable pipeline structure
Cons of TPOT
- Generally slower than auto-sklearn due to genetic programming approach
- Less automated hyperparameter tuning compared to auto-sklearn
- May require more computational resources for large datasets
Code Comparison
TPOT:
from tpot import TPOTClassifier
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_pipeline.py')
auto-sklearn:
import autosklearn.classification
automl = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=120)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))
Both TPOT and auto-sklearn are powerful AutoML libraries, but they differ in their approaches. TPOT offers more flexibility and a wider range of algorithms, while auto-sklearn provides faster results and more automated hyperparameter tuning. The choice between them depends on specific project requirements, available computational resources, and desired level of customization.
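The evolutionary approach described above can be sketched with a deliberately simplified loop. TPOT evolves tree-structured pipelines with genetic programming; the sketch below only mutates a fixed two-step pipeline and scores it with a made-up fitness function, so treat it as an illustration of the generations and population_size parameters rather than of TPOT's internals. All names (SCALERS, MODELS, toy_fitness, evolve) are hypothetical.

```python
import random

# Hypothetical pipeline building blocks; the names are illustrative only.
SCALERS = ["none", "standard", "minmax"]
MODELS = ["logistic_regression", "random_forest", "gradient_boosting"]


def random_pipeline(rng):
    """Sample a (scaler, model) pair uniformly at random."""
    return (rng.choice(SCALERS), rng.choice(MODELS))


def toy_fitness(pipeline):
    """Stand-in for the cross-validated score of a fitted pipeline."""
    scaler, model = pipeline
    return 0.6 + 0.1 * SCALERS.index(scaler) + 0.05 * MODELS.index(model)


def mutate(pipeline, rng):
    """Replace one randomly chosen step with a random alternative."""
    scaler, model = pipeline
    if rng.random() < 0.5:
        return (rng.choice(SCALERS), model)
    return (scaler, rng.choice(MODELS))


def evolve(generations=5, population_size=20, seed=0):
    """Evolve pipelines by keeping the fitter half and mutating survivors."""
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=toy_fitness, reverse=True)
        survivors = ranked[: population_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=toy_fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, round(toy_fitness(best), 3))
```

Every generation re-evaluates the whole population, which is one reason evolutionary search tends to cost more compute than a budgeted search that evaluates one configuration at a time.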
An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Pros of NNI
- Supports a wider range of ML frameworks (TensorFlow, PyTorch, Keras, etc.)
- Offers more diverse optimization algorithms and tuning strategies
- Provides a user-friendly web UI for experiment management and visualization
Cons of NNI
- Steeper learning curve due to more complex configuration options
- Less focus on automated feature engineering compared to auto-sklearn
- May require more manual setup and customization for specific use cases
Code Comparison
NNI configuration example:
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
trainingServicePlatform: local
searchSpacePath: search_space.json
useAnnotation: false
tuner:
  builtinTunerName: TPE
trial:
  command: python3 mnist.py
  codeDir: .
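The configuration above points at a search_space.json file that is not shown. As a hedged sketch, the snippet below builds such a file from Python using NNI's `_type`/`_value` search space convention. The hyperparameter names and ranges are made up for illustration and would need to match whatever the trial script (mnist.py here) actually reads.

```python
import json

# Hypothetical hyperparameters for the trial script; names and ranges are illustrative.
search_space = {
    "batch_size": {"_type": "choice", "_value": [16, 32, 64, 128]},
    "learning_rate": {"_type": "loguniform", "_value": [0.0001, 0.1]},
    "hidden_size": {"_type": "choice", "_value": [128, 256, 512]},
}


def write_search_space(path="search_space.json"):
    """Serialize the search space to the file named by searchSpacePath."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(search_space, handle, indent=2)
    return path


if __name__ == "__main__":
    print(f"wrote {write_search_space()}")
```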
auto-sklearn usage example:
import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)
Both NNI and auto-sklearn aim to automate machine learning workflows, but NNI offers more flexibility and support for deep learning frameworks, while auto-sklearn focuses on automating traditional machine learning tasks with scikit-learn estimators.
AutoML library for deep learning
Pros of AutoKeras
- Built on top of Keras, offering seamless integration with TensorFlow ecosystem
- Supports image, text, and structured data out-of-the-box
- User-friendly API, making it accessible for beginners
Cons of AutoKeras
- Limited customization options compared to auto-sklearn
- Slower search process, especially for large datasets
- Less mature and fewer advanced features than auto-sklearn
Code Comparison
AutoKeras:
import autokeras as ak
clf = ak.StructuredDataClassifier(max_trials=10)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
auto-sklearn:
import autosklearn.classification
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
y_pred = automl.predict(X_test)
Both libraries aim to automate the machine learning pipeline, but AutoKeras focuses on neural network architectures while auto-sklearn explores a broader range of ML algorithms. AutoKeras provides a more straightforward API, making it easier for beginners to get started with AutoML. However, auto-sklearn offers more advanced features and customization options, making it suitable for more complex tasks and experienced users.
An open-source, low-code machine learning library in Python
Pros of PyCaret
- Easier to use with a simpler API and more intuitive workflow
- Supports a wider range of machine learning tasks, including clustering and anomaly detection
- Offers built-in experiment logging and model interpretation features
Cons of PyCaret
- Less customizable and flexible compared to auto-sklearn
- May not perform as well on complex datasets or specialized problems
- Fewer hyperparameter optimization options
Code Comparison
PyCaret:
from pycaret.classification import *
setup(data, target='target_column')
best_model = compare_models()
predict_model(best_model)
auto-sklearn:
import autosklearn.classification
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)
PyCaret offers a more concise and user-friendly approach, while auto-sklearn provides more granular control over the AutoML process. PyCaret's setup function automatically handles preprocessing and feature engineering, whereas auto-sklearn requires more manual input but allows for finer-tuned configurations.
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Pros of h2o-3
- Supports a wider range of algorithms and models
- Offers distributed computing capabilities for handling large datasets
- Provides a user-friendly web interface for non-programmers
Cons of h2o-3
- Steeper learning curve due to its extensive feature set
- Requires more system resources for optimal performance
- Less focused on automated machine learning compared to auto-sklearn
Code Comparison
h2o-3:
import h2o
from h2o.automl import H2OAutoML
h2o.init()
aml = H2OAutoML(max_models=20, seed=1)
aml.train(x=predictors, y=target, training_frame=train)
auto-sklearn:
import autosklearn.classification
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)
Both libraries offer automated machine learning capabilities, but h2o-3 provides a more comprehensive suite of tools for data analysis and model building. auto-sklearn focuses specifically on automated machine learning and integrates well with the scikit-learn ecosystem. The choice between the two depends on the specific needs of the project and the user's familiarity with each library's ecosystem.
README
auto-sklearn
auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
Find the documentation here. Quick links:
auto-sklearn in one image
auto-sklearn in four lines of code
import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)
Relevant publications
If you use auto-sklearn in scientific publications, we would appreciate citations.
Efficient and Robust Automated Machine Learning. Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum and Frank Hutter. Advances in Neural Information Processing Systems 28 (2015).
Link to publication.
@inproceedings{feurer-neurips15a,
title = {Efficient and Robust Automated Machine Learning},
author = {Feurer, Matthias and Klein, Aaron and Eggensperger, Katharina and Springenberg, Jost and Blum, Manuel and Hutter, Frank},
booktitle = {Advances in Neural Information Processing Systems 28 (2015)},
pages = {2962--2970},
year = {2015}
}
Auto-Sklearn 2.0: The Next Generation. Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer and Frank Hutter. arXiv:2007.04074 [cs.LG], 2020.
Link to publication.
@article{feurer-arxiv20a,
title = {Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning},
author = {Feurer, Matthias and Eggensperger, Katharina and Falkner, Stefan and Lindauer, Marius and Hutter, Frank},
journal = {arXiv:2007.04074 [cs.LG]},
year = {2020}
}
Also, have a look at the blog on automl.org where we regularly release blogposts.