FLAML
A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
Top Related Projects
- Optuna: A hyperparameter optimization framework
- Hyperopt: Distributed Asynchronous Hyperparameter Optimization in Python
- scikit-optimize: Sequential model-based optimization with a `scipy.optimize` interface
- Ax: Adaptive Experimentation Platform
- Ray: An AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- MLflow: Open source platform for the machine learning lifecycle
Quick Overview
FLAML (Fast and Lightweight AutoML) is an open-source Python library for automated machine learning and hyperparameter tuning. It is designed to be efficient, lightweight, and easy to use, making it suitable for both large-scale and resource-constrained scenarios.
Pros
- Fast and efficient AutoML with minimal computational resources
- Supports a wide range of ML tasks including classification, regression, and time series forecasting
- Highly customizable and extensible for advanced users
- Integrates well with popular ML frameworks like scikit-learn and XGBoost
Cons
- May not always produce the absolute best model compared to more resource-intensive AutoML tools
- Documentation could be more comprehensive for some advanced features
- Limited support for deep learning tasks compared to some other AutoML frameworks
Code Examples
- Basic AutoML for classification:
from flaml import AutoML
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
automl = AutoML()
automl.fit(X, y, task="classification")
print(automl.model.estimator)
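- Time series forecasting with FLAML:
from flaml import AutoML
import pandas as pd
# Assuming 'data' is a pandas DataFrame whose first column holds the timestamps
# and which contains a numeric column named "target"
automl = AutoML()
# The label column is passed via `label`; `period` is the forecast horizon (12 steps here)
automl.fit(dataframe=data, label="target", task="ts_forecast", period=12)
# Assuming 'future' (a hypothetical name) is a DataFrame holding the next 12 timestamps
predictions = automl.predict(future)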
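- Custom search space for hyperparameter tuning:
from flaml import AutoML, tune
# custom_hp maps each estimator name to its own search space; every
# hyperparameter is a dict with a "domain" entry ("lgbm" is chosen for illustration)
custom_hp = {
    "lgbm": {
        "n_estimators": {"domain": tune.choice([100, 200, 300, 400, 500])},
        "learning_rate": {"domain": tune.loguniform(0.01, 0.1)},
    }
}
automl = AutoML()
automl.fit(X, y, task="classification", estimator_list=["lgbm"], custom_hp=custom_hp)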
Getting Started
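To get started with FLAML, first install it using pip. A plain `pip install flaml` installs only minimal dependencies, so the AutoML examples below need the `automl` extra:
pip install "flaml[automl]"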
Then, you can use FLAML for a basic classification task:
from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Initialize and train AutoML
automl = AutoML()
# metric="accuracy" makes best_loss equal to 1 - accuracy
automl.fit(X_train, y_train, task="classification", metric="accuracy", time_budget=60)
# Report the search results
print(f"Best ML learner: {automl.best_estimator}")
print(f"Best hyperparameter config: {automl.best_config}")
print(f"Best accuracy on validation data: {1 - automl.best_loss}")
print(f"Training duration: {automl.best_config_train_time:.2f} s")
This example trains FLAML on a basic classification task and reports the best learner, its hyperparameters, the validation accuracy, and the training time of the best configuration.
Competitor Comparisons
Optuna: A hyperparameter optimization framework
Pros of Optuna
- More mature and widely adopted project with a larger community
- Supports a broader range of optimization algorithms and techniques
- Offers advanced visualization tools for analyzing optimization results
Cons of Optuna
- Can be more complex to set up and use for simple optimization tasks
- May require more manual configuration for hyperparameter search spaces
Code Comparison
Optuna:
import optuna
def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 2) ** 2
study = optuna.create_study()
study.optimize(objective, n_trials=100)
FLAML:
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
Summary
Optuna is a more established and feature-rich hyperparameter optimization framework, offering a wide range of algorithms and visualization tools. It's well-suited for complex optimization tasks and research purposes. FLAML, on the other hand, focuses on simplicity and efficiency, making it easier to use for quick automated machine learning tasks. FLAML's code is more concise and requires less setup, while Optuna provides more flexibility and control over the optimization process.
Hyperopt: Distributed Asynchronous Hyperparameter Optimization in Python
Pros of Hyperopt
- More mature and widely adopted project with a larger community
- Supports a broader range of optimization algorithms (e.g., Tree of Parzen Estimators, Adaptive TPE)
- Flexible and can be used for various optimization tasks beyond hyperparameter tuning
Cons of Hyperopt
- Steeper learning curve and more complex API
- Less focus on AutoML and automated feature engineering
- May require more manual configuration for optimal performance
Code Comparison
Hyperopt:
from hyperopt import fmin, tpe, hp
space = {
    'x': hp.uniform('x', -5, 5),
    'y': hp.uniform('y', -5, 5),
}
def objective(params):
    x, y = params['x'], params['y']
    return x**2 + y**2
best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
FLAML:
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
predictions = automl.predict(X_test)
FLAML offers a more streamlined API for AutoML tasks, while Hyperopt provides greater flexibility for custom optimization problems. FLAML is better suited for quick, automated machine learning workflows, whereas Hyperopt excels in scenarios requiring fine-grained control over the optimization process.
scikit-optimize: Sequential model-based optimization with a `scipy.optimize` interface
Pros of scikit-optimize
- More established and mature project with a larger community
- Broader range of optimization algorithms and techniques
- Extensive documentation and examples
Cons of scikit-optimize
- Less focus on automated machine learning (AutoML) tasks
- May require more manual configuration for hyperparameter tuning
- Slower development pace compared to FLAML
Code Comparison
FLAML:
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
predictions = automl.predict(X_test)
scikit-optimize:
from skopt import BayesSearchCV
from sklearn.svm import SVC
opt = BayesSearchCV(SVC(), {'C': (1e-6, 1e+6, 'log-uniform')})
opt.fit(X_train, y_train)
predictions = opt.predict(X_test)
FLAML is designed for simplicity and automated machine learning, while scikit-optimize offers more flexibility for various optimization tasks. FLAML's AutoML approach requires less manual configuration, whereas scikit-optimize allows for more fine-grained control over the optimization process. Both libraries have their strengths, with FLAML excelling in ease of use for AutoML tasks and scikit-optimize providing a broader range of optimization techniques for various applications.
Ax: Adaptive Experimentation Platform
Pros of Ax
- More comprehensive Bayesian optimization framework with advanced features like multi-objective optimization and multi-fidelity optimization
- Stronger focus on scientific and industrial applications, with built-in support for A/B testing and experimentation
- Better integration with PyTorch, making it suitable for deep learning hyperparameter tuning
Cons of Ax
- Steeper learning curve due to its more complex architecture and extensive features
- Less focus on automated machine learning (AutoML) compared to FLAML
- Requires more setup and configuration for simple optimization tasks
Code Comparison
FLAML:
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
Ax:
from ax import optimize
best_parameters, values, experiment, model = optimize(
    parameters=[
        {"name": "x1", "type": "range", "bounds": [-10.0, 10.0]},
        {"name": "x2", "type": "range", "bounds": [-10.0, 10.0]},
    ],
    evaluation_function=evaluation_function,
    objective_name="objective",
)
Ray: An AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Pros of Ray
- Comprehensive distributed computing framework with support for various ML tasks
- Highly scalable, designed for large-scale distributed applications
- Rich ecosystem with libraries for reinforcement learning, hyperparameter tuning, and more
Cons of Ray
- Steeper learning curve due to its extensive feature set
- Higher overhead for simple tasks or small-scale projects
- More complex setup and configuration compared to FLAML
Code Comparison
Ray example:
import ray
@ray.remote
def f(x):
    return x * x
futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))
FLAML example:
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
predictions = automl.predict(X_test)
Ray offers a more general-purpose distributed computing framework, while FLAML focuses specifically on automated machine learning. Ray's code example demonstrates its distributed nature, whereas FLAML's example showcases its simplicity for AutoML tasks.
MLflow: Open source platform for the machine learning lifecycle
Pros of MLflow
- Comprehensive experiment tracking and model management
- Supports multiple ML frameworks and languages
- Robust deployment and serving capabilities
Cons of MLflow
- Steeper learning curve for beginners
- Can be overkill for small projects or simple workflows
- Requires more setup and infrastructure
Code Comparison
MLflow:
import mlflow
mlflow.start_run()
mlflow.log_param("param1", value1)
mlflow.log_metric("metric1", value2)
mlflow.end_run()
FLAML:
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
Key Differences
- MLflow focuses on experiment tracking and model management across the entire ML lifecycle
- FLAML specializes in automated machine learning and hyperparameter optimization
- MLflow is more versatile and supports various ML frameworks, while FLAML is primarily for AutoML tasks
- FLAML is easier to use for quick AutoML experiments, while MLflow requires more setup but offers broader functionality
Use Cases
- Choose MLflow for comprehensive ML project management and deployment
- Opt for FLAML when rapid AutoML and hyperparameter tuning are the primary goals
README
A Fast Library for Automated Machine Learning & Tuning
:fire: FLAML supports AutoML and Hyperparameter Tuning in Microsoft Fabric Data Science. In addition, we've introduced Python 3.11 support, along with a range of new estimators, and comprehensive integration with MLflow, thanks to contributions from the Microsoft Fabric product team.
:fire: Heads-up: We have migrated AutoGen into a dedicated GitHub repository. Alongside this move, we have also launched a dedicated Discord server and a website for comprehensive documentation.
:fire: The automated multi-agent chat framework in AutoGen is in preview from v2.0.0.
:fire: FLAML is highlighted in OpenAI's cookbook.
:fire: autogen is released with support for ChatGPT and GPT-4, based on Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference.
What is FLAML
FLAML is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflows based on large language models, machine learning models, etc. and optimizes their performance.
- FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation, and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and compensates for their weaknesses.
- For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend, and users can pick the level of customization they need from a smooth range.
- It supports fast and economical automatic tuning (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations), capable of handling large search spaces with heterogeneous evaluation costs and complex constraints/guidance/early stopping.
FLAML is powered by a series of research studies from Microsoft Research and collaborators such as Penn State University, Stevens Institute of Technology, University of Washington, and University of Waterloo.
FLAML has a .NET implementation in ML.NET, an open-source, cross-platform machine learning framework for .NET.
Installation
FLAML requires Python version >= 3.8. It can be installed from pip:
pip install flaml
Minimal dependencies are installed without extra options. You can install extra options based on the feature you need. For example, use the following to install the dependencies needed by the autogen package.
pip install "flaml[autogen]"
Find more options in Installation.
Each of the notebook examples may require a specific option to be installed.
Quickstart
- (New) The autogen package enables next-gen GPT-X applications with a generic multi-agent conversation framework. It offers customizable and conversable agents which integrate LLMs, tools, and humans. By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code. For example,
from flaml import autogen
assistant = autogen.AssistantAgent("assistant")
user_proxy = autogen.UserProxyAgent("user_proxy")
user_proxy.initiate_chat(
    assistant,
    message="Show me the YTD gain of 10 largest technology companies as of today.",
)
# This initiates an automated chat between the two agents to solve the task
Autogen also helps maximize the utility of expensive LLMs such as ChatGPT and GPT-4. It offers a drop-in replacement of openai.Completion or openai.ChatCompletion with powerful functionalities like tuning, caching, templating, and filtering. For example, you can optimize generations by LLM with your own tuning data, success metrics, and budgets.
# perform tuning
config, analysis = autogen.Completion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=eval_func,
    inference_budget=0.05,
    optimization_budget=3,
    num_samples=-1,
)
# perform inference for a test instance
response = autogen.Completion.create(context=test_instance, **config)
- With three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
- You can restrict the learners and use FLAML as a fast hyperparameter tuning tool for XGBoost, LightGBM, Random Forest etc. or a customized learner.
automl.fit(X_train, y_train, task="classification", estimator_list=["lgbm"])
- You can also run generic hyperparameter tuning for a custom function.
from flaml import tune
tune.run(evaluation_function, config={…}, low_cost_partial_config={…}, time_budget_s=3600)
- Zero-shot AutoML allows using the existing training API from lightgbm, xgboost etc. while getting the benefit of AutoML in choosing high-performance hyperparameter configurations per task.
from flaml.default import LGBMRegressor
# Use LGBMRegressor in the same way as you use lightgbm.LGBMRegressor.
estimator = LGBMRegressor()
# The hyperparameters are automatically set according to the training data.
estimator.fit(X_train, y_train)
Documentation
You can find detailed documentation about FLAML here.
In addition, you can find:
- ML.NET documentation and tutorials for Model Builder, ML.NET CLI, and AutoML API.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
If you are new to GitHub, here is a detailed help source on getting involved with development on GitHub.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.