FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.


Top Related Projects

optuna

12,406

A hyperparameter optimization framework

hyperopt

7,435

Distributed Asynchronous Hyperparameter Optimization in Python

scikit-optimize

2,775

Sequential model-based optimization with a `scipy.optimize` interface

ray

38,187

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

mlflow

21,434

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

Quick Overview

FLAML (Fast and Lightweight AutoML) is an open-source Python library for automated machine learning and hyperparameter tuning. It is designed to be efficient, lightweight, and easy to use, making it suitable for both large-scale and resource-constrained scenarios.

Pros

Fast and efficient AutoML with minimal computational resources
Supports a wide range of ML tasks including classification, regression, and time series forecasting
Highly customizable and extensible for advanced users
Integrates well with popular ML frameworks like scikit-learn and XGBoost

Cons

May not always produce the absolute best model compared to more resource-intensive AutoML tools
Documentation could be more comprehensive for some advanced features
Limited support for deep learning tasks compared to some other AutoML frameworks

Code Examples

Basic AutoML for classification:

from flaml import AutoML
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
automl = AutoML()
automl.fit(X, y, task="classification")
print(automl.model.estimator)

Time series forecasting with FLAML:

from flaml import AutoML
import pandas as pd

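# Assuming 'data' is a pandas DataFrame whose first column holds the timestamps
# and which also contains a "target" column
automl = AutoML()
# period is the forecast horizon (12 steps here, as an example)
automl.fit(dataframe=data, label="target", task="ts_forecast", period=12)
# 'future' is assumed to be a DataFrame holding the timestamps to forecast
predictions = automl.predict(future)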
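
When evaluating a forecaster, the held-out rows should be the most recent ones rather than a random sample. The sketch below shows one way to split time-ordered rows before fitting. `split_last_period` and the toy rows are hypothetical names used for illustration and are not part of FLAML.

```python
def split_last_period(rows, period):
    """Split time-ordered rows into (train, holdout), keeping the last `period` rows for evaluation."""
    if not 0 < period < len(rows):
        raise ValueError("period must be between 1 and len(rows) - 1")
    return rows[:-period], rows[-period:]


# Toy monthly series: (timestamp, target) pairs already sorted by time.
rows = [(f"2023-{month:02d}-01", 100 + month) for month in range(1, 13)]
train_rows, holdout_rows = split_last_period(rows, period=3)
print(len(train_rows), len(holdout_rows))  # 9 3
```

The same idea carries over to a DataFrame: fit on the earlier rows and compare forecasts against the final `period` rows.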

Custom search space for hyperparameter tuning:

from flaml import AutoML, tune

# custom_hp maps an estimator name to overrides for that estimator's search space
custom_hp = {
    "lgbm": {
        "n_estimators": {"domain": tune.choice([100, 200, 300, 400, 500])},
        "learning_rate": {"domain": tune.loguniform(0.01, 0.1)},
    }
}

automl = AutoML()
automl.fit(X, y, task="classification", estimator_list=["lgbm"], custom_hp=custom_hp)
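
A log-scaled range such as the learning-rate domain above spreads samples evenly across orders of magnitude instead of evenly across raw values. As a rough illustration in plain Python (the helper `sample_loguniform` is a hypothetical name, not a FLAML function):

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniformly distributed between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_loguniform(0.01, 0.1, rng) for _ in range(1000)]
# About half of the draws should fall below the geometric midpoint sqrt(0.01 * 0.1).
below_midpoint = sum(s < math.sqrt(0.01 * 0.1) for s in samples)
print(min(samples), max(samples), below_midpoint)
```

With a plain uniform range over [0.01, 0.1], only about a quarter of the draws would land below that midpoint, so small learning rates would be explored far less often.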

Getting Started

To get started with FLAML, first install it using pip:

pip install flaml

Then, you can use FLAML for a basic classification task:

from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Initialize and train AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)

# Evaluate the model
print(f"Best ML learner: {automl.best_estimator}")
print(f"Best hyperparameter config: {automl.best_config}")
print(f"Best accuracy on validation data: {1 - automl.best_loss}")
print(f"Training duration: {automl.best_config_train_time:.2f} s")

This example demonstrates how to use FLAML for a basic classification task, including model training, evaluation, and reporting of the best model and hyperparameters.
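
The snippet above reports accuracy on FLAML's internal validation data but leaves `X_test` and `y_test` unused. A minimal sketch of a held-out check follows; it assumes only that the fitted object exposes `predict`, and `ConstantModel` is a hypothetical stand-in so the block runs on its own.

```python
def holdout_accuracy(model, X_test, y_test):
    """Return the fraction of test rows whose predicted label equals the true label."""
    predictions = model.predict(X_test)
    matches = sum(1 for predicted, actual in zip(predictions, y_test) if predicted == actual)
    return matches / len(y_test)


class ConstantModel:
    """Hypothetical stand-in for a fitted estimator; always predicts one label."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return [self.label for _ in X]


# With a fitted FLAML object this would be: holdout_accuracy(automl, X_test, y_test)
print(holdout_accuracy(ConstantModel(0), [[5.1], [6.2], [4.9], [7.0]], [0, 1, 0, 0]))  # 0.75
```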

Competitor Comparisons

optuna

12,406

A hyperparameter optimization framework

Pros of Optuna

More mature and widely adopted project with a larger community
Supports a broader range of optimization algorithms and techniques
Offers advanced visualization tools for analyzing optimization results

Cons of Optuna

Can be more complex to set up and use for simple optimization tasks
May require more manual configuration for hyperparameter search spaces

Code Comparison

Optuna:

import optuna

def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 2) ** 2

study = optuna.create_study()
study.optimize(objective, n_trials=100)

FLAML:

from flaml import AutoML

automl = AutoML()
automl.fit(X_train, y_train, task="classification")

Summary

Optuna is a more established and feature-rich hyperparameter optimization framework, offering a wide range of algorithms and visualization tools. It's well-suited for complex optimization tasks and research purposes. FLAML, on the other hand, focuses on simplicity and efficiency, making it easier to use for quick automated machine learning tasks. FLAML's code is more concise and requires less setup, while Optuna provides more flexibility and control over the optimization process.

hyperopt

7,435

Distributed Asynchronous Hyperparameter Optimization in Python

Pros of Hyperopt

More mature and widely adopted project with a larger community
Supports a broader range of optimization algorithms (e.g., Tree of Parzen Estimators, Adaptive TPE)
Flexible and can be used for various optimization tasks beyond hyperparameter tuning

Cons of Hyperopt

Steeper learning curve and more complex API
Less focus on AutoML and automated feature engineering
May require more manual configuration for optimal performance

Code Comparison

Hyperopt:

from hyperopt import fmin, tpe, hp

space = {
    'x': hp.uniform('x', -5, 5),
    'y': hp.uniform('y', -5, 5),
}

def objective(params):
    x, y = params['x'], params['y']
    return x**2 + y**2

best = fmin(objective, space, algo=tpe.suggest, max_evals=100)

FLAML:

from flaml import AutoML

automl = AutoML()
automl.fit(X_train, y_train, task="classification")
predictions = automl.predict(X_test)

FLAML offers a more streamlined API for AutoML tasks, while Hyperopt provides greater flexibility for custom optimization problems. FLAML is better suited for quick, automated machine learning workflows, whereas Hyperopt excels in scenarios requiring fine-grained control over the optimization process.

scikit-optimize

2,775

Sequential model-based optimization with a `scipy.optimize` interface

Pros of scikit-optimize

More established and mature project with a larger community
Broader range of optimization algorithms and techniques
Extensive documentation and examples

Cons of scikit-optimize

Less focus on automated machine learning (AutoML) tasks
May require more manual configuration for hyperparameter tuning
Slower development pace compared to FLAML

Code Comparison

FLAML:

from flaml import AutoML

automl = AutoML()
automl.fit(X_train, y_train, task="classification")
predictions = automl.predict(X_test)

scikit-optimize:

from skopt import BayesSearchCV
from sklearn.svm import SVC

opt = BayesSearchCV(SVC(), {'C': (1e-6, 1e+6, 'log-uniform')})
opt.fit(X_train, y_train)
predictions = opt.predict(X_test)

FLAML is designed for simplicity and automated machine learning, while scikit-optimize offers more flexibility for various optimization tasks. FLAML's AutoML approach requires less manual configuration, whereas scikit-optimize allows for more fine-grained control over the optimization process. Both libraries have their strengths, with FLAML excelling in ease of use for AutoML tasks and scikit-optimize providing a broader range of optimization techniques for various applications.

Ax

2,544

Adaptive Experimentation Platform

Pros of Ax

More comprehensive Bayesian optimization framework with advanced features like multi-objective optimization and multi-fidelity optimization
Stronger focus on scientific and industrial applications, with built-in support for A/B testing and experimentation
Better integration with PyTorch, making it suitable for deep learning hyperparameter tuning

Cons of Ax

Steeper learning curve due to its more complex architecture and extensive features
Less focus on automated machine learning (AutoML) compared to FLAML
Requires more setup and configuration for simple optimization tasks

Code Comparison

FLAML:

from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")

Ax:

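from ax import optimize  # managed-loop API; may not be available in recent Ax releases

# Toy objective added for illustration: returns (mean, SEM) for the named metric
def evaluation_function(parameterization):
    x1, x2 = parameterization["x1"], parameterization["x2"]
    return {"objective": (x1**2 + x2**2, 0.0)}

best_parameters, values, experiment, model = optimize(
    parameters=[
        {"name": "x1", "type": "range", "bounds": [-10.0, 10.0]},
        {"name": "x2", "type": "range", "bounds": [-10.0, 10.0]},
    ],
    evaluation_function=evaluation_function,
    objective_name="objective",
    minimize=True,
)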

ray

38,187

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Pros of Ray

Comprehensive distributed computing framework with support for various ML tasks
Highly scalable, designed for large-scale distributed applications
Rich ecosystem with libraries for reinforcement learning, hyperparameter tuning, and more

Cons of Ray

Steeper learning curve due to its extensive feature set
Higher overhead for simple tasks or small-scale projects
More complex setup and configuration compared to FLAML

Code Comparison

Ray example:

import ray

@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))

FLAML example:

from flaml import AutoML

automl = AutoML()
automl.fit(X_train, y_train, task="classification")
predictions = automl.predict(X_test)

Ray offers a more general-purpose distributed computing framework, while FLAML focuses specifically on automated machine learning. Ray's code example demonstrates its distributed nature, whereas FLAML's example showcases its simplicity for AutoML tasks.

mlflow

21,434

Pros of MLflow

Comprehensive experiment tracking and model management
Supports multiple ML frameworks and languages
Robust deployment and serving capabilities

Cons of MLflow

Steeper learning curve for beginners
Can be overkill for small projects or simple workflows
Requires more setup and infrastructure

Code Comparison

MLflow:

import mlflow

# Example values; the context manager ends the run automatically
with mlflow.start_run():
    mlflow.log_param("param1", 5)
    mlflow.log_metric("metric1", 0.89)

FLAML:

from flaml import AutoML

automl = AutoML()
automl.fit(X_train, y_train, task="classification")

Key Differences

MLflow focuses on experiment tracking and model management across the entire ML lifecycle
FLAML specializes in automated machine learning and hyperparameter optimization
MLflow is more versatile and supports various ML frameworks, while FLAML is primarily for AutoML tasks
FLAML is easier to use for quick AutoML experiments, while MLflow requires more setup but offers broader functionality

Use Cases

Choose MLflow for comprehensive ML project management and deployment
Opt for FLAML when rapid AutoML and hyperparameter tuning are the primary goals


README


A Fast Library for Automated Machine Learning & Tuning

:fire: FLAML supports AutoML and Hyperparameter Tuning in Microsoft Fabric Data Science. In addition, we've introduced Python 3.11 support, along with a range of new estimators, and comprehensive integration with MLflow, thanks to contributions from the Microsoft Fabric product team.

:fire: Heads-up: We have migrated AutoGen into a dedicated GitHub repository. Alongside this move, we have also launched a dedicated Discord server and a website for comprehensive documentation.

:fire: The automated multi-agent chat framework in AutoGen is in preview from v2.0.0.

:fire: FLAML is highlighted in OpenAI's cookbook.

:fire: autogen is released with support for ChatGPT and GPT-4, based on Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference.

What is FLAML

FLAML is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflows based on large language models, machine learning models, etc. and optimizes their performance.

FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation, and optimization of complex GPT-X workflows. It maximizes the performance of GPT-X models and compensates for their weaknesses.
For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend, and users can choose a level of customization anywhere along a smooth range.
It supports fast and economical automatic tuning (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations), and can handle large search spaces with heterogeneous evaluation costs and complex constraints, guidance, and early stopping.

FLAML is powered by a series of research studies from Microsoft Research and collaborators such as Penn State University, Stevens Institute of Technology, University of Washington, and University of Waterloo.

FLAML has a .NET implementation in ML.NET, an open-source, cross-platform machine learning framework for .NET.

Installation

FLAML requires Python version >= 3.9. It can be installed from pip:

pip install flaml

Minimal dependencies are installed without extra options. You can install extra options based on the feature you need. For example, use the following to install the dependencies needed by the autogen package.

pip install "flaml[autogen]"

Find more options in Installation. Each of the notebook examples may require a specific option to be installed.

Quickstart

(New) The autogen package enables next-gen GPT-X applications with a generic multi-agent conversation framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans. By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code. For example,

from flaml import autogen

assistant = autogen.AssistantAgent("assistant")
user_proxy = autogen.UserProxyAgent("user_proxy")
user_proxy.initiate_chat(
    assistant,
    message="Show me the YTD gain of 10 largest technology companies as of today.",
)
# This initiates an automated chat between the two agents to solve the task

Autogen also helps get the most out of expensive LLMs such as ChatGPT and GPT-4. It offers a drop-in replacement for openai.Completion or openai.ChatCompletion with powerful functionality such as tuning, caching, templating, and filtering. For example, you can optimize LLM generations with your own tuning data, success metrics, and budgets.

# perform tuning
config, analysis = autogen.Completion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=eval_func,
    inference_budget=0.05,
    optimization_budget=3,
    num_samples=-1,
)
# perform inference for a test instance
response = autogen.Completion.create(context=test_instance, **config)
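
The tuning call above relies on a user-supplied `eval_func`. As a rough sketch, assuming it receives the list of generated responses plus the fields of one data instance as keyword arguments and returns a dict containing the tuned metric, it could look like the following. The `solution` field, the `hit_rate` metric, and the exact-match check are hypothetical choices for illustration.

```python
def eval_func(responses, **data):
    """Score a batch of responses for one instance; 'success' is the metric named in the tune call."""
    expected = data["solution"].strip().lower()  # 'solution' is a hypothetical field name
    hits = [response.strip().lower() == expected for response in responses]
    return {
        "success": any(hits),                               # at least one response is correct
        "hit_rate": sum(hits) / len(hits) if hits else 0.0,  # share of correct responses
    }


print(eval_func(["Paris", "Lyon "], question="Capital of France?", solution="paris"))
# {'success': True, 'hit_rate': 0.5}
```

Real tasks usually need a more forgiving comparison than exact string matching, such as running generated code against test cases.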

With three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.

from flaml import AutoML

automl = AutoML()
automl.fit(X_train, y_train, task="classification")

You can restrict the learners and use FLAML as a fast hyperparameter tuning tool for XGBoost, LightGBM, Random Forest etc. or a customized learner.

automl.fit(X_train, y_train, task="classification", estimator_list=["lgbm"])

You can also run generic hyperparameter tuning for a custom function.

from flaml import tune
tune.run(evaluation_function, config={…}, low_cost_partial_config={…}, time_budget_s=3600)
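
To make the shape of such a call concrete, here is a toy stand-in in plain Python: an evaluation function that maps a config dict to a score, and a loop that starts from a cheap initial config and keeps sampling until a time budget runs out. This is plain random search for illustration only. It is not the search algorithm FLAML uses, and `budgeted_random_search` and the toy objective are hypothetical.

```python
import random
import time


def evaluation_function(config):
    """Toy objective: lower is better, with the minimum at x = 2."""
    return (config["x"] - 2) ** 2


def budgeted_random_search(evaluate, sample_config, initial_config, time_budget_s,
                           max_trials=10_000, seed=0):
    """Evaluate the initial config, then random configs, until the budget or trial cap is reached."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best_config, best_score = initial_config, evaluate(initial_config)
    trials = 1
    while trials < max_trials and time.monotonic() < deadline:
        candidate = sample_config(rng)
        score = evaluate(candidate)
        if score < best_score:
            best_config, best_score = candidate, score
        trials += 1
    return best_config, best_score, trials


best_config, best_score, trials = budgeted_random_search(
    evaluation_function,
    sample_config=lambda rng: {"x": rng.uniform(-10, 10)},
    initial_config={"x": -10.0},  # plays the role of a cheap starting point
    time_budget_s=0.5,
)
print(best_config, best_score, trials)
```

The initial config mirrors the role of a low-cost starting point: the search always has a valid result, even if the budget expires after the first evaluation.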

Zero-shot AutoML allows using the existing training API from lightgbm, xgboost etc. while getting the benefit of AutoML in choosing high-performance hyperparameter configurations per task.

from flaml.default import LGBMRegressor

# Use LGBMRegressor in the same way as you use lightgbm.LGBMRegressor.
estimator = LGBMRegressor()
# The hyperparameters are automatically set according to the training data.
estimator.fit(X_train, y_train)
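
One way to picture "hyperparameters set according to the training data" is a lookup from dataset characteristics to a stored configuration. The sketch below picks the nearest entry from a small portfolio. The portfolio entries, the two meta-features, and all numbers are invented for illustration and are not FLAML's actual portfolio or selection rule.

```python
import math

# Hypothetical portfolio: (n_rows, n_features) of a reference dataset -> config assumed to suit it.
PORTFOLIO = [
    ((1_000, 10), {"n_estimators": 50, "num_leaves": 15}),
    ((100_000, 50), {"n_estimators": 400, "num_leaves": 63}),
    ((1_000_000, 200), {"n_estimators": 1500, "num_leaves": 255}),
]


def choose_config(n_rows, n_features):
    """Return the config whose reference dataset is closest in log-scaled size."""
    def distance(reference):
        ref_rows, ref_features = reference
        return math.hypot(
            math.log10(n_rows) - math.log10(ref_rows),
            math.log10(n_features) - math.log10(ref_features),
        )

    _, config = min(PORTFOLIO, key=lambda entry: distance(entry[0]))
    return config


print(choose_config(n_rows=80_000, n_features=40))  # {'n_estimators': 400, 'num_leaves': 63}
```

No search runs at fit time in this picture, which is why the zero-shot estimators can keep the same `fit` call as the underlying library.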

Documentation

You can find detailed documentation about FLAML here.

In addition, you can find:

Research and blogposts around FLAML.
Discord.
Contributing guide.
ML.NET documentation and tutorials for Model Builder, ML.NET CLI, and AutoML API.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

If you are new to GitHub, here is a detailed help source on getting involved with development on GitHub.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

