microsoft/LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Top Related Projects

  • XGBoost: Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
  • CatBoost: A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
  • scikit-learn: machine learning in Python
  • h2o-3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
  • TensorFlow: An Open Source Machine Learning Framework for Everyone
  • PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration

Quick Overview

LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks. It is developed by Microsoft and is designed to be efficient, scalable, and accurate, particularly for large datasets.

Pros

  • Faster training speed and higher efficiency compared to other boosting frameworks
  • Lower memory usage due to its histogram-based algorithm
  • Supports parallel, distributed, and GPU learning
  • Handles large-scale data with ease
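
To illustrate what "histogram-based" means in the list above, here is a minimal NumPy sketch of histogram split finding. It is a simplified teaching example, not LightGBM's internal implementation: the function name `best_split_histogram`, the quantile binning, and the squared-error style gain with unit hessians are all assumptions made for illustration.

```python
import numpy as np

def best_split_histogram(feature, gradients, n_bins=16):
    """Pick the bin boundary that best separates gradients (illustrative only)."""
    # Bucket continuous values into a small number of quantile bins
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, feature)  # integer bin index per row

    # One pass over the data accumulates per-bin gradient sums and counts
    grad_sum = np.bincount(bins, weights=gradients, minlength=n_bins)
    count = np.bincount(bins, minlength=n_bins)

    # Candidate splits are the bin boundaries, not every distinct value
    total_g, total_n = grad_sum.sum(), count.sum()
    left_g = np.cumsum(grad_sum)[:-1]
    left_n = np.cumsum(count)[:-1]
    right_g, right_n = total_g - left_g, total_n - left_n

    # Gain of splitting versus keeping all rows in one leaf
    valid = (left_n > 0) & (right_n > 0)
    gain = np.full(n_bins - 1, -np.inf)
    gain[valid] = (left_g[valid] ** 2 / left_n[valid]
                   + right_g[valid] ** 2 / right_n[valid]
                   - total_g ** 2 / total_n)
    best = int(np.argmax(gain))
    return edges[best], gain[best]

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1000)
# Synthetic gradients that change sign around x = 4
g = np.where(x < 4, -1.0, 1.0) + rng.normal(0, 0.1, size=1000)
threshold, gain = best_split_histogram(x, g)
print(f"best threshold: {threshold:.2f}, gain: {gain:.1f}")
```

Because only the bin boundaries are scanned, the cost of evaluating splits depends on the number of bins rather than the number of distinct feature values, which is the intuition behind the speed and memory claims above.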

Cons

  • Can be prone to overfitting if not properly tuned
  • May require more careful parameter tuning compared to some other frameworks
  • Less interpretable than simpler models like decision trees
  • Documentation can be challenging for beginners

Code Examples

  1. Basic binary classification:
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)

# Create dataset for LightGBM
train_data = lgb.Dataset(X_train, label=y_train)

# Set parameters
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}

# Train model
model = lgb.train(params, train_data, num_boost_round=100)

# Make predictions (for the binary objective these are probabilities of the positive class, not labels)
y_pred = model.predict(X_test)
  2. Feature importance visualization:
import matplotlib.pyplot as plt

# Get feature importance
importance = model.feature_importance()
feature_names = data.feature_names

# Plot feature importance
plt.figure(figsize=(10, 6))
plt.bar(range(len(importance)), importance)
plt.xticks(range(len(importance)), feature_names, rotation=90)
plt.title('Feature Importance')
plt.tight_layout()
plt.show()
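  3. Cross-validation:
# Prepare LightGBM dataset
lgb_dataset = lgb.Dataset(data.data, label=data.target)

# Track AUC during cross-validation (the params above only record binary_logloss)
cv_params = {**params, 'metric': 'auc'}

# Perform 5-fold cross-validation
cv_results = lgb.cv(cv_params, lgb_dataset, num_boost_round=100, nfold=5, stratified=True, shuffle=True)

# Result keys are 'auc-mean'/'auc-stdv' in LightGBM 3.x and 'valid auc-mean'/'valid auc-stdv' in 4.x
mean_key = next(k for k in cv_results if k.endswith('auc-mean'))
stdv_key = next(k for k in cv_results if k.endswith('auc-stdv'))

# Print mean and standard deviation of AUC
print(f"AUC: {cv_results[mean_key][-1]:.4f} (+/- {cv_results[stdv_key][-1]:.4f})")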

Getting Started

To get started with LightGBM:

  1. Install LightGBM:

    pip install lightgbm
    
  2. Import the library:

    import lightgbm as lgb
    
  3. Prepare your data and create a LightGBM dataset:

    train_data = lgb.Dataset(X_train, label=y_train)
    
  4. Set parameters and train the model:

    params = {'objective': 'binary'}
    model = lgb.train(params, train_data, num_boost_round=100)
    
  5. Make predictions:

    y_pred = model.predict(X_test)
    

Competitor Comparisons

XGBoost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

Pros of XGBoost

  • More mature and widely adopted in industry and competitions
  • Built-in handling of missing values (LightGBM also handles them natively)
  • Stronger support for distributed and GPU computing

Cons of XGBoost

  • Generally slower training speed, especially for large datasets
  • Higher memory usage
  • More complex hyperparameter tuning

Code Comparison

XGBoost:

import xgboost as xgb
model = xgb.XGBClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

LightGBM:

import lightgbm as lgb
model = lgb.LGBMClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Both libraries offer similar scikit-learn-style APIs, making it easy to switch between them. The main differences lie in their underlying algorithms and performance characteristics. Both handle missing values natively; XGBoost is often chosen for its maturity and broad ecosystem, while LightGBM tends to shine with larger datasets and faster training times. The choice between the two often depends on the specific use case and dataset characteristics.

CatBoost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Pros of CatBoost

  • Better handling of categorical features without manual preprocessing
  • Improved performance on datasets with high cardinality categorical features
  • Built-in GPU support for faster training

Cons of CatBoost

  • Generally slower training time compared to LightGBM
  • Less extensive documentation and community support
  • Fewer advanced features and customization options

Code Comparison

CatBoost:

from catboost import CatBoostRegressor

model = CatBoostRegressor(iterations=1000, learning_rate=0.1)
model.fit(X_train, y_train, cat_features=cat_features)
predictions = model.predict(X_test)

LightGBM:

import lightgbm as lgb

train_data = lgb.Dataset(X_train, label=y_train)
params = {'num_leaves': 31, 'learning_rate': 0.1}
model = lgb.train(params, train_data, num_boost_round=1000)
predictions = model.predict(X_test)

Both CatBoost and LightGBM are powerful gradient boosting libraries, each with its own strengths. CatBoost excels at handling categorical features without manual encoding, while LightGBM often trains faster and exposes more customization options; both support GPU training. The choice between the two depends on the specific requirements of your project and the nature of your dataset.

scikit-learn: machine learning in Python

Pros of scikit-learn

  • Comprehensive library with a wide range of machine learning algorithms
  • Excellent documentation and community support
  • Consistent API across different algorithms, making it easy to use and switch between models

Cons of scikit-learn

  • Generally slower performance compared to specialized libraries like LightGBM
  • Less efficient for large-scale datasets and high-dimensional problems
  • Limited support for GPU acceleration

Code Comparison

scikit-learn:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

LightGBM:

import lightgbm as lgb
train_data = lgb.Dataset(X_train, label=y_train)
params = {'objective': 'binary'}
model = lgb.train(params, train_data)
predictions = model.predict(X_test)  # probabilities, not class labels as in the scikit-learn example

Both libraries offer easy-to-use APIs, but LightGBM is more focused on gradient boosting and provides faster training times, especially for large datasets. scikit-learn offers a broader range of algorithms and is more suitable for general-purpose machine learning tasks, while LightGBM excels in gradient boosting applications and handling high-dimensional data efficiently.

h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Pros of h2o-3

  • Supports a wider range of algorithms and models, including deep learning
  • Offers a user-friendly web interface for non-programmers
  • Provides built-in distributed computing capabilities

Cons of h2o-3

  • Generally slower performance compared to LightGBM
  • More complex setup and configuration process
  • Larger memory footprint, especially for big datasets

Code Comparison

h2o-3:

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
data = h2o.import_file("data.csv")
model = H2OGradientBoostingEstimator()
model.train(x=["feature1", "feature2"], y="target", training_frame=data)

LightGBM:

import lightgbm as lgb
from sklearn.datasets import load_iris

data = load_iris()
train_data = lgb.Dataset(data.data, label=data.target)
params = {'objective': 'multiclass', 'num_class': 3}
model = lgb.train(params, train_data)

Both libraries offer gradient boosting implementations, but LightGBM focuses on efficiency and speed, while h2o-3 provides a broader range of algorithms and features. LightGBM's code is more concise and straightforward, while h2o-3 requires additional setup steps but offers more flexibility in terms of data handling and model configuration.

TensorFlow

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • Comprehensive ecosystem for deep learning and neural networks
  • Supports distributed computing and GPU acceleration
  • Extensive community support and resources

Cons of TensorFlow

  • Steeper learning curve compared to LightGBM
  • Higher computational requirements for simple tasks
  • More complex setup and configuration

Code Comparison

LightGBM:

import lightgbm as lgb
model = lgb.LGBMClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

TensorFlow:

import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, y_train, epochs=10)
predictions = model.predict(X_test)

LightGBM is more concise and straightforward for gradient boosting tasks, while TensorFlow offers more flexibility for complex neural network architectures but requires more code for simple models.

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • More flexible and dynamic computational graph, allowing for easier debugging and experimentation
  • Broader ecosystem and community support, with extensive libraries for various deep learning tasks
  • Better support for GPU acceleration and distributed computing

Cons of PyTorch

  • Steeper learning curve for beginners compared to LightGBM's simpler API
  • Generally slower training speed for traditional machine learning tasks

Code Comparison

PyTorch example (neural network):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 1)
)

LightGBM example (gradient boosting):

import lightgbm as lgb

model = lgb.LGBMRegressor(
    n_estimators=100,
    learning_rate=0.1
)

Summary

PyTorch is a deep learning framework offering flexibility and a rich ecosystem, ideal for complex neural network architectures and research. LightGBM, on the other hand, is a gradient boosting framework optimized for efficiency and speed in traditional machine learning tasks. While PyTorch excels in deep learning and GPU utilization, LightGBM is often preferred for its simplicity and faster training on structured data.

README

<img src=https://github.com/microsoft/LightGBM/blob/master/docs/logo/LightGBM_logo_black_text.svg width=300 />

Light Gradient Boosting Machine


LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages:

  • Faster training speed and higher efficiency.
  • Lower memory usage.
  • Better accuracy.
  • Support of parallel, distributed, and GPU learning.
  • Capable of handling large-scale data.

For further details, please refer to Features.

Benefiting from these advantages, LightGBM is widely used in many winning solutions of machine learning competitions.

Comparison experiments on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, distributed learning experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

Get Started and Documentation

Our primary documentation is at https://lightgbm.readthedocs.io/ and is generated from this repository. If you are new to LightGBM, follow the installation instructions on that site.

Next you may want to read:

Documentation for contributors:

News

Please refer to changelogs at GitHub releases page.

External (Unofficial) Repositories

Projects listed here offer alternative ways to use LightGBM. They are not maintained or officially endorsed by the LightGBM development team.

LightGBMLSS (An extension of LightGBM to probabilistic modelling from which prediction intervals and quantiles can be derived): https://github.com/StatMixedML/LightGBMLSS

FLAML (AutoML library for hyperparameter optimization): https://github.com/microsoft/FLAML

supertree (interactive visualization of decision trees): https://github.com/mljar/supertree

Optuna (hyperparameter optimization framework): https://github.com/optuna/optuna

Julia-package: https://github.com/IQVIA-ML/LightGBM.jl

JPMML (Java PMML converter): https://github.com/jpmml/jpmml-lightgbm

Nyoka (Python PMML converter): https://github.com/SoftwareAG/nyoka

Treelite (model compiler for efficient deployment): https://github.com/dmlc/treelite

lleaves (LLVM-based model compiler for efficient inference): https://github.com/siboehm/lleaves

Hummingbird (model compiler into tensor computations): https://github.com/microsoft/hummingbird

cuML Forest Inference Library (GPU-accelerated inference): https://github.com/rapidsai/cuml

daal4py (Intel CPU-accelerated inference): https://github.com/intel/scikit-learn-intelex/tree/master/daal4py

m2cgen (model appliers for various languages): https://github.com/BayesWitnesses/m2cgen

leaves (Go model applier): https://github.com/dmitryikh/leaves

ONNXMLTools (ONNX converter): https://github.com/onnx/onnxmltools

SHAP (model output explainer): https://github.com/slundberg/shap

Shapash (model visualization and interpretation): https://github.com/MAIF/shapash

dtreeviz (decision tree visualization and model interpretation): https://github.com/parrt/dtreeviz

SynapseML (LightGBM on Spark): https://github.com/microsoft/SynapseML

Kubeflow Fairing (LightGBM on Kubernetes): https://github.com/kubeflow/fairing

Kubeflow Operator (LightGBM on Kubernetes): https://github.com/kubeflow/xgboost-operator

lightgbm_ray (LightGBM on Ray): https://github.com/ray-project/lightgbm_ray

Mars (LightGBM on Mars): https://github.com/mars-project/mars

ML.NET (.NET/C#-package): https://github.com/dotnet/machinelearning

LightGBM.NET (.NET/C#-package): https://github.com/rca22/LightGBM.Net

Ruby gem: https://github.com/ankane/lightgbm-ruby

LightGBM4j (Java high-level binding): https://github.com/metarank/lightgbm4j

lightgbm3 (Rust binding): https://github.com/Mottl/lightgbm3-rs

MLflow (experiment tracking, model monitoring framework): https://github.com/mlflow/mlflow

{bonsai} (R {parsnip}-compliant interface): https://github.com/tidymodels/bonsai

{mlr3extralearners} (R {mlr3}-compliant interface): https://github.com/mlr-org/mlr3extralearners

lightgbm-transform (feature transformation binding): https://github.com/microsoft/lightgbm-transform

postgresml (LightGBM training and prediction in SQL, via a Postgres extension): https://github.com/postgresml/postgresml

vaex-ml (Python DataFrame library with its own interface to LightGBM): https://github.com/vaexio/vaex

Support

How to Contribute

Check CONTRIBUTING page.

Microsoft Open Source Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Reference Papers

Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu. "Quantized Training of Gradient Boosting Decision Trees". Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pp. 18822-18833.

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 3149-3157.

Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu. "A Communication-Efficient Parallel Algorithm for Decision Tree". Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 1279-1287.

Huan Zhang, Si Si and Cho-Jui Hsieh. "GPU Acceleration for Large-scale Tree Boosting". SysML Conference, 2018.

License

This project is licensed under the terms of the MIT license. See LICENSE for additional details.
