LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.


Top Related Projects

xgboost

27,179

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

catboost

8,500

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

scikit-learn

62,466

scikit-learn: machine learning in Python

h2o-3

7,244

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Quick Overview

LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks. It is developed by Microsoft and is designed to be efficient, scalable, and accurate, particularly for large datasets.
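To make "gradient boosting based on decision trees" concrete, here is a minimal sketch of the general idea on a single feature with squared-error loss. It is not LightGBM's implementation: the function names `fit_stump` and `boost`, the one-split "stump" trees, and the toy data are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def fit_stump(xs: List[float], residuals: List[float]) -> Tuple[float, float, float]:
    """Pick the threshold that minimises squared error on the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def boost(xs: List[float], ys: List[float], n_rounds: int = 20, learning_rate: float = 0.5):
    """Fit stumps one after another, each on the residuals of the current model."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, preds


# Hypothetical toy data: a step function of one feature.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
base, stumps, preds = boost(xs, ys)
```

Each round shrinks the remaining error by the learning rate, which is why a smaller `learning_rate` usually needs more boosting rounds.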

Pros

Faster training speed and higher efficiency compared to other boosting frameworks
Lower memory usage due to its histogram-based algorithm
Supports parallel, distributed, and GPU learning
Handles large-scale data with ease
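The histogram-based algorithm mentioned above can be sketched roughly as follows: feature values are bucketed into a small number of bins, and only bin boundaries are considered as split candidates. This is a simplified illustration, not LightGBM's actual code; the equal-width binning, the gain formula without regularisation, and the names `build_histogram` and `best_split` are assumptions.

```python
from typing import List, Optional, Tuple


def build_histogram(values: List[float], gradients: List[float], n_bins: int):
    """Bucket values into equal-width bins and sum the gradients per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    grad_sums = [0.0] * n_bins
    counts = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        grad_sums[b] += g
        counts[b] += 1
    upper_edges = [lo + width * (i + 1) for i in range(n_bins)]
    return grad_sums, counts, upper_edges


def best_split(values: List[float], gradients: List[float], n_bins: int = 4) -> Tuple[Optional[float], float]:
    """Scan bin boundaries instead of every distinct value.

    The score G_left^2 / n_left + G_right^2 / n_right is a simplified gain.
    Samples with a value below the returned threshold go to the left child.
    """
    grad_sums, counts, upper_edges = build_histogram(values, gradients, n_bins)
    total_g, total_n = sum(grad_sums), sum(counts)
    best_gain, best_threshold = float("-inf"), None
    left_g, left_n = 0.0, 0
    for b in range(n_bins - 1):
        left_g += grad_sums[b]
        left_n += counts[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue  # a split must leave samples on both sides
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n
        if gain > best_gain:
            best_gain, best_threshold = gain, upper_edges[b]
    return best_threshold, best_gain


# Hypothetical data: low values have negative gradients, high values positive.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
gradients = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
threshold, gain = best_split(values, gradients)
```

The memory saving comes from keeping only per-bin sums and counts, so the split scan costs one pass over `n_bins` entries rather than over all samples.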

Cons

Can be prone to overfitting if not properly tuned
May require more careful parameter tuning compared to some other frameworks
Less interpretable than simpler models such as a single decision tree
Documentation can be challenging for beginners
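One common guard against the overfitting mentioned above is early stopping: track a validation loss per boosting round and keep the round where it was lowest. The sketch below is library-agnostic and only illustrates the logic; LightGBM ships its own early-stopping callback, which this does not reproduce. The function name `best_round_with_patience` and the loss values are hypothetical.

```python
from typing import List


def best_round_with_patience(val_losses: List[float], patience: int) -> int:
    """Return the 1-based round with the lowest validation loss.

    Scanning stops once `patience` consecutive rounds fail to improve
    on the best loss seen so far.
    """
    best_loss = float("inf")
    best_round = 0
    for round_idx, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_round = loss, round_idx
        elif round_idx - best_round >= patience:
            break  # no improvement for `patience` rounds in a row
    return best_round


# Hypothetical validation losses, one per boosting round.
losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.44]
print(best_round_with_patience(losses, patience=2))  # stops before seeing round 6
print(best_round_with_patience(losses, patience=5))  # patient enough to reach round 6
```

A small `patience` stops sooner but can miss a later improvement, as the two calls show.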

Code Examples

Basic binary classification:

import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)

# Create dataset for LightGBM
train_data = lgb.Dataset(X_train, label=y_train)

# Set parameters
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}

# Train model
model = lgb.train(params, train_data, num_boost_round=100)

# Make predictions
y_pred = model.predict(X_test)
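With the `binary` objective, `model.predict` returns probabilities of the positive class rather than class labels. A minimal sketch of turning them into labels and scoring them follows; the helper names and the sample values are hypothetical, and the 0.5 threshold is an assumption that may not suit imbalanced data.

```python
import math
from typing import List, Sequence


def to_labels(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Convert positive-class probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_logloss(y_true: Sequence[int], probabilities: Sequence[float], eps: float = 1e-15) -> float:
    """Average negative log-likelihood, clipping probabilities away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if t == 1 else math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical stand-ins for y_test and the output of model.predict(X_test).
y_true = [1, 0, 0, 1]
probs = [0.9, 0.2, 0.6, 0.4]
labels = to_labels(probs)
print(labels, accuracy(y_true, labels), binary_logloss(y_true, probs))
```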

Feature importance visualization:

import matplotlib.pyplot as plt

# Get feature importance
importance = model.feature_importance()
feature_names = data.feature_names

# Plot feature importance
plt.figure(figsize=(10, 6))
plt.bar(range(len(importance)), importance)
plt.xticks(range(len(importance)), feature_names, rotation=90)
plt.title('Feature Importance')
plt.tight_layout()
plt.show()
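A bar chart over all features can be hard to read, so it may help to rank them first. By default `feature_importance()` counts how many times each feature is used in a split. The sketch below ranks name and importance pairs; the function name `top_features` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_features(names: Sequence[str], importances: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k features with the highest importance, largest first."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical stand-ins for data.feature_names and model.feature_importance().
names = ["mean radius", "mean texture", "mean area", "worst area"]
importances = [3, 10, 0, 7]
for name, score in top_features(names, importances, k=2):
    print(f"{name}: {score}")
```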

Cross-validation:

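# Prepare LightGBM dataset
lgb_dataset = lgb.Dataset(data.data, label=data.target)

# The params above track binary_logloss, so request AUC explicitly
cv_params = {**params, 'metric': 'auc'}

# Perform 5-fold cross-validation
cv_results = lgb.cv(cv_params, lgb_dataset, num_boost_round=100, nfold=5, stratified=True, shuffle=True)

# Result keys are 'auc-mean' in LightGBM 3.x and 'valid auc-mean' in 4.x
mean_key = next(k for k in cv_results if k.endswith('auc-mean'))
stdv_key = next(k for k in cv_results if k.endswith('auc-stdv'))

# Print mean and standard deviation of AUC
print(f"AUC: {cv_results[mean_key][-1]:.4f} (+/- {cv_results[stdv_key][-1]:.4f})")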
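For readers unfamiliar with the metric reported above, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal pairwise sketch follows; it is quadratic in the number of samples and is meant only to show the definition, not how LightGBM computes it. The function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```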

Getting Started

To get started with LightGBM:

Install LightGBM:
```
pip install lightgbm
```
Import the library:
```
import lightgbm as lgb
```

Prepare your data and create a LightGBM dataset:

train_data = lgb.Dataset(X_train, label=y_train)

Set parameters and train the model:

params = {'objective': 'binary'}
model = lgb.train(params, train_data, num_boost_round=100)

Make predictions:
```
y_pred = model.predict(X_test)
```

Competitor Comparisons

xgboost


Pros of XGBoost

More mature and widely adopted in industry and competitions
Native handling of missing values (LightGBM also handles them natively)
Strong support for distributed and GPU computing

Cons of XGBoost

Generally slower training speed, especially for large datasets
Higher memory usage
More complex hyperparameter tuning

Code Comparison

XGBoost:

import xgboost as xgb
model = xgb.XGBClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

LightGBM:

import lightgbm as lgb
model = lgb.LGBMClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Both libraries offer similar scikit-learn-style APIs, making it easy to switch between them. The main differences lie in their underlying algorithms and performance characteristics. Both handle missing values natively; XGBoost is often chosen for smaller datasets or for its maturity, while LightGBM tends to shine on larger datasets where training time matters. The choice between the two depends on the specific use case and dataset characteristics.

catboost


Pros of CatBoost

Better handling of categorical features without manual preprocessing
Improved performance on datasets with high cardinality categorical features
Built-in GPU support for faster training

Cons of CatBoost

Generally slower training time compared to LightGBM
Less extensive documentation and community support
Fewer advanced features and customization options

Code Comparison

CatBoost:

from catboost import CatBoostRegressor

model = CatBoostRegressor(iterations=1000, learning_rate=0.1)
model.fit(X_train, y_train, cat_features=cat_features)
predictions = model.predict(X_test)

LightGBM:

import lightgbm as lgb

train_data = lgb.Dataset(X_train, label=y_train)
params = {'num_leaves': 31, 'learning_rate': 0.1}
model = lgb.train(params, train_data, num_boost_round=1000)
predictions = model.predict(X_test)

Both CatBoost and LightGBM are powerful gradient boosting libraries, each with its own strengths. CatBoost excels in handling categorical features and provides built-in GPU support, while LightGBM offers faster training times and more advanced customization options. The choice between the two depends on the specific requirements of your project and the nature of your dataset.
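The LightGBM snippet above passes no categorical information. LightGBM can treat columns as categorical, but it expects them as non-negative integer codes (or a pandas categorical dtype) rather than raw strings. Below is a minimal sketch of such an encoding; the function name `encode_categories` and the sample column are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def encode_categories(column: Sequence[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map each distinct category to a non-negative integer code, in first-seen order."""
    mapping: Dict[Hashable, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused code
        codes.append(mapping[value])
    return codes, mapping


# Hypothetical categorical column.
codes, mapping = encode_categories(["red", "blue", "red", "green"])
print(codes)    # [0, 1, 0, 2]
print(mapping)  # {'red': 0, 'blue': 1, 'green': 2}

# The encoded column can then be declared when building the dataset through the
# categorical_feature parameter of lgb.Dataset (not executed here).
```

The same mapping has to be reused at prediction time, so unseen categories need an explicit policy, for example a reserved code.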

scikit-learn


Pros of scikit-learn

Comprehensive library with a wide range of machine learning algorithms
Excellent documentation and community support
Consistent API across different algorithms, making it easy to use and switch between models

Cons of scikit-learn

Generally slower performance compared to specialized libraries like LightGBM
Less efficient for large-scale datasets and high-dimensional problems
Limited support for GPU acceleration

Code Comparison

scikit-learn:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

LightGBM:

import lightgbm as lgb
train_data = lgb.Dataset(X_train, label=y_train)
params = {'objective': 'binary'}
model = lgb.train(params, train_data)
# Unlike the scikit-learn classifier above, this returns probabilities, not class labels
predictions = model.predict(X_test)

Both libraries offer easy-to-use APIs, but LightGBM is more focused on gradient boosting and provides faster training times, especially for large datasets. scikit-learn offers a broader range of algorithms and is more suitable for general-purpose machine learning tasks, while LightGBM excels in gradient boosting applications and handling high-dimensional data efficiently.

h2o-3


Pros of h2o-3

Supports a wider range of algorithms and models, including deep learning
Offers a user-friendly web interface for non-programmers
Provides built-in distributed computing capabilities

Cons of h2o-3

Generally slower performance compared to LightGBM
More complex setup and configuration process
Larger memory footprint, especially for big datasets

Code Comparison

h2o-3:

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
data = h2o.import_file("data.csv")
model = H2OGradientBoostingEstimator()
model.train(x=["feature1", "feature2"], y="target", training_frame=data)

LightGBM:

import lightgbm as lgb
from sklearn.datasets import load_iris

data = load_iris()
train_data = lgb.Dataset(data.data, label=data.target)
params = {'objective': 'multiclass', 'num_class': 3}
model = lgb.train(params, train_data)

Both libraries offer gradient boosting implementations, but LightGBM focuses on efficiency and speed, while h2o-3 provides a broader range of algorithms and features. LightGBM's code is more concise and straightforward, while h2o-3 requires additional setup steps but offers more flexibility in terms of data handling and model configuration.
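With the `multiclass` objective used in the LightGBM snippet above, `model.predict` returns one row of per-class probabilities for each sample. A minimal sketch of picking the most likely class per row follows; the function name `predicted_classes` and the sample matrix are hypothetical.

```python
from typing import List, Sequence


def predicted_classes(class_probabilities: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest probability in each row (argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in class_probabilities]


# Hypothetical stand-in for model.predict(...) with num_class=3.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
print(predicted_classes(probs))  # [0, 2, 1]
```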

tensorflow


Pros of TensorFlow

Comprehensive ecosystem for deep learning and neural networks
Supports distributed computing and GPU acceleration
Extensive community support and resources

Cons of TensorFlow

Steeper learning curve compared to LightGBM
Higher computational requirements for simple tasks
More complex setup and configuration

Code Comparison

LightGBM:

import lightgbm as lgb
model = lgb.LGBMClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

TensorFlow:

import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, y_train, epochs=10)
predictions = model.predict(X_test)

LightGBM is more concise and straightforward for gradient boosting tasks, while TensorFlow offers more flexibility for complex neural network architectures but requires more code for simple models.

pytorch


Pros of PyTorch

More flexible and dynamic computational graph, allowing for easier debugging and experimentation
Broader ecosystem and community support, with extensive libraries for various deep learning tasks
Better support for GPU acceleration and distributed computing

Cons of PyTorch

Steeper learning curve for beginners compared to LightGBM's simpler API
Generally slower training speed for traditional machine learning tasks

Code Comparison

PyTorch example (neural network):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 1)
)

LightGBM example (gradient boosting):

import lightgbm as lgb

model = lgb.LGBMRegressor(
    n_estimators=100,
    learning_rate=0.1
)

Summary

PyTorch is a deep learning framework offering flexibility and a rich ecosystem, ideal for complex neural network architectures and research. LightGBM, on the other hand, is a gradient boosting framework optimized for efficiency and speed in traditional machine learning tasks. While PyTorch excels in deep learning and GPU utilization, LightGBM is often preferred for its simplicity and faster training on structured data.


README

Light Gradient Boosting Machine

LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages:

Faster training speed and higher efficiency.
Lower memory usage.
Better accuracy.
Support of parallel, distributed, and GPU learning.
Capable of handling large-scale data.

For further details, please refer to Features.

Benefiting from these advantages, LightGBM is widely used in many winning solutions of machine learning competitions.

Comparison experiments on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, distributed learning experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

Get Started and Documentation

Our primary documentation is at https://lightgbm.readthedocs.io/ and is generated from this repository. If you are new to LightGBM, follow the installation instructions on that site.

Next you may want to read:

Examples showing command line usage of common tasks.
Features and algorithms supported by LightGBM.
Parameters: an exhaustive list of the customizations you can make.
Distributed Learning and GPU Learning can speed up computation.
FLAML provides automated tuning for LightGBM (code examples).
Optuna Hyperparameter Tuner provides automated tuning for LightGBM hyperparameters (code examples).
Understanding LightGBM Parameters (and How to Tune Them using Neptune).

Documentation for contributors:

How we update readthedocs.io.
Check out the Development Guide.

News

Please refer to changelogs at GitHub releases page.

External (Unofficial) Repositories

Projects listed here offer alternative ways to use LightGBM. They are not maintained or officially endorsed by the LightGBM development team.

JPMML (Java PMML converter): https://github.com/jpmml/jpmml-lightgbm

Nyoka (Python PMML converter): https://github.com/SoftwareAG/nyoka

Treelite (model compiler for efficient deployment): https://github.com/dmlc/treelite

lleaves (LLVM-based model compiler for efficient inference): https://github.com/siboehm/lleaves

Hummingbird (model compiler into tensor computations): https://github.com/microsoft/hummingbird

cuML Forest Inference Library (GPU-accelerated inference): https://github.com/rapidsai/cuml

daal4py (Intel CPU-accelerated inference): https://github.com/intel/scikit-learn-intelex/tree/master/daal4py

m2cgen (model appliers for various languages): https://github.com/BayesWitnesses/m2cgen

leaves (Go model applier): https://github.com/dmitryikh/leaves

ONNXMLTools (ONNX converter): https://github.com/onnx/onnxmltools

SHAP (model output explainer): https://github.com/slundberg/shap

Shapash (model visualization and interpretation): https://github.com/MAIF/shapash

dtreeviz (decision tree visualization and model interpretation): https://github.com/parrt/dtreeviz

supertree (interactive visualization of decision trees): https://github.com/mljar/supertree

SynapseML (LightGBM on Spark): https://github.com/microsoft/SynapseML

Kubeflow Fairing (LightGBM on Kubernetes): https://github.com/kubeflow/fairing

Kubeflow Operator (LightGBM on Kubernetes): https://github.com/kubeflow/xgboost-operator

lightgbm_ray (LightGBM on Ray): https://github.com/ray-project/lightgbm_ray

Mars (LightGBM on Mars): https://github.com/mars-project/mars

ML.NET (.NET/C#-package): https://github.com/dotnet/machinelearning

LightGBM.NET (.NET/C#-package): https://github.com/rca22/LightGBM.Net

LightGBM Ruby (Ruby gem): https://github.com/ankane/lightgbm-ruby

LightGBM4j (Java high-level binding): https://github.com/metarank/lightgbm4j

LightGBM4J (JVM interface for LightGBM written in Scala): https://github.com/seek-oss/lightgbm4j

Julia-package: https://github.com/IQVIA-ML/LightGBM.jl

lightgbm3 (Rust binding): https://github.com/Mottl/lightgbm3-rs

MLServer (inference server for LightGBM): https://github.com/SeldonIO/MLServer

MLflow (experiment tracking, model monitoring framework): https://github.com/mlflow/mlflow

FLAML (AutoML library for hyperparameter optimization): https://github.com/microsoft/FLAML

MLJAR AutoML (AutoML on tabular data): https://github.com/mljar/mljar-supervised

Optuna (hyperparameter optimization framework): https://github.com/optuna/optuna

LightGBMLSS (probabilistic modelling with LightGBM): https://github.com/StatMixedML/LightGBMLSS

mlforecast (time series forecasting with LightGBM): https://github.com/Nixtla/mlforecast

skforecast (time series forecasting with LightGBM): https://github.com/JoaquinAmatRodrigo/skforecast

{bonsai} (R {parsnip}-compliant interface): https://github.com/tidymodels/bonsai

{mlr3extralearners} (R {mlr3}-compliant interface): https://github.com/mlr-org/mlr3extralearners

lightgbm-transform (feature transformation binding): https://github.com/microsoft/lightgbm-transform

postgresml (LightGBM training and prediction in SQL, via a Postgres extension): https://github.com/postgresml/postgresml

pyodide (run lightgbm Python-package in a web browser): https://github.com/pyodide/pyodide

vaex-ml (Python DataFrame library with its own interface to LightGBM): https://github.com/vaexio/vaex

Support

Ask a question on Stack Overflow with the lightgbm tag; we monitor this for new questions.
Open bug reports and feature requests on GitHub issues.

How to Contribute

Check the CONTRIBUTING page.

Microsoft Open Source Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Reference Papers

Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu. "Quantized Training of Gradient Boosting Decision Trees". Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pp. 18822-18833.

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 3149-3157.

Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu. "A Communication-Efficient Parallel Algorithm for Decision Tree". Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 1279-1287.

Huan Zhang, Si Si and Cho-Jui Hsieh. "GPU Acceleration for Large-scale Tree Boosting". SysML Conference, 2018.

License

This project is licensed under the terms of the MIT license. See LICENSE for additional details.
