recommenders

Best Practices on Recommendation Systems


Top Related Projects

recommenders

20,552

Best Practices on Recommendation Systems

implicit

3,697

Fast Python Collaborative Filtering for Implicit Feedback Datasets

lightfm

4,975

A Python implementation of LightFM, a hybrid recommendation algorithm.

spotlight

3,024

Deep recommender models using PyTorch.

RecBole

3,816

A unified, comprehensive and efficient recommendation library

Surprise

6,642

A Python scikit for building and analyzing recommender systems

Quick Overview

The recommenders-team/recommenders repository is an open-source project, started at Microsoft and now hosted under the Linux Foundation of AI and Data, that provides a collection of recommendation system algorithms and utilities. It aims to make it easier for developers and data scientists to build and evaluate recommender systems, offering classic and state-of-the-art algorithms, evaluation metrics, and data processing tools.

Pros

Comprehensive collection of recommendation algorithms, including collaborative filtering, content-based, and hybrid approaches
Extensive evaluation metrics and tools for benchmarking recommender systems
Well-documented and actively maintained by Microsoft and the community
Includes utilities for data preprocessing, feature engineering, and model interpretation

Cons

Steep learning curve for beginners due to the wide range of algorithms and techniques
Some advanced algorithms may require significant computational resources
Limited support for certain specialized recommendation scenarios (e.g., session-based recommendations)
Dependency on specific versions of libraries may cause compatibility issues

Code Examples

Creating a Simple Collaborative Filtering Model:

from recommenders.models.sar import SAR
from recommenders.datasets import movielens

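# Load MovieLens 100k, naming the item column "movieID" (the loader's default is "itemID")
df = movielens.load_pandas_df(size="100k", header=["userID", "movieID", "rating", "timestamp"])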

# Create SAR model
model = SAR(
    col_user="userID",
    col_item="movieID",
    col_rating="rating",
    col_timestamp="timestamp",
    similarity_type="jaccard",
    time_decay_coefficient=30,
    timedecay_formula=True
)

# Fit the model
model.fit(df)

# Get top 10 recommendations for user 1
recommendations = model.recommend_k_items(df[df["userID"] == 1], top_k=10, remove_seen=True)
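
The two SAR options used above, Jaccard item similarity and time decay, can be illustrated with a small pure-Python sketch. This is not the library's implementation: the toy events, the function names, and the representation of time as "age in days" are assumptions made for illustration. It computes Jaccard similarity from co-occurrence counts and halves each rating's weight every 30 days.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (user, item, rating, age of the event in days)
events = [
    ("u1", "A", 5.0, 0),
    ("u1", "B", 4.0, 30),
    ("u2", "A", 3.0, 60),
    ("u2", "B", 5.0, 0),
    ("u2", "C", 4.0, 0),
    ("u3", "A", 4.0, 0),
]


def jaccard_similarity(events):
    """Item-item Jaccard similarity computed from user co-occurrence counts."""
    items_by_user = defaultdict(set)
    for user, item, _, _ in events:
        items_by_user[user].add(item)

    occurrence = defaultdict(int)    # number of users who interacted with item i
    cooccurrence = defaultdict(int)  # number of users who interacted with both i and j
    for items in items_by_user.values():
        for item in items:
            occurrence[item] += 1
        for i, j in combinations(sorted(items), 2):
            cooccurrence[(i, j)] += 1

    # Jaccard: shared users divided by users who touched either item
    return {
        (i, j): c_ij / (occurrence[i] + occurrence[j] - c_ij)
        for (i, j), c_ij in cooccurrence.items()
    }


def decayed_affinity(events, half_life_days=30):
    """User-item affinity where a rating's weight halves every `half_life_days`."""
    affinity = defaultdict(float)
    for user, item, rating, age_days in events:
        affinity[(user, item)] += rating * 0.5 ** (age_days / half_life_days)
    return dict(affinity)


similarity = jaccard_similarity(events)
affinity = decayed_affinity(events)
print(similarity)
print(affinity)
```

Recommendation scores in a similarity-based model of this kind come from combining the user affinities with the item similarities, so older interactions contribute less to the final ranking.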

Evaluating a Model:

from recommenders.datasets.python_splitters import python_random_split
from recommenders.evaluation.python_evaluation import (
    map_at_k,
    ndcg_at_k,
    precision_at_k,
    recall_at_k
)

# Split data into train and test sets
train, test = python_random_split(df, ratio=0.75)

# Refit on the training split only, so no test interactions leak into the model
model.fit(train)

# Rank the top-k unseen items for every user in the test set
k = 10
predictions = model.recommend_k_items(test, top_k=k, remove_seen=True)

# Evaluate the model; the metric defaults assume "itemID", so pass the column names
cols = dict(col_user="userID", col_item="movieID")
eval_map = map_at_k(test, predictions, k=k, **cols)
eval_ndcg = ndcg_at_k(test, predictions, k=k, **cols)
eval_precision = precision_at_k(test, predictions, k=k, **cols)
eval_recall = recall_at_k(test, predictions, k=k, **cols)

print(f"MAP@{k}: {eval_map:.4f}")
print(f"NDCG@{k}: {eval_ndcg:.4f}")
print(f"Precision@{k}: {eval_precision:.4f}")
print(f"Recall@{k}: {eval_recall:.4f}")
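
To make the ranking metrics above concrete, here is a minimal sketch of precision@k, recall@k and nDCG@k for a single user with binary relevance. The function name and the toy lists are hypothetical, and the library's own implementations may differ in details such as how relevance and ties are handled.

```python
import math


def precision_recall_ndcg_at_k(recommended, relevant, k):
    """Binary-relevance precision@k, recall@k and nDCG@k for one user."""
    top_k = recommended[:k]
    hits = [1 if item in relevant else 0 for item in top_k]
    n_hits = sum(hits)

    precision = n_hits / k
    recall = n_hits / len(relevant) if relevant else 0.0

    # DCG discounts each hit by log2 of its 1-based rank plus one
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    # Ideal DCG places every relevant item at the top of the list
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg


recommended = ["A", "B", "C", "D"]  # ranked model output for one user
relevant = {"A", "C", "E"}          # items the user interacted with in the test set
precision, recall, ndcg = precision_recall_ndcg_at_k(recommended, relevant, k=4)
print(f"precision={precision:.3f} recall={recall:.3f} ndcg={ndcg:.3f}")
```

Dataset-level scores are typically obtained by averaging these per-user values over all test users.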

Using a Deep Learning Model:

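from recommenders.models.deeprec.deeprec_utils import prepare_hparams
from recommenders.models.deeprec.io.iterator import FFMTextIterator
from recommenders.models.deeprec.models.xDeepFM import XDeepFMModel

# xDeepFM reads FFM-formatted text files through an iterator rather than DataFrames.
# yaml_file, train_file, valid_file, test_file and output_file are paths you supply.
hparams = prepare_hparams(yaml_file)

# Create and train the model
model = XDeepFMModel(hparams, FFMTextIterator)
model.fit(train_file, valid_file)

# Write predictions for the test file to output_file
model.predict(test_file, output_file)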

Getting Started

To get started with the recommenders library:

Install the library:

pip install recommenders

Import and use the desired modules:

from recommenders.datasets import movielens
from recommenders.models.sar import SAR

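# Load MovieLens 100k, naming the item column "movieID" (the loader's default is "itemID")
df = movielens.load_pandas_df(size="100k", header=["userID", "movieID", "rating", "timestamp"])

# Create and train a model
model = SAR(col_user="userID", col_item="movieID", col_rating="rating")
model.fit(df)

# Get top 10 recommendations for user 1
recommendations = model.recommend_k_items(df[df["userID"] == 1], top_k=10, remove_seen=True)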

Competitor Comparisons

recommenders

20,552

Best Practices on Recommendation Systems

Pros of recommenders

More comprehensive and feature-rich recommendation system toolkit
Actively maintained with regular updates and contributions
Extensive documentation and examples for various recommendation scenarios

Cons of recommenders

Potentially more complex to set up and use for beginners
May have higher computational requirements due to advanced features
Larger codebase might make it harder to customize or extend

Code Comparison

recommenders:

from recommenders.models.deeprec.models.xDeepFM import XDeepFMModel
from recommenders.models.deeprec.io.iterator import FFMTextIterator

# hparams comes from prepare_hparams(); train_file and valid_file are FFM-formatted text files
model = XDeepFMModel(hparams, iterator_creator=FFMTextIterator)
model.fit(train_file, valid_file)


implicit

3,697

Fast Python Collaborative Filtering for Implicit Feedback Datasets

Pros of implicit

Specialized focus on implicit feedback recommender algorithms
Faster performance due to C++ implementations with Python bindings
Smaller, more focused codebase for easier integration

Cons of implicit

Limited to implicit feedback models only
Fewer evaluation metrics and utilities compared to recommenders
Less extensive documentation and examples

Code comparison

implicit:

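from implicit.als import AlternatingLeastSquares

# user_item_matrix is a scipy.sparse CSR matrix of user-item interactions
model = AlternatingLeastSquares(factors=50)
model.fit(user_item_matrix)
recommendations = model.recommend(user_id, user_item_matrix[user_id])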

recommenders:

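import cornac
from recommenders.models.cornac.cornac_utils import predict_ranking

# Recommenders runs BPR through Cornac; train is a DataFrame with userID, itemID and rating columns
train_set = cornac.data.Dataset.from_uir(train[["userID", "itemID", "rating"]].itertuples(index=False), seed=42)
model = cornac.models.BPR(k=10, max_iter=100, learning_rate=0.001)
model.fit(train_set)
predictions = predict_ranking(model, train, usercol="userID", itemcol="itemID", remove_seen=True)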

Both libraries offer concise APIs for training and making recommendations, but implicit focuses on efficient implementations of specific algorithms, while recommenders provides a wider range of models and evaluation tools. implicit's code is more streamlined for its specialized use case, while recommenders offers more flexibility and options for different recommendation scenarios.

lightfm

4,975

A Python implementation of LightFM, a hybrid recommendation algorithm.

Pros of LightFM

Focused on hybrid and content-based recommendation algorithms
Efficient implementation in C++ with Python bindings
Supports both implicit and explicit feedback

Cons of LightFM

Limited to matrix factorization-based models
Smaller community and fewer contributors
Less comprehensive documentation compared to Recommenders

Code Comparison

LightFM:

from lightfm import LightFM
model = LightFM(loss='warp')
model.fit(train, epochs=10)

Recommenders:

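from recommenders.models.ncf.ncf_singlenode import NCF

# data is a recommenders.models.ncf.dataset.Dataset; the epoch count is set on the model, not in fit()
model = NCF(n_users=data.n_users, n_items=data.n_items, model_type="NeuMF", n_epochs=10)
model.fit(data)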

Summary

LightFM is a specialized library for hybrid recommendation systems, offering efficient implementations of matrix factorization models. It's suitable for projects requiring specific hybrid algorithms and performance optimization.

Recommenders is a more comprehensive toolkit with a broader range of algorithms and utilities. It provides extensive documentation and a larger community, making it ideal for diverse recommendation tasks and experimentation.

The choice between the two depends on the specific requirements of the project, such as the need for hybrid models, performance considerations, and the desired level of community support and documentation.

spotlight

3,024

Deep recommender models using PyTorch.

Pros of Spotlight

Focused on deep learning-based recommender systems
Provides GPU acceleration for faster training
Offers a clean, PyTorch-based API for building recommendation models

Cons of Spotlight

Limited to specific types of recommendation algorithms
Smaller community and fewer updates compared to Recommenders
Less comprehensive documentation and examples

Code Comparison

Spotlight:

from spotlight.interactions import Interactions
from spotlight.factorization.explicit import ExplicitFactorizationModel

model = ExplicitFactorizationModel(n_iter=1)
model.fit(interactions)

Recommenders:

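from fastai.collab import CollabDataLoaders, collab_learner

# Recommenders builds this model with fastai's collab_learner (fastai v2 API shown);
# train is a DataFrame with userID, itemID and rating columns
dls = CollabDataLoaders.from_df(train, user_name="userID", item_name="itemID", rating_name="rating", valid_pct=0.1)
model = collab_learner(dls, n_factors=40, y_range=(0.5, 5.5))
model.fit_one_cycle(5)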

Summary

Spotlight is a specialized library for deep learning-based recommender systems, offering GPU acceleration and a PyTorch-based API. However, it has a narrower focus and smaller community compared to Recommenders. Recommenders provides a more comprehensive toolkit with various algorithms and extensive documentation, making it suitable for a wider range of recommendation tasks. The choice between the two depends on specific project requirements and familiarity with deep learning frameworks.

RecBole

3,816

A unified, comprehensive and efficient recommendation library

Pros of RecBole

Offers a wider range of recommendation algorithms, including more advanced and recent models
Provides a more comprehensive set of evaluation metrics and tools for model analysis
Supports more diverse data formats and preprocessing options

Cons of RecBole

Steeper learning curve due to its more complex architecture and extensive features
Less focus on industrial-scale deployment compared to Recommenders
Documentation may be less comprehensive for beginners

Code Comparison

RecBole:

from recbole.quick_start import run_recbole

run_recbole(model='BPR', dataset='ml-100k')

Recommenders:

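import cornac
from recommenders.datasets import movielens

# Recommenders runs BPR through Cornac, so the DataFrame is converted to a Cornac dataset first
data = movielens.load_pandas_df(size="100k", header=["userID", "itemID", "rating"])
train_set = cornac.data.Dataset.from_uir(data.itertuples(index=False), seed=42)
model = cornac.models.BPR()
model.fit(train_set)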

Both libraries offer easy-to-use interfaces for running recommendation models, but RecBole provides a more streamlined approach with its run_recbole function. Recommenders requires more explicit steps for data loading and model initialization, which can offer more flexibility but may be less convenient for quick experimentation.

Surprise

6,642

A Python scikit for building and analyzing recommender systems

Pros of Surprise

Lightweight and focused specifically on collaborative filtering algorithms
Easy to use with a scikit-learn inspired API
Extensive documentation and examples for beginners

Cons of Surprise

Limited to collaborative filtering, lacking content-based and hybrid approaches
Fewer advanced features and customization options
Smaller community and less frequent updates

Code Comparison

Surprise:

from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin('ml-100k')
algo = SVD()
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Recommenders:

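import surprise
from recommenders.datasets import movielens
from recommenders.evaluation.python_evaluation import mae, rmse
from recommenders.models.surprise.surprise_utils import predict

# Recommenders runs SVD through Surprise and adds DataFrame-based prediction and metrics
data = movielens.load_pandas_df(size="100k", header=["userID", "itemID", "rating"])
train_set = surprise.Dataset.load_from_df(data, reader=surprise.Reader(rating_scale=(1, 5))).build_full_trainset()
model = surprise.SVD()
model.fit(train_set)

# Scored on the training data for brevity; use a held-out split for a real evaluation
predictions = predict(model, data, usercol="userID", itemcol="itemID")
print(f"RMSE: {rmse(data, predictions)}, MAE: {mae(data, predictions)}")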

Both libraries offer implementations of popular recommendation algorithms, but Recommenders provides a broader range of techniques and more flexibility for large-scale applications. Surprise is more beginner-friendly and focused on collaborative filtering, while Recommenders offers a comprehensive toolkit for various recommendation scenarios.
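
Both snippets above report RMSE and MAE. As a minimal sketch of what those two numbers measure, the following pure-Python functions compute them for a handful of ratings. The function names and the toy values are hypothetical; neither library's implementation is reproduced here.

```python
import math


def rmse_score(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae_score(y_true, y_pred):
    """Mean absolute error: average size of the error, in rating units."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


actual = [4.0, 3.0, 5.0, 1.0]     # true ratings
predicted = [3.0, 3.0, 4.0, 3.0]  # model predictions for the same user-item pairs

print(f"RMSE: {rmse_score(actual, predicted):.4f}")
print(f"MAE: {mae_score(actual, predicted):.4f}")
```

Because RMSE squares each error before averaging, the single 2-star miss in this toy data pulls RMSE above MAE.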


README

What's New (April, 2025)

We reached 20,000 stars!!

We are happy to announce that we have reached 20,000 stars on GitHub! Thank you for your support and contributions to the Recommenders project. We are excited to continue building and improving this project with your help.

Check out the release Recommenders 1.2.1!

We fixed a lot of bugs due to dependencies, improved security, reviewed the notebooks and the libraries.

Introduction

Recommenders' objective is to assist researchers, developers and enthusiasts in prototyping, experimenting with and bringing to production a range of classic and state-of-the-art recommendation systems.

Recommenders is a project under the Linux Foundation of AI and Data.

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:

Prepare Data: Preparing and loading data for each recommendation algorithm.
Model: Building models using various classical and deep learning recommendation algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
Evaluate: Evaluating algorithms with offline metrics.
Model Select and Optimize: Tuning and optimizing hyperparameters for recommendation models.
Operationalize: Operationalizing models in a production environment on Azure.

Several utilities are provided in recommenders to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the Recommenders documentation.

For a more detailed overview of the repository, please see the documents on the wiki page.

For some of the practical scenarios where recommendation systems have been applied, see scenarios.

Getting Started

We recommend conda for environment management, and VS Code for development. To install the recommenders package and run an example notebook on Linux/WSL:

# 1. Install gcc if it is not installed already. On Ubuntu, this can be done with:
# sudo apt install gcc

# 2. Create and activate a new conda environment
conda create -n <environment_name> python=3.9
conda activate <environment_name>

# 3. Install the core recommenders package. It can run all the CPU notebooks.
pip install recommenders

# 4. Create a Jupyter kernel
python -m ipykernel install --user --name <environment_name> --display-name <kernel_name>

# 5. Clone this repo within VSCode or using command line:
git clone https://github.com/recommenders-team/recommenders.git

# 6. Within VSCode:
#   a. Open a notebook, e.g., examples/00_quick_start/sar_movielens.ipynb;  
#   b. Select Jupyter kernel <kernel_name>;
#   c. Run the notebook.

For more information about setup on other platforms (e.g., Windows and macOS) and different configurations (e.g., GPU, Spark and experimental features), see the Setup Guide.

In addition to the core package, several extras are also provided, including:

[gpu]: Needed for running GPU models.
[spark]: Needed for running Spark models.
[dev]: Needed for development for the repo.
[all]: [gpu]|[spark]|[dev]
[experimental]: Models that are not thoroughly tested and/or may require additional steps in installation.

Algorithms

The table below lists the recommendation algorithms currently available in the repository. Notebooks are linked under the Example column as Quick start, showcasing an easy to run example of the algorithm, or as Deep dive, explaining in detail the math and implementation of the algorithm.

Algorithm	Type	Description	Example
Alternating Least Squares (ALS)	Collaborative Filtering	Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized for scalability and distributed computing capability. It works in the PySpark environment.	Quick start / Deep dive
Attentive Asynchronous Singular Value Decomposition (A2SVD)^*	Collaborative Filtering	Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism. It works in the CPU/GPU environment.	Quick start
Cornac/Bayesian Personalized Ranking (BPR)	Collaborative Filtering	Matrix factorization algorithm for predicting item ranking with implicit feedback. It works in the CPU environment.	Deep dive
Cornac/Bilateral Variational Autoencoder (BiVAE)	Collaborative Filtering	Generative model for dyadic data (e.g., user-item interactions). It works in the CPU/GPU environment.	Deep dive
Convolutional Sequence Embedding Recommendation (Caser)	Collaborative Filtering	Algorithm based on convolutions that aims to capture both the user's general preferences and sequential patterns. It works in the CPU/GPU environment.	Quick start
Deep Knowledge-Aware Network (DKN)^*	Content-Based Filtering	Deep learning algorithm incorporating a knowledge graph and article embeddings for providing news or article recommendations. It works in the CPU/GPU environment.	Quick start / Deep dive
Extreme Deep Factorization Machine (xDeepFM)^*	Collaborative Filtering	Deep learning based algorithm for implicit and explicit feedback with user/item features. It works in the CPU/GPU environment.	Quick start
FastAI Embedding Dot Bias (FAST)	Collaborative Filtering	General purpose algorithm with embeddings and biases for users and items. It works in the CPU/GPU environment.	Quick start
LightFM/Factorization Machine	Collaborative Filtering	Factorization Machine algorithm for both implicit and explicit feedbacks. It works in the CPU environment.	Quick start
LightGBM/Gradient Boosting Tree^*	Content-Based Filtering	Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems. It works in the CPU/GPU/PySpark environments.	Quick start in CPU / Deep dive in PySpark
LightGCN	Collaborative Filtering	Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback. It works in the CPU/GPU environment.	Deep dive
GeoIMC^*	Collaborative Filtering	Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradient optimization and follows a geometric approach. It works in the CPU environment.	Quick start
GRU	Collaborative Filtering	Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks. It works in the CPU/GPU environment.	Quick start
Multinomial VAE	Collaborative Filtering	Generative model for predicting user/item interactions. It works in the CPU/GPU environment.	Deep dive
Neural Recommendation with Long- and Short-term User Representations (LSTUR)^*	Content-Based Filtering	Neural recommendation algorithm for recommending news articles with long- and short-term user interest modeling. It works in the CPU/GPU environment.	Quick start
Neural Recommendation with Attentive Multi-View Learning (NAML)^*	Content-Based Filtering	Neural recommendation algorithm for recommending news articles with attentive multi-view learning. It works in the CPU/GPU environment.	Quick start
Neural Collaborative Filtering (NCF)	Collaborative Filtering	Deep learning algorithm with enhanced performance for user/item implicit feedback. It works in the CPU/GPU environment.	Quick start / Deep dive
Neural Recommendation with Personalized Attention (NPA)^*	Content-Based Filtering	Neural recommendation algorithm for recommending news articles with personalized attention network. It works in the CPU/GPU environment.	Quick start
Neural Recommendation with Multi-Head Self-Attention (NRMS)^*	Content-Based Filtering	Neural recommendation algorithm for recommending news articles with multi-head self-attention. It works in the CPU/GPU environment.	Quick start
Next Item Recommendation (NextItNet)	Collaborative Filtering	Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns. It considers both user/item interactions and features. It works in the CPU/GPU environment.	Quick start
Restricted Boltzmann Machines (RBM)	Collaborative Filtering	Neural network based algorithm for learning the underlying probability distribution for explicit or implicit user/item feedback. It works in the CPU/GPU environment.	Quick start / Deep dive
Riemannian Low-rank Matrix Completion (RLRMC)^*	Collaborative Filtering	Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption to predict user/item interactions. It works in the CPU environment.	Quick start
Simple Algorithm for Recommendation (SAR)^*	Collaborative Filtering	Similarity-based algorithm for implicit user/item feedback. It works in the CPU environment.	Quick start / Deep dive
Self-Attentive Sequential Recommendation (SASRec)	Collaborative Filtering	Transformer based algorithm for sequential recommendation. It works in the CPU/GPU environment.	Quick start
Short-term and Long-term Preference Integrated Recommender (SLi-Rec)^*	Collaborative Filtering	Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller. It works in the CPU/GPU environment.	Quick start
Multi-Interest-Aware Sequential User Modeling (SUM)^*	Collaborative Filtering	An enhanced memory network-based sequential user model which aims to capture users' multiple interests. It works in the CPU/GPU environment.	Quick start
Sequential Recommendation Via Personalized Transformer (SSEPT)	Collaborative Filtering	Transformer based algorithm for sequential recommendation with User embedding. It works in the CPU/GPU environment.	Quick start
Standard VAE	Collaborative Filtering	Generative Model for predicting user/item interactions. It works in the CPU/GPU environment.	Deep dive
Surprise/Singular Value Decomposition (SVD)	Collaborative Filtering	Matrix factorization algorithm for predicting explicit rating feedback in small datasets. It works in the CPU/GPU environment.	Deep dive
Term Frequency - Inverse Document Frequency (TF-IDF)	Content-Based Filtering	Simple similarity-based algorithm for content-based recommendations with text datasets. It works in the CPU environment.	Quick start
Vowpal Wabbit (VW)^*	Content-Based Filtering	Fast online learning algorithms, great for scenarios where user features / context are constantly changing. It uses the CPU for online learning.	Deep dive
Wide and Deep	Collaborative Filtering	Deep learning algorithm that can memorize feature interactions and generalize user features. It works in the CPU/GPU environment.	Quick start
xLearn/Factorization Machine (FM) & Field-Aware FM (FFM)	Collaborative Filtering	Quick and memory efficient algorithm to predict labels with user/item features. It works in the CPU/GPU environment.	Deep dive

NOTE: ^* indicates algorithms invented/contributed by Microsoft.

Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.

Algorithm	Type	Description	Example
SARplus ^*	Collaborative Filtering	Optimized implementation of SAR for Spark	Quick start

Algorithm Comparison

We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training and test sets at a 75/25 ratio using a stratified split. A recommendation model is trained using each of the collaborative filtering algorithms below, with empirical parameter values reported in the literature. For ranking metrics we use k=10 (top 10 recommended items). The comparison runs on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU), with Spark ALS in local standalone mode. The table below shows the results on MovieLens 100k, running the algorithms for 15 epochs.

Algo	MAP	nDCG@k	Precision@k	Recall@k	RMSE	MAE	R²	Explained Variance
ALS	0.004732	0.044239	0.048462	0.017796	0.965038	0.753001	0.255647	0.251648
BiVAE	0.146126	0.475077	0.411771	0.219145	N/A	N/A	N/A	N/A
BPR	0.132478	0.441997	0.388229	0.212522	N/A	N/A	N/A	N/A
FastAI	0.025503	0.147866	0.130329	0.053824	0.943084	0.744337	0.285308	0.287671
LightGCN	0.088526	0.419846	0.379626	0.144336	N/A	N/A	N/A	N/A
NCF	0.107720	0.396118	0.347296	0.180775	N/A	N/A	N/A	N/A
SAR	0.110591	0.382461	0.330753	0.176385	1.253805	1.048484	-0.569363	0.030474
SVD	0.012873	0.095930	0.091198	0.032783	0.938681	0.742690	0.291967	0.291971
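
The stratified 75/25 split used for this benchmark keeps roughly the same train/test proportion for every user, so no user ends up only in the test set. The sketch below illustrates the idea in pure Python; the function name, the tuple layout and the toy data are assumptions, and the library ships its own splitters in recommenders.datasets.python_splitters, which should be preferred in practice.

```python
import random
from collections import defaultdict


def stratified_split(interactions, ratio=0.75, seed=42):
    """Split (user, item, rating) rows so each user keeps about `ratio` of their rows in train."""
    rng = random.Random(seed)
    rows_by_user = defaultdict(list)
    for row in interactions:
        rows_by_user[row[0]].append(row)

    train, test = [], []
    for user in sorted(rows_by_user):
        rows = rows_by_user[user][:]
        rng.shuffle(rows)                 # shuffle within the user, not globally
        cut = round(len(rows) * ratio)    # per-user cut point
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


# Hypothetical toy data: two users with eight interactions each
interactions = [(u, i, 1.0) for u in ("u1", "u2") for i in range(8)]
train, test = stratified_split(interactions)
print(len(train), len(test))
```

A plain random split over all rows gives the same overall ratio but can leave light users with no training rows at all, which is why ranking benchmarks often stratify by user.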

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

This project adheres to this Code of Conduct in order to foster a welcoming and inspiring community for all.

Build Status

These tests are the nightly builds, which compute the asynchronous tests. main is our principal branch and staging is our development branch. We use pytest for testing Python utilities in recommenders and the Recommenders notebook executor for the notebooks.

For more information about the testing pipelines, please see the test documentation.

AzureML Nightly Build Status

The nightly build tests are run daily on AzureML.

Build Type	Branch	Branch
Linux CPU	main	staging
Linux GPU	main	staging
Linux Spark	main	staging

References

FREE COURSE: M. González-Fierro, "Recommendation Systems: A Practical Introduction", LinkedIn Learning, 2024. Available on this link.
D. Li, J. Lian, L. Zhang, K. Ren, D. Lu, T. Wu, X. Xie, "Recommender Systems: Frontiers and Practices", Springer, Beijing, 2024. Available on this link.
A. Argyriou, M. González-Fierro, and L. Zhang, "Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems", WWW 2020: International World Wide Web Conference Taipei, 2020. Available online: https://dl.acm.org/doi/abs/10.1145/3366424.3382692
S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967
L. Zhang, T. Wu, X. Xie, A. Argyriou, M. González-Fierro and J. Lian, "Building Production-Ready Recommendation System at Scale", ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (KDD 2019), 2019.
