implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets


Top Related Projects

spotlight

3,017

Deep recommender models using PyTorch.

lightfm

4,906

A Python implementation of LightFM, a hybrid recommendation algorithm.

recommenders

20,124

Best Practices on Recommendation Systems

scikit-learn

62,466

scikit-learn: machine learning in Python

Surprise

6,597

A Python scikit for building and analyzing recommender systems

Quick Overview

Implicit is a fast Python collaborative filtering library for building recommender systems. It utilizes modern CPU and GPU optimizations to provide efficient implementations of various matrix factorization algorithms, including Alternating Least Squares (ALS) and Bayesian Personalized Ranking (BPR).
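To make the BPR idea mentioned above a little more concrete, here is a minimal NumPy sketch of one pairwise update: for a user vector and two item vectors (one the user interacted with, one they did not), a gradient step pushes the score of the observed item above the unobserved one. The function name `bpr_step`, the learning rate and the regularization value are illustrative assumptions; this is not Implicit's own Cython/CUDA implementation.

```python
import numpy as np

def bpr_step(x_u, y_i, y_j, lr=0.1, reg=0.01):
    """One SGD step on -ln(sigmoid(x_u . (y_i - y_j))) with L2 regularization.

    x_u: user factors, y_i: factors of an observed item,
    y_j: factors of an unobserved item (hypothetical toy setup).
    """
    x_uij = x_u @ (y_i - y_j)
    g = 1.0 / (1.0 + np.exp(x_uij))  # equals 1 - sigmoid(x_uij)
    # Compute all updates from the old values before returning them
    new_x_u = x_u + lr * (g * (y_i - y_j) - reg * x_u)
    new_y_i = y_i + lr * (g * x_u - reg * y_i)
    new_y_j = y_j + lr * (-g * x_u - reg * y_j)
    return new_x_u, new_y_i, new_y_j

rng = np.random.default_rng(0)
x_u, y_i, y_j = (rng.normal(scale=0.1, size=4) for _ in range(3))

# Repeat the update on the same (user, observed, unobserved) triple
for _ in range(200):
    x_u, y_i, y_j = bpr_step(x_u, y_i, y_j)

score_i, score_j = x_u @ y_i, x_u @ y_j
```

After the loop, `score_i` should exceed `score_j`, which is the ranking property BPR optimizes for. A real trainer samples many such triples from the interaction matrix rather than repeating a single one.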

Pros

High performance: Optimized for speed using Cython and CUDA
Supports both CPU and GPU computations
Implements multiple recommendation algorithms
Easy to use with scikit-learn-like API

Cons

Limited to collaborative filtering techniques
Requires additional dependencies for GPU support
Documentation could be more comprehensive
May have a steeper learning curve for beginners in recommender systems

Code Examples

Basic usage with Alternating Least Squares (ALS):

from implicit.als import AlternatingLeastSquares
from implicit.datasets.movielens import get_movielens

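# Load the MovieLens 100k dataset; get_movielens returns (titles, ratings),
# where ratings is a sparse movie-by-user matrix
titles, ratings = get_movielens(variant="100k")

# Implicit expects a user-by-item CSR matrix, so transpose
ratings = ratings.T.tocsr()

# Initialize and train the ALS model
model = AlternatingLeastSquares(factors=50, iterations=10)
model.fit(ratings)

# Get recommendations for a user (returns item ids and their scores)
user_id = 1
recommendations = model.recommend(user_id, ratings[user_id])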

Using Bayesian Personalized Ranking (BPR):

from implicit.bpr import BayesianPersonalizedRanking

# Initialize and train the BPR model
model = BayesianPersonalizedRanking(factors=100, iterations=20)
model.fit(ratings)

# Get similar items
item_id = 1
similar_items = model.similar_items(item_id)
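As a rough illustration of what a similar-items lookup on a factorization model boils down to, the sketch below ranks items by cosine similarity between latent factor vectors. The `item_factors` array is made-up toy data and `top_similar` is a hypothetical helper, not part of Implicit's API; the library's own implementation may differ in detail.

```python
import numpy as np

def top_similar(item_factors, item_id, n=3):
    """Return (ids, scores) of the n items most cosine-similar to item_id."""
    norms = np.linalg.norm(item_factors, axis=1)
    norms[norms == 0] = 1e-10  # avoid division by zero for all-zero rows
    normalized = item_factors / norms[:, None]
    scores = normalized @ normalized[item_id]
    ids = np.argsort(-scores)[:n]  # highest similarity first
    return ids, scores[ids]

# Toy item factors: items 0 and 1 point in nearly the same direction
item_factors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
ids, scores = top_similar(item_factors, item_id=0, n=2)
```

The query item itself comes back first with a similarity of 1, followed by its nearest neighbour.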

Evaluating model performance:

from implicit.evaluation import train_test_split, precision_at_k

# Split the data into train and test sets
train, test = train_test_split(ratings)

# Train the model on the training data
model = AlternatingLeastSquares(factors=50)
model.fit(train)

# Evaluate the model using precision@k
p_at_k = precision_at_k(model, train, test, K=10)
print(f"Precision@10: {p_at_k:.4f}")
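For readers unfamiliar with the metric, the following is a plain-Python sketch of the textbook precision@k definition for a single user: the share of the top-k recommended items that appear in the held-out set. The helper name `toy_precision_at_k` and the sample ids are assumptions for illustration; the library's own function aggregates over all users and may normalize differently (for example when a user has fewer than K held-out items).

```python
def toy_precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are in the relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical ranked recommendations and held-out items for one user
recommended = [10, 4, 7, 2, 9]
relevant = {4, 9, 33}

p = toy_precision_at_k(recommended, relevant, k=5)
```

Here two of the five recommended items (4 and 9) are relevant, so `p` is 2/5.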

Getting Started

To get started with Implicit, follow these steps:

Install the library:
```
pip install implicit
```

Import the necessary modules:

from implicit.als import AlternatingLeastSquares
from implicit.datasets.movielens import get_movielens

Load a dataset and train a model:

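# get_movielens returns (titles, movie-by-user ratings matrix)
titles, ratings = get_movielens()
# transpose to the user-by-item layout that fit() and recommend() expect
ratings = ratings.T.tocsr()
model = AlternatingLeastSquares()
model.fit(ratings)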

Generate recommendations:

user_id = 1
recommendations = model.recommend(user_id, ratings[user_id])
print(recommendations)

Competitor Comparisons

spotlight

3,017

Deep recommender models using PyTorch.

Pros of Spotlight

Built on PyTorch, allowing for GPU acceleration and automatic differentiation
Supports more advanced models like sequence models and factorization machines
Offers a higher-level API for easier model experimentation

Cons of Spotlight

Less mature and potentially less stable compared to Implicit
Smaller community and fewer contributors
May have a steeper learning curve for users not familiar with PyTorch

Code Comparison

Spotlight example:

from spotlight.interactions import Interactions
from spotlight.factorization.explicit import ExplicitFactorizationModel

model = ExplicitFactorizationModel(n_iter=1)
model.fit(interactions)

Implicit example:

from implicit.als import AlternatingLeastSquares

model = AlternatingLeastSquares(factors=50)
model.fit(user_item_matrix)

Both libraries offer concise APIs for creating and training recommendation models. Spotlight's API is more PyTorch-like, while Implicit's is more scikit-learn-like in its approach.

Spotlight provides more flexibility in model architecture and loss functions, making it suitable for advanced use cases. Implicit, on the other hand, focuses on efficient implementations of classic collaborative filtering algorithms, making it a solid choice for production environments with large-scale data.

lightfm

4,906

A Python implementation of LightFM, a hybrid recommendation algorithm.

Pros of LightFM

Supports both explicit and implicit feedback
Includes features for content-based filtering and hybrid recommendations
Offers several ranking losses, including WARP and BPR (Implicit also provides BPR, but not WARP)

Cons of LightFM

Generally slower performance, especially for large datasets
Less optimized for pure collaborative filtering tasks
Requires more memory for training and prediction

Code Comparison

LightFM example:

from lightfm import LightFM
model = LightFM(loss='warp')
model.fit(train, epochs=10)

implicit example:

from implicit.als import AlternatingLeastSquares
model = AlternatingLeastSquares()
model.fit(train)

Key Differences

implicit focuses on collaborative filtering with implicit feedback, while LightFM supports a broader range of recommendation tasks
implicit is generally faster and more memory-efficient for large-scale problems
LightFM offers more flexibility in terms of loss functions and hybrid models
implicit provides specialized algorithms like ALS, while LightFM uses a more general factorization approach

Both libraries are valuable tools for recommendation systems, with implicit being more suitable for large-scale collaborative filtering tasks and LightFM offering greater versatility for complex recommendation scenarios.

recommenders

20,124

Best Practices on Recommendation Systems

Pros of recommenders

Broader scope with multiple recommendation algorithms and evaluation metrics
More comprehensive documentation and examples
Active development with regular updates and contributions

Cons of recommenders

Larger codebase and dependencies, potentially more complex to set up
May have slower performance for specific tasks compared to specialized libraries
Steeper learning curve due to the wide range of features

Code Comparison

implicit:

import implicit

model = implicit.als.AlternatingLeastSquares(factors=50)
model.fit(user_item_matrix)
recommendations = model.recommend(user_id, user_item_matrix[user_id])

recommenders:

model = NCF(n_users, n_items, model_type="NeuMF", n_factors=8)
model.fit(train, epochs=10, batch_size=256, verbose=1)
recommendations = model.recommend_k_items(test, top_k=10, remove_seen=True)

Both libraries offer collaborative filtering capabilities, but recommenders provides a wider range of algorithms and more flexibility in model configuration. implicit focuses on efficient implementations of specific algorithms, while recommenders offers a more comprehensive toolkit for recommendation systems. The choice between the two depends on the specific requirements of the project, such as performance needs, desired algorithms, and the level of customization required.

gensim

15,988

Topic Modelling for Humans

Pros of Gensim

Broader scope: Gensim offers a wide range of NLP and topic modeling functionalities, while Implicit focuses primarily on collaborative filtering
More extensive documentation and tutorials, making it easier for beginners to get started
Larger and more active community, resulting in frequent updates and contributions

Cons of Gensim

Can be slower for certain operations compared to Implicit's optimized Cython and CUDA implementations
May have a steeper learning curve due to its broader feature set
Requires more system resources for large-scale tasks

Code Comparison

Gensim (Word2Vec model):

from gensim.models import Word2Vec

sentences = [["cat", "say", "meow"], ["dog", "say", "woof"]]
model = Word2Vec(sentences, min_count=1)

Implicit (ALS model):

from implicit.als import AlternatingLeastSquares

model = AlternatingLeastSquares()
model.fit(user_item_matrix)

Both libraries offer easy-to-use APIs, but they target different problems: Gensim trains models such as Word2Vec on text corpora, while Implicit factorizes user-item interaction matrices for collaborative filtering.

scikit-learn

62,466

scikit-learn: machine learning in Python

Pros of scikit-learn

Comprehensive machine learning library with a wide range of algorithms
Well-established, large community support, and extensive documentation
Integrates seamlessly with other scientific Python libraries

Cons of scikit-learn

Not specialized for recommendation systems or implicit feedback datasets
May have slower performance for specific recommendation tasks
Lacks some advanced recommendation algorithms found in implicit

Code Comparison

scikit-learn (using SVD for recommendations):

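from sklearn.decomposition import TruncatedSVD
svd = TruncatedSVD(n_components=10)
# fit_transform() returns user latent factors, not recommendations
user_factors = svd.fit_transform(user_item_matrix)
# score every item for every user; the highest-scoring unseen items are the recommendations
scores = user_factors @ svd.components_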

implicit (using Alternating Least Squares):

from implicit.als import AlternatingLeastSquares
model = AlternatingLeastSquares(factors=50)
model.fit(user_item_matrix)
recommendations = model.recommend(user_id, user_item_matrix[user_id])

scikit-learn provides a general-purpose machine learning toolkit suitable for various tasks, while implicit focuses specifically on recommendation systems using implicit feedback. scikit-learn offers broader functionality, but implicit may provide better performance and more specialized algorithms for recommendation tasks.

Surprise

6,597

A Python scikit for building and analyzing recommender systems

Pros of Surprise

More extensive documentation and tutorials
Wider range of algorithms implemented, including non-matrix factorization methods
Built-in cross-validation and hyperparameter tuning tools

Cons of Surprise

Generally slower performance compared to Implicit
Less optimized for large-scale datasets
Fewer options for implicit feedback scenarios

Code Comparison

Surprise example:

from surprise import SVD, Dataset
data = Dataset.load_builtin('ml-100k')
algo = SVD()
algo.fit(data.build_full_trainset())
# predict() takes raw user and item ids, which are strings in ml-100k
prediction = algo.predict('196', '242')

Implicit example:

from implicit.als import AlternatingLeastSquares
model = AlternatingLeastSquares()
model.fit(user_item_matrix)
recommendations = model.recommend(user_id, user_item_matrix[user_id])

Both libraries offer easy-to-use APIs for collaborative filtering, but Implicit focuses more on implicit feedback scenarios and optimized performance, while Surprise provides a broader range of algorithms and evaluation tools. Surprise is more suitable for experimentation and research, while Implicit is better suited for production environments with large-scale data and performance requirements.


README

Implicit

Fast Python Collaborative Filtering for Implicit Datasets.

This project provides fast Python implementations of several different popular recommendation algorithms for implicit feedback datasets:

Alternating Least Squares as described in the papers Collaborative Filtering for Implicit Feedback Datasets and Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering.
Bayesian Personalized Ranking.
Logistic Matrix Factorization.
Item-Item Nearest Neighbour models using Cosine, TFIDF or BM25 as a distance metric.
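To give a feel for the first algorithm in the list, here is a deliberately small, dense NumPy sketch of implicit-feedback ALS in the spirit of the Collaborative Filtering for Implicit Feedback Datasets paper: interactions become a binary preference plus a confidence weight, and user and item factors are solved for alternately in closed form. The function name `toy_implicit_als`, the values of `alpha` and `reg`, and the toy matrix are assumptions for illustration only; the library itself works on sparse matrices with much faster solvers.

```python
import numpy as np

def toy_implicit_als(R, factors=2, reg=0.1, alpha=40.0, iterations=10, seed=0):
    """Dense, unoptimized ALS for implicit feedback (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)  # binary preference: did the user interact?
    C = 1.0 + alpha * R        # confidence grows with the interaction count
    X = rng.normal(scale=0.01, size=(n_users, factors))  # user factors
    Y = rng.normal(scale=0.01, size=(n_items, factors))  # item factors
    reg_eye = reg * np.eye(factors)

    for _ in range(iterations):
        # Hold item factors fixed; solve a weighted ridge regression per user
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg_eye, Y.T @ Cu @ P[u])
        # Hold user factors fixed; solve per item
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + reg_eye, X.T @ Ci @ P[:, i])
    return X, Y

# Two groups of users with disjoint tastes (toy data)
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

X, Y = toy_implicit_als(R)
scores = X @ Y.T  # predicted preference for every user/item pair
```

On this toy matrix the predicted scores for items a user interacted with should end up clearly higher than for items from the other group.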

All models have multi-threaded training routines, using Cython and OpenMP to fit the models in parallel among all available CPU cores. In addition, the ALS and BPR models both have custom CUDA kernels, enabling fitting on compatible GPUs. Approximate nearest neighbours libraries such as Annoy, NMSLIB and Faiss can also be used by Implicit to speed up making recommendations.

Installation

Implicit can be installed from pypi with:

pip install implicit

Installing with pip will use prebuilt binary wheels on x86_64 Linux, Windows and OSX. These wheels include GPU support on Linux.

Implicit can also be installed with conda:

# CPU only package
conda install -c conda-forge implicit

# CPU+GPU package
conda install -c conda-forge implicit implicit-proc=*=gpu

Basic Usage

import implicit

# initialize a model
model = implicit.als.AlternatingLeastSquares(factors=50)

# train the model on a sparse matrix of user/item/confidence weights
model.fit(user_item_data)

# recommend items for a user
recommendations = model.recommend(userid, user_item_data[userid])

# find related items
related = model.similar_items(itemid)

The examples folder has a program showing how to use this to compute similar artists on the last.fm dataset.
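The basic usage above assumes a sparse matrix of user/item/confidence weights already exists. One possible way to build such a matrix from raw interaction records with SciPy is sketched below; the `interactions` list, the names in it and the use of play counts as confidence values are all hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical interaction log: (user, item, play_count)
interactions = [
    ("alice", "song_a", 3),
    ("alice", "song_b", 1),
    ("bob", "song_b", 5),
    ("carol", "song_c", 2),
    ("alice", "song_a", 2),  # repeated pair: the counts get summed
]

# Map arbitrary user/item labels to contiguous row/column indices
users = sorted({u for u, _, _ in interactions})
items = sorted({i for _, i, _ in interactions})
user_index = {u: idx for idx, u in enumerate(users)}
item_index = {i: idx for idx, i in enumerate(items)}

rows = np.array([user_index[u] for u, _, _ in interactions])
cols = np.array([item_index[i] for _, i, _ in interactions])
vals = np.array([c for _, _, c in interactions], dtype=np.float32)

# Rows are users, columns are items; duplicate entries are added together
user_item_data = csr_matrix((vals, (rows, cols)), shape=(len(users), len(items)))
```

Keeping `user_index` and `item_index` around makes it possible to translate the integer ids returned by the model back to the original labels.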

For more information see the documentation.

Articles about Implicit

These blog posts describe the algorithms that power this library:

There are also several other articles about using Implicit to build recommendation systems:

Requirements

This library requires SciPy version 0.16 or later and Python version 3.6 or later.

GPU support requires at least version 11 of the NVIDIA CUDA Toolkit.

This library is tested with Python 3.7, 3.8, 3.9, 3.10 and 3.11 on Ubuntu, OSX and Windows.

Benchmarks

Simple benchmarks comparing the ALS fitting time versus Spark can be found here.

Optimal Configuration

I'd recommend configuring SciPy to use Intel's MKL matrix libraries. One easy way of doing this is by installing the Anaconda Python distribution.

For systems using OpenBLAS, I highly recommend setting 'export OPENBLAS_NUM_THREADS=1'. This disables its internal multithreading ability, which leads to substantial speedups for this package. Likewise, for Intel MKL, 'export MKL_NUM_THREADS=1' should also be set.
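When exporting the variables in the shell is inconvenient (for example in a notebook), the same settings can be applied from Python. This is a sketch under the assumption that it runs before NumPy, SciPy or Implicit are imported for the first time, since BLAS libraries typically read these variables when they are loaded.

```python
import os

# Set these before the first import of NumPy/SciPy/Implicit in the process;
# changing them afterwards generally has no effect on an already-loaded BLAS.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

# Heavy imports (numpy, scipy, implicit) would go below this line.
blas_thread_settings = {
    name: os.environ[name]
    for name in ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")
}
```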

Released under the MIT License
