dlrm

An implementation of a deep learning recommendation model (DLRM)


Top Related Projects

bert

39,558

TensorFlow code and pre-trained models for BERT

transformers

150,567

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

models

77,618

Models and examples built with TensorFlow

fairseq

31,682

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

DeepSpeed

40,283

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

DeepLearningExamples

14,427

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Quick Overview

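DLRM (Deep Learning Recommendation Model) from Facebook Research is an open-source model for personalization and recommendation. It takes dense features and sparse (categorical) features as input, looks the sparse indices up in embedding tables, combines the resulting vectors through MLPs and interaction operators, and outputs the probability of a click. The repository provides PyTorch and Caffe2 implementations, along with benchmark scripts for the Criteo Kaggle and Criteo Terabyte datasets.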

Pros

Scalability: DLRM supports synchronous distributed training across multiple GPUs and nodes, with gloo, nccl, and mpi backends.
Flexibility: The model accepts both dense (continuous) features and sparse (categorical) features, which covers the typical inputs of click-prediction and recommendation tasks.
Efficiency: Sparse features are stored in embedding tables and looked up by index, and the README's related work covers embedding compression techniques for memory-efficient recommendation systems.
Customizability: Embedding table sizes and the bottom and top MLP shapes are set through command-line arguments such as --arch-embedding-size, --arch-mlp-bot, and --arch-mlp-top.

Cons

Complexity: The DLRM model can be complex to set up and configure, especially for users who are new to deep learning and recommendation systems.
Dependency on Data: The performance of DLRM is heavily dependent on the quality and quantity of the input data, which can be a challenge in some real-world scenarios.
Interpretability: The deep learning-based nature of DLRM can make it difficult to interpret the model's decision-making process, which can be a concern in certain applications.
Computational Resources: Training and running DLRM can be computationally intensive, requiring access to powerful hardware (e.g., GPUs) for optimal performance.

Code Examples

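The snippets below illustrate a typical train-then-predict workflow. The module paths and class names they use (dlrm.data.criteo, dlrm.models.dlrm, dlrm.train, dlrm.inference) are illustrative; the README further down lists dlrm_s_pytorch.py as the PyTorch implementation and runs it from the command line.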

from dlrm.data.criteo import CriteoDataset
from dlrm.models.dlrm import DLRM
from dlrm.train import train

# Load the Criteo dataset
dataset = CriteoDataset(data_dir='path/to/criteo/data')

# Create the DLRM model
model = DLRM(
    embedding_dim=128,
    bottom_mlp_sizes=[512, 256, 128],
    top_mlp_sizes=[1024, 512, 256, 1],
    num_dense_features=13,
    num_sparse_features=26
)

# Train the model
train(
    model=model,
    dataset=dataset,
    num_epochs=10,
    batch_size=2048,
    learning_rate=0.001
)

This snippet sketches the workflow: load the Criteo dataset, construct a DLRM model, and train it with a train helper.

from dlrm.inference import DLRMInference
from dlrm.data.criteo import CriteoDataset

# Load the Criteo dataset
dataset = CriteoDataset(data_dir='path/to/criteo/data')

# Create the DLRM inference model
inference_model = DLRMInference(
    model_path='path/to/trained/model.pth'
)

# Perform inference on a sample input
sample_input = dataset[0]
prediction = inference_model.predict(sample_input)
print(prediction)

This code demonstrates how to create a DLRM inference model and use it to make predictions on a sample input.

Getting Started

To get started with the DLRM project, follow these steps:

Clone the GitHub repository:

git clone https://github.com/facebookresearch/dlrm.git

Install the required dependencies:

cd dlrm
pip install -r requirements.txt

Prepare your dataset:

- The project provides support for the Criteo dataset, which can be downloaded from the Criteo AI Lab website.
- If you're using a different dataset, you'll need to implement a custom data loader that conforms to the DLRM data format.

Train the DLRM model:

from dlrm.data.criteo import CriteoDataset
from dlrm.models.dlrm import DLRM
from dlrm.train import train

# Load the Criteo dataset
dataset = CriteoData

Competitor Comparisons

bert

39,558

TensorFlow code and pre-trained models for BERT

Pros of BERT

More versatile for various NLP tasks (e.g., question answering, sentiment analysis)
Extensive pre-training on large text corpora for better language understanding
Widely adopted and supported by the research community

Cons of BERT

Higher computational requirements for training and inference
May not be optimized for specific recommendation tasks
Requires fine-tuning for domain-specific applications

Code Comparison

BERT (Python):

import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertModel.from_pretrained('bert-base-uncased')

DLRM (Python):

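import numpy as np
from dlrm_s_pytorch import DLRM_Net

model = DLRM_Net(
    m_spa=16,                              # embedding dimension
    ln_emb=np.array([1000, 2000, 3000]),   # rows per embedding table
    ln_bot=np.array([13, 512, 256, 16]),   # bottom MLP; last width equals m_spa
    ln_top=np.array([22, 512, 256, 1]),    # top MLP; 22 = 6 pairwise dots + 16 dense
    arch_interaction_op="dot",
)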

BERT focuses on natural language processing tasks, while DLRM is specifically designed for recommendation systems. BERT uses transformer architecture and requires tokenization, whereas DLRM employs a combination of embedding layers and MLPs for feature interactions in recommendation tasks.

transformers

150,567

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Pros of transformers

Broader scope: Supports a wide range of NLP tasks and models
Extensive documentation and community support
Regular updates and new model implementations

Cons of transformers

Higher complexity due to its comprehensive nature
Potentially steeper learning curve for beginners
May include unnecessary features for specific use cases

Code comparison

transformers:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

dlrm:

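import numpy as np
from dlrm_s_pytorch import DLRM_Net
model = DLRM_Net(
    m_spa=16,                              # embedding dimension
    ln_emb=np.array([1000, 2000, 3000]),   # rows per embedding table
    ln_bot=np.array([13, 512, 256, 16]),   # bottom MLP; last width equals m_spa
    ln_top=np.array([22, 512, 256, 1]),    # top MLP; 22 = 6 pairwise dots + 16 dense
    arch_interaction_op="dot",
)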

The transformers code showcases easy model loading and tokenization, while dlrm demonstrates a more specific model initialization for recommendation systems.

models

77,618

Models and examples built with TensorFlow

Pros of models

Broader scope, covering various ML tasks and architectures
Extensive documentation and tutorials
Large community support and regular updates

Cons of models

Can be overwhelming due to its size and complexity
May require more setup and configuration for specific tasks

Code comparison

models:

import tensorflow as tf
from official.nlp import modeling
model = modeling.networks.TransformerEncoder(
    vocab_size=1000,
    num_layers=6,
    num_attention_heads=8,
    hidden_size=512
)

dlrm:

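import numpy as np
from dlrm_s_pytorch import DLRM_Net
model = DLRM_Net(
    m_spa=16,                                  # embedding dimension
    ln_emb=np.array([1000, 1000, 1000]),       # rows per embedding table
    ln_bot=np.array([13, 512, 256, 64, 16]),   # bottom MLP; last width equals m_spa
    ln_top=np.array([22, 512, 256, 1]),        # top MLP; 22 = 6 pairwise dots + 16 dense
    arch_interaction_op="dot"
)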

Key differences

models is a comprehensive repository for various TensorFlow models
dlrm focuses specifically on deep learning recommendation models
models uses TensorFlow, while dlrm primarily uses PyTorch
dlrm offers a more specialized solution for recommendation systems
models provides a wider range of pre-trained models and examples

fairseq

31,682

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

Broader scope: Supports a wide range of sequence-to-sequence tasks, including machine translation, text summarization, and speech recognition
More extensive documentation and examples, making it easier for new users to get started
Active development and frequent updates, ensuring compatibility with latest PyTorch versions

Cons of fairseq

Steeper learning curve due to its more complex architecture and wider range of features
Potentially higher resource requirements for training and inference, especially for simpler tasks

Code comparison

fairseq:

from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
translations = model.translate(['Hello world!', 'How are you?'])

dlrm:

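from dlrm_s_pytorch import DLRM_Net
# m_spa, ln_emb, ln_bot, ln_top: model dimensions, as in the README's debug output
model = DLRM_Net(m_spa, ln_emb, ln_bot, ln_top, arch_interaction_op="dot")
# dense_x: dense features; lS_o, lS_i: per-table offsets and indices
predictions = model(dense_x, lS_o, lS_i)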

fairseq offers a higher-level API for common tasks like translation, while dlrm provides a more specialized interface for recommendation systems. fairseq's code is more abstracted, hiding implementation details, whereas dlrm's usage is more direct and task-specific.

DeepSpeed

40,283

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

Broader scope: Supports various deep learning models, not limited to recommendation systems
Advanced optimization techniques: ZeRO, 3D parallelism, and pipeline parallelism for efficient training
Active development: More frequent updates and larger community support

Cons of DeepSpeed

Steeper learning curve: More complex to set up and configure for specific use cases
Potentially overkill: May introduce unnecessary complexity for simpler recommendation models

Code Comparison

DLRM (PyTorch):

model = DLRM_Net(
    m_spa, ln_emb, ln_bot, ln_top,
    arch_interaction_op=arch_interaction_op,
    arch_interaction_itself=arch_interaction_itself,
    sigmoid_bot=sigmoid_bot, sigmoid_top=sigmoid_top,
    loss_function=loss_function
)

DeepSpeed (PyTorch with DeepSpeed):

model = MyModel(args)
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters()
)

Key Differences

DLRM focuses specifically on recommendation models, while DeepSpeed is a general-purpose deep learning optimization library
DeepSpeed offers more advanced parallelism and optimization techniques, potentially enabling better scalability for large models
DLRM provides a more straightforward implementation for recommendation systems, while DeepSpeed requires additional configuration but offers more flexibility

DeepLearningExamples

14,427

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Pros of DeepLearningExamples

Broader scope, covering multiple deep learning domains and frameworks
More extensive documentation and usage examples
Regular updates and active maintenance

Cons of DeepLearningExamples

Less specialized for recommendation systems compared to DLRM
Potentially more complex to navigate due to its broader scope
May require more setup time for specific use cases

Code Comparison

DLRM (PyTorch implementation, simplified; weight initialization omitted):

import torch.nn as nn

class DLRM_Net(nn.Module):
    def create_mlp(self, ln, sigmoid_layer):
        # ln: NumPy array of layer widths, e.g. [8 4 2 1]
        layers = nn.ModuleList()
        for i in range(0, ln.size - 1):
            n = ln[i]
            m = ln[i + 1]
            LL = nn.Linear(int(n), int(m), bias=True)
            layers.append(LL)
            # Sigmoid follows layer `sigmoid_layer`; every other layer gets ReLU
            if i == sigmoid_layer:
                layers.append(nn.Sigmoid())
            else:
                layers.append(nn.ReLU())
        return nn.Sequential(*layers)

DeepLearningExamples (TensorFlow implementation of DLRM):

def create_mlp(self, layer_sizes, activation='relu', dropout_rate=None):
    mlp = tf.keras.Sequential()
    for units in layer_sizes[:-1]:
        mlp.add(tf.keras.layers.Dense(units, activation=activation))
        if dropout_rate:
            mlp.add(tf.keras.layers.Dropout(dropout_rate))
    mlp.add(tf.keras.layers.Dense(layer_sizes[-1]))
    return mlp

Both implementations create multi-layer perceptrons, but DeepLearningExamples offers more flexibility with activation functions and dropout options.


README

Deep Learning Recommendation Model for Personalization and Recommendation Systems:

Copyright (c) Facebook, Inc. and its affiliates.

Description:

An implementation of a deep learning recommendation model (DLRM). The model input consists of dense and sparse features. The former is a vector of floating point values. The latter is a list of sparse indices into embedding tables, which consist of vectors of floating point values. The selected vectors are passed to MLP networks, denoted by triangles in the diagram below; in some cases the vectors interact through operators (Ops).

output:
                    probability of a click
model:                        |
                             /\
                            /__\
                              |
      _____________________> Op  <___________________
    /                         |                      \
   /\                        /\                      /\
  /__\                      /__\           ...      /__\
   |                          |                       |
   |                         Op                      Op
   |                    ____/__\_____           ____/__\____
   |                   |_Emb_|____|__|    ...  |_Emb_|__|___|
input:
[ dense features ]     [sparse indices] , ..., [sparse indices]

More precise definition of model layers:

- fully connected layers of an mlp

  z = f(y)

  y = Wx + b

- embedding lookup (for a list of sparse indices p=[p1,...,pk])

  z = Op(e1,...,ek)

  obtain vectors e1=E[:,p1], ..., ek=E[:,pk]

- Operator Op can be one of the following

  Sum(e1,...,ek) = e1 + ... + ek

  Dot(e1,...,ek) = [e1'e1, ..., e1'ek, ..., ek'e1, ..., ek'ek]

  Cat(e1,...,ek) = [e1', ..., ek']'

  where ' denotes transpose operation
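
The layer definitions above can be sketched in plain Python, treating each vector as a list of floats. This is only an illustration of the formulas, not the repository's implementation (which operates on framework tensors); the names fc_layer, op_sum, op_dot, op_cat, and relu are hypothetical.

```python
# Illustrative sketch of the layer definitions above; all names are hypothetical.

def fc_layer(W, b, x, f):
    """Fully connected layer: z = f(Wx + b), with W given as a list of rows."""
    y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(W, b)]
    return [f(y_i) for y_i in y]

def op_sum(vectors):
    """Sum(e1,...,ek) = e1 + ... + ek (element-wise)."""
    return [sum(components) for components in zip(*vectors)]

def op_dot(vectors):
    """Dot(e1,...,ek): all pairwise inner products ei'ej, listed row by row."""
    return [sum(a * b for a, b in zip(ei, ej))
            for ei in vectors for ej in vectors]

def op_cat(vectors):
    """Cat(e1,...,ek): the vectors concatenated end to end."""
    return [component for e in vectors for component in e]

def relu(v):
    return max(0.0, v)

e1, e2 = [1.0, 2.0], [3.0, 4.0]
print(op_sum([e1, e2]))   # [4.0, 6.0]
print(op_dot([e1, e2]))   # [5.0, 11.0, 11.0, 25.0]
print(op_cat([e1, e2]))   # [1.0, 2.0, 3.0, 4.0]
print(fc_layer([[1.0, -1.0]], [0.5], e1, relu))  # [0.0]
```

Note that Dot as defined lists every ordered pair, so e1'e2 and e2'e1 both appear even though they are equal.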

See our blog post to learn more about DLRM: https://ai.facebook.com/blog/dlrm-an-advanced-open-source-deep-learning-recommendation-model/.

Cite Work:

@article{DLRM19,
  author    = {Maxim Naumov and Dheevatsa Mudigere and Hao{-}Jun Michael Shi and Jianyu Huang and Narayanan Sundaraman and Jongsoo Park and Xiaodong Wang and Udit Gupta and Carole{-}Jean Wu and Alisson G. Azzolini and Dmytro Dzhulgakov and Andrey Mallevich and Ilia Cherniavskii and Yinghai Lu and Raghuraman Krishnamoorthi and Ansha Yu and Volodymyr Kondratenko and Stephanie Pereira and Xianjie Chen and Wenlin Chen and Vijay Rao and Bill Jia and Liang Xiong and Misha Smelyanskiy},
  title     = {Deep Learning Recommendation Model for Personalization and Recommendation Systems},
  journal   = {CoRR},
  volume    = {abs/1906.00091},
  year      = {2019},
  url       = {https://arxiv.org/abs/1906.00091},
}

Related Work:

On the system architecture implications, with DLRM as one of the benchmarks,

@article{ArchImpl19,
  author    = {Udit Gupta and Xiaodong Wang and Maxim Naumov and Carole{-}Jean Wu and Brandon Reagen and David Brooks and Bradford Cottel and Kim M. Hazelwood and Bill Jia and Hsien{-}Hsin S. Lee and Andrey Malevich and Dheevatsa Mudigere and Mikhail Smelyanskiy and Liang Xiong and Xuan Zhang},
  title     = {The Architectural Implications of Facebook's DNN-based Personalized Recommendation},
  journal   = {CoRR},
  volume    = {abs/1906.03109},
  year      = {2019},
  url       = {https://arxiv.org/abs/1906.03109},
}

On the embedding compression techniques (for number of vectors), with DLRM as one of the benchmarks,

@article{QuoRemTrick19,
  author    = {Hao{-}Jun Michael Shi and Dheevatsa Mudigere and Maxim Naumov and Jiyan Yang},
  title     = {Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems},
  journal   = {CoRR},
  volume    = {abs/1909.02107},
  year      = {2019},
  url       = {https://arxiv.org/abs/1909.02107},
}

On the embedding compression techniques (for dimension of vectors), with DLRM as one of the benchmarks,

@article{MixDimTrick19,
  author    = {Antonio Ginart and Maxim Naumov and Dheevatsa Mudigere and Jiyan Yang and James Zou},
  title     = {Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems},
  journal   = {CoRR},
  volume    = {abs/1909.11810},
  year      = {2019},
  url       = {https://arxiv.org/abs/1909.11810},
}

Implementation

DLRM PyTorch. Implementation of DLRM in PyTorch framework:

   dlrm_s_pytorch.py

DLRM Caffe2. Implementation of DLRM in Caffe2 framework:

   dlrm_s_caffe2.py

DLRM Data. Implementation of DLRM data generation and loading:

   dlrm_data_pytorch.py, dlrm_data_caffe2.py, data_utils.py

DLRM Tests. Implementation of DLRM tests in ./test

   dlrm_s_test.sh

DLRM Benchmarks. Implementation of DLRM benchmarks in ./bench

   dlrm_s_criteo_kaggle.sh, dlrm_s_criteo_terabyte.sh, dlrm_s_benchmark.sh

Related Work:

On the Glow framework implementation

https://github.com/pytorch/glow/blob/master/tests/unittests/RecommendationSystemTest.cpp

On the FlexFlow framework distributed implementation with Legion backend

https://github.com/flexflow/FlexFlow/blob/master/examples/cpp/DLRM/dlrm.cc

How to run dlrm code?

A sample run of the code, with a tiny model is shown below

$ python dlrm_s_pytorch.py --mini-batch-size=2 --data-size=6
time/loss/accuracy (if enabled):
Finished training it 1/3 of epoch 0, -1.00 ms/it, loss 0.451893, accuracy 0.000%
Finished training it 2/3 of epoch 0, -1.00 ms/it, loss 0.402002, accuracy 0.000%
Finished training it 3/3 of epoch 0, -1.00 ms/it, loss 0.275460, accuracy 0.000%

A sample run of the code, with a tiny model in debug mode

$ python dlrm_s_pytorch.py --mini-batch-size=2 --data-size=6 --debug-mode
model arch:
mlp top arch 3 layers, with input to output dimensions:
[8 4 2 1]
# of interactions
8
mlp bot arch 2 layers, with input to output dimensions:
[4 3 2]
# of features (sparse and dense)
4
dense feature size
4
sparse feature size
2
# of embeddings (= # of sparse features) 3, with dimensions 2x:
[4 3 2]
data (inputs and targets):
mini-batch: 0
[[0.69647 0.28614 0.22685 0.55131]
 [0.71947 0.42311 0.98076 0.68483]]
[[[1], [0, 1]], [[0], [1]], [[1], [0]]]
[[0.55679]
 [0.15896]]
mini-batch: 1
[[0.36179 0.22826 0.29371 0.63098]
 [0.0921  0.4337  0.43086 0.49369]]
[[[1], [0, 2, 3]], [[1], [1, 2]], [[1], [1]]]
[[0.15307]
 [0.69553]]
mini-batch: 2
[[0.60306 0.54507 0.34276 0.30412]
 [0.41702 0.6813  0.87546 0.51042]]
[[[2], [0, 1, 2]], [[1], [2]], [[1], [1]]]
[[0.31877]
 [0.69197]]
initial parameters (weights and bias):
[[ 0.05438 -0.11105]
 [ 0.42513  0.34167]
 [-0.1426  -0.45641]
 [-0.19523 -0.10181]]
[[ 0.23667  0.57199]
 [-0.16638  0.30316]
 [ 0.10759  0.22136]]
[[-0.49338 -0.14301]
 [-0.36649 -0.22139]]
[[0.51313 0.66662 0.10591 0.13089]
 [0.32198 0.66156 0.84651 0.55326]
 [0.85445 0.38484 0.31679 0.35426]]
[0.17108 0.82911 0.33867]
[[0.55237 0.57855 0.52153]
 [0.00269 0.98835 0.90534]]
[0.20764 0.29249]
[[0.52001 0.90191 0.98363 0.25754 0.56436 0.80697 0.39437 0.73107]
 [0.16107 0.6007  0.86586 0.98352 0.07937 0.42835 0.20454 0.45064]
 [0.54776 0.09333 0.29686 0.92758 0.569   0.45741 0.75353 0.74186]
 [0.04858 0.7087  0.83924 0.16594 0.781   0.28654 0.30647 0.66526]]
[0.11139 0.66487 0.88786 0.69631]
[[0.44033 0.43821 0.7651  0.56564]
 [0.0849  0.58267 0.81484 0.33707]]
[0.92758 0.75072]
[[0.57406 0.75164]]
[0.07915]
DLRM_Net(
  (emb_l): ModuleList(
    (0): EmbeddingBag(4, 2, mode=sum)
    (1): EmbeddingBag(3, 2, mode=sum)
    (2): EmbeddingBag(2, 2, mode=sum)
  )
  (bot_l): Sequential(
    (0): Linear(in_features=4, out_features=3, bias=True)
    (1): ReLU()
    (2): Linear(in_features=3, out_features=2, bias=True)
    (3): ReLU()
  )
  (top_l): Sequential(
    (0): Linear(in_features=8, out_features=4, bias=True)
    (1): ReLU()
    (2): Linear(in_features=4, out_features=2, bias=True)
    (3): ReLU()
    (4): Linear(in_features=2, out_features=1, bias=True)
    (5): Sigmoid()
  )
)
time/loss/accuracy (if enabled):
Finished training it 1/3 of epoch 0, -1.00 ms/it, loss 0.451893, accuracy 0.000%
Finished training it 2/3 of epoch 0, -1.00 ms/it, loss 0.402002, accuracy 0.000%
Finished training it 3/3 of epoch 0, -1.00 ms/it, loss 0.275460, accuracy 0.000%
updated parameters (weights and bias):
[[ 0.0543  -0.1112 ]
 [ 0.42513  0.34167]
 [-0.14283 -0.45679]
 [-0.19532 -0.10197]]
[[ 0.23667  0.57199]
 [-0.1666   0.30285]
 [ 0.10751  0.22124]]
[[-0.49338 -0.14301]
 [-0.36664 -0.22164]]
[[0.51313 0.66663 0.10591 0.1309 ]
 [0.32196 0.66154 0.84649 0.55324]
 [0.85444 0.38482 0.31677 0.35425]]
[0.17109 0.82907 0.33863]
[[0.55238 0.57857 0.52154]
 [0.00265 0.98825 0.90528]]
[0.20764 0.29244]
[[0.51996 0.90184 0.98368 0.25752 0.56436 0.807   0.39437 0.73107]
 [0.16096 0.60055 0.86596 0.98348 0.07938 0.42842 0.20453 0.45064]
 [0.5476  0.0931  0.29701 0.92752 0.56902 0.45752 0.75351 0.74187]
 [0.04849 0.70857 0.83933 0.1659  0.78101 0.2866  0.30646 0.66526]]
[0.11137 0.66482 0.88778 0.69627]
[[0.44029 0.43816 0.76502 0.56561]
 [0.08485 0.5826  0.81474 0.33702]]
[0.92754 0.75067]
[[0.57379 0.7514 ]]
[0.07908]
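
The dimensions printed in debug mode can be cross-checked with a little arithmetic. The sketch below assumes that the dot interaction keeps one inner product per distinct pair of feature vectors and that the bottom MLP output is concatenated with those terms. That assumption reproduces the "# of interactions" value of 8 shown above, but it is inferred from the output rather than stated in the README. The helper names are hypothetical.

```python
# Hypothetical helpers that reproduce numbers from the debug output above.

def top_mlp_input_size(num_embeddings, bottom_mlp_output):
    """Distinct pairwise interactions among all feature vectors, plus the dense vector."""
    num_features = num_embeddings + 1            # embeddings + processed dense vector
    num_pairs = num_features * (num_features - 1) // 2
    return num_pairs + bottom_mlp_output

def mlp_parameter_count(layer_sizes):
    """Weights plus biases for consecutive fully connected layers."""
    return sum(n * m + m for n, m in zip(layer_sizes, layer_sizes[1:]))

def iterations_per_epoch(data_size, mini_batch_size):
    # Assumption: a partial final mini-batch would still count as one iteration.
    return -(-data_size // mini_batch_size)      # ceiling division

print(top_mlp_input_size(num_embeddings=3, bottom_mlp_output=2))  # 8
print(mlp_parameter_count([4, 3, 2]))      # 23
print(mlp_parameter_count([8, 4, 2, 1]))   # 49
print(iterations_per_epoch(6, 2))          # 3
```

The last value matches the three training iterations reported for --data-size=6 with --mini-batch-size=2.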

Testing

Testing scripts to confirm functional correctness of the code

./test/dlrm_s_test.sh
Running commands ...
python dlrm_s_pytorch.py
python dlrm_s_caffe2.py
Checking results ...
diff test1 (no numeric values in the output = SUCCESS)
diff test2 (no numeric values in the output = SUCCESS)
diff test3 (no numeric values in the output = SUCCESS)
diff test4 (no numeric values in the output = SUCCESS)

NOTE: Testing scripts accept extra arguments which will be passed along to the model, such as --use-gpu

Benchmarking

Performance benchmarking
```
./bench/dlrm_s_benchmark.sh
```
The code can interface with the Criteo Kaggle Display Advertising Challenge Dataset.
- To prepare the dataset for use with the DLRM code:
  - Specify the downloaded raw data file (train.txt) with --raw-data-file=<path/train.txt>
  - The raw data is then pre-processed (categorize, concat across days...) for use with the DLRM code
  - The processed data is stored as a .npz file in <root_dir>/input/.npz
  - The processed file (.npz) can be reused in subsequent runs with --processed-data-file=<path/.npz>
- The model can be trained using the following script
```
./bench/dlrm_s_criteo_kaggle.sh [--test-freq=1024]
```
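
To give a feel for the "categorize" step mentioned above, here is a minimal sketch that maps raw categorical tokens to contiguous integer indices per column, which is the form an embedding-table lookup needs. This is one plausible reading of the term, not the repository's preprocessing code; the function name and the sample tokens are made up for illustration.

```python
# Hypothetical sketch: map raw categorical tokens to contiguous indices per column.

def categorize(rows):
    """rows: list of equal-length lists of hashable tokens.
    Returns (indexed_rows, vocab_sizes)."""
    num_columns = len(rows[0])
    vocabularies = [{} for _ in range(num_columns)]
    indexed_rows = []
    for row in rows:
        indexed = []
        for column, token in enumerate(row):
            vocab = vocabularies[column]
            # setdefault assigns the next unused index the first time a token is seen
            indexed.append(vocab.setdefault(token, len(vocab)))
        indexed_rows.append(indexed)
    return indexed_rows, [len(v) for v in vocabularies]

# Made-up sample tokens, two categorical columns
raw = [["68fd1e64", "80e26c9b"],
       ["05db9164", "80e26c9b"],
       ["68fd1e64", "1f89b562"]]
indexed, vocab_sizes = categorize(raw)
print(indexed)      # [[0, 0], [1, 0], [0, 1]]
print(vocab_sizes)  # [2, 2]
```

The per-column vocabulary sizes correspond to the number of rows each embedding table would need.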

The code can interface with the Criteo Terabyte Dataset.
- To prepare the dataset for use with the DLRM code:
  - Download the raw data files day_0.gz, ..., day_23.gz and unzip them
  - Specify the location of the unzipped text files day_0, ..., day_23, using --raw-data-file=<path/day> (the day number will be appended automatically)
  - The raw data is then pre-processed (categorize, concat across days...) for use with the DLRM code
  - The processed data is stored as a .npz file in <root_dir>/input/.npz
  - The processed file (.npz) can be reused in subsequent runs with --processed-data-file=<path/.npz>
- The model can be trained using the following script
```
  ./bench/dlrm_s_criteo_terabyte.sh ["--test-freq=10240 --memory-map --data-sub-sample-rate=0.875"]
```
- The corresponding pre-trained model is available under the CC-BY-NC license and can be downloaded here: dlrm_emb64_subsample0.875_maxindrange10M_pretrained.pt

NOTE: Benchmarking scripts accept extra arguments which will be passed along to the model, such as --num-batches=100 to limit the number of data samples

The code can interface with the MLPerf benchmark.

The following training parameters are relevant:

  --mlperf-logging that keeps track of multiple metrics, including area under the curve (AUC)

  --mlperf-acc-threshold that allows early stopping based on accuracy metric

  --mlperf-auc-threshold that allows early stopping based on AUC metric

  --mlperf-bin-loader that enables preprocessing of data into a single binary file

  --mlperf-bin-shuffle that controls whether a random shuffle of mini-batches is performed
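
For readers unfamiliar with the AUC metric that --mlperf-logging tracks and --mlperf-auc-threshold tests against, here is a minimal pure-Python sketch of ROC AUC, computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. It is not the repository's implementation, and the threshold check at the end is a hypothetical illustration with a made-up threshold value.

```python
def roc_auc(labels, scores):
    """Pairwise definition of ROC AUC; ties count as half. O(P*N), fine for a demo."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75

auc_threshold = 0.7                  # made-up value for illustration
should_stop = auc >= auc_threshold   # hypothetical early-stopping check
print(should_stop)  # True
```

For data sets of realistic size, a sort-based implementation from an established library is the better choice than this quadratic loop.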

The MLPerf training model is completely specified and can be trained using the following script

  ./bench/run_and_time.sh [--use-gpu]

The corresponding pre-trained model is available under the CC-BY-NC license and can be downloaded here: dlrm_emb128_subsample0.0_maxindrange40M_pretrained.pt

The code now supports synchronous distributed training with the gloo, nccl, and mpi backends. Two launching modes are provided: the PyTorch distributed launcher and mpirun. For MPI, users need to write their own launching scripts to configure the running hosts. With the PyTorch distributed launcher, for example, the following commands can be used as launching scripts:

# for single node 8 gpus and nccl as backend on randomly generated dataset:
python -m torch.distributed.launch --nproc_per_node=8 dlrm_s_pytorch.py --arch-embedding-size="80000-80000-80000-80000-80000-80000-80000-80000" --arch-sparse-feature-size=64 --arch-mlp-bot="128-128-128-128" --arch-mlp-top="512-512-512-256-1" --max-ind-range=40000000 \
    --data-generation=random --loss-function=bce --round-targets=True --learning-rate=1.0 --mini-batch-size=2048 --print-freq=2 --print-time --test-freq=2 --test-mini-batch-size=2048 --memory-map --use-gpu --num-batches=100 --dist-backend=nccl

# for multiple nodes, add the related launcher arguments according to the launcher manual, for example:
--nnodes=2 --node_rank=0 --master_addr="192.168.1.1" --master_port=1234
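
When launching on several nodes, each node runs the same command with a different --node_rank. The sketch below assembles those command lines from the launcher arguments shown above; the function name launcher_command is hypothetical, and the short list of script arguments is abbreviated for illustration.

```python
import shlex

def launcher_command(node_rank, nnodes, nproc_per_node, master_addr, master_port, script_args):
    """Assemble a torch.distributed.launch command line for one node."""
    parts = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        "dlrm_s_pytorch.py",
    ] + list(script_args)
    # shlex.quote leaves simple tokens untouched and quotes anything shell-sensitive
    return " ".join(shlex.quote(p) for p in parts)

# Abbreviated script arguments; a real run would pass the full list shown above
script_args = ["--data-generation=random", "--use-gpu", "--dist-backend=nccl"]
for rank in range(2):
    print(launcher_command(rank, 2, 8, "192.168.1.1", 1234, script_args))
```

Launcher arguments go before the script name and model arguments after it, so the launcher and dlrm_s_pytorch.py each receive only their own options.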

Model checkpoint saving/loading

During training, the model can be saved using --save-model=<path/model.pt>

The model is saved if there is an improvement in test accuracy (which is checked at --test-freq intervals).

A previously saved model can be loaded using --load-model=<path/model.pt>

Once loaded, the model can be used to continue training, with the saved model serving as a checkpoint. Alternatively, the saved model can be evaluated on the test data set only, by specifying the --inference-only option.
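
The save-on-improvement behaviour described above can be sketched as follows. This is a simplified illustration and not the repository's training loop: the function name, the save_model callback, and the accuracy values are all made up.

```python
def train_with_checkpoints(test_accuracies, test_freq, save_model):
    """test_accuracies[i] is the (made-up) test accuracy after iteration i + 1."""
    best_accuracy = 0.0
    for iteration, accuracy in enumerate(test_accuracies, start=1):
        if iteration % test_freq != 0:
            continue                      # only evaluate every test_freq iterations
        if accuracy > best_accuracy:      # save only when test accuracy improves
            best_accuracy = accuracy
            save_model(iteration, accuracy)
    return best_accuracy

saved = []
best = train_with_checkpoints(
    test_accuracies=[0.50, 0.61, 0.58, 0.60, 0.59, 0.66],
    test_freq=2,
    save_model=lambda it, acc: saved.append((it, acc)),
)
print(saved)  # [(2, 0.61), (6, 0.66)]
print(best)   # 0.66
```

Iteration 4 is evaluated but not saved, because its accuracy (0.60) does not beat the best value seen so far (0.61).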

Version

0.1 : Initial release of the DLRM code

1.0 : DLRM with distributed training, CPU support for row-wise adagrad optimizer

Requirements

pytorch-nightly (11/10/20)

scikit-learn

numpy

onnx (optional)

pydot (optional)

torchviz (optional)

mpi (optional for distributed backend)

License

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.
