huggingface/transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning

Top Related Projects

  • DialoGPT: Large-scale pretraining for dialogue
  • ParlAI: A framework for training and evaluating AI models on a variety of openly available dialogue datasets
  • DeepPavlov: An open source library for deep learning end-to-end dialog systems and chatbots
  • botpress: The open-source hub to build & deploy GPT/LLM Agents ⚡️

Quick Overview

The huggingface/transfer-learning-conv-ai repository provides code for fine-tuning pretrained Transformer language models (OpenAI GPT and GPT-2) on conversational AI tasks. It includes training and evaluation scripts for dialogue datasets, with a particular emphasis on transfer learning techniques to improve conversational ability.

Pros

  • Leverages state-of-the-art language models for conversational AI
  • Includes pre-trained models and datasets for quick experimentation
  • Offers flexible training and evaluation scripts
  • Provides detailed documentation and examples for easy adoption

Cons

  • Requires significant computational resources for training large models
  • May have a steep learning curve for those new to NLP and transformers
  • Limited to specific dialogue datasets and tasks
  • Potential for generating biased or inappropriate responses if not carefully fine-tuned

Code Examples

  1. Loading a pre-trained model and tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/DialoGPT-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
  2. Generating a response to a given input:
input_text = "Hello, how are you?"
input_ids = tokenizer.encode(input_text + tokenizer.eos_token, return_tensors="pt")
output = model.generate(input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens; output[0] also contains the input.
response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
  3. Fine-tuning the model on a custom dataset:
import torch
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2,
)

# train_dataset is assumed to yield (input_ids, attention_mask) tensor pairs;
# for causal language modeling, the labels are the input_ids themselves.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=lambda data: {'input_ids': torch.stack([f[0] for f in data]),
                                'attention_mask': torch.stack([f[1] for f in data]),
                                'labels': torch.stack([f[0] for f in data])},
)

trainer.train()

Getting Started

To get started with the huggingface/transfer-learning-conv-ai project:

  1. Clone the repository:

    git clone https://github.com/huggingface/transfer-learning-conv-ai.git
    cd transfer-learning-conv-ai
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    
  3. Download the pre-trained models and datasets:

    python download_pretrained_model.py
    
  4. Run the interactive chat script to test the model:

    python interactive.py --model_checkpoint ./models/medium
    

For more detailed instructions and advanced usage, refer to the repository's README and documentation.

Competitor Comparisons

DialoGPT: Large-scale pretraining for dialogue

Pros of DialoGPT

  • Larger scale training on Reddit data, potentially leading to more diverse responses
  • Implements a multi-turn dialogue system, allowing for more context-aware conversations (see the sketch after the code comparison below)
  • Provides pre-trained models of various sizes, offering flexibility for different use cases

Cons of DialoGPT

  • May generate inappropriate or biased responses due to training on unfiltered Reddit data
  • Requires more computational resources for inference, especially with larger model variants
  • Less focus on personalization compared to transfer-learning-conv-ai's persona-based approach

Code Comparison

transfer-learning-conv-ai:

from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

DialoGPT:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

Both repositories use the Transformers library, but DialoGPT utilizes more recent Auto classes for model and tokenizer loading, while transfer-learning-conv-ai uses specific OpenAIGPT classes. DialoGPT also offers different model sizes (small, medium, large) for various applications.
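To make the multi-turn point concrete, here is a sketch of the common DialoGPT chat loop, following the pattern from the model's documentation rather than code from either repository: past turns are concatenated, separated by the EOS token, so the model conditions on the whole dialogue (this assumes a 1-batch, short conversation with no history trimming).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

chat_history_ids = None
for user_text in ["Hello!", "Do you like movies?"]:
    # Encode the new user turn, terminated by the EOS token.
    new_ids = tokenizer.encode(user_text + tokenizer.eos_token, return_tensors="pt")
    # Append it to the accumulated dialogue history, if any.
    input_ids = new_ids if chat_history_ids is None else torch.cat(
        [chat_history_ids, new_ids], dim=-1)
    chat_history_ids = model.generate(
        input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    # The reply is whatever generate() produced beyond the input.
    reply = tokenizer.decode(
        chat_history_ids[:, input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(reply)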

ParlAI: A framework for training and evaluating AI models on a variety of openly available dialogue datasets

Pros of ParlAI

  • More comprehensive and feature-rich platform for dialogue research
  • Supports a wider range of tasks and datasets
  • Active development and larger community support

Cons of ParlAI

  • Steeper learning curve due to its extensive features
  • May be overkill for simpler conversational AI projects
  • Requires more setup and configuration

Code Comparison

ParlAI:

from parlai.scripts.train_model import TrainModel

TrainModel.main(
    task='convai2',
    model='transformer/generator',
    model_file='/tmp/model_convai2',
    batchsize=32
)

transfer-learning-conv-ai:

# Illustrative only: the repository's train.py is normally launched from the
# command line, e.g. python ./train.py --dataset_path data/dataset.json --model gpt2 --n_epochs 3
from train import train

args = {
    'dataset_path': 'data/dataset.json',
    'model_checkpoint': 'gpt2',
    'num_epochs': 3
}
train(args)

The ParlAI example showcases its modular approach and built-in support for various tasks and models. The transfer-learning-conv-ai example demonstrates a more straightforward, focused implementation for transfer learning in conversational AI.

DeepPavlov: An open source library for deep learning end-to-end dialog systems and chatbots

Pros of DeepPavlov

  • More comprehensive framework for building end-to-end dialogue systems
  • Supports a wider range of NLP tasks beyond conversational AI
  • Includes pre-trained models and datasets for various languages

Cons of DeepPavlov

  • Steeper learning curve due to its broader scope
  • Less focused on transfer learning for conversational AI
  • May require more setup and configuration for specific use cases

Code Comparison

DeepPavlov:

from deeppavlov import build_model, configs

# The SQuAD pipeline takes a batch of contexts and a batch of questions.
model = build_model(configs.squad.squad_bert, download=True)
result = model(["DeepPavlov is an open source NLP library."], ["What is DeepPavlov?"])

transfer-learning-conv-ai:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

DeepPavlov offers a higher-level API for building models, while transfer-learning-conv-ai focuses on using pre-trained models from the Hugging Face ecosystem. The latter is more specialized for conversational AI tasks, making it easier to get started with transfer learning for dialogue systems. However, DeepPavlov provides a broader range of NLP capabilities, making it suitable for more diverse applications beyond conversational AI.

botpress: The open-source hub to build & deploy GPT/LLM Agents ⚡️

Pros of botpress

  • More comprehensive chatbot development platform with visual flow builder
  • Supports multiple channels (web, Messenger, Slack, etc.)
  • Includes built-in NLU engine and analytics

Cons of botpress

  • Steeper learning curve due to more complex architecture
  • Less focused on advanced NLP techniques like transfer learning
  • May be overkill for simple conversational AI experiments

Code comparison

transfer-learning-conv-ai:

from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

botpress:

const botpress = require('botpress')

botpress({
  botfile: '<path to botfile.js>'
}).start()

transfer-learning-conv-ai focuses on leveraging pre-trained language models for conversational AI, while botpress provides a full-stack chatbot development platform. The former is more suitable for researchers and developers interested in cutting-edge NLP techniques, while the latter is better for building production-ready chatbots with a visual interface and multi-channel support. transfer-learning-conv-ai uses Python and popular NLP libraries, whereas botpress is built on Node.js and provides a more abstracted API for bot development.

README

🦄 Building a State-of-the-Art Conversational AI with Transfer Learning

The present repo contains the code accompanying the blog post 🦄 How to build a State-of-the-Art Conversational AI with Transfer Learning.

This is a clean, commented codebase with training and testing scripts that can be used to train a dialog agent leveraging transfer learning from OpenAI GPT and GPT-2 Transformer language models.

This codebase can be used to reproduce the results of HuggingFace's participation in the NeurIPS 2018 dialog competition ConvAI2, which was state-of-the-art on the automatic metrics. The 3k+ lines of competition code were distilled into about 250 lines of training code, with distributed & FP16 options, to form the present repository.

This model can be trained in about one hour on an 8×V100 cloud instance (currently about $25), and a pre-trained model is also made available.

Installation

To install and use the training and inference scripts, please clone the repo and install the requirements:

git clone https://github.com/huggingface/transfer-learning-conv-ai
cd transfer-learning-conv-ai
pip install -r requirements.txt
python -m spacy download en

Installation with Docker

To install using Docker, please build the self-contained image:

docker build -t convai .

Note: Make sure your Docker setup allocates enough memory for building the container. Building with the default of 1.75 GB will fail due to the large PyTorch wheel.

You can then run a container from the image:

ip-192-168-22-157:transfer-learning-conv-ai loretoparisi$ docker run --rm -it convai bash
root@91e241bb823e:/# ls
Dockerfile  README.md  boot                  dev  home         lib    media  models  proc              root  sbin  sys  train.py  utils.py
LICENCE     bin        convai_evaluation.py  etc  interact.py  lib64  mnt    opt     requirements.txt  run   srv   tmp  usr       var

You can then run the interact.py script on the pretrained model:

python3 interact.py --model models/

Pretrained model

We make a pretrained and fine-tuned model available on our S3 here. The easiest way to download and use this model is to run the interact.py script and talk with it; without any arguments, the script will automatically download and cache our model.

Using the training script

The training script can be used in single GPU or multi GPU settings:

python ./train.py  # Single GPU training
python -m torch.distributed.launch --nproc_per_node=8 ./train.py  # Training on 8 GPUs

The training script accepts several arguments to tweak the training:

| Argument | Type | Default value | Description |
| --- | --- | --- | --- |
| dataset_path | str | "" | Path or URL of the dataset. If empty, download from S3. |
| dataset_cache | str | './dataset_cache.bin' | Path or URL of the dataset cache |
| model | str | "openai-gpt" | Path, URL or short name of the model |
| num_candidates | int | 2 | Number of candidates for training |
| max_history | int | 2 | Number of previous exchanges to keep in history |
| train_batch_size | int | 4 | Batch size for training |
| valid_batch_size | int | 4 | Batch size for validation |
| gradient_accumulation_steps | int | 8 | Accumulate gradients over several steps |
| lr | float | 6.25e-5 | Learning rate |
| lm_coef | float | 1.0 | LM loss coefficient |
| mc_coef | float | 1.0 | Multiple-choice loss coefficient |
| max_norm | float | 1.0 | Clipping gradient norm |
| n_epochs | int | 3 | Number of training epochs |
| personality_permutations | int | 1 | Number of permutations of personality sentences |
| device | str | "cuda" if torch.cuda.is_available() else "cpu" | Device (cuda or cpu) |
| fp16 | str | "" | Set to O0, O1, O2 or O3 for fp16 training (see apex documentation) |
| local_rank | int | -1 | Local rank for distributed training (-1: not distributed) |

Here is how to reproduce our results on a server with 8 V100 GPUs (adapt number of nodes and batch sizes to your configuration):

python -m torch.distributed.launch --nproc_per_node=8 ./train.py --gradient_accumulation_steps=4 --lm_coef=2.0 --max_history=2 --n_epochs=1 --num_candidates=4 --personality_permutations=2 --train_batch_size=2 --valid_batch_size=2
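As a sanity check, the effective batch size implied by this command is the per-GPU batch size times the gradient-accumulation steps times the number of GPUs (assuming the usual data-parallel convention):

# Effective batch size for the reproduction command above.
train_batch_size = 2             # --train_batch_size
gradient_accumulation_steps = 4  # --gradient_accumulation_steps
n_gpus = 8                       # --nproc_per_node
print(train_batch_size * gradient_accumulation_steps * n_gpus)  # 64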

This model should give a Hits@1 over 79, perplexity of 20.5 and F1 of 16.5 using the convai2 evaluation script (see below).

These numbers are slightly lower than the numbers we obtained in the ConvAI2 competition. Here is what you can tweak to reach the same results:

  • In the ConvAI2 competition we also used tweaked position embeddings so that the history of the dialog always starts with the same embeddings. This is easy to add with pytorch-transformers and should improve the hits@1 metric.
  • In the ConvAI2 competition we used a beam search decoder. While the results are better in terms of the F1 metric, our feeling is that the human experience is less compelling with beam search than with the nucleus sampling decoder provided in the present repository (sketched below).
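For reference, here is a minimal sketch of nucleus (top-p) filtering; the repository ships its own implementation in interact.py, so treat this as an illustration of the technique rather than the repo's exact code (it assumes a 1-D logits tensor for a single next-token distribution):

import torch
import torch.nn.functional as F

def nucleus_filtering(logits, top_p=0.9, filter_value=-float("inf")):
    # Keep the smallest set of tokens whose cumulative probability reaches top_p.
    sorted_logits, sorted_indices = torch.sort(logits, descending=True)
    cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
    # Mask tokens once the cumulative probability exceeds top_p, shifting the
    # mask right by one so the first token crossing the threshold is kept.
    mask = cumulative_probs > top_p
    mask[..., 1:] = mask[..., :-1].clone()
    mask[..., 0] = False
    logits[sorted_indices[mask]] = filter_value
    return logits

# Sampling then proceeds over the filtered distribution:
# probs = F.softmax(nucleus_filtering(logits / temperature), dim=-1)
# next_token = torch.multinomial(probs, num_samples=1)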

Using the interaction script

The training script saves all experiments and checkpoints in the ./runs folder of the repository base folder, in a sub-folder named with the timestamp of the experiment.

You can then use the interactive script to interact with the model simply by pointing to this folder.

Here is an example command line to run the interactive script:

python ./interact.py --model_checkpoint ./data/Apr17_13-31-38_thunder/  # run the interactive script with a training checkpoint
python ./interact.py  # run the interactive script with the finetuned model on our S3

The fine-tuned model gives FINAL Hits@1: 0.715.

The interactive script accepts a few arguments to tweak the decoding algorithm:

| Argument | Type | Default value | Description |
| --- | --- | --- | --- |
| dataset_path | str | "" | Path or URL of the dataset. If empty, download from S3. |
| dataset_cache | str | './dataset_cache.bin' | Path or URL of the dataset cache |
| model | str | "openai-gpt" | Path, URL or short name of the model |
| max_history | int | 2 | Number of previous utterances to keep in history |
| device | str | "cuda" if torch.cuda.is_available() else "cpu" | Device (cuda or cpu) |
| no_sample | action store_true | | Set to use greedy decoding instead of sampling |
| max_length | int | 20 | Maximum length of the output utterances |
| min_length | int | 1 | Minimum length of the output utterances |
| seed | int | 42 | Seed |
| temperature | float | 0.7 | Sampling softmax temperature |
| top_k | int | 0 | Filter top-k tokens before sampling (<=0: no filtering) |
| top_p | float | 0.9 | Nucleus filtering (top-p) before sampling (<=0.0: no filtering) |

Running ConvAI2 evaluation scripts

To run the evaluation scripts of the ConvAI2 challenge, you first need to install ParlAI in the repo base folder like this:

git clone https://github.com/facebookresearch/ParlAI.git
cd ParlAI
python setup.py develop

You can then run the evaluation script from the ParlAI base folder:

cd ParlAI
python ../convai_evaluation.py --eval_type hits@1  # to download and evaluate our fine-tuned model on hits@1 metric
python ../convai_evaluation.py --eval_type hits@1  --model_checkpoint ./data/Apr17_13-31-38_thunder/  # to evaluate a training checkpoint on hits@1 metric

The evaluation script accepts a few arguments to select the evaluation metric and tweak the decoding algorithm:

| Argument | Type | Default value | Description |
| --- | --- | --- | --- |
| eval_type | str | "hits@1" | Evaluate the model on hits@1, ppl or f1 metric on the ConvAI2 validation dataset |
| model | str | "openai-gpt" | Path, URL or short name of the model |
| max_history | int | 2 | Number of previous utterances to keep in history |
| device | str | "cuda" if torch.cuda.is_available() else "cpu" | Device (cuda or cpu) |
| no_sample | action store_true | | Set to use greedy decoding instead of sampling |
| max_length | int | 20 | Maximum length of the output utterances |
| min_length | int | 1 | Minimum length of the output utterances |
| seed | int | 42 | Seed |
| temperature | float | 0.7 | Sampling softmax temperature |
| top_k | int | 0 | Filter top-k tokens before sampling (<=0: no filtering) |
| top_p | float | 0.9 | Nucleus filtering (top-p) before sampling (<=0.0: no filtering) |

Data Format

See example_entry.py and the comment at the top of that file.
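For orientation, one training entry follows the PERSONA-CHAT layout: a list of personality sentences plus a list of utterances, each pairing a dialogue history with candidate replies, the gold reply last. A hedged sketch (field values are invented; example_entry.py is the authoritative reference):

# Hedged sketch of one dataset entry; see example_entry.py for the real format.
entry = {
    "personality": ["i like to ski .", "i am from the midwest ."],
    "utterances": [
        {
            "history": ["hello how are you today ?"],
            "candidates": [
                "a distractor reply .",
                "i am great , just got back from the slopes !",  # gold reply last
            ],
        }
    ],
}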

Citation

If you use this code in your research, you can cite our NeurIPS CAI workshop paper:

@article{DBLP:journals/corr/abs-1901-08149,
  author    = {Thomas Wolf and
               Victor Sanh and
               Julien Chaumond and
               Clement Delangue},
  title     = {TransferTransfo: {A} Transfer Learning Approach for Neural Network
               Based Conversational Agents},
  journal   = {CoRR},
  volume    = {abs/1901.08149},
  year      = {2019},
  url       = {http://arxiv.org/abs/1901.08149},
  archivePrefix = {arXiv},
  eprint    = {1901.08149},
  timestamp = {Sat, 02 Feb 2019 16:56:00 +0100},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1901-08149},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}