
abisee/pointer-generator

Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"


Top Related Projects

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Quick Overview

The abisee/pointer-generator repository is the official implementation of the model from the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks" by See et al. (2017). It provides a TensorFlow implementation of a pointer-generator network for abstractive text summarization, combining a sequence-to-sequence attention model with a pointing mechanism that can copy words directly from the source text.
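
At the heart of the model is a soft switch p_gen that interpolates between generating a word from a fixed vocabulary and copying a word from the source document via the attention distribution. A minimal NumPy sketch of that final-distribution step (conceptual only; the repository implements this in TensorFlow):

import numpy as np

def final_distribution(p_gen, vocab_dist, attn_dist, src_ids, extended_vsize):
    """Mix generation and copying, as in the paper's final-distribution equation.

    p_gen:          scalar in (0, 1), the generation probability
    vocab_dist:     softmax over the fixed output vocabulary, shape [vsize]
    attn_dist:      attention weights over source tokens, shape [src_len]
    src_ids:        ids of source tokens in the *extended* vocabulary
    extended_vsize: fixed vocab size plus the number of source OOV words
    """
    final = np.zeros(extended_vsize)
    final[: len(vocab_dist)] = p_gen * vocab_dist      # generate from the vocabulary
    for i, word_id in enumerate(src_ids):              # copy via attention
        final[word_id] += (1.0 - p_gen) * attn_dist[i]
    return final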

Pros

  • Implements an advanced text summarization technique combining extractive and abstractive methods
  • Provides a complete codebase for training and testing the model
  • Links to a pretrained model and the paper's test-set output for quick experimentation
  • Well-documented and easy to understand for researchers and practitioners

Cons

  • Primarily focused on a specific summarization technique, limiting its versatility
  • May require significant computational resources for training on large datasets
  • Depends on old versions of TensorFlow (1.0) and Python 2, so running it on newer systems requires the community ports noted in the README
  • Geared toward English-language summarization (the CNN / Daily Mail dataset)

Code Examples

Note that this repository is TensorFlow 1.x code driven from the command line via run_summarization.py rather than an importable library. The snippets below are illustrative PyTorch-style sketches of how the model is typically wrapped in community re-implementations; the class and helper names are hypothetical.

  1. Loading a pre-trained model:

from model import Model  # hypothetical wrapper class

model = Model(opt, vocab)
model.load_state_dict(torch.load(model_file_path))

  2. Generating a summary:

encoder_input, decoder_input, _ = data.get_batch(batch)
enc_batch, enc_padding_mask, enc_lens, enc_batch_extend_vocab, extra_zeros, c_t_1, coverage = get_input_from_batch(batch)

encoder_outputs, encoder_feature, encoder_hidden = model.encoder(encoder_input)
summary = model.decoder(decoder_input, encoder_outputs, encoder_feature, encoder_hidden, enc_padding_mask)

  3. Calculating loss:

loss = model.train_one_batch(batch)

Getting Started

  1. Clone the repository:

    git clone https://github.com/abisee/pointer-generator.git
    cd pointer-generator

  2. Install dependencies (the code targets Python 2 and TensorFlow 1.0; pyrouge is needed for ROUGE evaluation):

    pip install tensorflow==1.0.0 pyrouge

  3. Prepare your data:

    • Follow the CNN / Daily Mail instructions (see "Get the dataset" below) to produce the chunked train_*.bin, val_*.bin and test_*.bin files plus the vocab file
  4. Train the model:

    python run_summarization.py --mode=train --data_path=/path/to/chunked/train_* --vocab_path=/path/to/vocab --log_root=/path/to/log --exp_name=myexperiment

  5. Generate summaries with beam search:

    python run_summarization.py --mode=decode --data_path=/path/to/chunked/val_* --vocab_path=/path/to/vocab --log_root=/path/to/log --exp_name=myexperiment


Competitor Comparisons

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

Pros of OpenNMT-py

  • More comprehensive and feature-rich, supporting a wider range of NMT architectures and techniques
  • Actively maintained with regular updates and contributions from a larger community
  • Better documentation and extensive examples for various use cases

Cons of OpenNMT-py

  • Steeper learning curve due to its complexity and extensive feature set
  • May be overkill for simpler NMT tasks or when focusing specifically on pointer-generator networks

Code Comparison

pointer-generator:

class PointerGenerator(nn.Module):
    def __init__(self, hidden_dim, emb_dim, vocab_size):
        super(PointerGenerator, self).__init__()
        self.w_h = nn.Linear(hidden_dim, 1, bias=False)
        self.w_s = nn.Linear(hidden_dim, 1, bias=False)
        self.w_x = nn.Linear(emb_dim, 1, bias=True)

OpenNMT-py:

class GlobalAttention(nn.Module):
    def __init__(self, dim, coverage=False, attn_type="dot"):
        super(GlobalAttention, self).__init__()
        self.dim = dim
        self.attn_type = attn_type
        self.linear_in = nn.Linear(dim, dim, bias=False)
        self.linear_out = nn.Linear(dim * 2, dim, bias=False)

Both repositories implement attention mechanisms (and, in pointer-generator's case, coverage), but OpenNMT-py offers more flexibility with different attention types and many additional features.
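
For reference, the coverage mechanism mentioned above keeps a running sum of past attention distributions and penalizes attending repeatedly to the same source positions. A small NumPy sketch of the coverage loss from the See et al. paper (conceptual only, not code from either repository):

import numpy as np

def coverage_loss(attn_dists):
    """Sum over decoder steps of sum_i min(a_t[i], c_t[i]), where c_t is the
    running sum of attention distributions from previous steps."""
    coverage = np.zeros_like(attn_dists[0])
    loss = 0.0
    for attn in attn_dists:            # one attention distribution per decoder step
        loss += np.minimum(attn, coverage).sum()
        coverage += attn
    return loss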

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of transformers

  • Extensive library with support for numerous state-of-the-art models
  • Active development and frequent updates
  • Comprehensive documentation and community support

Cons of transformers

  • Steeper learning curve due to its extensive features
  • Higher computational requirements for many models
  • Potentially overkill for simpler tasks

Code comparison

pointer-generator:

attention_distribution = self.attention(encoder_states, decoder_state)
context_vector = torch.bmm(attention_distribution, encoder_states)
p_gen = self.pointer_generator(context_vector, decoder_state, decoder_input)

transformers:

outputs = model(input_ids=input_ids, attention_mask=attention_mask)
last_hidden_states = outputs.last_hidden_state
logits = outputs.logits

Summary

transformers is a more comprehensive library offering a wide range of models and techniques, while pointer-generator focuses specifically on pointer-generator networks. transformers provides greater flexibility and access to cutting-edge models but may be more complex to use. pointer-generator is simpler and more focused but limited in scope. The choice between them depends on the specific requirements of your project and the level of complexity you're comfortable with.

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • More comprehensive and feature-rich, supporting a wide range of NLP tasks
  • Actively maintained with regular updates and contributions from the research community
  • Highly scalable and optimized for performance on large datasets

Cons of fairseq

  • Steeper learning curve due to its complexity and extensive features
  • Requires more computational resources for training and inference
  • May be overkill for simpler NLP tasks or smaller projects

Code Comparison

pointer-generator:

attention_distribution = F.softmax(e, dim=1)
context_vector = torch.bmm(attention_distribution.unsqueeze(1), encoder_outputs).squeeze(1)
p_gen = torch.sigmoid(self.w_h(h) + self.w_s(s) + self.w_x(x))

fairseq:

attn_weights = F.softmax(attn_scores, dim=-1)
attn = torch.bmm(attn_weights.unsqueeze(1), encoder_out.transpose(0, 1)).squeeze(1)
x = self.output_layer(torch.cat((x, attn), dim=-1))

Both repositories implement attention mechanisms, but fairseq's implementation is more modular and integrated into a larger framework, while pointer-generator focuses specifically on the pointer-generator network for summarization tasks.

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Pros of text-to-text-transfer-transformer

  • More versatile, capable of handling multiple NLP tasks
  • Utilizes advanced transformer architecture for better performance
  • Supports pre-training on large datasets for improved generalization

Cons of text-to-text-transfer-transformer

  • Higher computational requirements and complexity
  • Steeper learning curve for implementation and fine-tuning
  • May be overkill for simpler text generation tasks

Code Comparison

pointer-generator:

attention_distribution = F.softmax(e, dim=1)
context_vector = torch.bmm(attention_distribution.unsqueeze(1), encoder_outputs).squeeze(1)
p_gen = torch.sigmoid(self.w_h(h) + self.w_s(s) + self.w_x(x))

text-to-text-transfer-transformer:

encoder_output = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
decoder_output = self.decoder(input_ids=decoder_input_ids, encoder_hidden_states=encoder_output.last_hidden_state)
lm_logits = self.lm_head(decoder_output.last_hidden_state)

The pointer-generator code focuses on attention mechanisms and copy probability, while text-to-text-transfer-transformer utilizes encoder-decoder architecture with self-attention layers. The latter offers more flexibility but requires more complex implementation.

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Pros of UniLM

  • More versatile, supporting multiple NLP tasks beyond summarization
  • Utilizes pre-training on large-scale datasets for improved performance
  • Implements a unified framework for both natural language understanding and generation

Cons of UniLM

  • More complex architecture, potentially requiring more computational resources
  • May have a steeper learning curve for implementation and fine-tuning
  • Less specialized for summarization tasks compared to Pointer-Generator

Code Comparison

Pointer-Generator:

attention_distribution = F.softmax(e, dim=1)
context_vector = torch.bmm(attention_distribution.unsqueeze(1), encoder_outputs).squeeze(1)
p_gen = torch.sigmoid(self.w_h(h) + self.w_s(s) + self.w_x(x))

UniLM:

attention_mask = self.get_attention_mask(input_ids, token_type_ids)
outputs = self.bert(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)
sequence_output = outputs[0]
prediction_scores = self.cls(sequence_output)

Summary

Pointer-Generator focuses specifically on text summarization, utilizing a hybrid pointer-generator network. UniLM, on the other hand, is a more general-purpose model supporting various NLP tasks through a unified pre-training approach. While UniLM offers greater versatility and potentially better performance due to pre-training, it may require more resources and expertise to implement effectively. Pointer-Generator remains a solid choice for summarization tasks, especially when computational resources are limited or a more specialized solution is preferred.


README

Note: this code is no longer actively maintained. However, feel free to use the Issues section to discuss the code with other users. Some users have updated this code for newer versions of Tensorflow and Python - see information below and Issues section.


This repository contains code for the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks. For an intuitive overview of the paper, read the blog post.

Looking for test set output?

The test set output of the models described in the paper can be found here.

Looking for pretrained model?

A pretrained model is available in two versions: one saved with Tensorflow 1.0 and one saved with Tensorflow 1.2.1.

(The only difference between these two is the naming of some of the variables in the checkpoint. Tensorflow 1.0 uses lstm_cell/biases and lstm_cell/weights whereas Tensorflow 1.2.1 uses lstm_cell/bias and lstm_cell/kernel).
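
If you are not sure which naming convention a particular checkpoint uses, you can list the variables it contains with the TensorFlow 1.x checkpoint reader (a small sketch; the checkpoint path is a placeholder):

import tensorflow as tf

# Path prefix of the checkpoint, without the .index / .data suffix
reader = tf.train.NewCheckpointReader("/path/to/model.ckpt")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)   # look for lstm_cell/weights vs lstm_cell/kernel, etc.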

Note: This pretrained model is not the exact same model that is reported in the paper. That is, it is the same architecture, trained with the same settings, but resulting from a different training run. Consequently this pretrained model has slightly lower ROUGE scores than those reported in the paper. This is probably due to us slightly overfitting to the randomness in our original experiments (in the original experiments we tried various hyperparameter settings and selected the model that performed best). Repeating the experiment once with the same settings did not perform quite as well. Better results might be obtained from further hyperparameter tuning.

Why can't you release the trained model reported in the paper? Due to changes to the code between the original experiments and the time of releasing the code (e.g. TensorFlow version changes, lots of code cleanup), it is not possible to release the original trained model files.

Looking for CNN / Daily Mail data?

Instructions are here.

About this code

This code is based on the TextSum code from Google Brain.

This code was developed for Tensorflow 0.12, but has been updated to run with Tensorflow 1.0. In particular, the code in attention_decoder.py is based on tf.contrib.legacy_seq2seq.attention_decoder, which is now outdated. Tensorflow 1.0's new seq2seq library probably provides a way to do this (as well as beam search) more elegantly and efficiently in the future.
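
For orientation, the attention computed in attention_decoder.py follows the paper's additive (Bahdanau-style) formulation with an optional coverage term. A NumPy sketch of the scoring step, using the paper's notation rather than the TensorFlow variable names:

import numpy as np

def attention_distribution(enc_states, dec_state, coverage, W_h, W_s, w_c, v, b_attn):
    """e_i = v^T tanh(W_h h_i + W_s s_t + w_c c_i + b_attn), followed by a softmax over i."""
    scores = np.array([
        v @ np.tanh(W_h @ h_i + W_s @ dec_state + w_c * c_i + b_attn)
        for h_i, c_i in zip(enc_states, coverage)
    ])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()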

Python 3 version: This code is in Python 2. If you want a Python 3 version, see @becxer's fork.

How to run

Get the dataset

To obtain the CNN / Daily Mail dataset, follow the instructions here. Once finished, you should have chunked datafiles train_000.bin, ..., train_287.bin, val_000.bin, ..., val_013.bin, test_000.bin, ..., test_011.bin (each contains 1000 examples) and a vocabulary file vocab.

Note: If you did this before 7th May 2017, follow the instructions here to correct a bug in the process.
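
If you want to inspect the chunked datafiles directly, here is a reading sketch. It assumes the TextSum-style format produced by the cnn-dailymail scripts (each record is an 8-byte length followed by a serialized tf.train.Example with article and abstract features); check data.py in this repository if in doubt.

import glob
import struct
import tensorflow as tf

def example_generator(data_path):
    """Yield (article, abstract) pairs from chunked .bin files (assumed format)."""
    for filename in sorted(glob.glob(data_path)):
        with open(filename, "rb") as f:
            while True:
                len_bytes = f.read(8)
                if not len_bytes:
                    break
                str_len = struct.unpack("q", len_bytes)[0]
                example_str = struct.unpack("%ds" % str_len, f.read(str_len))[0]
                ex = tf.train.Example.FromString(example_str)
                article = ex.features.feature["article"].bytes_list.value[0]
                abstract = ex.features.feature["abstract"].bytes_list.value[0]
                yield article, abstract

For example, example_generator("/path/to/chunked/train_*") iterates over all training chunks.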

Run training

To train your model, run:

python run_summarization.py --mode=train --data_path=/path/to/chunked/train_* --vocab_path=/path/to/vocab --log_root=/path/to/a/log/directory --exp_name=myexperiment

This will create a subdirectory of your specified log_root called myexperiment where all checkpoints and other data will be saved. Then the model will start training using the train_*.bin files as training data.

Warning: Using default settings as in the above command, both initializing the model and running training iterations will probably be quite slow. To make things faster, try setting the following flags (especially max_enc_steps and max_dec_steps) to something smaller than the defaults specified in run_summarization.py: hidden_dim, emb_dim, batch_size, max_enc_steps, max_dec_steps, vocab_size.

Increasing sequence length during training: Note that to obtain the results described in the paper, we increase the values of max_enc_steps and max_dec_steps in stages throughout training (mostly so we can perform quicker iterations during early stages of training). If you wish to do the same, start with small values of max_enc_steps and max_dec_steps, then interrupt and restart the job with larger values when you want to increase them.

Run (concurrent) eval

You may want to run a concurrent evaluation job that runs your model on the validation set and logs the loss. To do this, run:

python run_summarization.py --mode=eval --data_path=/path/to/chunked/val_* --vocab_path=/path/to/vocab --log_root=/path/to/a/log/directory --exp_name=myexperiment

Note: you want to run the above command using the same settings you entered for your training job.

Restoring snapshots: The eval job saves a snapshot of the model that scored the lowest loss on the validation data so far. You may want to restore one of these "best models", e.g. if your training job has overfit, or if the training checkpoint has become corrupted by NaN values. To do this, run your train command plus the --restore_best_model=1 flag. This will copy the best model in the eval directory to the train directory. Then run the usual train command again.

Run beam search decoding

To run beam search decoding:

python run_summarization.py --mode=decode --data_path=/path/to/chunked/val_* --vocab_path=/path/to/vocab --log_root=/path/to/a/log/directory --exp_name=myexperiment

Note: you want to run the above command using the same settings you entered for your training job (plus any decode mode specific flags like beam_size).

This will repeatedly load random examples from your specified datafile and generate a summary using beam search. The results will be printed to screen.

Visualize your output: Additionally, the decode job produces a file called attn_vis_data.json. This file provides the data necessary for an in-browser visualization tool that allows you to view the attention distributions projected onto the text. To use the visualizer, follow the instructions here.

If you want to run evaluation on the entire validation or test set and get ROUGE scores, set the flag single_pass=1. This will go through the entire dataset in order, writing the generated summaries to file, and then run evaluation using pyrouge. (Note this will not produce the attn_vis_data.json files for the attention visualizer).

Evaluate with ROUGE

decode.py uses the Python package pyrouge to run ROUGE evaluation. pyrouge provides an easier-to-use interface for the official Perl ROUGE package, which you must install for pyrouge to work. Here are some useful instructions on how to do this:

Note: As of 18th May 2017 the website for the official Perl package appears to be down. Unfortunately you need to download a directory called ROUGE-1.5.5 from there. As an alternative, it seems that you can get that directory from here (however, the version of pyrouge in that repo appears to be outdated, so best to install pyrouge from the official source).
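
For completeness, here is a minimal pyrouge usage sketch (the directory paths and filename patterns are placeholders; when you decode with single_pass=1, decode.py wires this up for you):

from pyrouge import Rouge155

r = Rouge155()                                    # requires the Perl ROUGE-1.5.5 package to be installed
r.system_dir = "/path/to/decoded"                 # generated summaries, one file per example
r.model_dir = "/path/to/reference"                # reference summaries
r.system_filename_pattern = r"(\d+)_decoded.txt"  # placeholder filename patterns
r.model_filename_pattern = "#ID#_reference.txt"
output = r.convert_and_evaluate()
print(output)
results_dict = r.output_to_dict(output)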

Tensorboard

Run Tensorboard from the experiment directory (in the example above, myexperiment). You should be able to see data from the train and eval runs. If you select "embeddings", you should also see your word embeddings visualized.

Help, I've got NaNs!

For reasons that are difficult to diagnose, NaNs sometimes occur during training, making the loss=NaN and sometimes also corrupting the model checkpoint with NaN values, making it unusable. Here are some suggestions:

  • If training stopped with the "Loss is not finite. Stopping." exception, you can just try restarting. It may be that the checkpoint is not corrupted.
  • You can check if your checkpoint is corrupted by using the inspect_checkpoint.py script (see the sketch after this list). If it says that all values are finite, then your checkpoint is OK and you can try resuming training with it.
  • The training job is set to keep 3 checkpoints at any one time (see the max_to_keep variable in run_summarization.py). If your newer checkpoint is corrupted, it may be that one of the older ones is not. You can switch to that checkpoint by editing the checkpoint file inside the train directory.
  • Alternatively, you can restore a "best model" from the eval directory. See the note Restoring snapshots above.
  • If you want to try to diagnose the cause of the NaNs, you can run with the --debug=1 flag turned on. This will run Tensorflow Debugger, which checks for NaNs and diagnoses their causes during training.
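
As a rough illustration of the kind of check inspect_checkpoint.py performs (an illustrative sketch, not the script itself), you can read every variable stored in a checkpoint and test it for non-finite values:

import numpy as np
import tensorflow as tf

def checkpoint_is_finite(ckpt_path):
    """Return True if every variable stored in the checkpoint is finite."""
    reader = tf.train.NewCheckpointReader(ckpt_path)
    ok = True
    for name in reader.get_variable_to_shape_map():
        if not np.isfinite(reader.get_tensor(name)).all():
            print("non-finite values found in", name)
            ok = False
    return ok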