
graykode/nlp-tutorial

Natural Language Processing Tutorial for Deep Learning Researchers


Top Related Projects

  • 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX
  • spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python
  • flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)
  • gensim: Topic Modelling for Humans
  • fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python
  • models: Models and examples built with TensorFlow

Quick Overview

The graykode/nlp-tutorial repository is a comprehensive collection of Natural Language Processing (NLP) tutorials implemented in PyTorch. It covers a wide range of NLP tasks and models, from basic text classification to advanced transformer architectures. The tutorials are designed to be easy to understand and implement, making it an excellent resource for both beginners and intermediate practitioners in the field of NLP.

Pros

  • Covers a wide range of NLP topics and models
  • Implementations are in PyTorch, a popular deep learning framework
  • Code is well-organized and easy to follow
  • Includes both basic and advanced NLP concepts

Cons

  • Some tutorials may not be up-to-date with the latest advancements in NLP
  • Limited explanations in some sections, which may be challenging for absolute beginners
  • Lacks extensive documentation or accompanying blog posts for deeper understanding
  • Some advanced topics might require additional background knowledge

Code Examples

  1. Basic text classification using CNN (a runnable usage sketch follows this list):
class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)  # token embedding lookup
        self.conv = nn.Sequential(
            nn.Conv2d(1, 3, (3, embedding_dim)),         # 3 filters spanning 3 tokens each
            nn.ReLU(),
            nn.MaxPool2d((sequence_length - 3 + 1, 1)),  # max-over-time pooling
        )
        self.fc = nn.Linear(3, num_classes)

    def forward(self, X):
        batch_size = X.shape[0]
        embedding_X = self.embedding(X)         # [batch_size, sequence_length, embedding_dim]
        embedding_X = embedding_X.unsqueeze(1)  # add channel dim for Conv2d
        conved = self.conv(embedding_X)         # [batch_size, 3, 1, 1]
        flatten = conved.view(batch_size, -1)   # [batch_size, 3]
        output = self.fc(flatten)
        return output
  2. Implementing attention mechanism:
class Attention(nn.Module):
    def __init__(self):
        super(Attention, self).__init__()
        self.attn = nn.Linear(n_hidden * 2, n_hidden)
        self.v = nn.Parameter(torch.randn(n_hidden))

    def forward(self, hidden, encoder_outputs):
        # hidden: [batch_size, 1, n_hidden], encoder_outputs: [batch_size, max_len, n_hidden]
        max_len = encoder_outputs.size(1)
        hidden = hidden.repeat(1, max_len, 1)  # broadcast the decoder state over all time steps
        energy = torch.tanh(self.attn(torch.cat([hidden, encoder_outputs], 2)))
        attention = torch.sum(self.v * energy, dim=2)    # [batch_size, max_len] alignment scores
        return F.softmax(attention, dim=1).unsqueeze(1)  # normalized attention weights
  3. Transformer encoder implementation:
class TransformerEncoder(nn.Module):
    def __init__(self):
        super(TransformerEncoder, self).__init__()
        # MultiHeadAttention and PoswiseFeedForwardNet are defined elsewhere in the tutorial
        self.enc_self_attn = MultiHeadAttention()
        self.pos_ffn = PoswiseFeedForwardNet()

    def forward(self, enc_inputs, enc_self_attn_mask):
        # self-attention: queries, keys, and values all come from the encoder inputs
        enc_outputs, attn = self.enc_self_attn(enc_inputs, enc_inputs, enc_inputs, enc_self_attn_mask)
        enc_outputs = self.pos_ffn(enc_outputs)  # position-wise feed-forward network
        return enc_outputs, attn
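
To try the TextCNN snippet end-to-end outside the repository, here is a minimal self-contained sketch. The hyperparameter values and the random batch are illustrative assumptions for this sketch, not settings taken from the tutorial:

import torch
import torch.nn as nn

# Illustrative hyperparameters -- assumptions for this sketch, not the tutorial's values.
vocab_size, embedding_dim, sequence_length, num_classes = 100, 8, 10, 2

class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 3, (3, embedding_dim)),
            nn.ReLU(),
            nn.MaxPool2d((sequence_length - 3 + 1, 1)),
        )
        self.fc = nn.Linear(3, num_classes)

    def forward(self, X):
        embedded = self.embedding(X).unsqueeze(1)  # [batch, 1, seq_len, emb_dim]
        conved = self.conv(embedded)               # [batch, 3, 1, 1]
        return self.fc(conved.view(X.shape[0], -1))

model = TextCNN()
dummy_batch = torch.randint(0, vocab_size, (4, sequence_length))  # 4 "sentences" of random token ids
print(model(dummy_batch).shape)  # torch.Size([4, 2])

The same pattern (define the globals, instantiate the module, feed a LongTensor of token ids) applies to the other snippets as well.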

Getting Started

To get started with the NLP tutorials:

  1. Clone the repository:

    git clone https://github.com/graykode/nlp-tutorial.git
    
  2. Install the required dependencies:

    pip install torch torchtext numpy
    
  3. Navigate to the desired tutorial directory and run the Python script:

    cd nlp-tutorial/1-1.NNLM
    python NNLM.py
    

Make sure to have Python 3.6+ and PyTorch installed on your system before running the tutorials.
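
If you are unsure whether your environment meets these requirements, a quick sanity check from the shell (assuming python on your PATH is a Python 3 interpreter):

    python --version
    python -c "import torch; print(torch.__version__)"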

Competitor Comparisons

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of transformers

  • Comprehensive library with state-of-the-art models and architectures
  • Extensive documentation and community support
  • Regular updates and maintenance

Cons of transformers

  • Steeper learning curve for beginners
  • Larger codebase and dependencies
  • May be overkill for simple NLP tasks

Code Comparison

nlp-tutorial:

class BERT(nn.Module):
    def __init__(self):
        super(BERT, self).__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)  # token embeddings
        self.pos_emb = nn.Embedding(max_pos, d_model)       # learned positional embeddings
        self.layers = nn.ModuleList([EncoderLayer() for _ in range(n_layers)])

    def forward(self, x):
        seq_len = x.size(1)
        pos = torch.arange(seq_len, dtype=torch.long)
        pos = pos.unsqueeze(0).expand_as(x)  # [batch_size, seq_len] position indices
        embedding = self.embedding(x) + self.pos_emb(pos)
        for layer in self.layers:
            embedding = layer(embedding)
        return embedding

transformers:

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

nlp-tutorial takes a hands-on approach that is better suited to learning how NLP models work internally, while transformers provides pre-trained models and easier integration, making it the more practical choice for production use.
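
As a brief follow-up to the transformers snippet above, the returned outputs object exposes the encoder's final hidden states; the shapes in the comments assume the bert-base-uncased checkpoint loaded earlier:

last_hidden = outputs.last_hidden_state  # [batch_size, seq_len, 768]
pooled = outputs.pooler_output           # [batch_size, 768], pooled sentence-level representation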


💫 Industrial-strength Natural Language Processing (NLP) in Python

Pros of spaCy

  • Production-ready, optimized library for industrial-strength NLP tasks
  • Comprehensive documentation and extensive community support
  • Offers pre-trained models and easy integration with deep learning frameworks

Cons of spaCy

  • Steeper learning curve for beginners compared to simpler tutorials
  • Less flexibility for customizing low-level NLP components
  • Heavier resource requirements, especially for large language models

Code Comparison

nlp-tutorial:

import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        self.conv = nn.Conv2d(1, 3, (3, word_vec_dim))

spaCy:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
for token in doc:
    print(token.text, token.pos_, token.dep_)
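
The same doc object also exposes named entities. A brief follow-up sketch (note that the toy sentence above contains no entities, so this loop prints nothing; try a sentence with names or places):

for ent in doc.ents:
    print(ent.text, ent.label_)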

The nlp-tutorial repository provides basic implementations of various NLP models, making it ideal for learning and experimentation. In contrast, spaCy offers a more comprehensive, production-ready solution with pre-trained models and optimized performance. While nlp-tutorial allows for greater customization and understanding of model architectures, spaCy provides a higher-level API that simplifies many NLP tasks but may obscure some lower-level details.


A very simple framework for state-of-the-art Natural Language Processing (NLP)

Pros of flair

  • Comprehensive NLP framework with pre-trained models
  • Active development and community support
  • Extensive documentation and examples

Cons of flair

  • Steeper learning curve for beginners
  • Larger codebase and dependencies
  • May be overkill for simple NLP tasks

Code Comparison

nlp-tutorial:

import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        self.conv = nn.Conv2d(1, 3, (3, 3))

flair:

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load('ner')
sentence = Sentence('John Doe is visiting New York.')
tagger.predict(sentence)
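
Once predict has run, the recognized entity spans can be read back from the sentence object, e.g.:

for entity in sentence.get_spans('ner'):
    print(entity)  # prints each span with its NER label, e.g. "John Doe" as a person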

Summary

nlp-tutorial is a collection of simple NLP implementations, ideal for learning and understanding core concepts. It's lightweight and easy to follow but lacks advanced features.

flair is a full-fledged NLP library with state-of-the-art models and extensive functionality. It's more suitable for production use and complex NLP tasks but may be overwhelming for beginners.

Choose nlp-tutorial for educational purposes and quick prototypes, and flair for robust NLP applications and research projects.


Topic Modelling for Humans

Pros of gensim

  • Comprehensive library for topic modeling, document indexing, and similarity retrieval
  • Efficient implementation of popular algorithms like Word2Vec, Doc2Vec, and LDA
  • Scalable and optimized for large datasets

Cons of gensim

  • Steeper learning curve for beginners compared to nlp-tutorial
  • Less focus on deep learning-based NLP techniques
  • May require additional libraries for certain tasks

Code Comparison

nlp-tutorial:

import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        self.conv = nn.Conv2d(1, 3, (3, 2))

gensim:

from gensim.models import Word2Vec

sentences = [["cat", "say", "meow"], ["dog", "say", "woof"]]
model = Word2Vec(sentences, min_count=1)

The nlp-tutorial repository focuses on implementing various NLP models from scratch using PyTorch, providing a hands-on learning experience. In contrast, gensim offers pre-implemented, production-ready algorithms for various NLP tasks, emphasizing efficiency and scalability. While nlp-tutorial is excellent for understanding the inner workings of NLP models, gensim is more suitable for practical applications and large-scale projects.
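
As a brief follow-up to the gensim snippet above, the trained vectors can be queried directly; on the two toy sentences the results are not meaningful, but the calls illustrate the API:

vector = model.wv['cat']                 # the learned embedding vector for "cat"
neighbors = model.wv.most_similar('cat') # nearest words by cosine similarity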


Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • More comprehensive and production-ready toolkit for sequence modeling
  • Actively maintained by Facebook AI Research with frequent updates
  • Supports a wider range of models and tasks, including machine translation and speech recognition

Cons of fairseq

  • Steeper learning curve due to its complexity and extensive features
  • Requires more computational resources for training and inference
  • Less suitable for beginners or those looking for simple NLP implementations

Code Comparison

nlp-tutorial:

class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        # one convolution per filter size, concatenated before the classifier
        self.convs = nn.ModuleList([nn.Conv2d(1, num_filters, (size, embedding_size))
                                    for size in filter_sizes])
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

fairseq:

class TransformerModel(FairseqEncoderDecoderModel):
    def __init__(self, args, encoder, decoder):
        super().__init__(encoder, decoder)
        self.args = args
        self.supports_align_args = True

The nlp-tutorial repository provides simpler, more focused implementations of NLP models, making it ideal for learning and understanding core concepts. In contrast, fairseq offers a more sophisticated and flexible framework for building and training advanced sequence models, but with increased complexity.


Models and examples built with TensorFlow

Pros of models

  • Comprehensive collection of official TensorFlow models and examples
  • Well-maintained with regular updates and contributions from the TensorFlow team
  • Extensive documentation and integration with TensorFlow ecosystem

Cons of models

  • Large and complex repository, potentially overwhelming for beginners
  • Focuses primarily on TensorFlow, limiting flexibility for other frameworks
  • May include more advanced models that require significant computational resources

Code Comparison

nlp-tutorial:

class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()
        # one convolution per filter size, concatenated before the classifier
        self.convs = nn.ModuleList([nn.Conv2d(1, num_filters, (size, embedding_size))
                                    for size in filter_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

models:

class TextCNN(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, num_filters, filter_sizes, num_classes):
        super(TextCNN, self).__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.conv_layers = [tf.keras.layers.Conv1D(num_filters, fs, activation='relu') for fs in filter_sizes]
        self.dropout = tf.keras.layers.Dropout(0.5)
        self.fc = tf.keras.layers.Dense(num_classes, activation='softmax')

The nlp-tutorial example uses PyTorch, while models uses TensorFlow/Keras. The models implementation is more verbose but offers greater flexibility in defining model architecture.


README

nlp-tutorial

nlp-tutorial is a tutorial for those who are studying NLP (Natural Language Processing) using PyTorch. Most of the NLP models are implemented in less than 100 lines of code (excluding comments and blank lines).

  • [08-14-2020] Old TensorFlow v1 code is archived in the archive folder. To keep the code readable for beginners, only PyTorch version 1.0 or higher is supported.

Curriculum - (Example Purpose)

1. Basic Embedding Model

2. CNN(Convolutional Neural Network)

3. RNN(Recurrent Neural Network)

4. Attention Mechanism

5. Model based on Transformer

Dependencies

  • Python 3.5+
  • PyTorch 1.0.0+

Author