DeepMoji

State-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc.

1,551

318

Top Related Projects

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

ParlAI

10,607

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

vaderSentiment

4,778

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

models

77,618

Models and examples built with TensorFlow

spaCy

31,840

💫 Industrial-strength Natural Language Processing (NLP) in Python

flair

14,239

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Quick Overview

DeepMoji is a deep learning model for understanding how language is used to express emotions. It was trained on 1.2 billion tweets containing emojis to predict emoji usage, and through transfer learning it can be fine-tuned for sentiment analysis, emotion classification, and sarcasm detection. The project includes a pre-trained model and tools for using DeepMoji in natural language processing tasks.

Pros

Powerful emotion and sentiment analysis capabilities
Pre-trained models available for immediate use
Can be fine-tuned for specific tasks or domains
Includes visualization tools for emoji predictions

Cons

Primarily focused on English language text
May not perform as well on formal or non-social media text
Requires deep learning expertise for advanced usage or modifications
Limited documentation for some advanced features

Code Examples

Predicting emojis for a given text:

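import json

from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30  # sentences are padded or truncated to this many tokens

# SentenceTokenizer expects the vocabulary as a dict, not a file path.
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
st = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

text = u"I love machine learning!"  # unicode literal (the code targets Python 2.7)
tokenized, _, _ = st.tokenize_sentences([text])
prob = model.predict(tokenized)

# Each index identifies one of the model's 64 emoji classes.
print("Top 5 emoji classes for '{}':".format(text))
for i in prob[0].argsort()[-5:][::-1]:
    print("{} - {:.4f}".format(i, prob[0][i]))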
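The expression `prob[0].argsort()[-5:][::-1]` in the snippet above picks the indices of the five largest probabilities, highest first. A minimal pure-Python sketch of the same selection, assuming a plain list of probabilities; the helper name `top_k_indices` and the sample values are hypothetical and not part of DeepMoji:

```python
def top_k_indices(probabilities, k=5):
    """Return the indices of the k largest values, highest first."""
    # Rank all indices by their probability in descending order, then keep the first k.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i],
                    reverse=True)
    return ranked[:k]


# Toy distribution over six classes (made-up numbers, not model output).
sample = [0.05, 0.40, 0.10, 0.25, 0.15, 0.05]
for index in top_k_indices(sample, k=3):
    print("{} - {:.4f}".format(index, sample[index]))
```

With real model output, `sample` would be replaced by one row of the array returned by `model.predict`.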

Extracting features from text:

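import json

from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_feature_encoding
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
st = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_feature_encoding(maxlen, PRETRAINED_PATH)

text = u"This is amazing!"
tokenized, _, _ = st.tokenize_sentences([text])
features = model.predict(tokenized)  # one 2304-dimensional vector per sentence

print("Feature vector for '{}':".format(text))
print(features[0][:10])  # Print first 10 features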
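One possible use of such feature vectors is comparing how similar two texts are. As an illustration only, the sketch below computes cosine similarity in plain Python; the function name `cosine_similarity` and the short toy vectors are assumptions standing in for real encodings, which would be much longer:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors (made up); real encodings would come from model.predict.
happy = [0.9, 0.1, 0.3]
joyful = [0.8, 0.2, 0.4]
angry = [-0.7, 0.9, -0.2]
print(cosine_similarity(happy, joyful))
print(cosine_similarity(happy, angry))
```

Cosine similarity ignores vector length and compares direction only, which is one common choice for comparing embeddings; other distance measures would work as well.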

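Fine-tuning the model on the SE0714 benchmark (emotion classification):

import json
from deepmoji.finetuning import load_benchmark
from deepmoji.class_avg_finetuning import class_avg_finetune
from deepmoji.model_def import deepmoji_transfer
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

DATASET_PATH = 'data/SE0714/raw.pickle'  # relative to the repository root
with open(VOCAB_PATH, 'r') as f:
    vocab = json.load(f)
data = load_benchmark(DATASET_PATH, vocab)

# With class-average F1 the model is trained once per class, so it is defined with two outputs.
model = deepmoji_transfer(2, data['maxlen'], PRETRAINED_PATH)
model, f1 = class_avg_finetune(model, data['texts'], data['labels'], 3, data['batch_size'], method='last')

print("Fine-tuned model F1 score: {:.4f}".format(f1))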
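The `class_avg_finetune` call above reports an F1 score. As a rough sketch of what a class-average F1 could look like, assuming it means relabelling the data as one-vs-rest for each class, computing a binary F1 per class, and averaging the results (the function names and toy labels below are hypothetical and are not DeepMoji's implementation):

```python
def binary_f1(y_true, y_pred):
    """F1 score for boolean labels, where True marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both zero
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    return 2 * precision * recall / (precision + recall)


def class_average_f1(y_true, y_pred, nb_classes):
    """Average of one-vs-rest F1 scores over all classes."""
    scores = []
    for c in range(nb_classes):
        # Relabel as "belongs to class c" versus "does not".
        scores.append(binary_f1([t == c for t in y_true],
                                [p == c for p in y_pred]))
    return sum(scores) / float(nb_classes)


# Toy labels for three classes (made up for illustration).
true_labels = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
print("{:.4f}".format(class_average_f1(true_labels, predicted, 3)))
```

Averaging per-class scores gives every class equal weight regardless of how many examples it has, which matters for imbalanced benchmark datasets.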

Getting Started

Install DeepMoji from a clone of the repository:
```
git clone https://github.com/bfelbo/DeepMoji.git
cd DeepMoji
pip install -e .
```

Download the pre-trained weights (~85MB) into the model/ directory with the script included in the repository (the vocabulary file already ships in model/):

python scripts/download_weights.py

Use DeepMoji in your project:

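import json

from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30  # sentences are padded or truncated to this many tokens
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
st = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

# Your code here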

Competitor Comparisons

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Pros of transformers

Broader scope: Supports a wide range of NLP tasks and models
Active development: Regularly updated with new models and features
Extensive documentation and community support

Cons of transformers

Higher complexity: Steeper learning curve for beginners
Resource-intensive: Requires more computational power for many models

Code Comparison

DeepMoji:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

transformers:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

Summary

Transformers offers a more comprehensive NLP toolkit with broader applications, while DeepMoji focuses on emoji prediction and emotion-related text tasks. Transformers benefits from active development and extensive documentation but may be more complex and resource-intensive. DeepMoji provides a simpler, more focused solution for those tasks.

ParlAI

10,607

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

Pros of ParlAI

Broader scope: ParlAI is a comprehensive platform for dialogue AI research, supporting various tasks and models
Active development: Regularly updated with new features and improvements
Extensive documentation and examples: Provides thorough guides and tutorials for users

Cons of ParlAI

Steeper learning curve: More complex to set up and use due to its extensive features
Resource-intensive: Requires more computational resources for training and running models

Code Comparison

DeepMoji:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

ParlAI:

from parlai.core.params import ParlaiParser
from parlai.core.agents import create_agent
from parlai.core.worlds import create_task

parser = ParlaiParser(True, True)
opt = parser.parse_args()
agent = create_agent(opt)
world = create_task(opt, agent)

vaderSentiment

4,778

Pros of VADER Sentiment

Lightweight and easy to use, requiring no training or external data
Designed specifically for social media text, handling slang and emoticons well
Fast execution, suitable for real-time analysis

Cons of VADER Sentiment

Limited to English language text
May struggle with complex or nuanced sentiments
Less accurate on formal or domain-specific text compared to deep learning models

Code Comparison

VADER Sentiment:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
sentiment = analyzer.polarity_scores("Hello world!")
print(sentiment)

DeepMoji:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)
tokens, _, _ = tokenizer.tokenize_sentences([u"Hello world!"])
prob = model.predict(tokens)

DeepMoji offers more sophisticated sentiment analysis using deep learning, potentially providing more accurate results for complex texts. However, it requires more setup, computational resources, and may be slower for real-time applications compared to VADER Sentiment's simplicity and speed.
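VADER's `polarity_scores` returns a dictionary that includes a `compound` score between -1 and 1. A small sketch of turning that score into a label, assuming the commonly cited cut-offs of ±0.05 from the VADER documentation; the helper name `label_from_compound` and the sample dictionary are illustrative and not part of either library:

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER-style score dict to 'positive', 'negative', or 'neutral'."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hand-written stand-in for the dict returned by analyzer.polarity_scores(...).
example_scores = {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.65}
print(label_from_compound(example_scores))
```

DeepMoji's output has no direct equivalent of this single score: it returns a probability for each emoji class, so a sentiment label requires fine-tuning or a mapping chosen by the user.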

models

77,618

Models and examples built with TensorFlow

Pros of TensorFlow Models

Extensive collection of pre-trained models and implementations
Regularly updated with new models and features
Backed by Google, ensuring long-term support and development

Cons of TensorFlow Models

Larger and more complex repository, potentially overwhelming for beginners
May require more computational resources due to its comprehensive nature

Code Comparison

DeepMoji:

from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

TensorFlow Models:

import tensorflow as tf
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
embeddings = model(["Hello, world!"])

Summary

DeepMoji focuses specifically on emoji prediction and sentiment analysis, while TensorFlow Models offers a broader range of pre-trained models and implementations. DeepMoji may be more suitable for projects specifically related to emoji prediction, while TensorFlow Models provides a more comprehensive toolkit for various machine learning tasks. The code examples demonstrate the different approaches: DeepMoji uses a custom model definition, while TensorFlow Models leverages TensorFlow Hub for easy model loading and usage.

spaCy

31,840

💫 Industrial-strength Natural Language Processing (NLP) in Python

Pros of spaCy

Comprehensive NLP library with a wide range of functionalities
Highly optimized for performance and efficiency
Active development and large community support

Cons of spaCy

Steeper learning curve due to its extensive features
Larger memory footprint compared to more specialized libraries

Code Comparison

spaCy:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
for token in doc:
    print(token.text, token.pos_, token.dep_)

DeepMoji:

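import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)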

Key Differences

spaCy is a general-purpose NLP library, while DeepMoji focuses on emoji prediction and sentiment analysis
spaCy offers more comprehensive language processing capabilities, including part-of-speech tagging and dependency parsing
DeepMoji is more specialized for understanding emotional content in text

Use Cases

Choose spaCy for broad NLP tasks and when performance is crucial
Opt for DeepMoji when working specifically with emoji prediction or sentiment analysis in social media contexts

flair

14,239

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Pros of Flair

Broader NLP capabilities: Flair offers a wide range of NLP tasks beyond sentiment analysis, including named entity recognition, part-of-speech tagging, and text classification.
Active development: Flair is regularly updated with new features and improvements, ensuring it stays current with the latest NLP advancements.
Extensive language support: Flair provides pre-trained models for numerous languages, making it more versatile for multilingual applications.

Cons of Flair

Higher complexity: Flair's broader scope may require a steeper learning curve compared to DeepMoji's focused approach to emoji prediction and sentiment analysis.
Resource intensity: Flair's more comprehensive models can be more resource-intensive, potentially requiring more computational power for training and inference.

Code Comparison

DeepMoji:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

Flair:

from flair.models import TextClassifier
from flair.data import Sentence

classifier = TextClassifier.load('en-sentiment')
sentence = Sentence('I love this movie!')
classifier.predict(sentence)

README

------ Update September 2023 ------

The online demo is no longer available as it's not possible for us to renew the certificate. The code in this repo still works, but you might have to make some changes for it to work in Python 3 (see the open PRs). You can also check out the PyTorch version of this algorithm called torchMoji made by HuggingFace.

DeepMoji

DeepMoji is a model trained on 1.2 billion tweets with emojis to understand how language is used to express emotions. Through transfer learning the model can obtain state-of-the-art performance on many emotion-related text modeling tasks.

See the paper or blog post for more details.

Overview

deepmoji/ contains all the underlying code needed to convert a dataset to our vocabulary and use our model.
examples/ contains short code snippets showing how to convert a dataset to our vocabulary, load up the model and run it on that dataset.
scripts/ contains code for processing and analysing datasets to reproduce results in the paper.
model/ contains the pretrained model and vocabulary.
data/ contains raw and processed datasets that we include in this repository for testing.
tests/ contains unit tests for the codebase.

To start out with, have a look inside the examples/ directory. See score_texts_emojis.py for how to use DeepMoji to extract emoji predictions, encode_texts.py for how to convert text into 2304-dimensional emotional feature vectors or finetune_youtube_last.py for how to use the model for transfer learning on a new dataset.

Please consider citing our paper if you use our model or code (see below for citation).

Frameworks

This code is based on Keras, which requires either Theano or TensorFlow as the backend. If you would rather use PyTorch, there is an implementation (torchMoji), kindly provided by Thomas Wolf.

Installation

We assume that you're using Python 2.7 with pip installed. As a backend you need to install either Theano (version 0.9+) or TensorFlow (version 1.3+). Once that's done, run the following inside the root directory to install the remaining dependencies:

pip install -e .

This will install the following dependencies:

Keras (the library was tested on version 2.0.5 but anything above 2.0.0 should work)
scikit-learn
h5py
text-unidecode
emoji

Ensure that Keras uses your chosen backend. The instructions are in the Keras backend documentation, under the Switching from one backend to another section.
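Besides editing the Keras configuration file, multi-backend Keras 2.x also reads the `KERAS_BACKEND` environment variable when it is imported. A minimal sketch, assuming TensorFlow is the backend you installed; the variable has to be set before `keras` is imported for the first time:

```python
import os

# Must run before the first `import keras`; Keras reads this variable on import.
os.environ["KERAS_BACKEND"] = "tensorflow"  # or "theano"

print(os.environ["KERAS_BACKEND"])
```

Setting the variable in the shell before launching Python has the same effect and avoids depending on import order inside the script.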

Run the included script, which downloads the pretrained DeepMoji weights (~85MB) and places them in the model/ directory:

python scripts/download_weights.py

Testing

To run the tests, install nose. After installing, navigate to the tests/ directory and run:

nosetests -v

By default, this will also run finetuning tests. These tests train the model for one epoch and then check the resulting accuracy, which may take several minutes to finish. If you'd prefer to exclude those, run the following instead:

nosetests -v -a '!slow'

Disclaimer

This code has been tested to work with Python 2.7 on an Ubuntu 16.04 machine. It has not been optimized for efficiency, but should be fast enough for most purposes. We do not give any guarantees that there are no bugs. Use the code at your own risk!

Contributions

We welcome pull requests if you feel like something could be improved. You can also greatly help us by telling us how you felt when writing your most recent tweets. Just click here to contribute.

License

This code and the pretrained model are licensed under the MIT license.

Benchmark datasets

The benchmark datasets are uploaded to this repository for convenience only. They were not released by us and we do not claim any rights to them. Use the datasets at your own responsibility and make sure you comply with the licenses they were released under. If you use any of the benchmark datasets, please consider citing the original authors.

Twitter dataset

We sadly cannot release our large Twitter dataset of tweets with emojis due to licensing restrictions.

Citation

@inproceedings{felbo2017,
  title={Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm},
  author={Felbo, Bjarke and Mislove, Alan and S{\o}gaard, Anders and Rahwan, Iyad and Lehmann, Sune},
  booktitle={Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2017}
}
