bfelbo/DeepMoji

State-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc.

Top Related Projects

  • Transformers: 🤗 State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
  • fastText: Library for fast text representation and classification
  • spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python
  • Gensim: Topic Modelling for Humans
  • TensorFlow: An Open Source Machine Learning Framework for Everyone
  • PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration

Quick Overview

DeepMoji is a deep learning model for understanding emoji usage and emotional content in text. It was trained on 1.2 billion tweets containing emojis and can predict emoji usage, analyze sentiment, and detect sarcasm. The project includes pre-trained models and tools for applying DeepMoji to various natural language processing tasks.

Pros

  • Powerful emotion and sentiment analysis capabilities
  • Pre-trained models available for immediate use
  • Can be fine-tuned for specific tasks or domains
  • Includes visualization tools for emoji predictions

Cons

  • May have biases based on the Twitter dataset used for training
  • Limited to English language text
  • Requires deep learning expertise for advanced usage or modifications
  • Emoji predictions may not always align with human interpretations

Code Examples

  1. Predicting emojis for a given text:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)

st = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

text = u"I love machine learning!"
tokenized, _, _ = st.tokenize_sentences([text])
prob = model.predict(tokenized)

# Print the indices of the five most probable of the 64 emoji classes;
# emoji_overview.png in the repository maps indices to emojis
print("Top 5 emoji indices for '{}':".format(text))
for i in prob[0].argsort()[-5:][::-1]:
    print("index {}: probability {:.4f}".format(i, prob[0][i]))
  2. Extracting 2304-dimensional feature vectors from text:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_feature_encoding
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
st = SentenceTokenizer(vocabulary, maxlen)

model = deepmoji_feature_encoding(maxlen, PRETRAINED_PATH)
text = u"This project is amazing!"
tokenized, _, _ = st.tokenize_sentences([text])
encoding = model.predict(tokenized)  # shape (1, 2304)

print("Feature encoding for '{}':".format(text))
print(encoding)
  3. Fine-tuning the model for sentiment analysis on a benchmark dataset:

import json
from deepmoji.finetuning import load_benchmark
from deepmoji.class_avg_finetuning import class_avg_finetune
from deepmoji.model_def import deepmoji_transfer
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

nb_classes = 2  # the SS-Twitter benchmark has two sentiment classes
with open(VOCAB_PATH, 'r') as f:
    vocab = json.load(f)
data = load_benchmark('data/SS-Twitter/raw.pickle', vocab)

model = deepmoji_transfer(nb_classes, data['maxlen'], PRETRAINED_PATH)
model, f1 = class_avg_finetune(model, data['texts'], data['labels'],
                               nb_classes, data['batch_size'], method='last')
print("Fine-tuned model F1 score: {:.4f}".format(f1))

Getting Started

  1. Install DeepMoji (it is not published on PyPI, so install from the cloned repository):

git clone https://github.com/bfelbo/DeepMoji.git
cd DeepMoji
pip install -e .
  2. Download the pretrained weights (~85MB) into the model/ directory using the included script (the vocabulary file is already bundled with the repository):

python scripts/download_weights.py
  3. Use DeepMoji in your project:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)

st = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

# Your code here

Competitor Comparisons

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Pros of transformers

  • Broader scope: Supports a wide range of NLP tasks and models
  • Active development: Regularly updated with new models and features
  • Extensive documentation and community support

Cons of transformers

  • Higher complexity: Steeper learning curve for beginners
  • Resource-intensive: Requires more computational power for many models

Code comparison

DeepMoji:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

text = u"I love this!"
tokens, _, _ = tokenizer.tokenize_sentences([text])
prob = model.predict(tokens)

transformers:

from transformers import pipeline

text = "I love this!"
classifier = pipeline("sentiment-analysis")
result = classifier(text)[0]
label = result['label']
score = result['score']

Summary

Transformers offers a more versatile and actively maintained toolkit for a wide range of NLP tasks, while DeepMoji focuses specifically on emoji prediction and sentiment analysis. The Transformers pipeline API is easy to use out of the box for many tasks, but its models often demand more computational resources. DeepMoji offers a more specialized approach to emoji-related tasks with potentially lower resource requirements.

fastText: Library for fast text representation and classification

Pros of fastText

  • Efficient for large-scale text classification and word representation learning
  • Supports multiple languages and can handle out-of-vocabulary words
  • Lightweight and fast, suitable for production environments

Cons of fastText

  • Less specialized for emotion analysis compared to DeepMoji
  • May require more data preprocessing for specific tasks like sentiment analysis
  • Limited in capturing complex contextual information

Code Comparison

DeepMoji:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

fastText:

import fasttext

# train.txt contains one labeled example per line, e.g. "__label__positive I love it"
model = fasttext.train_supervised("train.txt")
result = model.predict("example text")

DeepMoji focuses on emoji prediction and sentiment analysis, while fastText is more versatile for general text classification tasks. DeepMoji requires specific pretrained models and tokenizers, whereas fastText has a simpler API for training and prediction. fastText is more suitable for large-scale applications and multi-language support, while DeepMoji excels in emotion-related tasks with its specialized architecture.

spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python

Pros of spaCy

  • Comprehensive NLP library with a wide range of functionalities
  • Highly optimized for performance and efficiency
  • Large community support and regular updates

Cons of spaCy

  • Steeper learning curve due to its extensive features
  • Larger memory footprint compared to more specialized libraries

Code Comparison

spaCy:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
for token in doc:
    print(token.text, token.pos_, token.dep_)

DeepMoji:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

Key Differences

  • spaCy is a general-purpose NLP library, while DeepMoji focuses on emoji prediction and sentiment analysis
  • spaCy offers more comprehensive language processing capabilities, whereas DeepMoji specializes in understanding emotional content
  • spaCy has a larger user base and more frequent updates, while DeepMoji is more specialized but less actively maintained

Use Cases

  • Choose spaCy for general NLP tasks, including tokenization, part-of-speech tagging, and named entity recognition
  • Opt for DeepMoji when working specifically with emoji prediction or sentiment analysis in social media contexts

Gensim: Topic Modelling for Humans

Pros of Gensim

  • Broader scope for general-purpose text processing and topic modeling
  • More extensive documentation and community support
  • Actively maintained with frequent updates

Cons of Gensim

  • Less specialized for emoji-related tasks
  • May require more setup and configuration for specific use cases
  • Potentially steeper learning curve for beginners

Code Comparison

DeepMoji example:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

tokens, _, _ = tokenizer.tokenize_sentences([u"Hello world!"])
prob = model.predict(tokens)

Gensim example:

from gensim.models import Word2Vec

# Train word vectors on a small corpus of tokenized sentences
sentences = [["Hello", "world"], ["Another", "example"]]
model = Word2Vec(sentences, min_count=1)

vector = model.wv["Hello"]  # the learned embedding for "Hello"
similar_words = model.wv.most_similar("Hello")

Both libraries offer powerful text processing capabilities, but DeepMoji focuses on emoji prediction and sentiment analysis, while Gensim provides a broader range of natural language processing tools. The choice between them depends on the specific requirements of your project.

TensorFlow: An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • Comprehensive machine learning framework with broad capabilities
  • Large community and extensive documentation
  • Supports multiple programming languages and platforms

Cons of TensorFlow

  • Steeper learning curve for beginners
  • Can be resource-intensive for smaller projects
  • More complex setup and configuration

Code Comparison

DeepMoji:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

TensorFlow:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

DeepMoji is focused on emoji prediction and sentiment analysis, while TensorFlow is a general-purpose machine learning framework. DeepMoji offers a simpler API for its specific use case, whereas TensorFlow provides more flexibility but requires more setup. TensorFlow's code example demonstrates a basic neural network model, while DeepMoji's code shows how to load a pre-trained model for emoji prediction.

PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • Broader scope and functionality as a general-purpose deep learning framework
  • Larger community and more extensive documentation
  • More frequent updates and active development

Cons of PyTorch

  • Steeper learning curve for beginners
  • Larger codebase and installation size
  • Not specifically optimized for emoji prediction or sentiment analysis

Code Comparison

DeepMoji:

import json
from deepmoji.sentence_tokenizer import SentenceTokenizer
from deepmoji.model_def import deepmoji_emojis
from deepmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH

maxlen = 30
with open(VOCAB_PATH, 'r') as f:
    vocabulary = json.load(f)
tokenizer = SentenceTokenizer(vocabulary, maxlen)
model = deepmoji_emojis(maxlen, PRETRAINED_PATH)

PyTorch:

import torch.nn as nn
import torch.optim as optim

input_size, hidden_size, output_size = 64, 128, 10

model = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, output_size)
)
optimizer = optim.Adam(model.parameters(), lr=0.001)

DeepMoji is specifically designed for emoji prediction and sentiment analysis, while PyTorch is a more versatile deep learning framework. DeepMoji provides pre-trained models and a simpler API for its specific use case, whereas PyTorch offers greater flexibility for building custom neural network architectures across various domains.

README

------ Update September 2023 ------

The online demo is no longer available as it's not possible for us to renew the certificate. The code in this repo still works, but you might have to make some changes for it to work in Python 3 (see the open PRs). You can also check out the PyTorch version of this algorithm called torchMoji made by HuggingFace.

DeepMoji

(A video demonstration of DeepMoji is available on YouTube; the original README links it via an image.)

DeepMoji is a model trained on 1.2 billion tweets with emojis to understand how language is used to express emotions. Through transfer learning the model can obtain state-of-the-art performance on many emotion-related text modeling tasks.

See the paper or blog post for more details.

Overview

  • deepmoji/ contains all the underlying code needed to convert a dataset to our vocabulary and use our model.
  • examples/ contains short code snippets showing how to convert a dataset to our vocabulary, load up the model and run it on that dataset.
  • scripts/ contains code for processing and analysing datasets to reproduce results in the paper.
  • model/ contains the pretrained model and vocabulary.
  • data/ contains raw and processed datasets that we include in this repository for testing.
  • tests/ contains unit tests for the codebase.

To start out with, have a look inside the examples/ directory. See score_texts_emojis.py for how to use DeepMoji to extract emoji predictions, encode_texts.py for how to convert text into 2304-dimensional emotional feature vectors, or finetune_youtube_last.py for how to use the model for transfer learning on a new dataset.
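
For instance, score_texts_emojis.py ranks the model's 64 emoji output probabilities with a small top-k helper. The following is a sketch of that pattern from memory; check the script for the exact code:

import numpy as np

def top_elements(array, k):
    # Indices of the k largest entries, most probable first
    ind = np.argpartition(array, -k)[-k:]
    return ind[np.argsort(array[ind])][::-1]

# e.g. top_elements(prob[0], 5) on the output of model.predict(...)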

Please consider citing our paper if you use our model or code (see below for citation).

Frameworks

This code is based on Keras, which requires either Theano or TensorFlow as the backend. If you would rather use PyTorch, there's an implementation available here, which has kindly been provided by Thomas Wolf.

Installation

We assume that you're using Python 2.7 with pip installed. As a backend you need to install either Theano (version 0.9+) or TensorFlow (version 1.3+). Once that's done, run the following inside the root directory to install the remaining dependencies:

pip install -e .

This will install the dependencies listed in setup.py.

Ensure that Keras uses your chosen backend. Instructions are in the Keras backend documentation, under the "Switching from one backend to another" section.
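
Alternatively, Keras reads the KERAS_BACKEND environment variable, which can be set from Python before the first Keras import; a minimal sketch:

import os

# Must be set before Keras is imported anywhere in the process
os.environ['KERAS_BACKEND'] = 'theano'  # or 'tensorflow'

import keras  # prints e.g. "Using Theano backend." on import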

Run the included script, which downloads the pretrained DeepMoji weights (~85MB) from here and places them in the model/ directory:

python scripts/download_weights.py

Testing

To run the tests, install nose. After installing, navigate to the tests/ directory and run:

nosetests -v

By default, this will also run finetuning tests. These tests train the model for one epoch and then check the resulting accuracy, which may take several minutes to finish. If you'd prefer to exclude those, run the following instead:

nosetests -v -a '!slow'
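
The exclusion works through nose's attrib plugin: the finetuning tests carry a 'slow' attribute. A test tagged that way looks roughly like the sketch below (the test name is illustrative, not the actual one in tests/):

from nose.plugins.attrib import attr

@attr('slow')
def test_finetuning_one_epoch():
    # Trains the model for one epoch, so this takes several minutes
    pass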

Disclaimer

This code has been tested with Python 2.7 on an Ubuntu 16.04 machine. It has not been optimized for efficiency, but should be fast enough for most purposes. We do not give any guarantees that there are no bugs; use the code at your own risk!

Contributions

We welcome pull requests if you feel like something could be improved. You can also greatly help us by telling us how you felt when writing your most recent tweets. Just click here to contribute.

License

This code and the pretrained model are licensed under the MIT license.

Benchmark datasets

The benchmark datasets are uploaded to this repository for convenience purposes only. They were not released by us and we do not claim any rights to them. Use the datasets at your own risk and make sure you fulfill the licenses that they were released with. If you use any of the benchmark datasets, please consider citing the original authors.

Twitter dataset

We sadly cannot release our large Twitter dataset of tweets with emojis due to licensing restrictions.

Citation

@inproceedings{felbo2017,
  title={Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm},
  author={Felbo, Bjarke and Mislove, Alan and S{\o}gaard, Anders and Rahwan, Iyad and Lehmann, Sune},
  booktitle={Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2017}
}