unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Top Related Projects

transformers (146,142 stars): 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

bert (39,267 stars): TensorFlow code and pre-trained models for BERT

gpt-3 (15,769 stars): GPT-3: Language Models are Few-Shot Learners

fairseq (31,682 stars): Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

allennlp (11,862 stars): An open-source NLP research library, built on PyTorch.

Quick Overview

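UniLM (Unified Language Model) is a Microsoft Research project for unified pre-training for language understanding and generation: a single pre-trained Transformer can be fine-tuned both for understanding tasks (such as classification) and for generation tasks (such as summarization or question generation). The repository has since grown into a home for a broader family of large-scale self-supervised models across tasks, languages, and modalities (for example InfoXLM, DeltaLM for translation, LayoutLM, BEiT, and WavLM), which makes it a versatile reference for researchers and developers working on NLP and multimodal applications.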
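The "unified" part of UniLM comes, per the UniLM paper, from pre-training one shared Transformer with bidirectional, unidirectional (left-to-right), and sequence-to-sequence language-modeling objectives, switching between them only by changing the self-attention mask. The sketch below builds those three masks as 0/1 matrices (1 means "token i may attend to token j"); it illustrates the idea and is not code from the repository, and the function names are hypothetical. In the paper the mask is added to the attention scores, with blocked positions set to minus infinity.

```python
from typing import List

Mask = List[List[int]]  # mask[i][j] == 1 means token i may attend to token j


def bidirectional_mask(n: int) -> Mask:
    """Bidirectional LM (BERT-style): every token sees every token."""
    return [[1] * n for _ in range(n)]


def left_to_right_mask(n: int) -> Mask:
    """Unidirectional LM: token i sees tokens 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def seq2seq_mask(src_len: int, tgt_len: int) -> Mask:
    """Sequence-to-sequence LM over the concatenation [source; target].

    Source tokens attend to the whole source (both directions) but not to
    the target; target tokens attend to the whole source plus the target
    tokens up to and including themselves.
    """
    n = src_len + tgt_len
    mask: Mask = []
    for i in range(n):
        row = []
        for j in range(n):
            if j < src_len:
                row.append(1)  # the source segment is visible to every token
            elif i >= src_len and j <= i:
                row.append(1)  # causal attention inside the target segment
            else:
                row.append(0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in seq2seq_mask(src_len=2, tgt_len=3):
        print(row)
```

With an empty source the seq2seq mask reduces to the left-to-right mask, and with an empty target it reduces to the bidirectional one, which is why one set of weights can serve all three objectives.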

Pros

Versatile: Supports multiple NLP tasks with a single pre-trained model
Strong performance: several models in the repository topped public leaderboards at release (for example, XLM-E/InfoXLM on XTREME and WavLM Large on SUPERB)
Extensible: Can be fine-tuned for specific tasks or domains
Active development: Regularly updated with new models and improvements

Cons

Resource-intensive: Requires significant computational resources for training and inference
Complex architecture: May be challenging for beginners to understand and implement
Limited documentation: Some aspects of the project may lack detailed explanations
Dependency on specific frameworks: Primarily built on PyTorch, which may limit integration options

Code Examples

Loading a pre-trained UniLM model:

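# Illustrative only: Hugging Face transformers does not ship UniLM-specific
# classes, so UniLMForConditionalGeneration and UniLMTokenizer are placeholders.
# Sequence-to-sequence fine-tuning of UniLM checkpoints is provided by the
# repository's s2s-ft toolkit.
from transformers import UniLMForConditionalGeneration, UniLMTokenizer

model = UniLMForConditionalGeneration.from_pretrained("microsoft/unilm-base-cased")
tokenizer = UniLMTokenizer.from_pretrained("microsoft/unilm-base-cased")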

Generating text using UniLM:

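# "Translate English to French:" is a T5-style task prefix; a UniLM checkpoint
# would first need fine-tuning on the target task.
input_text = "Translate English to French: Hello, how are you?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate at most 50 tokens and return a single sequence.
outputs = model.generate(input_ids, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)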

Fine-tuning UniLM for a specific task:

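from transformers import Trainer, TrainingArguments

# Assumes `model` from the loading example plus tokenized `train_dataset` and
# `eval_dataset` objects prepared beforehand (placeholders, not defined here).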

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
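`warmup_steps=500` only has meaning together with a learning-rate schedule. Assuming the `Trainer` default (`lr_scheduler_type="linear"`), the learning rate rises linearly from 0 to `learning_rate` over the warmup steps and then decays linearly to 0 by the last training step. The function below reproduces that shape for illustration; its name and the `total_steps` value are hypothetical (in practice the total follows from dataset size, batch size, and `num_train_epochs`), and 5e-5 is used because it is the default `learning_rate` of `TrainingArguments`.

```python
def linear_warmup_then_decay(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear ramp 0 -> base_lr, then linear decay base_lr -> 0."""
    if step < warmup_steps:
        # Warmup phase: grow in proportion to the steps taken so far.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: shrink in proportion to the steps remaining after warmup.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    base_lr, warmup, total = 5e-5, 500, 3000  # total is a hypothetical step count
    for step in (0, 250, 500, 1750, 3000):
        print(step, linear_warmup_then_decay(step, base_lr, warmup, total))
```

Warmup keeps early updates small while the newly initialized task head and optimizer statistics settle, which is why it is commonly paired with fine-tuning large pre-trained models.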

Getting Started

To get started with UniLM, follow these steps:

Install the required dependencies:

pip install torch transformers

Load a pre-trained UniLM model:

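# Illustrative placeholders: transformers has no UniLM-specific classes
# (UniLM fine-tuning is provided by the repository's s2s-ft toolkit).
from transformers import UniLMForConditionalGeneration, UniLMTokenizer

model = UniLMForConditionalGeneration.from_pretrained("microsoft/unilm-base-cased")
tokenizer = UniLMTokenizer.from_pretrained("microsoft/unilm-base-cased")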

Fine-tune the model for your target task (for example, summarization or question generation), then use it for inference as shown in the code examples above.
For more advanced usage and fine-tuning, refer to the project's documentation and examples on the GitHub repository.

Competitor Comparisons

transformers

Pros of Transformers

Broader model support: Includes a wide range of transformer-based models from various sources
Extensive documentation and community support
Easier integration with popular deep learning frameworks like PyTorch and TensorFlow

Cons of Transformers

Larger codebase and potentially steeper learning curve for beginners
May have slower inference times for some models compared to UniLM's optimized implementations

Code Comparison

UniLM:

# Hypothetical `unilm` package and class names, shown for comparison only.
from unilm import UniLMTokenizer, UniLMForConditionalGeneration

tokenizer = UniLMTokenizer.from_pretrained("microsoft/unilm-base-cased")
model = UniLMForConditionalGeneration.from_pretrained("microsoft/unilm-base-cased")

Transformers:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

Both projects follow the same from_pretrained pattern for loading models and tokenizers. UniLM focuses specifically on unified language models, while Transformers offers a more diverse range of models and tasks. Transformers' Auto classes generalize the pattern: AutoTokenizer and AutoModelForSeq2SeqLM choose the concrete classes from the checkpoint's configuration, so the same two lines can load T5, BART, or other supported encoder-decoder models.
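The Auto classes pick a concrete model class from the checkpoint's configuration (the `model_type` field in `config.json`). The toy registry below illustrates only that dispatch idea; the class names and `resolve_seq2seq_class` are hypothetical, and the real Transformers mappings are far more extensive.

```python
from typing import Dict, Type


class T5LikeModel:
    """Hypothetical stand-in for a concrete encoder-decoder model class."""


class BartLikeModel:
    """Hypothetical stand-in for another encoder-decoder model class."""


# Toy registry keyed by the "model_type" field of a checkpoint's config.json.
SEQ2SEQ_REGISTRY: Dict[str, Type] = {"t5": T5LikeModel, "bart": BartLikeModel}


def resolve_seq2seq_class(config: dict) -> Type:
    """Return the class registered for the config's model_type."""
    model_type = config.get("model_type")
    if model_type not in SEQ2SEQ_REGISTRY:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")
    return SEQ2SEQ_REGISTRY[model_type]


if __name__ == "__main__":
    config = {"model_type": "t5"}  # e.g. parsed from a checkpoint's config.json
    print(resolve_seq2seq_class(config).__name__)
```

Keeping the mapping in data rather than in caller code is what lets one loading call cover many architectures.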

bert

Pros of BERT

Widely adopted and well-established in the NLP community
Extensive documentation and pre-trained models available
Strong performance on various NLP tasks, especially for understanding context

Cons of BERT

Limited to text understanding tasks, not designed for generation
Requires fine-tuning for specific tasks, which can be resource-intensive
Fixed maximum input length (512 tokens for the released checkpoints), so longer sequences must be truncated or split
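Because the released BERT checkpoints accept at most 512 tokens (including [CLS] and [SEP]), a common workaround is to split longer inputs into overlapping windows and process each one separately. The sketch below shows only the splitting; the function name and default sizes are assumptions (510 leaves room for the two special tokens). Hugging Face fast tokenizers offer a similar built-in option through the `stride` and `return_overflowing_tokens` arguments.

```python
from typing import List


def sliding_windows(token_ids: List[int], max_len: int = 510, stride: int = 128) -> List[List[int]]:
    """Split token ids into windows of at most max_len tokens.

    Consecutive windows share `stride` tokens, so text near a window
    boundary still appears with context in at least one window.
    """
    if max_len <= stride:
        raise ValueError("max_len must be larger than stride")
    if not token_ids:
        return []
    step = max_len - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the sequence
        start += step
    return windows


if __name__ == "__main__":
    ids = list(range(1200))  # stand-in for a long tokenized document
    chunks = sliding_windows(ids)
    print([(c[0], c[-1]) for c in chunks])
```

Per-window predictions then have to be merged (for example, by keeping the window where a token sits farthest from an edge), which is task-specific and not shown.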

Code Comparison

BERT example:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

UniLM example:

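# Hypothetical `unilm` package and class names; "summarize: " is a T5-style prefix.
from unilm import UniLMTokenizer, UniLMForConditionalGeneration
tokenizer = UniLMTokenizer.from_pretrained('microsoft/unilm-base-cased')
model = UniLMForConditionalGeneration.from_pretrained('microsoft/unilm-base-cased')
text = "Long input document goes here."  # placeholder input
inputs = tokenizer("summarize: " + text, return_tensors="pt")
summary_ids = model.generate(inputs['input_ids'])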

UniLM covers both understanding and generation: per the UniLM paper, one shared Transformer is pre-trained with bidirectional, unidirectional, and sequence-to-sequence objectives by switching self-attention masks, so the same checkpoint can be fine-tuned for either kind of task. BERT, by contrast, focuses primarily on understanding.

gpt-3

Pros of GPT-3

More powerful and versatile for general language tasks
Larger model with better performance on complex reasoning
Extensive API and documentation for easier integration

Cons of GPT-3

Closed-source and requires paid API access
Limited customization options for specific use cases
Very large model (175 billion parameters), so it cannot be self-hosted and inference is costly

Code Comparison

While a direct code comparison is not possible due to GPT-3's closed-source nature, we can compare usage examples:

UniLM:

# Hypothetical `unilm` package and class names, shown for comparison only.
from unilm import UniLMTokenizer, UniLMForConditionalGeneration

tokenizer = UniLMTokenizer.from_pretrained("microsoft/unilm-base-cased")
model = UniLMForConditionalGeneration.from_pretrained("microsoft/unilm-base-cased")

GPT-3 (using OpenAI API):

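# openai>=1.0; the older openai.Completion interface was removed in that release.
from openai import OpenAI

client = OpenAI(api_key="your-api-key")
# davinci-002 replaced the retired GPT-3 "davinci" base model on the completions endpoint.
response = client.completions.create(model="davinci-002", prompt="Hello, world!")
print(response.choices[0].text)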

Key Differences

UniLM is open-source and can be fine-tuned for specific tasks
GPT-3 offers a simpler hosted API but less flexibility
UniLM focuses on unified language modeling across various NLP tasks
GPT-3 excels in zero-shot and few-shot learning scenarios
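In the GPT-3 paper, few-shot learning means placing a handful of solved examples in the prompt (in-context learning) with no gradient updates, while zero-shot uses an instruction alone. The helper below sketches how such a prompt can be assembled; its name and the "Input:/Output:" layout are hypothetical choices, not part of the OpenAI library.

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Instruction, then solved input/output pairs, then the unsolved query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # The model is expected to continue the text after the final "Output:".
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Translate English to French.",
        [("sea otter", "loutre de mer"), ("cheese", "fromage")],
        "peppermint",
    )
    print(prompt)
```

Passing an empty example list yields a zero-shot prompt; the resulting string would be sent as the `prompt` of a completions request.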

fairseq

Pros of fairseq

More extensive documentation and examples
Broader range of supported tasks and architectures
Active community with frequent updates and contributions

Cons of fairseq

Steeper learning curve for beginners
Less focus on unified language models
Potentially more complex setup process

Code Comparison

fairseq:

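from fairseq.models.transformer import TransformerModel

# Loads a local checkpoint; translation checkpoints usually also need
# data_name_or_path and tokenizer/BPE arguments.
model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
tokens = model.encode('Hello world!')  # string -> tensor of token ids
output = model.decode(tokens)          # token ids -> string (a round trip, not translation)
# For a translation checkpoint, model.translate('Hello world!') runs generation.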

UniLM:

# Hypothetical `unilm` package and class names, shown for comparison only.
from unilm import UniLMTokenizer, UniLMForConditionalGeneration

tokenizer = UniLMTokenizer.from_pretrained('microsoft/unilm-base-cased')
model = UniLMForConditionalGeneration.from_pretrained('microsoft/unilm-base-cased')
input_ids = tokenizer.encode('Hello world!', return_tensors='pt')
outputs = model.generate(input_ids)

Both repositories offer powerful tools for natural language processing tasks. fairseq provides a wider range of models and tasks, making it suitable for diverse research projects. UniLM focuses on unified language models, offering a more streamlined approach for specific use cases. fairseq may be more challenging for beginners but offers greater flexibility, while UniLM provides a more straightforward implementation for unified language modeling tasks.

allennlp

Pros of AllenNLP

More comprehensive documentation and tutorials
Wider range of pre-built models and tasks
Stronger focus on research-oriented features

Cons of AllenNLP

Steeper learning curve for beginners
Less emphasis on multi-modal and cross-lingual tasks
In maintenance mode: AllenNLP no longer adds new features, whereas UniLM continues to release new models

Code Comparison

AllenNLP:

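from allennlp.predictors import Predictor
import allennlp_models.structured_prediction  # registers the SRL model/predictor (pip install allennlp-models)

# Semantic role labeling with a pre-trained BERT-based archive.
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
result = predictor.predict(sentence="Did Uriah honestly think he could beat the game in under three hours?")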

UniLM:

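# Hypothetical `unilm` package and class names; "summarize: " is a T5-style prefix.
from unilm import UniLMTokenizer, UniLMForConditionalGeneration

tokenizer = UniLMTokenizer.from_pretrained("microsoft/unilm-base-cased")
model = UniLMForConditionalGeneration.from_pretrained("microsoft/unilm-base-cased")
text = "Long input document goes here."  # placeholder input
input_ids = tokenizer.encode("summarize: " + text, return_tensors="pt")
# Beam search with 5 beams; no_repeat_ngram_size=2 blocks any repeated bigram.
summary_ids = model.generate(input_ids, max_length=50, num_beams=5, no_repeat_ngram_size=2)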

Both repositories offer powerful NLP tools. AllenNLP provides a more comprehensive framework for research and experimentation, while UniLM focuses on unified pre-training and multi-task learning. In the snippets above, AllenNLP's Predictor hides preprocessing and decoding behind a single predict() call, whereas the UniLM-style snippet exposes tokenization and generation settings (beam size, n-gram blocking) directly, which suits tasks like summarization where decoding needs tuning.
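The `no_repeat_ngram_size=2` argument in the UniLM-style generation call forbids the output from containing the same bigram twice. A minimal, library-independent sketch of the rule (the function name is hypothetical): look at the last n-1 generated tokens, find earlier positions where that same (n-1)-token prefix occurred, and ban whatever token followed it there at the next step.

```python
from typing import List, Set


def banned_next_tokens(generated: List[int], n: int) -> Set[int]:
    """Tokens that would repeat an n-gram already present in `generated`."""
    if n <= 0 or len(generated) < n - 1:
        return set()
    # The n-1 most recent tokens: the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])  # token that completed the earlier n-gram
    return banned


if __name__ == "__main__":
    # With n=2, the bigram (5, 7) already occurred, so 7 may not follow the trailing 5.
    print(banned_next_tokens([5, 7, 5], n=2))
```

During beam search, the banned tokens' scores would be set to minus infinity before choosing the next token for each beam.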

README

aka.ms/GeneralAI

Hiring

We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on Foundation Models (aka large-scale pre-trained models) and General AI, NLP, MT, Speech, Document AI and Multimodal AI, please send your resume to fuwei@microsoft.com.

Foundation Architecture

TorchScale - A Library of Foundation Architectures (repo)

Fundamental research to develop new architectures for foundation models and AI, focusing on modeling generality and capability, as well as training stability and efficiency.

Stability - DeepNet: scaling Transformers to 1,000 Layers and beyond

Generality - Foundation Transformers (Magneto): towards true general-purpose modeling across tasks and modalities (including language, vision, speech, and multimodal)

Capability - A Length-Extrapolatable Transformer

Efficiency & Transferability - X-MoE: scalable & finetunable sparse Mixture-of-Experts (MoE)
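DeepNet's stability result rests on DeepNorm. As described in the DeepNet paper, it keeps a Post-LN layout but up-weights the residual connection, computing LN(α·x + G(x)) instead of LN(x + G(x)), and scales down the initialization of some sub-layer weights by β. For an encoder-only or decoder-only model with N layers the paper sets α = (2N)^(1/4) and β = (8N)^(-1/4); encoder-decoder models use different constants. The sketch below computes those constants and applies the residual rule on plain Python lists; the helper names are hypothetical, the layer norm has no learned parameters, and the β-scaled initialization is not shown.

```python
import math
from typing import Callable, List, Tuple


def deepnorm_constants(num_layers: int) -> Tuple[float, float]:
    """(alpha, beta) for an encoder-only or decoder-only model with num_layers layers."""
    alpha = (2 * num_layers) ** 0.25  # up-weights the residual (skip) branch
    beta = (8 * num_layers) ** -0.25  # would scale down selected weight initializations
    return alpha, beta


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Plain layer normalization without learned gain or bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def deepnorm_residual(x: List[float], sublayer: Callable[[List[float]], List[float]], alpha: float) -> List[float]:
    """Post-LN residual block with DeepNorm: LN(alpha * x + sublayer(x))."""
    fx = sublayer(x)
    return layer_norm([alpha * a + b for a, b in zip(x, fx)])


if __name__ == "__main__":
    alpha, beta = deepnorm_constants(1000)
    print(f"N=1000: alpha={alpha:.3f}, beta={beta:.4f}")
    print(deepnorm_residual([1.0, 2.0, 3.0], lambda v: [0.1 * t for t in v], alpha))
```

Larger α makes each block's update small relative to its input, which bounds how much the model can change per step as depth grows.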

The Revolution of Model Architecture

BitNet: 1-bit Transformers for Large Language Models

RetNet: Retentive Network: A Successor to Transformer for Large Language Models

LongNet: Scaling Transformers to 1,000,000,000 Tokens

Foundation Models

The Evolution of (M)LLM (Multimodal LLM)

Kosmos-2.5: A Multimodal Literate Model

Kosmos-2: Grounding Multimodal Large Language Models to the World

Kosmos-1: A Multimodal Large Language Model (MLLM)

MetaLM: Language Models are General-Purpose Interfaces

The Big Convergence - Large-scale self-supervised pre-training across tasks (predictive and generative), languages (100+ languages), and modalities (language, image, audio, layout/format + language, vision + language, audio + language, etc.)

Language & Multilingual

UniLM: unified pre-training for language understanding and generation

InfoXLM/XLM-E: multilingual/cross-lingual pre-trained models for 100+ languages

DeltaLM/mT6: encoder-decoder pre-training for language generation and translation for 100+ languages

MiniLM: small and fast pre-trained models for language understanding and generation

AdaLM: domain, language, and task adaptation of pre-trained models

EdgeLM (NEW): small pre-trained models on edge/client devices

SimLM (NEW): large-scale pre-training for similarity matching

E5 (NEW): text embeddings

MiniLLM (NEW): Knowledge Distillation of Large Language Models
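E5, listed above for text embeddings, turns each input into one vector by pooling the encoder's token outputs. The E5 model cards on Hugging Face describe prefixing inputs with "query: " or "passage: ", averaging the last hidden states over non-padding tokens, and comparing vectors by cosine similarity. The sketch below shows only the pooling and scoring steps on toy numbers; no model is loaded, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (skips padding)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy per-token vectors standing in for encoder outputs of a "query: ..."
    # input and two "passage: ..." inputs; the last row of each is padding.
    query = mean_pool([[1.0, 0.0], [0.8, 0.2], [9.0, 9.0]], [1, 1, 0])
    passages = {
        "passage A": mean_pool([[0.9, 0.1], [1.0, 0.0], [5.0, 5.0]], [1, 1, 0]),
        "passage B": mean_pool([[0.0, 1.0], [0.1, 0.9], [5.0, 5.0]], [1, 1, 0]),
    }
    ranked = sorted(passages, key=lambda k: cosine_similarity(query, passages[k]), reverse=True)
    print(ranked)
```

Masking before averaging matters: without it, the large padding vectors would dominate the pooled embedding.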

Vision

BEiT/BEiT-2: generative self-supervised pre-training for vision / BERT Pre-Training of Image Transformers

DiT: self-supervised pre-training for Document Image Transformers

TextDiffuser/TextDiffuser-2 (NEW): Diffusion Models as Text Painters

Speech

WavLM: speech pre-training for full stack tasks

VALL-E: a neural codec language model for TTS

Multimodal (X + Language)

LayoutLM/LayoutLMv2/LayoutLMv3: multimodal (text + layout/format + image) Document Foundation Model for Document AI (e.g. scanned documents, PDF, etc.)

LayoutXLM: multimodal (text + layout/format + image) Document Foundation Model for multilingual Document AI

MarkupLM: markup language model pre-training for visually-rich document understanding

XDoc: unified pre-training for cross-format document understanding

UniSpeech: unified pre-training for self-supervised learning and supervised learning for ASR

UniSpeech-SAT: universal speech representation learning with speaker-aware pre-training

SpeechT5: encoder-decoder pre-training for spoken language processing

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

VLMo: Unified vision-language pre-training

VL-BEiT (NEW): Generative Vision-Language Pre-training - evolution of BEiT to multimodal

BEiT-3 (NEW): a general-purpose multimodal foundation model, and a major milestone of The Big Convergence of Large-scale Pre-training Across Tasks, Languages, and Modalities.
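The LayoutLM-family models listed above take a bounding box for each token alongside the text. The Hugging Face Transformers documentation for LayoutLM describes normalizing word boxes to a 0-1000 scale relative to the page size; the helper below is a minimal sketch of that step, with a hypothetical name, assuming boxes arrive as (x0, y0, x1, y1) pixel coordinates from an OCR engine.

```python
from typing import List, Sequence


def normalize_bbox(box: Sequence[float], page_width: float, page_height: float) -> List[int]:
    """Map a pixel box (x0, y0, x1, y1) onto the 0-1000 grid used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


if __name__ == "__main__":
    # A word box on a hypothetical 1000 x 2000 pixel page scan.
    print(normalize_bbox((100, 200, 300, 250), 1000, 2000))
```

Normalizing makes the layout coordinates independent of scan resolution, so the same 2-D position embeddings apply to every page.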

Toolkits

s2s-ft: sequence-to-sequence fine-tuning toolkit

Aggressive Decoding (NEW): lossless and efficient sequence-to-sequence decoding algorithm
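Aggressive Decoding is listed above as a lossless, efficient seq2seq decoding algorithm. In its input-guided form for tasks such as grammatical error correction, where the output largely copies the input, the input serves as a draft of the output: the model verifies the draft in parallel and keeps the longest prefix that agrees with its own greedy choices, plus its own token at the first disagreement, so the result is identical to greedy decoding. The sketch below is a simplified illustration under stated assumptions: a deterministic next-token function stands in for the model, the draft is re-aligned naively by position (the published method re-aligns more carefully after a mismatch), verification runs sequentially rather than in one forward pass, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

EOS = "</s>"
# Deterministic next-token function standing in for greedy argmax decoding:
# given the source and the output so far, it returns the next token (or EOS).
NextToken = Callable[[Sequence[str], Sequence[str]], str]


def greedy_decode(model: NextToken, source: Sequence[str], max_len: int = 50) -> List[str]:
    """Reference decoder: one model step per output token."""
    out: List[str] = []
    while len(out) < max_len:
        tok = model(source, out)
        if tok == EOS:
            break
        out.append(tok)
    return out


def aggressive_decode(model: NextToken, source: Sequence[str], max_len: int = 50) -> Tuple[List[str], int]:
    """Input-guided draft-and-verify decoding; returns (tokens, verification rounds)."""
    out: List[str] = []
    rounds = 0
    while len(out) < max_len:
        rounds += 1
        # Simplified draft: the remaining input tokens at the same offset.
        draft = list(source[len(out):])
        # Verify the draft token by token (one parallel pass in a real system).
        for tok in draft:
            if len(out) >= max_len:
                break
            pred = model(source, out)
            if pred != tok or pred == EOS:
                break
            out.append(tok)  # draft token equals the greedy choice: accept it
        if len(out) >= max_len:
            break
        # At the first mismatch (or end of draft), keep the model's own token.
        pred = model(source, out)
        if pred == EOS:
            break
        out.append(pred)
    return out, rounds


def toy_gec_model(source: Sequence[str], prefix: Sequence[str]) -> str:
    """Hypothetical correction model: copy the source, fixing 'teh' -> 'the'."""
    fixed = ["the" if w == "teh" else w for w in source]
    return fixed[len(prefix)] if len(prefix) < len(fixed) else EOS


if __name__ == "__main__":
    src = "I saw teh cat on teh mat".split()
    tokens, rounds = aggressive_decode(toy_gec_model, src)
    print(" ".join(tokens), "| rounds:", rounds)
```

On the toy input, the output matches greedy decoding token for token, while the seven output tokens are produced in three verification rounds; with parallel verification, fewer rounds means fewer sequential decoder passes.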

Applications

TrOCR: transformer-based OCR w/ pre-trained models

LayoutReader: pre-training of text and layout for reading order detection

XLM-T: multilingual NMT w/ pretrained cross-lingual encoders

News

December, 2024: RedStone was released!
December, 2023: LongNet and LongViT released
[Model Release] Dec, 2023: TextDiffuser-2 models, code and demo.
Sep, 2023: Kosmos-2.5 - a multimodal literate model for machine reading of text-intensive images.
[Model Release] May, 2023: TextDiffuser models and code.
[Model Release] March, 2023: BEiT-3 pretrained models and code.
March, 2023: Kosmos-1 - a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot).
January, 2023: VALL-E - a language modeling approach for text to speech synthesis (TTS), which achieves state-of-the-art zero-shot TTS performance. See https://aka.ms/valle for demos of our work.
[Model Release] January, 2023: E5 - Text Embeddings by Weakly-Supervised Contrastive Pre-training.
November, 2022: TorchScale 0.1.1 was released!
November, 2022: TrOCR was accepted by AAAI 2023.
[Model Release] November, 2022: XDoc BASE models for cross-format document understanding.
[Model Release] September, 2022: TrOCR BASE and LARGE models for Scene Text Recognition (STR).
[Model Release] September, 2022: BEiT v2 code and pretrained models.
August, 2022: BEiT-3 - a general-purpose multimodal foundation model, which achieves state-of-the-art transfer performance on both vision and vision-language tasks
July, 2022: SimLM - Large-scale self-supervised pre-training for similarity matching
June, 2022: DiT and LayoutLMv3 were accepted by ACM Multimedia 2022.
June, 2022: MetaLM - Language models are general-purpose interfaces to foundation models (language/multilingual, vision, speech, and multimodal)
June, 2022: VL-BEiT - bidirectional multimodal Transformer learned from scratch with one unified pretraining task, one shared backbone, and one-stage training, supporting both vision and vision-language tasks.
[Model Release] June, 2022: LayoutLMv3 Chinese - Chinese version of LayoutLMv3
[Code Release] May, 2022: Aggressive Decoding - Lossless Speedup for Seq2seq Generation
April, 2022: Transformers at Scale = DeepNet + X-MoE
[Model Release] April, 2022: LayoutLMv3 - Pre-training for Document AI with Unified Text and Image Masking
[Model Release] March, 2022: EdgeFormer - Parameter-efficient Transformer for On-device Seq2seq Generation
[Model Release] March, 2022: DiT - Self-supervised Document Image Transformer. Demos: Document Layout Analysis, Document Image Classification
January, 2022: BEiT was accepted by ICLR 2022 as an Oral presentation (54 out of 3391).
[Model Release] December 16th, 2021: TrOCR small models for handwritten and printed texts, with 3x inference speedup.
November 24th, 2021: VLMo as the new SOTA on the VQA Challenge
November, 2021: Multilingual translation at scale: 10000 language pairs and beyond
[Model Release] November, 2021: MarkupLM - Pre-training for text and markup language (e.g. HTML/XML)
[Model Release] November, 2021: VLMo - Unified vision-language pre-training w/ BEiT
October, 2021: WavLM Large achieves state-of-the-art performance on the SUPERB benchmark
[Model Release] October, 2021: WavLM - Large-scale self-supervised pre-trained models for speech.
[Model Release] October 2021: TrOCR is on HuggingFace
September 28th, 2021: T-ULRv5 (aka XLM-E/InfoXLM) as the SOTA on the XTREME leaderboard.
[Model Release] September, 2021: LayoutLM-cased are on HuggingFace
[Model Release] September, 2021: TrOCR - Transformer-based OCR w/ pre-trained BEiT and RoBERTa models.
August 2021: LayoutLMv2 and LayoutXLM are on HuggingFace
[Model Release] August, 2021: LayoutReader - Built with LayoutLM to improve general reading order detection.
[Model Release] August, 2021: DeltaLM - Encoder-decoder pre-training for language generation and translation.
August 2021: BEiT is on HuggingFace
[Model Release] July, 2021: BEiT - Towards BERT moment for CV
[Model Release] June, 2021: LayoutLMv2, LayoutXLM, MiniLMv2, and AdaLM.
May, 2021: LayoutLMv2, InfoXLMv2, MiniLMv2, UniLMv3, and AdaLM were accepted by ACL 2021.
April, 2021: LayoutXLM extends LayoutLM with multilingual support! A multilingual form understanding benchmark, XFUND, is also introduced; it includes forms with human-labeled key-value pairs in 7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese).
March, 2021: InfoXLM was accepted by NAACL 2021.
December 29th, 2020: LayoutLMv2 sets a new SOTA on a wide variety of document AI tasks, including the DocVQA and SROIE leaderboards.
October 8th, 2020: T-ULRv2 (aka InfoXLM) as the SOTA on the XTREME leaderboard.
September, 2020: MiniLM was accepted by NeurIPS 2020.
July 16, 2020: InfoXLM (Multilingual UniLM) arXiv
June, 2020: UniLMv2 was accepted by ICML 2020; LayoutLM was accepted by KDD 2020.
April 5, 2020: Multilingual MiniLM released!
September, 2019: UniLMv1 was accepted by NeurIPS 2019.

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the transformers project.

Microsoft Open Source Code of Conduct

Contact Information

For help or issues using the pre-trained models, please submit a GitHub issue.

For other communications, please contact Furu Wei (fuwei@microsoft.com).
