
openai/gpt-3

GPT-3: Language Models are Few-Shot Learners


Top Related Projects

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.


DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries


Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Quick Overview

The openai/gpt-3 repository on GitHub does not contain the GPT-3 model itself. OpenAI does not publicly release the GPT-3 source code or weights; access to the model is provided through the OpenAI API. Instead, the repository hosts supplementary materials for the paper "Language Models are Few-Shot Learners": unconditional model samples, synthetic evaluation datasets, training-dataset statistics, and the model card.

Pros

  • Collects the paper's supplementary materials in one place: unconditional samples, synthetic word-scramble and arithmetic datasets, and dataset statistics
  • Includes the GPT-3 model card documenting the model and its limitations
  • Useful reference for researchers studying GPT-3's outputs and the overlap between its training data and common benchmarks

Cons

  • Does not contain the GPT-3 model, weights, or any training or inference code
  • GPT-3 itself is accessible only through the OpenAI API, not as downloadable software
  • Provides no tooling or example code for working with the API
  • The released samples are unfiltered and may contain offensive content (see the content warning below)

Because the repository does not contain a code library for GPT-3, we'll skip the code examples and getting started instructions sections. For official information and access to GPT-3, developers should refer to the OpenAI API documentation and guidelines provided by OpenAI directly.

Competitor Comparisons

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of transformers

  • Open-source and freely available for use and modification
  • Supports a wide range of models and architectures beyond just GPT
  • Active community development and frequent updates

Cons of transformers

  • Requires more technical expertise to implement and fine-tune models
  • May have lower performance compared to GPT-3 for some tasks
  • Limited access to the largest pre-trained models

Code comparison

transformers:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load an openly available GPT-2 checkpoint and generate a continuation locally.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
input_ids = tokenizer.encode("Hello, I'm a language model,", return_tensors="pt")
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

GPT-3 (via OpenAI API):

import openai

openai.api_key = "your-api-key"
response = openai.Completion.create(
  engine="text-davinci-002",
  prompt="Hello, I'm a language model,",
  max_tokens=50
)

Note: GPT-3 is not open-source, so direct code comparison is limited to API usage.


DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • Open-source and freely available for use and modification
  • Offers advanced optimization techniques for training large models
  • Supports a wider range of hardware configurations

Cons of DeepSpeed

  • Requires more setup and configuration compared to GPT-3
  • May have a steeper learning curve for beginners
  • Provides no pre-trained model of its own; it is a training and inference optimization library rather than a language model

Code Comparison

DeepSpeed:

import deepspeed

# args, model and params (the trainable parameters) are assumed to be defined
# elsewhere, e.g. parsed command-line arguments and a PyTorch model.
model_engine, optimizer, _, _ = deepspeed.initialize(args=args,
                                                     model=model,
                                                     model_parameters=params)

GPT-3:

import openai
openai.api_key = "your-api-key"
response = openai.Completion.create(engine="davinci", prompt="Hello, world!")

DeepSpeed focuses on optimizing model training, while GPT-3 provides a simple API for generating text completions. DeepSpeed requires more hands-on implementation but offers greater flexibility, whereas GPT-3 is easier to use out of the box but offers fewer customization options.
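
To illustrate the extra setup DeepSpeed expects, here is a minimal, illustrative configuration sketch. The keys shown (batch size, mixed precision, ZeRO stage, optimizer) are common ones, but the exact values and the way the config is passed depend on your DeepSpeed version and hardware, and training is normally launched with the deepspeed command-line launcher:

import torch
import deepspeed

# A toy model stands in for a real transformer; the focus is the config shape.
model = torch.nn.Linear(512, 512)

# Illustrative DeepSpeed configuration; tune for your setup.
# fp16 assumes a CUDA-capable GPU.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# Recent DeepSpeed versions accept the config dict directly via `config=`;
# older versions expect a JSON file passed as --deepspeed_config instead.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)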

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Pros of GPT-NeoX

  • Open-source and freely available for research and development
  • Allows for customization and fine-tuning on specific tasks or domains
  • Provides transparency in model architecture and training process

Cons of GPT-NeoX

  • Generally smaller in scale and may have lower performance compared to GPT-3
  • Limited by the computational resources available to the open-source community
  • May lack some of the advanced features and optimizations of GPT-3

Code Comparison

GPT-NeoX:

from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")

GPT-3:

import openai

openai.api_key = "your-api-key"
response = openai.Completion.create(engine="text-davinci-002", prompt="Hello, world!")

Note: The code comparison highlights the difference in usage. GPT-NeoX can be used locally with the Transformers library, while GPT-3 requires API access through OpenAI's platform.
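
For completeness, a self-contained sketch of local generation with GPT-NeoX might look like the following; it assumes hardware with enough memory for the 20B-parameter checkpoint (tens of GB), and the prompt and sampling settings are illustrative only:

from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

# Loading the full 20B-parameter checkpoint requires substantial memory.
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")

# Tokenize a prompt, sample a continuation, and decode it back to text.
inputs = tokenizer("Hello, I'm a language model,", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))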


Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • Open-source and freely available for research and commercial use
  • Supports a wide range of sequence-to-sequence tasks beyond language modeling
  • Actively maintained with regular updates and contributions from the community

Cons of fairseq

  • Generally requires more technical expertise to use and implement
  • May have lower performance on some language tasks compared to GPT-3
  • Less extensive documentation and tutorials available

Code Comparison

fairseq:

from fairseq.models.transformer import TransformerModel

# Load a locally downloaded checkpoint (the path and filename are placeholders).
model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
tokens = model.encode('Hello, world!')
output = model.generate(tokens, beam=5)[0]['tokens']
print(model.decode(output))

GPT-3 (using OpenAI API):

import openai

openai.api_key = 'your-api-key'
response = openai.Completion.create(
  engine="text-davinci-002",
  prompt="Hello, world!",
  max_tokens=50
)

Note: The GPT-3 repository doesn't contain the model code, as it's accessed through an API. The fairseq example shows local model usage, while the GPT-3 example demonstrates API interaction.


README

GPT-3: Language Models are Few-Shot Learners

arXiv: 2005.14165

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions – something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
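
As a concrete illustration of the few-shot setting described above (task demonstrations supplied purely as text in the prompt, with no gradient updates), a word-unscrambling prompt might be assembled as follows. The API call uses the legacy OpenAI Completions interface and an engine name shown elsewhere on this page; it is illustrative only and not a recipe from this repository:

import openai

openai.api_key = "your-api-key"

# Few-shot prompt: the task is specified entirely in text, with a handful of
# demonstrations followed by the query. No fine-tuning or gradient updates.
prompt = (
    "Unscramble the letters into a word.\n"
    "lpepa = apple\n"
    "nanaab = banana\n"
    "rgaep = grape\n"
    "ryrhce ="
)

response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=prompt,
    max_tokens=5,
    temperature=0,
)
print(response.choices[0].text.strip())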

Contents

  • 175b_samples.jsonl - Unconditional, unfiltered 2048 token samples from GPT-3 with p=.85, t=1 (see the loading sketch after this list). CONTENT WARNING: GPT-3 was trained on arbitrary data from the web, so may contain offensive content and language.
  • data - Synthetic datasets for word scramble and arithmetic tasks described in the paper.
  • dataset_statistics - Statistics for all languages included in the training dataset mix.
  • overlap_frequency.md - Samples of 13-gram overlaps between our training data and benchmarks, selected by frequency in the training set.
  • model-card.md - GPT-3 Model Card.
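
A minimal sketch for inspecting the released samples, assuming the repository has been cloned and 175b_samples.jsonl sits in the working directory; the per-line field layout is not documented in this README, so the code only reports whatever keys each JSON record contains:

import json

# Each line of the file is assumed to be one JSON record.
with open("175b_samples.jsonl", "r", encoding="utf-8") as f:
    for i, line in enumerate(f):
        sample = json.loads(line)
        # Field names are not documented here, so just show the structure
        # of the first few records.
        print(f"sample {i}: keys = {list(sample.keys())}")
        if i >= 2:
            break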

How to cite

@article{brown2020language,
    title={Language Models are Few-Shot Learners},
    author={Tom B. Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and Sandhini Agarwal and Ariel Herbert-Voss and Gretchen Krueger and Tom Henighan and Rewon Child and Aditya Ramesh and Daniel M. Ziegler and Jeffrey Wu and Clemens Winter and Christopher Hesse and Mark Chen and Eric Sigler and Mateusz Litwin and Scott Gray and Benjamin Chess and Jack Clark and Christopher Berner and Sam McCandlish and Alec Radford and Ilya Sutskever and Dario Amodei},
    year={2020},
    eprint={2005.14165},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}