openai/gpt-3

GPT-3: Language Models are Few-Shot Learners

Top Related Projects

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Quick Overview

The openai/gpt-3 repository is OpenAI's companion repository for the paper "Language Models are Few-Shot Learners," which introduced GPT-3. It does not contain the GPT-3 model weights or source code: OpenAI does not release those and instead provides access to GPT-3 through its API. The repository hosts supplementary research materials: generated samples, synthetic evaluation datasets, training-data statistics, and the model card.

Pros

  • Official supplementary materials published by OpenAI alongside the GPT-3 paper
  • Includes the synthetic word-scramble and arithmetic datasets used in the paper's evaluations
  • Provides the GPT-3 model card and training-data statistics for researchers studying the model

Cons

  • Does not include the GPT-3 model weights or source code
  • Not a code library, so it offers no tools for working with the GPT-3 API
  • Reflects the 2020 paper only and does not document later OpenAI models
  • The generated samples are unfiltered and may contain offensive content

Because this repository contains research artifacts rather than a code library, the code examples and getting-started sections are omitted. For access to GPT-3 models, developers should refer to the OpenAI API documentation.

Competitor Comparisons

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of transformers

  • Open-source and freely available for use and modification
  • Supports a wide range of models and architectures beyond just GPT
  • Active community development and frequent updates

Cons of transformers

  • Requires more technical expertise to implement and fine-tune models
  • May have lower performance compared to GPT-3 for some tasks
  • Limited access to the largest pre-trained models

Code comparison

transformers:

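from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Download the pre-trained GPT-2 weights and tokenizer from the Hugging Face Hub
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
input_ids = tokenizer.encode("Hello, I'm a language model,", return_tensors="pt")
# Generate up to 50 tokens in total (prompt included); GPT-2 has no pad token, so reuse EOS
output = model.generate(input_ids, max_length=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))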

GPT-3 (via OpenAI API):

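from openai import OpenAI

# Requires the openai>=1.0 SDK; the key can also come from the OPENAI_API_KEY env var
client = OpenAI(api_key="your-api-key")
# text-davinci-002 has been retired; davinci-002 is used here as a GPT-3 base-model replacement
response = client.completions.create(
  model="davinci-002",
  prompt="Hello, I'm a language model,",
  max_tokens=50
)
print(response.choices[0].text)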

Note: GPT-3 is not open-source, so direct code comparison is limited to API usage.

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • Open-source and freely available for use and modification
  • Offers advanced optimization techniques for training large models
  • Runs on a range of hardware configurations, from a single GPU to multi-node clusters

Cons of DeepSpeed

  • Requires more setup and configuration compared to GPT-3
  • May have a steeper learning curve for beginners
  • Ships no pre-trained models comparable to GPT-3; users must supply or train their own

Code Comparison

DeepSpeed:

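import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real model
ds_config = {"train_batch_size": 8, "optimizer": {"type": "Adam", "params": {"lr": 1e-4}}}
# Wrap the model in a DeepSpeed engine (launch the script with the `deepspeed` CLI)
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)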

GPT-3:

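from openai import OpenAI
client = OpenAI(api_key="your-api-key")  # openai>=1.0 SDK
# The original "davinci" model is retired; davinci-002 is its replacement
response = client.completions.create(model="davinci-002", prompt="Hello, world!")
print(response.choices[0].text)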

DeepSpeed focuses on optimizing model training and inference, while GPT-3 provides a simple API for generating text completions. DeepSpeed requires more hands-on implementation but offers greater flexibility, whereas GPT-3 is easier to use out of the box but offers fewer customization options.

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Pros of GPT-NeoX

  • Open-source and freely available for research and development
  • Allows for customization and fine-tuning on specific tasks or domains
  • Provides transparency in model architecture and training process

Cons of GPT-NeoX

  • Its largest released model, GPT-NeoX-20B (20 billion parameters), is much smaller than GPT-3 (175 billion) and may perform worse on some tasks
  • Limited by the computational resources available to the open-source community
  • May lack some of the advanced features and optimizations of GPT-3

Code Comparison

GPT-NeoX:

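from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

# Note: the 20B-parameter checkpoint requires tens of GB of memory to load
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")
inputs = tokenizer("Hello, world!", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))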

GPT-3:

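from openai import OpenAI

client = OpenAI(api_key="your-api-key")  # openai>=1.0 SDK
# text-davinci-002 has been retired; davinci-002 is used here as a GPT-3 base-model replacement
response = client.completions.create(model="davinci-002", prompt="Hello, world!")
print(response.choices[0].text)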

Note: The code comparison highlights the difference in usage. GPT-NeoX can be used locally with the Transformers library, while GPT-3 requires API access through OpenAI's platform.

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • Open-source and freely available for research and commercial use
  • Supports a wide range of sequence-to-sequence tasks beyond language modeling
  • Actively maintained with regular updates and contributions from the community

Cons of fairseq

  • Generally requires more technical expertise to use and implement
  • May have lower performance on some language tasks compared to GPT-3
  • Less extensive documentation and tutorials available

Code Comparison

fairseq:

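from fairseq.models.transformer import TransformerModel

# Load a trained checkpoint; '/path/to/model' is a placeholder directory
model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
tokens = model.encode('Hello, world!')
# Beam search with 5 beams; [0] selects the highest-scoring hypothesis
output = model.generate(tokens, beam=5)[0]['tokens']
print(model.decode(output))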

GPT-3 (using OpenAI API):

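from openai import OpenAI

# Requires the openai>=1.0 SDK; the key can also come from the OPENAI_API_KEY env var
client = OpenAI(api_key='your-api-key')
# text-davinci-002 has been retired; davinci-002 is used here as a GPT-3 base-model replacement
response = client.completions.create(
  model="davinci-002",
  prompt="Hello, world!",
  max_tokens=50
)
print(response.choices[0].text)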

Note: The GPT-3 repository doesn't contain the model code, as it's accessed through an API. The fairseq example shows local model usage, while the GPT-3 example demonstrates API interaction.

README

GPT-3: Language Models are Few-Shot Learners

arXiv: 2005.14165

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions – something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
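The abstract's central idea, specifying a task purely through text with a handful of solved demonstrations and no gradient updates, can be illustrated with a short sketch. The code below builds a K-shot prompt for 3-digit addition. The "Q: What is ... plus ...? A:" phrasing is an assumption modeled on the paper's arithmetic examples, and the helper names `addition_problem` and `few_shot_prompt` are hypothetical; the sketch only constructs the prompt text and does not call any model.

```python
import random


def addition_problem(rng: random.Random, digits: int = 3) -> tuple[int, int]:
    """Draw two random operands with exactly `digits` digits each."""
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    return rng.randint(low, high), rng.randint(low, high)


def few_shot_prompt(k: int, seed: int = 0) -> tuple[str, int]:
    """Build a K-shot prompt: K solved demonstrations, then one unsolved query.

    Returns the prompt text and the correct answer to the final query.
    No weights are updated; the task is conveyed purely through this text.
    (Hypothetical helper; the Q/A format is an assumption.)
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(k):
        a, b = addition_problem(rng)
        blocks.append(f"Q: What is {a} plus {b}?\nA: {a + b}")
    # The final query leaves the answer blank for the model to complete.
    a, b = addition_problem(rng)
    blocks.append(f"Q: What is {a} plus {b}?\nA:")
    return "\n\n".join(blocks), a + b


prompt, expected = few_shot_prompt(k=2, seed=42)
print(prompt)
print("expected completion:", expected)
```

Setting `k=0` yields a zero-shot prompt for the same kind of query, and a fixed seed keeps the prompt reproducible when comparing zero-, one-, and few-shot settings.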

Contents

  • 175b_samples.jsonl - Unconditional, unfiltered 2048-token samples from GPT-3 with p=.85, t=1. CONTENT WARNING: GPT-3 was trained on arbitrary data from the web, so the samples may contain offensive content and language.
  • data - Synthetic datasets for word scramble and arithmetic tasks described in the paper.
  • dataset_statistics - Statistics for all languages included in the training dataset mix.
  • overlap_frequency.md - Samples of 13-gram overlaps between our training data and benchmarks, selected by frequency in the training set.
  • model-card.md - GPT-3 Model Card.
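The sample settings listed for 175b_samples.jsonl most likely refer to nucleus (top-p) sampling with threshold p and softmax temperature t. As an illustration, the sketch below implements standard nucleus sampling over a single vector of logits in plain Python. It is a generic example of the technique, not OpenAI's sampling code, and the function names `nucleus_candidates` and `nucleus_sample` are hypothetical.

```python
import math
import random
from typing import Optional


def nucleus_candidates(logits: list[float], p: float = 0.85,
                       t: float = 1.0) -> list[tuple[int, float]]:
    """Return (token_id, renormalized_prob) pairs forming the top-p nucleus.

    Logits are divided by temperature t and softmaxed; tokens are sorted by
    probability and the smallest prefix whose cumulative mass reaches p is kept.
    """
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = sorted(((i, e / total) for i, e in enumerate(exps)),
                   key=lambda pair: pair[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token_id, prob in probs:
        nucleus.append((token_id, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1
    mass = sum(prob for _, prob in nucleus)
    return [(token_id, prob / mass) for token_id, prob in nucleus]


def nucleus_sample(logits: list[float], p: float = 0.85, t: float = 1.0,
                   rng: Optional[random.Random] = None) -> int:
    """Sample one token id from the top-p nucleus."""
    if rng is None:
        rng = random.Random()
    candidates = nucleus_candidates(logits, p, t)
    ids = [token_id for token_id, _ in candidates]
    weights = [prob for _, prob in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(nucleus_candidates(logits, p=0.85, t=1.0))
print(nucleus_sample(logits, p=0.85, t=1.0, rng=random.Random(0)))
```

Lowering t below 1 sharpens the distribution before the nucleus is formed, so fewer tokens make the cut; with p = 1 and t = 1 the procedure reduces to ordinary sampling from the full softmax.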

How to cite

@article{brown2020language,
    title={Language Models are Few-Shot Learners},
    author={Tom B. Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and Sandhini Agarwal and Ariel Herbert-Voss and Gretchen Krueger and Tom Henighan and Rewon Child and Aditya Ramesh and Daniel M. Ziegler and Jeffrey Wu and Clemens Winter and Christopher Hesse and Mark Chen and Eric Sigler and Mateusz Litwin and Scott Gray and Benjamin Chess and Jack Clark and Christopher Berner and Sam McCandlish and Alec Radford and Ilya Sutskever and Dario Amodei},
    year={2020},
    eprint={2005.14165},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}