Top Related Projects
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
TensorFlow code and pre-trained models for BERT
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Quick Overview
OLMo (Open Language Model) is an open-source language model and toolkit developed by AI2 (Allen Institute for AI). It aims to provide a fully open, reproducible, and customizable foundation for large language models, including pre-training, fine-tuning, and inference capabilities.
Pros
- Fully open-source, allowing for transparency and reproducibility in language model research
- Supports both pre-training and fine-tuning, enabling customization for specific tasks
- Includes a comprehensive toolkit for model development and experimentation
- Designed with scalability in mind, supporting distributed training across multiple GPUs
Cons
- Relatively new project, so it may be less stable and have a smaller support community than more established frameworks
- Requires significant computational resources for pre-training and fine-tuning large models
- Documentation may be less comprehensive compared to more established language model frameworks
- Limited pre-trained model options compared to some commercial alternatives
Code Examples
- Loading a pre-trained OLMo model:
from olmo import OLMoForCausalLM, OLMoTokenizer
model = OLMoForCausalLM.from_pretrained("allenai/OLMo-7B")
tokenizer = OLMoTokenizer.from_pretrained("allenai/OLMo-7B")
- Generating text with OLMo:
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
- Fine-tuning OLMo on a custom dataset:
from olmo import OLMoForCausalLM, Trainer, TrainingArguments
model = OLMoForCausalLM.from_pretrained("allenai/OLMo-7B")
trainer = Trainer(
model=model,
args=TrainingArguments(output_dir="./olmo-finetuned", num_train_epochs=3),
train_dataset=your_custom_dataset,
)
trainer.train()
Getting Started
To get started with OLMo, follow these steps:
- Install OLMo using pip:
pip install ai2-olmo
- Load a pre-trained model and tokenizer:
from olmo import OLMoForCausalLM, OLMoTokenizer
model = OLMoForCausalLM.from_pretrained("allenai/OLMo-7B")
tokenizer = OLMoTokenizer.from_pretrained("allenai/OLMo-7B")
- Generate text:
prompt = "Hello, world!"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
For more advanced usage, including pre-training and fine-tuning, refer to the official documentation and examples in the OLMo repository.
Competitor Comparisons
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of GPT-NeoX
- More extensive documentation and examples for training and fine-tuning
- Broader community support and contributions
- Designed for distributed training across multiple GPUs
Cons of GPT-NeoX
- Higher computational requirements for training
- Less focus on interpretability and analysis tools
- More complex setup process for beginners
Code Comparison
OLMo:
from olmo import OLMo
model = OLMo.from_pretrained("olmo-1b")
output = model.generate("Hello, world!")
GPT-NeoX:
from gpt_neox import GPTNeoX
model = GPTNeoX.from_pretrained("gpt-neox-20b")
output = model.generate("Hello, world!")
Both repositories provide similar high-level APIs for loading and using pre-trained models. However, GPT-NeoX offers more advanced features for distributed training and customization, while OLMo focuses on simplicity and ease of use for researchers and developers.
OLMo emphasizes interpretability and analysis tools, making it more suitable for research-oriented tasks. GPT-NeoX, on the other hand, is designed for large-scale training and deployment, making it a better choice for production environments and projects requiring significant computational resources.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Extensive model support: Includes a wide range of pre-trained models and architectures
- Active community: Large user base and frequent updates
- Comprehensive documentation: Detailed guides and examples for various tasks
Cons of transformers
- Complexity: Can be overwhelming for beginners due to its extensive features
- Resource intensive: Some models require significant computational resources
Code comparison
OLMo
from olmo import OLMoTokenizer, OLMoForCausalLM
tokenizer = OLMoTokenizer.from_pretrained("allenai/olmo-7b")
model = OLMoForCausalLM.from_pretrained("allenai/olmo-7b")
transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
Key differences
- OLMo focuses specifically on the OLMo model, while transformers supports a wide range of models
- transformers uses a more generalized Auto class for model and tokenizer loading
- OLMo's API is tailored for its specific architecture, while transformers provides a unified interface for various models
TensorFlow code and pre-trained models for BERT
Pros of BERT
- Well-established and widely adopted in the NLP community
- Extensive documentation and pre-trained models available
- Proven performance on various NLP tasks
Cons of BERT
- Older architecture compared to more recent language models
- Limited context window size (typically 512 tokens)
- Requires fine-tuning for specific tasks
Code Comparison
BERT example:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
OLMo example:
from olmo import OLMoTokenizer, OLMoForCausalLM
tokenizer = OLMoTokenizer.from_pretrained("allenai/OLMo-7B")
model = OLMoForCausalLM.from_pretrained("allenai/OLMo-7B")
Key Differences
- OLMo is a more recent model with potential for improved performance
- BERT uses bidirectional training, while OLMo is a unidirectional (left-to-right) model
- OLMo is designed for open-ended text generation, while BERT excels at understanding context (see the sketch below)
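To make the bidirectional/unidirectional distinction concrete, here is a hedged sketch using the Hugging Face pipeline API: BERT fills in a masked token, while a causal model such as OLMo continues a prompt. The OLMo checkpoint name is taken from the README further below and assumes a recent transformers release.
from transformers import pipeline

# BERT: bidirectional encoder, suited to fill-in-the-blank style tasks.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The capital of France is [MASK]."))

# OLMo: left-to-right decoder, suited to open-ended generation.
generate = pipeline("text-generation", model="allenai/OLMo-7B-0724-hf")
print(generate("The capital of France is", max_new_tokens=10))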
Use Cases
- BERT: Sentiment analysis, named entity recognition, question answering
- OLMo: Text generation, language modeling, conversational AI
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- More mature and widely adopted, with extensive documentation and community support
- Offers a broader range of optimization techniques and training acceleration methods
- Supports multiple deep learning frameworks, including PyTorch and TensorFlow
Cons of DeepSpeed
- More complex setup and configuration process
- Steeper learning curve for beginners
- May require more fine-tuning to achieve optimal performance
Code Comparison
OLMo:
from olmo import OLMo
model = OLMo.from_pretrained("allenai/olmo-7b")
output = model.generate("The capital of France is")
DeepSpeed:
import deepspeed
import torch
model, optimizer, _, _ = deepspeed.initialize(args=args,
                                               model=model,
                                               model_parameters=params)
output = model(input_ids)
Key Differences
- OLMo focuses on providing a simple API for large language models, while DeepSpeed offers a comprehensive suite of optimization tools
- DeepSpeed is framework-agnostic, whereas OLMo is primarily designed for PyTorch
- OLMo emphasizes ease of use for specific language tasks, while DeepSpeed aims to optimize general deep learning workloads
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- More established and mature project with a larger community
- Supports a wider range of NLP tasks and models
- Extensive documentation and examples
Cons of fairseq
- Larger codebase, potentially more complex to navigate
- May have more dependencies and setup requirements
- Less focused on specific language model architectures
Code Comparison
OLMo example:
from olmo import OLMo
model = OLMo.from_pretrained("olmo-1b")
output = model.generate("The quick brown fox")
fairseq example:
from fairseq.models.transformer_lm import TransformerLanguageModel
model = TransformerLanguageModel.from_pretrained("transformer_lm.gpt2.large")
output = model.generate("The quick brown fox", beam=5, sampling=True)
Both repositories provide high-level APIs for loading and using pre-trained models. OLMo appears to have a more streamlined interface specifically for language models, while fairseq offers more flexibility and options for various NLP tasks.
fairseq's codebase is more extensive, covering a broader range of models and tasks. OLMo, being more focused on large language models, may have a simpler structure for those specifically interested in LLMs.
Overall, the choice between these repositories depends on the specific requirements of the project and the desired balance between flexibility and specialization in language modeling.
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Pros of Petals
- Focuses on distributed inference, allowing users to run large language models collaboratively
- Supports a wider range of models, including BLOOM and LLaMA
- Offers a unique approach to democratizing access to large language models
Cons of Petals
- Less emphasis on model training and fine-tuning compared to OLMo
- May have higher latency due to its distributed nature
- Potentially more complex setup for individual users
Code Comparison
OLMo:
from olmo import OLMo
model = OLMo.from_pretrained("allenai/olmo-7b")
output = model.generate("Hello, world!")
Petals:
from petals import AutoDistributedModelForCausalLM
model = AutoDistributedModelForCausalLM.from_pretrained("bigscience/bloom")
output = model.generate("Hello, world!")
Both repositories provide easy-to-use interfaces for working with large language models. OLMo focuses on a specific model and offers more control over training and fine-tuning, while Petals emphasizes distributed inference across a network of contributors. The code examples show similar usage patterns, but Petals' approach is geared towards distributed computing.
README
OLMo: Open Language Model
OLMo is a repository for training and using AI2's state-of-the-art open language models. It is built by scientists, for scientists.
Installation
First install PyTorch according to the instructions specific to your operating system.
To install from source (recommended for training/fine-tuning) run:
git clone https://github.com/allenai/OLMo.git
cd OLMo
pip install -e .[all]
Otherwise you can install the model code by itself directly from PyPI with:
pip install ai2-olmo
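Since PyTorch must be installed before OLMo, a quick environment check can save time. This is a minimal sketch using only standard PyTorch calls; the olmo package itself is not required for it.
import torch

# Confirm PyTorch is installed and check whether a CUDA device is visible
# before installing OLMo for training or fine-tuning.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))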
Models
Overview
The core models in the OLMo family released so far are (all trained on the Dolma dataset):
Model | Training Tokens | Context Length | Training Config | W&B Logs | Data Order File(s) * |
---|---|---|---|---|---|
OLMo 1B | 3 Trillion | 2048 | configs/official/OLMo-1B.yaml | wandb.ai/…/OLMo-1B | epoch 1 |
OLMo 7B | 2.5 Trillion | 2048 | configs/official/OLMo-7B.yaml | wandb.ai/…/OLMo-7B | epoch 1, epoch 2 |
OLMo 7B Twin 2T | 2 Trillion | 2048 | configs/official/OLMo-7B.yaml | wandb.ai/…/OLMo-7B-Twin-2T | epoch 1 |
OLMo 7B April 2024 | 2.05 Trillion | 4096 | configs/official/OLMo-7B-0424.yaml | Coming soon | Coming soon |
OLMo 7B July 2024 | 2.75 Trillion | 4096 | configs/official/OLMo-7B-0724.yaml | Coming soon | Coming soon |
* See Inspecting training data below for usage.
Checkpoints
URLs for checkpoints at intermediate steps of each model's training run can be found in the CSV files under checkpoints/official/. These 'directory' URLs cannot currently be accessed directly, but the files within each directory are publicly accessible. These URLs can also be provided to the training script to resume training from the checkpoint (see Training). Each checkpoint directory consists of:
- config.yaml: the config at that training step.
- model.pt, optim.pt, train.pt: model, optimizer, and training state at that training step.
Details about the other types of OLMo checkpoints (including OLMo HF Transformers checkpoints) can be found in Checkpoints.md.
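Because individual files inside a checkpoint directory are publicly accessible, you can fetch and inspect them directly. The sketch below is illustrative only: it uses cached_path (also used in Inspecting training data below) and the step-1000 OLMo 1B checkpoint URL shown in the Training section; note that model.pt is a large download.
import torch
from cached_path import cached_path

# Illustrative: peek inside an unsharded checkpoint directory.
base = "https://olmo-checkpoints.org/ai2-llm/olmo-small/w1r5xfzt/step1000-unsharded"

config_path = cached_path(f"{base}/config.yaml")  # training config at this step
model_path = cached_path(f"{base}/model.pt")      # model weights (multi-GB download)

state_dict = torch.load(model_path, map_location="cpu")
print(f"model.pt contains {len(state_dict)} tensors")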
Inference
You can utilize our Hugging Face integration to run inference on the OLMo Transformers checkpoints:
from transformers import AutoModelForCausalLM, AutoTokenizer
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0724-hf")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-0724-hf")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
Alternatively, with the Hugging Face pipeline abstraction:
from transformers import pipeline
olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B-0724-hf")
print(olmo_pipe("Language modeling is"))
Inference on finetuned checkpoints
If you finetune the model using the code in Fine-tuning, you can use the conversion script to convert a native OLMo checkpoint to a Hugging Face-compatible checkpoint.
python scripts/convert_olmo_to_hf_new.py --input_dir /path/to/olmo/checkpoint --output_dir /path/to/hf/checkpoint/ --tokenizer_json_path tokenizers/allenai_gpt-neox-olmo-dolma-v1_5.json
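Once converted, the output directory should load like any local Hugging Face checkpoint. A minimal sketch, assuming the conversion wrote both the model and tokenizer files to the --output_dir used above:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the converted fine-tuned checkpoint from the conversion output directory.
model = AutoModelForCausalLM.from_pretrained("/path/to/hf/checkpoint/")
tokenizer = AutoTokenizer.from_pretrained("/path/to/hf/checkpoint/")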
Quantization
import torch  # for torch.float16; AutoModelForCausalLM is imported as in the inference example above
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-0724-hf", torch_dtype=torch.float16, load_in_8bit=True)  # requires bitsandbytes
The quantized model is more sensitive to input dtypes and CUDA placement, so it is recommended to pass the inputs as inputs.input_ids.to('cuda') to avoid potential issues.
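A minimal usage sketch following that recommendation, reusing the olmo model and tokenizer from the inference example above:
# Keep the quantized model's inputs on the GPU, as suggested above.
inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(input_ids=inputs.input_ids.to("cuda"), max_new_tokens=50)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])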
Reproducibility
Training
The configs used to train the official OLMo models are provided in the configs/official/ directory.
Note that while the training and validation data is public and free to download, the data paths in those configs point to a Cloudflare R2 bucket, which requires an API key for programmatic access. So, to use any of these configs to reproduce a training run, you'll first have to download the corresponding data to a location of your choosing and then update the paths in the config accordingly.
You can derive the public HTTP URL from an R2 URL by replacing r2://olmo-data with https://olmo-data.org.
For example, if the R2 data URL is:
r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00000.npy
then the corresponding public URL is:
https://olmo-data.org/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00000.npy
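If you need to rewrite many paths, this substitution is easy to script. A small helper (not part of the repository) capturing the mapping above:
def r2_to_public_url(r2_url: str) -> str:
    """Map an r2://olmo-data/... path from the official configs to its public HTTPS mirror."""
    prefix = "r2://olmo-data"
    if not r2_url.startswith(prefix):
        raise ValueError(f"not an olmo-data R2 URL: {r2_url}")
    return "https://olmo-data.org" + r2_url[len(prefix):]

print(r2_to_public_url("r2://olmo-data/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00000.npy"))
# -> https://olmo-data.org/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00000.npy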
Once you've updated the data paths in the config you can launch a training run via torchrun. For example, to launch the 1B model training on a single 8x GPU node, you would run:
torchrun --nproc_per_node=8 scripts/train.py configs/official/OLMo-1B.yaml
You can use the same method to launch multi-node jobs as well. See the documentation for torchrun to understand the additional arguments you'll need to configure the rendezvous backend / endpoint.
To resume training from a checkpoint, you can pass its path (local or URL) to scripts/train.py with the --load_path argument. For example, to resume training from step 1000 of the OLMo 1B run:
torchrun --nproc_per_node=8 scripts/train.py configs/official/OLMo-1B.yaml --load_path=https://olmo-checkpoints.org/ai2-llm/olmo-small/w1r5xfzt/step1000-unsharded
Inspecting training data
You may be interested in inspecting the exact tokens that composed a particular batch during the training of one of the OLMo models. We provide tools to do this, but first you'll need to download the data as above (unless you have an R2 API key) and update the corresponding config accordingly.
Then take note of the URL of the data order file you want, which can be found in the Models Overview table. For example, the data order file for the first epoch of the OLMo-7B model is https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy.
Once you have that you can use this snippet to inspect the data within a particular batch:
import numpy as np
from cached_path import cached_path
from olmo.config import TrainConfig
from olmo.data import build_memmap_dataset
# Update these paths to what you want:
data_order_file_path = cached_path("https://olmo-checkpoints.org/ai2-llm/olmo-medium/wvc30anm/train_data/global_indices.npy")
train_config_path = "configs/official/OLMo-7B.yaml"
cfg = TrainConfig.load(train_config_path)
dataset = build_memmap_dataset(cfg, cfg.data)
batch_size = cfg.global_train_batch_size
global_indices = np.memmap(data_order_file_path, mode="r+", dtype=np.uint32)
def get_batch_instances(batch_idx: int) -> list[list[int]]:
batch_start = batch_idx * batch_size
batch_end = (batch_idx + 1) * batch_size
batch_indices = global_indices[batch_start:batch_end]
batch_instances = []
for index in batch_indices:
token_ids = dataset[index]["input_ids"].tolist()
batch_instances.append(token_ids)
return batch_instances
# Get all 2048 x 2048 token IDs in the first batch.
get_batch_instances(0)
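To read those token IDs as text, you can decode an instance with a Hugging Face tokenizer, as in the Inference section. This is a sketch only; it assumes the tokenizer published with the HF checkpoints matches the tokenizer used for this training run.
from transformers import AutoTokenizer

# Decode the first instance of the first batch back to text (sketch).
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-0724-hf")
first_batch = get_batch_instances(0)
print(tokenizer.decode(first_batch[0][:256]))  # first 256 tokens of the first instance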
Fine-tuning
To fine-tune an OLMo model using our trainer you'll first need to prepare your dataset by tokenizing it and saving the token IDs to a flat numpy memory-mapped array. See scripts/prepare_tulu_data.py for an example with the Tulu V2 dataset, which can be easily modified for other datasets.
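As a rough illustration of what that flat memory-mapped array looks like, the sketch below packs tokenized text into a single numpy memmap. It is not the official recipe: the dtype, sequence packing, and the optional label_mask.npy file should follow scripts/prepare_tulu_data.py.
import numpy as np
from transformers import AutoTokenizer

# Illustrative only: concatenate token IDs from a toy corpus into one flat array.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-0724-hf")
texts = ["Example document one.", "Example document two."]  # replace with your dataset

token_ids: list[int] = []
for text in texts:
    token_ids.extend(tokenizer(text)["input_ids"])

# Assumed dtype: uint16 fits OLMo's ~50k vocabulary; verify against the prepare script.
arr = np.memmap("token_ids.npy", dtype=np.uint16, mode="w+", shape=(len(token_ids),))
arr[:] = token_ids
arr.flush()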
Next, prepare your training config. There are many examples in the configs/ directory that you can use as a starting point. The most important thing is to make sure the model parameters (the model field in the config) match up with the checkpoint you're starting from. To be safe you can always start from the config that comes with the model checkpoint. At a minimum you'll need to make the following changes to the config or provide the corresponding overrides from the command line:
- Update load_path to point to the checkpoint you want to start from.
- Set reset_trainer_state to true.
- Update data.paths to point to the token_ids.npy file you generated.
- Optionally update data.label_mask_paths to point to the label_mask.npy file you generated, unless you don't need special masking for the loss.
- Update evaluators to add/remove in-loop evaluations.
Once you're satisfied with your training config, you can launch the training job via torchrun. For example:
torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
--data.paths=[{path_to_data}/input_ids.npy] \
--data.label_mask_paths=[{path_to_data}/label_mask.npy] \
--load_path={path_to_checkpoint} \
--reset_trainer_state
Note: passing CLI overrides like --reset_trainer_state is only necessary if you didn't update those fields in your config.
Evaluation
Additional tools for evaluating OLMo models are available at the OLMo Eval repo.
Debugging
See Debugging.
Citing
@article{OLMo,
title={OLMo: Accelerating the Science of Language Models},
author={Dirk Groeneveld and Iz Beltagy and Pete Walsh and Akshita Bhagia and Rodney Kinney and Oyvind Tafjord and A. Jha and Hamish Ivison and Ian Magnusson and Yizhong Wang and Shane Arora and David Atkinson and Russell Authur and Khyathi Raghavi Chandu and Arman Cohan and Jennifer Dumas and Yanai Elazar and Yuling Gu and Jack Hessel and Tushar Khot and William Merrill and Jacob Daniel Morrison and Niklas Muennighoff and Aakanksha Naik and Crystal Nam and Matthew E. Peters and Valentina Pyatkin and Abhilasha Ravichander and Dustin Schwenk and Saurabh Shah and Will Smith and Emma Strubell and Nishant Subramani and Mitchell Wortsman and Pradeep Dasigi and Nathan Lambert and Kyle Richardson and Luke Zettlemoyer and Jesse Dodge and Kyle Lo and Luca Soldaini and Noah A. Smith and Hanna Hajishirzi},
year={2024},
url={https://api.semanticscholar.org/CorpusID:267365485},
journal={arXiv preprint},
}