mosaicml / llm-foundry

LLM training code for Databricks foundation models


Top Related Projects

  • huggingface/transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
  • EleutherAI/gpt-neox - An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
  • microsoft/DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • NVIDIA/Megatron-LM - Ongoing research training transformer models at scale
  • google-research/t5x - JAX-based framework for training and evaluating T5-family models
  • facebookresearch/fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Quick Overview

LLM Foundry is an open-source repository by MosaicML that provides tools and recipes for training large language models (LLMs). It offers efficient implementations of popular model architectures, training techniques, and evaluation methods, allowing researchers and practitioners to easily train and fine-tune LLMs.

Pros

  • Comprehensive toolkit for training and evaluating LLMs
  • Optimized for efficiency and scalability
  • Supports various model architectures and training techniques
  • Regularly updated with the latest advancements in LLM research

Cons

  • Requires significant computational resources for training large models
  • May have a steep learning curve for beginners in LLM development
  • Documentation could be more extensive for some advanced features
  • Limited support for older hardware or non-standard setups

Code Examples

  1. Defining a model configuration:
from llmfoundry import COMPOSER_MODEL_REGISTRY

cfg = {
    'name': 'mpt_7b',
    'init_device': 'meta',
    'precision': 'amp_bf16',
    'd_model': 4096,
    'n_heads': 32,
    'n_layers': 32,
    'expansion_ratio': 4,
    'max_seq_len': 2048,
    'vocab_size': 50368,
    'attn_config': {
        'attn_impl': 'flash',
        'alibi': True,
        'alibi_bias_max': 8,
    },
}

model = COMPOSER_MODEL_REGISTRY['mpt'](cfg)
  2. Training a model:
from composer import Trainer
from composer.algorithms import LowPrecisionLayerNorm

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    max_duration='1ep',
    algorithms=[LowPrecisionLayerNorm()],
    device='gpu',
    precision='amp_bf16',
)

trainer.fit()
  3. Evaluating a model:
from llmfoundry.data import MosaicInContextLearningEvaluator

evaluator = MosaicInContextLearningEvaluator(
    model,
    device='gpu',
    precision='amp_bf16',
    max_seq_len=2048,
    tasks=['lambada', 'hellaswag', 'winogrande'],
)

results = evaluator.evaluate()
print(results)

Getting Started

To get started with LLM Foundry:

  1. Install the library:
pip install llm-foundry
  2. Import necessary modules:
from llmfoundry import COMPOSER_MODEL_REGISTRY
from composer import Trainer
from llmfoundry.data import MosaicInContextLearningEvaluator
  3. Define your model configuration, create dataloaders, and set up training:
cfg = {...}  # Define your model configuration
model = COMPOSER_MODEL_REGISTRY['mpt'](cfg)
trainer = Trainer(model=model, ...)
trainer.fit()
  4. Evaluate your model:
evaluator = MosaicInContextLearningEvaluator(model, ...)
results = evaluator.evaluate()

Competitor Comparisons

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of transformers

  • Extensive model support: Offers a wide range of pre-trained models and architectures
  • Large community and ecosystem: Well-established library with extensive documentation and community support
  • Easy integration: Seamlessly integrates with other popular machine learning libraries

Cons of transformers

  • Less focus on efficiency: May not be as optimized for large-scale training as llm-foundry
  • Steeper learning curve: Can be overwhelming for beginners due to its extensive features

Code Comparison

transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

llm-foundry:

from llmfoundry import ModelRegistry

model = ModelRegistry.load_model("gpt2")
tokenizer = ModelRegistry.load_tokenizer("gpt2")

Both repositories provide tools for working with large language models, but they have different focuses. transformers offers a comprehensive suite of pre-trained models and tools for various NLP tasks, while llm-foundry is more specialized in efficient training and deployment of large language models. The choice between them depends on the specific requirements of your project and your familiarity with each library.

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Pros of gpt-neox

  • More established and widely used in the open-source LLM community
  • Extensive documentation and community support
  • Designed specifically for training large language models

Cons of gpt-neox

  • Less flexible for general ML tasks beyond language modeling
  • May require more computational resources for training
  • Limited integration with other ML frameworks

Code Comparison

gpt-neox:

from megatron.neox_arguments import NeoXArgs
from megatron.global_vars import set_global_variables, get_tokenizer
from megatron.neox_model import GPTNeoX

neox_args = NeoXArgs.from_pretrained("EleutherAI/gpt-neox-20b")
model = GPTNeoX(neox_args)

llm-foundry:

from composer import Trainer
from llmfoundry import MPTForCausalLM, MPTTokenizer

model = MPTForCausalLM.from_pretrained("mosaicml/mpt-7b")
tokenizer = MPTTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
trainer = Trainer(model=model, train_dataloader=train_dataloader)

Both repositories offer powerful tools for working with large language models, but they cater to slightly different use cases. gpt-neox is specialized for large-scale GPT-style pretraining, while llm-foundry, built on Composer, offers a more approachable and flexible workflow covering LLM training, finetuning, evaluation, and inference.


DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • More mature and widely adopted in the industry
  • Offers a broader range of optimization techniques and training features
  • Supports a variety of model architectures beyond LLMs

Cons of DeepSpeed

  • Steeper learning curve due to its extensive feature set
  • May require more configuration and tuning for optimal performance

Code Comparison

DeepSpeed:

import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters()
)

llm-foundry:

from composer import Trainer
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer
)

Key Differences

  • DeepSpeed focuses on large-scale model training and optimization across various architectures
  • llm-foundry specializes in LLM training and fine-tuning with a more streamlined approach
  • DeepSpeed offers more advanced features like ZeRO optimization and pipeline parallelism
  • llm-foundry provides pre-built training recipes and easier integration with MosaicML's ecosystem

Both repositories aim to improve the efficiency of training large models, but DeepSpeed offers a more comprehensive toolkit for distributed training, while llm-foundry provides a more focused solution for LLM development.

Ongoing research training transformer models at scale

Pros of Megatron-LM

  • Highly optimized for NVIDIA GPUs, offering excellent performance on supported hardware
  • Extensive support for large-scale distributed training across multiple nodes
  • Implements advanced techniques like tensor parallelism and pipeline parallelism

Cons of Megatron-LM

  • Limited flexibility in model architectures and training configurations
  • Steeper learning curve due to its focus on high-performance, large-scale training
  • Less emphasis on ease of use and quick experimentation

Code Comparison

Megatron-LM:

model = get_language_model(args)
optimizer = get_optimizer(model, args)
lr_scheduler = get_learning_rate_scheduler(optimizer, args)

llm-foundry:

model = ComposerModel(model, tokenizer)
optimizer = DecoupledAdamW(model.parameters(), lr=lr, betas=betas)
trainer = Trainer(model=model, optimizers=optimizer, max_duration=max_duration)

The code snippets highlight the difference in approach:

  • Megatron-LM focuses on performance-oriented setup with custom functions
  • llm-foundry emphasizes simplicity and integration with the Composer library

Both repositories offer powerful tools for training large language models, but they cater to different use cases and levels of expertise. Megatron-LM is better suited for large-scale, high-performance training, while llm-foundry provides a more accessible and flexible approach for a wider range of users and experiments.

google-research/t5x: JAX-based framework for training and evaluating T5 models

Pros of t5x

  • Built on JAX, offering efficient and scalable training on TPUs and GPUs
  • Extensive documentation and examples for various T5 model variants
  • Integrated with Google's research ecosystem, potentially benefiting from ongoing updates

Cons of t5x

  • More specialized focus on T5 models, potentially limiting flexibility for other architectures
  • Steeper learning curve for those unfamiliar with JAX and Google's research frameworks
  • Less emphasis on production deployment compared to llm-foundry

Code Comparison

t5x:

import jax
from t5x import models
from t5x import utils

model = models.EncoderDecoderModel(...)
trainer = utils.Trainer(model=model, ...)
trainer.train(...)

llm-foundry:

from llmfoundry import ModelRegistry, Trainer

model = ModelRegistry.get_model(...)
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
)
trainer.fit()

Both repositories offer tools for training large language models, but t5x focuses on T5 variants using JAX, while llm-foundry provides a more general-purpose framework with PyTorch integration. t5x may be preferred for T5-specific research, while llm-foundry offers broader model support and easier production deployment.


Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • Broader scope: Supports a wide range of sequence-to-sequence tasks beyond just language models
  • Extensive documentation and examples for various NLP tasks
  • Longer development history and larger community support

Cons of fairseq

  • Less focused on large language model training and optimization
  • May require more setup and configuration for specific LLM tasks
  • Potentially steeper learning curve for users primarily interested in LLMs

Code Comparison

fairseq:

from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model', 'checkpoint.pt')
tokens = model.encode('Hello world!')
output = model.generate(tokens, beam=5)[0]

llm-foundry:

from llmfoundry import MPTForCausalLM
model = MPTForCausalLM.from_pretrained('mosaicml/mpt-7b')
output = model.generate("Hello world!", max_length=50)

The code comparison shows that fairseq requires more explicit steps for loading and using a model, while llm-foundry provides a more streamlined approach specifically tailored for large language models. This reflects the different focus areas of the two repositories, with fairseq offering more flexibility for various NLP tasks and llm-foundry prioritizing ease of use for LLM-specific applications.


README

LLM Foundry


This repository contains code for training, finetuning, evaluating, and deploying LLMs for inference with Composer and the MosaicML platform. Designed to be easy-to-use, efficient and flexible, this codebase enables rapid experimentation with the latest techniques.

You'll find in this repo:

  • llmfoundry/ - source code for models, datasets, callbacks, utilities, etc.
  • scripts/ - scripts to run LLM workloads
    • data_prep/ - convert text data from original sources to StreamingDataset format
    • train/ - train or finetune HuggingFace and MPT models from 125M - 70B parameters
      • train/benchmarking - profile training throughput and MFU
    • inference/ - convert models to HuggingFace or ONNX format, and generate responses
      • inference/benchmarking - profile inference latency and throughput
    • eval/ - evaluate LLMs on academic (or custom) in-context-learning tasks
  • mcli/ - launch any of these workloads using MCLI and the MosaicML platform
  • TUTORIAL.md - a deeper dive into the repo, example workflows, and FAQs

DBRX

DBRX is a state-of-the-art open-source LLM trained by the Databricks Mosaic team. It uses the Mixture-of-Experts (MoE) architecture and was trained with optimized versions of Composer, LLM Foundry, and MegaBlocks. The model has 132B total parameters and 36B active parameters. We have released two DBRX models:

Model         | Context Length | Download
DBRX Base     | 32768          | https://huggingface.co/databricks/dbrx-base
DBRX Instruct | 32768          | https://huggingface.co/databricks/dbrx-instruct

Our model weights and code are licensed for both researchers and commercial entities. The Databricks Open Source License can be found at LICENSE, and our Acceptable Use Policy can be found here.

For more information about the DBRX models, see https://github.com/databricks/dbrx.
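If you just want to load DBRX from Python, the following is a minimal sketch using Hugging Face transformers, mirroring the DBRX model card rather than any LLM Foundry API (the gated weights require accepting the license with an authenticated Hugging Face account, and the full 132B-parameter model needs a multi-GPU machine):

# Sketch: loading DBRX Instruct with Hugging Face transformers (per the model
# card, not an LLM Foundry API).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "databricks/dbrx-instruct",
    trust_remote_code=True,  # may be unnecessary on recent transformers releases
)
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What is a Mixture-of-Experts model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))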

MPT

Mosaic Pretrained Transformers (MPT) are GPT-style models with some special features -- Flash Attention for efficiency, ALiBi for context length extrapolation, and stability improvements to mitigate loss spikes. As part of MosaicML's Foundation series, we have open-sourced several MPT models:

Model              | Context Length | Download                                            | Commercial use?
MPT-30B            | 8192           | https://huggingface.co/mosaicml/mpt-30b             | Yes
MPT-30B-Instruct   | 8192           | https://huggingface.co/mosaicml/mpt-30b-instruct    | Yes
MPT-30B-Chat       | 8192           | https://huggingface.co/mosaicml/mpt-30b-chat        | No
MPT-7B-8k          | 8192           | https://huggingface.co/mosaicml/mpt-7b-8k           | Yes
MPT-7B-8k-Chat     | 8192           | https://huggingface.co/mosaicml/mpt-7b-8k-chat      | No
MPT-7B             | 2048           | https://huggingface.co/mosaicml/mpt-7b              | Yes
MPT-7B-Instruct    | 2048           | https://huggingface.co/mosaicml/mpt-7b-instruct     | Yes
MPT-7B-Chat        | 2048           | https://huggingface.co/mosaicml/mpt-7b-chat         | No
MPT-7B-StoryWriter | 65536          | https://huggingface.co/mosaicml/mpt-7b-storywriter  | Yes

To try out these models locally, follow the instructions in scripts/inference/README.md to prompt HF models using our hf_generate.py or hf_chat.py scripts.
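If you would rather prompt an MPT checkpoint straight from Python, here is a minimal sketch following the MPT model cards (not an LLM Foundry API): MPT ships custom modeling code, so trust_remote_code=True is required, and because of ALiBi you can raise max_seq_len beyond the 2048 tokens used during training.

# Sketch: prompting MPT-7B directly with transformers, per the MPT model cards.
import torch
import transformers

name = "mosaicml/mpt-7b"
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 4096  # ALiBi allows extrapolation past the training context length

model = transformers.AutoModelForCausalLM.from_pretrained(
    name, config=config, torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("MosaicML is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))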

MPT Community

We've been overwhelmed by all the amazing work the community has put into MPT! Here we provide a few links to some of them:

  • ReplitLM: replit-code-v1-3b is a 2.7B Causal Language Model focused on Code Completion. The model has been trained on a subset of the Stack Dedup v1.2 dataset covering 20 languages such as Java, Python, and C++
  • LLaVa-MPT: Visual instruction tuning to get MPT multimodal capabilities
  • ggml: Optimized MPT version for efficient inference on consumer hardware
  • GPT4All: locally running chat system, now with MPT support!
  • Q8MPT-Chat: 8-bit optimized MPT for CPU by our friends at Intel

Something missing? Contribute with a PR!

Hardware and Software Requirements

This codebase has been tested with PyTorch 2.4 on NVIDIA A100s and H100s. It may also work on systems with other devices, such as consumer NVIDIA cards and AMD cards, but we are not actively testing those systems. If you have success/failure using LLM Foundry on other systems, please let us know in a GitHub issue and we will update the support matrix!

Device         | Torch Version | CUDA Version | Status
A100-40GB/80GB | 2.4.0         | 12.4         | ✅ Supported
H100-80GB      | 2.4.0         | 12.4         | ✅ Supported

MosaicML Docker Images

We highly recommend using our prebuilt Docker images. You can find them here: https://hub.docker.com/orgs/mosaicml/repositories.

The mosaicml/pytorch images are pinned to specific PyTorch and CUDA versions, and are stable and rarely updated.

The mosaicml/llm-foundry images are built with new tags upon every commit to the main branch. You can select a specific commit hash such as mosaicml/llm-foundry:2.4.0_cu124-36ab1ba or take the latest one using mosaicml/llm-foundry:2.4.0_cu124-latest.

Please Note: The mosaicml/llm-foundry images do not come with the llm-foundry package preinstalled, just the dependencies. You will still need to pip install llm-foundry either from PyPi or from source.

Docker Image                                        | Torch Version | CUDA Version      | LLM Foundry dependencies installed?
mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04 | 2.4.0         | 12.4 (Infiniband) | No
mosaicml/llm-foundry:2.4.0_cu124-latest             | 2.4.0         | 12.4 (Infiniband) | Yes
mosaicml/llm-foundry:2.4.0_cu124_aws-latest         | 2.4.0         | 12.4 (EFA)        | Yes

Installation

This assumes you already have PyTorch, CMake, and packaging installed. If not, you can install them with pip install cmake packaging torch.

To get started, clone the repo and set up your environment. Instructions to do so differ slightly depending on whether you're using Docker.

With Docker (recommended)

We strongly recommend working with LLM Foundry inside a Docker container (see our recommended Docker image above). If you are doing so, follow these steps to clone the repo and install the requirements.

git clone https://github.com/mosaicml/llm-foundry.git
cd llm-foundry
pip install -e ".[gpu]"  # or `pip install -e .` if no NVIDIA GPU.

Without Docker (not recommended)

If you choose not to use Docker, you should create and use a virtual environment.

git clone https://github.com/mosaicml/llm-foundry.git
cd llm-foundry

# Create and activate a virtual environment
python3 -m venv llmfoundry-venv
source llmfoundry-venv/bin/activate

pip install cmake packaging torch  # setup.py requires these be installed

pip install -e ".[gpu]"  # or `pip install -e .` if no NVIDIA GPU.

TransformerEngine and amp_fp8 support

NVIDIA H100 GPUs have FP8 support; we have already installed Flash Attention and TransformerEngine in our Docker images (see above). If you are not using our Docker images, you can install these packages with:

pip install flash-attn --no-build-isolation
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

See here for more details on enabling TransformerEngine layers and amp_fp8.
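The details of wiring TransformerEngine layers into a model config live in that documentation; as a rough, hedged sketch, switching Composer's training precision to FP8 looks like the following (this assumes an H100-class GPU with TransformerEngine installed, and reuses a model and dataloader built as in the examples above):

# Hedged sketch: FP8 autocast through Composer on H100-class GPUs.
# 'amp_fp8' is Composer's FP8 precision setting; the model and dataloader are
# assumed to have been constructed as in the earlier examples, and the model
# must use TransformerEngine layers for FP8 to take effect.
from composer import Trainer

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration='10ba',
    device='gpu',
    precision='amp_fp8',  # requires TransformerEngine (install shown above)
)
trainer.fit()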

AMD (BETA support)

In our testing of AMD GPUs, the env setup includes:

git clone https://github.com/mosaicml/llm-foundry.git
cd llm-foundry

# Create and activate a virtual environment
python3 -m venv llmfoundry-venv-amd
source llmfoundry-venv-amd/bin/activate

# installs
pip install cmake packaging torch
pip install -e .  # This installs some things that are not needed but they don't hurt
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

Lastly, install the ROCm enabled flash attention (instructions here).

Notes:

  1. We don't yet have a Docker image where everything works perfectly. You might need to up/downgrade some packages (in our case, we needed to downgrade to numpy==1.23.5) before everything works without issue.

Intel Gaudi

Support for LLM Foundry on Intel Gaudi devices is experimental; please use the habana_alpha branch and see the README on that branch for install instructions and known issues.

For training and inference performance results on Intel Gaudi2 accelerators, see our blog: https://www.databricks.com/blog/llm-training-and-inference-intel-gaudi2-ai-accelerators

Quickstart

Note: Make sure to go through the installation steps above before trying the quickstart!

Here is an end-to-end workflow for preparing a subset of the C4 dataset, training an MPT-125M model for 10 batches, converting the model to HuggingFace format, evaluating the model on an in-context-learning task (COPA), and generating responses to prompts.

(Remember this is a quickstart just to demonstrate the tools -- To get good quality, the LLM must be trained for longer than 10 batches 😄)

cd scripts

# Convert C4 dataset to StreamingDataset format
python data_prep/convert_dataset_hf.py \
  --dataset c4 --data_subset en \
  --out_root my-copy-c4 --splits train_small val_small \
  --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text '<|endoftext|>'

# Train an MPT-125m model for 10 batches
composer train/train.py \
  train/yamls/pretrain/mpt-125m.yaml \
  variables.data_local=my-copy-c4 \
  train_loader.dataset.split=train_small \
  eval_loader.dataset.split=val_small \
  max_duration=10ba \
  eval_interval=0 \
  save_folder=mpt-125m

# Convert the model to HuggingFace format
python inference/convert_composer_to_hf.py \
  --composer_path mpt-125m/ep0-ba10-rank0.pt \
  --hf_output_path mpt-125m-hf \
  --output_precision bf16 \
  # --hf_repo_for_upload user-org/repo-name

# Evaluate the model on a subset of tasks
composer eval/eval.py \
  eval/yamls/hf_eval.yaml \
  icl_tasks=eval/yamls/copa.yaml \
  model_name_or_path=mpt-125m-hf

# Generate responses to prompts
python inference/hf_generate.py \
  --name_or_path mpt-125m-hf \
  --max_new_tokens 256 \
  --prompts \
    "The answer to life, the universe, and happiness is" \
    "Here's a quick recipe for baking chocolate chip cookies: Start by"

Note: the composer command used above to train the model refers to the Composer library's distributed launcher.

If you have a write-enabled HuggingFace auth token, you can optionally upload your model to the Hub! Just export your token like this:

export HF_TOKEN=your-auth-token

and uncomment the line containing --hf_repo_for_upload ... in the above call to inference/convert_composer_to_hf.py.

Registry

You can use the registry to customize your workflows without forking the library. Some components of LLM Foundry are registrable, such as models, loggers, and callbacks. This means that you can register new options for these components, and then use them in your yaml config.

Discovering registrable components

To help find and understand registrable components, you can use the llmfoundry registry CLI command.

We provide two commands currently:

  • llmfoundry registry get [--group]: List all registries, and their components, optionally specifying a specific registry. Example usage: llmfoundry registry get --group loggers or llmfoundry registry get
  • llmfoundry registry find <group> <name>: Get information about a specific registered component. Example usage: llmfoundry registry find loggers wandb

Use --help on any of these commands for more information.

These commands can also help you understand what each registry is composed of, as each registry contains a docstring that will be printed out. The general concept is that each registry defines an interface, and components registered to that registry must implement that interface. If there is a part of the library that is not currently extendable, but you think it should be, please open an issue!

How to register

There are a few ways to register a new component:

Python entrypoints

You can specify registered components via a Python entrypoint if you are building your own package with registered components. This is the expected usage if you are building a large extension to LLM Foundry and are going to be overriding many components. Note that components registered via entrypoints will override components registered directly in code.

For example, the following would register the MyLogger class, under the key my_logger, in the llmfoundry.registry.loggers registry:

[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "foundry_registry"
version = "0.1.0"
dependencies = [
    "mosaicml",
    "llm-foundry",
]

# Note: Even though in python code, this would be llmfoundry.registry.loggers,
# when specified in the entry_points, it has to be "llmfoundry_loggers". That is,
# the segments of the name should be joined by an _ in the entry_points section.
[project.entry-points."llmfoundry_loggers"]
my_logger = "foundry_registry.loggers:MyLogger"

If developing new components via entrypoints, it is important to note that Python entrypoints are global to the Python environment. This means that if you have multiple packages that register components with the same key, the last one installed will be the one used. This can be useful for overriding components in LLM Foundry, but can also lead to unexpected behavior if not careful. Additionally, if you change the pyproject.toml, you will need to reinstall the package for the changes to take effect. You can do this quickly by installing with pip install -e . --no-deps to avoid reinstalling dependencies.
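For reference, the module that the entry point above points at only needs to define the class; the package and class names below are the hypothetical foundry_registry example from the pyproject.toml above, not a real package:

# foundry_registry/loggers.py -- the module referenced by the entry point above.
# With entrypoint-based registration there is no explicit register() call here:
# LLM Foundry discovers MyLogger through the "llmfoundry_loggers" entry point.
from composer.loggers import LoggerDestination


class MyLogger(LoggerDestination):
    """Minimal stand-in; a real logger would override LoggerDestination methods."""
    pass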

Direct call to register

You can also register a component directly in your code:

from composer.loggers import LoggerDestination
from llmfoundry.registry import loggers

class MyLogger(LoggerDestination):
    pass

loggers.register("my_logger", func=MyLogger)

Decorators

You can also use decorators to register components directly from your code:

from composer.loggers import LoggerDestination
from llmfoundry.registry import loggers

@loggers.register("my_logger")
class MyLogger(LoggerDestination):
    pass

For both the direct call and decorator approaches, if using the LLM Foundry train/eval scripts, you will need to provide the code_paths argument, which is a list of files to execute in order to register your components. For example, you may have a file called foundry_imports.py that contains the following:

from foundry_registry.loggers import MyLogger
from llmfoundry.registry import loggers

loggers.register("my_logger", func=MyLogger)

You would then provide code_paths to the train/eval scripts in your yaml config:

...
code_paths:
  - foundry_imports.py
...

One of these would be the expected usage if you are building a small extension to LLM Foundry, only overriding a few components, and thus don't want to create an entire package.

Learn more about LLM Foundry!

Check out TUTORIAL.md to keep learning about working with LLM Foundry. The tutorial highlights example workflows, points you to other resources throughout the repo, and answers frequently asked questions!

Contact Us

If you run into any problems with the code, please file GitHub issues directly in this repo.

If you want to train LLMs on the MosaicML platform, reach out to us at demo@mosaicml.com!