FlagAI
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
Top Related Projects
- transformers: 🤗 Transformers, the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
- DeepSpeed: a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- Megatron-LM: ongoing research training transformer models at scale.
- gpt-neox: an implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.
- AllenNLP: an open-source NLP research library, built on PyTorch.
- fairseq: Facebook AI Research's sequence-to-sequence toolkit written in Python.
Quick Overview
FlagAI is an open-source AI toolkit developed by BAAI (Beijing Academy of Artificial Intelligence). It provides a comprehensive set of tools and models for natural language processing, computer vision, and multimodal tasks. FlagAI aims to facilitate AI research and application development by offering pre-trained models, efficient training frameworks, and easy-to-use APIs.
Pros
- Comprehensive toolkit covering multiple AI domains (NLP, CV, multimodal)
- Offers pre-trained models and efficient training frameworks
- Supports multiple training backends, including PyTorch, DeepSpeed, Megatron-LM, and BMTrain
- Provides easy-to-use APIs for quick integration and deployment
Cons
- Documentation may be limited or not as extensive as some other popular AI libraries
- Community support might be smaller compared to more established frameworks
- May have a steeper learning curve for beginners due to its comprehensive nature
- Some features or models might be more focused on Chinese language processing
Code Examples
- Loading a pre-trained BERT model:
from flagai.auto_model.auto_loader import AutoLoader
loader = AutoLoader(task_name="text_classification", model_name="BERT-base-en")
model = loader.get_model()
tokenizer = loader.get_tokenizer()
- Performing text classification (a sketch; FlagAI models typically take batched tensor inputs and return a dict of outputs, though the exact format can vary by model and version):
import torch

text = "FlagAI is an excellent AI toolkit."
tokens = tokenizer.tokenize(text)
input_ids = tokenizer.convert_tokens_to_ids(tokens)
# Batch the ids into a tensor and pull the logits out of the output dict
logits = model(**{"input_ids": torch.tensor([input_ids])})["logits"]
predicted_class = logits.argmax(-1).item()
- Fine-tuning a model on a custom dataset (argument names may vary across FlagAI versions):
from flagai.trainer import Trainer
trainer = Trainer(
    env_type="pytorch",
    experiment_name="bert_classification",
    batch_size=16,
    gradient_accumulation_steps=1,
    epochs=3,
    num_gpus=1,
    save_interval=1000,
    eval_interval=100,
)
# train_dataset and valid_dataset are datasets you have prepared yourself
trainer.train(model, train_dataset=train_dataset, valid_dataset=valid_dataset)
Getting Started
To get started with FlagAI, follow these steps:
- Install FlagAI using pip:
pip install flagai
- Import the necessary modules:
from flagai.auto_model.auto_loader import AutoLoader
from flagai.trainer import Trainer
- Load a pre-trained model and tokenizer:
loader = AutoLoader(task_name="text_classification", model_name="BERT-base-en")
model = loader.get_model()
tokenizer = loader.get_tokenizer()
- Use the model for inference or fine-tuning as shown in the code examples above.
Competitor Comparisons
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Pros of transformers
- Extensive model support: Covers a wide range of NLP tasks and architectures
- Large community and ecosystem: Frequent updates, extensive documentation, and third-party integrations
- Seamless integration with PyTorch and TensorFlow
Cons of transformers
- Can be complex for beginners due to its extensive features and options
- Larger library size and potential overhead for simpler projects
- May require more computational resources for some models
Code Comparison
transformers:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
FlagAI:
from flagai.auto_model.auto_loader import AutoLoader
auto_loader = AutoLoader("seq2seq", "GLM-large-ch")
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
Both libraries offer similar functionality for loading pre-trained models and tokenizers, but transformers provides a more standardized approach across different model architectures. FlagAI focuses on specific models and tasks, potentially offering a more streamlined experience for supported use cases.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- More mature and widely adopted in the industry
- Extensive documentation and community support
- Broader range of optimization techniques and features
Cons of DeepSpeed
- Steeper learning curve for beginners
- Primarily focused on PyTorch, limiting flexibility for other frameworks
- Can be complex to configure for specific use cases
Code Comparison
DeepSpeed:
import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(
args=args, model=model, model_parameters=params
)
FlagAI:
from flagai.auto_model.auto_loader import AutoLoader
loader = AutoLoader(task_name="text_classification",
model_name="BERT-base-en")
model = loader.get_model()
DeepSpeed offers more fine-grained control over optimization and distributed training, while FlagAI provides a simpler, more user-friendly interface for common NLP tasks. DeepSpeed is better suited for large-scale, performance-critical applications, whereas FlagAI is more accessible for quick prototyping and smaller projects.
Ongoing research training transformer models at scale
Pros of Megatron-LM
- Highly optimized for NVIDIA GPUs, offering excellent performance for large-scale language models
- Supports advanced parallelism techniques like tensor, pipeline, and sequence parallelism
- Extensive documentation and examples for training and fine-tuning various model architectures
Cons of Megatron-LM
- Limited flexibility for non-NVIDIA hardware or cloud environments
- Steeper learning curve due to its focus on high-performance, distributed training
- Less emphasis on easy-to-use APIs for downstream tasks and applications
Code Comparison
Megatron-LM (model initialization):
model = get_language_model(
attention_mask_func, num_tokentypes=num_tokentypes, add_pooler=add_pooler,
init_method=init_method, scaled_init_method=scaled_init_method)
FlagAI (model initialization):
from flagai.model.base_model import BaseModel
model = BaseModel.from_pretrain(model_name=model_name)  # FlagAI spells it from_pretrain
model.to(device)
Megatron-LM focuses on distributed training and optimization, while FlagAI emphasizes ease of use and quick deployment. Megatron-LM's code is more complex, reflecting its advanced features, while FlagAI's API is more straightforward for common tasks. Both projects aim to facilitate large-scale language model development but cater to different user needs and hardware setups.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of gpt-neox
- Specialized for training large language models, particularly GPT-style models
- Extensive documentation and community support
- Highly optimized for distributed training on multiple GPUs
Cons of gpt-neox
- Limited flexibility for other AI tasks beyond language modeling
- Steeper learning curve for users new to large-scale language model training
- Requires significant computational resources for optimal performance
Code Comparison
gpt-neox:
from megatron.neox_arguments import NeoXArgs
from megatron.global_vars import set_global_variables, get_tokenizer
from megatron.training import pretrain
args = NeoXArgs.from_ymls(["configs/your_config.yml"])  # from_ymls expects a list of config paths
set_global_variables(args)
FlagAI:
from flagai.auto_model.auto_loader import AutoLoader
from flagai.trainer import Trainer
auto_loader = AutoLoader("lm", model_name="GLM-large-ch")
model = auto_loader.get_model()
trainer = Trainer(env_type="pytorch", pytorch_device="cuda")
The code snippets demonstrate the different approaches to model initialization and training setup. gpt-neox focuses on large-scale distributed training, while FlagAI offers a more user-friendly interface for various AI tasks.
An open-source NLP research library, built on PyTorch.
Pros of AllenNLP
- More established and widely used in the NLP research community
- Extensive documentation and tutorials available
- Strong integration with PyTorch and support for various NLP tasks
Cons of AllenNLP
- Steeper learning curve for beginners
- Less focus on large-scale language models and multi-modal tasks
- May require more setup and configuration for certain tasks
Code Comparison
AllenNLP:
from allennlp.predictors import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
result = predictor.predict(sentence="Did Uriah honestly think he could beat the game in under three hours?")
FlagAI:
from flagai.auto_model.auto_loader import AutoLoader
loader = AutoLoader("seq2seq", "THUDM/chatglm-6b", use_cache=True)
model = loader.get_model()
tokenizer = loader.get_tokenizer()
Both libraries offer easy-to-use interfaces for loading and using pre-trained models, but FlagAI seems to have a more streamlined approach for large language models.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- More established and widely used in the research community
- Extensive documentation and examples for various NLP tasks
- Supports a broader range of architectures and models
Cons of fairseq
- Steeper learning curve for beginners
- Less focus on Chinese language models and tasks
- Requires more setup and configuration for specific use cases
Code Comparison
FlagAI:
from flagai.auto_model.auto_loader import AutoLoader
loader = AutoLoader("seq2seq", "GLM-large-ch")
model = loader.get_model()
tokenizer = loader.get_tokenizer()
fairseq:
from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model')
tokens = model.encode('Hello world')  # encode returns a token tensor, not a tokenizer
FlagAI focuses on simplifying the process of loading and using pre-trained models, especially for Chinese language tasks. It provides a more streamlined API for common use cases.
fairseq offers more flexibility and control over model architecture and training process, but requires more code and configuration to set up and use models.
Both libraries support various NLP tasks, but FlagAI has a stronger emphasis on Chinese language models and applications, while fairseq covers a broader range of languages and architectures.
README
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models. Our goal is to support training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality.
Why should I use FlagAI?
- Quickly download models via API: FlagAI provides an API that allows you to quickly download pre-trained models and fine-tune them on a wide range of datasets collected from the SuperGLUE and CLUE benchmarks for both Chinese and English text. FlagAI now supports over 30 mainstream models, including the language model Aquila, the multilingual text-image representation model AltCLIP, the text-to-image generation model AltDiffusion, WuDao GLM (with up to 10 billion parameters), EVA-CLIP, OPT, BERT, RoBERTa, GPT2, T5, ALM, and models from Huggingface Transformers.
- Parallel training with fewer than 10 lines of code: Backed by four popular data/model parallel libraries (PyTorch, DeepSpeed, Megatron-LM, BMTrain), FlagAI integrates them seamlessly, so you can parallelize your training and testing with fewer than ten lines of code; see the sketch just after this list.
- Convenient few-shot learning toolkits: FlagAI also provides a prompt-learning toolkit for few-shot tasks.
- Particularly good at Chinese tasks: FlagAI models can be applied to Chinese and English text for tasks such as text classification, information extraction, question answering, summarization, and text generation, with a particular focus on Chinese tasks.
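A minimal sketch of such a backend switch (assuming a model and train_dataset prepared as in the Quick Start below; deepspeed.json is a placeholder config path you supply, and exact argument names may vary across FlagAI versions):
from flagai.trainer import Trainer
# env_type selects the parallel backend, e.g. "pytorch", "pytorchDDP",
# "deepspeed", or "bmtrain"; the surrounding training code stays the same.
trainer = Trainer(
    env_type="deepspeed",
    experiment_name="parallel_demo",
    batch_size=8,
    num_gpus=2,
    deepspeed_config="deepspeed.json",  # placeholder DeepSpeed config file
)
trainer.train(model, train_dataset=train_dataset)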
Toolkits and Pre-trained Models
The code is partially based on GLM, Transformers, timm, and DeepSpeedExamples.
Toolkits
Name | Description | Examples |
---|---|---|
GLM_custom_pvp | Customizing PET templates | README.md |
GLM_ptuning | p-tuning tool | N/A |
BMInf-generate | Accelerating generation | README.md |
Pre-trained Models
Model | Task | Train | Finetune | Inference/Generate | Examples |
---|---|---|---|---|---|
Aquila | Natural Language Processing | ✅ | ✅ | ✅ | README.md |
ALM | Arabic Text Generation | ✅ | ✅ | ✅ | README.md |
AltCLIP | Image-Text Matching | ✅ | ✅ | ✅ | README.md |
AltCLIP-m18 | Image-Text Matching | ✅ | ✅ | ✅ | README.md |
AltDiffusion | Text-to-Image Generation | ✅ | ✅ | ✅ | README.md |
AltDiffusion-m18 | Text-to-Image Generation, supporting 18 languages | ✅ | ✅ | ✅ | README.md |
BERT-title-generation-english | English Title Generation | ✅ | ✅ | ✅ | README.md |
CLIP | Image-Text Matching | ✅ | ✅ | ✅ | N/A |
CPM3-finetune | Text Continuation | ✅ | ✅ | ✅ | N/A |
CPM3-generate | Text Continuation | ✅ | ✅ | ✅ | N/A |
CPM3_pretrain | Text Continuation | ✅ | ✅ | ✅ | N/A |
CPM_1 | Text Continuation | ✅ | ✅ | ✅ | README.md |
EVA-CLIP | Image-Text Matching | ✅ | ✅ | ✅ | README.md |
Galactica | Text Continuation | ✅ | ✅ | ✅ | N/A |
GLM-large-ch-blank-filling | Blank Filling | ✅ | ✅ | ✅ | TUTORIAL |
GLM-large-ch-poetry-generation | Poetry Generation | ✅ | ✅ | ✅ | TUTORIAL |
GLM-large-ch-title-generation | Title Generation | ✅ | ✅ | ✅ | TUTORIAL |
GLM-pretrain | Pre-Train | ✅ | ✅ | ✅ | N/A |
GLM-seq2seq | Generation | ✅ | ✅ | ✅ | N/A |
GLM-superglue | Classification | ✅ | ✅ | ✅ | N/A |
GPT-2-text-writting | Text Continuation | ✅ | ✅ | ✅ | TUTORIAL |
GPT2-text-writting | Text Continuation | ✅ | ✅ | ✅ | N/A |
GPT2-title-generation | Title Generation | ✅ | ✅ | ✅ | N/A |
OPT | Text Continuation | ✅ | ✅ | ✅ | README.md |
RoBERTa-base-ch-ner | Named Entity Recognition | ✅ | ✅ | ✅ | TUTORIAL |
RoBERTa-base-ch-semantic-matching | Semantic Similarity Matching | ✅ | ✅ | ✅ | TUTORIAL |
RoBERTa-base-ch-title-generation | Title Generation | ✅ | ✅ | ✅ | TUTORIAL |
RoBERTa-faq | Question-Answer | ✅ | ✅ | ✅ | README.md |
Swinv1 | Image Classification | ✅ | ✅ | ✅ | N/A |
Swinv2 | Image Classification | ✅ | ✅ | ✅ | N/A |
T5-huggingface-11b | Train | ✅ | ✅ | ✅ | TUTORIAL |
T5-title-generation | Title Generation | ✅ | ✅ | ✅ | TUTORIAL |
T5-flagai-11b | Pre-Train | ✅ | ✅ | ✅ | N/A |
ViT-cifar100 | Pre-Train | ✅ | ✅ | ✅ | N/A |
- More examples in ./examples
- More tutorials in ./docs
Contributing
Thanks for your interest in contributing! There are many ways to get involved; start with our contributor guidelines and then check these open issues for specific tasks.
Contact us
Feel free to raise questions or feature requests on GitHub Issues, and share your experience on the Discussions board.
- Official email: open.platform@baai.ac.cn.
- Zhihu: FlagAI
- Scan the QR code to join the WeChat group for communication:
[WeChat group QR code]
Quick Start
We provide many models trained to perform different tasks. You can load these models with AutoLoader to make predictions. See more in FlagAI/quickstart.
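As a quick taste, here is a minimal end-to-end sketch combining the loading and prediction APIs shown in the sections below (the article text is a placeholder):
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

auto_loader = AutoLoader(task_name="title-generation", model_name="BERT-base-en")
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()

predictor = Predictor(model, tokenizer)
# Generate a title for a placeholder article with beam search
print(predictor.predict_generate_beamsearch("your article text here",
                                            out_max_length=50,
                                            beam_size=3))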
Requirements and Installation
- Python version >= 3.8
- PyTorch version >= 1.8.0
- [Optional] For training/testing models on GPUs, you'll also need to install CUDA and NCCL
- To install FlagAI with pip:
pip install -U flagai
- [Optional] To install FlagAI and develop locally:
git clone https://github.com/FlagAI-Open/FlagAI.git
cd FlagAI
python setup.py install
- [Optional] For faster training, install NVIDIA's apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
- [Optional] For ZeRO optimizers, install DEEPSPEED (>= 0.7.7)
git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e .
ds_report # check the DeepSpeed status
- [Optional] For BMTrain training, install BMTrain (>= 0.2.2)
git clone https://github.com/OpenBMB/BMTrain
cd BMTrain
python setup.py install
- [Optional] For BMInf low-resource inference, install BMInf
pip install bminf
- [Optional] For Flash Attention, install Flash-attention (>=1.0.2)
pip install flash-attn
- [Tips] For single-node docker environments, you need to set up a port for your ssh, e.g., root@127.0.0.1 with port 7110:
>>> vim ~/.ssh/config
Host 127.0.0.1
Hostname 127.0.0.1
Port 7110
User root
- [Tips] For multi-node docker environments, generate ssh keys and copy the public key to all nodes (in ~/.ssh/):
>>> ssh-keygen -t rsa -C "xxx@xxx.com"
Load model and tokenizer
We provide the AutoLoader class to load the model and tokenizer quickly, for example:
from flagai.auto_model.auto_loader import AutoLoader
auto_loader = AutoLoader(
task_name="title-generation",
model_name="BERT-base-en"
)
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
This example is for the title-generation task, and you can also run other tasks by modifying the task_name.
Then you can use the model and tokenizer to fine-tune or test.
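For example, a minimal fine-tuning sketch (train_dataset and valid_dataset below are hypothetical placeholders for datasets you have prepared; argument names may vary across FlagAI versions):
from flagai.trainer import Trainer

trainer = Trainer(env_type="pytorch",
                  experiment_name="bert_title_generation",
                  batch_size=8,
                  epochs=3,
                  save_interval=1000)
trainer.train(model, train_dataset=train_dataset, valid_dataset=valid_dataset)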
Examples
1. Predictor
We provide the Predictor class to predict for different tasks, for example:
from flagai.model.predictor.predictor import Predictor
predictor = Predictor(model, tokenizer)
test_data = [
"Four minutes after the red card, Emerson Royal nodded a corner into the path of the unmarked Kane at the far post, who nudged the ball in for his 12th goal in 17 North London derby appearances. Arteta's misery was compounded two minutes after half-time when Kane held the ball up in front of goal and teed up Son to smash a shot beyond a crowd of defenders to make it 3-0.The goal moved the South Korea talisman a goal behind Premier League top scorer Mohamed Salah on 21 for the season, and he looked perturbed when he was hauled off with 18 minutes remaining, receiving words of consolation from Pierre-Emile Hojbjerg.Once his frustrations have eased, Son and Spurs will look ahead to two final games in which they only need a point more than Arsenal to finish fourth.",
]
for text in test_data:
print(
predictor.predict_generate_beamsearch(text,
out_max_length=50,
beam_size=3))
This example is for the seq2seq task, where we can get beam-search results by calling the predict_generate_beamsearch function. In addition, we also support prediction for tasks such as NER and title generation.
2. NER
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
task_name = "ner"
model_name = "RoBERTa-base-ch"
target = ["O", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "B-PER", "I-PER"]
maxlen = 256
auto_loader = AutoLoader(task_name,
model_name=model_name,
load_pretrain_params=True,
class_num=len(target))
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
predictor = Predictor(model, tokenizer)
test_data = [
    "6月15日，河南省文物考古研究院曹操高陵文物队公开发表声明承认：“从来没有说过出土的珠子是墓主人的”",
    "4月8日，北京冬奥会、冬残奥会总结表彰大会在人民大会堂隆重举行。习近平总书记出席大会并发表重要讲话。在讲话中，总书记充分肯定了北京冬奥会、冬残奥会取得的优异成绩，全面回顾了7年筹办备赛的不凡历程，深入总结了筹备举办北京冬奥会、冬残奥会的宝贵经验，深刻阐释了北京冬奥精神，对运用好冬奥遗产推动高质量发展提出明确要求。",
    "当地时间8日，欧盟委员会表示，欧盟各成员国政府现已冻结共计约300亿欧元与俄罗斯寡头和其他被制裁的俄方人员有关的资产。",
    "这一盘口状态下英国必发公司亚洲盘交易数据显示博洛尼亚热。而从欧赔投注看，也是主队热。巴勒莫两连败，",
]
for t in test_data:
entities = predictor.predict_ner(t, target, maxlen=maxlen)
result = {}
for e in entities:
if e[2] not in result:
result[e[2]] = [t[e[0]:e[1] + 1]]
else:
result[e[2]].append(t[e[0]:e[1] + 1])
print(f"result is {result}")
3. Semantic Matching example
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
maxlen = 256
auto_loader = AutoLoader("semantic-matching",
model_name="RoBERTa-base-ch",
load_pretrain_params=True,
class_num=2)
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
predictor = Predictor(model, tokenizer)
test_data = [["后悔了吗", "你有没有后悔"], ["打开自动横屏", "开启移动数据"],
             ["我觉得你很聪明", "你聪明我是这么觉得"]]
for text_pair in test_data:
print(predictor.predict_cls_classifier(text_pair))
LICENSE
The majority of FlagAI is licensed under the Apache 2.0 license; however, portions of the project are available under separate license terms:
- Megatron-LM is licensed under the Megatron-LM license
- GLM is licensed under the MIT license
- AltDiffusion is licensed under the CreativeML Open RAIL-M license
News
- [9 June 2023] release v1.7.0, Support Aquila #324;
- [31 Mar 2023] release v1.6.3, Support AltCLIP-m18 #303 and AltDiffusion-m18 #302;
- [17 Mar 2023] release v1.6.2, Support application of new optimizers #266, and added a new gpt model name 'GPT2-base-en' for English;
- [2 Mar 2023] release v1.6.1, Support Galactica model #234; BMInf, a low-resource inference package #238, and examples for p-tuning #227
- [12 Jan 2023] release v1.6.0, support a new parallel library called BMTrain and integrate Flash Attention to speed up training of BERT and ViT models, examples in FlashAttentionBERT and FlashAttentionViT. Also add the contrastive-search-based text generation method SimCTG and DreamBooth finetuning based on AltDiffusion, examples in AltDiffusionNaruto.
- [28 Nov 2022] release v1.5.0, support 1.1B EVA-CLIP and ALM (a large Arabic language model based on GLM), examples in ALM
- [10 Nov 2022] release v1.4.0, support AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities, examples in AltCLIP and AltDiffusion
- [29 Aug 2022] release v1.3.0, Added CLIP module and redesigned tokenizer APIs in #81
- [21 Jul 2022] release v1.2.0, ViTs are supported in #71
- [29 Jun 2022] release v1.1.0, support OPTs downloading and inference/fine-tuning #63
- [17 May 2022] made our first contribution in #1
Platforms supported

Misc
- Stargazers: thank you for your support!
- Forkers: thank you for your support!
- Star History