PaddleNLP

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

12,655

3,052

12,655

567

View on GitHub

Top Related Projects

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

DeepSpeed

39,112

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

fairseq

31,682

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

NeMo

15,292

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

allennlp

11,862

An open-source NLP research library, built on PyTorch.

Quick Overview

PaddleNLP is an easy-to-use and powerful NLP library built on PaddlePaddle, an open-source deep learning platform. It provides a wide range of NLP tools, pre-trained models, and datasets for various natural language processing tasks. PaddleNLP aims to make NLP development more accessible and efficient for both researchers and practitioners.

Pros

Comprehensive collection of pre-trained models and datasets for various NLP tasks
Easy-to-use API with high-level abstractions for quick development
Seamless integration with PaddlePaddle ecosystem for efficient deep learning
Active development and regular updates from the community

Cons

Less widely adopted than libraries such as Hugging Face Transformers
Documentation and examples may be less extensive than those of more established libraries
Strong focus on Chinese NLP, which may limit its usefulness for other languages
Steeper learning curve for users unfamiliar with the PaddlePaddle framework

Code Examples

Text Classification

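from paddlenlp import Taskflow

# "text_classification" loads a model you have fine-tuned and exported yourself;
# the task_path below is a placeholder for that directory.
text_classifier = Taskflow("text_classification", task_path="./checkpoint/export")
result = text_classifier("这个产品很好用，推荐购买")  # "This product works well; I recommend buying it."
print(result)  # labels and scores depend on the fine-tuned model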

Named Entity Recognition

from paddlenlp import Taskflow

ner = Taskflow("ner")
# "Huawei is a large Chinese multinational technology company headquartered in Shenzhen, Guangdong."
result = ner("华为是一家总部位于广东省深圳市的中国大型跨国科技公司")
print(result)  # a list of (segment, label) tuples; labels come from PaddleNLP's Chinese entity label set

Sentiment Analysis

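from paddlenlp import Taskflow

senta = Taskflow("sentiment_analysis")
# "The food at this restaurant is delicious and the service is very good."
result = senta("这家餐厅的菜品非常美味，服务态度也很好")
print(result)  # a list with one dict per input, containing 'text', 'label', and 'score' keys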
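
Taskflow results for sentiment analysis are plain Python lists of dictionaries, so they can be post-processed without any PaddleNLP-specific code. The sketch below filters predictions by confidence. The helper name `confident_labels`, the 0.9 threshold, and the sample records are illustrative assumptions; only the `label` and `score` keys are taken from the outputs described above.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, float]]


def confident_labels(results: List[Record], threshold: float = 0.9) -> List[str]:
    """Return the labels of predictions whose score is at least `threshold`."""
    return [str(r["label"]) for r in results if float(r["score"]) >= threshold]


# Hand-written sample records shaped like Taskflow output (not real model output).
sample = [
    {"label": "positive", "score": 0.9999},
    {"label": "negative", "score": 0.62},
]
print(confident_labels(sample))
```

Lowering the threshold keeps more predictions at the cost of including less certain ones.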

Getting Started

To get started with PaddleNLP, follow these steps:

Install PaddleNLP:

pip install --upgrade paddlenlp

Import and use a pre-built task:

from paddlenlp import Taskflow

# Use a pre-built task that ships with a default model (sentiment analysis)
classifier = Taskflow("sentiment_analysis")
result = classifier("这是一个很棒的产品")  # "This is a great product."
print(result)

For more advanced usage and custom models, refer to the official documentation and examples in the GitHub repository.

Competitor Comparisons

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Pros of transformers

Larger community and more extensive documentation
Supports a wider range of models and tasks
More frequent updates and releases

Cons of transformers

Can be more complex for beginners
Potentially slower inference speed for some models
Larger package size and dependencies

Code Comparison

transformers:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

PaddleNLP:

from paddlenlp.transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pd")
outputs = model(**inputs)

Both libraries offer similar functionality for working with transformer models, with nearly identical syntax. The main difference is the backend: the transformers example returns PyTorch tensors (return_tensors="pt"), while PaddleNLP returns PaddlePaddle tensors (return_tensors="pd"). transformers generally has broader model support and a larger community, while PaddleNLP is more tightly integrated with the PaddlePaddle ecosystem and may perform better in some scenarios.

DeepSpeed

39,112

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

Highly optimized for distributed training and large-scale models
Supports various optimization techniques like ZeRO, pipeline parallelism, and 3D parallelism
Integrates well with popular frameworks like PyTorch and Hugging Face Transformers

Cons of DeepSpeed

Steeper learning curve due to its focus on advanced optimization techniques
Less comprehensive NLP-specific tools and pre-trained models compared to PaddleNLP
May be overkill for smaller-scale projects or simpler NLP tasks

Code Comparison

DeepSpeed:

import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(args=args, model=model, model_parameters=params)
for step, batch in enumerate(data_loader):
    loss = model_engine(batch)
    model_engine.backward(loss)
    model_engine.step()

PaddleNLP:

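import paddle
from paddlenlp.transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
optimizer = paddle.optimizer.AdamW(learning_rate=0.001, parameters=model.parameters())
criterion = paddle.nn.CrossEntropyLoss()
# Assumes train_data_loader yields (input_ids, token_type_ids, labels) batches
for input_ids, token_type_ids, labels in train_data_loader:
    logits = model(input_ids, token_type_ids)  # the model returns logits, not a loss
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.clear_grad()  # reset gradients before the next batch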

fairseq

31,682

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

More extensive documentation and examples
Wider range of pre-trained models available
Stronger community support and contributions

Cons of fairseq

Steeper learning curve for beginners
Less focus on Chinese language processing
Requires more computational resources for some tasks

Code Comparison

fairseq:

from fairseq.models.transformer import TransformerModel
en2de = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
en2de.translate('Hello world!')

PaddleNLP:

from paddlenlp.transformers import AutoModelForConditionalGeneration, AutoTokenizer
model = AutoModelForConditionalGeneration.from_pretrained("mbart-large-50-many-to-many-mmt")
tokenizer = AutoTokenizer.from_pretrained("mbart-large-50-many-to-many-mmt")

Both libraries offer similar functionality for working with transformer models, but fairseq's API is more specialized for machine translation tasks, while PaddleNLP provides a more general-purpose interface for various NLP tasks. PaddleNLP's code is often more concise and easier to read, especially for newcomers to the field.

NeMo

15,292

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Pros of NeMo

Specialized for conversational AI and speech tasks
Extensive documentation and tutorials
Strong GPU optimization and integration with NVIDIA hardware

Cons of NeMo

Limited to specific AI domains (speech, NLP)
Steeper learning curve for beginners
Requires NVIDIA GPUs for optimal performance

Code Comparison

NeMo example:

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")
transcription = asr_model.transcribe(["audio_file.wav"])

PaddleNLP example:

from paddlenlp.transformers import ErnieTokenizer, ErnieForSequenceClassification

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')

The NeMo example shows speech recognition, one of the domains NeMo covers alongside LLMs and multimodal models, while the PaddleNLP example loads a text classification model. NeMo's code here is specialized for audio processing, whereas PaddleNLP is a text-focused NLP toolkit.

allennlp

11,862

An open-source NLP research library, built on PyTorch.

Pros of AllenNLP

More extensive documentation and tutorials
Wider range of pre-implemented models and tasks
Stronger focus on research-oriented features

Cons of AllenNLP

Steeper learning curve for beginners
Less integration with deep learning frameworks other than PyTorch

Code Comparison

AllenNLP:

from allennlp.predictors import Predictor

predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
result = predictor.predict(sentence="Did Uriah honestly think he could beat the game in under three hours?")

PaddleNLP:

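from paddlenlp import Taskflow

# Taskflow has no built-in semantic role labeling task;
# dependency parsing is the closest structural analysis it provides.
ddp = Taskflow("dependency_parsing")
result = ddp("他叫我吃饭")  # "He asked me to eat."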

AllenNLP provides a more verbose but flexible approach, while PaddleNLP offers a simpler, more streamlined interface for common NLP tasks. AllenNLP's code demonstrates its research-oriented nature, allowing for more customization. PaddleNLP's code showcases its ease of use for quick implementations, especially for Chinese NLP tasks.

Both libraries have their strengths, with AllenNLP being more suitable for research and complex projects, while PaddleNLP excels in simplicity and Chinese language support.


README

Features | Supported Models | Installation | Quick Start | Community

News 📢

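2025.04.29 PaddleNLP now supports the Qwen3 model series: Qwen3 models support two thinking modes and were pre-trained on about 36 trillion tokens covering 119 languages and dialects. The series includes six dense models (Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B) and the weights of two MoE models (Qwen3-235B-A22B and Qwen3-30B-A3B).
2025.03.12 PaddleNLP v3.0 Beta4: full support for popular reasoning models, including DeepSeek V3/R1/R1-Distill and QwQ-32B. The full DeepSeek V3/R1 models support FP8, INT8, and 4-bit quantized inference as well as MTP speculative decoding. Single-node FP8 inference exceeds 1000 output tokens/s, and 4-bit inference exceeds 2100 output tokens/s. A new inference deployment image enables one-click deployment of popular models, and the inference deployment documentation has been fully updated. PP-UIE, the self-developed next-generation universal information extraction model, is newly released and supports information extraction over 8K-length inputs. LLM embedding training is added, with INF-CL support for very large batch sizes. The new MergeKit model-merging tool reduces the alignment tax. Low-resource training is comprehensively optimized, so training runs smoothly on a GPU with 16 GB of memory.
2025.02.10 PaddleNLP now supports the DeepSeek-R1 model series, available for online use: built on the new PaddleNLP 3.0 suite, the DeepSeek-R1 series is now fully supported. With data parallelism, sharded data parallelism, model parallelism, pipeline parallelism, and expert parallelism, combined with FlashMask, the Paddle framework's column-wise sparse attention mask representation, DeepSeek-R1 models use significantly less GPU memory during training while achieving substantially better training performance.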


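2025.03.17 "Hands-on Test: Single-Node Deployment of the Full DeepSeek-R1" 🔥🔥🔥 LLM inference deployment in PaddlePaddle framework 3.0 has been fully upgraded and supports many mainstream large models. The full DeepSeek-R1 can be deployed on a single node with doubled throughput. A prize campaign is now open: complete the DeepSeek-R1-MTP single-node deployment task and submit a high-quality test blog to win prize money 💰💰💰. Event details: https://github.com/PaddlePaddle/PaddleNLP/issues/10166 . Reference documentation: https://github.com/PaddlePaddle/PaddleNLP/issues/10157 .
2025.03.06 PaddleNLP now supports the Qwen/QwQ-32B model: it has only 32B parameters, yet its mathematical reasoning, coding, and general capabilities are comparable to DeepSeek-R1, which has 671B parameters (37B of them activated). With the PaddleNLP 3.0 suite, it supports fine-tuning with multiple parallel strategies, high-performance inference, low-bit quantization, and service deployment.
2025.02.20 🔥🔥 "PP-UIE Information Extraction Engine Upgraded": stronger zero-shot learning supports efficient cold start and transfer learning with very little or even no labeled data, significantly reducing annotation cost. Long-text support covers document information extraction up to 8192 tokens, identifying key information across paragraphs to form a complete understanding. A complete, customizable training and inference workflow is provided, with training 1.8 times as efficient as LLama-Factory. A session on February 26 (Wednesday) at 19:00 covers the new PP-UIE technical approach and its deployment features, advantages, and tips. Registration link: https://www.wjx.top/vm/mBKC6pb.aspx?udsid=606418
2024.12.16 PaddleNLP v3.0 Beta3: LLM features are upgraded. Llama-3.2 and DeepSeekV2 models are added, TokenizerFast is upgraded for fast tokenization, and SFTTrainer is refactored to start SFT training with one command. PaddleNLP also supports offloading and reloading optimizer states and implements fine-grained recomputation, improving training performance by 7%. For Unified Checkpoint, the asynchronous save logic is further optimized and a new checkpoint compression feature saves 78.5% of storage space. For LLM inference, Append Attention is upgraded, with support for FP8 quantization and speculative decoding.
2024.12.13 📚 "Unified Checkpoint Technology in the PaddlePaddle LLM Suite": speeds up model saving by 95% and saves 78% of storage space. It adapts automatically when the distributed strategy changes, improving the flexibility and scalability of model training. A unified storage protocol for training, compression, and inference removes manual conversion steps. Lossless checkpoint compression combined with asynchronous saving achieves second-level saves and lowers model storage cost. It applies to real industrial scenarios such as smart manufacturing, traffic command, healthcare, and financial services. A livestream on December 24 (Tuesday) at 19:00 explains how the technology optimizes the LLM training workflow. Registration link: https://www.wjx.top/vm/huZkHn9.aspx?udsid=787976
2024.11.28 📚 "FlashRAG-Paddle | An Efficient RAG Development and Evaluation Framework Based on PaddleNLP": builds accurate text embeddings faster and accelerates inference and generation. PaddleNLP supports embedding learning with very large batches and high-performance inference on multiple hardware platforms, covering INT8/INT4 quantization, several efficient attention optimizations, and deep TensorCore optimization. Built-in operator fusion across the whole pipeline makes FlashRAG inference more than 70% faster than transformers in dynamic-graph mode, and retrieval-augmented knowledge makes the output more accurate. Livestream: December 3 (Tuesday) at 19:00. Registration link: https://www.wjx.top/vm/eaBa1vA.aspx?udsid=682361
2024.08.08 📚 "PaddleNLP 3.0, PaddlePaddle's Industrial-Grade LLM Development Toolkit, Released": training, compression, and inference are connected end to end, with full coverage of mainstream models. LLM automatic parallelism makes the full training and inference workflow for 100-billion-parameter models work out of the box. It provides industrial-grade, high-performance fine-tuning and alignment solutions, leading compression and inference, and multi-hardware adaptation. It covers industrial application scenarios such as intelligent assistants, content creation, knowledge Q&A, and key information extraction. Livestream: August 22 (Thursday) at 19:00. Registration link: https://www.wjx.top/vm/Y2f7FFY.aspx?udsid=143844
2024.06.27 PaddleNLP v3.0 Beta: embracing large models with a fully upgraded experience. The unified LLM suite enables end-to-end support for domestic compute chips. It fully supports industrial LLM workflows, including PaddlePaddle 4D parallel configuration, efficient fine-tuning strategies, efficient alignment algorithms, and high-performance inference. The self-developed, fast-converging RsLoRA+ algorithm, the Unified Checkpoint storage mechanism with automatic scaling, and general support for FastFFN and FusedQKV accelerate LLM training and inference. Mainstream models continue to be supported and updated.
2024.04.24 PaddleNLP v2.8: the self-developed, fast-converging RsLoRA+ algorithm greatly improves PEFT convergence speed and training quality. High-performance generation acceleration is introduced into the RLHF PPO algorithm, removing the generation-speed bottleneck in PPO training and substantially improving PPO training performance. General support for several LLM training optimizations, including FastFFN and FusedQKV, makes LLM training faster and more stable.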

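Features

🔧 Unified training and inference across multiple hardware platforms

🚀 Efficient and easy-to-use pre-training

🤗 Efficient fine-tuning

Documentation

For more detailed documentation, visit PaddleNLP Documentation.

Supported Models

Model weights are supported for the LLaMA, Baichuan, Bloom, ChatGLM, Gemma, Mistral, OPT, and Qwen series. The supported LLM model weights are listed below:

Model series	Model names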
PP-UIE	paddlenlp/PP-UIE-0.5B, paddlenlp/PP-UIE-1.5B, paddlenlp/PP-UIE-7B, paddlenlp/PP-UIE-14B
LLaMA	facebook/llama-7b, facebook/llama-13b, facebook/llama-30b, facebook/llama-65b
Llama2	meta-llama/Llama-2-7b, meta-llama/Llama-2-7b-chat, meta-llama/Llama-2-13b, meta-llama/Llama-2-13b-chat, meta-llama/Llama-2-70b, meta-llama/Llama-2-70b-chat
Llama3	meta-llama/Meta-Llama-3-8B, meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B, meta-llama/Meta-Llama-3-70B-Instruct
Llama3.1	meta-llama/Meta-Llama-3.1-8B, meta-llama/Meta-Llama-3.1-8B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3.1-70B-Instruct, meta-llama/Meta-Llama-3.1-405B, meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Llama-Guard-3-8B
Llama3.2	meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B, meta-llama/Llama-3.2-3B-Instruct, meta-llama/Llama-Guard-3-1B
Llama3.3	meta-llama/Llama-3.3-70B-Instruct
Baichuan	baichuan-inc/Baichuan-7B, baichuan-inc/Baichuan-13B-Base, baichuan-inc/Baichuan-13B-Chat
Baichuan2	baichuan-inc/Baichuan2-7B-Base, baichuan-inc/Baichuan2-7B-Chat, baichuan-inc/Baichuan2-13B-Base, baichuan-inc/Baichuan2-13B-Chat
Bloom	bigscience/bloom-560m, bigscience/bloom-560m-bf16, bigscience/bloom-1b1, bigscience/bloom-3b, bigscience/bloom-7b1, bigscience/bloomz-560m, bigscience/bloomz-1b1, bigscience/bloomz-3b, bigscience/bloomz-7b1-mt, bigscience/bloomz-7b1-p3, bigscience/bloomz-7b1, bellegroup/belle-7b-2m
ChatGLM	THUDM/chatglm-6b, THUDM/chatglm-6b-v1.1
ChatGLM2	THUDM/chatglm2-6b
ChatGLM3	THUDM/chatglm3-6b
DeepSeekV2	deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, deepseek-ai/DeepSeek-V2-Lite, deepseek-ai/DeepSeek-V2-Lite-Chat, deepseek-ai/DeepSeek-Coder-V2-Base, deepseek-ai/DeepSeek-Coder-V2-Instruct, deepseek-ai/DeepSeek-Coder-V2-Lite-Base, deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
DeepSeekV3	deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-V3-Base
DeepSeek-R1	deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-Zero, deepseek-ai/DeepSeek-R1-Distill-Llama-70B, deepseek-ai/DeepSeek-R1-Distill-Llama-8B, deepseek-ai/DeepSeek-R1-Distill-Qwen-14B, deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
Gemma	google/gemma-7b, google/gemma-7b-it, google/gemma-2b, google/gemma-2b-it
Mistral	mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-7B-v0.1
Mixtral	mistralai/Mixtral-8x7B-Instruct-v0.1
OPT	facebook/opt-125m, facebook/opt-350m, facebook/opt-1.3b, facebook/opt-2.7b, facebook/opt-6.7b, facebook/opt-13b, facebook/opt-30b, facebook/opt-66b, facebook/opt-iml-1.3b, facebook/opt-iml-max-1.3b
Qwen	qwen/qwen-7b, qwen/qwen-7b-chat, qwen/qwen-14b, qwen/qwen-14b-chat, qwen/qwen-72b, qwen/qwen-72b-chat
Qwen1.5	Qwen/Qwen1.5-0.5B, Qwen/Qwen1.5-0.5B-Chat, Qwen/Qwen1.5-1.8B, Qwen/Qwen1.5-1.8B-Chat, Qwen/Qwen1.5-4B, Qwen/Qwen1.5-4B-Chat, Qwen/Qwen1.5-7B, Qwen/Qwen1.5-7B-Chat, Qwen/Qwen1.5-14B, Qwen/Qwen1.5-14B-Chat, Qwen/Qwen1.5-32B, Qwen/Qwen1.5-32B-Chat, Qwen/Qwen1.5-72B, Qwen/Qwen1.5-72B-Chat, Qwen/Qwen1.5-110B, Qwen/Qwen1.5-110B-Chat, Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat
Qwen2	Qwen/Qwen2-0.5B, Qwen/Qwen2-0.5B-Instruct, Qwen/Qwen2-1.5B, Qwen/Qwen2-1.5B-Instruct, Qwen/Qwen2-7B, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-72B, Qwen/Qwen2-72B-Instruct, Qwen/Qwen2-57B-A14B, Qwen/Qwen2-57B-A14B-Instruct
Qwen2-Math	Qwen/Qwen2-Math-1.5B, Qwen/Qwen2-Math-1.5B-Instruct, Qwen/Qwen2-Math-7B, Qwen/Qwen2-Math-7B-Instruct, Qwen/Qwen2-Math-72B, Qwen/Qwen2-Math-72B-Instruct, Qwen/Qwen2-Math-RM-72B
Qwen2.5	Qwen/Qwen2.5-0.5B, Qwen/Qwen2.5-0.5B-Instruct, Qwen/Qwen2.5-1.5B, Qwen/Qwen2.5-1.5B-Instruct, Qwen/Qwen2.5-3B, Qwen/Qwen2.5-3B-Instruct, Qwen/Qwen2.5-7B, Qwen/Qwen2.5-7B-Instruct, Qwen/Qwen2.5-7B-Instruct-1M, Qwen/Qwen2.5-14B, Qwen/Qwen2.5-14B-Instruct, Qwen/Qwen2.5-14B-Instruct-1M, Qwen/Qwen2.5-32B, Qwen/Qwen2.5-32B-Instruct, Qwen/Qwen2.5-72B, Qwen/Qwen2.5-72B-Instruct
Qwen2.5-Math	Qwen/Qwen2.5-Math-1.5B, Qwen/Qwen2.5-Math-1.5B-Instruct, Qwen/Qwen2.5-Math-7B, Qwen/Qwen2.5-Math-7B-Instruct, Qwen/Qwen2.5-Math-72B, Qwen/Qwen2.5-Math-72B-Instruct, Qwen/Qwen2.5-Math-RM-72B
Qwen2.5-Coder	Qwen/Qwen2.5-Coder-1.5B, Qwen/Qwen2.5-Coder-1.5B-Instruct, Qwen/Qwen2.5-Coder-7B, Qwen/Qwen2.5-Coder-7B-Instruct
Qwen3	Qwen/Qwen3-0.6B, Qwen/Qwen3-1.7B, Qwen/Qwen3-4B, Qwen/Qwen3-8B, Qwen/Qwen3-14B, Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B, Qwen/Qwen3-235B-A22B, Qwen/Qwen3-0.6B-Base, Qwen/Qwen3-1.7B-Base, Qwen/Qwen3-4B-Base, Qwen/Qwen3-8B-Base, Qwen/Qwen3-14B-Base, Qwen/Qwen3-30B-A3B-Base
QwQ	Qwen/QwQ-32B, Qwen/QwQ-32B-Preview
Yuan2	IEITYuan/Yuan2-2B, IEITYuan/Yuan2-51B, IEITYuan/Yuan2-102B

4D parallelism and operator optimizations are supported for the LLaMA, Baichuan, Bloom, ChatGLM, Gemma, Mistral, OPT, and Qwen series. The LLM support list for 4D parallelism and operators is as follows:

Model / parallel capability	Data parallel	Tensor model parallel: basic	Tensor model parallel: sequence parallel	Sharding parallel: stage1	Sharding parallel: stage2	Sharding parallel: stage3	Pipeline parallel
Llama	✅	✅	✅	✅	✅	✅	✅
Qwen	✅	✅	✅	✅	✅	✅	✅
Qwen1.5	✅	✅	✅	✅	✅	✅	✅
Qwen2	✅	✅	✅	✅	✅	✅	✅
Mixtral(moe)	✅	✅	✅	✅	✅	✅	🚧
Mistral	✅	✅	🚧	✅	✅	✅	🚧
Baichuan	✅	✅	✅	✅	✅	✅	✅
Baichuan2	✅	✅	✅	✅	✅	✅	✅
ChatGLM	✅	✅	🚧	✅	✅	✅	🚧
ChatGLM2	✅	🚧	🚧	✅	✅	✅	🚧
ChatGLM3	✅	🚧	🚧	✅	✅	✅	🚧
Bloom	✅	✅	🚧	✅	✅	✅	🚧
GPT-2/GPT-3	✅	✅	✅	✅	✅	✅	✅
OPT	✅	✅	🚧	✅	✅	✅	🚧
Gemma	✅	✅	✅	✅	✅	✅	✅
Yuan2	✅	✅	✅	✅	✅	✅	🚧
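
The parallel strategies in the table compose multiplicatively: the number of devices has to equal the product of the data, sharding, tensor, and pipeline parallel degrees. A minimal sketch of that arithmetic follows. The function name and the example degrees are assumptions chosen for illustration, not PaddleNLP APIs.

```python
def data_parallel_degree(world_size: int, tensor: int = 1, pipeline: int = 1, sharding: int = 1) -> int:
    """Derive the data-parallel degree left over once the other degrees are fixed."""
    model_group = tensor * pipeline * sharding
    if model_group <= 0 or world_size % model_group != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tensor*pipeline*sharding={model_group}"
        )
    return world_size // model_group


# Example: 32 devices split as tensor=4, pipeline=2, sharding=2 leave a data-parallel degree of 2.
print(data_parallel_degree(32, tensor=4, pipeline=2, sharding=2))
```

Checking divisibility up front catches misconfigured launches before any device is initialized.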

LLM pre-training, fine-tuning (including SFT and PEFT techniques), alignment, and quantization are supported for the LLaMA, Baichuan, Bloom, ChatGLM, Mistral, OPT, and Qwen series. The LLM support list for pre-training, fine-tuning, alignment, and quantization is as follows:

Model	Pretrain	SFT	LoRA	FlashMask	Prefix Tuning	DPO/SimPO/ORPO/KTO	RLHF	Mergekit	Quantization
Llama	✅	✅	✅	✅	✅	✅	✅	✅	✅
Qwen	✅	✅	✅	✅	✅	✅	🚧	✅	🚧
Mixtral	✅	✅	✅	🚧	🚧	✅	🚧	✅	🚧
Mistral	✅	✅	✅	🚧	✅	✅	🚧	✅	🚧
Baichuan/Baichuan2	✅	✅	✅	✅	✅	✅	🚧	✅	✅
ChatGLM-6B	✅	✅	✅	🚧	✅	🚧	🚧	✅	✅
ChatGLM2/ChatGLM3	✅	✅	✅	🚧	✅	✅	🚧	✅	✅
Bloom	✅	✅	✅	🚧	✅	🚧	🚧	✅	✅
GPT-3	✅	✅	🚧	🚧	🚧	🚧	🚧	✅	🚧
OPT	✅	✅	✅	🚧	🚧	🚧	🚧	✅	🚧
Gemma	✅	✅	✅	🚧	🚧	✅	🚧	✅	🚧
Yuan	✅	✅	✅	🚧	🚧	✅	🚧	✅	🚧

LLM inference is supported for the LLaMA, Qwen, DeepSeek, Mistral, ChatGLM, Bloom, and Baichuan series. Weight-only INT8 and INT4 inference is supported, as is inference with INT8 and FP8 quantization of WAC (weights, activations, and cache KV). The LLM inference support list is as follows:

Model / quantization type	FP16/BF16	WINT8	WINT4	INT8-A8W8	FP8-A8W8	INT8-A8W8C8
LLaMA	✅	✅	✅	✅	✅	✅
Qwen	✅	✅	✅	✅	✅	✅
DeepSeek	✅	✅	✅	🚧	✅	🚧
Qwen-Moe	✅	✅	✅	🚧	🚧	🚧
Mixtral	✅	✅	✅	🚧	🚧	🚧
ChatGLM	✅	✅	✅	🚧	🚧	🚧
Bloom	✅	✅	✅	🚧	🚧	🚧
BaiChuan	✅	✅	✅	✅	✅	🚧
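
The weight-only modes in the table (WINT8, WINT4) shrink the memory taken by model weights roughly in proportion to the bit width. The sketch below estimates weight storage only; it ignores activations, the KV cache, and quantization scales, and the 7-billion-parameter figure is just an example size, not a measurement of any listed model.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB for a given bit width."""
    return num_params * bits_per_weight / 8 / 2**30


# Compare the three weight formats for a hypothetical 7B-parameter model.
for name, bits in [("FP16/BF16", 16), ("WINT8", 8), ("WINT4", 4)]:
    print(f"{name}: {weight_memory_gib(7e9, bits):.1f} GiB")
```

Halving the bit width halves the weight footprint, which is why 4-bit weights can fit a model on a device that cannot hold its 16-bit form.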

Installation

Requirements

python >= 3.8
paddlepaddle >= 3.0.0rc1
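
Before installing, it can help to confirm that the environment meets these requirements. The following is a minimal standard-library sketch; the helper names are hypothetical, and it only reports version strings rather than comparing them, since pre-release tags such as `3.0.0rc1` need a proper version parser. GPU builds of PaddlePaddle may be published under a different distribution name, which this sketch does not check.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= tuple(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python >= 3.8:", python_ok())
print("paddlepaddle:", installed_version("paddlepaddle"))
print("paddlenlp:", installed_version("paddlenlp"))
```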

Install with pip

pip install --upgrade paddlenlp==3.0.0b4

Alternatively, install the latest code from the develop branch with:

pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html

For detailed instructions on installing PaddlePaddle and PaddleNLP, see Installation.

Quick Start

LLM text generation

PaddleNLP provides an easy-to-use Auto API for quickly loading models and tokenizers. The following example uses the Qwen/Qwen2-0.5B model for text generation:

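from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
# if using CPU, please change float16 to float32
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
# Prompt: "Hello! Please introduce yourself."
input_features = tokenizer("你好！请自我介绍一下。", return_tensors="pd")
# generate() returns a tuple; its first element holds the generated token ids
outputs = model.generate(**input_features, max_new_tokens=128)
print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
# ['我是一个AI语言模型，我可以回答各种问题，包括但不限于：天气、新闻、历史、文化、科学、教育、娱乐等。请问您有什么需要了解的吗？']
# (English: "I am an AI language model. I can answer all kinds of questions, including but not limited to
# weather, news, history, culture, science, education, and entertainment. Is there anything you would like to know?")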

LLM pre-training

git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP # skip if PaddleNLP is already cloned or downloaded
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.bin
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.idx
cd .. # change folder to PaddleNLP/llm
# to use use_fused_rms_norm=true, first install fused_ln from slm/model_zoo/gpt-3/external_ops
python -u run_pretrain.py ./config/qwen/pretrain_argument_0p5b.json

LLM SFT fine-tuning

git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP # skip if PaddleNLP is already cloned or downloaded
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/datasets/examples/AdvertiseGen.tar.gz && tar -zxvf AdvertiseGen.tar.gz
cd .. # change folder to PaddleNLP/llm
python -u run_finetune.py ./config/qwen/sft_argument_0p5b.json

from paddlenlp.trl import SFTConfig, SFTTrainer
from datasets import load_dataset

dataset = load_dataset("ZHUI/alpaca_demo", split="train")

training_args = SFTConfig(output_dir="Qwen/Qwen2.5-0.5B-SFT", device="gpu")
trainer = SFTTrainer(
    args=training_args,
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=dataset,
)
trainer.train()
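
The run_finetune.py command above reads training data as JSON lines. The sketch below shows one way to produce such lines, assuming the `src`/`tgt` field names described in the PaddleNLP LLM fine-tuning documentation; the input keys `instruction` and `response`, the helper name, and the sample pair are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def to_sft_lines(records: Iterable[Dict[str, str]]) -> List[str]:
    """Serialize instruction/response pairs as one JSON object per line with src/tgt fields."""
    return [
        json.dumps({"src": r["instruction"], "tgt": r["response"]}, ensure_ascii=False)
        for r in records
    ]


# Hypothetical example pair; real training data would come from your own corpus.
pairs = [
    {"instruction": "Describe a red cotton T-shirt.", "response": "A soft red T-shirt made of cotton."}
]
for line in to_sft_lines(pairs):
    print(line)
```

Passing `ensure_ascii=False` keeps Chinese text readable in the output file instead of escaping it.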

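For more PaddleNLP content, see:

Model zoo: end-to-end usage of high-quality pre-trained models.
Multi-scenario examples: how to use PaddleNLP to solve a range of NLP problems, covering basic techniques, system applications, and extended applications.
Interactive tutorials: learn PaddleNLP quickly on AI Studio, a 🆓 free compute platform.

Community

Scan the QR code with WeChat and fill out the questionnaire to join the discussion group and talk with community developers and the official team.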

Citation

@misc{paddlenlp,
    title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
    author={PaddleNLP Contributors},
    howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
    year={2021}
}

Acknowledgements

We have borrowed from the excellent design of Hugging Face's Transformers🤗 for using pre-trained models, and we thank the Hugging Face authors and their open-source community.

License

PaddleNLP is released under the Apache-2.0 license.
