PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with an 🤗 awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
Top Related Projects
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
AllenNLP: An open-source NLP research library, built on PyTorch.
Quick Overview
PaddleNLP is an easy-to-use and powerful NLP library built on PaddlePaddle, an open-source deep learning platform. It provides a wide range of NLP tools, pre-trained models, and datasets for various natural language processing tasks. PaddleNLP aims to make NLP development more accessible and efficient for both researchers and practitioners.
Pros
- Comprehensive collection of pre-trained models and datasets for various NLP tasks
- Easy-to-use API with high-level abstractions for quick development
- Seamless integration with PaddlePaddle ecosystem for efficient deep learning
- Active development and regular updates from the community
Cons
- Less popular than some other NLP libraries, such as Hugging Face Transformers
- Documentation and examples may not be as extensive as those of more established libraries
- Primarily focused on Chinese NLP, which may limit its applicability for some users
- Steeper learning curve for those not familiar with the PaddlePaddle framework
Code Examples
- Text Classification
from paddlenlp import Taskflow
text_classifier = Taskflow("text_classification")
# "This product is easy to use; I recommend buying it."
result = text_classifier("这个产品很好用,推荐购买")
print(result)  # predicted label(s) with score(s); exact format depends on the PaddleNLP version and model
- Named Entity Recognition
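from paddlenlp import Taskflow
ner = Taskflow("ner")
# "Huawei is a large Chinese multinational technology company headquartered in Shenzhen, Guangdong Province."
result = ner("华为是一家总部位于广东省深圳市的中国大型跨国科技公司")
# In the default (accurate) mode this prints a list of (word, tag) tuples,
# with Chinese WordTag category names as tags rather than English labels.
print(result)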
- Sentiment Analysis
from paddlenlp import Taskflow
senta = Taskflow("sentiment_analysis")
# "The food at this restaurant is delicious, and the service is very good too."
result = senta("这家餐厅的菜品非常美味,服务态度也很好")
print(result)  # a list like [{'text': ..., 'label': 'positive', 'score': ...}]
Getting Started
To get started with PaddleNLP, follow these steps:
- Install PaddleNLP (PaddlePaddle must be installed first):
pip install --upgrade paddlenlp
- Import and use a pre-built task:
from paddlenlp import Taskflow
# Use a pre-built task for text classification
classifier = Taskflow("text_classification")
result = classifier("这是一个很棒的产品")  # "This is a great product."
print(result)
For more advanced usage and custom models, refer to the official documentation and examples in the GitHub repository.
Competitor Comparisons
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Pros of transformers
- Larger community and more extensive documentation
- Supports a wider range of models and tasks
- More frequent updates and releases
Cons of transformers
- Can be more complex for beginners
- Potentially slower inference speed for some models
- Larger package size and dependencies
Code Comparison
transformers:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
PaddleNLP:
from paddlenlp.transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pd")
outputs = model(**inputs)
Both libraries offer similar functionality for working with transformer models, with slight differences in syntax and backend frameworks. This transformers example uses PyTorch, although transformers also supports TensorFlow and JAX, while PaddleNLP uses PaddlePaddle. For example, tensors are requested with return_tensors="pt" in transformers and return_tensors="pd" in PaddleNLP. transformers generally has broader model support and a larger community. PaddleNLP may offer better performance in some scenarios and is more tightly integrated with the PaddlePaddle ecosystem.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Highly optimized for large-scale model training and inference
- Supports a wide range of hardware configurations, including multi-GPU and multi-node setups
- Offers advanced features like ZeRO optimizer and pipeline parallelism
Cons of DeepSpeed
- Steeper learning curve compared to PaddleNLP
- Primarily focused on optimization techniques rather than providing a comprehensive NLP toolkit
- May require more manual configuration for specific use cases
Code Comparison
DeepSpeed:
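import deepspeed
# `args` are parsed command-line arguments (e.g. --deepspeed_config pointing at a JSON config),
# `model` is a torch.nn.Module, and `params` are its trainable parameters.
model_engine, optimizer, _, _ = deepspeed.initialize(args=args,
                                                     model=model,
                                                     model_parameters=params)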
PaddleNLP:
from paddlenlp.transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_classes=2)
DeepSpeed focuses on optimizing model training and inference, while PaddleNLP provides a more comprehensive NLP toolkit with pre-built models and easy-to-use APIs. DeepSpeed requires more manual configuration but offers advanced optimization techniques, whereas PaddleNLP emphasizes simplicity and ease of use for NLP tasks.
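The ZeRO optimizer listed among DeepSpeed's pros reduces per-device memory by partitioning training state across data-parallel ranks. The sketch below reproduces the memory arithmetic described in the ZeRO paper for mixed-precision Adam: 2 bytes each for fp16 parameters and gradients, plus K = 12 bytes of fp32 optimizer state per parameter. It ignores activations and buffers. The function name is hypothetical and is not a DeepSpeed API. The three stages appear to correspond to the sharding stage1/stage2/stage3 columns in the PaddleNLP parallelism table further down this page.

```python
def zero_memory_per_device(num_params: float, dp_degree: int, stage: int, k: int = 12) -> float:
    """Bytes of model state held by one data-parallel rank under ZeRO stage 0-3.

    Mixed-precision Adam: 2 bytes of fp16 parameters, 2 bytes of fp16 gradients,
    and k bytes of fp32 optimizer state (master weights, momentum, variance)
    per parameter. Activations and temporary buffers are ignored.
    """
    params = 2 * num_params
    grads = 2 * num_params
    optim = k * num_params
    if stage == 0:   # plain data parallelism: every rank holds everything
        return params + grads + optim
    if stage == 1:   # optimizer states partitioned across ranks
        return params + grads + optim / dp_degree
    if stage == 2:   # gradients partitioned as well
        return params + (grads + optim) / dp_degree
    if stage == 3:   # parameters partitioned as well
        return (params + grads + optim) / dp_degree
    raise ValueError("stage must be 0, 1, 2, or 3")


if __name__ == "__main__":
    # 7.5B parameters on 64 data-parallel GPUs, the example used in the ZeRO paper
    for stage in range(4):
        gigabytes = zero_memory_per_device(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gigabytes:.1f} GB")
```

Running it prints 120.0, 31.4, 16.6, and 1.9 GB for stages 0 to 3. Each stage trades extra communication for a smaller per-device footprint.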
fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- More extensive documentation and examples
- Wider range of pre-trained models available
- Stronger community support and contributions
Cons of fairseq
- Steeper learning curve for beginners
- Less focus on Chinese language processing
- Requires more computational resources for some tasks
Code Comparison
fairseq:
from fairseq.models.transformer import TransformerModel
en2de = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
en2de.translate('Hello world!')
PaddleNLP:
from paddlenlp.transformers import AutoModelForConditionalGeneration, AutoTokenizer
model = AutoModelForConditionalGeneration.from_pretrained("mbart-large-50-many-to-many-mmt")
tokenizer = AutoTokenizer.from_pretrained("mbart-large-50-many-to-many-mmt")
Both libraries offer similar functionality for working with transformer models. fairseq's API is more specialized for sequence-to-sequence tasks such as machine translation, while PaddleNLP provides a more general-purpose interface for various NLP tasks. PaddleNLP's Auto classes can be easier for newcomers to read. Note that the PaddleNLP snippet above only loads an mBART model and its tokenizer; unlike the fairseq snippet, it does not run a translation.
NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
Pros of NeMo
- Specialized for conversational AI and speech tasks
- Extensive documentation and tutorials
- Strong GPU optimization and integration with NVIDIA hardware
Cons of NeMo
- Limited to specific AI domains (speech, NLP)
- Steeper learning curve for beginners
- Requires NVIDIA GPUs for optimal performance
Code Comparison
NeMo example:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")
transcription = asr_model.transcribe(["audio_file.wav"])
PaddleNLP example:
from paddlenlp.transformers import ErnieTokenizer, ErnieForSequenceClassification
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')
The NeMo example above performs speech recognition, while the PaddleNLP example loads a text classification model. NeMo is more specialized for speech and audio processing, whereas PaddleNLP offers a broader, general-purpose NLP toolkit.
AllenNLP: An open-source NLP research library, built on PyTorch.
Pros of AllenNLP
- More extensive documentation and tutorials
- Wider range of pre-implemented models and tasks
- Stronger focus on research-oriented features
Cons of AllenNLP
- Steeper learning curve for beginners
- Less integration with deep learning frameworks other than PyTorch
Code Comparison
AllenNLP:
from allennlp.predictors import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
result = predictor.predict(sentence="Did Uriah honestly think he could beat the game in under three hours?")
PaddleNLP:
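from paddlenlp import Taskflow
# Taskflow does not appear to ship a "semantic_role_labeling" task;
# dependency parsing is a built-in syntactic analysis task used here for comparison.
ddp = Taskflow("dependency_parsing")
result = ddp("他叫我吃饭")  # "He asked me to come and eat."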
AllenNLP provides a more verbose but flexible approach, while PaddleNLP offers a simpler, more streamlined interface for common NLP tasks. AllenNLP's code demonstrates its research-oriented nature, allowing for more customization. PaddleNLP's code showcases its ease of use for quick implementations, especially for Chinese NLP tasks.
Both libraries have their strengths, with AllenNLP being more suitable for research and complex projects, while PaddleNLP excels in simplicity and Chinese language support.
README
Features | Supported Models | Installation | Quick Start | Community
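PaddleNLP is a large language model (LLM) development toolkit built on the PaddlePaddle deep learning framework. It supports efficient LLM training, lossless compression, and high-performance inference on a variety of hardware. PaddleNLP is designed to be **easy to use** and **extremely fast**, and it aims to help developers build efficient, industrial-grade LLM applications.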
News 📢
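- **2024.12.16 PaddleNLP v3.0 Beta3**: A major upgrade to LLM features. It adds the Llama-3.2 and DeepSeekV2 models, upgrades TokenizerFast for fast tokenization, and refactors SFTTrainer so that SFT training can be started in one step. PaddleNLP also now supports offloading and reloading optimizer states and fine-grained recomputation, which improve training performance by 7%. For Unified Checkpoint, the asynchronous saving logic has been further optimized, and a new checkpoint compression feature saves 78.5% of storage space. Finally, for LLM inference, Append Attention has been upgraded and now supports FP8 quantization and speculative decoding.
- 2024.12.13 📚 "Unified Checkpoint Technology in the PaddlePaddle LLM Toolkit": speeds up model saving by 95% and saves 78% of storage space. Checkpoints are converted automatically when the distributed strategy changes, which makes model training more flexible and scalable. A unified storage protocol across training, compression, and inference removes the need for manual conversion and improves the end-to-end experience. Lossless checkpoint compression combined with asynchronous saving brings saving time down to seconds and lowers model storage costs. The technology applies to real industrial scenarios such as smart manufacturing, traffic management, healthcare, and financial services. A livestream on December 24 (Tuesday) at 19:00 explains in detail how it optimizes the LLM training workflow. Registration link: https://www.wjx.top/vm/huZkHn9.aspx?udsid=787976
- 2024.11.28 📚 "FlashRAG-Paddle: An Efficient RAG Development and Evaluation Framework Based on PaddleNLP": builds accurate text embeddings faster and better, and speeds up inference and generation. PaddleNLP supports embedding learning with very large batches and high-performance inference on multiple hardware backends. This covers INT8/INT4 quantization, several efficient attention optimizations, and deep TensorCore optimization. Built-in end-to-end operator fusion makes FlashRAG inference more than 70% faster than transformers in dynamic-graph mode, and retrieval-augmented knowledge makes outputs more accurate. Livestream: December 3 (Tuesday) at 19:00. Registration link: https://www.wjx.top/vm/eaBa1vA.aspx?udsid=682361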
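- 2024.08.08 📚 "PaddleNLP 3.0, PaddlePaddle's Industrial-Grade LLM Development Tool, Is Released": training, compression, and inference are connected end to end, with full coverage of mainstream models. Automatic parallelism for LLMs makes the full training and inference workflow for 100-billion-parameter models work out of the box. The release provides industrial-grade, high-performance fine-tuning and alignment solutions, leading compression and inference, and support for multiple hardware backends. It covers application scenarios such as industrial intelligent assistants, content creation, knowledge Q&A, and key information extraction. Livestream: August 22 (Thursday) at 19:00. Registration link: https://www.wjx.top/vm/Y2f7FFY.aspx?udsid=143844
- **2024.06.27 PaddleNLP v3.0 Beta**: Embracing LLMs with a fully upgraded experience. The unified LLM toolkit provides end-to-end support for domestic Chinese computing chips. It fully supports industrial LLM workflows, including PaddlePaddle 4D parallel configuration, efficient fine-tuning strategies, efficient alignment algorithms, and high-performance inference. Several components accelerate LLM training and inference: the in-house RsLoRA+ algorithm with extremely fast convergence, the Unified Checkpoint storage mechanism with automatic scaling, and generalized FastFFN and FusedQKV support. Mainstream models continue to be supported and updated with efficient solutions.
- **2024.04.24 PaddleNLP v2.8**: The in-house RsLoRA+ algorithm with extremely fast convergence substantially improves PEFT training convergence speed and results. High-performance generation acceleration is introduced into the RLHF PPO algorithm, which removes the generation-speed bottleneck in PPO training and gives PPO training a large performance lead. Generalized support for multiple LLM training optimizations, such as FastFFN and FusedQKV, makes LLM training faster and more stable.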
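Features
🔧 Unified Training and Inference Across Hardware
PaddleNLP supports training and inference of LLMs and natural language understanding (NLU) models on multiple hardware platforms, including NVIDIA GPU, Kunlun XPU, Ascend NPU, Enflame GCU, and Hygon DCU. The toolkit interface lets you switch hardware quickly, which greatly reduces the R&D cost of switching. Currently supported NLU models: see the multi-hardware NLU model list.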
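🚀 Efficient and Easy-to-Use Pre-training
PaddleNLP supports 4D high-performance training that combines pure data parallelism, data parallelism with grouped parameter sharding, tensor model parallelism, and pipeline model parallelism. Trainer makes distributed strategies configurable, which lowers the cost of using complex distributed combinations. With the Unified Checkpoint LLM storage tool, training can resume from a checkpoint even after machine resources are dynamically scaled up or down. In addition, asynchronous saving speeds up model saving by 95%, and checkpoint compression saves 78.5% of storage space.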
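In 4D hybrid parallelism, the number of devices equals the product of the four parallel degrees, and each device (rank) occupies one coordinate in that 4D grid. The sketch below shows that bookkeeping under an assumed axis order: data, sharding, pipeline, tensor, with tensor parallelism innermost so its ranks are adjacent. The axis order, names, and functions are hypothetical and do not describe PaddleNLP's internal layout.

```python
from itertools import product

AXES = ("data", "sharding", "pipeline", "tensor")  # assumed order, innermost last


def rank_to_coords(rank: int, degrees: dict) -> dict:
    """Map a global rank to its (data, sharding, pipeline, tensor) coordinates."""
    world_size = 1
    for axis in AXES:
        world_size *= degrees[axis]
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world size {world_size}")
    coords = {}
    for axis in reversed(AXES):        # peel off the innermost axis first
        coords[axis] = rank % degrees[axis]
        rank //= degrees[axis]
    return {axis: coords[axis] for axis in AXES}


def group_members(rank: int, axis: str, degrees: dict) -> list:
    """Ranks that differ from `rank` only along `axis`, i.e. its communication group."""
    me = rank_to_coords(rank, degrees)
    choices = [range(degrees[a]) if a == axis else [me[a]] for a in AXES]
    members = []
    for combo in product(*choices):
        r = 0
        for a, c in zip(AXES, combo):  # rebuild the rank from its coordinates
            r = r * degrees[a] + c
        members.append(r)
    return members


if __name__ == "__main__":
    # 8 GPUs: 2-way sharding x 2-way pipeline x 2-way tensor parallelism
    degrees = {"data": 1, "sharding": 2, "pipeline": 2, "tensor": 2}
    print(rank_to_coords(5, degrees))
    print(group_members(5, "tensor", degrees), group_members(5, "pipeline", degrees))
```

Rank 5 lands at sharding index 1, pipeline stage 0, and tensor index 1. Its tensor-parallel peers are ranks 4 and 5, and its pipeline peers are ranks 5 and 7. Keeping tensor parallelism innermost is a common choice: its collectives are the most frequent, so placing those ranks on the same node keeps that traffic on the fastest links.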
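🤗 Efficient Fine-tuning
The fine-tuning algorithms are deeply integrated with zero-padding data streams and the high-performance FlashMask operator. This reduces padding and wasted computation on invalid data during training and greatly improves fine-tuning throughput.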
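Zero-padding data streams reduce the compute spent on padding tokens. The sketch below contrasts two approaches in plain Python and counts the wasted slots for each. The first pads every sequence in a batch to the longest one. The second greedily packs several sequences into fixed-length rows using first-fit decreasing. This is an illustration of sequence packing in general, not PaddleNLP's data pipeline. In real packed training, an attention mask must also stop tokens from attending across sequence boundaries, presumably the role FlashMask plays here, and this sketch does not model that.

```python
def padding_waste(lengths: list) -> int:
    """Padding slots needed when every sequence is padded to the longest one."""
    longest = max(lengths)
    return sum(longest - n for n in lengths)


def pack_sequences(lengths: list, max_len: int) -> list:
    """First-fit-decreasing packing of sequence lengths into rows of size max_len.

    Returns a list of rows, each a list of the sequence lengths placed in it.
    """
    rows, free = [], []
    for n in sorted(lengths, reverse=True):
        if n > max_len:
            raise ValueError(f"sequence of length {n} exceeds max_len {max_len}")
        for i, space in enumerate(free):
            if n <= space:             # first existing row with enough room
                rows[i].append(n)
                free[i] -= n
                break
        else:                          # no row fits: open a new one
            rows.append([n])
            free.append(max_len - n)
    return rows


def packing_waste(lengths: list, max_len: int) -> int:
    """Unused slots left over after packing into max_len-sized rows."""
    return len(pack_sequences(lengths, max_len)) * max_len - sum(lengths)


if __name__ == "__main__":
    lengths = [512, 100, 300, 60, 200, 400, 50]
    print("padded:", padding_waste(lengths), "packed:", packing_waste(lengths, 512))
```

For these seven example sequences, padding to the longest one wastes 1962 slots, while packing them into 512-token rows wastes 426.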
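🎛️ Lossless Compression and High-Performance Inference
The toolkit's high-performance inference module has built-in dynamic insertion and end-to-end operator fusion strategies that greatly speed up parallel inference. Low-level implementation details are encapsulated, so high-performance parallel inference works out of the box.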
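This sketch assumes that "dynamic insertion" refers to continuous batching. In continuous batching, when one request in the running batch finishes, a waiting request is inserted into the freed slot instead of waiting for the whole batch to drain. The toy simulation below compares the number of decode steps needed under static and dynamic batching. It assumes each request needs a known number of decode steps and ignores prefill cost. The function names are hypothetical.

```python
from collections import deque


def static_batching_steps(request_lengths: list, batch_size: int) -> int:
    """Decode steps when each batch runs until its longest request finishes."""
    steps = 0
    for start in range(0, len(request_lengths), batch_size):
        steps += max(request_lengths[start:start + batch_size])
    return steps


def dynamic_batching_steps(request_lengths: list, batch_size: int) -> int:
    """Decode steps when finished requests are immediately replaced by waiting ones."""
    waiting = deque(request_lengths)
    running = []                       # remaining decode steps for each active slot
    steps = 0
    while waiting or running:
        # fill any free slots with waiting requests ("dynamic insertion")
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # one decode step produces one token for every running request
        steps += 1
        running = [left - 1 for left in running if left > 1]
    return steps


if __name__ == "__main__":
    lengths = [100, 10, 10, 10, 100, 10, 10, 10]
    print(static_batching_steps(lengths, 4), dynamic_batching_steps(lengths, 4))
```

For the example lengths, static batching needs 200 decode steps and dynamic insertion needs 110. The difference arises because short requests no longer wait for the long request in their batch to finish.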
Supported Models
- Model weights are available for the LLaMA, Baichuan, Bloom, ChatGLM, Gemma, Mistral, OPT, and Qwen series. The detailed 👉 [LLM] list of supported model weights is as follows:
Model series | Model names |
---|---|
LLaMA | facebook/llama-7b, facebook/llama-13b, facebook/llama-30b, facebook/llama-65b |
Llama2 | meta-llama/Llama-2-7b, meta-llama/Llama-2-7b-chat, meta-llama/Llama-2-13b, meta-llama/Llama-2-13b-chat, meta-llama/Llama-2-70b, meta-llama/Llama-2-70b-chat |
Llama3 | meta-llama/Meta-Llama-3-8B, meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B, meta-llama/Meta-Llama-3-70B-Instruct |
Llama3.1 | meta-llama/Meta-Llama-3.1-8B, meta-llama/Meta-Llama-3.1-8B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3.1-70B-Instruct, meta-llama/Meta-Llama-3.1-405B, meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Llama-Guard-3-8B |
Llama3.2 | meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B, meta-llama/Llama-3.2-3B-Instruct, meta-llama/Llama-Guard-3-1B |
Llama3.3 | meta-llama/Llama-3.3-70B-Instruct |
Baichuan | baichuan-inc/Baichuan-7B, baichuan-inc/Baichuan-13B-Base, baichuan-inc/Baichuan-13B-Chat |
Baichuan2 | baichuan-inc/Baichuan2-7B-Base, baichuan-inc/Baichuan2-7B-Chat, baichuan-inc/Baichuan2-13B-Base, baichuan-inc/Baichuan2-13B-Chat |
Bloom | bigscience/bloom-560m, bigscience/bloom-560m-bf16, bigscience/bloom-1b1, bigscience/bloom-3b, bigscience/bloom-7b1, bigscience/bloomz-560m, bigscience/bloomz-1b1, bigscience/bloomz-3b, bigscience/bloomz-7b1-mt, bigscience/bloomz-7b1-p3, bigscience/bloomz-7b1, bellegroup/belle-7b-2m |
ChatGLM | THUDM/chatglm-6b, THUDM/chatglm-6b-v1.1 |
ChatGLM2 | THUDM/chatglm2-6b |
ChatGLM3 | THUDM/chatglm3-6b |
DeepSeekV2 | deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, deepseek-ai/DeepSeek-V2-Lite, deepseek-ai/DeepSeek-V2-Lite-Chat, deepseek-ai/DeepSeek-Coder-V2-Base, deepseek-ai/DeepSeek-Coder-V2-Instruct, deepseek-ai/DeepSeek-Coder-V2-Lite-Base, deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct |
Gemma | google/gemma-7b, google/gemma-7b-it, google/gemma-2b, google/gemma-2b-it |
Mistral | mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-7B-v0.1 |
Mixtral | mistralai/Mixtral-8x7B-Instruct-v0.1 |
OPT | facebook/opt-125m, facebook/opt-350m, facebook/opt-1.3b, facebook/opt-2.7b, facebook/opt-6.7b, facebook/opt-13b, facebook/opt-30b, facebook/opt-66b, facebook/opt-iml-1.3b, opt-iml-max-1.3b |
Qwen | qwen/qwen-7b, qwen/qwen-7b-chat, qwen/qwen-14b, qwen/qwen-14b-chat, qwen/qwen-72b, qwen/qwen-72b-chat |
Qwen1.5 | Qwen/Qwen1.5-0.5B, Qwen/Qwen1.5-0.5B-Chat, Qwen/Qwen1.5-1.8B, Qwen/Qwen1.5-1.8B-Chat, Qwen/Qwen1.5-4B, Qwen/Qwen1.5-4B-Chat, Qwen/Qwen1.5-7B, Qwen/Qwen1.5-7B-Chat, Qwen/Qwen1.5-14B, Qwen/Qwen1.5-14B-Chat, Qwen/Qwen1.5-32B, Qwen/Qwen1.5-32B-Chat, Qwen/Qwen1.5-72B, Qwen/Qwen1.5-72B-Chat, Qwen/Qwen1.5-110B, Qwen/Qwen1.5-110B-Chat, Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat |
Qwen2 | Qwen/Qwen2-0.5B, Qwen/Qwen2-0.5B-Instruct, Qwen/Qwen2-1.5B, Qwen/Qwen2-1.5B-Instruct, Qwen/Qwen2-7B, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-72B, Qwen/Qwen2-72B-Instruct, Qwen/Qwen2-57B-A14B, Qwen/Qwen2-57B-A14B-Instruct |
Qwen2-Math | Qwen/Qwen2-Math-1.5B, Qwen/Qwen2-Math-1.5B-Instruct, Qwen/Qwen2-Math-7B, Qwen/Qwen2-Math-7B-Instruct, Qwen/Qwen2-Math-72B, Qwen/Qwen2-Math-72B-Instruct, Qwen/Qwen2-Math-RM-72B |
Qwen2.5 | Qwen/Qwen2.5-0.5B, Qwen/Qwen2.5-0.5B-Instruct, Qwen/Qwen2.5-1.5B, Qwen/Qwen2.5-1.5B-Instruct, Qwen/Qwen2.5-3B, Qwen/Qwen2.5-3B-Instruct, Qwen/Qwen2.5-7B, Qwen/Qwen2.5-7B-Instruct, Qwen/Qwen2.5-14B, Qwen/Qwen2.5-14B-Instruct, Qwen/Qwen2.5-32B, Qwen/Qwen2.5-32B-Instruct, Qwen/Qwen2.5-72B, Qwen/Qwen2.5-72B-Instruct |
Qwen2.5-Math | Qwen/Qwen2.5-Math-1.5B, Qwen/Qwen2.5-Math-1.5B-Instruct, Qwen/Qwen2.5-Math-7B, Qwen/Qwen2.5-Math-7B-Instruct, Qwen/Qwen2.5-Math-72B, Qwen/Qwen2.5-Math-72B-Instruct, Qwen/Qwen2.5-Math-RM-72B |
Qwen2.5-Coder | Qwen/Qwen2.5-Coder-1.5B, Qwen/Qwen2.5-Coder-1.5B-Instruct, Qwen/Qwen2.5-Coder-7B, Qwen/Qwen2.5-Coder-7B-Instruct |
Yuan2 | IEITYuan/Yuan2-2B, IEITYuan/Yuan2-51B, IEITYuan/Yuan2-102B |
- 4D parallelism and operator optimizations are supported for the LLaMA, Baichuan, Bloom, ChatGLM, Gemma, Mistral, OPT, and Qwen series. The [LLM] list of 4D parallelism and operator support is as follows:
Model / Parallelism | Data parallel | Tensor parallel: basic | Tensor parallel: sequence parallel | Sharding: stage1 | Sharding: stage2 | Sharding: stage3 | Pipeline parallel |
---|---|---|---|---|---|---|---|
Llama | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen1.5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Mixtral(moe) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 |
Mistral | ✅ | ✅ | 🚧 | ✅ | ✅ | ✅ | 🚧 |
Baichuan | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Baichuan2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
ChatGLM | ✅ | ✅ | 🚧 | ✅ | ✅ | ✅ | 🚧 |
ChatGLM2 | ✅ | 🚧 | 🚧 | ✅ | ✅ | ✅ | 🚧 |
ChatGLM3 | ✅ | 🚧 | 🚧 | ✅ | ✅ | ✅ | 🚧 |
Bloom | ✅ | ✅ | 🚧 | ✅ | ✅ | ✅ | 🚧 |
GPT-2/GPT-3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
OPT | ✅ | ✅ | 🚧 | ✅ | ✅ | ✅ | 🚧 |
Gemma | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Yuan2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 |
- LLM pre-training, fine-tuning (including SFT and PEFT), alignment, and quantization are supported for the LLaMA, Baichuan, Bloom, ChatGLM, Mistral, OPT, and Qwen series. The [LLM] list of pre-training, fine-tuning, alignment, and quantization support is as follows:
Model | Pretrain | SFT | LoRA | FlashMask | Prefix Tuning | DPO/SimPO/ORPO | RLHF | Mergekit | Quantization |
---|---|---|---|---|---|---|---|---|---|
Llama | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 |
Mixtral | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ | 🚧 | ✅ | 🚧 |
Mistral | ✅ | ✅ | ✅ | 🚧 | ✅ | ✅ | 🚧 | ✅ | 🚧 |
Baichuan/Baichuan2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 | ✅ | ✅ |
ChatGLM-6B | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 | 🚧 | ✅ | ✅ |
ChatGLM2/ChatGLM3 | ✅ | ✅ | ✅ | 🚧 | ✅ | ✅ | 🚧 | ✅ | ✅ |
Bloom | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 | 🚧 | ✅ | ✅ |
GPT-3 | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | ✅ | 🚧 |
OPT | ✅ | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | ✅ | 🚧 |
Gemma | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ | 🚧 | ✅ | 🚧 |
Yuan | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ | 🚧 | ✅ | 🚧 |
- LLM inference is supported for the LLaMA, Qwen, Mistral, ChatGLM, Bloom, and Baichuan series. This includes weight-only INT8 and INT4 inference, as well as INT8 and FP8 quantized inference with WAC (weights, activations, and KV cache). The [LLM] list of inference support is as follows:
Model / Quantization type | FP16/BF16 | WINT8 | WINT4 | INT8-A8W8 | FP8-A8W8 | INT8-A8W8C8 |
---|---|---|---|---|---|---|
LLaMA | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Qwen-Moe | ✅ | ✅ | ✅ | 🚧 | 🚧 | 🚧 |
Mixtral | ✅ | ✅ | ✅ | 🚧 | 🚧 | 🚧 |
ChatGLM | ✅ | ✅ | ✅ | 🚧 | 🚧 | 🚧 |
Bloom | ✅ | ✅ | ✅ | 🚧 | 🚧 | 🚧 |
BaiChuan | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 |
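Weight-only INT8 (WINT8) inference stores weights as 8-bit integers plus a floating-point scale and dequantizes them at matmul time, while activations stay in FP16/BF16. The sketch below shows symmetric per-output-channel quantization (scale = max|w| / 127) in plain Python. It is a generic illustration, and PaddleNLP's kernels, rounding, and grouping may differ. INT4 follows the same pattern with a smaller integer range, which the hypothetical `bits` parameter exposes.

```python
def quantize_per_channel(weight: list, bits: int = 8):
    """Symmetric per-row quantization: returns (integer rows, one float scale per row)."""
    qmax = 2 ** (bits - 1) - 1           # 127 for INT8, 7 for INT4
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(x) for x in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0   # avoid dividing by zero
        # round to the nearest integer and clamp to the symmetric range
        q_rows.append([max(-qmax, min(qmax, round(x / scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: list, scales: list) -> list:
    """Recover approximate float weights: w is roughly q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.27, 0.01], [2.0, 0.0, -0.3]]
    q, s = quantize_per_channel(w)
    w_hat = dequantize(q, s)
    err = max(abs(a - b) for ra, rb in zip(w, w_hat) for a, b in zip(ra, rb))
    print(q, s, f"max abs error = {err:.4f}")
```

The rounding error per weight is at most half of its row's scale. Per-channel scales therefore keep a row of small weights from being crushed by a single large weight elsewhere in the matrix.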
Installation
Requirements
- python >= 3.8
- paddlepaddle >= 3.0.0b0
If you have not installed PaddlePaddle yet, see the PaddlePaddle website for installation instructions.
Install with pip
pip install --upgrade paddlenlp==3.0.0b3
Alternatively, install the latest code from the develop branch with:
pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html
For more detailed installation instructions for PaddlePaddle and PaddleNLP, see Installation.
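PaddleNLP 3.0 requires Python >= 3.8 and a PaddlePaddle 3.0 pre-release, so it can help to check the environment before installing. The sketch below uses only the standard library (`importlib.metadata`). It relies on a deliberately simplified version parser that understands `a`/`b`/`rc` suffixes such as `3.0.0b0`; this is not a full PEP 440 implementation. The helper names and the list of candidate wheel names are assumptions.

```python
import re
import sys
from importlib import metadata

# Pre-release stages sort before the final release (stage "").
_STAGES = {"a": 0, "b": 1, "rc": 2, "": 3}


def parse_version(text: str) -> tuple:
    """Simplified parser for versions like '3.0.0', '3.0.0b0', '3.0.0rc1'.

    Only the first three release numbers and an a/b/rc suffix are compared;
    anything after that (e.g. '.post1', '.dev0', '+cu118') is ignored.
    """
    m = re.match(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?", text.strip())
    if m is None:
        raise ValueError(f"unsupported version string: {text!r}")
    release = tuple(int(p) for p in m.group(1).split("."))
    release = (release + (0, 0, 0))[:3]          # '3.0' -> (3, 0, 0)
    stage = _STAGES[m.group(2) or ""]
    return release + (stage, int(m.group(3) or 0))


def installed_version(names=("paddlepaddle", "paddlepaddle-gpu", "paddlepaddle_gpu")):
    """Return the version of the first installed distribution among `names`, else None."""
    for name in names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_requirements(min_python=(3, 8), min_paddle="3.0.0b0") -> dict:
    """Report whether Python and the installed PaddlePaddle meet the minimums."""
    paddle = installed_version()
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "paddlepaddle": paddle,
        "paddle_ok": paddle is not None and parse_version(paddle) >= parse_version(min_paddle),
    }


if __name__ == "__main__":
    print(check_requirements())
```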
Quick Start
LLM Text Generation
PaddleNLP provides a convenient Auto API for quickly loading models and tokenizers. The following example generates text with the Qwen/Qwen2-0.5B model:
>>> from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
>>> model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype="float16")
>>> input_features = tokenizer("你好！请自我介绍一下。", return_tensors="pd")  # "Hello! Please introduce yourself."
>>> outputs = model.generate(**input_features, max_length=128)
>>> print(tokenizer.batch_decode(outputs[0], skip_special_tokens=True))
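['我是一个AI语言模型，我可以回答各种问题，包括但不限于：天气、新闻、历史、文化、科学、教育、娱乐等。请问您有什么需要了解的吗？']
(In English: "I am an AI language model. I can answer all kinds of questions, including but not limited to weather, news, history, culture, science, education, and entertainment. Is there anything you would like to know?")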
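`model.generate` runs an autoregressive loop. It repeatedly predicts the next token from everything produced so far and stops at an end-of-sequence token or a length limit. The toy sketch below shows that loop with greedy selection, which always takes the highest-scoring token. It uses a hypothetical hand-written scoring table in place of a real model. PaddleNLP's `generate` also offers other decoding strategies, and its actual implementation is not shown here.

```python
# Hypothetical toy "model": maps the last token to scores for candidate next tokens.
TOY_SCORES = {
    "<bos>": {"I": 0.9, "You": 0.1},
    "I": {"am": 0.8, "can": 0.2},
    "am": {"an": 0.7, "a": 0.3},
    "an": {"AI": 0.95, "<eos>": 0.05},
    "AI": {"<eos>": 0.6, "model": 0.4},
}


def greedy_generate(prompt: list, max_new_tokens: int, eos: str = "<eos>") -> list:
    """Append the highest-scoring next token until EOS or the token budget runs out."""
    tokens = list(prompt)
    generated = []
    for _ in range(max_new_tokens):
        scores = TOY_SCORES.get(tokens[-1], {eos: 1.0})   # unknown context: stop
        next_token = max(scores, key=scores.get)          # greedy choice
        if next_token == eos:
            break
        tokens.append(next_token)
        generated.append(next_token)
    return generated


if __name__ == "__main__":
    print(greedy_generate(["<bos>"], max_new_tokens=10))
    print(greedy_generate(["<bos>"], max_new_tokens=2))
```

With a budget of 10 new tokens the loop stops at the end-of-sequence token after four words. With a budget of 2 it is cut off by the length limit instead.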
LLM Pre-training
git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP # skip if you have already cloned or downloaded PaddleNLP
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.bin
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k.idx
cd .. # change folder to PaddleNLP/llm
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_pretrain.py ./config/llama/pretrain_argument.json
LLM SFT Fine-tuning
git clone https://github.com/PaddlePaddle/PaddleNLP.git && cd PaddleNLP # skip if you have already cloned or downloaded PaddleNLP
mkdir -p llm/data && cd llm/data
wget https://bj.bcebos.com/paddlenlp/datasets/examples/AdvertiseGen.tar.gz && tar -zxvf AdvertiseGen.tar.gz
cd .. # change folder to PaddleNLP/llm
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_finetune.py ./config/llama/sft_argument.json
For the complete LLM workflow, see the introduction to the PaddlePaddle LLM toolkit. We also provide a quick fine-tuning method that does not require cloning the source code:
from paddlenlp.trl import SFTConfig, SFTTrainer
from datasets import load_dataset
dataset = load_dataset("ZHUI/alpaca_demo", split="train")
training_args = SFTConfig(output_dir="Qwen/Qwen2.5-0.5B-SFT", device="gpu")
trainer = SFTTrainer(
args=training_args,
model="Qwen/Qwen2.5-0.5B",
train_dataset=dataset,
)
trainer.train()
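For more PaddleNLP content, see:
- Model zoo: end-to-end usage of curated, high-quality pre-trained models.
- Multi-scenario examples: learn how to use PaddleNLP to solve a variety of NLP problems, including fundamental techniques, system applications, and extended applications.
- Interactive tutorials: learn PaddleNLP quickly on AI Studio, a 🆓 free computing platform.
Community
- Scan the QR code with WeChat and fill out the questionnaire to join the discussion group and talk in depth with community developers and the official team.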
Citation
If PaddleNLP helps your research, please cite it:
@misc{paddlenlp,
title={PaddleNLP: An Easy-to-use and High Performance NLP Library},
author={PaddleNLP Contributors},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleNLP}},
year={2021}
}
Acknowledge
We drew on the excellent design of Hugging Face Transformers 🤗 for using pre-trained models, and we thank the Hugging Face authors and their open-source community.
License
PaddleNLP is licensed under the Apache-2.0 open-source license.