Top Related Projects
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
An open-source NLP research library, built on PyTorch.
TensorFlow code and pre-trained models for BERT
Quick Overview
CPM-Bee is an open-source large language model (LLM) developed by the OpenBMB team. It is designed to be a powerful, efficient, and versatile model for various natural language processing tasks, with a focus on Chinese language capabilities.
Pros
- Specialized in Chinese language processing while maintaining multilingual capabilities
- Open-source and freely available for research and commercial use
- Efficient performance with relatively small model size (10B parameters)
- Supports a wide range of NLP tasks, including text generation, summarization, and question-answering
Cons
- May not perform as well as larger models (e.g., GPT-3) on certain complex tasks
- Limited documentation and community support compared to more established LLMs
- Potential biases in training data, as with most large language models
- Requires significant computational resources for fine-tuning and deployment
Code Examples
# Note: the snippets below are illustrative sketches. CPM-Bee is not shipped as a
# standalone `cpm_bee` pip package; the officially documented path is the
# 🤗 Transformers interface shown in the README below (trust_remote_code=True).

# Example 1: Text Generation
# CPM-Bee takes structured JSON input; "<ans>" marks the slot the model fills.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True).cuda()
result = model.generate({"input": "Once upon a time, in a land far away,", "<ans>": ""}, tokenizer)
print(result)

# Example 2: Question Answering, using the documented QA template
# (an "input" passage, a "question", and an empty "<ans>")
context = "France is a country in Western Europe. Its capital city is Paris, known for the Eiffel Tower."
result = model.generate({"input": context, "question": "What is the capital of France?", "<ans>": ""}, tokenizer)
print(result)

# Example 3: Summarization-style generation via the free-form "prompt" field;
# the exact prompt wording here is an assumption, not a fixed API
long_text = "..."  # a long article or document
result = model.generate({"input": long_text, "prompt": "Summarize the text above.", "<ans>": ""}, tokenizer)
print(result)
Getting Started
To get started with CPM-Bee, follow these steps:
- Clone the repository and install the dependencies (there is no cpm-bee package on PyPI):
$ git clone -b main --single-branch https://github.com/OpenBMB/CPM-Bee.git
$ cd CPM-Bee/src
$ pip install -r requirements.txt
- Or load the model directly through 🤗 Transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True).cuda()
result = model.generate({"input": "Hello, world!", "<ans>": ""}, tokenizer)
print(result)
Note: Ensure you have sufficient GPU resources to run the model efficiently.
Competitor Comparisons
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Pros of petals
- Focuses on distributed inference of large language models
- Supports multiple models including BLOOM and OPT
- Offers a user-friendly API for easy integration
Cons of petals
- Limited to specific pre-trained models
- May require more setup and configuration for distributed computing
- Less flexibility in terms of model customization
Code comparison
CPM-Bee:
# Simplified pseudocode for illustration; see the CPM-Bee README below for the
# actual cpm_live (CPMBeeTorch + CPMBeeBeamSearch) and 🤗 Transformers usage
from cpm_live import CPMBee
model = CPMBee("path/to/model")
response = model.generate("Hello, how are you?")
print(response)
petals:
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# the tokenizer is loaded separately; Petals models do not carry a .tokenizer attribute
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoDistributedModelForCausalLM.from_pretrained("bigscience/bloom")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
Both repositories provide APIs for working with large language models, but CPM-Bee focuses on a specific model (CPM-Bee) while petals supports multiple distributed models. CPM-Bee offers a simpler interface for basic text generation, while petals provides more flexibility in terms of model selection and distributed computing capabilities.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of gpt-neox
- More extensive documentation and examples
- Larger community and more active development
- Better support for distributed training across multiple GPUs/nodes
Cons of gpt-neox
- Higher computational requirements
- More complex setup and configuration process
- Steeper learning curve for beginners
Code Comparison
CPM-Bee:
# Simplified pseudocode; the repository's real interfaces are shown in the README below
from cpm_live import CPMBee
model = CPMBee("path/to/model")
response = model.generate("Hello, how are you?")
print(response)
gpt-neox:
# gpt-neox checkpoints are typically loaded through Hugging Face Transformers
from transformers import GPTNeoXForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0]))
Both repositories provide implementations of large language models, but gpt-neox offers more flexibility and scalability for advanced users, while CPM-Bee focuses on simplicity and ease of use for quick deployment and experimentation.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Highly optimized for large-scale distributed training
- Extensive documentation and tutorials
- Supports a wide range of AI models and frameworks
Cons of DeepSpeed
- Steeper learning curve for beginners
- Primarily focused on training, less emphasis on inference
- May be overkill for smaller projects or single-GPU setups
Code Comparison
DeepSpeed:
import deepspeed

# `args`, `model`, and `params` are assumed to be defined beforehand
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params
)
CPM-Bee:
# Simplified pseudocode; CPM-Bee inference actually goes through cpm_live or
# 🤗 Transformers, as documented in the README below
from cpm_bee import CPMBee
model = CPMBee.from_pretrained("path/to/model")
output = model.generate("Input text", max_length=50)
Key Differences
- DeepSpeed focuses on efficient training of large models, while CPM-Bee is primarily for inference with pre-trained models
- DeepSpeed offers more advanced optimization techniques, whereas CPM-Bee provides a simpler API for quick deployment
- DeepSpeed has broader framework support, while CPM-Bee is specifically designed for CPM-based models
Use Cases
- DeepSpeed: Large-scale model training, distributed computing environments
- CPM-Bee: Rapid prototyping, inference tasks, applications requiring Chinese language understanding
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Extensive model support: Covers a wide range of transformer-based models
- Active community: Regular updates and contributions from a large user base
- Comprehensive documentation: Detailed guides and examples for various tasks
Cons of transformers
- Complexity: Can be overwhelming for beginners due to its extensive features
- Resource-intensive: Some models require significant computational resources
Code comparison
transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
CPM-Bee:
# Simplified pseudocode; the actual class in the repository is CPMBeeTorch
# (cpm_live.models), loaded from a config plus checkpoint
from cpm_live import CPMBeeModel
model = CPMBeeModel.from_pretrained("cpm-bee-10b")
response = model.generate("Hello, how are you?")
print(response)
The transformers library offers a more flexible approach, allowing for easy switching between different models and tasks. CPM-Bee provides a simpler interface specifically tailored for the CPM-Bee model, which may be more straightforward for users focused on this particular model.
An open-source NLP research library, built on PyTorch.
Pros of AllenNLP
- Comprehensive documentation and tutorials
- Extensive pre-built models and datasets
- Strong community support and regular updates
Cons of AllenNLP
- Steeper learning curve for beginners
- More complex setup and configuration
Code Comparison
AllenNLP:
from allennlp.predictors import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
result = predictor.predict(sentence="Did Uriah honestly think he could beat the game in under three hours?")
CPM-Bee:
# Simplified pseudocode for comparison purposes; see the README below for
# runnable CPM-Bee inference code
from cpm_live import CPMBee
model = CPMBee("path/to/model")
response = model.generate("What is the capital of France?", max_tokens=50)
print(response)
AllenNLP offers a more structured approach with pre-built models and predictors, while CPM-Bee provides a simpler interface for text generation. AllenNLP is better suited for complex NLP tasks and research, whereas CPM-Bee focuses on ease of use for general language generation.
TensorFlow code and pre-trained models for BERT
Pros of BERT
- Widely adopted and well-documented, with extensive research and community support
- Pre-trained models available for various languages and tasks
- Proven effectiveness in numerous NLP applications
Cons of BERT
- Larger model size and higher computational requirements
- Limited to a maximum sequence length of 512 tokens
- Encoder-only architecture, so it cannot perform open-ended text generation
Code Comparison
BERT:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
CPM-Bee:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
Key Differences
- CPM-Bee is a more recent model with potential improvements in Chinese language understanding
- BERT has a larger ecosystem of pre-trained models and fine-tuning examples
- CPM-Bee may offer better performance on specific Chinese NLP tasks
- BERT is more suitable for general-purpose NLP applications across multiple languages
README
CPM-Bee
An open-source Chinese-English bilingual base LLM with 10 billion parameters
Model • OpenBMB Ecosystem • Performance • License
⨠模åä»ç»
CPM-Bee is a fully open-source, commercially usable Chinese-English base model with 10 billion parameters, and the second milestone of CPM-Live training. It uses an auto-regressive Transformer architecture and is pre-trained on a high-quality corpus of over one trillion tokens, giving it strong foundational capabilities. Developers and researchers can adapt the CPM-Bee base model to all kinds of scenarios to build application models for specific domains.
- **👐 Open-source and commercially usable**: In the open-source spirit of "bringing large models to every home", the CPM-Bee base model is fully open-source and free for commercial use, to advance the field of large models. We encourage research institutions, enterprises, and individual developers worldwide to innovate freely on top of the CPM-Bee base model, subject to the open-source license.
- **💫 Excellent bilingual performance**: The pre-training corpus of CPM-Bee was rigorously filtered and balanced, and the model shows outstanding performance in both Chinese and English; see the evaluation tasks and results for details.
- **📖 Massive high-quality corpus**: The CPM-Bee base model is trained on over a trillion tokens, one of the largest corpora among open-source models. The pre-training corpus was strictly filtered, cleaned, and post-processed to ensure quality.
- **OpenBMB ecosystem support**: The OpenBMB large-model system provides a suite of tools for high-performance pre-training, adaptation, compression, and inference. The CPM-Bee base model ships with all the accompanying tool scripts, helping developers move quickly to advanced usage.
- **🔨 Conversation and tool use**: Building on OpenBMB's work in instruction tuning and tool learning, we fine-tuned the CPM-Bee base model into instance models with strong conversation and tool-use abilities; the API and internal beta will open soon.
Note: CPM-Bee is a base model, i.e., obtained from scratch through pre-training. We encourage users to adapt, fine-tune, or align it on their own scenarios and data before use. For example, WebCPM builds on CPM-Bee, adapted on sequential data of human web searches, and thereby acquires complex question answering and web-search abilities. We will release more models adapted from the CPM-Bee base model in the future.
📰 News
- [2023/06/30] VisCPM, a multimodal model series based on CPM-Bee, is released, supporting multimodal dialogue and text-to-image generation!
- [2023/06/16] CPM-Bee now supports 🤗 Transformers.
- [2023/06/08] Updated the tutorial on fine-tuning CPM-Bee for basic tasks.
- [2023/05/27] Ten billion parameters! The commercially usable Chinese-English bilingual base model CPM-Bee is open-sourced. It is the second milestone of CPM-Live.
🎯 CPM-Bee Model Series
Model | Description |
---|---|
VisCPM | Open-source Chinese-English bilingual multimodal model supporting multimodal dialogue and text-image generation |
WebCPM | Open-source Chinese model supporting complex question answering and web search |
🚀 Installation and Usage
Clone the repository:
$ git clone -b main --single-branch https://github.com/OpenBMB/CPM-Bee.git
and make sure your environment meets the requirements:
- python>=3.7
- torch>=1.10,<2.0.0
We recommend using Anaconda to manage the environment and installing the other dependencies from PyPI:
$ cd src
$ pip install -r requirements.txt
Note: **the torch version must match your CUDA version, otherwise installation errors will occur**. This is especially easy to hit when torch itself is installed via pip install -r requirements.txt: the automatically pulled torch build may not match the local CUDA version, which makes BMTrain fail to install.
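One way to avoid the mismatch is to pin a CUDA-matched torch build before resolving the remaining requirements; the versions below are illustrative assumptions, not pins mandated by the repository:
# Illustrative: install a torch wheel built for the local CUDA (here 11.7)
# before requirements.txt resolution, so pip does not pull a mismatched build
$ pip install "torch==1.13.1+cu117" --extra-index-url https://download.pytorch.org/whl/cu117
$ pip install -r requirements.txt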
Model
- 10B model download link (to run the model with 🤗 Transformers, see here).
Data Format
Unlike existing base models, which organize data as unstructured free text, CPM-Bee organizes its data in a structured JSON format. Given structured data, the CPM-Bee base model performs accurate semantic understanding and efficiently completes all kinds of basic tasks, including fill-in-the-blank, text generation, translation, question answering, score prediction, and multiple choice. Templates for some representative tasks follow:
Fill-in-the-blank:
{
    "input": "心理学领域的研究人员发现，作出重要决定的最好方法之一，比如选择一所大学或<mask_0>，都涉及到使用决策工作表。研究优化的心理学家将<mask_1>与理论上的理想决策进行比较，看看它们有多相似。工作表程序的支持者认为它会产生最优的，也就是说，最好的决策。虽然有<mask_2>可以接受，但它们在本质上都是相似的。",
    "<ans>": {
        "<mask_0>": "",
        "<mask_1>": "",
        "<mask_2>": ""
    }
}
Text generation:
{
    "input": "今天天气很好，我和妈妈一起去公园，",
    "prompt": "往后写约100字",
    "<ans>": ""
}
Translation:
{
    "input": "北京是中国的首都",
    "prompt": "中翻英",
    "<ans>": ""
}
Question answering:
{
    "input": "NGC 6231是一个位于天蝎座的疏散星团，天球座标为赤经16时54分，赤纬-41度48分，视觉观测大小约45角分，亮度约2.6视星等，距地球5900光年。NGC 6231年龄约为三百二十万年，是一个非常年轻的星团，星团内的最亮星是5等的天蝎座 ζ1星。用双筒望远镜或小型望远镜就能看到个别的行星。NGC 6231在1654年被意大利天文学家乔瓦尼·巴蒂斯特·霍迪尔纳（Giovanni Battista Hodierna）以Luminosae的名字首次纪录在星表中，但是未见记载于夏尔·梅西耶的天体列表和威廉·赫歇尔的深空天体目录。这个天体在1678年被爱德蒙·哈雷（I.7）、1745年被夏西亚科斯（Jean-Phillippe Loys de Cheseaux）（9）、1751年被尼可拉·路易·拉卡伊（II.13）分别再次独立发现。",
    "question": "NGC 6231的经纬度是多少？",
    "<ans>": ""
}
Score prediction:
{
    "input": "之前多次聚餐都选择这里，有各种大小的包房同时能容纳很多人，环境好有特色还有表演，整体聚餐氛围一下被带动起来。现在由于炭火改成了电烤羊，口感真的不如以前，不过其他菜品都还是不错，烤羊剩下的排骨最后还能再加工一下，椒盐的也很好吃。",
    "question": "评分是多少？(1-5)",
    "<ans>": ""
}
Multiple choice:
{
    "input": "父母都希望自己的孩子诚实、勇敢、有礼貌。要想让孩子成为这样的人，父母首先得从自己做起，要是连自己都做不到，又怎能要求孩子做到呢？",
    "options": {
        "<option_0>": "少提要求",
        "<option_1>": "降低标准",
        "<option_2>": "自己先做好",
        "<option_3>": "让孩子拿主意"
    },
    "question": "教育孩子时，父母应该：",
    "<ans>": ""
}
- Note that the templates above can be used as-is at inference time; at training time, fill the gold answer into <ans>, e.g.:
{
    "input": "北京是中国的首都",
    "prompt": "中翻英",
    "<ans>": "Beijing is the capital of China"
}
{
    "input": "父母都希望自己的孩子诚实、勇敢、有礼貌。要想让孩子成为这样的人，父母首先得从自己做起，要是连自己都做不到，又怎能要求孩子做到呢？",
    "options": {
        "<option_0>": "少提要求",
        "<option_1>": "降低标准",
        "<option_2>": "自己先做好",
        "<option_3>": "让孩子拿主意"
    },
    "question": "教育孩子时，父母应该：",
    "<ans>": "<option_2>"
}
- CPM-Bee was exposed to a number of JSON formats during pre-training that can be used directly; users may also design their own JSON formats and fine-tune the model on them. Every JSON format must satisfy the following conditions:
  - the output **must** be organized under the key <ans>;
  - options of a multiple-choice question should be organized as <option_xx>, with xx a number;
  - blanks of a fill-in-the-blank question should be organized as <mask_xx>, with xx a number;
  - because "<" is the trigger character CPM-Bee uses to recognize special tokens such as <ans>, <option_xx>, and <mask_xx>, any literal "<" in the text **must** be escaped as "<<"; in the example below, "1 < 2" and "10 < 8" are rewritten as "1 << 2" and "10 << 8":
{
    "question": "下面哪项是正确的",
    "options": {
        "<option_0>": "1 << 2",
        "<option_1>": "10 << 8"
    },
    "<ans>": "<option_0>"
}
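A minimal validator sketch for these conditions (an assumed helper, not part of the repository):

import re

SPECIAL = re.compile(r"<(ans|mask_\d+|option_\d+)>")

def validate_sample(sample: dict) -> None:
    # outputs must be organized under the "<ans>" key
    assert "<ans>" in sample, "missing <ans> key"
    # any "<" left after dropping special tokens and "<<" escapes is unescaped text
    stripped = SPECIAL.sub("", sample.get("input", "")).replace("<<", "")
    assert "<" not in stripped, "unescaped '<' in input"

validate_sample({"input": "1+1<<2，注意<mask_0>", "<ans>": {"<mask_0>": ""}})  # passes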
Model Pre-training
- Data cleaning
  - Place each sample on a single line, with newlines escaped as \n. The format can be either txt or json, for example:
    - txt format
... ... How can cross training benefit groups like runners, swimmers, or weightlifters?\n\n1. Reduces the risk of injury...\n\n2. Improves overall fitness... Are there any particular physical benefits to mindful walking, such as improved posture or increased physical fitness?\n\n1. Choose a quiet and peaceful environment...\n\n2. Start by tuning into your breath and becoming aware of your surroundings... ... ...
    - json format
... ... {"template": "Does the answer correctly answer the question", "sentence": "Unicode has the explicit aim of transcending ...", "question": "What is the aim of Unicode?", "options": {"<option_0>": "no", "<option_1>": "yes"}, "<ans>": "<option_1>"} ... ...
  - Example: we provide samples of wiki (txt format, plain text) and flan (json format, multiple choice). You can download them and place them under raw_data in the layout below to try out the subsequent steps (a small writer sketch follows this block):
CPMBee/
├── src
│   └── ...
└── raw_data              (location of raw data)
    ├── wiki
    │   └── raw.txt       (raw txt data)
    └── flan
        └── raw.json      (raw json data)
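As a concrete illustration, a minimal writer for this one-sample-per-line layout could look as follows; the documents iterable and the output path are assumptions made for the example:

import json

def write_raw(documents, path, as_json=False):
    # one sample per line; embedded newlines in plain text become the literal two characters \n
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            line = json.dumps(doc, ensure_ascii=False) if as_json else doc.replace("\n", "\\n")
            f.write(line + "\n")

write_raw(["first paragraph\nsecond paragraph"], "../raw_data/wiki/raw.txt")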
- Dataset generation
  - To read data efficiently and to deploy datasets on distributed file systems, CPMBee requires converting the raw data into binary files by calling build_dataset.py in src. Its arguments include:
    - --input-path: path of the imported raw data; the program packs and processes all files under this path
    - --output-path: path of the exported dataset
    - --output-name: name of the exported dataset
    - --data-type: txt/json
    - --min-length: data shorter than the minimum length is discarded
    - --max-length: data longer than the maximum length is split
  - Raw txt data is split according to min-length and max-length and then exported to the dataset uniformly as JSON of the form {'text':'......'}
  - The exported dataset consists of two files: a binary file named output-name, and a meta.bin file recording the metadata of output-name, including:
    - "file_name": the file name meta.bin refers to, normally just output-name
    - "block_begin": the dataset supports distributed storage; the starting block of the dataset, normally 0
    - "block_end": the ending block of the dataset, normally the total number of blocks
    - "nbytes": 60221163, the total dataset size
    - "nlines": 41733, the total number of lines in the dataset
    - "block_size": 16777216, the size of each dataset block
  - Example: we build datasets from the provided wiki and flan samples:
$ cd CPMBee/src
$ python build_dataset.py --input-path ../raw_data/wiki/ --output-path ../datasets/wiki/ --output-name wiki --data-type txt --min-length 100 --max-length 10000
$ python build_dataset.py --input-path ../raw_data/flan/ --output-path ../datasets/flan/ --output-name flan --data-type json
  - The file structure after generation is:
CPMBee/
├── src
│   ├── ...
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets              (generated datasets)
    ├── wiki              (dataset built from wiki)
    │   └── data
    │       ├── wiki
    │       └── meta.bin
    └── flan              (dataset built from flan)
        └── data
            ├── flan
            └── meta.bin
- Task transformation scripts
  - For each dataset you can write a task transformation script that rewrites the JSON samples in the dataset into various pre-training tasks.
  - The script must follow this format:
import random

def transform(data, num_sample: int, r: random.Random):
    ...
  - For each dataset, CPMBee's underlying file system automatically imports the dataset, reads the samples, and then calls the task transformation script to rework them.
  - The transformation script takes three inputs: data is the sample that was read; num_sample is the number of samples read (normally 1; under an in-context learning setup there can be several); r is a random generator.
  - Example: transformation scripts for wiki and flan:
    - wiki script
import random

def rand(n: int, r: random.Random):
    return int(r.random() * n)

def transform(data, num_sample: int, r: random.Random):
    # after the previous steps, every wiki sample has the form {'text': '...'}
    text = data['text']
    # pick a random split point so that the masked suffix to predict covers 50%-100% of the text
    mid = rand(len(text) // 2, r)
    # CPMBee uses < to recognize special tokens, so literal < must be escaped as <<
    ipt = text[:mid].replace("<", "<<")
    ans = text[mid:].replace("<", "<<")
    return {"input": ipt, "<ans>": ans}
    - flan script
import random

def transform(data, num_sample: int, r: random.Random):
    # after the previous steps, flan samples are already multiple-choice JSON
    # containing an <ans> key, so they are returned unchanged for training
    return data
  - The file structure after writing the task transformation scripts is:
CPMBee/
├── src
│   ├── ...
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py   (task transformation script for wiki)
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py   (task transformation script for flan)
- Dataset script
  - All datasets participating in training are summarized by a dataset script, itself a JSON file of the following form:
[
    {
        "dataset_name": "wiki",
        "task_name": "lm",
        "weight": 1.0,
        "path": "wiki/data",
        "incontext_weight": [1.0],
        "transforms": "wiki/transform.py"
    },
    {
        "dataset_name": "flan",
        "task_name": "nlu",
        "weight": 1.0,
        "path": "flan/data",
        "incontext_weight": [1.0],
        "transforms": "flan/transform.py"
    }
]
  - The fields are:
    - dataset_name: name of the dataset;
    - task_name: the task the dataset belongs to; task_name + dataset_name serves as the label identifying the dataset during training, and task_name can also be used to aggregate loss per task during training;
    - weight: sampling weight;
    - path: path to meta.bin and the binary data;
    - transforms: path to the task transformation script;
    - incontext_weight: sample stacking; [1.0] means one sample is drawn with probability 100%, [0.8, 0.2] means two samples are drawn and concatenated with probability 20%, and [0.75, 0.1, 0.15] means three samples with probability 15% and two samples with probability 10% (a small sketch of this sampling rule follows the example tree below).
  - Example: file structure after writing the dataset script summarizing the wiki and flan datasets:
CPMBee/
├── src
│   ├── ...
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── datasets.json      (dataset script)
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py
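A minimal sketch of how such an incontext_weight vector can be interpreted, assuming index i carries the probability of concatenating i+1 samples (the helper is illustrative, not repository code):

import random

def sample_num_concat(incontext_weight, r: random.Random) -> int:
    # draw how many samples are concatenated into one training sequence
    return r.choices(range(1, len(incontext_weight) + 1), weights=incontext_weight)[0]

r = random.Random(0)
print(sample_num_concat([0.75, 0.1, 0.15], r))  # 1 with p=0.75, 2 with p=0.10, 3 with p=0.15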
- Pre-training script
  - The pre-training script is as follows:
#! /bin/bash
# number of GPUs per node
GPUS_PER_NODE=8
# number of nodes
NNODES=1
# IP and port of the master node; see the PyTorch distributed-training docs for details
MASTER_ADDR="localhost"
MASTER_PORT=12345

OPTS=""
# model and dataset settings
# model configuration
OPTS+=" --model-config config/cpm-bee-10b.json"
# location of the dataset script from step 4
OPTS+=" --dataset ../datasets/datasets.json"
# training settings
# number of training iterations
OPTS+=" --train-iters 200000"
# per-GPU batch size
OPTS+=" --batch-size 2"
# maximum sample length; note that CPMBee packs data internally to use max-length efficiently
OPTS+=" --max-length 2048"
# learning rate; if resuming from an earlier checkpoint, consider lowering it
OPTS+=" --lr 0.01"
# number of warmup steps
OPTS+=" --warmup-iters 2000"
# learning-rate decay schedule
OPTS+=" --lr-decay-style noam"
# weight decay, passed into AdamW
OPTS+=" --weight-decay 0.01"
# gradient clipping range
OPTS+=" --clip-grad 1.0"
# mixed-precision loss-scaling factor
OPTS+=" --loss-scale 1048576"
# growth/decay multiplier of the loss scale
OPTS+=" --loss-scale-factor 2"
# grow the loss scale every this many steps
OPTS+=" --loss-scale-steps 128"
# log settings
# print parameter and gradient mean/variance every this many steps
OPTS+=" --inspect-iters 100"
# log output path
OPTS+=" --log-dir ../logs/train/"
# tensorboard output path
OPTS+=" --tensorboard ../logs/tensorboard/cpm_live_48_4096/"
# saving ckpts
# save a checkpoint every this many steps
OPTS+=" --save-iters 500"
# checkpoint output path
OPTS+=" --save ../results/"
# checkpoint name; CPMBee appends the step count when saving
OPTS+=" --save-name cpm_live_checkpoint"
# loading ckpts: to resume from an old checkpoint, uncomment the lines below and fill in MODEL_STEPS
# MODEL_STEPS="0"
# OPTS+=" --start-step ${MODEL_STEPS}"
# OPTS+=" --load ../results/cpm_live_checkpoint-${MODEL_STEPS}.pt"
# whether to load historical gradients
# OPTS+=" --load-grad"

CMD="torchrun --nnodes=${NNODES} --nproc_per_node=${GPUS_PER_NODE} --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} pretrain_cpm_bee.py ${OPTS}"
echo ${CMD}
$CMD
  - Example: file structure after writing the pre-training script:
CPMBee/
├── src
│   ├── scripts
│   │   └── pretrain_cpm_bee.sh   (pre-training script)
│   ├── pretrain_cpm_bee.py
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── datasets.json
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py
- Pre-training command
$ cd CPMBee/src
$ bash scripts/pretrain_cpm_bee.sh
  - Example: file structure after running the pre-training command:
CPMBee/
├── src
│   ├── scripts
│   │   └── pretrain_cpm_bee.sh
│   ├── pretrain_cpm_bee.py
│   └── build_dataset.py
├── results    (checkpoint output path)
├── logs       (log output path)
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── datasets.json
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py
OpenBMB Ecosystem
Building on the OpenBMB large-model system ecosystem, we achieved end-to-end efficiency while training CPM-Bee. We also provide a full set of scripts for model fine-tuning (based on BMTrain and OpenDelta), tool use (based on BMTools), model compression (based on BMCook), and low-resource inference (based on BMInf), which help developers get started with CPM-Bee quickly.
Model Fine-tuning
Based on BMTrain and OpenDelta, we provide two fine-tuning options: full-parameter fine-tuning and parameter-efficient delta fine-tuning, which can adapt CPM-Bee to various downstream scenarios.
- Full-parameter fine-tuning:
$ torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py
- Parameter-efficient delta fine-tuning:
$ torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py \
    --use-delta
Fine-tuning Workflow
To fine-tune the model on a specific task, you should prepare the dataset and proceed as follows:
- Adjust the data format.
You can recast classification problems into the multiple-choice format. For more information on the data format, see the CPM-Bee Data Format section above.
Note that because <...> is reserved as the marker for special tokens, it can collide with a literal < in ordinary text, so you should escape the non-special-token parts of your text data. For example, given the data
{"input": "团队配合非常重要，如果不能做到<mask_0>，则可能会造成1+1<2的结果，所以，要更加注意<mask_1>", "<ans>": {"<mask_0>": "", "<mask_1>": ""}}
<mask_0> and <mask_1> are special tokens and should stay unchanged, while every other < is replaced with <<. The escaped data looks like this (a small escaping helper is sketched right after this step):
{"input": "团队配合非常重要，如果不能做到<mask_0>，则可能会造成1+1<<2的结果，所以，要更加注意<mask_1>", "<ans>": {"<mask_0>": "", "<mask_1>": ""}}
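Complementing the validator sketched earlier, a minimal escaping helper for this step could look like this (an assumed utility, not shipped with the repository):

import re

SPECIAL_TOKEN = re.compile(r"<(ans|mask_\d+|option_\d+)>")

def escape_field(text: str) -> str:
    # escape literal "<" as "<<" while keeping CPM-Bee special tokens intact
    parts, last = [], 0
    for m in SPECIAL_TOKEN.finditer(text):
        parts.append(text[last:m.start()].replace("<", "<<"))  # plain text: escape
        parts.append(m.group(0))                               # special token: keep
        last = m.end()
    parts.append(text[last:].replace("<", "<<"))
    return "".join(parts)

print(escape_field("如果不能做到<mask_0>，则可能会造成1+1<2的结果"))
# 如果不能做到<mask_0>，则可能会造成1+1<<2的结果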
- Preprocess the dataset into binary files. To build the preprocessed dataset, you can run:
$ python preprocess_dataset.py --input your/reformated/data/path --output_path your/binary/data/path --output_name data_name
After preprocessing, you will obtain:
|-- your/binary/data/path
|   |-- folder1
|   |   |-- data_name
|   |   |-- meta.bin
|   |-- folder2
|       |-- data_name
|       |-- meta.bin
- Fine-tune CPM-Bee. To start fine-tuning, you can run:
$ bash scripts/finetune_cpm_bee.sh
Alternatively, you can run finetune_cpm_bee.py directly via torchrun. For example, you can delta-tune CPM-Bee on a server with 4 GPUs as follows:
torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py \
--model-config your/model/config/path \
--load your/model/checkpoint/path \
--dataset your/binary/data/path/folder1 \
--eval_dataset your/binary/data/path/folder2 \
--use-delta
We recommend fine-tuning with the scheme above; you can also refer to 🤗 Transformers and fine-tune CPM-Bee with your own parallelization strategy.
Model Compression
Based on BMCook, we compress the original CPM-Bee base model and provide CPM-Bee models in several sizes to suit different scenarios. For every size we also provide a 🤗 Transformers version; click the links below to visit the model repositories for more information.
Model | #Attn Layers | #FFN Layers | Attn Hidden Dim | FFN Hidden Dim | Download | 🤗 Transformers |
---|---|---|---|---|---|---|
CPM-Bee-10B | 48 | 48 | 4096 | 10240 | Link | Link |
CPM-Bee-5B | 19 | 24 | 4096 | 10240 | Link | Link |
CPM-Bee-2B | 19 | 24 | 2048 | 5120 | Link | Link |
CPM-Bee-1B | 19 | 24 | 1280 | 1024 | Link | Link |
Model Deployment
The compressed CPM-Bee models can run fast inference on ordinary consumer GPUs. The inference resources occupied by the different model sizes are:
Model | Inference VRAM Usage | Recommended Hardware |
---|---|---|
CPM-Bee-10B | 20 GB | RTX 3090 (24 GB) |
CPM-Bee-5B | 11 GB | RTX 3090 (24 GB) |
CPM-Bee-2B | 6.7 GB | GTX 1080 (8 GB) |
CPM-Bee-1B | 4.1 GB | GTX 1660 (6 GB) |
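A small helper sketch for picking the largest variant that fits the local GPU, using the VRAM figures from the table above (the helper itself is an assumption, not repository code):

import torch

# inference VRAM requirements in GB, taken from the deployment table above
REQUIREMENTS = {
    "CPM-Bee-10B": 20.0,
    "CPM-Bee-5B": 11.0,
    "CPM-Bee-2B": 6.7,
    "CPM-Bee-1B": 4.1,
}

def pick_model(device: int = 0) -> str:
    # compare total GPU memory against the table and return the largest fit
    total_gb = torch.cuda.get_device_properties(device).total_memory / 1e9
    for name, need in sorted(REQUIREMENTS.items(), key=lambda kv: -kv[1]):
        if total_gb >= need:
            return name
    raise RuntimeError("no CPM-Bee variant fits this GPU")

print(pick_model())  # e.g. 'CPM-Bee-10B' on an RTX 3090 (24 GB)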
Using this Repository
For a concrete inference task, you can write your own inference code against the cloned CPM-Bee repository. Here is a simple text-generation example.
from cpm_live.generation.bee import CPMBeeBeamSearch
from cpm_live.models import CPMBeeTorch, CPMBeeConfig
from cpm_live.tokenizers import CPMBeeTokenizer
import torch
# prepare your input data.
data_list = [
    {"input": "今天天气是真的", "prompt": "往后写一句话", "<ans>": ""}
]
# load model
config = CPMBeeConfig.from_json_file("cpm-bee-5b.json")
ckpt_path = "cpm-bee-5b-ckpt.pt"
tokenizer = CPMBeeTokenizer()
model = CPMBeeTorch(config=config)
# load checkpoints
model.load_state_dict(torch.load(ckpt_path), strict=False)
model.cuda()
# use beam search
beam_search = CPMBeeBeamSearch(
    model=model,
    tokenizer=tokenizer,
)
for data in data_list:
    inference_results = beam_search.generate([data], max_length=100, repetition_penalty=1.1)
    for res in inference_results:
        print(res)
We have also packaged the code above into a single Python file, text_generation.py, which you can run directly for convenient inference:
python text_generation.py
If your GPU memory is limited, you can use BMInf for low-resource inference:
python text_generation.py --use-bminf --memory-limit 12
To run inference on the CPU:
python text_generation.py --device cpu
To load a fine-tuned delta model at inference time:
python text_generation.py --delta delta.pt
Using 🤗 Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True).cuda()
result = model.generate({"input": "今天天气不错，", "<ans>": ""}, tokenizer)
print(result)
We provide an inference script based on 🤗 Transformers, text_generation_hf.py, which you can run as:
python text_generation_hf.py
Multi-GPU deployment:
python text_generation_hf.py --multi-gpu
Load a fine-tuned delta model on top of multi-GPU deployment:
python text_generation_hf.py --multi-gpu --delta delta.pt
💫 Performance
Zero-shot Evaluation
We conducted a comprehensive evaluation of the CPM-Bee base model's Chinese and English abilities. On the Chinese Zero-CLUE benchmark, CPM-Bee outperforms the other models by a wide margin and ranks first among Chinese large models. On English benchmarks, CPM-Bee performs on par with the open-source LLaMA models.
ZeroCLUE Chinese Evaluation
Model | Score | EPRSTMT | CSLDCP | TNEWSF | IFLYTEKF | OCNLIF | BUSTM | CHIDF | CSLF | CLUEWSCF |
---|---|---|---|---|---|---|---|---|---|---|
CPM-Bee | 78.184 | 85.52 | 58.99 | 78.2 | 58.81 | 77.73 | 83.85 | 89.65 | 83.6 | 87.24 |
Ctyun_Big_Model | 76.217 | 87.25 | 48.02 | 77.13 | 59.62 | 75.5 | 90.05 | 84.6 | 82.9 | 81.72 |
PaddleNLP-UTC | 70.547 | 85.92 | 58.92 | 68.27 | 40.15 | 74.79 | 76.7 | 82.75 | 70.6 | 74.48 |
Erlangshen-UnifiedMC | 70.295 | 88.71 | 50.18 | 71.67 | 40.58 | 75.5 | 80.15 | 84.85 | 60.6 | 81.72 |
English Evaluation
Model | Average | BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA |
---|---|---|---|---|---|---|---|---|---|
GPT-3 | - | 60.5 | 81 | - | 78.9 | 70.2 | 68.8 | 51.4 | 57.6 |
Gopher | - | 79.3 | 81.8 | 50.6 | 79.2 | 70.1 | - | - | - |
Chinchilla | - | 83.7 | 81.8 | 51.3 | 80.8 | 74.9 | - | - | - |
PaLM | - | 84.8 | 80.5 | - | 79.7 | 77 | 75.2 | 52.5 | 50.4 |
LLaMA-7B | 66.13 | 76.5 | 79.8 | 48.9 | 76.1 | 70.1 | 72.8 | 47.6 | 57.2 |
LLaMA-13B | 68.08 | 78.1 | 80.1 | 50.4 | 79.2 | 73 | 74.8 | 52.7 | 56.4 |
CPM-Bee | 67.80 | 78.69 | 77.58 | 61.11 | 78.89 | 61.88 | 66.88 | 54.18 | 63.20 |
CPM-Bee + Decoder Tuning
Using Decoder Tuning (to appear at ACL 2023), a technique developed jointly by OpenBMB and THUNLP, downstream task performance can be improved substantially through the API alone, without accessing or modifying the model parameters. Implementation code: link.
#Shots | Model | SST2 | IMDB | Yelp | AGNews | DBpedia | Yahoo | RTE | SNLI | MNLI-m | MNLI-mm | FewNERD | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CPM-Bee | 80.5 | 89.1 | 96.6 | 74.6 | 71.3 | 46.7 | 84.1 | 45.4 | 45.6 | 45.6 | 1.6 | 61.9 |
16 | T5-3B | 89.9 | 92.7 | 94.9 | 87.7 | 96.2 | 66.5 | 55.8 | 52.0 | 52.8 | 52.2 | 51.9 | 72.1 |
 | LLaMA-7B | 85.1 | 90.5 | 92.8 | 71.4 | 89.8 | 45.1 | 49.1 | 35.2 | 36.3 | 36.2 | 54.6 | 62.4 |
 | Vicuna-13B | 82.1 | 88.8 | 95.6 | 86.4 | 74.4 | 55.3 | 62.5 | 61.4 | 54.3 | 48.6 | 52.1 | 69.2 |
 | CPM-Bee | 92.7 | 96.2 | 97.5 | 85.5 | 89.8 | 65.2 | 86.0 | 86.4 | 76.3 | 76.3 | 54.6 | 82.4 |
64 | LLaMA-7B | 87.5 | 85.7 | 96.9 | 75.4 | 93.5 | 47.4 | 51.4 | 39.4 | 36.2 | 38.4 | 59.8 | 64.7 |
 | Vicuna-13B | 92.0 | 90.8 | 96.5 | 87.7 | 87.8 | 58.7 | 59.1 | 58.7 | 56.7 | 48.4 | 56.8 | 72.1 |
 | CPM-Bee | 94.3 | 96.5 | 98.3 | 88.5 | 93.5 | 68.7 | 87.1 | 88.9 | 78.0 | 79.0 | 59.8 | 84.8 |
256 | LLaMA-7B | 87.6 | 88.8 | 97.1 | 82.4 | 94.2 | 48.5 | 53.4 | 39.8 | 37.3 | 37.4 | 59.1 | 66.0 |
 | Vicuna-13B | 93.1 | 88.7 | 96.8 | 89.9 | 89.1 | 58.6 | 58.5 | 58.7 | 57.5 | 48.3 | 56.6 | 72.3 |
 | CPM-Bee | 94.5 | 96.7 | 98.4 | 89.7 | 94.2 | 69.9 | 87.7 | 89.4 | 81.7 | 80.6 | 59.1 | 85.6 |
📃 License
Model License
The CPM-Bee base model is released under the "General Model License (GML) - Source Attribution - Publicity Restriction - Commercial Authorization". The model allows commercial use; to use it for commercial purposes, contact cpm@modelbest.cn to obtain written authorization.
Statement
As a language model, CPM-Bee generates content by learning from a large amount of text, but it cannot understand or express personal opinions or value judgments; nothing it outputs represents the views or positions of the model developers. Users of content generated by CPM-Bee are therefore responsible for evaluating and verifying it themselves.