Top Related Projects
- bigscience-workshop/petals: 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
- EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
- microsoft/DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
- allenai/allennlp: An open-source NLP research library, built on PyTorch.
- google-research/bert: TensorFlow code and pre-trained models for BERT
Quick Overview
CPM-Bee is an open-source large language model (LLM) developed by the OpenBMB team. It is designed to be a powerful, efficient, and versatile model for various natural language processing tasks, with a focus on Chinese language capabilities.
Pros
- Specialized in Chinese language processing while maintaining multilingual capabilities
- Open-source and freely available for research and commercial use
- Efficient performance with relatively small model size (10B parameters)
- Supports a wide range of NLP tasks, including text generation, summarization, and question-answering
Cons
- May not perform as well as larger models (e.g., GPT-3) on certain complex tasks
- Limited documentation and community support compared to more established LLMs
- Potential biases in training data, as with most large language models
- Requires significant computational resources for fine-tuning and deployment
Code Examples
# Example 1: Text generation
# These examples use the 🤗 Transformers interface documented in the README below.
# CPM-Bee consumes structured JSON and returns its answer under the "<ans>" key.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True).cuda()

result = model.generate({"input": "Once upon a time, in a land far away,", "<ans>": ""}, tokenizer)
print(result)

# Example 2: Question answering, using the QA template from the data-format section
data = {
    "input": "France is a country in Western Europe. Its capital city is Paris, known for the Eiffel Tower.",
    "question": "What is the capital of France?",
    "<ans>": "",
}
print(model.generate(data, tokenizer))

# Example 3: Summarization, expressed through the "prompt" field as in the
# generation/translation templates (the prompt string itself is illustrative)
long_text = "..."  # a long article or document
print(model.generate({"input": long_text, "prompt": "summarize the text above", "<ans>": ""}, tokenizer))
Getting Started
To get started with CPM-Bee, follow these steps:
- Clone the repository and install the dependencies:
git clone -b main --single-branch https://github.com/OpenBMB/CPM-Bee.git
cd CPM-Bee/src
pip install -r requirements.txt
- Download a model checkpoint (links are in the README below) and run the bundled inference script:
python text_generation.py
Note: Ensure you have sufficient GPU resources to run the model efficiently.
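If you prefer to work from the cloned repository, the README below documents a native inference path through the cpm_live package; the following is a condensed sketch of it (the config and checkpoint file names are placeholders for the files you downloaded):

from cpm_live.generation.bee import CPMBeeBeamSearch
from cpm_live.models import CPMBeeTorch, CPMBeeConfig
from cpm_live.tokenizers import CPMBeeTokenizer
import torch

# build the model from its JSON config and load the downloaded checkpoint
config = CPMBeeConfig.from_json_file("cpm-bee-10b.json")
model = CPMBeeTorch(config=config)
model.load_state_dict(torch.load("cpm-bee-10b-ckpt.pt"), strict=False)
model.cuda()

# generate with beam search; CPM-Bee reads structured JSON and fills in "<ans>"
beam_search = CPMBeeBeamSearch(model=model, tokenizer=CPMBeeTokenizer())
results = beam_search.generate([{"input": "Hello, world!", "<ans>": ""}], max_length=100)
print(results[0])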
Competitor Comparisons
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Pros of petals
- Focuses on distributed inference of large language models
- Supports multiple models including BLOOM and OPT
- Offers a user-friendly API for easy integration
Cons of petals
- Limited to specific pre-trained models
- May require more setup and configuration for distributed computing
- Less flexibility in terms of model customization
Code comparison
CPM-Bee:
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPM-Bee's documented 🤗 Transformers interface (see the README below);
# the model consumes structured JSON and fills in the "<ans>" field.
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
result = model.generate({"input": "Hello, how are you?", "<ans>": ""}, tokenizer)
print(result)
petals:
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "bigscience/bloom"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
Both repositories provide APIs for working with large language models. The CPM-Bee repository focuses on a single model family and offers a simpler interface for basic text generation, while petals supports multiple distributed models and provides more flexibility in model selection and distributed computing.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of gpt-neox
- More extensive documentation and examples
- Larger community and more active development
- Better support for distributed training across multiple GPUs/nodes
Cons of gpt-neox
- Higher computational requirements
- More complex setup and configuration process
- Steeper learning curve for beginners
Code Comparison
CPM-Bee:
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPM-Bee's documented 🤗 Transformers interface (see the README below)
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
result = model.generate({"input": "Hello, how are you?", "<ans>": ""}, tokenizer)
print(result)
gpt-neox:
# The 20B checkpoint is loaded here through 🤗 Transformers
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0]))
Both repositories provide implementations of large language models, but gpt-neox offers more flexibility and scalability for advanced users, while CPM-Bee focuses on simplicity and ease of use for quick deployment and experimentation.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- More comprehensive and widely adopted optimization toolkit for deep learning
- Supports a broader range of models and training scenarios
- Offers advanced features like ZeRO-Offload and 3D parallelism
Cons of DeepSpeed
- Steeper learning curve due to its extensive feature set
- May require more configuration and tuning for optimal performance
- Potentially higher overhead for simpler use cases
Code Comparison
DeepSpeed:
import deepspeed

# `args`, `model`, and `params` come from your own training setup
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params,
)
CPM-Bee:
from cpm_live.models import CPMBeeTorch, CPMBeeConfig

# Native interface from the CPM-Bee repository (see the README below)
config = CPMBeeConfig.from_json_file("path/to/config.json")
model = CPMBeeTorch(config=config)
Summary
DeepSpeed is a more comprehensive optimization toolkit for deep learning, offering a wide range of features and optimizations. It's suitable for large-scale training scenarios and supports various models. However, it may have a steeper learning curve and require more configuration.
CPM-Bee, on the other hand, is focused on a single model architecture and is easier to use for that particular use case. It likely has a simpler setup process but may not offer as many advanced optimization techniques as DeepSpeed.
The choice between the two depends on the specific requirements of your project, the scale of your training, and the level of optimization needed.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Extensive model support: Covers a wide range of transformer-based models
- Active community: Regular updates and contributions from a large user base
- Comprehensive documentation: Detailed guides and examples for various tasks
Cons of transformers
- Complexity: Can be overwhelming for beginners due to its extensive features
- Resource-intensive: Some models require significant computational resources
Code comparison
transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
CPM-Bee:
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPM-Bee's documented 🤗 Transformers interface (see the README below)
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
result = model.generate({"input": "Hello, how are you?", "<ans>": ""}, tokenizer)
print(result)
The transformers library offers a more flexible approach, allowing for easy switching between different models and tasks. CPM-Bee provides a simpler interface specifically tailored for the CPM-Bee model, which may be more straightforward for users focused on this particular model.
An open-source NLP research library, built on PyTorch.
Pros of AllenNLP
- Comprehensive documentation and tutorials
- Extensive pre-built models and datasets
- Strong community support and regular updates
Cons of AllenNLP
- Steeper learning curve for beginners
- More complex setup and configuration
Code Comparison
AllenNLP:
from allennlp.predictors import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
result = predictor.predict(sentence="Did Uriah honestly think he could beat the game in under three hours?")
CPM-Bee:
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPM-Bee's documented 🤗 Transformers interface (see the README below)
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
result = model.generate({"input": "What is the capital of France?", "<ans>": ""}, tokenizer)
print(result)
AllenNLP offers a more structured approach with pre-built models and predictors, while CPM-Bee provides a simpler interface for text generation. AllenNLP is better suited for complex NLP tasks and research, whereas CPM-Bee focuses on ease of use for general language generation.
TensorFlow code and pre-trained models for BERT
Pros of BERT
- Widely adopted and well-documented, with extensive research and community support
- Pre-trained models available for various languages and tasks
- Proven effectiveness in numerous NLP applications
Cons of BERT
- Larger model size and higher computational requirements
- Limited to a maximum sequence length of 512 tokens
- Less flexible for fine-tuning on specific downstream tasks
Code Comparison
BERT:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
CPM-Bee:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
Key Differences
- CPM-Bee is a more recent model with potential improvements in Chinese language understanding
- BERT has a larger ecosystem of pre-trained models and fine-tuning examples
- CPM-Bee may offer better performance on specific Chinese NLP tasks
- BERT is more suitable for general-purpose NLP applications across multiple languages
README
CPM-Bee
An open-source, commercially usable Chinese-English bilingual base large language model with 10 billion parameters
Model • OpenBMB Ecosystem • Performance • Open-Source License
✨ Model Introduction
CPM-Bee is a fully open-source, commercially usable Chinese-English base model with 10 billion parameters, and the second milestone of the CPM-Live training initiative. It uses an auto-regressive Transformer architecture and is pre-trained on a high-quality corpus of over a trillion tokens, giving it strong foundational abilities. Developers and researchers can adapt the CPM-Bee base model to all kinds of scenarios to create application models for specific domains.
- 👐 Open source and commercially usable: Upholding the open-source spirit of "bringing large models to every household", OpenBMB releases the CPM-Bee base model fully open-source and commercially usable to promote the development of the large-model field. We encourage research institutions, enterprises, and individual developers around the world to innovate freely on the CPM-Bee base model, provided they comply with the open-source license.
- 💫 Excellent bilingual performance: CPM-Bee's pre-training corpus was strictly filtered and proportioned, and the model shows outstanding performance in both Chinese and English; see the evaluation tasks and results for details.
- 📖 Massive high-quality corpus: The CPM-Bee base model was trained on more than a trillion tokens, making it one of the models trained on the most data in the open-source community. The pre-training corpus was strictly filtered, cleaned, and post-processed to ensure quality.
- OpenBMB ecosystem support: The OpenBMB large-model system provides a full tool chain for high-performance pre-training, adaptation, compression, and inference; the CPM-Bee base model ships with all the accompanying tool scripts, efficiently supporting developers in advanced use.
- 🔨 Dialogue and tool-use abilities: Drawing on OpenBMB's exploration of instruction tuning and tool learning, we fine-tuned the CPM-Bee base model into instance models with strong dialogue and tool-use abilities; the API and closed beta will be opened soon.
Note: CPM-Bee is a base model, i.e., it was obtained from scratch through pre-training. We encourage users to adapt/fine-tune/align it on their own scenarios and data before use. For example, WebCPM is built on CPM-Bee and adapted on serialized human web-search data, gaining complex question-answering and web-search abilities. We will release more models adapted from the CPM-Bee base in the future.
📰 News
- [2023/06/30] VisCPM, a multimodal model series based on CPM-Bee, is released, supporting multimodal dialogue and text-to-image generation!
- [2023/06/16] CPM-Bee now supports 🤗 Transformers.
- [2023/06/08] Updated the tutorial on fine-tuning CPM-Bee for basic tasks.
- [2023/05/27] CPM-Bee, a 10B-parameter, commercially usable Chinese-English bilingual base model, is open-sourced! It is the second milestone of CPM-Live.
🎯 CPM-Bee Model Series
Model | Description |
---|---|
VisCPM | Open-source Chinese-English bilingual multimodal model supporting multimodal dialogue and bidirectional text-image generation |
WebCPM | Open-source Chinese model supporting complex question answering and web search |
🚀 Installation and Usage
Clone this repository:
$ git clone -b main --single-branch https://github.com/OpenBMB/CPM-Bee.git
and make sure your environment meets the requirements:
- python>=3.7
- torch>=1.10,<2.0.0
We recommend using Anaconda to manage your environment and installing the remaining dependencies from PyPI:
$ cd src
$ pip install -r requirements.txt
Note: **the torch version must match your CUDA version, otherwise installation errors occur**. This is especially likely when torch itself is installed via pip install -r requirements.txt, where the automatically resolved torch version may not match the local CUDA version, preventing BMTrain from being installed.
Model
- 10B model: download link (to run the model with 🤗 Transformers, see here).
Data Format
- Unlike existing open-source base models, which usually organize data as unstructured free text, CPM-Bee organizes data in a structured JSON format. With structured data, the CPM-Bee base model can perform accurate semantic understanding and efficiently complete all kinds of basic tasks, including fill-in-the-blank, text generation, translation, question answering, score prediction, and multiple choice. Templates for some representative tasks are given below:
"填空":{
"input": "å¿çå¦é¢åçç 究人ååç°ï¼ååºéè¦å³å®çæå¥½æ¹æ³ä¹ä¸ï¼æ¯å¦éæ©ä¸æå¤§å¦æ<mask_0>ï¼é½æ¶åå°ä½¿ç¨å³çå·¥ä½è¡¨ãç ç©¶ä¼åçå¿çå¦å®¶å°<mask_1>ä¸çè®ºçæ³å³çè¿è¡æ¯è¾ï¼ççå®ä»¬æå¤ç¸ä¼¼ãå·¥ä½è¡¨ç¨åºçæ¯æè
认为å®ä¼äº§çæä¼çï¼ä¹å°±æ¯è¯´ï¼æå¥½çå³çãè½ç¶æ<mask_2>å¯ä»¥æ¥åï¼ä½å®ä»¬å¨æ¬è´¨ä¸é½æ¯ç¸ä¼¼çã",
"<ans>":{
"<mask_0>":"",
"<mask_1>":"",
"<mask_2>":""
}
}
"ææ¬çæ": {
"input": "ä»å¤©å¤©æ°å¾å¥½ï¼æåå¦å¦ä¸èµ·å»å
¬åï¼",
"prompt": "å¾åå约100å",
"<ans>": ""
}
"ç¿»è¯": {
"input": "å京æ¯ä¸å½çé¦é½",
"prompt": "ä¸ç¿»è±",
"<ans>": ""
}
"é®ç": {
"input": "NGC 6231æ¯ä¸ä¸ªä½äºå¤©è座ççæ£æå¢ï¼å¤©ç座æ 为赤ç»16æ¶54åï¼èµ¤çº¬-41度48åï¼è§è§è§æµå¤§å°çº¦45è§åï¼äº®åº¦çº¦2.6è§æçï¼è·å°ç5900å
å¹´ãNGC 6231å¹´é¾çº¦ä¸ºä¸ç¾äºåä¸å¹´ï¼æ¯ä¸ä¸ªé常年轻çæå¢ï¼æå¢å
çæäº®ææ¯5çç天è座 ζ1æãç¨åçæè¿éæå°åæè¿éå°±è½çå°ä¸ªå«çè¡æãNGC 6231å¨1654年被æå¤§å©å¤©æå¦å®¶ä¹ç¦å°¼Â·å·´èæ¯ç¹Â·é迪å°çº³ï¼Giovanni Battista Hodiernaï¼ä»¥Luminosaeçåå馿¬¡çºªå½å¨æè¡¨ä¸ï¼ä½æ¯æªè§è®°è½½äºå¤å°Â·æ¢
西è¶ç天ä½å表åå¨å»Â·èµ«æå°ç深空天ä½ç®å½ãè¿ä¸ªå¤©ä½å¨1678年被ç±å¾·è·åé·ï¼I.7ï¼ã1745年被å¤è¥¿äºç§æ¯ï¼Jean-Phillippe Loys de Cheseauxï¼ï¼9ï¼ã1751å¹´è¢«å°¼å¯æÂ·è·¯æÂ·æå¡ä¼ï¼II.13ï¼åå«å次ç¬ç«åç°ã",
"question": "NGC 6231çç»çº¬åº¦æ¯å¤å°ï¼",
"<ans>": ""
}
"è¯å颿µ": {
"input":"ä¹å夿¬¡èé¤é½éæ©è¿éï¼æåç§å¤§å°çå
æ¿åæ¶è½å®¹çº³å¾å¤äººï¼ç¯å¢å¥½æç¹è²è¿æè¡¨æ¼ï¼æ´ä½è餿°å´ä¸ä¸è¢«å¸¦å¨èµ·æ¥ãç°å¨ç±äºçç«æ¹æäºçµç¤ç¾ï¼å£æççä¸å¦ä»åï¼ä¸è¿å
¶ä»èåé½è¿æ¯ä¸éï¼ç¤ç¾å©ä¸çæéª¨èæåè¿è½åå å·¥ä¸ä¸æ¤ççä¹å¾å¥½åã",
"question":"è¯åæ¯å¤å°ï¼(1-5)",
"<ans>":""
}
"éæ©é¢": {
"input": "ç¶æ¯é½å¸æèªå·±çå©åè¯å®ã忢ãæç¤¼è²ãè¦æ³è®©å©åæä¸ºè¿æ ·ç人ï¼ç¶æ¯é¦å
å¾ä»èªå·±åèµ·ï¼è¦æ¯è¿èªå·±é½åä¸å°ï¼åæè½è¦æ±å©ååå°å¢ï¼",
"options": {
"<option_0>": "å°æè¦æ±",
"<option_1>": "é使 å",
"<option_2>": "èªå·±å
å好",
"<option_3>": "让å©åæ¿ä¸»æ"
},
"question": "æè²å©åæ¶ï¼ç¶æ¯åºè¯¥ï¼",
"<ans>": ""
}
- Note: the templates above can be used as-is at inference time; at training time, the gold answer must be filled into the "" under <ans>, for example:
{
    "input": "北京是中国的首都",
    "prompt": "中翻英",
    "<ans>": "Beijing is the capital of China"
}
{
    "input": "父母都希望自己的孩子诚实、勇敢、有礼貌。要想让孩子成为这样的人，父母首先得从自己做起，要是连自己都做不到，又怎能要求孩子做到呢？",
    "options": {
        "<option_0>": "少提要求",
        "<option_1>": "降低标准",
        "<option_2>": "自己先做好",
        "<option_3>": "让孩子拿主意"
    },
    "question": "教育孩子时，父母应该：",
    "<ans>": "<option_2>"
}
- CPM-Bee was exposed to several JSON formats during pre-training that can be used directly; users may also design their own JSON formats and fine-tune the model on them. All JSON formats must satisfy the following conditions:
- The output **must** be organized under the <ans> key;
- The options of a multiple-choice question should be organized as <option_xx>, where xx is a number;
- The blanks of a fill-in-the-blank question should be organized as <mask_xx>, where xx is a number;
- Because "<" serves as the trigger for recognizing special tokens such as <ans>, <option_xx>, and <mask_xx>, every "<" in the text of the data **must** be escaped as "<<". For example, in the record below, "1 < 2" and "10 < 8" are escaped to "1 << 2" and "10 << 8":
{
    "question": "下面哪项是正确的",
    "options": {
        "<option_0>": "1 << 2",
        "<option_1>": "10 << 8"
    },
    "<ans>": "<option_0>"
}
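For reference, the escaping rule can be applied mechanically; below is a minimal sketch (a hypothetical helper, not part of the repository) that escapes "<" while leaving already-present special tokens intact:

import re

SPECIAL = re.compile(r"(<ans>|<option_\d+>|<mask_\d+>)")

def escape_lt(text: str) -> str:
    # Split on special tokens (kept via the capturing group), then escape "<"
    # only inside the plain-text fragments.
    parts = SPECIAL.split(text)
    return "".join(p if SPECIAL.fullmatch(p) else p.replace("<", "<<") for p in parts)

# escape_lt("1 < 2, see <mask_1>") -> "1 << 2, see <mask_1>"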
Model Pre-training
- Data cleaning
- Each sample must be placed on a single line, with newlines escaped as \n; the format can be either txt or json (a minimal writer sketch follows the example layout below). For example:
- txt format:
... ... How can cross training benefit groups like runners, swimmers, or weightlifters?\n\n1. Reduces the risk of injury...\n\n2. Improves overall fitness... Are there any particular physical benefits to mindful walking, such as improved posture or increased physical fitness?\n\n1. Choose a quiet and peaceful environment...\n\n2. Start by tuning into your breath and becoming aware of your surroundings... ... ...
- json format:
... ... {"template": "Does the answer correctly answer the question", "sentence": "Unicode has the explicit aim of transcending ...", "question": "What is the aim of Unicode?", "options": {"<option_0>": "no", "<option_1>": "yes"}, "<ans>": "<option_1>"} ... ...
- Example: we provide samples for wiki (txt format, plain text) and flan (json format, multiple choice). After downloading them, organize the files under raw_data following the layout below, and then try out the subsequent steps.
CPMBee/
├── src
│   └── ...
└── raw_data (raw data location)
    ├── wiki
    │   └── raw.txt (raw txt data)
    └── flan
        └── raw.json (raw json data)
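A minimal sketch of this packing step (a hypothetical helper, assuming samples are already Python strings or dicts):

import json

def write_corpus(samples, path):
    # One sample per line; literal newlines inside a text sample are escaped as \n.
    with open(path, "w", encoding="utf-8") as f:
        for s in samples:
            if isinstance(s, str):   # txt format
                f.write(s.replace("\n", "\\n") + "\n")
            else:                    # json format
                f.write(json.dumps(s, ensure_ascii=False) + "\n")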
- Dataset generation
- To read data efficiently and deploy datasets on distributed file systems, CPM-Bee converts them into binary files using build_dataset.py under src. Its arguments include:
- --input-path: path of the raw data to import; all files under the path are packed together for processing
- --output-path: path of the exported dataset
- --output-name: name of the exported dataset
- --data-type: txt/json
- --min-length: data shorter than the minimum length is discarded
- --max-length: data exceeding the maximum length is split
- Raw txt data is split according to min-length and max-length and then exported to the dataset uniformly in the JSON form {'text':'......'}
- The exported dataset consists of two files: a binary file named output-name and a meta.bin file. meta.bin records the metadata of output-name (an illustrative instance follows this list), including:
- "file_name": the file that meta.bin describes, usually output-name
- "block_begin": datasets support distributed block storage; the first block of this dataset, usually 0
- "block_end": the last block of this dataset, usually the total number of blocks
- "nbytes": total dataset size in bytes, e.g. 60221163
- "nlines": total number of lines in the dataset, e.g. 41733
- "block_size": size of each block, e.g. 16777216
- Example: build datasets from the provided wiki and flan samples:
$ cd CPMBee/src
$ python build_dataset.py --input-path ../raw_data/wiki/ --output-path ../datasets/wiki/ --output-name wiki --data-type txt --min-length 100 --max-length 10000
$ python build_dataset.py --input-path ../raw_data/flan/ --output-path ../datasets/flan/ --output-name flan --data-type json
- The file structure after generation is:
CPMBee/
├── src
│   ├── ...
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets (generated datasets)
    ├── wiki (dataset built from wiki)
    │   └── data
    │       ├── wiki
    │       └── meta.bin
    └── flan (dataset built from flan)
        └── data
            ├── flan
            └── meta.bin
- Task transformation scripts
- For each dataset, you can write a task transformation script to rewrite the JSON records in the dataset into various pre-training tasks.
- The script must follow this format:
import random

def transform(data, num_sample: int, r: random.Random):
    ...
- For each dataset, CPM-Bee's underlying data pipeline automatically imports the dataset, reads out samples, and calls the task transformation script to rewrite them.
- The transformation script takes three input parameters: data is the sample that was read, num_sample is the number of samples read (usually 1; under in-context learning settings it can be more), and r is a random generator.
- Example: transformation scripts for wiki and flan:
- wiki script:
import random

def rand(n: int, r: random.Random):
    return int(r.random() * n)

def transform(data, num_sample: int, r: random.Random):
    # Following the earlier steps, every wiki record has the form {'text': '...'}
    text = data['text']
    # Randomly mask 50%~100% of the content for prediction
    mid = rand(len(text) // 2, r)
    # CPM-Bee uses "<" to recognize special tokens, so escape "<" in the content as "<<"
    ipt = text[:mid].replace("<", "<<")
    ans = text[mid:].replace("<", "<<")
    return {"input": ipt, "<ans>": ans}
- flan script:
import random

def transform(data, num_sample: int, r: random.Random):
    # Following the earlier steps, the flan records are already in the multiple-choice
    # JSON format and contain the <ans> key, so return them directly for training
    return data
- The file structure after writing the task transformation scripts is:
CPMBee/
├── src
│   ├── ...
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py (task transformation script for wiki)
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py (task transformation script for flan)
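As a further illustration (not one of the provided examples), a transform for a hypothetical extractive-QA dataset with context/question/answer fields could map records onto the QA template from the data-format section:

import random

def transform(data, num_sample: int, r: random.Random):
    # Hypothetical raw record: {"context": ..., "question": ..., "answer": ...}
    # Escape "<" as "<<" so it is not mistaken for a special-token trigger.
    esc = lambda s: s.replace("<", "<<")
    return {
        "input": esc(data["context"]),
        "question": esc(data["question"]),
        "<ans>": esc(data["answer"]),
    }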
- Dataset configuration script
- All datasets participating in training are aggregated by a dataset configuration script, which is also a JSON file, in the following format:
[
    {
        "dataset_name": "wiki",
        "task_name": "lm",
        "weight": 1.0,
        "path": "wiki/data",
        "incontext_weight": [1.0],
        "transforms": "wiki/transform.py"
    },
    {
        "dataset_name": "flan",
        "task_name": "nlu",
        "weight": 1.0,
        "path": "flan/data",
        "incontext_weight": [1.0],
        "transforms": "flan/transform.py"
    }
]
- Its fields are:
- dataset_name: name of the dataset;
- task_name: task the dataset belongs to; task_name + dataset_name is used as the label identifying the dataset during training, and task_name is also used to aggregate the loss per task during training;
- weight: sampling weight;
- path: path containing meta.bin and the binary data;
- transforms: path of the task transformation script;
- incontext_weight: sample stacking. [1.0] means a single sample is drawn with probability 100%; [0.8, 0.2] means two samples are concatenated with probability 20% (one sample otherwise); [0.75, 0.1, 0.15] means three samples are concatenated with probability 15% and two with probability 10%. A small sketch of this interpretation follows this list.
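A sketch of how such a weight list can be interpreted (assuming, as the description above suggests, that entry i is the probability of concatenating i+1 samples):

import random

def sample_num_concat(incontext_weight, r: random.Random):
    # e.g. [0.75, 0.1, 0.15] -> one sample with p=0.75, two with p=0.10, three with p=0.15
    counts = range(1, len(incontext_weight) + 1)
    return r.choices(counts, weights=incontext_weight, k=1)[0]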
- Example: file layout after writing the dataset configuration script that aggregates the wiki and flan datasets:
CPMBee/
├── src
│   ├── ...
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── datasets.json (dataset configuration script)
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py
- Pre-training script
- The pre-training script is as follows:
#! /bin/bash
# number of GPUs per node
GPUS_PER_NODE=8
# number of nodes
NNODES=1
# IP and port of the master node; see the PyTorch distributed-training docs for details
MASTER_ADDR="localhost"
MASTER_PORT=12345

OPTS=""
# model and dataset settings
# model config
OPTS+=" --model-config config/cpm-bee-10b.json"
# location of the dataset configuration script from the previous step
OPTS+=" --dataset ../datasets/datasets.json"
# training settings
# number of training iterations
OPTS+=" --train-iters 200000"
# per-GPU batch size
OPTS+=" --batch-size 2"
# maximum sample length; note that CPM-Bee packs samples internally to use max-length efficiently
OPTS+=" --max-length 2048"
# learning rate; if resuming from an earlier checkpoint, consider lowering it
OPTS+=" --lr 0.01"
# number of warmup iterations
OPTS+=" --warmup-iters 2000"
# learning-rate decay schedule
OPTS+=" --lr-decay-style noam"
# weight decay, passed into AdamW
OPTS+=" --weight-decay 0.01"
# gradient-clipping range
OPTS+=" --clip-grad 1.0"
# mixed-precision loss-scale factor
OPTS+=" --loss-scale 1048576"
# growth/decay multiplier of the loss scale
OPTS+=" --loss-scale-factor 2"
# grow the loss scale every this many steps
OPTS+=" --loss-scale-steps 128"
# log settings
# print parameter and gradient mean/variance every this many steps
OPTS+=" --inspect-iters 100"
# log output path
OPTS+=" --log-dir ../logs/train/"
# tensorboard output path
OPTS+=" --tensorboard ../logs/tensorboard/cpm_live_48_4096/"
# saving ckpts
# save a checkpoint every this many steps
OPTS+=" --save-iters 500"
# checkpoint output path
OPTS+=" --save ../results/"
# checkpoint name; CPM-Bee appends the step number when saving
OPTS+=" --save-name cpm_live_checkpoint"
# loading ckpts: to resume from an old checkpoint, uncomment the lines below and set MODEL_STEPS
# MODEL_STEPS="0"
# OPTS+=" --start-step ${MODEL_STEPS}"
# OPTS+=" --load ../results/cpm_live_checkpoint-${MODEL_STEPS}.pt"
# whether to load the historical gradients
# OPTS+=" --load-grad "

CMD="torchrun --nnodes=${NNODES} --nproc_per_node=${GPUS_PER_NODE} --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} pretrain_cpm_bee.py ${OPTS}"
echo ${CMD}
$CMD
- Example: file layout after writing the pre-training script:
CPMBee/
├── src
│   ├── scripts
│   │   └── pretrain_cpm_bee.sh (pre-training script)
│   ├── pretrain_cpm_bee.py
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── datasets.json
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py
- Pre-training command
$ cd CPMBee/src
$ bash scripts/pretrain_cpm_bee.sh
- Example: file layout after running the pre-training command:
CPMBee/
├── src
│   ├── scripts
│   │   └── pretrain_cpm_bee.sh
│   ├── pretrain_cpm_bee.py
│   └── build_dataset.py
├── results (checkpoint output path)
├── logs (log output path)
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── datasets.json
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py
OpenBMB Ecosystem Features
Built on the OpenBMB large-model system ecosystem, we achieved end-to-end efficiency while training CPM-Bee. We also provide a complete set of scripts for model fine-tuning (based on BMTrain and OpenDelta), tool use (based on BMTools), model compression (based on BMCook), and low-resource inference (based on BMInf), helping developers get started with CPM-Bee quickly.
Model Fine-tuning
Based on BMTrain and OpenDelta, we provide two fine-tuning options: full-parameter fine-tuning and parameter-efficient delta tuning, which can adapt CPM-Bee to a variety of downstream scenarios.
- Full-parameter fine-tuning:
$ torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py
- Parameter-efficient delta tuning:
$ torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py \
    --use-delta
Fine-tuning Workflow
To fine-tune the model on a specific task, you should prepare the dataset and proceed as follows:
- Adjust the data format. You can recast classification problems into the multiple-choice format. For more information on the data format, see the CPM-Bee data format section above.
Note that because <...> is reserved to mark special tokens, it can be confused with a literal < in the text, so you should escape the non-special-token parts of your text data (the escaping helper sketched in the data-format section applies here as well). For example, given the following record:
{"input": "团队合作非常重要，如果不能做到<mask_0>，则可能会造成1+1<2的结果，所以，要更加注意<mask_1>", "<ans>": {"<mask_0>": "", "<mask_1>": ""}}
In this record, <mask_0> and <mask_1> are special tokens and should remain unchanged, while every other < is replaced with <<. The escaped record is:
{"input": "团队合作非常重要，如果不能做到<mask_0>，则可能会造成1+1<<2的结果，所以，要更加注意<mask_1>", "<ans>": {"<mask_0>": "", "<mask_1>": ""}}
- Preprocess the dataset into binary files. To build the preprocessed dataset, you can run:
$ python preprocess_dataset.py --input your/reformated/data/path --output_path your/binary/data/path --output_name data_name
After preprocessing, you will get:
|-- your/binary/data/path
|-- folder1
| |-- data_name
| |-- meta.bin
|-- folder2
|-- data_name
|-- meta.bin
- Fine-tune CPM-Bee. To start fine-tuning, you can run:
$ bash scripts/finetune_cpm_bee.sh
Alternatively, you can run finetune_cpm_bee.py directly via torchrun. For example, you can delta-tune CPM-Bee on a server with 4 GPUs as follows:
torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py \
--model-config your/model/config/path \
--load your/model/checkpoint/path \
--dataset your/binary/data/path/folder1 \
--eval_dataset your/binary/data/path/folder2 \
--use-delta
We recommend fine-tuning with the scheme above; alternatively, you can refer to 🤗 Transformers and fine-tune CPM-Bee with your own parallelization strategy.
Model Compression
Based on BMCook, we compressed the original CPM-Bee base model and provide CPM-Bee models in multiple sizes for different scenarios. For each size we also provide a 🤗 Transformers-based version; click the links below to visit the model repositories for more information.
Model | #Attn layers | #FFN layers | Attn hidden size | FFN hidden size | Download | 🤗 Transformers |
---|---|---|---|---|---|---|
CPM-Bee-10B | 48 | 48 | 4096 | 10240 | Link | Link |
CPM-Bee-5B | 19 | 24 | 4096 | 10240 | Link | Link |
CPM-Bee-2B | 19 | 24 | 2048 | 5120 | Link | Link |
CPM-Bee-1B | 19 | 24 | 1280 | 1024 | Link | Link |
Model Deployment
The compressed CPM-Bee models can run fast inference on ordinary consumer GPUs. The inference resources required by each model size are as follows:
Model | Inference VRAM usage | Recommended hardware |
---|---|---|
CPM-Bee-10B | 20 GB | RTX 3090 (24 GB) |
CPM-Bee-5B | 11 GB | RTX 3090 (24 GB) |
CPM-Bee-2B | 6.7 GB | GTX 1080 (8 GB) |
CPM-Bee-1B | 4.1 GB | GTX 1660 (6 GB) |
Using This Repository
For a specific inference task, you can write your own inference code based on the cloned CPM-Bee repository. Here is a simple text-generation example.
from cpm_live.generation.bee import CPMBeeBeamSearch
from cpm_live.models import CPMBeeTorch, CPMBeeConfig
from cpm_live.tokenizers import CPMBeeTokenizer
import torch
# prepare your input data
data_list = [
    # "今天天气是真的" = "The weather today is really"; the prompt "往后写一句话" asks the model to continue with one sentence
    {"input": "今天天气是真的", "prompt": "往后写一句话", "<ans>": ""}
]
# load model
config = CPMBeeConfig.from_json_file("cpm-bee-5b.json")
ckpt_path = "cpm-bee-5b-ckpt.pt"
tokenizer = CPMBeeTokenizer()
model = CPMBeeTorch(config=config)
# load checkpoints
model.load_state_dict(torch.load(ckpt_path), strict=False)
model.cuda()
# use beam search
beam_search = CPMBeeBeamSearch(
    model=model,
    tokenizer=tokenizer,
)
for data in data_list:
    inference_results = beam_search.generate([data], max_length=100, repetition_penalty=1.1)
    for res in inference_results:
        print(res)
We also packaged the code above into a Python file, text_generation.py; for convenience you can run it directly:
python text_generation.py
妿æ¨çæ¾åè¾å°ï¼æ³ä½¿ç¨BMInfè¿è¡ä½èµæºæ¨ç:
python text_generation.py --use-bminf --memory-limit 12
å¦æå¸æä½¿ç¨CPUè¿è¡æ¨çï¼
python text_generation.py --device cpu
妿叿卿¨çæ¶å 载微è°åçdelta模å:
python text_generation.py --delta delta.pt
Using 🤗 Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True).cuda()
# "今天天气不错，" = "The weather is nice today,"
result = model.generate({"input": "今天天气不错，", "<ans>": ""}, tokenizer)
print(result)
We also provide an inference script based on 🤗 Transformers, text_generation_hf.py, which you can run with:
python text_generation_hf.py
Multi-GPU deployment:
python text_generation_hf.py --multi-gpu
Multi-GPU deployment while loading a fine-tuned delta model:
python text_generation_hf.py --multi-gpu --delta delta.pt
💫 Performance
Zero-shot Evaluation
We evaluated the Chinese and English abilities of the CPM-Bee base model comprehensively. On the Chinese Zero-CLUE benchmark, CPM-Bee outperforms other models by a large margin and ranks first among Chinese large models. On English benchmarks, CPM-Bee shows performance comparable to the open-source model LLaMA.
ZeroCLUE Chinese Evaluation
Model | Score | EPRSTMT | CSLDCP | TNEWSF | IFLYTEKF | OCNLIF | BUSTM | CHIDF | CSLF | CLUEWSCF |
---|---|---|---|---|---|---|---|---|---|---|
CPM-Bee | 78.184 | 85.52 | 58.99 | 78.2 | 58.81 | 77.73 | 83.85 | 89.65 | 83.6 | 87.24 |
Ctyun_Big_Model | 76.217 | 87.25 | 48.02 | 77.13 | 59.62 | 75.5 | 90.05 | 84.6 | 82.9 | 81.72 |
PaddleNLP-UTC | 70.547 | 85.92 | 58.92 | 68.27 | 40.15 | 74.79 | 76.7 | 82.75 | 70.6 | 74.48 |
Erlangshen-UnifiedMC | 70.295 | 88.71 | 50.18 | 71.67 | 40.58 | 75.5 | 80.15 | 84.85 | 60.6 | 81.72 |
English Evaluation
Model | Average | BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA |
---|---|---|---|---|---|---|---|---|---|
GPT-3 | - | 60.5 | 81.0 | - | 78.9 | 70.2 | 68.8 | 51.4 | 57.6 |
Gopher | - | 79.3 | 81.8 | 50.6 | 79.2 | 70.1 | - | - | - |
Chinchilla | - | 83.7 | 81.8 | 51.3 | 80.8 | 74.9 | - | - | - |
PaLM | - | 84.8 | 80.5 | - | 79.7 | 77.0 | 75.2 | 52.5 | 50.4 |
LLaMA-7B | 66.13 | 76.5 | 79.8 | 48.9 | 76.1 | 70.1 | 72.8 | 47.6 | 57.2 |
LLaMA-13B | 68.08 | 78.1 | 80.1 | 50.4 | 79.2 | 73 | 74.8 | 52.7 | 56.4 |
CPM-Bee | 67.80 | 78.69 | 77.58 | 61.11 | 78.89 | 61.88 | 66.88 | 54.18 | 63.20 |
CPM-Bee + Decoder Tuning
Decoder Tuning (to appear at ACL 2023), developed jointly by OpenBMB and THUNLP, can substantially improve downstream-task performance using only the model API, without accessing or modifying the model parameters. See the implementation code link.
Shots | Model | SST2 | IMDB | Yelp | AGNews | DBpedia | Yahoo | RTE | SNLI | MNLI-m | MNLI-mm | FewNERD | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CPM-Bee | 80.5 | 89.1 | 96.6 | 74.6 | 71.3 | 46.7 | 84.1 | 45.4 | 45.6 | 45.6 | 1.6 | 61.9 |
16 | T5-3B | 89.9 | 92.7 | 94.9 | 87.7 | 96.2 | 66.5 | 55.8 | 52.0 | 52.8 | 52.2 | 51.9 | 72.1 |
16 | LLaMA-7B | 85.1 | 90.5 | 92.8 | 71.4 | 89.8 | 45.1 | 49.1 | 35.2 | 36.3 | 36.2 | 54.6 | 62.4 |
16 | Vicuna-13B | 82.1 | 88.8 | 95.6 | 86.4 | 74.4 | 55.3 | 62.5 | 61.4 | 54.3 | 48.6 | 52.1 | 69.2 |
16 | CPM-Bee | 92.7 | 96.2 | 97.5 | 85.5 | 89.8 | 65.2 | 86.0 | 86.4 | 76.3 | 76.3 | 54.6 | 82.4 |
64 | LLaMA-7B | 87.5 | 85.7 | 96.9 | 75.4 | 93.5 | 47.4 | 51.4 | 39.4 | 36.2 | 38.4 | 59.8 | 64.7 |
64 | Vicuna-13B | 92.0 | 90.8 | 96.5 | 87.7 | 87.8 | 58.7 | 59.1 | 58.7 | 56.7 | 48.4 | 56.8 | 72.1 |
64 | CPM-Bee | 94.3 | 96.5 | 98.3 | 88.5 | 93.5 | 68.7 | 87.1 | 88.9 | 78.0 | 79.0 | 59.8 | 84.8 |
256 | LLaMA-7B | 87.6 | 88.8 | 97.1 | 82.4 | 94.2 | 48.5 | 53.4 | 39.8 | 37.3 | 37.4 | 59.1 | 66.0 |
256 | Vicuna-13B | 93.1 | 88.7 | 96.8 | 89.9 | 89.1 | 58.6 | 58.5 | 58.7 | 57.5 | 48.3 | 56.6 | 72.3 |
256 | CPM-Bee | 94.5 | 96.7 | 98.4 | 89.7 | 94.2 | 69.9 | 87.7 | 89.4 | 81.7 | 80.6 | 59.1 | 85.6 |
🔖 Open-Source License
Model License
The CPM-Bee base model is released under the "General Model License - Source Attribution - Publicity Restriction - Commercial Authorization". The model allows commercial use; to use it for commercial purposes, please contact cpm@modelbest.cn to obtain written authorization.
Statement
As a language model, CPM-Bee generates content by learning from a large amount of text, but it cannot understand or express personal opinions or value judgments; nothing it outputs represents the views or positions of the model developers. Users of content generated by CPM-Bee are therefore responsible for evaluating and verifying it themselves.