Top Related Projects
- bigscience-workshop/petals: 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
- EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
- microsoft/DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
- allenai/allennlp: An open-source NLP research library, built on PyTorch.
- google-research/bert: TensorFlow code and pre-trained models for BERT
Quick Overview
CPM-Bee is an open-source large language model (LLM) developed by the OpenBMB team. It is designed to be a powerful, efficient, and versatile model for various natural language processing tasks, with a focus on Chinese language capabilities.
Pros
- Specialized in Chinese language processing while maintaining multilingual capabilities
- Open-source and freely available for research and commercial use
- Efficient performance with relatively small model size (10B parameters)
- Supports a wide range of NLP tasks, including text generation, summarization, and question-answering
Cons
- May not perform as well as larger models (e.g., GPT-3) on certain complex tasks
- Limited documentation and community support compared to more established LLMs
- Potential biases in training data, as with most large language models
- Requires significant computational resources for fine-tuning and deployment
Code Examples
# Example 1: Text generation
# These examples use the 🤗 Transformers interface documented in the README below.
# CPM-Bee consumes structured JSON and returns its answer under the "<ans>" key.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True).cuda()

result = model.generate({"input": "Once upon a time, in a land far away,", "<ans>": ""}, tokenizer)
print(result)

# Example 2: Question answering, using the QA template from the data-format section
data = {
    "input": "France is a country in Western Europe. Its capital city is Paris, known for the Eiffel Tower.",
    "question": "What is the capital of France?",
    "<ans>": "",
}
print(model.generate(data, tokenizer))

# Example 3: Summarization, expressed through the "prompt" field as in the
# generation/translation templates (the prompt string itself is illustrative)
long_text = "..."  # a long article or document
print(model.generate({"input": long_text, "prompt": "summarize the text above", "<ans>": ""}, tokenizer))
Getting Started
To get started with CPM-Bee, follow these steps:
- Clone the repository and install the dependencies:
git clone -b main --single-branch https://github.com/OpenBMB/CPM-Bee.git
cd CPM-Bee/src
pip install -r requirements.txt
- Download a model checkpoint (links are in the README below) and run the bundled inference script:
python text_generation.py
Note: Ensure you have sufficient GPU resources to run the model efficiently.
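If you prefer to work from the cloned repository, the README below documents a native inference path through the cpm_live package; the following is a condensed sketch of it (the config and checkpoint file names are placeholders for the files you downloaded):

from cpm_live.generation.bee import CPMBeeBeamSearch
from cpm_live.models import CPMBeeTorch, CPMBeeConfig
from cpm_live.tokenizers import CPMBeeTokenizer
import torch

# build the model from its JSON config and load the downloaded checkpoint
config = CPMBeeConfig.from_json_file("cpm-bee-10b.json")
model = CPMBeeTorch(config=config)
model.load_state_dict(torch.load("cpm-bee-10b-ckpt.pt"), strict=False)
model.cuda()

# generate with beam search; CPM-Bee reads structured JSON and fills in "<ans>"
beam_search = CPMBeeBeamSearch(model=model, tokenizer=CPMBeeTokenizer())
results = beam_search.generate([{"input": "Hello, world!", "<ans>": ""}], max_length=100)
print(results[0])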
Competitor Comparisons
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Pros of petals
- Focuses on distributed inference of large language models
- Supports multiple models including BLOOM and OPT
- Offers a user-friendly API for easy integration
Cons of petals
- Limited to specific pre-trained models
- May require more setup and configuration for distributed computing
- Less flexibility in terms of model customization
Code comparison
CPM-Bee:
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPM-Bee's documented 🤗 Transformers interface (see the README below);
# the model consumes structured JSON and fills in the "<ans>" field.
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
result = model.generate({"input": "Hello, how are you?", "<ans>": ""}, tokenizer)
print(result)
petals:
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "bigscience/bloom"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
Both repositories provide APIs for working with large language models. The CPM-Bee repository focuses on a single model family and offers a simpler interface for basic text generation, while petals supports multiple distributed models and provides more flexibility in model selection and distributed computing.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of gpt-neox
- More extensive documentation and examples
- Larger community and more active development
- Better support for distributed training across multiple GPUs/nodes
Cons of gpt-neox
- Higher computational requirements
- More complex setup and configuration process
- Steeper learning curve for beginners
Code Comparison
CPM-Bee:
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPM-Bee's documented 🤗 Transformers interface (see the README below)
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
result = model.generate({"input": "Hello, how are you?", "<ans>": ""}, tokenizer)
print(result)
gpt-neox:
# The 20B checkpoint is loaded here through 🤗 Transformers
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0]))
Both repositories provide implementations of large language models, but gpt-neox offers more flexibility and scalability for advanced users, while CPM-Bee focuses on simplicity and ease of use for quick deployment and experimentation.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- More comprehensive and widely adopted optimization toolkit for deep learning
- Supports a broader range of models and training scenarios
- Offers advanced features like ZeRO-Offload and 3D parallelism
Cons of DeepSpeed
- Steeper learning curve due to its extensive feature set
- May require more configuration and tuning for optimal performance
- Potentially higher overhead for simpler use cases
Code Comparison
DeepSpeed:
import deepspeed

# `args`, `model`, and `params` come from your own training setup
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params,
)
CPM-Bee:
from cpm_live.models import CPMBeeTorch, CPMBeeConfig

# Native interface from the CPM-Bee repository (see the README below)
config = CPMBeeConfig.from_json_file("path/to/config.json")
model = CPMBeeTorch(config=config)
Summary
DeepSpeed is a more comprehensive optimization toolkit for deep learning, offering a wide range of features and optimizations. It's suitable for large-scale training scenarios and supports various models. However, it may have a steeper learning curve and require more configuration.
CPM-Bee, on the other hand, is focused on a single model architecture and is easier to use for that particular use case. It likely has a simpler setup process but may not offer as many advanced optimization techniques as DeepSpeed.
The choice between the two depends on the specific requirements of your project, the scale of your training, and the level of optimization needed.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Extensive model support: Covers a wide range of transformer-based models
- Active community: Regular updates and contributions from a large user base
- Comprehensive documentation: Detailed guides and examples for various tasks
Cons of transformers
- Complexity: Can be overwhelming for beginners due to its extensive features
- Resource-intensive: Some models require significant computational resources
Code comparison
transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
CPM-Bee:
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPM-Bee's documented 🤗 Transformers interface (see the README below)
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
result = model.generate({"input": "Hello, how are you?", "<ans>": ""}, tokenizer)
print(result)
The transformers library offers a more flexible approach, allowing for easy switching between different models and tasks. CPM-Bee provides a simpler interface specifically tailored for the CPM-Bee model, which may be more straightforward for users focused on this particular model.
An open-source NLP research library, built on PyTorch.
Pros of AllenNLP
- Comprehensive documentation and tutorials
- Extensive pre-built models and datasets
- Strong community support and regular updates
Cons of AllenNLP
- Steeper learning curve for beginners
- More complex setup and configuration
Code Comparison
AllenNLP:
from allennlp.predictors import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
result = predictor.predict(sentence="Did Uriah honestly think he could beat the game in under three hours?")
CPM-Bee:
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPM-Bee's documented 🤗 Transformers interface (see the README below)
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
result = model.generate({"input": "What is the capital of France?", "<ans>": ""}, tokenizer)
print(result)
AllenNLP offers a more structured approach with pre-built models and predictors, while CPM-Bee provides a simpler interface for text generation. AllenNLP is better suited for complex NLP tasks and research, whereas CPM-Bee focuses on ease of use for general language generation.
TensorFlow code and pre-trained models for BERT
Pros of BERT
- Widely adopted and well-documented, with extensive research and community support
- Pre-trained models available for various languages and tasks
- Proven effectiveness in numerous NLP applications
Cons of BERT
- Larger model size and higher computational requirements
- Limited to a maximum sequence length of 512 tokens
- Less flexible for fine-tuning on specific downstream tasks
Code Comparison
BERT:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
CPM-Bee:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
Key Differences
- CPM-Bee is a more recent model with potential improvements in Chinese language understanding
- BERT has a larger ecosystem of pre-trained models and fine-tuning examples
- CPM-Bee may offer better performance on specific Chinese NLP tasks
- BERT is more suitable for general-purpose NLP applications across multiple languages
README
CPM-Bee
An open-source, commercially usable Chinese-English bilingual base large language model with 10 billion parameters
Model • OpenBMB Ecosystem • Performance • Open-Source License
✨ Model Introduction
CPM-Bee is a fully open-source, commercially usable Chinese-English base model with 10 billion parameters, and the second milestone of the CPM-Live training initiative. It uses an auto-regressive Transformer architecture and is pre-trained on a high-quality corpus of over a trillion tokens, giving it strong foundational abilities. Developers and researchers can adapt the CPM-Bee base model to all kinds of scenarios to create application models for specific domains.
- 👐 Open source and commercially usable: Upholding the open-source spirit of "bringing large models to every household", OpenBMB releases the CPM-Bee base model fully open-source and commercially usable to promote the development of the large-model field. We encourage research institutions, enterprises, and individual developers around the world to innovate freely on the CPM-Bee base model, provided they comply with the open-source license.
- 💫 Excellent bilingual performance: CPM-Bee's pre-training corpus was strictly filtered and proportioned, and the model shows outstanding performance in both Chinese and English; see the evaluation tasks and results for details.
- 📖 Massive high-quality corpus: The CPM-Bee base model was trained on more than a trillion tokens, making it one of the models trained on the most data in the open-source community. The pre-training corpus was strictly filtered, cleaned, and post-processed to ensure quality.
- OpenBMB ecosystem support: The OpenBMB large-model system provides a full tool chain for high-performance pre-training, adaptation, compression, and inference; the CPM-Bee base model ships with all the accompanying tool scripts, efficiently supporting developers in advanced use.
- 🔨 Dialogue and tool-use abilities: Drawing on OpenBMB's exploration of instruction tuning and tool learning, we fine-tuned the CPM-Bee base model into instance models with strong dialogue and tool-use abilities; the API and closed beta will be opened soon.
Note: CPM-Bee is a base model, i.e., it was obtained from scratch through pre-training. We encourage users to adapt/fine-tune/align it on their own scenarios and data before use. For example, WebCPM is built on CPM-Bee and adapted on serialized human web-search data, gaining complex question-answering and web-search abilities. We will release more models adapted from the CPM-Bee base in the future.
📰 News
- [2023/06/30] VisCPM, a multimodal model series based on CPM-Bee, is released, supporting multimodal dialogue and text-to-image generation!
- [2023/06/16] CPM-Bee now supports 🤗 Transformers.
- [2023/06/08] Updated the tutorial on fine-tuning CPM-Bee for basic tasks.
- [2023/05/27] CPM-Bee, a 10B-parameter, commercially usable Chinese-English bilingual base model, is open-sourced! It is the second milestone of CPM-Live.
🎯 CPM-Bee Model Series
Model | Description |
---|---|
VisCPM | Open-source Chinese-English bilingual multimodal model supporting multimodal dialogue and bidirectional text-image generation |
WebCPM | Open-source Chinese model supporting complex question answering and web search |
🚀 Installation and Usage
Clone this repository:
$ git clone -b main --single-branch https://github.com/OpenBMB/CPM-Bee.git
and make sure your environment meets the requirements:
- python>=3.7
- torch>=1.10,<2.0.0
We recommend using Anaconda to manage your environment and installing the remaining dependencies from PyPI:
$ cd src
$ pip install -r requirements.txt
Note: **the torch version must match your CUDA version, otherwise installation errors occur**. This is especially likely when torch itself is installed via pip install -r requirements.txt, where the automatically resolved torch version may not match the local CUDA version, preventing BMTrain from being installed.
Model
- 10B model: download link (to run the model with 🤗 Transformers, see here).
Data Format
- Unlike existing open-source base models, which usually organize data as unstructured free text, CPM-Bee organizes data in a structured JSON format. With structured data, the CPM-Bee base model can perform accurate semantic understanding and efficiently complete all kinds of basic tasks, including fill-in-the-blank, text generation, translation, question answering, score prediction, and multiple choice. Templates for some representative tasks are given below:
"填空":{
"input": "å¿çå¦é¢åçç 究人ååç°ï¼ååºéè¦å³å®çæå¥½æ¹æ³ä¹ä¸ï¼æ¯å¦éæ©ä¸æå¤§å¦æ<mask_0>ï¼é½æ¶åå°ä½¿ç¨å³çå·¥ä½è¡¨ãç ç©¶ä¼åçå¿çå¦å®¶å°<mask_1>ä¸çè®ºçæ³å³çè¿è¡æ¯è¾ï¼ççå®ä»¬æå¤ç¸ä¼¼ãå·¥ä½è¡¨ç¨åºçæ¯æè
认为å®ä¼äº§çæä¼çï¼ä¹å°±æ¯è¯´ï¼æå¥½çå³çãè½ç¶æ<mask_2>å¯ä»¥æ¥åï¼ä½å®ä»¬å¨æ¬è´¨ä¸é½æ¯ç¸ä¼¼çã",
"<ans>":{
"<mask_0>":"",
"<mask_1>":"",
"<mask_2>":""
}
}
"ææ¬çæ": {
"input": "ä»å¤©å¤©æ°å¾å¥½ï¼æåå¦å¦ä¸èµ·å»å
¬åï¼",
"prompt": "å¾åå约100å",
"<ans>": ""
}
"ç¿»è¯": {
"input": "å京æ¯ä¸å½çé¦é½",
"prompt": "ä¸ç¿»è±",
"<ans>": ""
}
"é®ç": {
"input": "NGC 6231æ¯ä¸ä¸ªä½äºå¤©è座ççæ£æå¢ï¼å¤©ç座æ 为赤ç»16æ¶54åï¼èµ¤çº¬-41度48åï¼è§è§è§æµå¤§å°çº¦45è§åï¼äº®åº¦çº¦2.6è§æçï¼è·å°ç5900å
å¹´ãNGC 6231å¹´é¾çº¦ä¸ºä¸ç¾äºåä¸å¹´ï¼æ¯ä¸ä¸ªé常年轻çæå¢ï¼æå¢å
çæäº®ææ¯5çç天è座 ζ1æãç¨åçæè¿éæå°åæè¿éå°±è½çå°ä¸ªå«çè¡æãNGC 6231å¨1654年被æå¤§å©å¤©æå¦å®¶ä¹ç¦å°¼Â·å·´èæ¯ç¹Â·é迪å°çº³ï¼Giovanni Battista Hodiernaï¼ä»¥Luminosaeçåå馿¬¡çºªå½å¨æè¡¨ä¸ï¼ä½æ¯æªè§è®°è½½äºå¤å°Â·æ¢
西è¶ç天ä½å表åå¨å»Â·èµ«æå°ç深空天ä½ç®å½ãè¿ä¸ªå¤©ä½å¨1678年被ç±å¾·è·åé·ï¼I.7ï¼ã1745年被å¤è¥¿äºç§æ¯ï¼Jean-Phillippe Loys de Cheseauxï¼ï¼9ï¼ã1751å¹´è¢«å°¼å¯æÂ·è·¯æÂ·æå¡ä¼ï¼II.13ï¼åå«å次ç¬ç«åç°ã",
"question": "NGC 6231çç»çº¬åº¦æ¯å¤å°ï¼",
"<ans>": ""
}
"è¯å颿µ": {
"input":"ä¹å夿¬¡èé¤é½éæ©è¿éï¼æåç§å¤§å°çå
æ¿åæ¶è½å®¹çº³å¾å¤äººï¼ç¯å¢å¥½æç¹è²è¿æè¡¨æ¼ï¼æ´ä½è餿°å´ä¸ä¸è¢«å¸¦å¨èµ·æ¥ãç°å¨ç±äºçç«æ¹æäºçµç¤ç¾ï¼å£æççä¸å¦ä»åï¼ä¸è¿å
¶ä»èåé½è¿æ¯ä¸éï¼ç¤ç¾å©ä¸çæéª¨èæåè¿è½åå å·¥ä¸ä¸æ¤ççä¹å¾å¥½åã",
"question":"è¯åæ¯å¤å°ï¼(1-5)",
"<ans>":""
}
"éæ©é¢": {
"input": "ç¶æ¯é½å¸æèªå·±çå©åè¯å®ã忢ãæç¤¼è²ãè¦æ³è®©å©åæä¸ºè¿æ ·ç人ï¼ç¶æ¯é¦å
å¾ä»èªå·±åèµ·ï¼è¦æ¯è¿èªå·±é½åä¸å°ï¼åæè½è¦æ±å©ååå°å¢ï¼",
"options": {
"<option_0>": "å°æè¦æ±",
"<option_1>": "é使 å",
"<option_2>": "èªå·±å
å好",
"<option_3>": "让å©åæ¿ä¸»æ"
},
"question": "æè²å©åæ¶ï¼ç¶æ¯åºè¯¥ï¼",
"<ans>": ""
}
- Note: the templates above can be used as-is at inference time; at training time, the gold answer must be filled into the "" under <ans>, for example:
{
    "input": "北京是中国的首都",
    "prompt": "中翻英",
    "<ans>": "Beijing is the capital of China"
}
{
    "input": "父母都希望自己的孩子诚实、勇敢、有礼貌。要想让孩子成为这样的人，父母首先得从自己做起，要是连自己都做不到，又怎能要求孩子做到呢？",
    "options": {
        "<option_0>": "少提要求",
        "<option_1>": "降低标准",
        "<option_2>": "自己先做好",
        "<option_3>": "让孩子拿主意"
    },
    "question": "教育孩子时，父母应该：",
    "<ans>": "<option_2>"
}
- CPM-Bee was exposed to several JSON formats during pre-training that can be used directly; users may also design their own JSON formats and fine-tune the model on them. All JSON formats must satisfy the following conditions:
- The output **must** be organized under the <ans> key;
- The options of a multiple-choice question should be organized as <option_xx>, where xx is a number;
- The blanks of a fill-in-the-blank question should be organized as <mask_xx>, where xx is a number;
- Because "<" serves as the trigger for recognizing special tokens such as <ans>, <option_xx>, and <mask_xx>, every "<" in the text of the data **must** be escaped as "<<". For example, in the record below, "1 < 2" and "10 < 8" are escaped to "1 << 2" and "10 << 8":
{
    "question": "下面哪项是正确的",
    "options": {
        "<option_0>": "1 << 2",
        "<option_1>": "10 << 8"
    },
    "<ans>": "<option_0>"
}
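For reference, the escaping rule can be applied mechanically; below is a minimal sketch (a hypothetical helper, not part of the repository) that escapes "<" while leaving already-present special tokens intact:

import re

SPECIAL = re.compile(r"(<ans>|<option_\d+>|<mask_\d+>)")

def escape_lt(text: str) -> str:
    # Split on special tokens (kept via the capturing group), then escape "<"
    # only inside the plain-text fragments.
    parts = SPECIAL.split(text)
    return "".join(p if SPECIAL.fullmatch(p) else p.replace("<", "<<") for p in parts)

# escape_lt("1 < 2, see <mask_1>") -> "1 << 2, see <mask_1>"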
Model Pre-training
- Data cleaning
- Each sample must be placed on a single line, with newlines escaped as \n; the format can be either txt or json (a minimal writer sketch follows the example layout below). For example:
- txt format:
... ... How can cross training benefit groups like runners, swimmers, or weightlifters?\n\n1. Reduces the risk of injury...\n\n2. Improves overall fitness... Are there any particular physical benefits to mindful walking, such as improved posture or increased physical fitness?\n\n1. Choose a quiet and peaceful environment...\n\n2. Start by tuning into your breath and becoming aware of your surroundings... ... ...
- json format:
... ... {"template": "Does the answer correctly answer the question", "sentence": "Unicode has the explicit aim of transcending ...", "question": "What is the aim of Unicode?", "options": {"<option_0>": "no", "<option_1>": "yes"}, "<ans>": "<option_1>"} ... ...
- Example: we provide samples for wiki (txt format, plain text) and flan (json format, multiple choice). After downloading them, organize the files under raw_data following the layout below, and then try out the subsequent steps.
CPMBee/
├── src
│   └── ...
└── raw_data (raw data location)
    ├── wiki
    │   └── raw.txt (raw txt data)
    └── flan
        └── raw.json (raw json data)
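A minimal sketch of this packing step (a hypothetical helper, assuming samples are already Python strings or dicts):

import json

def write_corpus(samples, path):
    # One sample per line; literal newlines inside a text sample are escaped as \n.
    with open(path, "w", encoding="utf-8") as f:
        for s in samples:
            if isinstance(s, str):   # txt format
                f.write(s.replace("\n", "\\n") + "\n")
            else:                    # json format
                f.write(json.dumps(s, ensure_ascii=False) + "\n")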
- Dataset generation
- To read data efficiently and deploy datasets on distributed file systems, CPM-Bee converts them into binary files using build_dataset.py under src. Its arguments include:
- --input-path: path of the raw data to import; all files under the path are packed together for processing
- --output-path: path of the exported dataset
- --output-name: name of the exported dataset
- --data-type: txt/json
- --min-length: data shorter than the minimum length is discarded
- --max-length: data exceeding the maximum length is split
- Raw txt data is split according to min-length and max-length and then exported to the dataset uniformly in the JSON form {'text':'......'}
- The exported dataset consists of two files: a binary file named output-name and a meta.bin file. meta.bin records the metadata of output-name (an illustrative instance follows this list), including:
- "file_name": the file that meta.bin describes, usually output-name
- "block_begin": datasets support distributed block storage; the first block of this dataset, usually 0
- "block_end": the last block of this dataset, usually the total number of blocks
- "nbytes": total dataset size in bytes, e.g. 60221163
- "nlines": total number of lines in the dataset, e.g. 41733
- "block_size": size of each block, e.g. 16777216
- Example: build datasets from the provided wiki and flan samples:
$ cd CPMBee/src
$ python build_dataset.py --input-path ../raw_data/wiki/ --output-path ../datasets/wiki/ --output-name wiki --data-type txt --min-length 100 --max-length 10000
$ python build_dataset.py --input-path ../raw_data/flan/ --output-path ../datasets/flan/ --output-name flan --data-type json
- The file structure after generation is:
CPMBee/
├── src
│   ├── ...
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets (generated datasets)
    ├── wiki (dataset built from wiki)
    │   └── data
    │       ├── wiki
    │       └── meta.bin
    └── flan (dataset built from flan)
        └── data
            ├── flan
            └── meta.bin
- Task transformation scripts
- For each dataset, you can write a task transformation script to rewrite the JSON records in the dataset into various pre-training tasks.
- The script must follow this format:
import random

def transform(data, num_sample: int, r: random.Random):
    ...
- For each dataset, CPM-Bee's underlying data pipeline automatically imports the dataset, reads out samples, and calls the task transformation script to rewrite them.
- The transformation script takes three input parameters: data is the sample that was read, num_sample is the number of samples read (usually 1; under in-context learning settings it can be more), and r is a random generator.
- Example: transformation scripts for wiki and flan:
- wiki script:
import random

def rand(n: int, r: random.Random):
    return int(r.random() * n)

def transform(data, num_sample: int, r: random.Random):
    # Following the earlier steps, every wiki record has the form {'text': '...'}
    text = data['text']
    # Randomly mask 50%~100% of the content for prediction
    mid = rand(len(text) // 2, r)
    # CPM-Bee uses "<" to recognize special tokens, so escape "<" in the content as "<<"
    ipt = text[:mid].replace("<", "<<")
    ans = text[mid:].replace("<", "<<")
    return {"input": ipt, "<ans>": ans}
- flan script:
import random

def transform(data, num_sample: int, r: random.Random):
    # Following the earlier steps, the flan records are already in the multiple-choice
    # JSON format and contain the <ans> key, so return them directly for training
    return data
- The file structure after writing the task transformation scripts is:
CPMBee/
├── src
│   ├── ...
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py (task transformation script for wiki)
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py (task transformation script for flan)
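As a further illustration (not one of the provided examples), a transform for a hypothetical extractive-QA dataset with context/question/answer fields could map records onto the QA template from the data-format section:

import random

def transform(data, num_sample: int, r: random.Random):
    # Hypothetical raw record: {"context": ..., "question": ..., "answer": ...}
    # Escape "<" as "<<" so it is not mistaken for a special-token trigger.
    esc = lambda s: s.replace("<", "<<")
    return {
        "input": esc(data["context"]),
        "question": esc(data["question"]),
        "<ans>": esc(data["answer"]),
    }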
- Dataset configuration script
- All datasets participating in training are aggregated by a dataset configuration script, which is also a JSON file, in the following format:
[
    {
        "dataset_name": "wiki",
        "task_name": "lm",
        "weight": 1.0,
        "path": "wiki/data",
        "incontext_weight": [1.0],
        "transforms": "wiki/transform.py"
    },
    {
        "dataset_name": "flan",
        "task_name": "nlu",
        "weight": 1.0,
        "path": "flan/data",
        "incontext_weight": [1.0],
        "transforms": "flan/transform.py"
    }
]
- Its fields are:
- dataset_name: name of the dataset;
- task_name: task the dataset belongs to; task_name + dataset_name is used as the label identifying the dataset during training, and task_name is also used to aggregate the loss per task during training;
- weight: sampling weight;
- path: path containing meta.bin and the binary data;
- transforms: path of the task transformation script;
- incontext_weight: sample stacking. [1.0] means a single sample is drawn with probability 100%; [0.8, 0.2] means two samples are concatenated with probability 20% (one sample otherwise); [0.75, 0.1, 0.15] means three samples are concatenated with probability 15% and two with probability 10%. A small sketch of this interpretation follows this list.
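A sketch of how such a weight list can be interpreted (assuming, as the description above suggests, that entry i is the probability of concatenating i+1 samples):

import random

def sample_num_concat(incontext_weight, r: random.Random):
    # e.g. [0.75, 0.1, 0.15] -> one sample with p=0.75, two with p=0.10, three with p=0.15
    counts = range(1, len(incontext_weight) + 1)
    return r.choices(counts, weights=incontext_weight, k=1)[0]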
- Example: file layout after writing the dataset configuration script that aggregates the wiki and flan datasets:
CPMBee/
├── src
│   ├── ...
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── datasets.json (dataset configuration script)
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py
- Pre-training script
- The pre-training script is as follows:
#! /bin/bash
# number of GPUs per node
GPUS_PER_NODE=8
# number of nodes
NNODES=1
# IP and port of the master node; see the PyTorch distributed-training docs for details
MASTER_ADDR="localhost"
MASTER_PORT=12345

OPTS=""
# model and dataset settings
# model config
OPTS+=" --model-config config/cpm-bee-10b.json"
# location of the dataset configuration script from the previous step
OPTS+=" --dataset ../datasets/datasets.json"
# training settings
# number of training iterations
OPTS+=" --train-iters 200000"
# per-GPU batch size
OPTS+=" --batch-size 2"
# maximum sample length; note that CPM-Bee packs samples internally to use max-length efficiently
OPTS+=" --max-length 2048"
# learning rate; if resuming from an earlier checkpoint, consider lowering it
OPTS+=" --lr 0.01"
# number of warmup iterations
OPTS+=" --warmup-iters 2000"
# learning-rate decay schedule
OPTS+=" --lr-decay-style noam"
# weight decay, passed into AdamW
OPTS+=" --weight-decay 0.01"
# gradient-clipping range
OPTS+=" --clip-grad 1.0"
# mixed-precision loss-scale factor
OPTS+=" --loss-scale 1048576"
# growth/decay multiplier of the loss scale
OPTS+=" --loss-scale-factor 2"
# grow the loss scale every this many steps
OPTS+=" --loss-scale-steps 128"
# log settings
# print parameter and gradient mean/variance every this many steps
OPTS+=" --inspect-iters 100"
# log output path
OPTS+=" --log-dir ../logs/train/"
# tensorboard output path
OPTS+=" --tensorboard ../logs/tensorboard/cpm_live_48_4096/"
# saving ckpts
# save a checkpoint every this many steps
OPTS+=" --save-iters 500"
# checkpoint output path
OPTS+=" --save ../results/"
# checkpoint name; CPM-Bee appends the step number when saving
OPTS+=" --save-name cpm_live_checkpoint"
# loading ckpts: to resume from an old checkpoint, uncomment the lines below and set MODEL_STEPS
# MODEL_STEPS="0"
# OPTS+=" --start-step ${MODEL_STEPS}"
# OPTS+=" --load ../results/cpm_live_checkpoint-${MODEL_STEPS}.pt"
# whether to load the historical gradients
# OPTS+=" --load-grad "

CMD="torchrun --nnodes=${NNODES} --nproc_per_node=${GPUS_PER_NODE} --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} pretrain_cpm_bee.py ${OPTS}"
echo ${CMD}
$CMD
- Example: file layout after writing the pre-training script:
CPMBee/
├── src
│   ├── scripts
│   │   └── pretrain_cpm_bee.sh (pre-training script)
│   ├── pretrain_cpm_bee.py
│   └── build_dataset.py
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── datasets.json
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py
- Pre-training command
$ cd CPMBee/src
$ bash scripts/pretrain_cpm_bee.sh
- Example: file layout after running the pre-training command:
CPMBee/
├── src
│   ├── scripts
│   │   └── pretrain_cpm_bee.sh
│   ├── pretrain_cpm_bee.py
│   └── build_dataset.py
├── results (checkpoint output path)
├── logs (log output path)
├── raw_data
│   ├── wiki
│   │   └── raw.txt
│   └── flan
│       └── raw.json
└── datasets
    ├── datasets.json
    ├── wiki
    │   ├── data
    │   │   ├── wiki
    │   │   └── meta.bin
    │   └── transform.py
    └── flan
        ├── data
        │   ├── flan
        │   └── meta.bin
        └── transform.py
OpenBMB Ecosystem Features
Built on the OpenBMB large-model system ecosystem, we achieved end-to-end efficiency while training CPM-Bee. We also provide a complete set of scripts for model fine-tuning (based on BMTrain and OpenDelta), tool use (based on BMTools), model compression (based on BMCook), and low-resource inference (based on BMInf), helping developers get started with CPM-Bee quickly.
Model Fine-tuning
Based on BMTrain and OpenDelta, we provide two fine-tuning options: full-parameter fine-tuning and parameter-efficient delta tuning, which can adapt CPM-Bee to a variety of downstream scenarios.
- Full-parameter fine-tuning:
$ torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py
- Parameter-efficient delta tuning:
$ torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py \
    --use-delta
Fine-tuning Workflow
To fine-tune the model on a specific task, you should prepare the dataset and proceed as follows:
- Adjust the data format. You can recast classification problems into the multiple-choice format. For more information on the data format, see the CPM-Bee data format section above.
Note that because <...> is reserved to mark special tokens, it can be confused with a literal < in the text, so you should escape the non-special-token parts of your text data (the escaping helper sketched in the data-format section applies here as well). For example, given the following record:
{"input": "团队合作非常重要，如果不能做到<mask_0>，则可能会造成1+1<2的结果，所以，要更加注意<mask_1>", "<ans>": {"<mask_0>": "", "<mask_1>": ""}}
In this record, <mask_0> and <mask_1> are special tokens and should remain unchanged, while every other < is replaced with <<. The escaped record is:
{"input": "团队合作非常重要，如果不能做到<mask_0>，则可能会造成1+1<<2的结果，所以，要更加注意<mask_1>", "<ans>": {"<mask_0>": "", "<mask_1>": ""}}
- Preprocess the dataset into binary files. To build the preprocessed dataset, you can run:
$ python preprocess_dataset.py --input your/reformated/data/path --output_path your/binary/data/path --output_name data_name
After preprocessing, you will get:
|-- your/binary/data/path
|-- folder1
| |-- data_name
| |-- meta.bin
|-- folder2
|-- data_name
|-- meta.bin
- Fine-tune CPM-Bee. To start fine-tuning, you can run:
$ bash scripts/finetune_cpm_bee.sh
Alternatively, you can run finetune_cpm_bee.py directly via torchrun. For example, you can delta-tune CPM-Bee on a server with 4 GPUs as follows:
torchrun --nnodes=1 --nproc_per_node=4 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py \
--model-config your/model/config/path \
--load your/model/checkpoint/path \
--dataset your/binary/data/path/folder1 \
--eval_dataset your/binary/data/path/folder2 \
--use-delta
We recommend fine-tuning with the scheme above; alternatively, you can refer to 🤗 Transformers and fine-tune CPM-Bee with your own parallelization strategy.
Model Compression
Based on BMCook, we compressed the original CPM-Bee base model and provide CPM-Bee models in multiple sizes for different scenarios. For each size we also provide a 🤗 Transformers-based version; click the links below to visit the model repositories for more information.
Model | #Attn layers | #FFN layers | Attn hidden size | FFN hidden size | Download | 🤗 Transformers |
---|---|---|---|---|---|---|
CPM-Bee-10B | 48 | 48 | 4096 | 10240 | Link | Link |
CPM-Bee-5B | 19 | 24 | 4096 | 10240 | Link | Link |
CPM-Bee-2B | 19 | 24 | 2048 | 5120 | Link | Link |
CPM-Bee-1B | 19 | 24 | 1280 | 1024 | Link | Link |
Model Deployment
The compressed CPM-Bee models can run fast inference on ordinary consumer GPUs. The inference resources required by each model size are as follows:
Model | Inference VRAM usage | Recommended hardware |
---|---|---|
CPM-Bee-10B | 20 GB | RTX 3090 (24 GB) |
CPM-Bee-5B | 11 GB | RTX 3090 (24 GB) |
CPM-Bee-2B | 6.7 GB | GTX 1080 (8 GB) |
CPM-Bee-1B | 4.1 GB | GTX 1660 (6 GB) |
Using This Repository
For a specific inference task, you can write your own inference code based on the cloned CPM-Bee repository. Here is a simple text-generation example.
from cpm_live.generation.bee import CPMBeeBeamSearch
from cpm_live.models import CPMBeeTorch, CPMBeeConfig
from cpm_live.tokenizers import CPMBeeTokenizer
import torch
# prepare your input data
data_list = [
    # "今天天气是真的" = "The weather today is really"; the prompt "往后写一句话" asks the model to continue with one sentence
    {"input": "今天天气是真的", "prompt": "往后写一句话", "<ans>": ""}
]
# load model
config = CPMBeeConfig.from_json_file("cpm-bee-5b.json")
ckpt_path = "cpm-bee-5b-ckpt.pt"
tokenizer = CPMBeeTokenizer()
model = CPMBeeTorch(config=config)
# load checkpoints
model.load_state_dict(torch.load(ckpt_path), strict=False)
model.cuda()
# use beam search
beam_search = CPMBeeBeamSearch(
    model=model,
    tokenizer=tokenizer,
)
for data in data_list:
    inference_results = beam_search.generate([data], max_length=100, repetition_penalty=1.1)
    for res in inference_results:
        print(res)
We also packaged the code above into a Python file, text_generation.py; for convenience you can run it directly:
python text_generation.py
妿æ¨çæ¾åè¾å°ï¼æ³ä½¿ç¨BMInfè¿è¡ä½èµæºæ¨ç:
python text_generation.py --use-bminf --memory-limit 12
å¦æå¸æä½¿ç¨CPUè¿è¡æ¨çï¼
python text_generation.py --device cpu
妿叿卿¨çæ¶å 载微è°åçdelta模å:
python text_generation.py --delta delta.pt
Using 🤗 Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True).cuda()
# "今天天气不错，" = "The weather is nice today,"
result = model.generate({"input": "今天天气不错，", "<ans>": ""}, tokenizer)
print(result)
We also provide an inference script based on 🤗 Transformers, text_generation_hf.py, which you can run with:
python text_generation_hf.py
Multi-GPU deployment:
python text_generation_hf.py --multi-gpu
Multi-GPU deployment while loading a fine-tuned delta model:
python text_generation_hf.py --multi-gpu --delta delta.pt
💫 Performance
Zero-shot Evaluation
We evaluated the Chinese and English abilities of the CPM-Bee base model comprehensively. On the Chinese Zero-CLUE benchmark, CPM-Bee outperforms other models by a large margin and ranks first among Chinese large models. On English benchmarks, CPM-Bee shows performance comparable to the open-source model LLaMA.
ZeroCLUE Chinese Evaluation
Model | Score | EPRSTMT | CSLDCP | TNEWSF | IFLYTEKF | OCNLIF | BUSTM | CHIDF | CSLF | CLUEWSCF |
---|---|---|---|---|---|---|---|---|---|---|
CPM-Bee | 78.184 | 85.52 | 58.99 | 78.2 | 58.81 | 77.73 | 83.85 | 89.65 | 83.6 | 87.24 |
Ctyun_Big_Model | 76.217 | 87.25 | 48.02 | 77.13 | 59.62 | 75.5 | 90.05 | 84.6 | 82.9 | 81.72 |
PaddleNLP-UTC | 70.547 | 85.92 | 58.92 | 68.27 | 40.15 | 74.79 | 76.7 | 82.75 | 70.6 | 74.48 |
Erlangshen-UnifiedMC | 70.295 | 88.71 | 50.18 | 71.67 | 40.58 | 75.5 | 80.15 | 84.85 | 60.6 | 81.72 |
English Evaluation
Model | Average | BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA |
---|---|---|---|---|---|---|---|---|---|
GPT-3 | - | 60.5 | 81.0 | - | 78.9 | 70.2 | 68.8 | 51.4 | 57.6 |
Gopher | - | 79.3 | 81.8 | 50.6 | 79.2 | 70.1 | - | - | - |
Chinchilla | - | 83.7 | 81.8 | 51.3 | 80.8 | 74.9 | - | - | - |
PaLM | - | 84.8 | 80.5 | - | 79.7 | 77.0 | 75.2 | 52.5 | 50.4 |
LLaMA-7B | 66.13 | 76.5 | 79.8 | 48.9 | 76.1 | 70.1 | 72.8 | 47.6 | 57.2 |
LLaMA-13B | 68.08 | 78.1 | 80.1 | 50.4 | 79.2 | 73 | 74.8 | 52.7 | 56.4 |
CPM-Bee | 67.80 | 78.69 | 77.58 | 61.11 | 78.89 | 61.88 | 66.88 | 54.18 | 63.20 |
CPM-Bee + Decoder Tuning
Decoder Tuning (to appear at ACL 2023), developed jointly by OpenBMB and THUNLP, can substantially improve downstream-task performance using only the model API, without accessing or modifying the model parameters. See the implementation code link.
Shots | Model | SST2 | IMDB | Yelp | AGNews | DBpedia | Yahoo | RTE | SNLI | MNLI-m | MNLI-mm | FewNERD | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CPM-Bee | 80.5 | 89.1 | 96.6 | 74.6 | 71.3 | 46.7 | 84.1 | 45.4 | 45.6 | 45.6 | 1.6 | 61.9 |
16 | T5-3B | 89.9 | 92.7 | 94.9 | 87.7 | 96.2 | 66.5 | 55.8 | 52.0 | 52.8 | 52.2 | 51.9 | 72.1 |
16 | LLaMA-7B | 85.1 | 90.5 | 92.8 | 71.4 | 89.8 | 45.1 | 49.1 | 35.2 | 36.3 | 36.2 | 54.6 | 62.4 |
16 | Vicuna-13B | 82.1 | 88.8 | 95.6 | 86.4 | 74.4 | 55.3 | 62.5 | 61.4 | 54.3 | 48.6 | 52.1 | 69.2 |
16 | CPM-Bee | 92.7 | 96.2 | 97.5 | 85.5 | 89.8 | 65.2 | 86.0 | 86.4 | 76.3 | 76.3 | 54.6 | 82.4 |
64 | LLaMA-7B | 87.5 | 85.7 | 96.9 | 75.4 | 93.5 | 47.4 | 51.4 | 39.4 | 36.2 | 38.4 | 59.8 | 64.7 |
64 | Vicuna-13B | 92.0 | 90.8 | 96.5 | 87.7 | 87.8 | 58.7 | 59.1 | 58.7 | 56.7 | 48.4 | 56.8 | 72.1 |
64 | CPM-Bee | 94.3 | 96.5 | 98.3 | 88.5 | 93.5 | 68.7 | 87.1 | 88.9 | 78.0 | 79.0 | 59.8 | 84.8 |
256 | LLaMA-7B | 87.6 | 88.8 | 97.1 | 82.4 | 94.2 | 48.5 | 53.4 | 39.8 | 37.3 | 37.4 | 59.1 | 66.0 |
256 | Vicuna-13B | 93.1 | 88.7 | 96.8 | 89.9 | 89.1 | 58.6 | 58.5 | 58.7 | 57.5 | 48.3 | 56.6 | 72.3 |
256 | CPM-Bee | 94.5 | 96.7 | 98.4 | 89.7 | 94.2 | 69.9 | 87.7 | 89.4 | 81.7 | 80.6 | 59.1 | 85.6 |
🔖 Open-Source License
Model License
The CPM-Bee base model is released under the "General Model License - Source Attribution - Publicity Restriction - Commercial Authorization". The model allows commercial use; to use it for commercial purposes, please contact cpm@modelbest.cn to obtain written authorization.
Statement
As a language model, CPM-Bee generates content by learning from a large amount of text, but it cannot understand or express personal opinions or value judgments; nothing it outputs represents the views or positions of the model developers. Users of content generated by CPM-Bee are therefore responsible for evaluating and verifying it themselves.