Top Related Projects
GPT-NeoX: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Quick Overview
Baichuan-7B is an open-source large language model (LLM) with 7 billion parameters, developed by Baichuan Intelligence. It is designed to understand and generate human-like text in both Chinese and English, offering a powerful foundation for various natural language processing tasks.
Pros
- Open-source and freely available for research and commercial use
- Bilingual capabilities in Chinese and English
- Competitive performance compared to other 7B parameter models
- Extensive training on diverse datasets, including web pages, books, and code
Cons
- Limited documentation and examples available in English
- Relatively new project, which may lead to potential instability or bugs
- Requires significant computational resources for fine-tuning or deployment
- May have biases or limitations inherent to its training data
Getting Started
To use Baichuan-7B, follow these steps:
- Install the required dependencies:
pip install transformers torch
- Load the model and tokenizer:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
- Generate text:
input_text = "Tell me a short story about a robot."
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=100, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
Note: Ensure you have sufficient GPU memory to load and run the model. If you encounter memory issues, consider using a smaller model or running on a machine with more resources.
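If GPU memory is tight, one common mitigation (a generic transformers pattern, not something specific to this repository) is to load the weights in half precision and let the accelerate package place them across available devices. A minimal sketch, assuming accelerate is installed and a CUDA GPU is available:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)

# torch_dtype=torch.float16 roughly halves memory versus fp32 weights;
# device_map="auto" spreads layers across the available GPU(s) and CPU.
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-7B",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Tell me a short story about a robot.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))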
Competitor Comparisons
GPT-NeoX: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of GPT-NeoX
- More extensive documentation and examples for training and fine-tuning
- Larger community support and contributions
- Designed for distributed training across multiple GPUs
Cons of GPT-NeoX
- Higher computational requirements for training
- More complex setup and configuration process
- Less focus on multilingual capabilities compared to Baichuan-7B
Code Comparison
GPT-NeoX configuration example:
{
"num_layers": 32,
"hidden_size": 6144,
"num_attention_heads": 64,
"seq_length": 2048,
"max_position_embeddings": 2048,
"norm": "layernorm",
"pos_emb": "rotary",
"rotary_pct": 0.25,
"no_weight_tying": true,
"gpt_j_residual": true,
"output_layer_parallelism": "column"
}
Baichuan-7B configuration example:
{
"hidden_size": 4096,
"num_attention_heads": 32,
"num_hidden_layers": 32,
"rms_norm_eps": 1e-6,
"vocab_size": 64000
}
The code comparison shows that GPT-NeoX offers more detailed configuration options, including specific settings for position embeddings and parallelism. Baichuan-7B's configuration is simpler and more straightforward, focusing on essential model parameters.
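If you want to verify these parameters yourself, the published configuration can be inspected without downloading the weights. A minimal sketch using the standard transformers API (the printed values assume the hosted configuration has not changed):
from transformers import AutoConfig

# Downloads only the configuration file from the Hugging Face Hub;
# trust_remote_code is needed because Baichuan-7B ships a custom config class.
config = AutoConfig.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)

print(config.hidden_size)          # expected: 4096
print(config.num_attention_heads)  # expected: 32
print(config.num_hidden_layers)    # expected: 32
print(config.vocab_size)           # expected: 64000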
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Pros of ChatGLM-6B
- Smaller model size (6B parameters) potentially leading to faster inference and lower resource requirements
- Extensive documentation and examples for various use cases and deployment scenarios
- Strong support for Chinese language tasks and multilingual capabilities
Cons of ChatGLM-6B
- Released earlier than Baichuan-7B, so it may not reflect the most recent advancements
- Limited fine-tuning options and tools compared to Baichuan-7B's comprehensive training pipeline
Code Comparison
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
Baichuan-7B:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
Both repositories provide similar code for loading and using the models, with minor differences in the model class and initialization parameters. ChatGLM-6B explicitly converts the model to half-precision and moves it to CUDA, while Baichuan-7B leaves these optimizations to the user's discretion.
README
Baichuan-7B
🤗 Hugging Face • 🤖 ModelScope • 💬 WeChat
Updates
- [2023.09.06] We released Baichuan 2, the new generation of our open-source models, available in 7B and 13B sizes.
Introduction
Baichuan-7B is an open-source, commercially usable large-scale pre-trained language model developed by Baichuan Intelligence. Built on the Transformer architecture, it has 7 billion parameters trained on roughly 1.2 trillion tokens, supports both Chinese and English, and has a context window of 4,096 tokens. It achieves the best results among models of its size on standard Chinese and English benchmarks (C-Eval and MMLU).
Public Benchmark Results
Chinese Benchmarks
C-Eval
C-Eval is a comprehensive Chinese evaluation dataset for foundation models, covering 52 subjects and four difficulty levels. We use its dev split as the source of few-shot examples and run a 5-shot evaluation on the test split. To reproduce, run the following commands:
cd evaluation
python evaluate_zh.py --model_name_or_path 'your/model/path'
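For context, a 5-shot prompt simply prepends five solved dev-set questions to each test question. The sketch below illustrates the idea only; the field names (question, A–D, answer) are assumptions and do not necessarily match what evaluate_zh.py uses internally:
def build_five_shot_prompt(dev_examples, test_item, k=5):
    """Concatenate k solved dev examples followed by the unanswered test question."""
    blocks = []
    for ex in dev_examples[:k]:
        blocks.append(
            f"{ex['question']}\n"
            f"A. {ex['A']}\nB. {ex['B']}\nC. {ex['C']}\nD. {ex['D']}\n"
            f"Answer: {ex['answer']}"
        )
    blocks.append(
        f"{test_item['question']}\n"
        f"A. {test_item['A']}\nB. {test_item['B']}\nC. {test_item['C']}\nD. {test_item['D']}\n"
        f"Answer:"
    )
    return "\n\n".join(blocks)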
Results
Model 5-shot | Average | Avg(Hard) | STEM | Social Sciences | Humanities | Others |
---|---|---|---|---|---|---|
GPT-4 | 68.7 | 54.9 | 67.1 | 77.6 | 64.5 | 67.8 |
ChatGPT | 54.4 | 41.4 | 52.9 | 61.8 | 50.9 | 53.6 |
Claude-v1.3 | 54.2 | 39.0 | 51.9 | 61.7 | 52.1 | 53.7 |
Claude-instant-v1.0 | 45.9 | 35.5 | 43.1 | 53.8 | 44.2 | 45.4 |
BLOOMZ-7B | 35.7 | 25.8 | 31.3 | 43.5 | 36.6 | 35.6 |
ChatGLM-6B | 34.5 | 23.1 | 30.4 | 39.6 | 37.4 | 34.5 |
Ziya-LLaMA-13B-pretrain | 30.2 | 22.7 | 27.7 | 34.4 | 32.0 | 28.9 |
moss-moon-003-base (16B) | 27.4 | 24.5 | 27.0 | 29.1 | 27.2 | 26.9 |
LLaMA-7B-hf | 27.1 | 25.9 | 27.1 | 26.8 | 27.9 | 26.3 |
Falcon-7B | 25.8 | 24.3 | 25.8 | 26.0 | 25.8 | 25.6 |
TigerBot-7B-base | 25.7 | 27.0 | 27.3 | 24.7 | 23.4 | 26.1 |
Aquila-7B* | 25.5 | 25.2 | 25.6 | 24.6 | 25.2 | 26.6 |
Open-LLaMA-v2-pretrain (7B) | 24.0 | 22.5 | 23.1 | 25.3 | 25.2 | 23.2 |
BLOOM-7B | 22.8 | 20.2 | 21.8 | 23.3 | 23.9 | 23.3 |
Baichuan-7B | 42.8 | 31.5 | 38.2 | 52.0 | 46.2 | 39.3 |
Gaokao
Gaokao is a dataset built from questions of China's national college entrance examination (Gaokao), designed to evaluate the language ability and logical reasoning of large language models.
We kept only the single-answer multiple-choice questions and, after a random split, ran a unified 5-shot evaluation on all models.
Results
The evaluation results are shown below.
Model | Average |
---|---|
BLOOMZ-7B | 28.72 |
LLaMA-7B | 27.81 |
BLOOM-7B | 26.96 |
TigerBot-7B-base | 25.94 |
Falcon-7B | 23.98 |
Ziya-LLaMA-13B-pretrain | 23.17 |
ChatGLM-6B | 21.41 |
Open-LLaMA-v2-pretrain | 21.41 |
Aquila-7B* | 24.39 |
Baichuan-7B | 36.24 |
AGIEval
AGIEval evaluates a model's general abilities on tasks related to cognition and problem solving.
We kept only the four-option, single-answer multiple-choice questions and, after a random split, ran a unified 5-shot evaluation on all models.
Results
Model | Average |
---|---|
BLOOMZ-7B | 30.27 |
LLaMA-7B | 28.17 |
Ziya-LLaMA-13B-pretrain | 27.64 |
Falcon-7B | 27.18 |
BLOOM-7B | 26.55 |
Aquila-7B* | 25.58 |
TigerBot-7B-base | 25.19 |
ChatGLM-6B | 23.49 |
Open-LLaMA-v2-pretrain | 23.49 |
Baichuan-7B | 34.44 |
* The Aquila-7B results are taken from the official BAAI website (https://model.baai.ac.cn/model-detail/100098) and are provided for reference only.
English Benchmarks
In addition to Chinese, Baichuan-7B was also evaluated on English. MMLU is an English benchmark comprising 57 multiple-choice tasks that cover elementary mathematics, US history, computer science, law, and more, with difficulty ranging from high-school to expert level; it is one of the mainstream LLM evaluation datasets. We adopted the open-source evaluation setup, and the final 5-shot results are shown below.
Results
Model | Humanities | Social Sciences | STEM | Other | Average |
---|---|---|---|---|---|
ChatGLM-6B [0] | 35.4 | 41.0 | 31.3 | 40.5 | 36.9 |
BLOOMZ-7B [0] | 31.3 | 42.1 | 34.4 | 39.0 | 36.1 |
mpt-7B [1] | - | - | - | - | 35.6 |
LLaMA-7B [2] | 34.0 | 38.3 | 30.5 | 38.1 | 35.1 |
Falcon-7B [1] | - | - | - | - | 35.0 |
moss-moon-003-sft (16B) [0] | 30.5 | 33.8 | 29.3 | 34.4 | 31.9 |
BLOOM-7B [0] | 25.0 | 24.4 | 26.5 | 26.4 | 25.5 |
moss-moon-003-base (16B) [0] | 24.2 | 22.8 | 22.4 | 24.4 | 23.6 |
Baichuan-7B [0] | 38.4 | 48.9 | 35.6 | 48.1 | 42.3 |
[0]: results reproduced by us
[1]: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
[2]: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu
Reproduction
git clone https://github.com/hendrycks/test
cd test
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
tar xf data.tar
mkdir results
cp ../evaluate_mmlu.py .
python evaluate_mmlu.py -m /path/to/Baichuan-7B
Figure: detailed results on each of the 57 MMLU tasks.
Figure: results broken down by subject.
Inference
The inference code is available in the official Hugging Face repository:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", device_map="auto", trust_remote_code=True)
# Prompt pattern: "poem title -> poet" pairs; the model should complete the second pair.
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
Data
- The raw data includes open-source Chinese and English corpora, Chinese internet data crawled by ourselves, and a portion of high-quality knowledge-intensive data.
- Following related data work, frequency and quality are the two dimensions emphasized during data processing. We filter the raw data at both the document and sentence level using heuristic rules and quality-model scores, and deduplicate the full dataset at the document and sentence level using locality-sensitive hashing (a minimal sketch of this idea follows the list).
Figure: overall data-processing pipeline.
- After repeated adjustment and multiple rounds of testing, we settled on the Chinese/English mix that performed best on downstream tasks.
- We use an automatically learned data-weighting strategy to set the ratio across different data categories.
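As an illustration of the locality-sensitive-hashing idea mentioned above (not the project's actual pipeline, which is not released), here is a minimal MinHash + banded LSH sketch for finding near-duplicate documents:
import hashlib
import random

NUM_HASHES = 128        # length of each MinHash signature
BANDS = 16              # 16 bands x 8 rows: candidates must match on a full band
ROWS = NUM_HASHES // BANDS

random.seed(0)
SALTS = [str(random.random()) for _ in range(NUM_HASHES)]

def shingles(text, n=5):
    """Character n-grams used as the document's feature set."""
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

def minhash(text):
    """One minimum hash value per salt -> a NUM_HASHES-long signature."""
    feats = shingles(text)
    return [min(int(hashlib.md5((salt + f).encode("utf-8")).hexdigest(), 16) for f in feats)
            for salt in SALTS]

def near_duplicate_pairs(docs):
    """docs: {doc_id: text}. Returns candidate near-duplicate id pairs."""
    buckets, pairs = {}, set()
    for doc_id, text in docs.items():
        sig = minhash(text)
        for b in range(BANDS):
            key = (b, tuple(sig[b * ROWS:(b + 1) * ROWS]))
            for other in buckets.setdefault(key, []):
                pairs.add((other, doc_id))
            buckets[key].append(doc_id)
    return pairs

# Example: 'a' and 'b' differ by one character and should collide; 'c' should not.
print(near_duplicate_pairs({
    "a": "百川智能发布了开源可商用的大规模预训练语言模型。",
    "b": "百川智能发布了开源可商用的大规模预训练语言模型！",
    "c": "The quick brown fox jumps over the lazy dog.",
}))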
Tokenization
Following established practice, we use Byte-Pair Encoding (BPE) as implemented in SentencePiece as the tokenization algorithm, with the following optimizations:
- Most current open-source models are optimized primarily for English and are therefore inefficient on Chinese corpora. We trained the tokenizer on about 20 million multilingual sentences, mostly Chinese and English, which significantly improves the compression rate for Chinese.
- For mathematics, following LLaMA and Galactica, every digit of a number is tokenized separately. This avoids inconsistent number tokenization and is important for improving mathematical ability.
- For rare words (such as special symbols), byte-level encoding of UTF-8 characters is supported, giving full coverage of unknown tokens.
- We analyzed the compression rate of different tokenizers on our corpus (see the table below). Our tokenizer clearly outperforms open-source models such as LLaMA and Falcon, and compared with other Chinese tokenizers of similar compression rate, it offers higher training and inference efficiency. (A rough way to measure such a compression rate is sketched after the table.)
Model | Baichuan-7B | LLaMA | Falcon | mpt-7B | ChatGLM | moss-moon-003 |
---|---|---|---|---|---|---|
Compress Rate | 0.737 | 1.312 | 1.049 | 1.206 | 0.631 | 0.659 |
Vocab Size | 64,000 | 32,000 | 65,024 | 50,254 | 130,344 | 106,029 |
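The README does not define "Compress Rate" precisely; a common proxy is the number of tokens produced per character (or per byte) of text, where lower means tighter compression. A rough sketch of such a measurement (the comparison repository name and the tiny sample corpus are illustrative only):
from transformers import AutoTokenizer

def tokens_per_char(model_name, texts):
    """Average tokens produced per character of input; lower = better compression."""
    tok = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    n_tokens = sum(len(tok.encode(t, add_special_tokens=False)) for t in texts)
    n_chars = sum(len(t) for t in texts)
    return n_tokens / n_chars

sample = [
    "今天天气不错，我们去公园散步吧。",
    "Large language models are trained on trillions of tokens.",
]
print("Baichuan-7B:", tokens_per_char("baichuan-inc/Baichuan-7B", sample))
print("LLaMA-7B:   ", tokens_per_char("huggyllama/llama-7b", sample))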
Model Architecture
The overall model is based on the standard Transformer architecture, and we adopt the same model design as LLaMA (a brief sketch of two of these components follows the list):
- Positional encoding: rotary embeddings, the scheme adopted by most current models, with better length extrapolation. Although the maximum length during training is 4,096, in practice the model extrapolates well to more than 5,000 tokens.
- Activation: SwiGLU, with the feed-forward hidden size set to 8/3 of the model dimension, i.e. 11,008.
- Layer normalization: pre-normalization based on RMSNorm.
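For concreteness, here is a generic sketch of two of these components in the LLaMA style described above (RMSNorm pre-normalization and the SwiGLU feed-forward block); it illustrates the design, not Baichuan-7B's exact implementation:
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalize by the root-mean-square of the features (no mean subtraction, no bias)."""
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLUFeedForward(nn.Module):
    """SwiGLU feed-forward block; intermediate size ~ 8/3 x hidden size (11,008 for 4,096)."""
    def __init__(self, hidden_size: int = 4096, intermediate_size: int = 11008):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        # SiLU-gated linear unit followed by the down projection.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))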
Training Stability and Throughput
We made a number of modifications on top of the original LLaMA framework to improve training throughput, including:
- Operator optimization: adopting more efficient kernels such as Flash-Attention and the RMSNorm from NVIDIA Apex.
- Operator splitting: splitting some compute operators to reduce peak memory usage.
- Mixed precision: accelerating computation without sacrificing model precision.
- Training fault tolerance: joint optimization of the training platform and training framework; the IaaS + PaaS stack enables minute-level fault localization and job recovery.
- Communication optimization, including:
  - Topology-aware collective communication algorithms to avoid network congestion and improve communication efficiency.
  - Adaptively setting the bucket size according to the number of GPUs to improve bandwidth utilization.
  - Tuning the trigger timing of communication primitives according to the model and cluster environment, so that computation and communication overlap.
With these optimizations, we reached a throughput of 182 TFLOPS for the 7B model on a 1,000-GPU A800 cluster, with peak GPU FLOPS utilization as high as 58.3%. (An illustrative configuration fragment touching on some of these knobs follows.)
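The actual training configuration is not published; purely as an illustration of where knobs like mixed precision, communication bucket sizes, and computation/communication overlap live, here is a DeepSpeed-style configuration fragment expressed as a Python dict (all values are placeholders, not Baichuan's settings):
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    # Mixed precision: faster matmuls and lower memory without changing the model.
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,
        # Overlap gradient all-reduce with the backward pass.
        "overlap_comm": True,
        # Bucket sizes trade memory for bandwidth utilization; tune per GPU count.
        "reduce_bucket_size": 5e8,
        "allgather_bucket_size": 5e8,
    },
}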
Figure: final training loss curve.
Training
Install dependencies
pip install -r requirements.txt
Prepare the data
Split the training corpus evenly into multiple UTF-8 text files (the number of files should be a multiple of the total number of ranks) and place them in the corpus directory (data_dir by default). Each rank process reads a different file from the corpus directory, loads it entirely into memory, and then starts the subsequent training procedure. This is a simplified demonstration flow; for real training jobs, adjust the data-production logic to your needs. A minimal sharding sketch is shown below.
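A minimal sketch of this sharding step (file naming and the round-robin split are assumptions for illustration; adapt it to your own data pipeline):
import os

def shard_corpus(corpus_path: str, data_dir: str = "data_dir", num_files: int = 8):
    """Split one large UTF-8 corpus (one document per line) into num_files shards.
    num_files should be a multiple of the total number of training ranks."""
    os.makedirs(data_dir, exist_ok=True)
    outs = [open(os.path.join(data_dir, f"part_{i:03d}.txt"), "w", encoding="utf-8")
            for i in range(num_files)]
    try:
        with open(corpus_path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                outs[i % num_files].write(line)  # round-robin keeps shards roughly balanced
    finally:
        for out in outs:
            out.close()

# Example: 8 ranks -> 8 (or 16, 24, ...) shard files under data_dir/
# shard_corpus("my_corpus.txt", num_files=8)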
Download the tokenizer model
Download the tokenizer model file tokenizer.model and place it in the project directory.
Configure DeepSpeed
The example code trains with the DeepSpeed framework. Modify config/hostfile according to your cluster; for multi-node, multi-GPU training, configure the SSH-reachable IP address of each node. See the official DeepSpeed documentation for details.
Run training
scripts/train.sh
License
Use of the source code in this repository is governed by the Apache 2.0 open-source license.
Baichuan-7B supports commercial use. To use the Baichuan-7B model or its derivatives for commercial purposes, contact the licensor at opensource@baichuan-inc.com to register and apply for written authorization. See the Baichuan-7B Model License Agreement for the full terms.
Third-Party Resources
- LLaMA Efficient Tuning: supports fine-tuning Baichuan-7B with QLoRA, as well as RLHF and a web demo. For a model fine-tuned with SFT, see hiyouga/baichuan-7b-sft.
- fireballoon/baichuan-vicuna-chinese-7b: a model fine-tuned on Chinese and English data from ShareGPT, ShareGPT-ZH, COT & COT-ZH, Leetcode, dummy, etc.; training code based on FastChat.
- fireballoon/baichuan-vicuna-7b: a model fine-tuned on a mix of ShareGPT, COT, and Leetcode data; training code based on FastChat.
- Efficient-Tuning-LLMs: supports fine-tuning Baichuan-7B with QLoRA and 4-bit inference.
- fastllm: a pure C++ large-model library with no third-party dependencies, which supports running Baichuan-7B on mobile devices.
- TheBloke/baichuan-7B-GPTQ: a GPTQ 4-bit quantization of Baichuan-7B.
Star History