Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.


Top Related Projects

gpt-neox

7,276

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

ChatGLM-6B

41,111

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

Quick Overview

Baichuan-7B is an open-source large language model (LLM) with 7 billion parameters, developed by Baichuan Intelligence. It is designed to understand and generate human-like text in both Chinese and English, offering a powerful foundation for various natural language processing tasks.

Pros

Open-source and freely available for research and commercial use
Bilingual capabilities in Chinese and English
Competitive performance compared to other 7B parameter models
Extensive training on diverse datasets, including web pages, books, and code

Cons

Limited documentation and examples available in English
Relatively new project, which may lead to potential instability or bugs
Requires significant computational resources for fine-tuning or deployment
May have biases or limitations inherent to its training data

Getting Started

To use Baichuan-7B, follow these steps:

Install the required dependencies:

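# sentencepiece is needed by the tokenizer; accelerate is only needed for device_map="auto"
pip install transformers torch sentencepiece accelerate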

Load the model and tokenizer:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)

Generate text:

input_text = "Tell me a short story about a robot."
# Keep the input on the same device as the model.
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)
# max_length counts prompt tokens plus generated tokens.
output = model.generate(input_ids, max_length=100, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

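Note: The loading snippet above keeps the model on the CPU in the default precision, so make sure you have enough system memory. To use a GPU, move the model to it and make sure it has enough GPU memory. If you encounter memory issues, consider using a smaller model or running on a machine with more resources.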
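As a rough guide to the memory note, the weights alone need about (number of parameters × bytes per parameter). The sketch below assumes a round 7 billion parameters and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name weight_memory_gib is purely illustrative.

```python
# Rough weight-only memory estimate; assumes a round 7e9 parameters.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_gib(num_params, dtype):
    """Memory for the weights alone, in GiB (no activations or KV cache)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(7e9, dtype):.1f} GiB")
```

Halving the bytes per parameter halves the estimate, which is why half precision is the usual first step when memory is tight.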

Competitor Comparisons

gpt-neox


Pros of GPT-NeoX

More extensive documentation and examples for training and fine-tuning
Larger community support and contributions
Designed for distributed training across multiple GPUs

Cons of GPT-NeoX

Higher computational requirements for training
More complex setup and configuration process
Less focus on multilingual capabilities compared to Baichuan-7B

Code Comparison

GPT-NeoX configuration example:

{
    "num_layers": 32,
    "hidden_size": 6144,
    "num_attention_heads": 64,
    "seq_length": 2048,
    "max_position_embeddings": 2048,
    "norm": "layernorm",
    "pos_emb": "rotary",
    "rotary_pct": 0.25,
    "no_weight_tying": true,
    "gpt_j_residual": true,
    "output_layer_parallelism": "column"
}

Baichuan-7B configuration example:

{
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "rms_norm_eps": 1e-6,
    "vocab_size": 64000
}

The code comparison shows that GPT-NeoX offers more detailed configuration options, including specific settings for position embeddings and parallelism. Baichuan-7B's configuration is simpler and more straightforward, focusing on essential model parameters.
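To see how the Baichuan-7B fields relate to the "7B" in the model name, the parameter count can be estimated from the configuration. This is a back-of-the-envelope sketch, not code from either repository. It assumes a LLaMA-style decoder with no bias terms, untied input and output embeddings, and a feed-forward width of 11,008 (the value quoted in the README's model-structure notes, not in the config excerpt above). The function name count_params is illustrative.

```python
# Back-of-the-envelope parameter count for a LLaMA-style decoder.
# Assumptions (not stated in the config excerpt): no bias terms, untied
# input/output embeddings, SwiGLU feed-forward width of 11,008.
def count_params(hidden_size, num_layers, vocab_size, ffn_size):
    attention = 4 * hidden_size * hidden_size   # Q, K, V and output projections
    feed_forward = 3 * hidden_size * ffn_size   # gate, up and down projections
    norms = 2 * hidden_size                     # two RMSNorm weight vectors
    per_layer = attention + feed_forward + norms
    embeddings = 2 * vocab_size * hidden_size   # token embedding + output head
    final_norm = hidden_size
    return num_layers * per_layer + embeddings + final_norm

total = count_params(hidden_size=4096, num_layers=32, vocab_size=64000, ffn_size=11008)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the 32 transformer layers account for most of the total, and the two embedding matrices contribute roughly half a billion parameters.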

ChatGLM-6B


Pros of ChatGLM-6B

Smaller model size (6B parameters) potentially leading to faster inference and lower resource requirements
Extensive documentation and examples for various use cases and deployment scenarios
Strong support for Chinese language tasks and multilingual capabilities

Cons of ChatGLM-6B

Released slightly earlier than Baichuan-7B, so it may lack some of the more recent advancements that Baichuan-7B incorporates
Limited fine-tuning options and tools compared to Baichuan-7B's comprehensive training pipeline

Code Comparison

ChatGLM-6B:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

Baichuan-7B:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)

Both repositories provide similar code for loading and using the models, with minor differences in the model class and initialization parameters. ChatGLM-6B explicitly converts the model to half-precision and moves it to CUDA, while Baichuan-7B leaves these optimizations to the user's discretion.


README

Baichuan-7B

Hugging Face • ModelScope • WeChat


Updates

[2023.09.06] We released Baichuan 2, our new generation of open-source models, in 7B and 13B sizes.

Introduction

Public Benchmark Leaderboards

Chinese Evaluation

C-Eval

cd evaluation
python evaluate_zh.py --model_name_or_path 'your/model/path'

Results

Model 5-shot	Average	Avg(Hard)	STEM	Social Sciences	Humanities	Others
GPT-4	68.7	54.9	67.1	77.6	64.5	67.8
ChatGPT	54.4	41.4	52.9	61.8	50.9	53.6
Claude-v1.3	54.2	39.0	51.9	61.7	52.1	53.7
Claude-instant-v1.0	45.9	35.5	43.1	53.8	44.2	45.4
BLOOMZ-7B	35.7	25.8	31.3	43.5	36.6	35.6
ChatGLM-6B	34.5	23.1	30.4	39.6	37.4	34.5
Ziya-LLaMA-13B-pretrain	30.2	22.7	27.7	34.4	32.0	28.9
moss-moon-003-base (16B)	27.4	24.5	27.0	29.1	27.2	26.9
LLaMA-7B-hf	27.1	25.9	27.1	26.8	27.9	26.3
Falcon-7B	25.8	24.3	25.8	26.0	25.8	25.6
TigerBot-7B-base	25.7	27.0	27.3	24.7	23.4	26.1
Aquila-7B^*	25.5	25.2	25.6	24.6	25.2	26.6
Open-LLaMA-v2-pretrain (7B)	24.0	22.5	23.1	25.3	25.2	23.2
BLOOM-7B	22.8	20.2	21.8	23.3	23.9	23.3
Baichuan-7B	42.8	31.5	38.2	52.0	46.2	39.3

Gaokao

Results

The test results are as follows.

Model	Average
BLOOMZ-7B	28.72
LLaMA-7B	27.81
BLOOM-7B	26.96
TigerBot-7B-base	25.94
Falcon-7B	23.98
Ziya-LLaMA-13B-pretrain	23.17
ChatGLM-6B	21.41
Open-LLaMA-v2-pretrain	21.41
Aquila-7B^*	24.39
Baichuan-7B	36.24

AGIEval

Results

Model	Average
BLOOMZ-7B	30.27
LLaMA-7B	28.17
Ziya-LLaMA-13B-pretrain	27.64
Falcon-7B	27.18
BLOOM-7B	26.55
Aquila-7B^*	25.58
TigerBot-7B-base	25.19
ChatGLM-6B	23.49
Open-LLaMA-v2-pretrain	23.49
Baichuan-7B	34.44

^* Results for the Aquila model are taken from the official BAAI website (https://model.baai.ac.cn/model-detail/100098) and are for reference only.

English Leaderboards

Results

Model	Humanities	Social Sciences	STEM	Other	Average
ChatGLM-6B⁰	35.4	41.0	31.3	40.5	36.9
BLOOMZ-7B⁰	31.3	42.1	34.4	39.0	36.1
mpt-7B¹	-	-	-	-	35.6
LLaMA-7B²	34.0	38.3	30.5	38.1	35.1
Falcon-7B¹	-	-	-	-	35.0
moss-moon-003-sft (16B)⁰	30.5	33.8	29.3	34.4	31.9
BLOOM-7B⁰	25.0	24.4	26.5	26.4	25.5
moss-moon-003-base (16B)⁰	24.2	22.8	22.4	24.4	23.6
Baichuan-7B⁰	38.4	48.9	35.6	48.1	42.3

^{0: reproduced by us}
^{1: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard}
^{2: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu}

How to Reproduce

git clone https://github.com/hendrycks/test
cd test
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
tar xf data.tar
mkdir results
cp ../evaluate_mmlu.py .
python evaluate_mmlu.py -m /path/to/Baichuan-7B

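The detailed results for each of the 57 MMLU tasks are shown in the figure below:

The results for each subject are shown in the figure below:

Inference

The inference code is already available in the official Hugging Face repository.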

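from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers run the model code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
# device_map="auto" places the weights on the available device(s); it requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", device_map="auto", trust_remote_code=True)
# One-shot prompt: a poem title and its author, then a second title for the model to complete.
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))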
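Baichuan-7B is a pretrained base model, so the prompt in the example works by pattern completion: one solved "title->author" pair followed by an unfinished one. A small helper can assemble such prompts. build_few_shot_prompt is a hypothetical name used only for illustration, and the "->" separator simply mirrors the example above.

```python
def build_few_shot_prompt(examples, query, sep="->"):
    """Join solved (input, output) pairs, then leave the final output blank."""
    lines = [f"{src}{sep}{tgt}" for src, tgt in examples]
    lines.append(f"{query}{sep}")
    return "\n".join(lines)

# Same content as the literal prompt in the inference example.
prompt = build_few_shot_prompt([("登鹳雀楼", "王之涣")], "夜雨寄北")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the literal prompt. Adding more solved pairs usually makes the pattern easier for a base model to follow.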

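Data

The raw data consists of open-source Chinese and English data, Chinese web data that we crawled ourselves, and a portion of high-quality knowledge-intensive data.
Following related data work, frequency and quality are the two dimensions we focus on in data processing. We filter the raw dataset at the document and sentence level using heuristic rules and quality-model scores. Across the full dataset, we use locality-sensitive hashing (LSH) to deduplicate at the document and sentence level.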
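The README does not say which locality-sensitive hashing variant is used, so the following is only a minimal sketch of one common choice, MinHash with banding, applied at the document level. The shingle size, signature length, band count, and all function names are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32  # signature length (illustrative value)
BANDS = 8        # 8 bands of 4 rows each

def shingles(text, k=5):
    """Character k-grams of the lowercased, whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}

def seeded_hash(seed, gram):
    """Deterministic 64-bit hash; each seed acts as a different hash function."""
    data = f"{seed}:{gram}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash(text):
    """MinHash signature: the smallest hash of any shingle, per hash function."""
    grams = shingles(text)
    return tuple(min(seeded_hash(s, g) for g in grams) for s in range(NUM_HASHES))

def candidate_pairs(docs):
    """Documents that share at least one identical band are likely duplicates."""
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash(text)
        for b in range(BANDS):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

docs = {
    "a": "The quick brown fox jumps over the lazy dog.",
    "b": "the quick  brown fox jumps over the LAZY dog.",  # same text, different case and spacing
    "c": "Baichuan-7B uses a 64,000-token vocabulary.",
}
print(candidate_pairs(docs))
```

Only documents that land in the same bucket are compared, which avoids the quadratic cost of checking every pair. The same idea applies at sentence granularity by treating each sentence as its own document.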

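The overall pipeline is shown below:

After continual adjustment and several rounds of testing, we settled on the Chinese-English mix that performed best on downstream tasks.
We use an automatically learned data-weighting strategy to balance the different categories of data.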

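Tokenization

Following academic practice, we use Byte-Pair Encoding (BPE) from SentencePiece as the tokenization algorithm, with the following optimizations:

Most open-source models today are optimized mainly for English, so they are inefficient on Chinese text. We trained the tokenizer on 20 million multilingual examples, mostly Chinese and English, which significantly improves the compression rate for Chinese.
For rare words (such as special symbols), byte encoding of UTF-8 characters is supported, so unknown words are fully covered.
We analyzed the compression rate of different tokenizers on the corpus, as shown in the table below. Our tokenizer clearly outperforms those of open-source models such as LLaMA and Falcon, and compared with other Chinese tokenizers at a similar compression rate, it gives higher training and inference efficiency.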

Model	Baichuan-7B	LLaMA	Falcon	mpt-7B	ChatGLM	moss-moon-003
Compress Rate	0.737	1.312	1.049	1.206	0.631	0.659
Vocab Size	64,000	32,000	65,024	50,254	130,344	106,029
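The byte-encoding point in the list above can be illustrated with a toy example. This is a sketch of the general byte-fallback idea, not the actual tokenizer: it looks up single characters in a tiny made-up vocabulary instead of applying BPE merges, and it assumes byte tokens are written in the form <0xNN>.

```python
def encode_with_byte_fallback(text, vocab):
    """Per-character lookup; characters outside the vocab become UTF-8 byte tokens."""
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens

def decode(tokens):
    """Reassemble text, turning byte tokens back into raw bytes first."""
    out = bytearray()
    for tok in tokens:
        if len(tok) == 6 and tok.startswith("<0x") and tok.endswith(">"):
            out.append(int(tok[3:5], 16))
        else:
            out.extend(tok.encode("utf-8"))
    return out.decode("utf-8")

toy_vocab = {"a", "b", "中"}  # hypothetical vocabulary for the demo
tokens = encode_with_byte_fallback("a中☃", toy_vocab)
print(tokens)          # the snowman is not in the vocab, so it becomes three byte tokens
print(decode(tokens))  # round-trips to the original string
```

Because every string has a UTF-8 byte representation, 256 byte tokens are enough to guarantee that no input maps to an unknown token.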

Model Architecture

The overall model is based on the standard Transformer architecture, and we adopt the same model design as LLaMA.

Position encoding: rotary embedding is the position-encoding scheme adopted by most models today and extrapolates better. Although the maximum length during training is 4,096, in practice the model extends well to more than 5,000 tokens, as shown in the figure below:

Activation: SwiGLU, with the feed-forward layer resized to 8/3 times the hidden size, i.e., 11,008
Layer normalization: pre-normalization based on RMSNorm
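The three design points above can be illustrated with a few lines of plain Python. This is a didactic sketch, not the model's implementation. Rounding the feed-forward width up to a multiple of 256 follows the LLaMA reference code and is an assumption here, as are the rotary base of 10,000 and all function names. Real implementations operate on batched tensors.

```python
import math

def swiglu_ffn_dim(hidden_size, multiple_of=256):
    """8/3 of the hidden size, rounded up to a multiple of `multiple_of` (assumed)."""
    raw = 8 * hidden_size // 3
    return multiple_of * ((raw + multiple_of - 1) // multiple_of)

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def apply_rope(vec, position, base=10000.0):
    """Rotate each adjacent pair of dimensions by an angle proportional to position."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos, sin = math.cos(theta), math.sin(theta)
        out += [vec[i] * cos - vec[i + 1] * sin, vec[i] * sin + vec[i + 1] * cos]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

print(swiglu_ffn_dim(4096))  # 11008

q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
# The two scores agree because only the position offset (2 in both cases) matters.
print(dot(apply_rope(q, 3), apply_rope(k, 1)))
print(dot(apply_rope(q, 10), apply_rope(k, 8)))
```

The last two lines show the property that makes rotary embeddings attractive for length extrapolation: the attention score depends on the relative offset between positions, not on their absolute values.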

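Training Stability and Throughput

Operator optimization: use more efficient operators, such as Flash-Attention and the RMSNorm from NVIDIA apex.
Operator splitting: split some compute operators to reduce peak memory.
Mixed precision: speed up computation without sacrificing model accuracy.
Training fault tolerance: the training platform and training framework are optimized jointly, and IaaS + PaaS provide minute-level fault localization and task recovery.
Communication optimization, specifically:
1. Use topology-aware collective communication algorithms to avoid network congestion and improve communication efficiency.
2. Set the bucket size adaptively according to the number of GPUs to improve bandwidth utilization.
3. Tune when communication primitives are triggered, based on the model and the cluster environment, so that computation and communication overlap.

The final loss is shown in the figure below: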

Training

Install Dependencies

pip install -r requirements.txt

Prepare Data

Download the Tokenizer Model

Download the tokenizer model file tokenizer.model and place it in the project directory.

Configure DeepSpeed

Run Training

scripts/train.sh

License

Use of the source code in this repository is governed by the Apache 2.0 open-source license.

Third-Party Resources

LLaMA Efficient Tuning: supports fine-tuning Baichuan-7B with QLoRA, RLHF, and a web demo. For a model that has gone through SFT, see hiyouga/baichuan-7b-sft.
fireballoon/baichuan-vicuna-chinese-7b: a model fine-tuned on Chinese and English data including ShareGPT, ShareGPT-ZH, COT & COT-ZH, Leetcode, and dummy; the training code follows FastChat.
fireballoon/baichuan-vicuna-7b: a model fine-tuned on a mix of ShareGPT, COT, and Leetcode data; the training code follows FastChat.
Efficient-Tuning-LLMs: supports fine-tuning Baichuan-7B with QLoRA and 4-bit inference.
fastllm: a pure C++ large-model library with no third-party dependencies; supports running Baichuan-7B on mobile phones.
TheBloke/baichuan-7B-GPTQ: a GPTQ 4-bit quantization of Baichuan-7B.
