## Top Related Projects

- ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
- ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
- FlagAI (Fast LArge-scale General AI models): a fast, easy-to-use and extensible toolkit for large-scale models
- Qwen (通义千问): the official repo of the chat and pretrained large language models proposed by Alibaba Cloud
- Baichuan-7B: a large-scale 7B pretrained language model developed by BaiChuan-Inc.
- GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
## Quick Overview

ChatGLM2-6B is an open-source, bilingual (Chinese and English) dialogue model developed by Tsinghua University. It is an improved version of ChatGLM-6B, featuring stronger performance, a longer context window, and more efficient inference, and it is designed for low-cost deployment and fine-tuning on consumer-grade graphics cards.
### Pros
- Improved performance and longer context compared to its predecessor
- Efficient deployment on consumer-grade hardware
- Open-source and freely available for research and commercial use
- Supports both Chinese and English languages
### Cons
- Limited to 6 billion parameters, which may not match the capabilities of larger models
- May require fine-tuning for specific domain applications
- Potential biases or limitations inherent in large language models
- Limited support for languages other than Chinese and English
## Code Examples

```python
# Loading the model
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
```

```python
# Generating a response
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

```python
# Streaming output: each iteration yields the response generated so far
for response, history in model.stream_chat(tokenizer, "请介绍一下你自己", history=[]):
    print(response)
```
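To carry context across turns, pass the returned `history` back into the next call. A minimal sketch using the same API as above (the second prompt is illustrative):

```python
# Multi-turn dialogue: feed the previous turn's history back in
response, history = model.chat(tokenizer, "你好", history=[])
response, history = model.chat(tokenizer, "请用一句话介绍你自己", history=history)
print(response)
```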
## Getting Started

To get started with ChatGLM2-6B, follow these steps:

1. Install the required dependencies:

   ```bash
   pip install transformers torch
   ```

2. Load the model and tokenizer:

   ```python
   from transformers import AutoTokenizer, AutoModel

   tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
   model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
   ```

3. Start a conversation:

   ```python
   response, history = model.chat(tokenizer, "你好", history=[])
   print(response)
   ```
For more advanced usage and fine-tuning instructions, refer to the project's GitHub repository.
## Competitor Comparisons

### ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

#### Pros of ChatGLM-6B
- More established and potentially more stable codebase
- Wider community adoption and support
- Potentially more comprehensive documentation due to longer development time
#### Cons of ChatGLM-6B
- Older architecture, potentially less efficient
- May lack some of the latest improvements and features
- Possibly slower inference speed compared to the newer version
#### Code Comparison

ChatGLM-6B:

```python
def forward(self, input_ids, position_ids=None, attention_mask=None):
    batch_size, seq_length = input_ids.size()
    if attention_mask is None:
        attention_mask = torch.ones((batch_size, seq_length), device=input_ids.device)
```

ChatGLM2-6B:

```python
def forward(self, input_ids, position_ids=None, attention_mask=None, past_key_values=None):
    batch_size, seq_length = input_ids.size()
    if attention_mask is None:
        if past_key_values is None:
            attention_mask = torch.ones((batch_size, seq_length), device=input_ids.device)
```
The main difference is that ChatGLM2-6B's `forward` accepts `past_key_values`, the attention keys and values cached from earlier decoding steps. Reusing this cache avoids recomputing attention over the entire prefix at every step, which can substantially speed up sequential generation.
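To illustrate what `past_key_values` enables, here is a minimal sketch of incremental decoding in the generic Hugging Face style. It is not ChatGLM2-specific; GPT-2 is used as a stand-in for any causal LM that supports `use_cache`:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in causal LM
past_key_values = None
input_ids = torch.tensor([[1, 2, 3]])  # the full prompt, on the first step only
for _ in range(4):
    out = model(input_ids=input_ids, past_key_values=past_key_values, use_cache=True)
    past_key_values = out.past_key_values            # reuse cached keys/values
    next_token = out.logits[:, -1].argmax(-1, keepdim=True)
    input_ids = next_token                           # later steps feed one new token
print(next_token)
```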
### ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

#### Pros of ChatGLM2-6B
- Improved performance and efficiency compared to its predecessor
- Enhanced multilingual capabilities, especially in Chinese and English
- Better context understanding and more coherent responses
#### Cons of ChatGLM2-6B
- May require more computational resources due to increased model size
- Potential limitations in handling specialized or domain-specific tasks
- Possible biases inherited from training data
#### Code Comparison

ChatGLM2-6B:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

ChatGLM2-6B:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
Both repositories use the same code for initializing and using the ChatGLM2-6B model. The main differences lie in the model's architecture and training data rather than the implementation code.
### FlagAI (Fast LArge-scale General AI models): a fast, easy-to-use and extensible toolkit for large-scale models

#### Pros of FlagAI
- Broader scope: FlagAI is a comprehensive AI toolkit supporting various tasks beyond language models
- More extensive documentation and examples for different AI applications
- Active community and regular updates
#### Cons of FlagAI
- Larger codebase and potentially steeper learning curve
- May require more computational resources due to its broader feature set
- Less focused on specific language model tasks compared to ChatGLM2-6B
#### Code Comparison

ChatGLM2-6B example:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```

FlagAI example:

```python
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

auto_loader = AutoLoader("GLM-large-ch", "lm")
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
predictor = Predictor(model, tokenizer)
response = predictor.predict_generate_randomsample("Hello")
print(response)
```
Both examples demonstrate how to load and use the respective models for text generation. ChatGLM2-6B focuses on chat-style interactions, while FlagAI showcases its more general-purpose approach to language modeling.
### Qwen (通义千问): the official repo of the chat and pretrained large language models proposed by Alibaba Cloud

#### Pros of Qwen
- Supports a wider range of languages and tasks, including code generation
- Offers more extensive documentation and usage examples
- Provides pre-trained models of various sizes (1.8B, 7B, 14B, and 72B parameters)
#### Cons of Qwen
- Requires more computational resources due to larger model sizes
- May have a steeper learning curve for beginners due to its complexity
- Less focused on Chinese language processing compared to ChatGLM2-6B
#### Code Comparison

ChatGLM2-6B:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

Qwen:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True)
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```
Both repositories provide easy-to-use interfaces for loading and interacting with their respective models. The main differences lie in the model names and the specific methods used for chat functionality.
### Baichuan-7B: a large-scale 7B pretrained language model developed by BaiChuan-Inc.

#### Pros of Baichuan-7B
- Larger model size (7B parameters) potentially offering better performance
- Supports both Chinese and English languages
- More recent release with newer training data
#### Cons of Baichuan-7B
- Less documentation and examples compared to ChatGLM2-6B
- Fewer fine-tuning options and tools provided in the repository
- Limited community contributions and discussions
#### Code Comparison

ChatGLM2-6B example:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

Baichuan-7B example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Both repositories provide pre-trained language models, but they differ in size, language support, and implementation details. ChatGLM2-6B offers more comprehensive documentation and examples, while Baichuan-7B provides a larger model with bilingual capabilities.
### GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

#### Pros of GLM-130B
- Larger model size (130B parameters) potentially offering higher performance
- More suitable for complex tasks requiring deeper understanding
- Likely better at handling nuanced or specialized domains
#### Cons of GLM-130B
- Requires significantly more computational resources
- Slower inference time due to larger size
- May be overkill for simpler tasks or resource-constrained environments
#### Code Comparison

GLM-130B (schematic; GLM-130B does not ship a Hugging Face-style `GLM130B` class, so treat this as illustrative pseudocode):

```python
model = GLM130B.from_pretrained("path/to/model")
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
output = model.generate(input_ids, max_length=50)
```

ChatGLM2-6B (using the actual `transformers` API):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
output = model.generate(input_ids, max_length=50)
```
The code usage is similar for both models, with the main difference being the model class and pretrained model path. GLM-130B may require additional setup due to its larger size and resource requirements.
# ChatGLM2-6B

🤗 HF Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub]

👋 Join our Discord and WeChat

📍 Experience larger-scale ChatGLM models at chatglm.cn.
Read this in English
## GLM-4 Open-Source Models and API

We have released the latest GLM-4 models, which achieve new breakthroughs on multiple benchmarks. You can try our latest models through the following channels:

- GLM-4 open-source models: We have open-sourced the GLM-4-9B series models, with clear improvements across benchmarks. Welcome to try them.
- Zhipu Qingyan (智谱清言): Experience the latest GLM-4, including GLMs, All Tools, and other features.
- API platform: The new-generation API platform has launched, where you can try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3. The GLM-4 and GLM-3-Turbo models support new features such as System Prompt, Function Call, Retrieval, and Web_Search; welcome to try them.
- GLM-4 API open-source tutorial: a GLM-4 API tutorial with basic applications; welcome to try it. API-related questions can be asked in that tutorial repository, or answered by the GLM-4 API AI assistant for common issues.
## Introduction

ChatGLM2-6B is the second-generation version of the open-source bilingual (Chinese-English) dialogue model ChatGLM-6B. It retains the smooth conversation flow and low deployment threshold of the first generation while introducing the following new features:

- **Stronger performance**: Building on the development experience of the first-generation ChatGLM model, we fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses the hybrid objective function of GLM and was pre-trained on 1.4T Chinese-English tokens, followed by human-preference alignment training. Evaluation results show that, compared with the first generation, ChatGLM2-6B gains substantially on MMLU (+23%), C-Eval (+33%), GSM8K (+571%), and BBH (+60%), making it highly competitive among open-source models of the same size.
- **Longer context**: Using FlashAttention, we extended the base model's context length from ChatGLM-6B's 2K to 32K, and trained with an 8K context length during dialogue alignment. For even longer contexts, we released the ChatGLM2-6B-32K model. LongBench results show that ChatGLM2-6B-32K holds a clear competitive advantage among open-source models of equivalent size.
- **More efficient inference**: Using Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage: under the official implementation, inference is 42% faster than the first generation, and under INT4 quantization, the dialogue length supported by 6G of GPU memory grows from 1K to 8K.
- **More open license**: ChatGLM2-6B weights are completely open to academic research, and **free commercial use is also permitted** after completing the questionnaire registration.

The ChatGLM2-6B open-source model aims to advance large-model technology together with the open-source community. We ask developers and users to comply with the open-source license, and not to use the open-source model, code, or derivatives for any purpose that may harm the country or society, or for any service that has not passed safety assessment and filing. At present, this project team has not developed any application based on ChatGLM2-6B, whether for web, Android, Apple iOS, or Windows.

Although the model strives for data compliance and accuracy at every stage of training, ChatGLM2-6B is relatively small and subject to probabilistic randomness, so the accuracy of its output cannot be guaranteed, and the model can easily be misled. This project assumes no risk or liability for data security or public-opinion issues arising from the open-source model and code, or for any risk or liability arising from the model being misled, abused, disseminated, or improperly exploited.
## Updates

- [2023/07/31] Released the ChatGLM2-6B-32K model, with improved understanding of long texts.
- [2023/07/25] Released the CodeGeeX2 model, built on ChatGLM2-6B with additional code pre-training; code capability is comprehensively improved.
- [2023/07/04] Released P-Tuning v2 and full-parameter fine-tuning scripts; see P-Tuning.
## Friendly Links

Open-source projects that accelerate ChatGLM2:

- fastllm: a cross-platform accelerated inference solution; single-GPU batch inference reaches 10000+ tokens per second, and phones can run it in real time with as little as 3GB of memory (about 4-5 token/s on a Snapdragon 865)
- chatglm.cpp: a CPU quantized accelerated inference solution similar to llama.cpp, enabling real-time chat on a MacBook
- ChatGLM2-TPU: a TPU-accelerated inference solution that runs in real time at about 5 token/s on the Sophon edge-side chip BM1684X (16T@FP16, 16G memory)

Open-source projects based on or using ChatGLM2-6B:

- Chuanhu Chat: a good-looking, feature-rich, quick-to-deploy user interface for various large language models and online model APIs; supports ChatGLM2-6B

Example projects supporting online training of ChatGLM-6B and related applications:
## Evaluation Results

We selected some typical Chinese and English datasets for evaluation. Below are the results of the ChatGLM2-6B model on MMLU (English), C-Eval (Chinese), GSM8K (math), and BBH (English). Evaluation scripts for C-Eval are provided in evaluation/.
### MMLU

| Model | Average | STEM | Social Sciences | Humanities | Others |
|---|---|---|---|---|---|
| ChatGLM-6B | 40.63 | 33.89 | 44.84 | 39.02 | 45.71 |
| ChatGLM2-6B (base) | 47.86 | 41.20 | 54.44 | 43.66 | 54.46 |
| ChatGLM2-6B | 45.46 | 40.06 | 51.61 | 41.23 | 51.24 |
| ChatGLM2-12B (base) | 56.18 | 48.18 | 65.13 | 52.58 | 60.93 |
| ChatGLM2-12B | 52.13 | 47.00 | 61.00 | 46.10 | 56.05 |

> Chat models are evaluated with zero-shot CoT (Chain-of-Thought); Base models are evaluated with few-shot answer-only.
### C-Eval

| Model | Average | STEM | Social Sciences | Humanities | Others |
|---|---|---|---|---|---|
| ChatGLM-6B | 38.9 | 33.3 | 48.3 | 41.3 | 38.0 |
| ChatGLM2-6B (base) | 51.7 | 48.6 | 60.5 | 51.3 | 49.8 |
| ChatGLM2-6B | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
| ChatGLM2-12B (base) | 61.6 | 55.4 | 73.7 | 64.2 | 59.4 |
| ChatGLM2-12B | 57.0 | 52.1 | 69.3 | 58.5 | 53.2 |

> Chat models are evaluated with zero-shot CoT; Base models are evaluated with few-shot answer-only.
### GSM8K

| Model | Accuracy | Accuracy (Chinese)* |
|---|---|---|
| ChatGLM-6B | 4.82 | 5.85 |
| ChatGLM2-6B (base) | 32.37 | 28.95 |
| ChatGLM2-6B | 28.05 | 20.45 |
| ChatGLM2-12B (base) | 40.94 | 42.71 |
| ChatGLM2-12B | 38.13 | 23.43 |

> All models are evaluated with few-shot CoT; the CoT prompt comes from http://arxiv.org/abs/2201.11903
>
> \* We translated 500 GSM8K questions and the CoT prompt with a translation API and proofread them manually.
### BBH

| Model | Accuracy |
|---|---|
| ChatGLM-6B | 18.73 |
| ChatGLM2-6B (base) | 33.68 |
| ChatGLM2-6B | 30.00 |
| ChatGLM2-12B (base) | 36.02 |
| ChatGLM2-12B | 39.98 |

> All models are evaluated with few-shot CoT; the CoT prompt comes from https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/cot-prompts
## Inference Performance

ChatGLM2-6B uses Multi-Query Attention, which improves generation speed. The average speed for generating 2000 characters is compared below:

| Model | Inference speed (characters/s) |
|---|---|
| ChatGLM-6B | 31.49 |
| ChatGLM2-6B | 44.62 |

> Measured with the official implementation, batch size = 1, max length = 2048, bf16 precision; test hardware is A100-SXM4-80G, software environment is PyTorch 2.0.1.

Multi-Query Attention also reduces the GPU memory taken by the KV cache during generation. In addition, ChatGLM2-6B is trained for dialogue with a causal mask, which lets continuous dialogue reuse the KV cache from earlier rounds and further cuts memory usage. As a result, when running INT4-quantized inference on a GPU with 6GB of memory, the first-generation ChatGLM-6B could generate at most 1119 characters before running out of memory, whereas ChatGLM2-6B can generate at least 8192 characters; a minimal sketch of multi-query attention follows.
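In this sketch (shapes are illustrative assumptions, not the official implementation), all query heads share a single key/value head, so the KV cache shrinks by roughly a factor of the head count:

```python
import torch

batch, heads, seq, head_dim = 1, 32, 8, 128
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, 1, seq, head_dim)  # one shared key head (this is what the KV cache stores)
v = torch.randn(batch, 1, seq, head_dim)  # one shared value head

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5  # k broadcasts across all 32 query heads
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)  # torch.Size([1, 32, 8, 128])
```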
The minimum GPU memory requirements by quantization level are:

| Quantization Level | Min GPU memory (encoding a 2048-token context) | Min GPU memory (generating to length 8192) |
|---|---|---|
| FP16 / BF16 | 13.1 GB | 12.8 GB |
| INT8 | 8.2 GB | 8.1 GB |
| INT4 | 5.5 GB | 5.1 GB |
ChatGLM2-6B uses `torch.nn.functional.scaled_dot_product_attention`, introduced in PyTorch 2.0, for efficient attention computation. If your PyTorch version is older, it falls back to a naive attention implementation, and GPU memory usage may exceed the figures in the table above.
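The fallback pattern can be sketched as follows. This is not the repository's code, just a minimal illustration of preferring the fused PyTorch 2.0 kernel when it exists:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, causal=True):
    # Use the fused PyTorch 2.0 kernel if available; otherwise fall back to a
    # naive implementation that materializes the full attention matrix (more memory).
    if hasattr(F, "scaled_dot_product_attention"):
        return F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    if causal:
        mask = torch.ones(q.size(-2), k.size(-2), dtype=torch.bool, device=q.device).tril()
        scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq, head_dim)
print(attention(q, k, v).shape)        # torch.Size([1, 8, 16, 64])
```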
We also tested the impact of quantization on model performance. The results show that the impact is within an acceptable range:

| Quantization Level | Accuracy (MMLU) | Accuracy (C-Eval dev) |
|---|---|---|
| BF16 | 45.47 | 53.57 |
| INT4 | 43.13 | 50.30 |
## ChatGLM2-6B Examples

Compared with the first-generation model, ChatGLM2-6B improves along multiple dimensions. Below are some comparison categories; more possibilities of ChatGLM2-6B await your exploration!

- Mathematics and Logic
- Knowledge Reasoning
- Long Document Understanding
## Usage

### Environment Setup

First clone this repository:

```bash
git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B
```

Then install the dependencies with pip:

```bash
pip install -r requirements.txt
```

The recommended version of the `transformers` library is 4.30.2, and `torch` 2.0 or above is recommended for the best inference performance.
### Code Usage

You can generate a dialogue with the ChatGLM2-6B model using the following code:

```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='cuda')
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋!我是人工智能助手 ChatGLM2-6B,很高兴见到你,欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:

1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。
2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。
3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。
4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。
5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。
6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。

如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。
```

The two prompts mean "hello" and "what should I do if I can't sleep at night"; the model answers in Chinese with sleep-hygiene suggestions.
### Loading the Model Locally

The code above automatically downloads the model implementation and parameters via `transformers`. The complete model implementation is available on Hugging Face Hub. If your network environment is poor, downloading the model parameters may take a long time or even fail. In that case, you can first download the model to your local machine and then load it from there.

To download the model from Hugging Face Hub, first install Git LFS, then run:

```bash
git clone https://huggingface.co/THUDM/chatglm2-6b
```

If downloading the checkpoint from Hugging Face Hub is slow, you can download only the model implementation:

```bash
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm2-6b
```

and then manually download the model parameter files and place them in the local `chatglm2-6b` directory.

After the model has been downloaded, replace `THUDM/chatglm2-6b` in the code above with the path of your local `chatglm2-6b` folder to load the model from local files.
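For example, assuming the weights were cloned into `./chatglm2-6b`:

```python
from transformers import AutoTokenizer, AutoModel

# Load tokenizer and model from a local directory instead of the Hub
tokenizer = AutoTokenizer.from_pretrained("./chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm2-6b", trust_remote_code=True, device='cuda')
```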
The model implementation is still subject to change. If you want to pin a fixed model implementation for compatibility, add the `revision="v1.0"` parameter to the `from_pretrained` call. `v1.0` is the latest version number; for a complete list of versions, see the Change Log.
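For example, pinning to v1.0 as described above:

```python
# Pin the remote model implementation to a fixed revision
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, revision="v1.0")
```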
## Web Demo

You can launch a Gradio-based web demo with:

```bash
python web_demo.py
```

Or a Streamlit-based web demo with:

```bash
streamlit run web_demo2.py
```

The web demo runs a web server and prints an address; open the address in a browser to use it. In our tests, the Streamlit-based demo runs more smoothly.
## CLI Demo

Run cli_demo.py in the repository:

```bash
python cli_demo.py
```

The program holds an interactive dialogue in the terminal. Type a prompt and press Enter to generate a reply, type `clear` to clear the dialogue history, and type `stop` to exit.
## API Deployment

First install the extra dependencies with `pip install fastapi uvicorn`, then run api.py in the repository:

```bash
python api.py
```

By default the API is served on local port 8000 and is called via POST:

```bash
curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'
```

The returned value is:

```json
{
  "response": "你好👋！我是人工智能助手 ChatGLM2-6B，很高兴见到你，欢迎问我任何问题。",
  "history": [["你好", "你好👋！我是人工智能助手 ChatGLM2-6B，很高兴见到你，欢迎问我任何问题。"]],
  "status": 200,
  "time": "2023-03-23 21:38:40"
}
```
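The same call can be made from Python. A minimal client sketch using the `requests` library (not part of the repository; field names follow the curl example above):

```python
import requests

# Call the api.py deployment above (assumes the default host and port)
resp = requests.post("http://127.0.0.1:8000", json={"prompt": "你好", "history": []})
data = resp.json()
print(data["response"])  # the model's reply
print(data["history"])   # pass this back as "history" on the next request
```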
Thanks to @hiyouga for implementing an OpenAI-format streaming API deployment, which can serve as the backend for any ChatGPT-based application such as ChatGPT-Next-Web. Deploy it by running openai_api.py in the repository:

```bash
python openai_api.py
```

Example code for calling the API:

```python
import openai

if __name__ == "__main__":
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "none"
    for chunk in openai.ChatCompletion.create(
        model="chatglm2-6b",
        messages=[
            {"role": "user", "content": "你好"}
        ],
        stream=True
    ):
        if hasattr(chunk.choices[0].delta, "content"):
            print(chunk.choices[0].delta.content, end="", flush=True)
```
## Low-Cost Deployment

### Model Quantization

By default, the model is loaded at FP16 precision, and running the code above requires roughly 13GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized form:

```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).cuda()
```

Quantization incurs some performance loss, but our tests show that ChatGLM2-6B can still generate naturally and fluently under 4-bit quantization. The parameter files of the quantized model can also be downloaded manually.
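To verify the memory saving on your own hardware, you can check the allocated GPU memory after loading. A small sketch (assumes a CUDA device is available):

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).cuda()
print(f"{torch.cuda.memory_allocated() / 1024**3:.1f} GB allocated")  # weights resident on the GPU
```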
### CPU Deployment

If you have no GPU hardware, you can run inference on the CPU instead, though it will be considerably slower. Usage (requires roughly 32GB of RAM):

```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).float()
```

If your RAM is insufficient, you can use the quantized model instead:

```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).float()
```

Running the quantized model on the CPU requires `gcc` and `openmp`; most Linux distributions have them installed by default. On Windows, check `openmp` when installing TDM-GCC. The Windows test environment uses TDM-GCC 10.3.0 and the Linux environment uses gcc 11.3.0. On MacOS, see Q1.
### Mac Deployment

For Macs with Apple Silicon or an AMD GPU, the MPS backend can be used to run ChatGLM2-6B on the GPU. Follow Apple's official instructions to install PyTorch-Nightly (the correct version number is 2.x.x.dev2023xxxx, not 2.x.x).

Currently, MacOS only supports loading the model from local files. Change the loading code to load locally and use the mps backend:

```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
```

Loading the half-precision ChatGLM2-6B model requires about 13GB of memory. Machines with less memory (such as a 16GB MacBook Pro) will fall back to hard-disk virtual memory when free memory runs out, slowing inference severely. In that case you can use the quantized model chatglm2-6b-int4. Because the quantized kernels for the GPU are written in CUDA, they cannot run on MacOS, so quantized inference can only use the CPU; to fully parallelize across CPU cores, OpenMP needs to be installed separately.

On a Mac you can also run inference with ChatGLM.cpp.
### Multi-GPU Deployment

If you have multiple GPUs but none of them has enough memory to hold the complete model, you can split the model across GPUs. First install accelerate (`pip install accelerate`), then load the model as follows:

```python
from utils import load_model_on_gpus

model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2)
```

This deploys the model across two GPUs for inference. You can change `num_gpus` to the number of GPUs you want to use. The model is split evenly by default, but you can also pass a `device_map` parameter to specify the placement yourself, as sketched below.
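A hedged sketch of a custom placement; the module names and the 28-layer count are assumptions based on the ChatGLM2 implementation, so check them against `model.named_modules()` before use:

```python
from utils import load_model_on_gpus

# Put the embedding and the first half of the layers on GPU 0, the rest on GPU 1
device_map = {
    "transformer.embedding": 0,
    "transformer.rotary_pos_emb": 0,
    "transformer.encoder.final_layernorm": 1,
    "transformer.output_layer": 1,
}
for i in range(28):  # ChatGLM2-6B has 28 transformer layers
    device_map[f"transformer.encoder.layers.{i}"] = 0 if i < 14 else 1

model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2, device_map=device_map)
```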
## License

The code in this repository is open-sourced under the Apache-2.0 license. Use of the ChatGLM2-6B model weights must follow the Model License. ChatGLM2-6B weights are completely open to academic research, and **free commercial use is also permitted** after completing the questionnaire registration.

## Citation

If you find our work helpful, please consider citing the following paper. The ChatGLM2-6B paper will be released in the near future; stay tuned!
```bibtex
@misc{glm2024chatglm,
      title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
      author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
      year={2024},
      eprint={2406.12793},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```