## Top Related Projects

- ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
- ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
- FlagAI (Fast LArge-scale General AI models): a fast, easy-to-use and extensible toolkit for large-scale models
- Qwen (通义千问): the official repo of the chat and pretrained large language models proposed by Alibaba Cloud
- Baichuan-7B: a large-scale 7B pretrained language model developed by BaiChuan-Inc.
- GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
## Quick Overview

ChatGLM2-6B is an open-source, bilingual (Chinese and English) dialogue model developed by Tsinghua University. It is an improved version of ChatGLM-6B, featuring stronger performance, a longer context window, and more efficient inference, and it is designed for low-cost deployment and fine-tuning on consumer-grade graphics cards.
### Pros
- Improved performance and longer context compared to its predecessor
- Efficient deployment on consumer-grade hardware
- Open-source and freely available for research and commercial use
- Supports both Chinese and English languages
### Cons
- Limited to 6 billion parameters, which may not match the capabilities of larger models
- May require fine-tuning for specific domain applications
- Potential biases or limitations inherent in large language models
- Limited support for languages other than Chinese and English
## Code Examples

```python
# Loading the model
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
```

```python
# Generating a response
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

```python
# Streaming output: each iteration yields the response generated so far
for response, history in model.stream_chat(tokenizer, "请介绍一下你自己", history=[]):
    print(response)
```
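To carry context across turns, pass the returned `history` back into the next call. A minimal sketch using the same API as above (the second prompt is illustrative):

```python
# Multi-turn dialogue: feed the previous turn's history back in
response, history = model.chat(tokenizer, "你好", history=[])
response, history = model.chat(tokenizer, "请用一句话介绍你自己", history=history)
print(response)
```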
## Getting Started

To get started with ChatGLM2-6B, follow these steps:

1. Install the required dependencies:

   ```bash
   pip install transformers torch
   ```

2. Load the model and tokenizer:

   ```python
   from transformers import AutoTokenizer, AutoModel

   tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
   model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
   ```

3. Start a conversation:

   ```python
   response, history = model.chat(tokenizer, "你好", history=[])
   print(response)
   ```
For more advanced usage and fine-tuning instructions, refer to the project's GitHub repository.
## Competitor Comparisons

### ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

#### Pros of ChatGLM-6B
- More established and potentially more stable codebase
- Wider community adoption and support
- Potentially more comprehensive documentation due to longer development time
#### Cons of ChatGLM-6B
- Older architecture, potentially less efficient
- May lack some of the latest improvements and features
- Possibly slower inference speed compared to the newer version
#### Code Comparison

ChatGLM-6B:

```python
def forward(self, input_ids, position_ids=None, attention_mask=None):
    batch_size, seq_length = input_ids.size()
    if attention_mask is None:
        attention_mask = torch.ones((batch_size, seq_length), device=input_ids.device)
```

ChatGLM2-6B:

```python
def forward(self, input_ids, position_ids=None, attention_mask=None, past_key_values=None):
    batch_size, seq_length = input_ids.size()
    if attention_mask is None:
        if past_key_values is None:
            attention_mask = torch.ones((batch_size, seq_length), device=input_ids.device)
```
The main difference is that ChatGLM2-6B's `forward` accepts `past_key_values`, the attention keys and values cached from earlier decoding steps. Reusing this cache avoids recomputing attention over the entire prefix at every step, which can substantially speed up sequential generation.
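To illustrate what `past_key_values` enables, here is a minimal sketch of incremental decoding in the generic Hugging Face style. It is not ChatGLM2-specific; GPT-2 is used as a stand-in for any causal LM that supports `use_cache`:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in causal LM
past_key_values = None
input_ids = torch.tensor([[1, 2, 3]])  # the full prompt, on the first step only
for _ in range(4):
    out = model(input_ids=input_ids, past_key_values=past_key_values, use_cache=True)
    past_key_values = out.past_key_values            # reuse cached keys/values
    next_token = out.logits[:, -1].argmax(-1, keepdim=True)
    input_ids = next_token                           # later steps feed one new token
print(next_token)
```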
### ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

#### Pros of ChatGLM2-6B
- Improved performance and efficiency compared to its predecessor
- Enhanced multilingual capabilities, especially in Chinese and English
- Better context understanding and more coherent responses
#### Cons of ChatGLM2-6B
- May require more computational resources due to increased model size
- Potential limitations in handling specialized or domain-specific tasks
- Possible biases inherited from training data
#### Code Comparison

ChatGLM2-6B:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

ChatGLM2-6B:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
Both repositories use the same code for initializing and using the ChatGLM2-6B model. The main differences lie in the model's architecture and training data rather than the implementation code.
### FlagAI (Fast LArge-scale General AI models): a fast, easy-to-use and extensible toolkit for large-scale models

#### Pros of FlagAI
- Broader scope: FlagAI is a comprehensive AI toolkit supporting various tasks beyond language models
- More extensive documentation and examples for different AI applications
- Active community and regular updates
#### Cons of FlagAI
- Larger codebase and potentially steeper learning curve
- May require more computational resources due to its broader feature set
- Less focused on specific language model tasks compared to ChatGLM2-6B
#### Code Comparison

ChatGLM2-6B example:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```

FlagAI example:

```python
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

auto_loader = AutoLoader("GLM-large-ch", "lm")
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
predictor = Predictor(model, tokenizer)
response = predictor.predict_generate_randomsample("Hello")
print(response)
```
Both examples demonstrate how to load and use the respective models for text generation. ChatGLM2-6B focuses on chat-style interactions, while FlagAI showcases its more general-purpose approach to language modeling.
### Qwen (通义千问): the official repo of the chat and pretrained large language models proposed by Alibaba Cloud

#### Pros of Qwen
- Supports a wider range of languages and tasks, including code generation
- Offers more extensive documentation and usage examples
- Provides pre-trained models of various sizes (1.8B, 7B, 14B, and 72B parameters)
#### Cons of Qwen
- Requires more computational resources due to larger model sizes
- May have a steeper learning curve for beginners due to its complexity
- Less focused on Chinese language processing compared to ChatGLM2-6B
#### Code Comparison

ChatGLM2-6B:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

Qwen:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True)
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```
Both repositories provide easy-to-use interfaces for loading and interacting with their respective models. The main differences lie in the model names and the specific methods used for chat functionality.
### Baichuan-7B: a large-scale 7B pretrained language model developed by BaiChuan-Inc.

#### Pros of Baichuan-7B
- Larger model size (7B parameters) potentially offering better performance
- Supports both Chinese and English languages
- More recent release with newer training data
#### Cons of Baichuan-7B
- Less documentation and examples compared to ChatGLM2-6B
- Fewer fine-tuning options and tools provided in the repository
- Limited community contributions and discussions
#### Code Comparison

ChatGLM2-6B example:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

Baichuan-7B example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Both repositories provide pre-trained language models, but they differ in size, language support, and implementation details. ChatGLM2-6B offers more comprehensive documentation and examples, while Baichuan-7B provides a larger model with bilingual capabilities.
### GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

#### Pros of GLM-130B
- Larger model size (130B parameters) potentially offering higher performance
- More suitable for complex tasks requiring deeper understanding
- Likely better at handling nuanced or specialized domains
#### Cons of GLM-130B
- Requires significantly more computational resources
- Slower inference time due to larger size
- May be overkill for simpler tasks or resource-constrained environments
#### Code Comparison

GLM-130B (schematic; GLM-130B does not ship a Hugging Face-style `GLM130B` class, so treat this as illustrative pseudocode):

```python
model = GLM130B.from_pretrained("path/to/model")
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
output = model.generate(input_ids, max_length=50)
```

ChatGLM2-6B (using the actual `transformers` API):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
output = model.generate(input_ids, max_length=50)
```
The code usage is similar for both models, with the main difference being the model class and pretrained model path. GLM-130B may require additional setup due to its larger size and resource requirements.
# ChatGLM2-6B

🤗 HF Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub]

👋 Join our Discord and WeChat

📍 Experience larger-scale ChatGLM models at chatglm.cn.
Read this in English
## GLM-4 Open-Source Models and API

We have released the latest GLM-4 models, which achieve new breakthroughs on multiple benchmarks. You can try our latest models through the following channels:

- GLM-4 open-source models: We have open-sourced the GLM-4-9B series models, with clear improvements across benchmarks. Welcome to try them.
- Zhipu Qingyan (智谱清言): Experience the latest GLM-4, including GLMs, All Tools, and other features.
- API platform: The new-generation API platform has launched, where you can try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3. The GLM-4 and GLM-3-Turbo models support new features such as System Prompt, Function Call, Retrieval, and Web_Search; welcome to try them.
- GLM-4 API open-source tutorial: a GLM-4 API tutorial with basic applications; welcome to try it. API-related questions can be asked in that tutorial repository, or answered by the GLM-4 API AI assistant for common issues.
## Introduction

ChatGLM2-6B is the second-generation version of the open-source bilingual (Chinese-English) dialogue model ChatGLM-6B. It retains the smooth conversation flow and low deployment threshold of the first generation while introducing the following new features:

- **Stronger performance**: Building on the development experience of the first-generation ChatGLM model, we fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses the hybrid objective function of GLM and was pre-trained on 1.4T Chinese-English tokens, followed by human-preference alignment training. Evaluation results show that, compared with the first generation, ChatGLM2-6B gains substantially on MMLU (+23%), C-Eval (+33%), GSM8K (+571%), and BBH (+60%), making it highly competitive among open-source models of the same size.
- **Longer context**: Using FlashAttention, we extended the base model's context length from ChatGLM-6B's 2K to 32K, and trained with an 8K context length during dialogue alignment. For even longer contexts, we released the ChatGLM2-6B-32K model. LongBench results show that ChatGLM2-6B-32K holds a clear competitive advantage among open-source models of equivalent size.
- **More efficient inference**: Using Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage: under the official implementation, inference is 42% faster than the first generation, and under INT4 quantization, the dialogue length supported by 6G of GPU memory grows from 1K to 8K.
- **More open license**: ChatGLM2-6B weights are completely open to academic research, and **free commercial use is also permitted** after completing the questionnaire registration.

The ChatGLM2-6B open-source model aims to advance large-model technology together with the open-source community. We ask developers and users to comply with the open-source license, and not to use the open-source model, code, or derivatives for any purpose that may harm the country or society, or for any service that has not passed safety assessment and filing. At present, this project team has not developed any application based on ChatGLM2-6B, whether for web, Android, Apple iOS, or Windows.

Although the model strives for data compliance and accuracy at every stage of training, ChatGLM2-6B is relatively small and subject to probabilistic randomness, so the accuracy of its output cannot be guaranteed, and the model can easily be misled. This project assumes no risk or liability for data security or public-opinion issues arising from the open-source model and code, or for any risk or liability arising from the model being misled, abused, disseminated, or improperly exploited.
## Updates

- [2023/07/31] Released the ChatGLM2-6B-32K model, with improved understanding of long texts.
- [2023/07/25] Released the CodeGeeX2 model, built on ChatGLM2-6B with additional code pre-training; code capability is comprehensively improved.
- [2023/07/04] Released P-Tuning v2 and full-parameter fine-tuning scripts; see P-Tuning.
## Friendly Links

Open-source projects that accelerate ChatGLM2:

- fastllm: a cross-platform accelerated inference solution; single-GPU batch inference reaches 10000+ tokens per second, and phones can run it in real time with as little as 3GB of memory (about 4-5 token/s on a Snapdragon 865)
- chatglm.cpp: a CPU quantized accelerated inference solution similar to llama.cpp, enabling real-time chat on a MacBook
- ChatGLM2-TPU: a TPU-accelerated inference solution that runs in real time at about 5 token/s on the Sophon edge-side chip BM1684X (16T@FP16, 16G memory)

Open-source projects based on or using ChatGLM2-6B:

- Chuanhu Chat: a good-looking, feature-rich, quick-to-deploy user interface for various large language models and online model APIs; supports ChatGLM2-6B

Example projects supporting online training of ChatGLM-6B and related applications:
## Evaluation Results

We selected some typical Chinese and English datasets for evaluation. Below are the results of the ChatGLM2-6B model on MMLU (English), C-Eval (Chinese), GSM8K (math), and BBH (English). Evaluation scripts for C-Eval are provided in evaluation/.
### MMLU

| Model | Average | STEM | Social Sciences | Humanities | Others |
|---|---|---|---|---|---|
| ChatGLM-6B | 40.63 | 33.89 | 44.84 | 39.02 | 45.71 |
| ChatGLM2-6B (base) | 47.86 | 41.20 | 54.44 | 43.66 | 54.46 |
| ChatGLM2-6B | 45.46 | 40.06 | 51.61 | 41.23 | 51.24 |
| ChatGLM2-12B (base) | 56.18 | 48.18 | 65.13 | 52.58 | 60.93 |
| ChatGLM2-12B | 52.13 | 47.00 | 61.00 | 46.10 | 56.05 |

> Chat models are evaluated with zero-shot CoT (Chain-of-Thought); Base models are evaluated with few-shot answer-only.
### C-Eval

| Model | Average | STEM | Social Sciences | Humanities | Others |
|---|---|---|---|---|---|
| ChatGLM-6B | 38.9 | 33.3 | 48.3 | 41.3 | 38.0 |
| ChatGLM2-6B (base) | 51.7 | 48.6 | 60.5 | 51.3 | 49.8 |
| ChatGLM2-6B | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
| ChatGLM2-12B (base) | 61.6 | 55.4 | 73.7 | 64.2 | 59.4 |
| ChatGLM2-12B | 57.0 | 52.1 | 69.3 | 58.5 | 53.2 |

> Chat models are evaluated with zero-shot CoT; Base models are evaluated with few-shot answer-only.
### GSM8K

| Model | Accuracy | Accuracy (Chinese)* |
|---|---|---|
| ChatGLM-6B | 4.82 | 5.85 |
| ChatGLM2-6B (base) | 32.37 | 28.95 |
| ChatGLM2-6B | 28.05 | 20.45 |
| ChatGLM2-12B (base) | 40.94 | 42.71 |
| ChatGLM2-12B | 38.13 | 23.43 |

> All models are evaluated with few-shot CoT; the CoT prompt comes from http://arxiv.org/abs/2201.11903
>
> \* We translated 500 GSM8K questions and the CoT prompt with a translation API and proofread them manually.
### BBH

| Model | Accuracy |
|---|---|
| ChatGLM-6B | 18.73 |
| ChatGLM2-6B (base) | 33.68 |
| ChatGLM2-6B | 30.00 |
| ChatGLM2-12B (base) | 36.02 |
| ChatGLM2-12B | 39.98 |

> All models are evaluated with few-shot CoT; the CoT prompt comes from https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/cot-prompts
## Inference Performance

ChatGLM2-6B uses Multi-Query Attention, which improves generation speed. The average speed for generating 2000 characters is compared below:

| Model | Inference speed (characters/s) |
|---|---|
| ChatGLM-6B | 31.49 |
| ChatGLM2-6B | 44.62 |

> Measured with the official implementation, batch size = 1, max length = 2048, bf16 precision; test hardware is A100-SXM4-80G, software environment is PyTorch 2.0.1.

Multi-Query Attention also reduces the GPU memory taken by the KV cache during generation. In addition, ChatGLM2-6B is trained for dialogue with a causal mask, which lets continuous dialogue reuse the KV cache from earlier rounds and further cuts memory usage. As a result, when running INT4-quantized inference on a GPU with 6GB of memory, the first-generation ChatGLM-6B could generate at most 1119 characters before running out of memory, whereas ChatGLM2-6B can generate at least 8192 characters; a minimal sketch of multi-query attention follows.
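In this sketch (shapes are illustrative assumptions, not the official implementation), all query heads share a single key/value head, so the KV cache shrinks by roughly a factor of the head count:

```python
import torch

batch, heads, seq, head_dim = 1, 32, 8, 128
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, 1, seq, head_dim)  # one shared key head (this is what the KV cache stores)
v = torch.randn(batch, 1, seq, head_dim)  # one shared value head

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5  # k broadcasts across all 32 query heads
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)  # torch.Size([1, 32, 8, 128])
```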
The minimum GPU memory requirements by quantization level are:

| Quantization Level | Min GPU memory (encoding a 2048-token context) | Min GPU memory (generating to length 8192) |
|---|---|---|
| FP16 / BF16 | 13.1 GB | 12.8 GB |
| INT8 | 8.2 GB | 8.1 GB |
| INT4 | 5.5 GB | 5.1 GB |
ChatGLM2-6B uses `torch.nn.functional.scaled_dot_product_attention`, introduced in PyTorch 2.0, for efficient attention computation. If your PyTorch version is older, it falls back to a naive attention implementation, and GPU memory usage may exceed the figures in the table above.
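The fallback pattern can be sketched as follows. This is not the repository's code, just a minimal illustration of preferring the fused PyTorch 2.0 kernel when it exists:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, causal=True):
    # Use the fused PyTorch 2.0 kernel if available; otherwise fall back to a
    # naive implementation that materializes the full attention matrix (more memory).
    if hasattr(F, "scaled_dot_product_attention"):
        return F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    if causal:
        mask = torch.ones(q.size(-2), k.size(-2), dtype=torch.bool, device=q.device).tril()
        scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq, head_dim)
print(attention(q, k, v).shape)        # torch.Size([1, 8, 16, 64])
```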
We also tested the impact of quantization on model performance. The results show that the impact is within an acceptable range:

| Quantization Level | Accuracy (MMLU) | Accuracy (C-Eval dev) |
|---|---|---|
| BF16 | 45.47 | 53.57 |
| INT4 | 43.13 | 50.30 |
## ChatGLM2-6B Examples

Compared with the first-generation model, ChatGLM2-6B improves along multiple dimensions. Below are some comparison categories; more possibilities of ChatGLM2-6B await your exploration!

- Mathematics and Logic
- Knowledge Reasoning
- Long Document Understanding
## Usage

### Environment Setup

First clone this repository:

```bash
git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B
```

Then install the dependencies with pip:

```bash
pip install -r requirements.txt
```

The recommended version of the `transformers` library is 4.30.2, and `torch` 2.0 or above is recommended for the best inference performance.
### Code Usage

You can generate a dialogue with the ChatGLM2-6B model using the following code:

```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='cuda')
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋!我是人工智能助手 ChatGLM2-6B,很高兴见到你,欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:

1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。
2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。
3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。
4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。
5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。
6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。

如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。
```

The two prompts mean "hello" and "what should I do if I can't sleep at night"; the model answers in Chinese with sleep-hygiene suggestions.
### Loading the Model Locally

The code above automatically downloads the model implementation and parameters via `transformers`. The complete model implementation is available on Hugging Face Hub. If your network environment is poor, downloading the model parameters may take a long time or even fail. In that case, you can first download the model to your local machine and then load it from there.

To download the model from Hugging Face Hub, first install Git LFS, then run:

```bash
git clone https://huggingface.co/THUDM/chatglm2-6b
```

If downloading the checkpoint from Hugging Face Hub is slow, you can download only the model implementation:

```bash
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm2-6b
```

and then manually download the model parameter files and place them in the local `chatglm2-6b` directory.

After the model has been downloaded, replace `THUDM/chatglm2-6b` in the code above with the path of your local `chatglm2-6b` folder to load the model from local files.
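For example, assuming the weights were cloned into `./chatglm2-6b`:

```python
from transformers import AutoTokenizer, AutoModel

# Load tokenizer and model from a local directory instead of the Hub
tokenizer = AutoTokenizer.from_pretrained("./chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm2-6b", trust_remote_code=True, device='cuda')
```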
The model implementation is still subject to change. If you want to pin a fixed model implementation for compatibility, add the `revision="v1.0"` parameter to the `from_pretrained` call. `v1.0` is the latest version number; for a complete list of versions, see the Change Log.
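For example, pinning to v1.0 as described above:

```python
# Pin the remote model implementation to a fixed revision
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, revision="v1.0")
```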
## Web Demo

You can launch a Gradio-based web demo with:

```bash
python web_demo.py
```

Or a Streamlit-based web demo with:

```bash
streamlit run web_demo2.py
```

The web demo runs a web server and prints an address; open the address in a browser to use it. In our tests, the Streamlit-based demo runs more smoothly.
## CLI Demo

Run cli_demo.py in the repository:

```bash
python cli_demo.py
```

The program holds an interactive dialogue in the terminal. Type a prompt and press Enter to generate a reply, type `clear` to clear the dialogue history, and type `stop` to exit.
## API Deployment

First install the extra dependencies with `pip install fastapi uvicorn`, then run api.py in the repository:

```bash
python api.py
```

By default the API is served on local port 8000 and is called via POST:

```bash
curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'
```

The returned value is:

```json
{
  "response": "你好👋！我是人工智能助手 ChatGLM2-6B，很高兴见到你，欢迎问我任何问题。",
  "history": [["你好", "你好👋！我是人工智能助手 ChatGLM2-6B，很高兴见到你，欢迎问我任何问题。"]],
  "status": 200,
  "time": "2023-03-23 21:38:40"
}
```
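The same call can be made from Python. A minimal client sketch using the `requests` library (not part of the repository; field names follow the curl example above):

```python
import requests

# Call the api.py deployment above (assumes the default host and port)
resp = requests.post("http://127.0.0.1:8000", json={"prompt": "你好", "history": []})
data = resp.json()
print(data["response"])  # the model's reply
print(data["history"])   # pass this back as "history" on the next request
```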
Thanks to @hiyouga for implementing an OpenAI-format streaming API deployment, which can serve as the backend for any ChatGPT-based application such as ChatGPT-Next-Web. Deploy it by running openai_api.py in the repository:

```bash
python openai_api.py
```

Example code for calling the API:

```python
import openai

if __name__ == "__main__":
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "none"
    for chunk in openai.ChatCompletion.create(
        model="chatglm2-6b",
        messages=[
            {"role": "user", "content": "你好"}
        ],
        stream=True
    ):
        if hasattr(chunk.choices[0].delta, "content"):
            print(chunk.choices[0].delta.content, end="", flush=True)
```
## Low-Cost Deployment

### Model Quantization

By default, the model is loaded at FP16 precision, and running the code above requires roughly 13GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized form:

```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).cuda()
```

Quantization incurs some performance loss, but our tests show that ChatGLM2-6B can still generate naturally and fluently under 4-bit quantization. The parameter files of the quantized model can also be downloaded manually.
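To verify the memory saving on your own hardware, you can check the allocated GPU memory after loading. A small sketch (assumes a CUDA device is available):

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).cuda()
print(f"{torch.cuda.memory_allocated() / 1024**3:.1f} GB allocated")  # weights resident on the GPU
```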
### CPU Deployment

If you have no GPU hardware, you can run inference on the CPU instead, though it will be considerably slower. Usage (requires roughly 32GB of RAM):

```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).float()
```

If your RAM is insufficient, you can use the quantized model instead:

```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).float()
```

Running the quantized model on the CPU requires `gcc` and `openmp`; most Linux distributions have them installed by default. On Windows, check `openmp` when installing TDM-GCC. The Windows test environment uses TDM-GCC 10.3.0 and the Linux environment uses gcc 11.3.0. On MacOS, see Q1.
### Mac Deployment

For Macs with Apple Silicon or an AMD GPU, the MPS backend can be used to run ChatGLM2-6B on the GPU. Follow Apple's official instructions to install PyTorch-Nightly (the correct version number is 2.x.x.dev2023xxxx, not 2.x.x).

Currently, MacOS only supports loading the model from local files. Change the loading code to load locally and use the mps backend:

```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
```

Loading the half-precision ChatGLM2-6B model requires about 13GB of memory. Machines with less memory (such as a 16GB MacBook Pro) will fall back to hard-disk virtual memory when free memory runs out, slowing inference severely. In that case you can use the quantized model chatglm2-6b-int4. Because the quantized kernels for the GPU are written in CUDA, they cannot run on MacOS, so quantized inference can only use the CPU; to fully parallelize across CPU cores, OpenMP needs to be installed separately.

On a Mac you can also run inference with ChatGLM.cpp.
### Multi-GPU Deployment

If you have multiple GPUs but none of them has enough memory to hold the complete model, you can split the model across GPUs. First install accelerate (`pip install accelerate`), then load the model as follows:

```python
from utils import load_model_on_gpus

model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2)
```

This deploys the model across two GPUs for inference. You can change `num_gpus` to the number of GPUs you want to use. The model is split evenly by default, but you can also pass a `device_map` parameter to specify the placement yourself, as sketched below.
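A hedged sketch of a custom placement; the module names and the 28-layer count are assumptions based on the ChatGLM2 implementation, so check them against `model.named_modules()` before use:

```python
from utils import load_model_on_gpus

# Put the embedding and the first half of the layers on GPU 0, the rest on GPU 1
device_map = {
    "transformer.embedding": 0,
    "transformer.rotary_pos_emb": 0,
    "transformer.encoder.final_layernorm": 1,
    "transformer.output_layer": 1,
}
for i in range(28):  # ChatGLM2-6B has 28 transformer layers
    device_map[f"transformer.encoder.layers.{i}"] = 0 if i < 14 else 1

model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2, device_map=device_map)
```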
## License

The code in this repository is open-sourced under the Apache-2.0 license. Use of the ChatGLM2-6B model weights must follow the Model License. ChatGLM2-6B weights are completely open to academic research, and **free commercial use is also permitted** after completing the questionnaire registration.

## Citation

If you find our work helpful, please consider citing the following paper. The ChatGLM2-6B paper will be released in the near future; stay tuned!
```bibtex
@misc{glm2024chatglm,
      title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
      author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
      year={2024},
      eprint={2406.12793},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```