Top Related Projects
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Large-scale pretraining for dialogue
Quick Overview
ChatGLM-6B is an open-source, bilingual (Chinese and English) dialogue language model developed by Tsinghua University. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. The model aims to provide high-quality responses in both languages and can be deployed on consumer-grade GPUs.
Pros
- Bilingual support for Chinese and English
- Can run on consumer-grade GPUs with 6GB+ VRAM (using INT4 quantization)
- Open-source and free to use
- Provides good performance in various dialogue tasks
Cons
- Limited to 6 billion parameters, which may affect its performance compared to larger models
- May require fine-tuning for specific domain applications
- Documentation is primarily in Chinese, which could be a barrier for non-Chinese speakers
- Still in active development, so may have occasional instability or bugs
Code Examples
# Load the model
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()  # switch to inference mode, as in the official usage example
# Generate a response
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
# Continue the conversation
response, history = model.chat(tokenizer, "What's the capital of France?", history=history)
print(response)
Getting Started
To get started with ChatGLM-6B, follow these steps:
1. Install the required dependencies:

pip install transformers torch

2. Load the model and tokenizer:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

3. Start a conversation:

response, history = model.chat(tokenizer, "Hello, how are you?", history=[])
print(response)

4. Continue the conversation by passing the history to subsequent calls:

response, history = model.chat(tokenizer, "What's your name?", history=history)
print(response)
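From here, a minimal multi-turn chat loop can be built on the same model.chat call. This is a sketch assuming the model and tokenizer from step 2 are already in scope:

# Minimal interactive loop around model.chat; type "exit" or "quit" to stop.
history = []
while True:
    query = input("You: ")
    if query.strip().lower() in ("exit", "quit"):
        break
    # model.chat returns the reply and the updated conversation history
    response, history = model.chat(tokenizer, query, history=history)
    print(f"ChatGLM-6B: {response}")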
Competitor Comparisons
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Pros of ChatGLM-6B
- Identical repository names make it challenging to identify unique advantages
- Both repositories likely contain the same codebase and features
Cons of ChatGLM-6B
- Duplicate repositories may lead to confusion for users and contributors
- Potential for inconsistent updates or maintenance between the two repositories
Code Comparison
Since both repositories appear to be identical, a code comparison is not applicable. However, here's a hypothetical example of what a code difference might look like if there were any:
# ChatGLM-6B
def process_input(text):
return text.lower()
# ChatGLM-6B (hypothetical difference)
def process_input(text):
return text.lower().strip()
In this example, the second repository might include an additional .strip() call to remove leading and trailing whitespace. However, this is purely hypothetical, as the repositories appear to be identical.
Given the identical names and likely identical content, it's recommended to investigate further to determine if there's any actual difference between these repositories or if one is a fork of the other. Users should be cautious when choosing between them and consider factors such as update frequency, community engagement, and official documentation to decide which one to use.
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
Pros of ChatGLM2-6B
- Improved performance and efficiency over its predecessor
- Enhanced support for long-form content generation
- Better handling of context and more coherent responses
Cons of ChatGLM2-6B
- May require more computational resources due to increased complexity
- Potential compatibility issues with existing integrations built for ChatGLM-6B
- Steeper learning curve for developers unfamiliar with the new architecture
Code Comparison
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
ChatGLM2-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
The code snippets show that the basic usage remains similar between the two versions, with the main difference being the model name in the from_pretrained
method. This suggests that transitioning from ChatGLM-6B to ChatGLM2-6B should be relatively straightforward for existing projects, requiring minimal code changes.
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
Pros of FlagAI
- Broader scope: FlagAI is a comprehensive AI toolkit supporting various tasks beyond language models
- More extensive documentation and examples for different AI applications
- Active community and regular updates
Cons of FlagAI
- Steeper learning curve due to its broader scope and more complex architecture
- Potentially slower inference for specific language tasks compared to ChatGLM-6B
Code Comparison
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
FlagAI:
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
loader = AutoLoader("seq2seq", "THUDM/chatglm-6b")
model = loader.get_model()
tokenizer = loader.get_tokenizer()
predictor = Predictor(model, tokenizer)
response = predictor.predict(["Hello"])
Both repositories provide easy-to-use interfaces for loading and using pre-trained language models. ChatGLM-6B focuses specifically on the ChatGLM model, while FlagAI offers a more generalized approach to working with various AI models and tasks.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of GPT-NeoX
- Larger model size (20B parameters) potentially offering higher performance
- More extensive documentation and community support
- Designed for distributed training across multiple GPUs
Cons of GPT-NeoX
- Higher computational requirements for training and inference
- Less optimized for Chinese language tasks
- More complex setup and configuration process
Code Comparison
GPT-NeoX:
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
The code snippets show that GPT-NeoX uses specific model and tokenizer classes, while ChatGLM-6B uses more generic AutoTokenizer and AutoModel classes. ChatGLM-6B also includes additional parameters for remote code trust and GPU optimization.
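For completeness, the snippet below sketches end-to-end generation with GPT-NeoX via the standard transformers generate API. Note that device_map="auto" assumes the accelerate package is installed, and the 20B checkpoint needs roughly 40GB of memory even in FP16, so this is illustrative rather than something most single-GPU setups can run:

import torch
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")
# Load in FP16 and let accelerate place layers across the available devices
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))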
Large-scale pretraining for dialogue
Pros of DialoGPT
- More extensive documentation and examples for implementation
- Larger community support and contributions
- Pre-trained on a diverse range of conversational data
Cons of DialoGPT
- Less focus on multilingual capabilities
- May require more fine-tuning for specific use cases
- Potentially higher computational requirements
Code Comparison
DialoGPT:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
Both repositories provide pre-trained language models for conversational AI. DialoGPT offers more extensive documentation and community support, making it easier for developers to implement and customize. However, ChatGLM-6B has a stronger focus on multilingual capabilities, particularly for Chinese language processing.
The code comparison shows that both models can be loaded using the Transformers library, with slight differences in the model class and additional parameters for ChatGLM-6B. DialoGPT uses AutoModelForCausalLM, while ChatGLM-6B uses AutoModel with the trust_remote_code=True
parameter.
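To make the interaction pattern concrete, the sketch below follows the multi-turn recipe from the DialoGPT model card: each user turn is terminated with the EOS token and concatenated with the running history before generation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

chat_history_ids = None
for user_input in ["Hello, how are you?", "What's your favorite book?"]:
    # Encode the new user turn, terminated with the EOS token
    new_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
    # Append the new turn to the running conversation history
    bot_input_ids = torch.cat([chat_history_ids, new_ids], dim=-1) if chat_history_ids is not None else new_ids
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated bot turn
    print(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True))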
README
ChatGLM-6B
🌐 Blog • 🤗 HF Repo • 🐦 Twitter • 📃 Report
👋 Join our Discord and WeChat
📍 Experience and use larger-scale GLM commercial models on the Zhipu AI Open Platform.
GLM-4 Open-Source Models and API
We have released the latest GLM-4 large language dialogue model, which achieves new breakthroughs on multiple benchmarks. You can try our latest models through the following channels.
- GLM-4 open-source models: we have open-sourced the GLM-4-9B series of models, which show clear improvements across benchmark evaluations. You are welcome to try them.
- Zhipu Qingyan: experience the latest GLM-4, including features such as GLMs and All Tools.
- API platform: the new-generation API platform is now live. You can try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3 directly on the API platform. Among them, GLM-4 and GLM-3-Turbo support new features such as System Prompt, Function Call, Retrieval, and Web_Search.
- GLM-4 API open-source tutorial: GLM-4 API tutorials and basic applications. API-related questions can be raised in the tutorial repository, or you can use the GLM-4 API AI assistant for help with common questions.
Introduction
ChatGLM-6B is an open-source dialogue language model supporting both Chinese and English, based on the General Language Model (GLM) architecture, with 6.2 billion parameters. Combined with model quantization techniques, it can be deployed locally on consumer-grade graphics cards (as little as 6GB of GPU memory at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese Q&A and dialogue. Trained on roughly 1T tokens of Chinese and English corpus, and further refined with supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B can already generate answers that are quite well aligned with human preferences. For more information, please refer to our blog. You are welcome to try the larger-scale ChatGLM models at chatglm.cn.
To make it easier for downstream developers to customize the model for their own application scenarios, we have also implemented an efficient parameter fine-tuning method based on P-Tuning v2 (usage guide); fine-tuning can be started with as little as 7GB of GPU memory at the INT4 quantization level.
The ChatGLM-6B weights are completely open for academic research, and **free commercial use is also permitted** after completing the registration questionnaire.
The ChatGLM-6B open-source model aims to advance large-model technology together with the open-source community. We earnestly ask developers and all users to abide by the open-source license: do not use the open-source model, the code, or derivatives of this open-source project for any purpose that may cause harm to the nation or society, nor for any service that has not undergone safety evaluation and filing. At present, the project team has not developed any application based on ChatGLM-6B, whether web, Android, Apple iOS, or Windows apps.
尽管模åå¨è®ç»çåä¸ªé¶æ®µé½å°½åç¡®ä¿æ°æ®çåè§æ§ååç¡®æ§ï¼ä½ç±äº ChatGLM-6B 模åè§æ¨¡è¾å°ï¼ä¸æ¨¡å忦çéæºæ§å ç´ å½±åï¼æ æ³ä¿è¯è¾åºå 容çåç¡®æ§ï¼ä¸æ¨¡åæè¢«è¯¯å¯¼ï¼è¯¦è§å±éæ§ï¼ãæ¬é¡¹ç®ä¸æ¿æ 弿ºæ¨¡åå代ç 导è´çæ°æ®å®å ¨ãèæ é£é©æåç任使¨¡åè¢«è¯¯å¯¼ãæ»¥ç¨ãä¼ æãä¸å½å©ç¨è产ççé£é©å责任ã
Updates
[2023/07/25] Released CodeGeeX2, a code-generation model based on ChatGLM2-6B with comprehensively improved coding ability. Highlights include:
- Stronger code capabilities: CodeGeeX2-6B is further pretrained on 600B tokens of code data. Compared with the first-generation CodeGeeX, its coding ability improves across the board: all six programming languages on the HumanEval-X benchmark improve substantially (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%), reaching a 35.9% Pass@1 one-shot pass rate on Python and surpassing the larger StarCoder-15B.
- **Better model properties**: inheriting the characteristics of ChatGLM2-6B, CodeGeeX2-6B has better support for Chinese and English input, supports a maximum sequence length of 8192, and offers much faster inference than the first generation; after quantization it runs with only 6GB of GPU memory, supporting lightweight local deployment.
- A more complete AI coding assistant: the CodeGeeX plugin (VS Code, JetBrains) has an upgraded backend supporting over 100 programming languages, with new practical features such as in-context completion and cross-file completion. Combined with the interactive Ask CodeGeeX assistant, it can address all kinds of programming questions through Chinese or English dialogue, including but not limited to code explanation, code translation, code bug fixing, and documentation generation, helping programmers develop more efficiently.
[2023/06/25] Released ChatGLM2-6B, the upgraded version of ChatGLM-6B. While retaining the many excellent qualities of the first-generation model, such as fluent dialogue and a low deployment barrier, ChatGLM2-6B introduces the following new features:
- Stronger performance: building on the development experience of the first-generation ChatGLM model, we fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses GLM's hybrid objective function and has undergone pretraining on 1.4T Chinese and English tokens plus human-preference alignment training. Evaluation results show that, compared with the first generation, ChatGLM2-6B achieves large gains on datasets such as MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), making it strongly competitive among open-source models of the same size.
- Longer context: using FlashAttention, we extended the base model's context length from ChatGLM-6B's 2K to 32K, and trained with an 8K context length during the dialogue stage, allowing more rounds of dialogue. However, the current version of ChatGLM2-6B has limited ability to understand very long documents within a single turn, which we will focus on optimizing in future iterations.
- More efficient inference: with Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage: under the official implementation, inference is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6GB of GPU memory grows from 1K to 8K.
For more information, see ChatGLM2-6B.
[2023/06/14] Released WebGLM, a research project accepted at KDD 2023 that supports using web information to generate long answers with accurate citations.
[2023/05/17] Released VisualGLM-6B, a multimodal dialogue language model that supports image understanding.
You can run the command-line and web demos via cli_demo_vision.py and web_demo_vision.py in this repository. Note that VisualGLM-6B additionally requires SwissArmyTransformer and torchvision. For more information, see VisualGLM-6B.
[2023/05/15] Updated the v1.1 checkpoint, adding English instruction fine-tuning data to the training set to balance the ratio of Chinese and English data, which fixes the phenomenon of Chinese words being mixed into English answers.
以䏿¯æ´æ°ååçè±æé®é¢å¯¹æ¯ï¼
- é®é¢ï¼Describe a time when you had to make a difficult decision.
- v1.0:
- v1.1:
- v1.0:
- é®é¢ï¼Describe the function of a computer motherboard
- v1.0:
- v1.1:
- v1.0:
- é®é¢ï¼Develop a plan to reduce electricity usage in a home.
- v1.0:
- v1.1:
- v1.0:
- é®é¢ï¼æªæ¥çNFTï¼å¯è½çå®å®ä¹ä¸ç§ç°å®çèµäº§ï¼å®ä¼æ¯ä¸å¤æ¿äº§ï¼ä¸è¾æ±½è½¦ï¼ä¸çåå°ççï¼è¿æ ·çæ°ååè¯å¯è½æ¯çå®çä¸è¥¿æ´æä»·å¼ï¼ä½ å¯ä»¥éæ¶äº¤æå使ç¨ï¼å¨èæåç°å®ä¸æ ç¼çè®©æ¥æçèµäº§ç»§ç»åé ä»·å¼ï¼æªæ¥ä¼æ¯ä¸ç©å½ææç¨ï¼ä½ä¸å½æææçæ¶ä»£ãç¿»è¯æä¸ä¸çè±è¯
- v1.0:
- v1.1:
- v1.0:
For more update information, see UPDATE.md.
Friendly Links
Open-source projects that accelerate ChatGLM:
- lyraChatGLM: inference acceleration for ChatGLM-6B, reaching speeds of 9000+ tokens/s
- ChatGLM-MNN: an MNN-based C++ inference implementation of ChatGLM-6B that automatically allocates computation between GPU and CPU according to available GPU memory
- JittorLLMs: runs ChatGLM-6B FP16 with as little as 3GB of GPU memory, or with no GPU at all; supports deployment on Linux, Windows, and Mac
- InferLLM: lightweight C++ inference enabling real-time chat on local x86 and Arm processors, and even on mobile phones, requiring only 4GB of RAM
Open-source projects based on or using ChatGLM-6B:
- langchain-ChatGLM: a langchain-based ChatGLM application implementing Q&A over an extensible knowledge base
- Wenda: a large language model invocation platform that implements ChatPDF-like functionality based on ChatGLM-6B
- glm-bot: connects ChatGLM to Koishi so ChatGLM can be called from major chat platforms
- Chuanhu Chat: an attractive, easy-to-use, feature-rich, and quick-to-deploy UI for various large language models and online model APIs, with ChatGLM-6B support
Example projects supporting online training of ChatGLM-6B and related applications:
Third-party evaluations:
For more open-source projects, see PROJECT.md.
Usage
Hardware Requirements
Quantization Level | Minimum GPU Memory (Inference) | Minimum GPU Memory (Efficient Parameter Fine-Tuning) |
---|---|---|
FP16 (no quantization) | 13 GB | 14 GB |
INT8 | 8 GB | 9 GB |
INT4 | 6 GB | 7 GB |
Environment Installation
Install the dependencies with pip: pip install -r requirements.txt. The recommended version of the transformers library is 4.27.1, but in theory any version no lower than 4.23.1 works.
In addition, running the quantized model on CPU also requires gcc and openmp, which most Linux distributions install by default. On Windows, you can tick openmp when installing TDM-GCC. The tested gcc version on Windows is TDM-GCC 10.3.0, and on Linux gcc 11.3.0. On macOS, please refer to Q1.
Code Usage
You can generate dialogue with the ChatGLM-6B model using the following code:
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:
1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。
2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。
3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。
4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。
5. 避免在床上做与睡眠无关的事情:在床上做了与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。
6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。
如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。
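The repository's web demo achieves its typewriter effect by streaming tokens. As a sketch, the model implementation also exposes a stream_chat method (used by web_demo.py) that yields the partial response as it grows; the exact interface may vary between model revisions:

>>> history = []
>>> last_len = 0
>>> for response, history in model.stream_chat(tokenizer, "你好", history=history):
...     print(response[last_len:], end="", flush=True)  # print only the newly generated text
...     last_len = len(response)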
The model implementation is still subject to change. If you want to pin the model implementation to ensure compatibility, add a revision="v1.1.0" parameter to the from_pretrained call. v1.1.0 is the latest version number; for the complete list of versions, see the Change Log.
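For example:

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, revision="v1.1.0").half().cuda()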
Load the Model Locally
The code above automatically downloads the model implementation and parameters via transformers. The complete model implementation is available on the Hugging Face Hub. If your network is poor, downloading the model parameters may take a long time or even fail. In that case, you can first download the model to your local machine and then load it from there.
To download the model from the Hugging Face Hub, first install Git LFS, then run
git clone https://huggingface.co/THUDM/chatglm-6b
If downloading the checkpoint from the Hugging Face Hub is slow, you can download only the model implementation with
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm-6b
then manually download the model parameter files from here and place them in your local chatglm-6b directory.
After the model has been downloaded locally, replace THUDM/chatglm-6b in the code above with the path of your local chatglm-6b folder to load the model from local files.
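For instance, with a hypothetical local path ./chatglm-6b:

# Load the tokenizer and model from a local clone instead of the Hub
tokenizer = AutoTokenizer.from_pretrained("./chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm-6b", trust_remote_code=True).half().cuda()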
Optional: the model implementation is still subject to change. If you want to pin the model implementation to ensure compatibility, you can run
git checkout v1.1.0
Demo & API
We provide a Gradio-based web demo and a command-line demo. To use them, first clone this repository:
git clone https://github.com/THUDM/ChatGLM-6B
cd ChatGLM-6B
Web Demo
First install Gradio: pip install gradio, then run web_demo.py in the repository:
python web_demo.py
The program starts a web server and prints its address; open the printed address in a browser to use it. The latest demo implements a typewriter effect, which greatly improves the experience. Note: because network access to Gradio is slow in mainland China, enabling demo.queue().launch(share=True, inbrowser=True) relays all traffic through the Gradio server, severely degrading the typewriter experience. The default launch has therefore been changed to share=False; if you need public network access, change it back to share=True.
Thanks to @AdamBear for implementing a Streamlit-based web demo; see #117 for how to run it.
CLI Demo
Run cli_demo.py in the repository:
python cli_demo.py
The program holds an interactive dialogue in the terminal. Type a prompt and press Enter to generate a reply; type clear to clear the dialogue history, and stop to exit the program.
API Deployment
First install the additional dependencies: pip install fastapi uvicorn, then run api.py in the repository:
python api.py
By default the API is served on local port 8000 and is called via POST:
curl -X POST "http://127.0.0.1:8000" \
-H 'Content-Type: application/json' \
-d '{"prompt": "你好", "history": []}'
The returned value is
{
  "response": "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。",
  "history": [["你好", "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。"]],
  "status": 200,
  "time": "2023-03-23 21:38:40"
}
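Equivalently, a minimal Python client for this endpoint might look like the following sketch (assuming the requests package and the default port 8000):

import requests

# Call the local ChatGLM-6B API started by api.py
resp = requests.post(
    "http://127.0.0.1:8000",
    json={"prompt": "你好", "history": []},
)
data = resp.json()
print(data["response"])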
使æ¬é¨ç½²
模åéå
é»è®¤æ åµä¸ï¼æ¨¡å以 FP16 精度å è½½ï¼è¿è¡ä¸è¿°ä»£ç éè¦å¤§æ¦ 13GB æ¾åãå¦æä½ ç GPU æ¾åæéï¼å¯ä»¥å°è¯ä»¥éåæ¹å¼å 载模åï¼ä½¿ç¨æ¹æ³å¦ä¸ï¼
# Change as needed; currently only 4-bit and 8-bit quantization are supported
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(8).half().cuda()
After 2 to 3 rounds of dialogue, GPU memory usage is about 10GB under 8-bit quantization and only 6GB under 4-bit quantization. Memory consumption grows with the number of dialogue rounds. Because relative position encoding is used, ChatGLM-6B in theory supports an unlimited context length, but performance degrades gradually once the total length exceeds 2048 (the training length).
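Per the comment in the snippet above, 4-bit quantization uses the same call with quantize(4):

# 4-bit quantization: the lowest GPU memory footprint (about 6GB after a few rounds)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()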
Model quantization incurs some performance loss; in our tests, ChatGLM-6B can still generate fluent, natural text under 4-bit quantization. Quantization schemes such as GPT-Q could further compress the quantization precision or improve performance at the same precision; corresponding Pull Requests are welcome.
The quantization process first loads the FP16 model into RAM, consuming about 13GB of memory. If your RAM is insufficient, you can directly load a pre-quantized model; the INT4-quantized model needs only about 5.2GB of RAM:
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
The parameter files of the quantized models can also be downloaded manually from here.
CPU Deployment
If you have no GPU hardware, you can also run inference on CPU, although inference will be much slower. Usage is as follows (about 32GB of RAM required):
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
If your RAM is insufficient, you can directly load the quantized model:
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
If you encounter the error Could not find module 'nvcuda.dll' or RuntimeError: Unknown platform: darwin (macOS), please load the model from local files.
Mac Deployment
For Macs with Apple Silicon or an AMD GPU, the MPS backend can be used to run ChatGLM-6B on the GPU. Follow Apple's official instructions to install PyTorch-Nightly (the correct version number is 2.1.0.dev2023xxxx, not 2.0.0).
Currently macOS only supports loading the model from local files. Change the model loading in the code to load from a local path, and use the mps backend:
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).half().to('mps')
Loading the half-precision ChatGLM-6B model requires about 13GB of RAM. Machines with less RAM (such as a MacBook Pro with 16GB) will fall back to virtual memory on disk when free RAM is exhausted, severely slowing down inference. In that case you can use a quantized model such as chatglm-6b-int4. Because the quantization kernels for GPU are written in CUDA, they cannot be used on macOS, and the quantized model can only run on CPU:
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
For full CPU parallelism, OpenMP also needs to be installed separately.
Multi-GPU Deployment
If you have multiple GPUs but no single GPU has enough memory to hold the complete model, you can split the model across multiple GPUs. First install accelerate: pip install accelerate, then load the model as follows:
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=2)
This deploys the model on two GPUs for inference. You can change num_gpus to the number of GPUs you want to use. The model is split evenly by default, but you can also pass a device_map parameter to specify the split yourself.
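As a sketch of a custom split, the module names below are illustrative assumptions (ChatGLM-6B has 28 transformer layers; inspect model.named_modules() on your checkpoint to confirm the actual names):

from utils import load_model_on_gpus

# Hypothetical explicit device map: embeddings and the first half of the
# layers on GPU 0, the remaining layers plus the final norm and head on GPU 1.
device_map = {"transformer.word_embeddings": 0,
              "transformer.final_layernorm": 1,
              "lm_head": 1}
for i in range(28):
    device_map[f"transformer.layers.{i}"] = 0 if i < 14 else 1

model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=2, device_map=device_map)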
髿忰微è°
åºäº P-tuning v2 ç髿忰微è°ãå ·ä½ä½¿ç¨æ¹æ³è¯¦è§ ptuning/README.mdã
ChatGLM-6B Examples
The following are some example screenshots produced with web_demo.py. More possibilities of ChatGLM-6B await your exploration!
Self-Cognition
Outline Writing
Copywriting
Email Writing Assistant
Information Extraction
Role Play
Comment Comparison
Travel Guide
Limitations
Due to ChatGLM-6B's small scale, its capabilities still have many limitations. The following are some issues we have found so far:
- Small model capacity: the small 6B capacity determines its relatively weak memory and language ability. ChatGLM-6B may generate incorrect information when facing many factual knowledge tasks, and it is not good at solving logical problems (such as mathematics and programming).
- Harmful or biased content: ChatGLM-6B is only a language model preliminarily aligned with human intent and may generate harmful or biased content. (Such content may be offensive and is not shown here.)
- Insufficient English ability: most of the instructions/answers used in ChatGLM-6B's training are in Chinese, with only a very small portion in English. Therefore, for English prompts, the quality of replies falls far below that for Chinese, may even contradict the content produced under Chinese prompts, and mixed Chinese-English output can occur.
- Easily misled, weak dialogue ability: ChatGLM-6B's dialogue ability is still relatively weak, its "self-cognition" has issues, and it is easily misled into producing incorrect statements. For example, the current version of the model can deviate in its self-cognition when misled.
License
The code in this repository is open-sourced under the Apache-2.0 license. Use of the ChatGLM-6B model weights must follow the Model License. The ChatGLM-6B weights are completely open for academic research, and **free commercial use is also permitted** after completing the registration questionnaire.
Citation
If you find our work helpful, please consider citing the following paper:
@misc{glm2024chatglm,
title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
year={2024},
eprint={2406.12793},
archivePrefix={arXiv},
primaryClass={cs.CL}
}