Top Related Projects
Robust Speech Recognition via Large-Scale Weak Supervision
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
TensorFlow code and pre-trained models for BERT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Quick Overview
ChatGLM-6B is an open-source, bilingual (Chinese and English) dialogue language model from Tsinghua University's THUDM group. It is based on the General Language Model (GLM) architecture and has roughly 6.2 billion parameters. The model is designed to engage in human-like conversations and, with quantization, can be deployed on consumer-grade graphics cards.
Pros
- Bilingual support for Chinese and English
- Can run on consumer-grade GPUs with as little as 6GB of VRAM (with INT4 quantization)
- Open-source; free for academic research, with free commercial use permitted after registration
- Supports efficient inference with low latency
Cons
- Limited to 6 billion parameters, which may affect performance compared to larger models
- May require fine-tuning for specific domain applications
- Primarily focused on Chinese and English, limiting its use for other languages
- Potential biases and limitations inherent in large language models
Code Examples
- Loading the model and tokenizer:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```
- Generating a response:
```python
response, history = model.chat(tokenizer, "你好,请介绍一下你自己。", history=[])
print(response)
```
- Streaming the generated response:
```python
# stream_chat yields the cumulative response so far; print only the newly generated part
printed = 0
for response, history in model.stream_chat(tokenizer, "请解释一下人工智能的概念。", history=[]):
    print(response[printed:], end="", flush=True)
    printed = len(response)
```
- Quantizing the model for lower memory usage:
```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()
```
Getting Started
To get started with ChatGLM-6B, follow these steps:
- Install the required dependencies (the repository's requirements.txt lists the full set, including sentencepiece and cpm_kernels):

```shell
pip install transformers torch
```
- Load the model and tokenizer:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```
- Start a conversation:
```python
response, history = model.chat(tokenizer, "你好!", history=[])
print(response)
```
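The `history` value returned by `model.chat` is a plain Python list of `[query, response]` pairs that you pass back in on the next turn to keep context. A minimal sketch of that loop, using a stand-in function instead of the real model (the actual call would be `model.chat(tokenizer, query, history=history)`):

```python
# Stand-in for model.chat: returns (response, updated_history), like the real API.
# The replies here are placeholders, not real model output.
def fake_chat(query, history):
    response = f"(reply to: {query})"
    return response, history + [[query, response]]

history = []
for query in ["你好", "请再详细一点"]:
    response, history = fake_chat(query, history)

print(len(history))  # one [query, response] pair per turn
```

Because each turn re-feeds the whole history, memory and latency grow with the conversation length; truncating old pairs from the front of the list is a common mitigation.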
Competitor Comparisons
Robust Speech Recognition via Large-Scale Weak Supervision
Pros of Whisper
- Specialized for speech recognition and transcription tasks
- Supports multiple languages and can perform translation
- Well-documented and extensively tested on diverse audio datasets
Cons of Whisper
- Limited to audio processing, not a general-purpose language model
- Requires more computational resources for real-time transcription
Code Comparison
Whisper:
```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
```
ChatGLM-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```
Key Differences
- Whisper focuses on speech-to-text tasks, while ChatGLM-6B is a general-purpose language model
- Whisper is designed for audio processing, whereas ChatGLM-6B excels in text-based interactions
- ChatGLM-6B offers more flexibility for various NLP tasks, but Whisper provides specialized audio transcription capabilities
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Highly scalable and efficient for training large language models
- Supports a wide range of optimization techniques and hardware configurations
- Integrates well with popular deep learning frameworks like PyTorch
Cons of DeepSpeed
- Steeper learning curve for beginners due to its complexity
- Requires more setup and configuration compared to ChatGLM-6B
- May be overkill for smaller models or simpler training tasks
Code Comparison
ChatGLM-6B example:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
DeepSpeed example:
```python
import deepspeed
import torch

model = MyModel()  # placeholder: your torch.nn.Module
# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler)
engine, _, _, _ = deepspeed.initialize(model=model, config_params=ds_config)
output = engine(input_data)
```
The ChatGLM-6B code focuses on easy model loading and inference, while the DeepSpeed code emphasizes initialization and integration with custom models for optimized training.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Broader scope: Supports a wide range of NLP tasks and models
- Extensive documentation and community support
- Regular updates and contributions from the open-source community
Cons of transformers
- Larger codebase, potentially more complex to navigate
- May require more setup and configuration for specific tasks
Code comparison
ChatGLM-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
```
transformers:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
The code comparison shows that ChatGLM-6B is more focused on chat-based interactions, while transformers provides a more general approach to working with language models. transformers offers greater flexibility in model selection and task-specific implementations, but may require more setup for specialized use cases.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- More comprehensive and versatile toolkit for sequence modeling
- Extensive documentation and community support
- Supports a wider range of tasks and architectures
Cons of fairseq
- Steeper learning curve due to its complexity
- Potentially higher computational requirements
- Less focused on specific chat-based applications
Code Comparison
fairseq:
```python
from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
translations = model.translate(['Hello world!'])
```
ChatGLM-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello world!", history=[])
```
The code comparison shows that fairseq requires more setup for specific tasks, while ChatGLM-6B provides a more straightforward interface for chat-based interactions. fairseq's code demonstrates its flexibility for various sequence modeling tasks, whereas ChatGLM-6B's code is tailored for conversational AI applications.
TensorFlow code and pre-trained models for BERT
Pros of BERT
- Well-established and widely adopted in the NLP community
- Extensive documentation and pre-trained models available
- Suitable for a variety of NLP tasks with minimal fine-tuning
Cons of BERT
- Smaller model size (110M parameters) compared to ChatGLM-6B (6B parameters)
- Less advanced in generating human-like responses for open-ended tasks
- May require more task-specific fine-tuning for optimal performance
Code Comparison
BERT example:
```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
```
ChatGLM-6B example:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
```
Both repositories provide pre-trained models and tokenizers, but ChatGLM-6B requires the `trust_remote_code=True` parameter due to its custom implementation. BERT offers a more straightforward setup, while ChatGLM-6B provides a larger, more advanced model for complex language tasks.
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Pros of minGPT
- Lightweight and easy to understand implementation of GPT
- Excellent educational resource for learning about transformer architecture
- Highly customizable and adaptable for various tasks
Cons of minGPT
- Limited scale compared to ChatGLM-6B (6B parameters)
- Lacks multilingual support and advanced features of ChatGLM-6B
- Not optimized for production-level performance
Code Comparison
minGPT:
```python
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        self.drop = nn.Dropout(config.embd_pdrop)
```
ChatGLM-6B:
```python
class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
    def __init__(self, config: ChatGLMConfig):
        super().__init__(config)
        self.transformer = ChatGLMModel(config)
        self.config = config
        self.quantized = False
```
README
ChatGLM-6B
🌐 Blog • 🤗 HF Repo • 🐦 Twitter • 📃 Report

👋 Join our Discord and WeChat

📍 Experience and use larger-scale GLM commercial models on the Zhipu AI Open Platform
Read this in English.
GLM-4 Open-Source Models and API

We have released the latest GLM-4 large language dialogue model, which achieves new breakthroughs on multiple benchmarks. You can experience our latest models through the following channels:

- GLM-4 open-source models: we have open-sourced the GLM-4-9B series, with clear improvements across benchmarks. You are welcome to try them.
- Zhipu Qingyan: experience the latest GLM-4, including GLMs, All Tools, and other features.
- API platform: the new-generation API platform is live. You can directly try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3 on the API platform. Among these, the GLM-4 and GLM-3-Turbo models support new features such as System Prompt, Function Call, Retrieval, and Web_Search; you are welcome to try them.
- GLM-4 API open-source tutorial: a GLM-4 API tutorial with basic applications, welcome to try. API-related questions can be asked in this open-source tutorial, or via the GLM-4 API AI assistant for help with common problems.
Introduction

ChatGLM-6B is an open-source dialogue language model that supports bilingual Chinese-English question answering. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. With model quantization, users can deploy it locally on consumer-grade graphics cards (as little as 6 GB of GPU memory at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT and is optimized for Chinese question answering and dialogue. Trained on roughly 1T tokens of Chinese and English corpus, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B can already generate answers quite in line with human preferences. See our blog for more information, and try larger-scale ChatGLM models at chatglm.cn.

To make it easier for downstream developers to customize the model for their own application scenarios, we also implemented a parameter-efficient fine-tuning method based on P-Tuning v2 (usage guide); at the INT4 quantization level, fine-tuning can be started with as little as 7 GB of GPU memory.

The ChatGLM-6B weights are fully open for academic research, and **free commercial use is also permitted** after completing the questionnaire registration.

The ChatGLM-6B open-source model aims to advance large-model technology together with the open-source community. We earnestly ask developers and users to abide by the open-source license and not to use the open-source model, code, or derivatives of this project for any purpose that may harm nations or societies, or for any service that has not undergone safety assessment and filing. Currently, this project's team has not developed any application based on ChatGLM-6B, whether for the web, Android, Apple iOS, or Windows.

Although we strive for data compliance and accuracy at every stage of training, due to ChatGLM-6B's small scale and the probabilistic, random nature of its generation, the accuracy of its output cannot be guaranteed, and the model can be misled (see Limitations). This project does not assume the risks or liabilities of data security or public-opinion risks caused by the open-source model and code, or of any model being misled, misused, disseminated, or improperly exploited.
News

[2023/07/25] Released CodeGeeX2, a code generation model based on ChatGLM2-6B with comprehensively improved coding capability. Highlights:

- **Stronger code capability**: CodeGeeX2-6B was further pre-trained on 600B tokens of code. Compared with the first-generation CodeGeeX, coding ability improves across the board, with large gains on all six languages of the HumanEval-X benchmark (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%). It reaches a 35.9% Pass@1 one-shot pass rate on Python, surpassing the larger StarCoder-15B.
- **Better model features**: inheriting the features of ChatGLM2-6B, CodeGeeX2-6B better supports both Chinese and English input, supports a maximum sequence length of 8192, has much faster inference than the first generation, needs only 6 GB of GPU memory after quantization, and supports lightweight local deployment.
- **A more complete AI coding assistant**: the CodeGeeX plugin (VS Code, JetBrains) backend has been upgraded, supporting over 100 programming languages and adding practical features such as infilling completion and cross-file completion. Combined with the interactive Ask CodeGeeX assistant, it supports Chinese and English dialogue to solve a variety of programming problems, including but not limited to code explanation, code translation, bug fixing, and documentation generation, helping programmers develop more efficiently.
[2023/06/25] Released ChatGLM2-6B, the upgraded version of ChatGLM-6B. While retaining many excellent features of the first-generation model, such as smooth dialogue and a low deployment threshold, ChatGLM2-6B introduces the following new features:

- **Stronger performance**: building on the development experience of the first-generation model, we fully upgraded the base model of ChatGLM2-6B. It uses the hybrid objective function of GLM and went through pre-training on 1.4T Chinese-English tokens plus human-preference alignment. Evaluations show that, compared with the first generation, ChatGLM2-6B achieves large improvements on MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), giving it strong competitiveness among open-source models of the same size.
- **Longer context**: based on FlashAttention, we extended the context length of the base model from ChatGLM-6B's 2K to 32K and trained with an 8K context length during the dialogue stage, allowing more rounds of dialogue. However, the current version of ChatGLM2-6B has limited understanding of single-round very long documents, which we will optimize in future iterations.
- **More efficient inference**: based on Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage. Under the official implementation, inference is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6 GB of GPU memory increases from 1K to 8K.

For more information, see ChatGLM2-6B.
[2023/06/14] Released WebGLM, a research project accepted at KDD 2023 that supports generating long answers with accurate citations using web information.

[2023/05/17] Released VisualGLM-6B, a multimodal dialogue language model that supports image understanding. You can run the command-line and web demos via cli_demo_vision.py and web_demo_vision.py in this repository. Note that VisualGLM-6B additionally requires installing SwissArmyTransformer and torchvision. For more information, see VisualGLM-6B.

[2023/05/15] Updated the v1.1 checkpoint, adding English instruction fine-tuning data to balance the ratio of Chinese and English training data, which fixes the phenomenon of Chinese words being mixed into English answers.
Below is a comparison of answers to English questions before and after the update (the v1.0 and v1.1 sample answers were screenshots and are not reproduced here):

- Question: Describe a time when you had to make a difficult decision.
- Question: Describe the function of a computer motherboard
- Question: Develop a plan to reduce electricity usage in a home.
- Question (translation request, originally in Chinese): "Future NFTs may genuinely define a kind of real-world asset: a house, a car, a piece of land, and so on. Such a digital certificate may be more valuable than the real thing itself, tradable and usable at any time, letting the assets you own keep creating value seamlessly across the virtual and the real. The future will be an era in which everything is at my use, but not in my possession. Translate this into professional English."

For more update information, see UPDATE.md
Friendly Links

Open-source projects that accelerate ChatGLM:
- lyraChatGLM: inference acceleration for ChatGLM-6B, reaching up to 9000+ tokens/s
- ChatGLM-MNN: an MNN-based C++ inference implementation of ChatGLM-6B that automatically allocates computation between GPU and CPU according to available GPU memory
- JittorLLMs: runs ChatGLM-6B FP16 with as little as 3 GB of GPU memory, or even without a GPU; supports deployment on Linux, Windows, and Mac
- InferLLM: lightweight C++ inference that enables real-time chat locally on x86 and Arm processors, and runs in real time on phones as well, needing only 4 GB of memory

Open-source projects based on or using ChatGLM-6B:
- langchain-ChatGLM: a langchain-based ChatGLM application that implements question answering over an extensible knowledge base
- Wenda (闻达): a large-language-model invocation platform that implements ChatPDF-like functionality based on ChatGLM-6B
- glm-bot: connects ChatGLM to Koishi so that ChatGLM can be called on major chat platforms
- Chuanhu Chat: a good-looking, easy-to-use, feature-rich, quickly deployable user interface for various large language models and online model APIs, with ChatGLM-6B support

Example projects supporting online training of ChatGLM-6B and related applications:

Third-party evaluations:

For more open-source projects, see PROJECT.md
ä½¿ç¨æ¹å¼
ç¡¬ä»¶éæ±
éåç级 | æä½ GPU æ¾åï¼æ¨çï¼ | æä½ GPU æ¾åï¼é«æåæ°å¾®è°ï¼ |
---|---|---|
FP16ï¼æ éåï¼ | 13 GB | 14 GB |
INT8 | 8 GB | 9 GB |
INT4 | 6 GB | 7 GB |
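As a rough sanity check on the table, the memory floor is parameters times bytes per weight: about 6.2 billion parameters at 2 bytes each is roughly 12.4 GB for FP16, before activations and runtime overhead. A back-of-the-envelope helper (the 6.2B figure comes from the introduction above; quantized deployments need more than the raw weight size because some layers stay in FP16 and runtime overhead adds on top):

```python
def approx_weight_gb(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in GB; ignores activations and caches."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{approx_weight_gb(6.2e9, bits):.1f} GB")
```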
Environment Setup

Install the dependencies with pip: `pip install -r requirements.txt`. Version 4.27.1 of the `transformers` library is recommended, but in theory any version not lower than 4.23.1 works.

In addition, if you need to run the quantized model on the CPU, you also need to install `gcc` and `openmp` (installed by default on most Linux distributions). On Windows, check `openmp` when installing TDM-GCC. The tested `gcc` versions are TDM-GCC 10.3.0 on Windows and gcc 11.3.0 on Linux. On macOS, please refer to Q1.
Code Usage

You can generate dialogue with the ChatGLM-6B model using the following code:

```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服，但以下是一些可以帮助你入睡的方法：
1. 制定规律的睡眠时间表：保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯，使你更容易入睡。尽量在每天的相同时间上床，并在同一时间起床。
2. 创造一个舒适的睡眠环境：确保睡眠环境舒适、安静、黑暗且温度适宜。可以使用舒适的床上用品，并保持房间通风。
3. 放松身心：在睡前做些放松的活动，例如泡个热水澡、听些轻柔的音乐、阅读一些有趣的书籍等，有助于缓解紧张和焦虑，使你更容易入睡。
4. 避免饮用含有咖啡因的饮料：咖啡因是一种刺激性物质，会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料，例如咖啡、茶和可乐。
5. 避免在床上做与睡眠无关的事情：在床上做些与睡眠无关的事情，例如看电影、玩游戏或工作等，可能会干扰你的睡眠。
6. 尝试呼吸技巧：深呼吸是一种放松技巧，可以帮助你缓解紧张和焦虑，使你更容易入睡。试着慢慢吸气，保持几秒钟，然后缓慢呼气。
如果这些方法无法帮助你入睡，你可以考虑咨询医生或睡眠专家，寻求进一步的建议。
```

(The prompts mean "Hello" and "What should I do if I can't sleep at night?"; the model replies in Chinese, the second answer being a numbered list of sleep-hygiene suggestions.)
The model implementation is still evolving. If you want to pin the model implementation to ensure compatibility, add the `revision="v1.1.0"` argument to the `from_pretrained` call. `v1.1.0` is the latest version number at the time of writing; for the complete list of versions, see the Change Log.
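Concretely, pinning only adds keyword arguments to the two `from_pretrained` calls. The calls themselves need network access to the Hugging Face Hub, so they are shown commented out in this sketch:

```python
# Keyword arguments that pin the remote implementation to a fixed tag
PINNED = dict(trust_remote_code=True, revision="v1.1.0")

# With transformers installed and network access:
# from transformers import AutoTokenizer, AutoModel
# tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", **PINNED)
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", **PINNED).half().cuda()
print(PINNED)
```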
Loading the Model Locally

The code above automatically downloads the model implementation and weights via `transformers`. The complete model implementation is available on the Hugging Face Hub. If your network environment is poor, downloading the model weights may take a long time or even fail. In that case, you can first download the model to your local machine and then load it from there.

To download the model from the Hugging Face Hub, first install Git LFS, then run

```shell
git clone https://huggingface.co/THUDM/chatglm-6b
```

If downloading the checkpoint from the Hugging Face Hub is slow, you can download only the model implementation with

```shell
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm-6b
```

and then manually download the model weight files from here, placing the downloaded files in your local `chatglm-6b` directory.

After downloading the model locally, replace `THUDM/chatglm-6b` in the code above with the path of your local `chatglm-6b` folder to load the model from local files.

Optional: the model implementation is still evolving. If you want to pin the model implementation to ensure compatibility, you can run

```shell
git checkout v1.1.0
```
Demo & API

We provide a Gradio-based web demo and a command-line demo. To use them, first clone this repository:

```shell
git clone https://github.com/THUDM/ChatGLM-6B
cd ChatGLM-6B
```

Web Demo

First install Gradio (`pip install gradio`), then run web_demo.py in the repository:

```shell
python web_demo.py
```

The program starts a web server and prints its address; open the address in a browser to use it. The latest demo implements a typewriter effect, which greatly improves the perceived speed. Note: because network access to Gradio is currently slow in China, launching with `demo.queue().launch(share=True, inbrowser=True)` routes all traffic through the Gradio servers, which significantly degrades the typewriter experience. The default launch mode has therefore been changed to `share=False`; if you need public network access, change it back to `share=True`.

Thanks to @AdamBear for implementing a Streamlit-based web demo; see #117 for how to run it.
Command-Line Demo

Run cli_demo.py in the repository:

```shell
python cli_demo.py
```

The program starts an interactive dialogue in the terminal. Type a prompt and press Enter to generate a reply; type `clear` to clear the dialogue history; type `stop` to exit the program.
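The control flow of such a loop can be sketched as below. This is a simplified, hypothetical rendering of the `clear`/`stop` handling, not the actual code of cli_demo.py; in the `"chat"` branch the real script calls `model.chat(tokenizer, line, history=history)`:

```python
def dispatch(line, history):
    """Classify one line of user input: returns (action, new_history)."""
    cmd = line.strip().lower()
    if cmd == "stop":       # terminate the program
        return "stop", history
    if cmd == "clear":      # wipe the dialogue history
        return "clear", []
    return "chat", history  # anything else is a prompt for the model

print(dispatch("clear", [["你好", "hi"]]))  # ('clear', [])
```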
API Deployment

First install the extra dependencies (`pip install fastapi uvicorn`), then run api.py in the repository:

```shell
python api.py
```

By default the API is served on local port 8000 and is called via POST:

```shell
curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'
```

The returned value is

```json
{
  "response": "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。",
  "history": [["你好", "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。"]],
  "status": 200,
  "time": "2023-03-23 21:38:40"
}
```
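The same endpoint can be called from Python with the standard library alone. This sketch mirrors the request and response shapes shown above and assumes api.py is running at the default address:

```python
import json
import urllib.request

def chatglm_request(prompt, history, url="http://127.0.0.1:8000"):
    """Build a POST request matching the api.py payload shape documented above."""
    body = json.dumps({"prompt": prompt, "history": history}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

req = chatglm_request("你好", [])
# With the server running:
# reply = json.load(urllib.request.urlopen(req))
# print(reply["response"], reply["history"])
```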
Low-Cost Deployment

Model Quantization

By default, the model is loaded at FP16 precision, and running the code above requires roughly 13 GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized form as follows:

```python
# Change as needed; currently only 4-bit and 8-bit quantization are supported
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(8).half().cuda()
```

After 2 to 3 rounds of dialogue, GPU memory usage is about 10 GB with 8-bit quantization and only about 6 GB with 4-bit quantization. Memory consumption grows as the number of dialogue rounds increases. Because relative position encoding is used, ChatGLM-6B in theory supports an unlimited context length, but performance degrades gradually once the total length exceeds 2048 (the training length).

Quantization incurs some performance loss; in our tests, ChatGLM-6B can still generate naturally and fluently under 4-bit quantization. Quantization schemes such as GPT-Q could further compress the quantization precision, or improve model performance at the same precision; pull requests are welcome.

The quantization process first loads the FP16 model into memory, consuming about 13 GB of RAM. If you do not have enough RAM, you can directly load a pre-quantized model; the INT4-quantized model needs only about 5.2 GB of RAM:

```python
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
```

The weight files of the quantized models can also be downloaded manually from here.
CPU Deployment

If you have no GPU hardware, you can run inference on the CPU, although it will be slower. Usage (requires roughly 32 GB of RAM):

```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
```

If you do not have enough RAM, you can directly load the quantized model:

```python
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
```

If you hit the error `Could not find module 'nvcuda.dll'` or `RuntimeError: Unknown platform: darwin` (macOS), please load the model from local files.
Mac Deployment

For Macs with Apple Silicon or an AMD GPU, you can use the MPS backend to run ChatGLM-6B on the GPU. Follow Apple's official instructions to install PyTorch-Nightly (the correct version number should be 2.1.0.dev2023xxxx, not 2.0.0).

Currently, only loading from local files is supported on macOS. Change the model loading in the code to local loading and use the mps backend:

```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).half().to('mps')
```

Loading the half-precision ChatGLM-6B model requires about 13 GB of memory. Machines with less memory (such as a 16 GB MacBook Pro) will fall back to hard-disk virtual memory when free memory runs out, slowing inference severely. In that case you can use a quantized model such as chatglm-6b-int4. Because the GPU quantization kernels are written in CUDA, they cannot be used on macOS, so inference with a quantized model must run on the CPU:

```python
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
```

To make full use of CPU parallelism, OpenMP must be installed separately.
Multi-GPU Deployment

If you have multiple GPUs but none of them has enough memory to hold the full model, you can split the model across them. First install accelerate (`pip install accelerate`), then load the model as follows:

```python
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=2)
```

This deploys the model across two GPUs for inference. You can change `num_gpus` to the number of GPUs you want to use. The split is even by default, but you can also pass a `device_map` argument to specify the placement yourself.
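By way of illustration, an even split over the transformer blocks could be computed as below. The module-name prefix and the layer count of 28 are assumptions for this sketch; `load_model_on_gpus` in the repository's utils.py handles the real module names and the embedding and output layers:

```python
def even_device_map(num_layers=28, num_gpus=2, prefix="transformer.layers"):
    """Assign consecutive blocks of layers to GPUs as evenly as possible."""
    per_gpu = num_layers / num_gpus
    return {f"{prefix}.{i}": min(int(i // per_gpu), num_gpus - 1) for i in range(num_layers)}

dm = even_device_map()
print(dm["transformer.layers.0"], dm["transformer.layers.27"])  # 0 1
```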
Parameter-Efficient Fine-Tuning

Parameter-efficient fine-tuning based on P-Tuning v2. For detailed usage, see ptuning/README.md.
ChatGLM-6B Examples

Below are some example screenshots produced with web_demo.py. More possibilities of ChatGLM-6B are waiting for you to explore and discover!

- Self-cognition
- Outline writing
- Copywriting
- Email writing assistant
- Information extraction
- Role playing
- Comment comparison
- Travel guide
Limitations

Due to ChatGLM-6B's small scale, its capabilities still have many limitations. The following are some problems we have found so far:

- Small model capacity: the small 6B capacity implies relatively weak model memory and language ability. ChatGLM-6B may generate incorrect information on many factual-knowledge tasks, and it is not good at logical problems such as mathematics and programming. (Click to view examples.)
- Harmful or biased content: ChatGLM-6B is only a language model preliminarily aligned with human intent and may generate harmful or biased content. (Such content may be offensive and is not shown here.)
- Weak English ability: most of the prompts/answers used to train ChatGLM-6B are in Chinese, with only a small portion in English. Therefore, replies to English prompts are of much lower quality than Chinese ones, may even contradict the content of Chinese prompts, and Chinese-English code-mixing can occur.
- Easily misled; weak dialogue ability: ChatGLM-6B's dialogue ability is still fairly weak, its "self-cognition" is problematic, and it is easily misled into making incorrect statements. For example, the current version of the model may deviate in its self-cognition when misled. (Click to view examples.)
License

The code in this repository is open-sourced under the Apache-2.0 license. Use of the ChatGLM-6B model weights must follow the Model License. The ChatGLM-6B weights are fully open for academic research, and **free commercial use is also permitted** after completing the questionnaire registration.
Citation

If you find our work helpful, please consider citing the following paper:

```bibtex
@misc{glm2024chatglm,
    title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
    author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
    year={2024},
    eprint={2406.12793},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```