Top Related Projects
Robust Speech Recognition via Large-Scale Weak Supervision
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
TensorFlow code and pre-trained models for BERT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Quick Overview
ChatGLM-6B is an open-source, bilingual (Chinese and English) dialogue language model from Tsinghua University's THUDM group. It is based on the General Language Model (GLM) architecture and has roughly 6.2 billion parameters. The model is designed to engage in human-like conversations and, with quantization, can be deployed on consumer-grade graphics cards.
Pros
- Bilingual support for Chinese and English
- Can run on consumer-grade GPUs with as little as 6GB of VRAM (with INT4 quantization)
- Open-source; free for academic research, with free commercial use permitted after registration
- Supports efficient inference with low latency
Cons
- Limited to 6 billion parameters, which may affect performance compared to larger models
- May require fine-tuning for specific domain applications
- Primarily focused on Chinese and English, limiting its use for other languages
- Potential biases and limitations inherent in large language models
Code Examples
- Loading the model and tokenizer:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```
- Generating a response:
```python
response, history = model.chat(tokenizer, "你好,请介绍一下你自己。", history=[])
print(response)
```
- Streaming the generated response:
```python
# stream_chat yields the cumulative response so far; print only the newly generated part
printed = 0
for response, history in model.stream_chat(tokenizer, "请解释一下人工智能的概念。", history=[]):
    print(response[printed:], end="", flush=True)
    printed = len(response)
```
- Quantizing the model for lower memory usage:
```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()
```
Getting Started
To get started with ChatGLM-6B, follow these steps:
- Install the required dependencies (the repository's requirements.txt lists the full set, including sentencepiece and cpm_kernels):

```shell
pip install transformers torch
```
- Load the model and tokenizer:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```
- Start a conversation:
```python
response, history = model.chat(tokenizer, "你好!", history=[])
print(response)
```
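The `history` value returned by `model.chat` is a plain Python list of `[query, response]` pairs that you pass back in on the next turn to keep context. A minimal sketch of that loop, using a stand-in function instead of the real model (the actual call would be `model.chat(tokenizer, query, history=history)`):

```python
# Stand-in for model.chat: returns (response, updated_history), like the real API.
# The replies here are placeholders, not real model output.
def fake_chat(query, history):
    response = f"(reply to: {query})"
    return response, history + [[query, response]]

history = []
for query in ["你好", "请再详细一点"]:
    response, history = fake_chat(query, history)

print(len(history))  # one [query, response] pair per turn
```

Because each turn re-feeds the whole history, memory and latency grow with the conversation length; truncating old pairs from the front of the list is a common mitigation.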
Competitor Comparisons
Robust Speech Recognition via Large-Scale Weak Supervision
Pros of Whisper
- Specialized for speech recognition and transcription tasks
- Supports multiple languages and can perform translation
- Well-documented and extensively tested on diverse audio datasets
Cons of Whisper
- Limited to audio processing, not a general-purpose language model
- Requires more computational resources for real-time transcription
Code Comparison
Whisper:
```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
```
ChatGLM-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```
Key Differences
- Whisper focuses on speech-to-text tasks, while ChatGLM-6B is a general-purpose language model
- Whisper is designed for audio processing, whereas ChatGLM-6B excels in text-based interactions
- ChatGLM-6B offers more flexibility for various NLP tasks, but Whisper provides specialized audio transcription capabilities
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Highly scalable and efficient for training large language models
- Supports a wide range of optimization techniques and hardware configurations
- Integrates well with popular deep learning frameworks like PyTorch
Cons of DeepSpeed
- Steeper learning curve for beginners due to its complexity
- Requires more setup and configuration compared to ChatGLM-6B
- May be overkill for smaller models or simpler training tasks
Code Comparison
ChatGLM-6B example:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```
DeepSpeed example:
```python
import deepspeed
import torch

model = MyModel()  # placeholder: your torch.nn.Module
# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler)
engine, _, _, _ = deepspeed.initialize(model=model, config_params=ds_config)
output = engine(input_data)
```
The ChatGLM-6B code focuses on easy model loading and inference, while the DeepSpeed code emphasizes initialization and integration with custom models for optimized training.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Broader scope: Supports a wide range of NLP tasks and models
- Extensive documentation and community support
- Regular updates and contributions from the open-source community
Cons of transformers
- Larger codebase, potentially more complex to navigate
- May require more setup and configuration for specific tasks
Code comparison
ChatGLM-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
```
transformers:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
The code comparison shows that ChatGLM-6B is more focused on chat-based interactions, while transformers provides a more general approach to working with language models. transformers offers greater flexibility in model selection and task-specific implementations, but may require more setup for specialized use cases.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- More comprehensive and versatile toolkit for sequence modeling
- Extensive documentation and community support
- Supports a wider range of tasks and architectures
Cons of fairseq
- Steeper learning curve due to its complexity
- Potentially higher computational requirements
- Less focused on specific chat-based applications
Code Comparison
fairseq:
```python
from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
translations = model.translate(['Hello world!'])
```
ChatGLM-6B:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello world!", history=[])
```
The code comparison shows that fairseq requires more setup for specific tasks, while ChatGLM-6B provides a more straightforward interface for chat-based interactions. fairseq's code demonstrates its flexibility for various sequence modeling tasks, whereas ChatGLM-6B's code is tailored for conversational AI applications.
TensorFlow code and pre-trained models for BERT
Pros of BERT
- Well-established and widely adopted in the NLP community
- Extensive documentation and pre-trained models available
- Suitable for a variety of NLP tasks with minimal fine-tuning
Cons of BERT
- Smaller model size (110M parameters) compared to ChatGLM-6B (6B parameters)
- Less advanced in generating human-like responses for open-ended tasks
- May require more task-specific fine-tuning for optimal performance
Code Comparison
BERT example:
```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
```
ChatGLM-6B example:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
```
Both repositories provide pre-trained models and tokenizers, but ChatGLM-6B requires the `trust_remote_code=True` parameter due to its custom implementation. BERT offers a more straightforward setup, while ChatGLM-6B provides a larger, more advanced model for complex language tasks.
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Pros of minGPT
- Lightweight and easy to understand implementation of GPT
- Excellent educational resource for learning about transformer architecture
- Highly customizable and adaptable for various tasks
Cons of minGPT
- Limited scale compared to ChatGLM-6B (6B parameters)
- Lacks multilingual support and advanced features of ChatGLM-6B
- Not optimized for production-level performance
Code Comparison
minGPT:
```python
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        self.drop = nn.Dropout(config.embd_pdrop)
```
ChatGLM-6B:
```python
class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
    def __init__(self, config: ChatGLMConfig):
        super().__init__(config)
        self.transformer = ChatGLMModel(config)
        self.config = config
        self.quantized = False
```
README
ChatGLM-6B
🌐 Blog • 🤗 HF Repo • 🐦 Twitter • 📃 Report

👋 Join our Discord and WeChat

📍 Experience and use larger-scale GLM commercial models on the Zhipu AI Open Platform
Read this in English.
GLM-4 Open-Source Models and API

We have released the latest GLM-4 large language dialogue model, which achieves new breakthroughs on multiple benchmarks. You can experience our latest models through the following channels:

- GLM-4 open-source models: we have open-sourced the GLM-4-9B series, with clear improvements across benchmarks. You are welcome to try them.
- Zhipu Qingyan: experience the latest GLM-4, including GLMs, All Tools, and other features.
- API platform: the new-generation API platform is live. You can directly try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3 on the API platform. Among these, the GLM-4 and GLM-3-Turbo models support new features such as System Prompt, Function Call, Retrieval, and Web_Search; you are welcome to try them.
- GLM-4 API open-source tutorial: a GLM-4 API tutorial with basic applications, welcome to try. API-related questions can be asked in this open-source tutorial, or via the GLM-4 API AI assistant for help with common problems.
Introduction

ChatGLM-6B is an open-source dialogue language model that supports bilingual Chinese-English question answering. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. With model quantization, users can deploy it locally on consumer-grade graphics cards (as little as 6 GB of GPU memory at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT and is optimized for Chinese question answering and dialogue. Trained on roughly 1T tokens of Chinese and English corpus, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B can already generate answers quite in line with human preferences. See our blog for more information, and try larger-scale ChatGLM models at chatglm.cn.

To make it easier for downstream developers to customize the model for their own application scenarios, we also implemented a parameter-efficient fine-tuning method based on P-Tuning v2 (usage guide); at the INT4 quantization level, fine-tuning can be started with as little as 7 GB of GPU memory.

The ChatGLM-6B weights are fully open for academic research, and **free commercial use is also permitted** after completing the questionnaire registration.

The ChatGLM-6B open-source model aims to advance large-model technology together with the open-source community. We earnestly ask developers and users to abide by the open-source license and not to use the open-source model, code, or derivatives of this project for any purpose that may harm nations or societies, or for any service that has not undergone safety assessment and filing. Currently, this project's team has not developed any application based on ChatGLM-6B, whether for the web, Android, Apple iOS, or Windows.

Although we strive for data compliance and accuracy at every stage of training, due to ChatGLM-6B's small scale and the probabilistic, random nature of its generation, the accuracy of its output cannot be guaranteed, and the model can be misled (see Limitations). This project does not assume the risks or liabilities of data security or public-opinion risks caused by the open-source model and code, or of any model being misled, misused, disseminated, or improperly exploited.
News

[2023/07/25] Released CodeGeeX2, a code generation model based on ChatGLM2-6B with comprehensively improved coding capability. Highlights:

- **Stronger code capability**: CodeGeeX2-6B was further pre-trained on 600B tokens of code. Compared with the first-generation CodeGeeX, coding ability improves across the board, with large gains on all six languages of the HumanEval-X benchmark (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%). It reaches a 35.9% Pass@1 one-shot pass rate on Python, surpassing the larger StarCoder-15B.
- **Better model features**: inheriting the features of ChatGLM2-6B, CodeGeeX2-6B better supports both Chinese and English input, supports a maximum sequence length of 8192, has much faster inference than the first generation, needs only 6 GB of GPU memory after quantization, and supports lightweight local deployment.
- **A more complete AI coding assistant**: the CodeGeeX plugin (VS Code, JetBrains) backend has been upgraded, supporting over 100 programming languages and adding practical features such as infilling completion and cross-file completion. Combined with the interactive Ask CodeGeeX assistant, it supports Chinese and English dialogue to solve a variety of programming problems, including but not limited to code explanation, code translation, bug fixing, and documentation generation, helping programmers develop more efficiently.
[2023/06/25] Released ChatGLM2-6B, the upgraded version of ChatGLM-6B. While retaining many excellent features of the first-generation model, such as smooth dialogue and a low deployment threshold, ChatGLM2-6B introduces the following new features:

- **Stronger performance**: building on the development experience of the first-generation model, we fully upgraded the base model of ChatGLM2-6B. It uses the hybrid objective function of GLM and went through pre-training on 1.4T Chinese-English tokens plus human-preference alignment. Evaluations show that, compared with the first generation, ChatGLM2-6B achieves large improvements on MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), giving it strong competitiveness among open-source models of the same size.
- **Longer context**: based on FlashAttention, we extended the context length of the base model from ChatGLM-6B's 2K to 32K and trained with an 8K context length during the dialogue stage, allowing more rounds of dialogue. However, the current version of ChatGLM2-6B has limited understanding of single-round very long documents, which we will optimize in future iterations.
- **More efficient inference**: based on Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage. Under the official implementation, inference is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6 GB of GPU memory increases from 1K to 8K.

For more information, see ChatGLM2-6B.
[2023/06/14] Released WebGLM, a research project accepted at KDD 2023 that supports generating long answers with accurate citations using web information.

[2023/05/17] Released VisualGLM-6B, a multimodal dialogue language model that supports image understanding. You can run the command-line and web demos via cli_demo_vision.py and web_demo_vision.py in this repository. Note that VisualGLM-6B additionally requires installing SwissArmyTransformer and torchvision. For more information, see VisualGLM-6B.

[2023/05/15] Updated the v1.1 checkpoint, adding English instruction fine-tuning data to balance the ratio of Chinese and English training data, which fixes the phenomenon of Chinese words being mixed into English answers.
Below is a comparison of answers to English questions before and after the update (the v1.0 and v1.1 sample answers were screenshots and are not reproduced here):

- Question: Describe a time when you had to make a difficult decision.
- Question: Describe the function of a computer motherboard
- Question: Develop a plan to reduce electricity usage in a home.
- Question (translation request, originally in Chinese): "Future NFTs may genuinely define a kind of real-world asset: a house, a car, a piece of land, and so on. Such a digital certificate may be more valuable than the real thing itself, tradable and usable at any time, letting the assets you own keep creating value seamlessly across the virtual and the real. The future will be an era in which everything is at my use, but not in my possession. Translate this into professional English."

For more update information, see UPDATE.md
Friendly Links

Open-source projects that accelerate ChatGLM:
- lyraChatGLM: inference acceleration for ChatGLM-6B, reaching up to 9000+ tokens/s
- ChatGLM-MNN: an MNN-based C++ inference implementation of ChatGLM-6B that automatically allocates computation between GPU and CPU according to available GPU memory
- JittorLLMs: runs ChatGLM-6B FP16 with as little as 3 GB of GPU memory, or even without a GPU; supports deployment on Linux, Windows, and Mac
- InferLLM: lightweight C++ inference that enables real-time chat locally on x86 and Arm processors, and runs in real time on phones as well, needing only 4 GB of memory

Open-source projects based on or using ChatGLM-6B:
- langchain-ChatGLM: a langchain-based ChatGLM application that implements question answering over an extensible knowledge base
- Wenda (闻达): a large-language-model invocation platform that implements ChatPDF-like functionality based on ChatGLM-6B
- glm-bot: connects ChatGLM to Koishi so that ChatGLM can be called on major chat platforms
- Chuanhu Chat: a good-looking, easy-to-use, feature-rich, quickly deployable user interface for various large language models and online model APIs, with ChatGLM-6B support

Example projects supporting online training of ChatGLM-6B and related applications:

Third-party evaluations:

For more open-source projects, see PROJECT.md
ä½¿ç¨æ¹å¼
ç¡¬ä»¶éæ±
éåç级 | æä½ GPU æ¾åï¼æ¨çï¼ | æä½ GPU æ¾åï¼é«æåæ°å¾®è°ï¼ |
---|---|---|
FP16ï¼æ éåï¼ | 13 GB | 14 GB |
INT8 | 8 GB | 9 GB |
INT4 | 6 GB | 7 GB |
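As a rough sanity check on the table, the memory floor is parameters times bytes per weight: about 6.2 billion parameters at 2 bytes each is roughly 12.4 GB for FP16, before activations and runtime overhead. A back-of-the-envelope helper (the 6.2B figure comes from the introduction above; quantized deployments need more than the raw weight size because some layers stay in FP16 and runtime overhead adds on top):

```python
def approx_weight_gb(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in GB; ignores activations and caches."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{approx_weight_gb(6.2e9, bits):.1f} GB")
```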
Environment Setup

Install the dependencies with pip: `pip install -r requirements.txt`. Version 4.27.1 of the `transformers` library is recommended, but in theory any version not lower than 4.23.1 works.

In addition, if you need to run the quantized model on the CPU, you also need to install `gcc` and `openmp` (installed by default on most Linux distributions). On Windows, check `openmp` when installing TDM-GCC. The tested `gcc` versions are TDM-GCC 10.3.0 on Windows and gcc 11.3.0 on Linux. On macOS, please refer to Q1.
Code Usage

You can generate dialogue with the ChatGLM-6B model using the following code:

```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服，但以下是一些可以帮助你入睡的方法：
1. 制定规律的睡眠时间表：保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯，使你更容易入睡。尽量在每天的相同时间上床，并在同一时间起床。
2. 创造一个舒适的睡眠环境：确保睡眠环境舒适、安静、黑暗且温度适宜。可以使用舒适的床上用品，并保持房间通风。
3. 放松身心：在睡前做些放松的活动，例如泡个热水澡、听些轻柔的音乐、阅读一些有趣的书籍等，有助于缓解紧张和焦虑，使你更容易入睡。
4. 避免饮用含有咖啡因的饮料：咖啡因是一种刺激性物质，会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料，例如咖啡、茶和可乐。
5. 避免在床上做与睡眠无关的事情：在床上做些与睡眠无关的事情，例如看电影、玩游戏或工作等，可能会干扰你的睡眠。
6. 尝试呼吸技巧：深呼吸是一种放松技巧，可以帮助你缓解紧张和焦虑，使你更容易入睡。试着慢慢吸气，保持几秒钟，然后缓慢呼气。
如果这些方法无法帮助你入睡，你可以考虑咨询医生或睡眠专家，寻求进一步的建议。
```

(The prompts mean "Hello" and "What should I do if I can't sleep at night?"; the model replies in Chinese, the second answer being a numbered list of sleep-hygiene suggestions.)
The model implementation is still evolving. If you want to pin the model implementation to ensure compatibility, add the `revision="v1.1.0"` argument to the `from_pretrained` call. `v1.1.0` is the latest version number at the time of writing; for the complete list of versions, see the Change Log.
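Concretely, pinning only adds keyword arguments to the two `from_pretrained` calls. The calls themselves need network access to the Hugging Face Hub, so they are shown commented out in this sketch:

```python
# Keyword arguments that pin the remote implementation to a fixed tag
PINNED = dict(trust_remote_code=True, revision="v1.1.0")

# With transformers installed and network access:
# from transformers import AutoTokenizer, AutoModel
# tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", **PINNED)
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", **PINNED).half().cuda()
print(PINNED)
```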
Loading the Model Locally

The code above automatically downloads the model implementation and weights via `transformers`. The complete model implementation is available on the Hugging Face Hub. If your network environment is poor, downloading the model weights may take a long time or even fail. In that case, you can first download the model to your local machine and then load it from there.

To download the model from the Hugging Face Hub, first install Git LFS, then run

```shell
git clone https://huggingface.co/THUDM/chatglm-6b
```

If downloading the checkpoint from the Hugging Face Hub is slow, you can download only the model implementation with

```shell
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm-6b
```

and then manually download the model weight files from here, placing the downloaded files in your local `chatglm-6b` directory.

After downloading the model locally, replace `THUDM/chatglm-6b` in the code above with the path of your local `chatglm-6b` folder to load the model from local files.

Optional: the model implementation is still evolving. If you want to pin the model implementation to ensure compatibility, you can run

```shell
git checkout v1.1.0
```
Demo & API

We provide a Gradio-based web demo and a command-line demo. To use them, first clone this repository:

```shell
git clone https://github.com/THUDM/ChatGLM-6B
cd ChatGLM-6B
```

Web Demo

First install Gradio (`pip install gradio`), then run web_demo.py in the repository:

```shell
python web_demo.py
```

The program starts a web server and prints its address; open the address in a browser to use it. The latest demo implements a typewriter effect, which greatly improves the perceived speed. Note: because network access to Gradio is currently slow in China, launching with `demo.queue().launch(share=True, inbrowser=True)` routes all traffic through the Gradio servers, which significantly degrades the typewriter experience. The default launch mode has therefore been changed to `share=False`; if you need public network access, change it back to `share=True`.

Thanks to @AdamBear for implementing a Streamlit-based web demo; see #117 for how to run it.
Command-Line Demo

Run cli_demo.py in the repository:

```shell
python cli_demo.py
```

The program starts an interactive dialogue in the terminal. Type a prompt and press Enter to generate a reply; type `clear` to clear the dialogue history; type `stop` to exit the program.
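The control flow of such a loop can be sketched as below. This is a simplified, hypothetical rendering of the `clear`/`stop` handling, not the actual code of cli_demo.py; in the `"chat"` branch the real script calls `model.chat(tokenizer, line, history=history)`:

```python
def dispatch(line, history):
    """Classify one line of user input: returns (action, new_history)."""
    cmd = line.strip().lower()
    if cmd == "stop":       # terminate the program
        return "stop", history
    if cmd == "clear":      # wipe the dialogue history
        return "clear", []
    return "chat", history  # anything else is a prompt for the model

print(dispatch("clear", [["你好", "hi"]]))  # ('clear', [])
```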
API Deployment

First install the extra dependencies (`pip install fastapi uvicorn`), then run api.py in the repository:

```shell
python api.py
```

By default the API is served on local port 8000 and is called via POST:

```shell
curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'
```

The returned value is

```json
{
  "response": "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。",
  "history": [["你好", "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。"]],
  "status": 200,
  "time": "2023-03-23 21:38:40"
}
```
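The same endpoint can be called from Python with the standard library alone. This sketch mirrors the request and response shapes shown above and assumes api.py is running at the default address:

```python
import json
import urllib.request

def chatglm_request(prompt, history, url="http://127.0.0.1:8000"):
    """Build a POST request matching the api.py payload shape documented above."""
    body = json.dumps({"prompt": prompt, "history": history}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

req = chatglm_request("你好", [])
# With the server running:
# reply = json.load(urllib.request.urlopen(req))
# print(reply["response"], reply["history"])
```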
Low-Cost Deployment

Model Quantization

By default, the model is loaded at FP16 precision, and running the code above requires roughly 13 GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized form as follows:

```python
# Change as needed; currently only 4-bit and 8-bit quantization are supported
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(8).half().cuda()
```

After 2 to 3 rounds of dialogue, GPU memory usage is about 10 GB with 8-bit quantization and only about 6 GB with 4-bit quantization. Memory consumption grows as the number of dialogue rounds increases. Because relative position encoding is used, ChatGLM-6B in theory supports an unlimited context length, but performance degrades gradually once the total length exceeds 2048 (the training length).

Quantization incurs some performance loss; in our tests, ChatGLM-6B can still generate naturally and fluently under 4-bit quantization. Quantization schemes such as GPT-Q could further compress the quantization precision, or improve model performance at the same precision; pull requests are welcome.

The quantization process first loads the FP16 model into memory, consuming about 13 GB of RAM. If you do not have enough RAM, you can directly load a pre-quantized model; the INT4-quantized model needs only about 5.2 GB of RAM:

```python
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
```

The weight files of the quantized models can also be downloaded manually from here.
CPU Deployment

If you have no GPU hardware, you can run inference on the CPU, although it will be slower. Usage (requires roughly 32 GB of RAM):

```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
```

If you do not have enough RAM, you can directly load the quantized model:

```python
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
```

If you hit the error `Could not find module 'nvcuda.dll'` or `RuntimeError: Unknown platform: darwin` (macOS), please load the model from local files.
Mac Deployment

For Macs with Apple Silicon or an AMD GPU, you can use the MPS backend to run ChatGLM-6B on the GPU. Follow Apple's official instructions to install PyTorch-Nightly (the correct version number should be 2.1.0.dev2023xxxx, not 2.0.0).

Currently, only loading from local files is supported on macOS. Change the model loading in the code to local loading and use the mps backend:

```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).half().to('mps')
```

Loading the half-precision ChatGLM-6B model requires about 13 GB of memory. Machines with less memory (such as a 16 GB MacBook Pro) will fall back to hard-disk virtual memory when free memory runs out, slowing inference severely. In that case you can use a quantized model such as chatglm-6b-int4. Because the GPU quantization kernels are written in CUDA, they cannot be used on macOS, so inference with a quantized model must run on the CPU:

```python
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
```

To make full use of CPU parallelism, OpenMP must be installed separately.
Multi-GPU Deployment

If you have multiple GPUs but none of them has enough memory to hold the full model, you can split the model across them. First install accelerate (`pip install accelerate`), then load the model as follows:

```python
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=2)
```

This deploys the model across two GPUs for inference. You can change `num_gpus` to the number of GPUs you want to use. The split is even by default, but you can also pass a `device_map` argument to specify the placement yourself.
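By way of illustration, an even split over the transformer blocks could be computed as below. The module-name prefix and the layer count of 28 are assumptions for this sketch; `load_model_on_gpus` in the repository's utils.py handles the real module names and the embedding and output layers:

```python
def even_device_map(num_layers=28, num_gpus=2, prefix="transformer.layers"):
    """Assign consecutive blocks of layers to GPUs as evenly as possible."""
    per_gpu = num_layers / num_gpus
    return {f"{prefix}.{i}": min(int(i // per_gpu), num_gpus - 1) for i in range(num_layers)}

dm = even_device_map()
print(dm["transformer.layers.0"], dm["transformer.layers.27"])  # 0 1
```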
Parameter-Efficient Fine-Tuning

Parameter-efficient fine-tuning based on P-Tuning v2. For detailed usage, see ptuning/README.md.
ChatGLM-6B Examples

Below are some example screenshots produced with web_demo.py. More possibilities of ChatGLM-6B are waiting for you to explore and discover!

- Self-cognition
- Outline writing
- Copywriting
- Email writing assistant
- Information extraction
- Role playing
- Comment comparison
- Travel guide
Limitations

Due to ChatGLM-6B's small scale, its capabilities still have many limitations. The following are some problems we have found so far:

- Small model capacity: the small 6B capacity implies relatively weak model memory and language ability. ChatGLM-6B may generate incorrect information on many factual-knowledge tasks, and it is not good at logical problems such as mathematics and programming. (Click to view examples.)
- Harmful or biased content: ChatGLM-6B is only a language model preliminarily aligned with human intent and may generate harmful or biased content. (Such content may be offensive and is not shown here.)
- Weak English ability: most of the prompts/answers used to train ChatGLM-6B are in Chinese, with only a small portion in English. Therefore, replies to English prompts are of much lower quality than Chinese ones, may even contradict the content of Chinese prompts, and Chinese-English code-mixing can occur.
- Easily misled; weak dialogue ability: ChatGLM-6B's dialogue ability is still fairly weak, its "self-cognition" is problematic, and it is easily misled into making incorrect statements. For example, the current version of the model may deviate in its self-cognition when misled. (Click to view examples.)
License

The code in this repository is open-sourced under the Apache-2.0 license. Use of the ChatGLM-6B model weights must follow the Model License. The ChatGLM-6B weights are fully open for academic research, and **free commercial use is also permitted** after completing the questionnaire registration.
Citation

If you find our work helpful, please consider citing the following paper:

```bibtex
@misc{glm2024chatglm,
    title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
    author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
    year={2024},
    eprint={2406.12793},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```