Top Related Projects
- openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- microsoft/DeepSpeed: a deep learning optimization library that makes distributed training and inference easy, efficient, and effective
- huggingface/transformers: 🤗 Transformers, state-of-the-art machine learning for PyTorch, TensorFlow, and JAX
- facebookresearch/fairseq: Facebook AI Research's sequence-to-sequence toolkit written in Python
- google-research/bert: TensorFlow code and pre-trained models for BERT
- karpathy/minGPT: a minimal PyTorch re-implementation of OpenAI GPT (Generative Pretrained Transformer) training
Quick Overview
ChatGLM-6B is an open-source, bilingual (Chinese and English) dialogue language model from Tsinghua University (THUDM). It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. The model is designed for human-like conversation and, with quantization, can be deployed on consumer-grade graphics cards.
Pros
- Bilingual support for Chinese and English
- Can run on consumer-grade GPUs with as little as 6GB of VRAM
- Open-source; free for academic research, with free commercial use permitted after questionnaire registration
- Supports efficient inference with low latency
Cons
- Only 6.2 billion parameters, which limits performance compared to larger models
- May require fine-tuning for specific domain applications
- Primarily focused on Chinese and English, limiting its use for other languages
- Potential biases and limitations inherent in large language models
Code Examples
- Loading the model and tokenizer:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()  # switch to inference mode
- Generating a response:
response, history = model.chat(tokenizer, "你好,请介绍一下你自己。", history=[])
print(response)
- Streaming the generated response (stream_chat yields the cumulative response so far, so print only the new suffix):
printed_len = 0
for response, history in model.stream_chat(tokenizer, "请解释一下人工智能的概念。", history=[]):
    print(response[printed_len:], end="", flush=True)
    printed_len = len(response)
- Quantizing the model for lower memory usage:
# Only 4-bit and 8-bit quantization are supported; 4-bit needs roughly 6GB of VRAM
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()
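If no GPU is available at all, the model can instead be loaded in float32 for CPU inference, which needs roughly 32GB of RAM (per the CPU deployment notes in the README below):
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()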
Getting Started
To get started with ChatGLM-6B, follow these steps:
- Install the required dependencies (transformers 4.27.1 is recommended; in principle any version at or above 4.23.1 works):
pip install transformers==4.27.1 torch
- Load the model and tokenizer:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()  # switch to inference mode
- Start a conversation:
response, history = model.chat(tokenizer, "你好!", history=[])
print(response)
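The returned history list carries the conversation state; pass it back in to continue the dialogue. A short sketch (the follow-up prompt here is only an illustration):
response, history = model.chat(tokenizer, "What can you do?", history=history)
print(response)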
Competitor Comparisons
openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
Pros of Whisper
- Specialized for speech recognition and transcription tasks
- Supports multiple languages and can perform translation
- Well-documented and extensively tested on diverse audio datasets
Cons of Whisper
- Limited to audio processing, not a general-purpose language model
- Requires more computational resources for real-time transcription
Code Comparison
Whisper:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
Key Differences
- Whisper focuses on speech-to-text tasks, while ChatGLM-6B is a general-purpose language model
- Whisper is designed for audio processing, whereas ChatGLM-6B excels in text-based interactions
- ChatGLM-6B offers more flexibility for various NLP tasks, but Whisper provides specialized audio transcription capabilities
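Because the two tools are complementary, they can also be chained, with Whisper producing a transcript that ChatGLM-6B then reasons over. A minimal sketch, assuming both packages are installed, an audio.mp3 file exists, and a CUDA GPU is available:

import whisper
from transformers import AutoTokenizer, AutoModel

# Speech-to-text with Whisper
asr = whisper.load_model("base")
transcript = asr.transcribe("audio.mp3")["text"]

# Dialogue over the transcript with ChatGLM-6B
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, _ = model.chat(tokenizer, f"Summarize this transcript: {transcript}", history=[])
print(response)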
microsoft/DeepSpeed: a deep learning optimization library that makes distributed training and inference easy, efficient, and effective
Pros of DeepSpeed
- Highly optimized for distributed training and inference
- Supports a wide range of models and architectures
- Extensive documentation and active community support
Cons of DeepSpeed
- Steeper learning curve for beginners
- Requires more configuration and setup compared to ChatGLM-6B
- May have higher computational requirements for some use cases
Code Comparison
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
DeepSpeed:
import deepspeed
import torch

model = MyModel()  # placeholder for your torch.nn.Module
# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler);
# ds_config is a placeholder for your DeepSpeed configuration dict or file
model_engine, optimizer, _, _ = deepspeed.initialize(model=model, config_params=ds_config)
output = model_engine(input_data)
Summary
DeepSpeed offers advanced optimization techniques for large-scale training and inference, making it suitable for complex projects. ChatGLM-6B provides a more straightforward implementation for Chinese language models. The choice between them depends on the specific requirements of your project, such as scale, language focus, and available resources.
huggingface/transformers: 🤗 Transformers, state-of-the-art machine learning for PyTorch, TensorFlow, and JAX
Pros of transformers
- Broader scope: Supports a wide range of NLP tasks and models
- Extensive documentation and community support
- Regular updates and contributions from the open-source community
Cons of transformers
- Larger codebase, potentially more complex to navigate
- May require more setup and configuration for specific tasks
Code comparison
ChatGLM-6B:
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
transformers:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The code comparison shows that ChatGLM-6B is more focused on chat-based interactions, while transformers provides a more general approach to working with language models. transformers offers greater flexibility in model selection and task-specific implementations, but may require more setup for specialized use cases.
facebookresearch/fairseq: Facebook AI Research's sequence-to-sequence toolkit written in Python
Pros of fairseq
- More comprehensive and versatile toolkit for sequence modeling
- Extensive documentation and community support
- Supports a wider range of tasks and architectures
Cons of fairseq
- Steeper learning curve due to its complexity
- Potentially higher computational requirements
- Less focused on specific chat-based applications
Code Comparison
fairseq:
from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
translations = model.translate(['Hello world!'])
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello world!", history=[])
The code comparison shows that fairseq requires more setup for specific tasks, while ChatGLM-6B provides a more straightforward interface for chat-based interactions. fairseq's code demonstrates its flexibility for various sequence modeling tasks, whereas ChatGLM-6B's code is tailored for conversational AI applications.
google-research/bert: TensorFlow code and pre-trained models for BERT
Pros of BERT
- Well-established and widely adopted in the NLP community
- Extensive documentation and pre-trained models available
- Suitable for a variety of NLP tasks with minimal fine-tuning
Cons of BERT
- Smaller model size (110M parameters) compared to ChatGLM-6B (6B parameters)
- Less advanced in generating human-like responses for open-ended tasks
- May require more task-specific fine-tuning for optimal performance
Code Comparison
BERT example:
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state holds contextual token embeddings
ChatGLM-6B example:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
Both repositories provide pre-trained models and tokenizers, but ChatGLM-6B requires the trust_remote_code=True parameter due to its custom implementation. BERT offers a more straightforward setup, while ChatGLM-6B provides a larger, more advanced model for complex language tasks.
karpathy/minGPT: a minimal PyTorch re-implementation of OpenAI GPT (Generative Pretrained Transformer) training
Pros of minGPT
- Lightweight and easy to understand implementation of GPT
- Excellent educational resource for learning about transformer architecture
- Highly customizable and adaptable for various tasks
Cons of minGPT
- Limited scale compared to ChatGLM-6B (6B parameters)
- Lacks multilingual support and advanced features of ChatGLM-6B
- Not optimized for production-level performance
Code Comparison
minGPT:
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        self.drop = nn.Dropout(config.embd_pdrop)
ChatGLM-6B:
class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
    def __init__(self, config: ChatGLMConfig):
        super().__init__(config)
        self.transformer = ChatGLMModel(config)
        self.config = config
        self.quantized = False
ChatGLM-6B
Blog • HF Repo • Twitter • Report
Join our Discord and WeChat.
Experience and use larger-scale GLM commercial models on the Zhipu AI Open Platform.
Read this in English.
GLM-4 Open-Source Models and API

We have released the latest GLM-4 large language dialogue model, which achieves new breakthroughs on multiple benchmarks. You can experience our latest models through the following channels:

- GLM-4 open-source models: we have open-sourced the GLM-4-9B series of models, with clear improvements across benchmarks. You are welcome to try them.
- Zhipu Qingyan: experience the latest GLM-4, including the GLMs, All Tools, and other features.
- API platform: the new-generation API platform is live. On it you can try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3. The GLM-4 and GLM-3-Turbo models support new features such as System Prompt, Function Call, Retrieval, and Web_Search; you are welcome to try them.
- GLM-4 API open-source tutorial: a tutorial and basic applications for the GLM-4 API. API-related questions can be asked in that tutorial repository, or via the GLM-4 API AI assistant for help with common issues.
Introduction

ChatGLM-6B is an open-source dialogue language model that supports both Chinese and English, based on the General Language Model (GLM) architecture with 6.2 billion parameters. Combined with model quantization, it can be deployed locally on consumer-grade graphics cards (as little as 6GB of VRAM at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese question answering and dialogue. After training on roughly 1T tokens of Chinese and English corpus, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B can generate answers that align quite well with human preferences. For more information, see our blog. You are welcome to try the larger-scale ChatGLM model at chatglm.cn.

To make it easier for downstream developers to customize the model for their own application scenarios, we have also implemented an efficient parameter fine-tuning method based on P-Tuning v2 (usage guide): at the INT4 quantization level, fine-tuning can be started with as little as 7GB of VRAM.

The ChatGLM-6B weights are fully open for academic research, and **free commercial use is also permitted** after completing the questionnaire registration.

The ChatGLM-6B open-source model aims to advance large-model technology together with the open-source community. We earnestly ask developers and users to comply with the open-source license: do not use the open-source model, code, or derivatives of this project for any purpose that could harm the nation or society, nor for any service that has not undergone safety evaluation and registration. At present, this project team has not developed any application based on ChatGLM-6B, including web, Android, Apple iOS, or Windows apps.

Although the model strives for data compliance and accuracy at every stage of training, the accuracy of ChatGLM-6B's output cannot be guaranteed due to its small scale and probabilistic randomness, and the model is easily misled (see Limitations). This project assumes no risk or liability for data-security or public-opinion risks arising from the open-sourcing of the model and code, or from the model being misled, abused, disseminated, or improperly exploited.
News

[2023/07/25] Released CodeGeeX2, a code-generation model based on ChatGLM2-6B with comprehensively improved coding capabilities, including:

- Stronger code generation: CodeGeeX2-6B was further pre-trained on 600B tokens of code data. Compared with the first-generation CodeGeeX, coding capability improves across the board, with large gains on all six languages of the HumanEval-X benchmark (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%); on Python it reaches 35.9% Pass@1 (single attempt), surpassing the larger StarCoder-15B.
- **Better model characteristics**: inheriting the features of ChatGLM2-6B, CodeGeeX2-6B better supports mixed Chinese and English input, supports a maximum sequence length of 8192, and infers much faster than the first generation; after quantization it needs only 6GB of VRAM, enabling lightweight local deployment.
- A more complete AI coding assistant: the backend of the CodeGeeX plugin (VS Code, Jetbrains) has been upgraded to support more than 100 programming languages, with new practical features such as infilling and cross-file completion. Together with the interactive Ask CodeGeeX assistant, it supports Chinese and English dialogue for solving programming problems, including but not limited to code explanation, code translation, code correction, and documentation generation, helping programmers develop more efficiently.

[2023/06/25] Released ChatGLM2-6B, the upgraded version of ChatGLM-6B. While retaining the many excellent qualities of the first generation, such as smooth dialogue and a low deployment threshold, ChatGLM2-6B introduces the following new features:

- Stronger performance: building on the development experience of the first-generation model, we fully upgraded the base model of ChatGLM2-6B. It uses the hybrid objective function of GLM and went through pre-training on 1.4T Chinese and English tokens plus human preference alignment. Evaluations show that, compared with the first generation, ChatGLM2-6B improves substantially on MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), making it highly competitive among open-source models of the same size.
- Longer context: using FlashAttention, we extended the base model's context length from ChatGLM-6B's 2K to 32K, and trained with an 8K context length during the dialogue stage, allowing more rounds of conversation. The current version of ChatGLM2-6B still has limited understanding of single-turn ultra-long documents, which we will optimize in future iterations.
- More efficient inference: with Multi-Query Attention, ChatGLM2-6B achieves faster inference and lower VRAM usage. In the official implementation, inference is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6GB of VRAM grows from 1K to 8K.

For more information, see ChatGLM2-6B.

[2023/06/14] Released WebGLM, a research project accepted at KDD 2023 that supports generating long answers with accurate citations by drawing on web information.

[2023/05/17] Released VisualGLM-6B, a multimodal dialogue language model that supports image understanding.

You can run the command-line and web demos via cli_demo_vision.py and web_demo_vision.py in this repository. Note that VisualGLM-6B additionally requires SwissArmyTransformer and torchvision. For more information, see VisualGLM-6B.

[2023/05/15] Updated the checkpoint to v1.1, adding English instruction fine-tuning data to the training set to balance the ratio of Chinese and English data, fixing the issue of Chinese words being mixed into English answers.

The following English prompts were compared before and after the update (the v1.0 and v1.1 answers were shown as screenshots on the original page and are not reproduced here):

- Question: Describe a time when you had to make a difficult decision.
- Question: Describe the function of a computer motherboard
- Question: Develop a plan to reduce electricity usage in a home.
- Question (originally in Chinese): a passage arguing that future NFTs may anchor real assets such as a house, a car, or a piece of land, with a request to translate it into professional English.

For more update information, see UPDATE.md.
Related Projects

Open-source projects that accelerate ChatGLM:

- lyraChatGLM: inference acceleration for ChatGLM-6B, reaching up to 9000+ tokens/s
- ChatGLM-MNN: an MNN-based C++ inference implementation of ChatGLM-6B that automatically distributes computation between GPU and CPU according to available VRAM
- JittorLLMs: runs ChatGLM-6B in FP16 with as little as 3GB of VRAM, or even without a GPU; supports deployment on Linux, Windows, and Mac
- InferLLM: lightweight C++ inference enabling real-time chat locally on x86 and Arm processors, and in real time on mobile phones, requiring only 4GB of memory

Open-source projects based on or using ChatGLM-6B:

- langchain-ChatGLM: a langchain-based ChatGLM application implementing Q&A over an extensible knowledge base
- Wenda (闻达): a large-language-model invocation platform implementing ChatPDF-like functionality based on ChatGLM-6B
- glm-bot: connects ChatGLM to Koishi so that ChatGLM can be called on major chat platforms
- Chuanhu Chat: a good-looking, easy-to-use, feature-rich, quick-to-deploy user interface for various large language models and online model APIs, with ChatGLM-6B support

Example projects supporting online training of ChatGLM-6B and related applications:
Third-party evaluations:
For more open-source projects, see PROJECT.md
Usage

Hardware Requirements

| Quantization Level | Minimum GPU VRAM (Inference) | Minimum GPU VRAM (Efficient Parameter Fine-tuning) |
| --- | --- | --- |
| FP16 (no quantization) | 13 GB | 14 GB |
| INT8 | 8 GB | 9 GB |
| INT4 | 6 GB | 7 GB |
Environment Setup

Install the dependencies with pip: pip install -r requirements.txt. The recommended transformers version is 4.27.1, but in principle any version at or above 4.23.1 works.

In addition, running the quantized model on CPU also requires gcc and openmp. Most Linux distributions have these installed by default. On Windows, check openmp when installing TDM-GCC. The Windows test environment uses gcc TDM-GCC 10.3.0, and Linux uses gcc 11.3.0. On macOS, see Q1.
Code Usage

You can generate a dialogue by calling the ChatGLM-6B model as follows (the sample prompts mean "Hello" and "What should I do if I can't sleep at night"):
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:
1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。
2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。
3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。
4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。
5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。
6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。
如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。
The model implementation is still in flux. If you want to pin the model implementation for compatibility, add the revision="v1.1.0" parameter to the from_pretrained call. v1.1.0 is the latest version number; for a complete list of versions, see the Change Log.
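For example, pinning the implementation looks like this:
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, revision="v1.1.0").half().cuda()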
Loading the Model Locally

The code above automatically downloads the model implementation and weights via transformers. The complete model implementation is available on the Hugging Face Hub. If your network environment is poor, downloading the model weights may take a long time or even fail. In that case, you can first download the model to your local machine and then load it locally.

To download the model from the Hugging Face Hub, first install Git LFS, then run

git clone https://huggingface.co/THUDM/chatglm-6b

If downloading the checkpoint from the Hugging Face Hub is slow, you can download only the model implementation

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm-6b

and then manually download the model weight files from here, replacing the downloaded files into the local chatglm-6b directory.

After downloading the model locally, replace THUDM/chatglm-6b in the code above with the path of your local chatglm-6b folder to load the model locally.
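A sketch of loading from a local checkout (the path is a placeholder for wherever you cloned the model):
tokenizer = AutoTokenizer.from_pretrained("/path/to/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("/path/to/chatglm-6b", trust_remote_code=True).half().cuda()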
Optional: the model implementation is still in flux. If you want to pin the model implementation for compatibility, you can run
git checkout v1.1.0
Demo & API
We provide a Gradio-based web demo and a command-line demo. To use them, first clone this repository:
git clone https://github.com/THUDM/ChatGLM-6B
cd ChatGLM-6B
Web Demo

First install Gradio: pip install gradio, then run web_demo.py from the repository:

python web_demo.py
The program starts a web server and prints its address; open that address in a browser to use the demo. The latest demo implements a typewriter effect (streaming output), which greatly improves the experience. Note: because network access to Gradio is currently slow in China, launching with demo.queue().launch(share=True, inbrowser=True) routes all traffic through the Gradio servers, significantly degrading the streaming experience. The default launch mode has therefore been changed to share=False; if you need public network access, change it back to share=True.
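Concretely, to expose the demo publicly you would change the launch call back to the quoted form (where exactly this line sits in web_demo.py may differ across versions):
demo.queue().launch(share=True, inbrowser=True)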
Thanks to @AdamBear for implementing a Streamlit-based web demo; see #117 for how to run it.
Command-line Demo

Run cli_demo.py from the repository:

python cli_demo.py

The program holds an interactive dialogue in the terminal: type a prompt and press Enter to generate a reply, type clear to clear the dialogue history, and type stop to exit the program.
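A minimal sketch of what such an interaction loop looks like, assuming model and tokenizer are loaded as in Code Usage above (this is an illustration, not the repository's actual cli_demo.py):
history = []
while True:
    query = input("\nUser: ")
    if query.strip() == "stop":
        break
    if query.strip() == "clear":
        history = []  # reset the dialogue history
        continue
    response, history = model.chat(tokenizer, query, history=history)
    print(f"ChatGLM-6B: {response}")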
API Deployment

First install the extra dependencies pip install fastapi uvicorn, then run api.py from the repository:

python api.py

By default the API is served on local port 8000 and is called via POST:

curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'
The returned value is

{
  "response": "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。",
  "history": [["你好", "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。"]],
  "status": 200,
  "time": "2023-03-23 21:38:40"
}
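The same call from Python using the requests library (assuming api.py is running on the default port):
import requests

resp = requests.post(
    "http://127.0.0.1:8000",
    json={"prompt": "你好", "history": []},
)
print(resp.json()["response"])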
Low-Cost Deployment

Model Quantization

By default the model is loaded at FP16 precision, and running the code above requires about 13GB of VRAM. If your GPU VRAM is limited, you can try loading the model in quantized form as follows:

# Change as needed; currently only 4-bit and 8-bit quantization are supported
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(8).half().cuda()
After 2 to 3 rounds of dialogue, GPU VRAM usage is about 10GB under 8-bit quantization and only about 6GB under 4-bit quantization. VRAM consumption grows as the number of dialogue rounds increases. Because relative position encoding is used, ChatGLM-6B in theory supports an unlimited context length, but performance degrades gradually once the total length exceeds 2048 (the training length).
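The 4-bit variant is the same call with a different bit width (matching the roughly 6GB figure above):
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()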
Model quantization incurs some performance loss; in our tests, ChatGLM-6B can still generate naturally and fluently under 4-bit quantization. Quantization schemes such as GPT-Q could further compress the quantization precision, or improve model performance at the same precision; corresponding Pull Requests are welcome.

The quantization process first loads the FP16 model into memory, consuming about 13GB of RAM. If your RAM is insufficient, you can directly load a pre-quantized model; the INT4-quantized model needs only about 5.2GB of RAM:
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
The weight files of the quantized models can also be downloaded manually from here.

CPU Deployment

If you have no GPU hardware, you can run inference on the CPU instead, though inference will be slower. Usage is as follows (requires about 32GB of RAM):
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
If your RAM is insufficient, you can directly load the quantized model:

# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()

If you encounter the error Could not find module 'nvcuda.dll' or RuntimeError: Unknown platform: darwin (macOS), please load the model locally.
Mac Deployment

For Macs with Apple Silicon or an AMD GPU, the MPS backend can be used to run ChatGLM-6B on the GPU. Follow Apple's official instructions to install PyTorch-Nightly (the correct version number should be 2.1.0.dev2023xxxx, not 2.0.0).

Currently, macOS only supports loading the model locally. Change the model loading in the code to load from a local path and use the mps backend:
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).half().to('mps')
Loading the half-precision ChatGLM-6B model requires about 13GB of memory. Machines with less memory (such as a 16GB MacBook Pro) will fall back to virtual memory on disk when free memory runs out, severely slowing inference. In that case, a quantized model such as chatglm-6b-int4 can be used. Because the quantized GPU kernels are written in CUDA, they cannot be used on macOS; the quantized model can only run on the CPU:
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()

For CPU parallelism on macOS, OpenMP also needs to be installed separately.
Multi-GPU Deployment

If you have multiple GPUs but no single GPU has enough VRAM to hold the full model, you can split the model across GPUs. First install accelerate: pip install accelerate, then load the model as follows:
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=2)
This deploys the model across two GPUs for inference. You can change num_gpus to the number of GPUs you want to use. The split is even by default, but you can also pass a device_map parameter to specify the mapping yourself.
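A sketch of an explicit mapping, assuming the repository's utils.load_model_on_gpus accepts a device_map dict; the module names below are illustrative only and would need to match the actual checkpoint:
from utils import load_model_on_gpus

custom_map = {
    "transformer.word_embeddings": 0,  # hypothetical module names
    "transformer.final_layernorm": 1,
    "lm_head": 1,
}
model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=2, device_map=custom_map)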
Efficient Parameter Fine-tuning

Efficient parameter fine-tuning is based on P-Tuning v2. See ptuning/README.md for detailed usage.
ChatGLM-6B Examples

The following are some example screenshots obtained with web_demo.py. There are many more ChatGLM-6B possibilities waiting for you to explore:

- Self-cognition
- Outline writing
- Copywriting
- Email writing assistant
- Information extraction
- Role-playing
- Comment comparison
- Travel guide
Limitations

Due to ChatGLM-6B's small scale, its capabilities still have many limitations. The following are some problems we have found so far:

- Limited model capacity: the small 6B capacity determines its relatively weak model memory and language ability. ChatGLM-6B may generate incorrect information on many factual knowledge tasks, and it is not good at logical problems (such as mathematics and programming).
- May produce harmful instructions or biased content: ChatGLM-6B is only a language model preliminarily aligned with human intent, and may generate harmful or biased content. (Such content may be offensive and is not shown here.)
- Weak English ability: most of the prompts/answers used to train ChatGLM-6B are in Chinese, with only a small portion in English. For English prompts, reply quality is therefore far below that for Chinese, may even contradict the content given in Chinese prompts, and Chinese-English mixing can occur.
- Easily misled; weak dialogue ability: ChatGLM-6B's dialogue ability is still fairly weak, its "self-cognition" is problematic, and it is easily misled into producing incorrect statements. For example, the current version of the model may deviate in its self-cognition when misled.
License

The code in this repository is open-sourced under the Apache-2.0 license. Use of the ChatGLM-6B model weights must follow the Model License. ChatGLM-6B weights are fully open for academic research, and **free commercial use is also permitted** after completing the questionnaire registration.

Citation

If you find our work helpful, please consider citing the following paper:
@misc{glm2024chatglm,
title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
year={2024},
eprint={2406.12793},
archivePrefix={arXiv},
primaryClass={cs.CL}
}