Top Related Projects
Robust Speech Recognition via Large-Scale Weak Supervision
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Inference code for Llama models
TensorFlow code and pre-trained models for BERT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Quick Overview
ChatGLM3 is an open-source bilingual (Chinese-English) chat model jointly released by Zhipu AI and Tsinghua University's KEG Lab. It is the latest iteration in the ChatGLM series, featuring improved performance, expanded capabilities, and enhanced safety measures. ChatGLM3 aims to provide a powerful, flexible, and responsible foundation for a wide range of natural language processing tasks.
Pros
- Advanced bilingual capabilities in Chinese and English
- Improved performance and expanded knowledge base compared to previous versions
- Enhanced safety features and ethical considerations
- Open-source nature allows for community contributions and customization
Cons
- May require significant computational resources for optimal performance
- Limited support for languages other than Chinese and English
- Potential biases inherent in large language models
- Ongoing development may lead to frequent updates and changes
Code Examples
# Loading the model
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
# Generating a response
response, history = model.chat(tokenizer, "What is the capital of France?", history=[])
print(response)
# Multi-turn conversation
history = []
for query in ["Hello!", "What's the weather like today?", "Thank you!"]:
    response, history = model.chat(tokenizer, query, history=history)
    print(f"User: {query}")
    print(f"ChatGLM3: {response}\n")
Getting Started
To get started with ChatGLM3, follow these steps:
1. Install the required dependencies:
pip install transformers torch
2. Load the model and tokenizer:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
3. Start a conversation:
response, history = model.chat(tokenizer, "Hello! How can you help me today?", history=[])
print(response)
For more detailed information and advanced usage, refer to the official documentation in the GitHub repository.
Competitor Comparisons
Robust Speech Recognition via Large-Scale Weak Supervision
Pros of Whisper
- Specialized for speech recognition and transcription tasks
- Supports multiple languages and can perform translation
- Well-documented with extensive examples and pre-trained models
Cons of Whisper
- Limited to audio processing, not a general-purpose language model
- Requires significant computational resources for real-time transcription
- Less flexible for customization compared to ChatGLM3
Code Comparison
ChatGLM3:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
Whisper:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
ChatGLM3 is a general-purpose language model that can be used for various NLP tasks, while Whisper is specifically designed for speech recognition and transcription. ChatGLM3 offers more flexibility in terms of language understanding and generation, whereas Whisper excels in audio processing tasks. The code examples demonstrate the different use cases and implementation approaches for each project.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Highly optimized for distributed training and inference of large models
- Supports a wide range of AI frameworks and model architectures
- Offers advanced features like ZeRO optimizer and 3D parallelism
Cons of DeepSpeed
- Steeper learning curve due to its complexity and advanced features
- Requires more setup and configuration compared to ChatGLM3
- May be overkill for smaller projects or single-GPU setups
Code Comparison
ChatGLM3:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
DeepSpeed:
import deepspeed
import torch
# Illustrative sketch: MyModel, ds_config, batch_size, and seq_len are placeholders
model = MyModel()
# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler)
engine, _, _, _ = deepspeed.initialize(model=model, config_params=ds_config)
output = engine(torch.randn(batch_size, seq_len))
The ChatGLM3 example shows straightforward model loading, while the DeepSpeed example demonstrates initialization with custom configurations for optimized training.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Broader scope: Supports a wide range of NLP tasks and models
- Extensive documentation and community support
- Regular updates and contributions from the open-source community
Cons of transformers
- Larger codebase, potentially more complex for beginners
- May require more setup and configuration for specific tasks
Code comparison
ChatGLM3:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
transformers:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The code snippets demonstrate that ChatGLM3 is more focused on chat-based interactions, while transformers provides a more general approach to language model usage. ChatGLM3 requires trust_remote_code=True, which may have security implications. transformers offers a more standardized API across different models.
Inference code for Llama models
Pros of Llama
- Larger model with more parameters, potentially offering better performance on complex tasks
- Developed by Meta, benefiting from extensive resources and research expertise
- Broader language support and more diverse training data
Cons of Llama
- More resource-intensive, requiring higher computational power for deployment
- Less optimized for Chinese language tasks compared to ChatGLM3
- Stricter licensing and usage restrictions
Code Comparison
ChatGLM3:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
Llama:
from transformers import LlamaTokenizer, LlamaForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
Both repositories provide powerful language models, but they cater to different use cases. ChatGLM3 is more focused on Chinese language tasks and offers easier deployment, while Llama provides a larger, more versatile model with potentially better performance on complex tasks across multiple languages.
TensorFlow code and pre-trained models for BERT
Pros of BERT
- Widely adopted and well-documented, with extensive research and community support
- Versatile pre-trained model applicable to various NLP tasks
- Relatively lightweight and computationally efficient
Cons of BERT
- Limited context window size compared to more recent models
- Less capable of generating human-like text or engaging in open-ended conversations
- May struggle with tasks requiring more advanced reasoning or multi-turn interactions
Code Comparison
BERT example:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
ChatGLM3 example:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello, my dog is cute", history=[])
Both repositories provide pre-trained language models, but ChatGLM3 is more focused on conversational AI and large language model capabilities, while BERT is a foundational model for various NLP tasks. ChatGLM3 offers more advanced features for dialogue generation and multi-turn interactions, whereas BERT excels in tasks like text classification and named entity recognition.
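To make the classification point concrete, here is a minimal sketch using BERT's sequence-classification head; num_labels=2 is an arbitrary assumption for a binary task, and the freshly initialized head must be fine-tuned before its predictions mean anything:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# The classification head is randomly initialized; fine-tune before real use.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()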
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Pros of minGPT
- Lightweight and easy to understand implementation of GPT
- Excellent educational resource for learning about transformer architecture
- Highly customizable and adaptable for various tasks
Cons of minGPT
- Limited in scale compared to ChatGLM3's more advanced capabilities
- Lacks multilingual support and advanced features present in ChatGLM3
- Not optimized for production use or large-scale deployment
Code Comparison
minGPT:
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        self.drop = nn.Dropout(config.embd_pdrop)
        self.blocks = nn.Sequential(*[Block(config) for _ in range(config.n_layer)])
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
ChatGLM3:
class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
    def __init__(self, config: ChatGLMConfig):
        super().__init__(config)
        self.transformer = ChatGLMModel(config)
        self.config = config
        self.quantized = False

    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values: Optional[Tuple[torch.FloatTensor]] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, BaseModelOutputWithPast]:
README
ChatGLM3
Report • HF Repo • ModelScope • WiseModel • Document • OpenXLab • Twitter
Join our Discord and WeChat.
Experience the larger-scale ChatGLM model at chatglm.cn.
For more detailed usage information on ChatGLM3-6B, please refer to the ChatGLM3 documentation.
GLM-4 Open-Source Models and API
We have released the latest GLM-4 models, which achieve new breakthroughs on multiple metrics. You can experience our latest models through the following channels:
- GLM-4 open-source models: we have open-sourced the GLM-4-9B series, which shows clear improvements across benchmark tests. You are welcome to try them.
- Zhipu Qingyan: experience the latest GLM-4, including features such as GLMs and All Tools.
- API platform: the new-generation API platform is now online. You can try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3 directly on the platform. Among them, GLM-4 and GLM-3-Turbo support new features such as System Prompt, Function Call, Retrieval, and Web_Search; you are welcome to try them.
- GLM-4 API open-source tutorials: tutorials and basic applications for the GLM-4 API. API-related questions can be raised in the tutorial repository, or you can use the GLM-4 API AI assistant for help with common questions.
Introduction to ChatGLM3
ChatGLM3 is a dialogue pre-training model jointly released by Zhipu AI and the Tsinghua University KEG Lab. ChatGLM3-6B is the open-source model in the ChatGLM3 series. While retaining many excellent features of the previous two generations, such as fluent dialogue and a low deployment threshold, ChatGLM3-6B introduces the following features:
- A more powerful base model: ChatGLM3-6B-Base, the foundation of ChatGLM3-6B, uses more diverse training data, more training steps, and a more reasonable training strategy. Evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge show that ChatGLM3-6B-Base has the strongest performance among base models under 10B parameters.
- More complete feature support: ChatGLM3-6B adopts a newly designed prompt format. In addition to normal multi-turn dialogue, it natively supports complex scenarios such as tool invocation (Function Call), code execution (Code Interpreter), and agent tasks.
- A more comprehensive open-source series: in addition to the dialogue model ChatGLM3-6B, the base model ChatGLM3-6B-Base, the long-text dialogue model ChatGLM3-6B-32K, and ChatGLM3-6B-128K, which further strengthens long-text understanding, have also been open-sourced. All of these weights are fully open for academic research, and free commercial use is also permitted after completing the registration questionnaire.
The ChatGLM3 open-source models aim to advance large-model technology together with the open-source community. We ask developers and users to abide by the open-source license and not to use the open-source models, code, or derivatives of this project for any purpose that may harm the country or society, or for any service that has not undergone safety assessment and filing. Currently, the project team has not developed any applications based on the ChatGLM3 open-source models, including web, Android, Apple iOS, or Windows apps.
Although every effort has been made to ensure the compliance and accuracy of the training data at every stage, the output cannot be guaranteed to be accurate because the ChatGLM3-6B model is relatively small and is affected by probabilistic randomness; the model's output is also easily misled by user input. This project assumes no responsibility for data-security or public-opinion risks arising from the open-source models and code, or for any risks and liabilities arising from the models being misled, misused, disseminated, or improperly exploited.
Model List
Model | Seq Length | Download |
---|---|---|
ChatGLM3-6B | 8k | HuggingFace | ModelScope | WiseModel | OpenXLab |
ChatGLM3-6B-Base | 8k | HuggingFace | ModelScope | WiseModel | OpenXLab |
ChatGLM3-6B-32K | 32k | HuggingFace | ModelScope | WiseModel | OpenXLab |
ChatGLM3-6B-128K | 128k | HuggingFace | ModelScope | OpenXLab |
Please note that the latest updates to all models are released on Huggingface first. Because ModelScope and WiseModel are not synchronized with Huggingface and must be updated manually by the developers, their updates may lag behind Huggingface by some time.
Friendly Links
The following excellent open-source repositories already provide deep support for the ChatGLM3-6B model; everyone is welcome to explore them.
Inference acceleration:
- chatglm.cpp: a quantized, accelerated inference solution similar to llama.cpp that enables real-time dialogue on a laptop
- ChatGLM3-TPU: a TPU-accelerated inference solution that runs in real time at about 7.5 tokens/s on the Sophgo edge-side chip BM1684X (16T@FP16, 16 GB memory)
- TensorRT-LLM: NVIDIA's high-performance GPU-accelerated inference solution; you can follow these steps to deploy the ChatGLM3-6B model
- OpenVINO: Intel's high-performance CPU and GPU accelerated inference solution; you can follow these steps to deploy the ChatGLM3-6B model
Efficient fine-tuning:
- LLaMA-Factory: an excellent and easy-to-use efficient fine-tuning framework.
Application frameworks:
- LangChain-Chatchat: a retrieval-augmented generation (RAG) knowledge-base project built on large language models such as ChatGLM and application frameworks such as Langchain; open-source and deployable offline.
- BISHENG: an open-source platform for developing large-model applications, empowering and accelerating their development and deployment and helping users enter the next generation of application development with the best possible experience.
- RAGFlow: an open-source RAG (Retrieval-Augmented Generation) engine built on deep document understanding. It provides a streamlined RAG workflow for enterprises and individuals of all sizes, combining large language models (LLMs) to deliver reliable question answering with well-founded citations for users' complex data in various formats.
Evaluation Results
Typical Tasks
We selected 8 typical Chinese and English datasets and evaluated the performance of the ChatGLM3-6B (base) version on them.
Model | GSM8K | MATH | BBH | MMLU | C-Eval | CMMLU | MBPP | AGIEval |
---|---|---|---|---|---|---|---|---|
ChatGLM2-6B-Base | 32.4 | 6.5 | 33.7 | 47.9 | 51.7 | 50.0 | - | - |
Best Baseline | 52.1 | 13.1 | 45.0 | 60.1 | 63.5 | 62.2 | 47.5 | 45.8 |
ChatGLM3-6B-Base | 72.3 | 25.7 | 66.1 | 61.4 | 69.0 | 67.5 | 52.4 | 53.7 |
Best Baseline refers to the pre-trained model, with up to 10B parameters, that performed best on the corresponding dataset as of October 27, 2023, excluding models trained only for a single task without retaining general capabilities.
In the tests of ChatGLM3-6B-Base, BBH uses a 3-shot setting; GSM8K and MATH, which require reasoning, use 0-shot CoT; MBPP uses 0-shot generation followed by running the test cases to compute Pass@1; all other multiple-choice datasets use a 0-shot setting.
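For reference, a minimal sketch of how greedy-decoding Pass@1 can be computed for an MBPP-style evaluation; the generate and run_tests helpers are hypothetical placeholders, not part of this repository:

# Hypothetical sketch of 0-shot Pass@1 with one greedy sample per problem.
def pass_at_1(problems, generate, run_tests):
    # problems: list of (prompt, test_cases) pairs
    # generate: prompt -> generated program (greedy decoding)
    # run_tests: (program, test_cases) -> True if all tests pass
    solved = sum(bool(run_tests(generate(prompt), tests)) for prompt, tests in problems)
    return solved / len(problems)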
We conducted human evaluations of ChatGLM3-6B-32K across several long-text application scenarios. Compared with the second-generation model, its performance improved by more than 50% on average. The improvement is especially pronounced in applications such as paper reading, document summarization, and financial report analysis. We also tested the model on the LongBench benchmark; the results are shown in the table below.
Model | Average | Summary | Single-Doc QA | Multi-Doc QA | Code | Few-shot | Synthetic |
---|---|---|---|---|---|---|---|
ChatGLM2-6B-32K | 41.5 | 24.8 | 37.6 | 34.7 | 52.8 | 51.3 | 47.7 |
ChatGLM3-6B-32K | 50.2 | 26.6 | 45.8 | 46.1 | 56.2 | 61.2 | 65 |
Usage
Environment Setup
First, clone this repository:
git clone https://github.com/THUDM/ChatGLM3
cd ChatGLM3
Then install the dependencies with pip:
pip install -r requirements.txt
- To ensure that the version of torch is correct, please install it strictly according to the instructions in the official documentation.
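A quick sanity check (a generic check, not from the official docs) that the installed torch build sees your GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"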
Composite Demo
We provide a composite demo that integrates the following three capabilities; for how to run it, see the Composite Demo:
- Chat: dialogue mode, in which you can converse with the model.
- Tool: tool mode, in which the model can perform operations through tools in addition to dialogue.
- Code Interpreter: code interpreter mode, in which the model can execute code in a Jupyter environment and obtain the results to complete complex tasks.
Code Invocation
You can invoke the ChatGLM model to generate a conversation with the following code:
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "Hello", history=[])
>>> print(response)
Hello! I am the AI assistant ChatGLM3-6B. Nice to meet you; feel free to ask me anything.
>>> response, history = model.chat(tokenizer, "What should I do if I can't sleep at night?", history=history)
>>> print(response)
Having trouble sleeping at night can leave you feeling anxious or uncomfortable, but here are some methods that may help you fall asleep:
1. Keep a regular sleep schedule: a consistent schedule helps you build healthy sleep habits and makes it easier to fall asleep. Try to go to bed and get up at the same time every day.
2. Create a comfortable sleep environment: make sure your bedroom is comfortable, quiet, dark, and at a suitable temperature. Use comfortable bedding and keep the room ventilated.
3. Relax body and mind: do relaxing activities before bed, such as taking a warm bath, listening to soft music, or reading an interesting book; this helps relieve tension and anxiety and makes it easier to fall asleep.
4. Avoid drinks containing caffeine: caffeine is a stimulant that affects sleep quality. Try to avoid caffeinated drinks such as coffee, tea, and cola before bed.
5. Avoid doing things unrelated to sleep in bed: activities such as watching movies, playing games, or working in bed may interfere with your sleep.
6. Try breathing techniques: deep breathing is a relaxation technique that can relieve tension and anxiety and make it easier to fall asleep. Try inhaling slowly, holding for a few seconds, and then exhaling slowly.
If these methods do not help you fall asleep, consider consulting a doctor or a sleep specialist for further advice.
Loading the Model Locally
The code above automatically downloads the model implementation and weights via transformers. The full model implementation is available on the Hugging Face Hub. If your network environment is poor, downloading the model weights may take a long time or even fail. In that case, you can first download the model to a local directory and then load it from there.
To download the model from the Hugging Face Hub, first install Git LFS, then run:
git clone https://huggingface.co/THUDM/chatglm3-6b
If downloading from HuggingFace is slow, you can also download the model from ModelScope.
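Once downloaded, point from_pretrained at the local directory instead of the Hub id; the path below assumes the clone landed in ./chatglm3-6b:

from transformers import AutoTokenizer, AutoModel
# "./chatglm3-6b" is the local clone created by the git clone command above
tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm3-6b", trust_remote_code=True).half().cuda()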
Model Fine-tuning
We provide a basic kit for fine-tuning the ChatGLM3-6B model. For how to use it, see the fine-tuning kit documentation.
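As an illustration of the general approach (this is not the official kit), a minimal LoRA sketch using the peft library; target_modules=["query_key_value"] is an assumption about the ChatGLM attention projection names:

from peft import LoraConfig, get_peft_model
from transformers import AutoModel

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
# LoRA adapters on the fused QKV projection; rank and alpha are illustrative choices
lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1,
                         target_modules=["query_key_value"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable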
Web Demo
You can start a Gradio-based web demo with the following command:
python web_demo_gradio.py
You can start a Streamlit-based web demo with the following command:
streamlit run web_demo_streamlit.py
The web demo runs a web server and prints its address; open the printed address in a browser to use it. In our tests, the Streamlit-based web demo is smoother.
Command-Line Demo
Run cli_demo.py in the repository:
python cli_demo.py
The program holds an interactive conversation in the command line: type a prompt and press Enter to generate a reply, type clear to clear the conversation history, and type stop to exit the program.
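The interaction loop the script implements is roughly the following sketch (the real cli_demo.py also streams tokens and pretty-prints the history); model and tokenizer are loaded as in the "Code Invocation" section:

# Minimal sketch of the cli_demo.py loop described above
history = []
while True:
    query = input("User: ").strip()
    if query == "stop":
        break            # terminate the program
    if query == "clear":
        history = []     # reset the conversation history
        continue
    response, history = model.chat(tokenizer, query, history=history)
    print(f"ChatGLM3: {response}")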
LangChain Demo
For the code implementation, see the LangChain Demo.
Tool Invocation
For how to invoke tools, see Tool Invocation.
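As a rough sketch of the idea (the authoritative prompt schema lives in the Tool Invocation documentation), tools are typically registered through a system message whose tools field describes each callable; the track tool below is borrowed from the Curl examples later in this README:

# Hedged sketch: register a tool via the system message, then chat as usual.
tools = [{
    "name": "track",
    "description": "Track the real-time price of a specified stock",
    "parameters": {"type": "object",
                   "properties": {"symbol": {"description": "The stock symbol to track"}},
                   "required": ["symbol"]},
}]
system_item = {"role": "system",
               "content": "Answer the following questions as best as you can. "
                          "You have access to the following tools:",
               "tools": tools}
response, history = model.chat(tokenizer, "What is the current price of AAPL?",
                               history=[system_item])
print(response)  # expected to contain a structured tool call for "track"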
OpenAI API / Zhipu API Demo
We have released open-source model API deployment code in the OpenAI / ZhipuAI format, which can serve as the backend for any ChatGPT-based application. Currently, you can deploy it by running api_server.py in the repository:
cd openai_api_demo
python api_server.py
We have also written example code to test the performance of API calls:
- OpenAI test script: openai_api_request.py
- ZhipuAI test script: zhipu_api_request.py
- Testing with Curl
- chat Curl test
curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"system\", \"content\": \"You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.\"}, {\"role\": \"user\", \"content\": \"Hello, tell me a story, about 100 words\"}], \"stream\": false, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
- Standard OpenAI interface agent-chat Curl test
curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"user\", \"content\": \"What is 37 times 8 plus 7 divided by 2?\"}], \"tools\": [{\"name\": \"track\", \"description\": \"Track the real-time price of a specified stock\", \"parameters\": {\"type\": \"object\", \"properties\": {\"symbol\": {\"description\": \"The stock symbol to track\"}}, \"required\": []}}, {\"name\": \"Calculator\", \"description\": \"A math calculator for solving math problems\", \"parameters\": {\"type\": \"object\", \"properties\": {\"symbol\": {\"description\": \"The math formula to calculate\"}}, \"required\": []}}], \"stream\": true, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
- OpenAI-style custom interface agent-chat Curl test (you need to implement your own tool-description script in openai_api_demo/tools/schema.py and set AGENT_CONTROLLER to 'true' in api_server.py):
curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"user\", \"content\": \"What is 37 times 8 plus 7 divided by 2?\"}], \"stream\": true, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
This endpoint performs autonomous scheduling of an OpenAI-style custom toolbox. It can handle and recover from scheduling errors on its own, so no separate scheduling algorithm needs to be implemented, and users do not need an api_key.
- Testing with Python
cd openai_api_demo
python openai_api_request.py
If the test succeeds, the model should return a story.
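Equivalently, any OpenAI-compatible client should work against the local server; a minimal sketch with the openai Python package (>= 1.0), assuming the server runs on the default address above:

from openai import OpenAI

# The local server does not check credentials, so a placeholder api_key suffices.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")
completion = client.chat.completions.create(
    model="chatglm3-6b",
    messages=[{"role": "user", "content": "Hello, tell me a story, about 100 words"}],
    temperature=0.8,
)
print(completion.choices[0].message.content)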
Low-Cost Deployment
Model Quantization
By default, the model is loaded at FP16 precision, and running the code above requires about 13 GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized form as follows:
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
Quantization incurs some performance loss, but in our tests ChatGLM3-6B can still generate naturally and fluently under 4-bit quantization.
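As a rough, unofficial back-of-the-envelope check: about 6B parameters at 2 bytes each in FP16 is roughly 12 GB of weights, consistent with the ~13 GB figure above once activations and cache are included; at 4-bit precision the weights alone shrink to roughly 3 GB.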
CPU Deployment
If you do not have GPU hardware, you can also run inference on the CPU, although the inference speed will be much slower. Usage is as follows (about 32 GB of RAM is required):
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
Mac Deployment
For Macs equipped with Apple Silicon or an AMD GPU, you can use the MPS backend to run ChatGLM3-6B on the GPU. Follow Apple's official instructions to install PyTorch-Nightly (the correct version number should be 2.x.x.dev2023xxxx, not 2.x.x).
Currently, only loading the model locally is supported on macOS. Change the model loading in the code to load from a local path, and use the mps backend:
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
Loading the half-precision ChatGLM3-6B model requires about 13 GB of RAM. Machines with less RAM (such as a MacBook Pro with 16 GB) will fall back to virtual memory on disk when free RAM is insufficient, which slows inference severely.
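A quick check (a generic check, not from the official docs) that the nightly build actually exposes the MPS backend:

python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"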
Multi-GPU Deployment
If you have multiple GPUs but no single GPU has enough memory to hold the full model, you can split the model across multiple GPUs. First install accelerate (pip install accelerate), and then load the model normally.
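One common way to do this once accelerate is installed is transformers' device_map="auto", which shards the weights across all visible GPUs; a sketch (this is a general transformers mechanism, not a command specific to this repository):

import torch
from transformers import AutoModel

# device_map="auto" lets accelerate place layers across the visible GPUs;
# torch_dtype=torch.float16 keeps the per-GPU footprint at half precision.
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True,
                                  torch_dtype=torch.float16, device_map="auto")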
OpenVINO Demo
ChatGLM3-6B supports accelerated inference with the OpenVINO toolkit, which delivers a significant inference speedup on Intel CPU and GPU devices. For details, see the OpenVINO Demo.
TensorRT-LLM Demo
ChatGLM3-6B supports accelerated inference with the TensorRT-LLM toolkit, which speeds up model inference severalfold. For details, see the TensorRT-LLM Demo and the official technical documentation.
Citation
If you find our work helpful, please consider citing the following paper.
@misc{glm2024chatglm,
title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
year={2024},
eprint={2406.12793},
archivePrefix={arXiv},
primaryClass={cs.CL}
}