ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型

13,716

1,602

13,716

View on GitHub

Top Related Projects

whisper

80,764

Robust Speech Recognition via Large-Scale Weak Supervision

DeepSpeed

39,112

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

bert

39,267

TensorFlow code and pre-trained models for BERT

minGPT

21,810

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Quick Overview

ChatGLM3 is an open-source bilingual (Chinese-English) chat model developed by Tsinghua University. It is the latest iteration in the ChatGLM series, featuring improved performance, expanded capabilities, and enhanced safety measures. ChatGLM3 aims to provide a powerful, flexible, and responsible foundation for various natural language processing tasks.

Pros

Advanced bilingual capabilities in Chinese and English
Improved performance and expanded knowledge base compared to previous versions
Enhanced safety features and ethical considerations
Open-source nature allows for community contributions and customization

Cons

May require significant computational resources for optimal performance
Limited support for languages other than Chinese and English
Potential biases inherent in large language models
Ongoing development may lead to frequent updates and changes

Code Examples

# Loading the model
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()

# Generating a response
response, history = model.chat(tokenizer, "What is the capital of France?", history=[])
print(response)

# Multi-turn conversation
history = []
for query in ["Hello!", "What's the weather like today?", "Thank you!"]:
    response, history = model.chat(tokenizer, query, history=history)
    print(f"User: {query}")
    print(f"ChatGLM3: {response}\n")

Getting Started

To get started with ChatGLM3, follow these steps:

Install the required dependencies:
```
pip install transformers torch
```

Load the model and tokenizer:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()

Start a conversation:

response, history = model.chat(tokenizer, "Hello! How can you help me today?", history=[])
print(response)

For more detailed information and advanced usage, refer to the official documentation in the GitHub repository.

Competitor Comparisons

whisper

80,764

Robust Speech Recognition via Large-Scale Weak Supervision

Pros of Whisper

Specialized for speech recognition and transcription tasks
Supports multiple languages and can perform translation
Well-documented with extensive examples and pre-trained models

Cons of Whisper

Limited to audio processing, not a general-purpose language model
Requires significant computational resources for real-time transcription
Less flexible for customization compared to ChatGLM3

Code Comparison

ChatGLM3:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])

Whisper:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

ChatGLM3 is a general-purpose language model that can be used for various NLP tasks, while Whisper is specifically designed for speech recognition and transcription. ChatGLM3 offers more flexibility in terms of language understanding and generation, whereas Whisper excels in audio processing tasks. The code examples demonstrate the different use cases and implementation approaches for each project.

DeepSpeed

39,112

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

Highly optimized for distributed training and inference of large models
Supports a wide range of AI frameworks and model architectures
Offers advanced features like ZeRO optimizer and 3D parallelism

Cons of DeepSpeed

Steeper learning curve due to its complexity and advanced features
Requires more setup and configuration compared to ChatGLM3
May be overkill for smaller projects or single-GPU setups

Code Comparison

ChatGLM3:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()

DeepSpeed:

import deepspeed
import torch

model = MyModel()
engine = deepspeed.initialize(model=model, config_params=ds_config)
output = engine(torch.randn(batch_size, seq_len))

The ChatGLM3 example shows straightforward model loading, while the DeepSpeed example demonstrates initialization with custom configurations for optimized training.

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Pros of transformers

Broader scope: Supports a wide range of NLP tasks and models
Extensive documentation and community support
Regular updates and contributions from the open-source community

Cons of transformers

Larger codebase, potentially more complex for beginners
May require more setup and configuration for specific tasks

Code comparison

ChatGLM3:

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).cuda()
response, history = model.chat(tokenizer, "Hello", history=[])

transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)

The code snippets demonstrate that ChatGLM3 is more focused on chat-based interactions, while transformers provides a more general approach to language model usage. ChatGLM3 requires trust_remote_code=True, which may have security implications. transformers offers a more standardized API across different models.

llama

58,164

Inference code for Llama models

Pros of Llama

Larger model with more parameters, potentially offering better performance on complex tasks
Developed by Meta, benefiting from extensive resources and research expertise
Broader language support and more diverse training data

Cons of Llama

More resource-intensive, requiring higher computational power for deployment
Less optimized for Chinese language tasks compared to ChatGLM3
Stricter licensing and usage restrictions

Code Comparison

ChatGLM3:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()

Llama:

from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b")
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b")

Both repositories provide powerful language models, but they cater to different use cases. ChatGLM3 is more focused on Chinese language tasks and offers easier deployment, while Llama provides a larger, more versatile model with potentially better performance on complex tasks across multiple languages.

bert

39,267

TensorFlow code and pre-trained models for BERT

Pros of BERT

Widely adopted and well-documented, with extensive research and community support
Versatile pre-trained model applicable to various NLP tasks
Relatively lightweight and computationally efficient

Cons of BERT

Limited context window size compared to more recent models
Less capable of generating human-like text or engaging in open-ended conversations
May struggle with tasks requiring more advanced reasoning or multi-turn interactions

Code Comparison

BERT example:

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

ChatGLM3 example:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()

response, history = model.chat(tokenizer, "Hello, my dog is cute", history=[])

Both repositories provide pre-trained language models, but ChatGLM3 is more focused on conversational AI and large language model capabilities, while BERT is a foundational model for various NLP tasks. ChatGLM3 offers more advanced features for dialogue generation and multi-turn interactions, whereas BERT excels in tasks like text classification and named entity recognition.

minGPT

21,810

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Pros of minGPT

Lightweight and easy to understand implementation of GPT
Excellent educational resource for learning about transformer architecture
Highly customizable and adaptable for various tasks

Cons of minGPT

Limited in scale compared to ChatGLM3's more advanced capabilities
Lacks multilingual support and advanced features present in ChatGLM3
Not optimized for production use or large-scale deployment

Code Comparison

minGPT:

class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        self.drop = nn.Dropout(config.embd_pdrop)
        self.blocks = nn.Sequential(*[Block(config) for _ in range(config.n_layer)])
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

ChatGLM3:

class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
    def __init__(self, config: ChatGLMConfig):
        super().__init__(config)
        self.transformer = ChatGLMModel(config)
        self.config = config
        self.quantized = False

    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values: Optional[Tuple[torch.FloatTensor]] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, BaseModelOutputWithPast]:

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

ChatGLM3

ð Report â¢ ð¤ HF Repo â¢ ð¤ ModelScope â¢ ð£ WiseModel â¢ ð Document â¢ ð§° OpenXLab â¢ ð¦ Twitter

ð å å¥æä»¬ç Discord å å¾®ä¿¡

ðå¨ chatglm.cn ä½éªæ´å¤§è§æ¨¡ç ChatGLM æ¨¡åã

Read this in English.

GLM-4 å¼æºæ¨¡ååAPI

GLM-4 å¼æºæ¨¡å æä»¬å·²ç»å¼æºäº GLM-4-9B ç³»åæ¨¡åï¼å¨åé¡¹ææ çæµè¯ä¸æææ¾æåï¼æ¬¢è¿å°è¯ã
æºè°±æ¸è¨ ä½éªææ°ç GLM-4ï¼åæ¬ GLMsï¼All toolsçåè½ã
APIå¹³å° æ°ä¸ä»£ API å¹³å°å·²ç»ä¸çº¿ï¼æ¨å¯ä»¥ç´æ¥å¨ API å¹³å°ä¸ä½éª GLM-4-0520ãGLM-4-airãGLM-4-airxãGLM-4-flashãGLM-4ãGLM-3-TurboãCharacterGLM-3ï¼CogView-3 çæ°æ¨¡åã å¶ä¸GLM-4ãGLM-3-Turboä¸¤ä¸ªæ¨¡åæ¯æäº System PromptãFunction Callã RetrievalãWeb_Searchçæ°åè½ï¼æ¬¢è¿ä½éªã
GLM-4 API å¼æºæç¨ GLM-4 APIæç¨ååºç¡åºç¨ï¼æ¬¢è¿å°è¯ã APIç¸å³é®é¢å¯ä»¥å¨æ¬å¼æºæç¨çé®ï¼æèä½¿ç¨ GLM-4 API AIå©æ æ¥è·å¾å¸¸è§é®é¢çå¸®å©ã

ChatGLM3 ä»ç»

æ´å¼ºå¤§çåºç¡æ¨¡åï¼ ChatGLM3-6B çåºç¡æ¨¡å ChatGLM3-6B-Base éç¨äºæ´å¤æ ·çè®ç»æ°æ®ãæ´ååçè®ç»æ¥æ°åæ´åççè®ç»çç¥ãå¨è¯ä¹ãæ°å¦ãæ¨çãä»£ç ãç¥è¯çä¸åè§åº¦çæ°æ®éä¸æµè¯æ¾ç¤ºï¼* *ChatGLM3-6B-Base å·æå¨ 10B ä»¥ä¸çåºç¡æ¨¡åä¸æå¼ºçæ§è½**ã
æ´å®æ´çåè½æ¯æï¼ ChatGLM3-6B éç¨äºå¨æ°è®¾è®¡ç Prompt æ ¼å¼ ï¼é¤æ£å¸¸çå¤è½®å¯¹è¯å¤ãåæ¶åçæ¯æå·¥å·è°ç¨ï¼Function Callï¼ãä»£ç æ§è¡ï¼Code Interpreterï¼å Agent ä»»å¡çå¤æåºæ¯ã
æ´å¨é¢çå¼æºåºåï¼ é¤äºå¯¹è¯æ¨¡å ChatGLM3-6B å¤ï¼è¿å¼æºäºåºç¡æ¨¡å ChatGLM3-6B-Base ãé¿ææ¬å¯¹è¯æ¨¡å ChatGLM3-6B-32K åè¿ä¸æ¥å¼ºåäºå¯¹äºé¿ææ¬çè§£è½åç ChatGLM3-6B-128Kãä»¥ä¸æææéå¯¹å¦æ¯ç ç©¶å®å¨å¼æ¾ ï¼å¨å¡«å é®å· è¿è¡ç»è®°å**äº¦åè®¸åè´¹åä¸ä½¿ç¨**ã

æ¨¡ååè¡¨

Model	Seq Length	Download
ChatGLM3-6B	8k	HuggingFace \| ModelScope \| WiseModel \| OpenXLab
ChatGLM3-6B-Base	8k	HuggingFace \| ModelScope \| WiseModel \| OpenXLabl
ChatGLM3-6B-32K	32k	HuggingFace \| ModelScope \| WiseModel \| OpenXLab
ChatGLM3-6B-128K	128k	HuggingFace ï½ ModelScope\| OpenXLab

è¯·æ³¨æï¼æææ¨¡åçææ°æ´æ°é½ä¼å¨ Huggingface çååå¸ã ModelScope å WiseModel ç±äºæ²¡æä¸ Huggingface åæ¥ï¼éè¦å¼åäººåæå¨æ´æ°ï¼å¯è½ä¼å¨ Huggingface æ´æ°åä¸æ®µæ¶é´ååæ¥æ´æ°ã

åæé¾æ¥

æ¨çå éï¼

chatglm.cpp: ç±»ä¼¼ llama.cpp çéåå éæ¨çæ¹æ¡ï¼å®ç°ç¬è®°æ¬ä¸å®æ¶å¯¹è¯
ChatGLM3-TPU: éç¨TPUå éæ¨çæ¹æ¡ï¼å¨ç®è½ç«¯ä¾§è¯çBM1684Xï¼16T@FP16ï¼åå16Gï¼ä¸å®æ¶è¿è¡çº¦7.5 token/s
TensorRT-LLM: NVIDIAå¼åçé«æ§è½ GPU å éæ¨çæ¹æ¡ï¼å¯ä»¥åèæ¤ æ¥éª¤ é¨ç½² ChatGLM3-6B æ¨¡å
OpenVINO: Intel å¼åçé«æ§è½ CPU å GPU å éæ¨çæ¹æ¡ï¼å¯ä»¥åèæ¤ æ¥éª¤ é¨ç½² ChatGLM3-6B æ¨¡å

é«æå¾®è°ï¼

LLaMA-Factory: ä¼ç§æä¸æçé«æå¾®è°æ¡æ¶ã

åºç¨æ¡æ¶ï¼

LangChain-Chatchat: åºäº ChatGLM çå¤§è¯è¨æ¨¡åä¸ Langchain çåºç¨æ¡æ¶å®ç°ï¼å¼æºãå¯ç¦»çº¿é¨ç½²çæ£ç´¢å¢å¼ºçæ(RAG)å¤§æ¨¡åç¥è¯åºé¡¹ç®ã
BISHENG: å¼æºå¤§æ¨¡ååºç¨å¼åå¹³å°,èµè½åå éå¤§æ¨¡ååºç¨å¼åè½å°ï¼å¸®å©ç¨æ·ä»¥æä½³ä½éªè¿å¥ä¸ä¸ä»£åºç¨å¼åæ¨¡å¼ã
RAGFlow: RAGFlow æ¯ä¸æ¬¾åºäºæ·±åº¦ææ¡£çè§£æå»ºçå¼æº RAGï¼Retrieval-Augmented Generationï¼å¼æãå¯ä¸ºåç§è§æ¨¡çä¼ä¸åä¸ªäººæä¾ä¸å¥ç²¾ç®ç RAG å·¥ä½æµç¨ï¼ç»åå¤§è¯è¨æ¨¡åï¼LLMï¼éå¯¹ç¨æ·åç±»ä¸åçå¤ææ ¼å¼æ°æ®æä¾å¯é çé®çä»¥åæçææ®çå¼ç¨ã

è¯æµç»æ

å¸åä»»å¡

Model	GSM8K	MATH	BBH	MMLU	C-Eval	CMMLU	MBPP	AGIEval
ChatGLM2-6B-Base	32.4	6.5	33.7	47.9	51.7	50.0	-	-
Best Baseline	52.1	13.1	45.0	60.1	63.5	62.2	47.5	45.8
ChatGLM3-6B-Base	72.3	25.7	66.1	61.4	69.0	67.5	52.4	53.7

Best Baseline æçæ¯æªæ¢ 2023å¹´10æ27æ¥ãæ¨¡ååæ°å¨ 10B ä»¥ä¸ãå¨å¯¹åºæ°æ®éä¸è¡¨ç°æå¥½çé¢è®ç»æ¨¡åï¼ä¸åæ¬åªéå¯¹æä¸é¡¹ä»»å¡è®ç»èæªä¿æéç¨è½åçæ¨¡åã

å¯¹ ChatGLM3-6B-Base çæµè¯ä¸ï¼BBH éç¨ 3-shot æµè¯ï¼éè¦æ¨çç GSM8KãMATH éç¨ 0-shot CoT æµè¯ï¼MBPP éç¨ 0-shot çæåè¿è¡æµä¾è®¡ç® Pass@1 ï¼å¶ä»éæ©é¢ç±»åæ°æ®éåéç¨ 0-shot æµè¯ã

Model	å¹³å	Summary	Single-Doc QA	Multi-Doc QA	Code	Few-shot	Synthetic
ChatGLM2-6B-32K	41.5	24.8	37.6	34.7	52.8	51.3	47.7
ChatGLM3-6B-32K	50.2	26.6	45.8	46.1	56.2	61.2	65

ä½¿ç¨æ¹å¼

ç¯å¢å®è£

é¦åéè¦ä¸è½½æ¬ä»åºï¼

git clone https://github.com/THUDM/ChatGLM3
cd ChatGLM3

ç¶åä½¿ç¨ pip å®è£ä¾èµï¼

pip install -r requirements.txt

ä¸ºäºä¿è¯ torch ççæ¬æ£ç¡®ï¼è¯·ä¸¥æ ¼æç§ å®æ¹ææ¡£ çè¯´æå®è£ã

ç»¼å Demo

Chat: å¯¹è¯æ¨¡å¼ï¼å¨æ¤æ¨¡å¼ä¸å¯ä»¥ä¸æ¨¡åè¿è¡å¯¹è¯ã
Tool: å·¥å·æ¨¡å¼ï¼æ¨¡åé¤äºå¯¹è¯å¤ï¼è¿å¯ä»¥éè¿å·¥å·è¿è¡å¶ä»æä½ã

Code Interpreter: ä»£ç è§£éå¨æ¨¡å¼ï¼æ¨¡åå¯ä»¥å¨ä¸ä¸ª Jupyter ç¯å¢ä¸æ§è¡ä»£ç å¹¶è·åç»æï¼ä»¥å®æå¤æä»»å¡ã

ä»£ç è°ç¨

å¯ä»¥éè¿å¦ä¸ä»£ç è°ç¨ ChatGLM æ¨¡åæ¥çæå¯¹è¯ï¼

>> from transformers import AutoTokenizer, AutoModel
>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
>> model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
>> model = model.eval()
>> response, history = model.chat(tokenizer, "ä½ å¥½", history=[])
>> print(response)

ä½ å¥½ð!ææ¯äººå·¥æºè½å©æ ChatGLM3 - 6B, å¾é«å´è§å°ä½ , æ¬¢è¿é®æä»»ä½é®é¢ã
>> response, history = model.chat(tokenizer, "æä¸ç¡ä¸çåºè¯¥æä¹å", history=history)
>> print(response)

æä¸ç¡ä¸çå¯è½ä¼è®©ä½ æå°ç¦èæä¸èæ, ä½ä»¥ä¸æ¯ä¸äºå¯ä»¥å¸®å©ä½ å¥ç¡çæ¹æ³:

1.å¶å®è§å¾çç¡ç æ¶é´è¡¨: ä¿æè§å¾çç¡ç æ¶é´è¡¨å¯ä»¥å¸®å©ä½ å»ºç«å¥åº·çç¡ç ä¹ æ¯, ä½¿ä½ æ´å®¹æå¥ç¡ãå°½éå¨æ¯å¤©çç¸åæ¶é´ä¸åº, å¹¶å¨åä¸æ¶é´èµ·åºã
2.åé ä¸ä¸ªèéçç¡ç ç¯å¢: ç¡®ä¿ç¡ç ç¯å¢èé, å®é, é»æä¸æ¸©åº¦éå®ãå¯ä»¥ä½¿ç¨èéçåºä¸ç¨å, å¹¶ä¿ææ¿é´éé£ã
3.æ¾æ¾èº«å¿: å¨ç¡ååäºæ¾æ¾çæ´»å¨, ä¾å¦æ³¡ä¸ªçæ°´æ¾¡, å¬äºè½»æçé³ä¹, éè¯»ä¸äºæè¶£çä¹¦ç±ç, æå©äºç¼è§£ç´§å¼ åç¦è, ä½¿ä½ æ´å®¹æå¥ç¡ã
4.é¿åé¥®ç¨å«æåå¡å çé¥®æ: åå¡å æ¯ä¸ç§åºæ¿æ§ç©è´¨, ä¼å½±åä½ çç¡ç è´¨éãå°½éé¿åå¨ç¡åé¥®ç¨å«æåå¡å çé¥®æ, ä¾å¦åå¡, è¶åå¯ä¹ã
5.é¿åå¨åºä¸åä¸ç¡ç æ å³çäºæ: å¨åºä¸åäºä¸ç¡ç æ å³çäºæ, ä¾å¦ççµå½±, ç©æ¸¸ææå·¥ä½ç, å¯è½ä¼å¹²æ°ä½ çç¡ç ã
6.å°è¯å¼å¸æå·§: æ·±å¼å¸æ¯ä¸ç§æ¾æ¾æå·§, å¯ä»¥å¸®å©ä½ ç¼è§£ç´§å¼ åç¦è, ä½¿ä½ æ´å®¹æå¥ç¡ãè¯çæ¢æ¢å¸æ°, ä¿æå ç§é, ç¶åç¼æ¢å¼æ°ã

å¦æè¿äºæ¹æ³æ æ³å¸®å©ä½ å¥ç¡, ä½ å¯ä»¥èèå¨è¯¢å»çæç¡ç ä¸å®¶, å¯»æ±è¿ä¸æ¥çå»ºè®®ã

ä»æ¬å°å è½½æ¨¡å

ä» Hugging Face Hub ä¸è½½æ¨¡åéè¦åå®è£Git LFS ï¼ç¶åè¿è¡

git clone https://huggingface.co/THUDM/chatglm3-6b

æ¨¡åå¾®è°

ç½é¡µçå¯¹è¯ Demo

web-demo å¯ä»¥éè¿ä»¥ä¸å½ä»¤å¯å¨åºäº Gradio çç½é¡µç demoï¼

python web_demo_gradio.py

web-demo

streamlit run web_demo_streamlit.py

å½ä»¤è¡å¯¹è¯ Demo

cli-demo

è¿è¡ä»åºä¸ cli_demo.pyï¼

python cli_demo.py

LangChain Demo

ä»£ç å®ç°è¯·åè LangChain Demoã

å·¥å·è°ç¨

å³äºå·¥å·è°ç¨çæ¹æ³è¯·åè å·¥å·è°ç¨ã

OpenAI API / Zhipu API Demo

cd openai_api_demo
python api_server.py

OpenAI æµè¯èæ¬ï¼openai_api_request.py
ZhipuAI æµè¯èæ¬ï¼zhipu_api_request.py
ä½¿ç¨Curlè¿è¡æµè¯
chat Curl æµè¯

curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"system\", \"content\": \"You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.\"}, {\"role\": \"user\", \"content\": \"ä½ å¥½ï¼ç»æè®²ä¸ä¸ªæäºï¼å¤§æ¦100å\"}], \"stream\": false, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"

Standard openai interface agent-chat Curl æµè¯

curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"user\", \"content\": \"37ä¹ä»¥8å 7é¤2çäºå¤å°ï¼\"}], "tools": [{"name": "track", "description": "è¿½è¸ªæå®è¡ç¥¨çå®æ¶ä»·æ ¼",
          "parameters": {"type": "object", "properties": {"symbol": {"description": "éè¦è¿½è¸ªçè¡ç¥¨ä»£ç "}},
                         "required": []}},
         {"name": "Calculator", "description": "æ°å¦è®¡ç®å¨ï¼è®¡ç®æ°å¦é®é¢",
          "parameters": {"type": "object", "properties": {"symbol": {"description": "è¦è®¡ç®çæ°å¦å¬å¼"}},
                         "required": []}}
         ], \"stream\": true, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"

Openai style custom interface agent-chat Curl æµè¯ï¼ä½ éè¦å®ç°èªå®ä¹çå·¥å·æè¿°èæ¬openai_api_demo/tools/schema.pyçåå®¹ï¼å¹¶ä¸å°api_server.pyä¸AGENT_CONTROLLERæå®ä¸º'true'ï¼ï¼

curl -X POST "http://127.0.0.1:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"chatglm3-6b\", \"messages\": [{\"role\": \"user\", \"content\": \"37ä¹ä»¥8å 7é¤2çäºå¤å°ï¼\"}], \"stream\": true, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"

ä½¿ç¨Pythonè¿è¡æµè¯

cd openai_api_demo
python openai_api_request.py

å¦ææµè¯æåï¼åæ¨¡ååºè¯¥è¿åä¸æ®µæäºã

ä½ææ¬é¨ç½²

æ¨¡åéå

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()

CPU é¨ç½²

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()

Mac é¨ç½²

å¯¹äºæè½½äº Apple Silicon æè AMD GPU ç Macï¼å¯ä»¥ä½¿ç¨ MPS åç«¯æ¥å¨ GPU ä¸è¿è¡ ChatGLM3-6Bãéè¦åè Apple ç å®æ¹è¯´æ å®è£ PyTorch-Nightlyï¼æ£ç¡®ççæ¬å·åºè¯¥æ¯2.x.x.dev2023xxxxï¼èä¸æ¯ 2.x.xï¼ã

model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')

å è½½åç²¾åº¦ç ChatGLM3-6B æ¨¡åéè¦å¤§æ¦ 13GB ååãååè¾å°çæºå¨ï¼æ¯å¦ 16GB ååç MacBook Proï¼ï¼å¨ç©ºä½ååä¸è¶³çæåµä¸ä¼ä½¿ç¨ç¡¬çä¸çèæååï¼å¯¼è´æ¨çéåº¦ä¸¥éåæ¢ã

å¤å¡é¨ç½²

OpenVINO Demo

ChatGLM3-6B å·²ç»æ¯æä½¿ç¨ OpenVINO å·¥å·åè¿è¡å éæ¨çï¼å¨è±ç¹å°çGPUåGPUè®¾å¤ä¸æè¾å¤§æ¨çéåº¦æåãå·ä½ä½¿ç¨æ¹æ³è¯·åè OpenVINO Demoã

TensorRT-LLM Demo

ChatGLM3-6Bå·²ç»æ¯æä½¿ç¨ TensorRT-LLM å·¥å·åè¿è¡å éæ¨çï¼æ¨¡åæ¨çéåº¦å¾å°å¤åçæåãå·ä½ä½¿ç¨æ¹æ³è¯·åè TensorRT-LLM Demo å å®æ¹ææ¯ææ¡£ã

å¼ç¨

@misc{glm2024chatglm,
      title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools}, 
      author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
      year={2024},
      eprint={2406.12793},
      archivePrefix={arXiv},
      primaryClass={id='cs.CL' full_name='Computation and Language' is_active=True alt_name='cmp-lg' in_archive='cs' is_general=False description='Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.'}
}

Top Related Projects

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

Top Related Projects

Quick Overview

Pros

Cons

Code Examples

Getting Started

Competitor Comparisons

Pros of Whisper

Cons of Whisper

Code Comparison

Pros of DeepSpeed

Cons of DeepSpeed

Code Comparison

Pros of transformers

Cons of transformers

Code comparison

Pros of Llama

Cons of Llama

Code Comparison

Pros of BERT

Cons of BERT

Code Comparison

Pros of minGPT

Cons of minGPT

Code Comparison

Convert designs to code with AI

README

ChatGLM3

GLM-4 å¼æºæ¨¡ååAPI

ChatGLM3 ä»ç»

æ¨¡ååè¡¨

åæ é¾æ¥

è¯æµç»æ

å ¸åä»»å¡

ä½¿ç¨æ¹å¼

ç¯å¢å®è£

ç»¼å Demo

ä»£ç è°ç¨

ä»æ¬å°å è½½æ¨¡å

æ¨¡åå¾®è°

ç½é¡µçå¯¹è¯ Demo

å½ä»¤è¡å¯¹è¯ Demo

LangChain Demo

å·¥å ·è°ç¨

OpenAI API / Zhipu API Demo

ä½ææ¬é¨ç½²

æ¨¡åéå

CPU é¨ç½²

Mac é¨ç½²

å¤å¡é¨ç½²

OpenVINO Demo

TensorRT-LLM Demo

å¼ç¨

Top Related Projects

Convert designs to code with AI

GLM-4 å¼æºæ¨¡ååAPI

ChatGLM3 ä»ç»

æ¨¡ååè¡¨

åæé¾æ¥

è¯æµç»æ

å¸åä»»å¡

ä½¿ç¨æ¹å¼

ç¯å¢å®è£

ç»¼å Demo

ä»£ç è°ç¨

ä»æ¬å°å è½½æ¨¡å

æ¨¡åå¾®è°

ç½é¡µçå¯¹è¯ Demo

å½ä»¤è¡å¯¹è¯ Demo

å·¥å·è°ç¨

ä½ææ¬é¨ç½²

æ¨¡åéå

CPU é¨ç½²

Mac é¨ç½²

å¤å¡é¨ç½²

å¼ç¨