Top Related Projects
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Large-scale pretraining for dialogue
Quick Overview
ChatGLM-6B is an open-source, bilingual (Chinese and English) dialogue language model developed by Tsinghua University. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. The model aims to provide high-quality responses in both languages and can be deployed on consumer-grade GPUs.
Pros
- Bilingual support for Chinese and English
- Can run on consumer-grade GPUs with 6GB+ VRAM (using INT4 quantization)
- Open-source and free to use
- Provides good performance in various dialogue tasks
Cons
- Limited to 6 billion parameters, which may affect its performance compared to larger models
- May require fine-tuning for specific domain applications
- Documentation is primarily in Chinese, which could be a barrier for non-Chinese speakers
- Still in active development, so may have occasional instability or bugs
Code Examples
# Load the model
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()  # switch to inference mode, as in the official usage example
# Generate a response
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
# Continue the conversation
response, history = model.chat(tokenizer, "What's the capital of France?", history=history)
print(response)
Getting Started
To get started with ChatGLM-6B, follow these steps:
1. Install the required dependencies:

pip install transformers torch

2. Load the model and tokenizer:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

3. Start a conversation:

response, history = model.chat(tokenizer, "Hello, how are you?", history=[])
print(response)

4. Continue the conversation by passing the history to subsequent calls:

response, history = model.chat(tokenizer, "What's your name?", history=history)
print(response)
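From here, a minimal multi-turn chat loop can be built on the same model.chat call. This is a sketch assuming the model and tokenizer from step 2 are already in scope:

# Minimal interactive loop around model.chat; type "exit" or "quit" to stop.
history = []
while True:
    query = input("You: ")
    if query.strip().lower() in ("exit", "quit"):
        break
    # model.chat returns the reply and the updated conversation history
    response, history = model.chat(tokenizer, query, history=history)
    print(f"ChatGLM-6B: {response}")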
Competitor Comparisons
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Pros of ChatGLM-6B
- Identical repository names make it challenging to identify unique advantages
- Both repositories likely contain the same codebase and features
Cons of ChatGLM-6B
- Duplicate repositories may lead to confusion for users and contributors
- Potential for inconsistent updates or maintenance between the two repositories
Code Comparison
Since both repositories appear to be identical, a code comparison is not applicable. However, here's a hypothetical example of what a code difference might look like if there were any:
# ChatGLM-6B
def process_input(text):
return text.lower()
# ChatGLM-6B (hypothetical difference)
def process_input(text):
return text.lower().strip()
In this example, the second repository might include an additional .strip() call to remove leading and trailing whitespace. However, this is purely hypothetical, as the repositories appear to be identical.
Given the identical names and likely identical content, it's recommended to investigate further to determine if there's any actual difference between these repositories or if one is a fork of the other. Users should be cautious when choosing between them and consider factors such as update frequency, community engagement, and official documentation to decide which one to use.
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
Pros of ChatGLM2-6B
- Improved performance and efficiency over its predecessor
- Enhanced support for long-form content generation
- Better handling of context and more coherent responses
Cons of ChatGLM2-6B
- May require more computational resources due to increased complexity
- Potential compatibility issues with existing integrations built for ChatGLM-6B
- Steeper learning curve for developers unfamiliar with the new architecture
Code Comparison
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
ChatGLM2-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
The code snippets show that the basic usage remains similar between the two versions, with the main difference being the model name in the from_pretrained
method. This suggests that transitioning from ChatGLM-6B to ChatGLM2-6B should be relatively straightforward for existing projects, requiring minimal code changes.
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
Pros of FlagAI
- Broader scope: FlagAI is a comprehensive AI toolkit supporting various tasks beyond language models
- More extensive documentation and examples for different AI applications
- Active community and regular updates
Cons of FlagAI
- Steeper learning curve due to its broader scope and more complex architecture
- Potentially slower inference for specific language tasks compared to ChatGLM-6B
Code Comparison
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
FlagAI:
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
loader = AutoLoader("seq2seq", "THUDM/chatglm-6b")
model = loader.get_model()
tokenizer = loader.get_tokenizer()
predictor = Predictor(model, tokenizer)
response = predictor.predict(["Hello"])
Both repositories provide easy-to-use interfaces for loading and using pre-trained language models. ChatGLM-6B focuses specifically on the ChatGLM model, while FlagAI offers a more generalized approach to working with various AI models and tasks.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of GPT-NeoX
- Larger model size (20B parameters) potentially offering higher performance
- More extensive documentation and community support
- Designed for distributed training across multiple GPUs
Cons of GPT-NeoX
- Higher computational requirements for training and inference
- Less optimized for Chinese language tasks
- More complex setup and configuration process
Code Comparison
GPT-NeoX:
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
The code snippets show that GPT-NeoX uses specific model and tokenizer classes, while ChatGLM-6B uses more generic AutoTokenizer and AutoModel classes. ChatGLM-6B also includes additional parameters for remote code trust and GPU optimization.
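For completeness, the snippet below sketches end-to-end generation with GPT-NeoX via the standard transformers generate API. Note that device_map="auto" assumes the accelerate package is installed, and the 20B checkpoint needs roughly 40GB of memory even in FP16, so this is illustrative rather than something most single-GPU setups can run:

import torch
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")
# Load in FP16 and let accelerate place layers across the available devices
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))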
Large-scale pretraining for dialogue
Pros of DialoGPT
- More extensive documentation and examples for implementation
- Larger community support and contributions
- Pre-trained on a diverse range of conversational data
Cons of DialoGPT
- Less focus on multilingual capabilities
- May require more fine-tuning for specific use cases
- Potentially higher computational requirements
Code Comparison
DialoGPT:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
Both repositories provide pre-trained language models for conversational AI. DialoGPT offers more extensive documentation and community support, making it easier for developers to implement and customize. However, ChatGLM-6B has a stronger focus on multilingual capabilities, particularly for Chinese language processing.
The code comparison shows that both models can be loaded using the Transformers library, with slight differences in the model class and additional parameters for ChatGLM-6B. DialoGPT uses AutoModelForCausalLM, while ChatGLM-6B uses AutoModel with the trust_remote_code=True
parameter.
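To make the interaction pattern concrete, the sketch below follows the multi-turn recipe from the DialoGPT model card: each user turn is terminated with the EOS token and concatenated with the running history before generation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

chat_history_ids = None
for user_input in ["Hello, how are you?", "What's your favorite book?"]:
    # Encode the new user turn, terminated with the EOS token
    new_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
    # Append the new turn to the running conversation history
    bot_input_ids = torch.cat([chat_history_ids, new_ids], dim=-1) if chat_history_ids is not None else new_ids
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated bot turn
    print(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True))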
README
ChatGLM-6B
🌐 Blog • 🤗 HF Repo • 🐦 Twitter • 📃 Report
👋 Join our Discord and WeChat
📍 Experience and use larger-scale GLM commercial models on the Zhipu AI Open Platform.
GLM-4 Open-Source Models and API
We have released the latest GLM-4 large language dialogue model, which achieves new breakthroughs on multiple benchmarks. You can try our latest models through the following channels.
- GLM-4 open-source models: we have open-sourced the GLM-4-9B series of models, which show clear improvements across benchmark evaluations. You are welcome to try them.
- Zhipu Qingyan: experience the latest GLM-4, including features such as GLMs and All Tools.
- API platform: the new-generation API platform is now live. You can try new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, and CogView-3 directly on the API platform. Among them, GLM-4 and GLM-3-Turbo support new features such as System Prompt, Function Call, Retrieval, and Web_Search.
- GLM-4 API open-source tutorial: GLM-4 API tutorials and basic applications. API-related questions can be raised in the tutorial repository, or you can use the GLM-4 API AI assistant for help with common questions.
Introduction
ChatGLM-6B is an open-source dialogue language model supporting both Chinese and English, based on the General Language Model (GLM) architecture, with 6.2 billion parameters. Combined with model quantization techniques, it can be deployed locally on consumer-grade graphics cards (as little as 6GB of GPU memory at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese Q&A and dialogue. Trained on roughly 1T tokens of Chinese and English corpus, and further refined with supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback, the 6.2-billion-parameter ChatGLM-6B can already generate answers that are quite well aligned with human preferences. For more information, please refer to our blog. You are welcome to try the larger-scale ChatGLM models at chatglm.cn.
To make it easier for downstream developers to customize the model for their own application scenarios, we have also implemented an efficient parameter fine-tuning method based on P-Tuning v2 (usage guide); fine-tuning can be started with as little as 7GB of GPU memory at the INT4 quantization level.
The ChatGLM-6B weights are completely open for academic research, and **free commercial use is also permitted** after completing the registration questionnaire.
The ChatGLM-6B open-source model aims to advance large-model technology together with the open-source community. We earnestly ask developers and all users to abide by the open-source license: do not use the open-source model, the code, or derivatives of this open-source project for any purpose that may cause harm to the nation or society, nor for any service that has not undergone safety evaluation and filing. At present, the project team has not developed any application based on ChatGLM-6B, whether web, Android, Apple iOS, or Windows apps.
尽管模åå¨è®ç»çåä¸ªé¶æ®µé½å°½åç¡®ä¿æ°æ®çåè§æ§ååç¡®æ§ï¼ä½ç±äº ChatGLM-6B 模åè§æ¨¡è¾å°ï¼ä¸æ¨¡å忦çéæºæ§å ç´ å½±åï¼æ æ³ä¿è¯è¾åºå 容çåç¡®æ§ï¼ä¸æ¨¡åæè¢«è¯¯å¯¼ï¼è¯¦è§å±éæ§ï¼ãæ¬é¡¹ç®ä¸æ¿æ 弿ºæ¨¡åå代ç 导è´çæ°æ®å®å ¨ãèæ é£é©æåç任使¨¡åè¢«è¯¯å¯¼ãæ»¥ç¨ãä¼ æãä¸å½å©ç¨è产ççé£é©å责任ã
Updates
[2023/07/25] Released CodeGeeX2, a code-generation model based on ChatGLM2-6B with comprehensively improved coding ability. Highlights include:
- Stronger code capabilities: CodeGeeX2-6B is further pretrained on 600B tokens of code data. Compared with the first-generation CodeGeeX, its coding ability improves across the board: all six programming languages on the HumanEval-X benchmark improve substantially (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%), reaching a 35.9% Pass@1 one-shot pass rate on Python and surpassing the larger StarCoder-15B.
- **Better model properties**: inheriting the characteristics of ChatGLM2-6B, CodeGeeX2-6B has better support for Chinese and English input, supports a maximum sequence length of 8192, and offers much faster inference than the first generation; after quantization it runs with only 6GB of GPU memory, supporting lightweight local deployment.
- A more complete AI coding assistant: the CodeGeeX plugin (VS Code, JetBrains) has an upgraded backend supporting over 100 programming languages, with new practical features such as in-context completion and cross-file completion. Combined with the interactive Ask CodeGeeX assistant, it can address all kinds of programming questions through Chinese or English dialogue, including but not limited to code explanation, code translation, code bug fixing, and documentation generation, helping programmers develop more efficiently.
[2023/06/25] Released ChatGLM2-6B, the upgraded version of ChatGLM-6B. While retaining the many excellent qualities of the first-generation model, such as fluent dialogue and a low deployment barrier, ChatGLM2-6B introduces the following new features:
- Stronger performance: building on the development experience of the first-generation ChatGLM model, we fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses GLM's hybrid objective function and has undergone pretraining on 1.4T Chinese and English tokens plus human-preference alignment training. Evaluation results show that, compared with the first generation, ChatGLM2-6B achieves large gains on datasets such as MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), making it strongly competitive among open-source models of the same size.
- Longer context: using FlashAttention, we extended the base model's context length from ChatGLM-6B's 2K to 32K, and trained with an 8K context length during the dialogue stage, allowing more rounds of dialogue. However, the current version of ChatGLM2-6B has limited ability to understand very long documents within a single turn, which we will focus on optimizing in future iterations.
- More efficient inference: with Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage: under the official implementation, inference is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6GB of GPU memory grows from 1K to 8K.
For more information, see ChatGLM2-6B.
[2023/06/14] Released WebGLM, a research project accepted at KDD 2023 that supports using web information to generate long answers with accurate citations.
[2023/05/17] Released VisualGLM-6B, a multimodal dialogue language model that supports image understanding.
You can run the command-line and web demos via cli_demo_vision.py and web_demo_vision.py in this repository. Note that VisualGLM-6B additionally requires SwissArmyTransformer and torchvision. For more information, see VisualGLM-6B.
[2023/05/15] Updated the v1.1 checkpoint, adding English instruction fine-tuning data to the training set to balance the ratio of Chinese and English data, which fixes the phenomenon of Chinese words being mixed into English answers.
以䏿¯æ´æ°ååçè±æé®é¢å¯¹æ¯ï¼
- é®é¢ï¼Describe a time when you had to make a difficult decision.
- v1.0:
- v1.1:
- v1.0:
- é®é¢ï¼Describe the function of a computer motherboard
- v1.0:
- v1.1:
- v1.0:
- é®é¢ï¼Develop a plan to reduce electricity usage in a home.
- v1.0:
- v1.1:
- v1.0:
- é®é¢ï¼æªæ¥çNFTï¼å¯è½çå®å®ä¹ä¸ç§ç°å®çèµäº§ï¼å®ä¼æ¯ä¸å¤æ¿äº§ï¼ä¸è¾æ±½è½¦ï¼ä¸çåå°ççï¼è¿æ ·çæ°ååè¯å¯è½æ¯çå®çä¸è¥¿æ´æä»·å¼ï¼ä½ å¯ä»¥éæ¶äº¤æå使ç¨ï¼å¨èæåç°å®ä¸æ ç¼çè®©æ¥æçèµäº§ç»§ç»åé ä»·å¼ï¼æªæ¥ä¼æ¯ä¸ç©å½ææç¨ï¼ä½ä¸å½æææçæ¶ä»£ãç¿»è¯æä¸ä¸çè±è¯
- v1.0:
- v1.1:
- v1.0:
For more update information, see UPDATE.md.
Friendly Links
Open-source projects that accelerate ChatGLM:
- lyraChatGLM: inference acceleration for ChatGLM-6B, reaching speeds of 9000+ tokens/s
- ChatGLM-MNN: an MNN-based C++ inference implementation of ChatGLM-6B that automatically allocates computation between GPU and CPU according to available GPU memory
- JittorLLMs: runs ChatGLM-6B FP16 with as little as 3GB of GPU memory, or with no GPU at all; supports deployment on Linux, Windows, and Mac
- InferLLM: lightweight C++ inference enabling real-time chat on local x86 and Arm processors, and even on mobile phones, requiring only 4GB of RAM
Open-source projects based on or using ChatGLM-6B:
- langchain-ChatGLM: a langchain-based ChatGLM application implementing Q&A over an extensible knowledge base
- Wenda: a large language model invocation platform that implements ChatPDF-like functionality based on ChatGLM-6B
- glm-bot: connects ChatGLM to Koishi so ChatGLM can be called from major chat platforms
- Chuanhu Chat: an attractive, easy-to-use, feature-rich, and quick-to-deploy UI for various large language models and online model APIs, with ChatGLM-6B support
Example projects supporting online training of ChatGLM-6B and related applications:
Third-party evaluations:
For more open-source projects, see PROJECT.md.
Usage
Hardware Requirements
Quantization Level | Minimum GPU Memory (Inference) | Minimum GPU Memory (Efficient Parameter Fine-Tuning) |
---|---|---|
FP16 (no quantization) | 13 GB | 14 GB |
INT8 | 8 GB | 9 GB |
INT4 | 6 GB | 7 GB |
Environment Installation
Install the dependencies with pip: pip install -r requirements.txt. The recommended version of the transformers library is 4.27.1, but in theory any version no lower than 4.23.1 works.
In addition, running the quantized model on CPU also requires gcc and openmp, which most Linux distributions install by default. On Windows, you can tick openmp when installing TDM-GCC. The tested gcc version on Windows is TDM-GCC 10.3.0, and on Linux gcc 11.3.0. On macOS, please refer to Q1.
Code Usage
You can generate dialogue with the ChatGLM-6B model using the following code:
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:
1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。
2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。
3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。
4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。
5. 避免在床上做与睡眠无关的事情:在床上做了与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。
6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。
如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。
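The repository's web demo achieves its typewriter effect by streaming tokens. As a sketch, the model implementation also exposes a stream_chat method (used by web_demo.py) that yields the partial response as it grows; the exact interface may vary between model revisions:

>>> history = []
>>> last_len = 0
>>> for response, history in model.stream_chat(tokenizer, "你好", history=history):
...     print(response[last_len:], end="", flush=True)  # print only the newly generated text
...     last_len = len(response)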
The model implementation is still subject to change. If you want to pin the model implementation to ensure compatibility, add a revision="v1.1.0" parameter to the from_pretrained call. v1.1.0 is the latest version number; for the complete list of versions, see the Change Log.
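For example:

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, revision="v1.1.0").half().cuda()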
Load the Model Locally
The code above automatically downloads the model implementation and parameters via transformers. The complete model implementation is available on the Hugging Face Hub. If your network is poor, downloading the model parameters may take a long time or even fail. In that case, you can first download the model to your local machine and then load it from there.
To download the model from the Hugging Face Hub, first install Git LFS, then run
git clone https://huggingface.co/THUDM/chatglm-6b
If downloading the checkpoint from the Hugging Face Hub is slow, you can download only the model implementation with
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm-6b
then manually download the model parameter files from here and place them in your local chatglm-6b directory.
After the model has been downloaded locally, replace THUDM/chatglm-6b in the code above with the path of your local chatglm-6b folder to load the model from local files.
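For instance, with a hypothetical local path ./chatglm-6b:

# Load the tokenizer and model from a local clone instead of the Hub
tokenizer = AutoTokenizer.from_pretrained("./chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm-6b", trust_remote_code=True).half().cuda()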
Optional: the model implementation is still subject to change. If you want to pin the model implementation to ensure compatibility, you can run
git checkout v1.1.0
Demo & API
We provide a Gradio-based web demo and a command-line demo. To use them, first clone this repository:
git clone https://github.com/THUDM/ChatGLM-6B
cd ChatGLM-6B
Web Demo
First install Gradio: pip install gradio, then run web_demo.py in the repository:
python web_demo.py
The program starts a web server and prints its address; open the printed address in a browser to use it. The latest demo implements a typewriter effect, which greatly improves the experience. Note: because network access to Gradio is slow in mainland China, enabling demo.queue().launch(share=True, inbrowser=True) relays all traffic through the Gradio server, severely degrading the typewriter experience. The default launch has therefore been changed to share=False; if you need public network access, change it back to share=True.
Thanks to @AdamBear for implementing a Streamlit-based web demo; see #117 for how to run it.
CLI Demo
Run cli_demo.py in the repository:
python cli_demo.py
The program holds an interactive dialogue in the terminal. Type a prompt and press Enter to generate a reply; type clear to clear the dialogue history, and stop to exit the program.
API Deployment
First install the additional dependencies: pip install fastapi uvicorn, then run api.py in the repository:
python api.py
By default the API is served on local port 8000 and is called via POST:
curl -X POST "http://127.0.0.1:8000" \
-H 'Content-Type: application/json' \
-d '{"prompt": "你好", "history": []}'
The returned value is
{
  "response": "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。",
  "history": [["你好", "你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。"]],
  "status": 200,
  "time": "2023-03-23 21:38:40"
}
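Equivalently, a minimal Python client for this endpoint might look like the following sketch (assuming the requests package and the default port 8000):

import requests

# Call the local ChatGLM-6B API started by api.py
resp = requests.post(
    "http://127.0.0.1:8000",
    json={"prompt": "你好", "history": []},
)
data = resp.json()
print(data["response"])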
使æ¬é¨ç½²
模åéå
é»è®¤æ åµä¸ï¼æ¨¡å以 FP16 精度å è½½ï¼è¿è¡ä¸è¿°ä»£ç éè¦å¤§æ¦ 13GB æ¾åãå¦æä½ ç GPU æ¾åæéï¼å¯ä»¥å°è¯ä»¥éåæ¹å¼å 载模åï¼ä½¿ç¨æ¹æ³å¦ä¸ï¼
# Change as needed; currently only 4-bit and 8-bit quantization are supported
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(8).half().cuda()
After 2 to 3 rounds of dialogue, GPU memory usage is about 10GB under 8-bit quantization and only 6GB under 4-bit quantization. Memory consumption grows with the number of dialogue rounds. Because relative position encoding is used, ChatGLM-6B in theory supports an unlimited context length, but performance degrades gradually once the total length exceeds 2048 (the training length).
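Per the comment in the snippet above, 4-bit quantization uses the same call with quantize(4):

# 4-bit quantization: the lowest GPU memory footprint (about 6GB after a few rounds)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()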
Model quantization incurs some performance loss; in our tests, ChatGLM-6B can still generate fluent, natural text under 4-bit quantization. Quantization schemes such as GPT-Q could further compress the quantization precision or improve performance at the same precision; corresponding Pull Requests are welcome.
The quantization process first loads the FP16 model into RAM, consuming about 13GB of memory. If your RAM is insufficient, you can directly load a pre-quantized model; the INT4-quantized model needs only about 5.2GB of RAM:
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
The parameter files of the quantized models can also be downloaded manually from here.
CPU Deployment
If you have no GPU hardware, you can also run inference on CPU, although inference will be much slower. Usage is as follows (about 32GB of RAM required):
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
If your RAM is insufficient, you can directly load the quantized model:
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
If you encounter the error Could not find module 'nvcuda.dll' or RuntimeError: Unknown platform: darwin (macOS), please load the model from local files.
Mac Deployment
For Macs with Apple Silicon or an AMD GPU, the MPS backend can be used to run ChatGLM-6B on the GPU. Follow Apple's official instructions to install PyTorch-Nightly (the correct version number is 2.1.0.dev2023xxxx, not 2.0.0).
Currently macOS only supports loading the model from local files. Change the model loading in the code to load from a local path, and use the mps backend:
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).half().to('mps')
Loading the half-precision ChatGLM-6B model requires about 13GB of RAM. Machines with less RAM (such as a MacBook Pro with 16GB) will fall back to virtual memory on disk when free RAM is exhausted, severely slowing down inference. In that case you can use a quantized model such as chatglm-6b-int4. Because the quantization kernels for GPU are written in CUDA, they cannot be used on macOS, and the quantized model can only run on CPU:
# For the INT8-quantized model, change "THUDM/chatglm-6b-int4" to "THUDM/chatglm-6b-int8"
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
For full CPU parallelism, OpenMP also needs to be installed separately.
Multi-GPU Deployment
If you have multiple GPUs but no single GPU has enough memory to hold the complete model, you can split the model across multiple GPUs. First install accelerate: pip install accelerate, then load the model as follows:
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=2)
This deploys the model on two GPUs for inference. You can change num_gpus to the number of GPUs you want to use. The model is split evenly by default, but you can also pass a device_map parameter to specify the split yourself.
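As a sketch of a custom split, the module names below are illustrative assumptions (ChatGLM-6B has 28 transformer layers; inspect model.named_modules() on your checkpoint to confirm the actual names):

from utils import load_model_on_gpus

# Hypothetical explicit device map: embeddings and the first half of the
# layers on GPU 0, the remaining layers plus the final norm and head on GPU 1.
device_map = {"transformer.word_embeddings": 0,
              "transformer.final_layernorm": 1,
              "lm_head": 1}
for i in range(28):
    device_map[f"transformer.layers.{i}"] = 0 if i < 14 else 1

model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=2, device_map=device_map)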
髿忰微è°
åºäº P-tuning v2 ç髿忰微è°ãå ·ä½ä½¿ç¨æ¹æ³è¯¦è§ ptuning/README.mdã
ChatGLM-6B Examples
The following are some example screenshots produced with web_demo.py. More possibilities of ChatGLM-6B await your exploration!
Self-Cognition
Outline Writing
Copywriting
Email Writing Assistant
Information Extraction
Role Play
Comment Comparison
Travel Guide
Limitations
Due to ChatGLM-6B's small scale, its capabilities still have many limitations. The following are some issues we have found so far:
- Small model capacity: the small 6B capacity determines its relatively weak memory and language ability. ChatGLM-6B may generate incorrect information when facing many factual knowledge tasks, and it is not good at solving logical problems (such as mathematics and programming).
- Harmful or biased content: ChatGLM-6B is only a language model preliminarily aligned with human intent and may generate harmful or biased content. (Such content may be offensive and is not shown here.)
- Insufficient English ability: most of the instructions/answers used in ChatGLM-6B's training are in Chinese, with only a very small portion in English. Therefore, for English prompts, the quality of replies falls far below that for Chinese, may even contradict the content produced under Chinese prompts, and mixed Chinese-English output can occur.
- Easily misled, weak dialogue ability: ChatGLM-6B's dialogue ability is still relatively weak, its "self-cognition" has issues, and it is easily misled into producing incorrect statements. For example, the current version of the model can deviate in its self-cognition when misled.
License
The code in this repository is open-sourced under the Apache-2.0 license. Use of the ChatGLM-6B model weights must follow the Model License. The ChatGLM-6B weights are completely open for academic research, and **free commercial use is also permitted** after completing the registration questionnaire.
Citation
If you find our work helpful, please consider citing the following paper:
@misc{glm2024chatglm,
title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
year={2024},
eprint={2406.12793},
archivePrefix={arXiv},
primaryClass={cs.CL}
}