DeepSeek-R1

No description available

90,334

11,655

90,334

View on GitHub

Top Related Projects

pythia

2,600

The hub for EleutherAI's work on interpretability and learning dynamics

ChatGLM3

13,716

ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型

Quick Overview

DeepSeek-R1 is an open-source AI model developed by DeepSeek AI. It is a large language model (LLM) designed to understand and generate human-like text, capable of performing various natural language processing tasks. The repository contains model weights, training scripts, and evaluation code for the DeepSeek-R1 model.

Pros

Open-source and freely available for research and commercial use
Competitive performance compared to other large language models
Supports multiple languages and can handle diverse tasks
Includes detailed documentation and evaluation results

Cons

Requires significant computational resources for training and inference
May produce biased or incorrect outputs, as is common with large language models
Limited fine-tuning options compared to some other popular LLMs
Relatively new, so it may have fewer community resources and third-party integrations

Code Examples

# Loading the DeepSeek-R1 model
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1-7b-base")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1-7b-base")

# Generating text with DeepSeek-R1
input_text = "Explain the concept of artificial intelligence in simple terms:"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=200, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

# Fine-tuning DeepSeek-R1 on a custom dataset
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    save_steps=10_000,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=your_custom_dataset,
    data_collator=lambda data: {'input_ids': torch.stack([f[0] for f in data]),
                                'attention_mask': torch.stack([f[1] for f in data]),
                                'labels': torch.stack([f[2] for f in data])},
)

trainer.train()

Getting Started

To get started with DeepSeek-R1, follow these steps:

Install the required dependencies:
```
pip install transformers torch
```

Load the model and tokenizer:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1-7b-base")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1-7b-base")

Use the model for text generation or other NLP tasks as shown in the code examples above.

Competitor Comparisons

llama

58,578

Inference code for Llama models

Pros of Llama

More extensive documentation and community support
Broader range of pre-trained models available
Better integration with Meta's ecosystem of AI tools

Cons of Llama

More restrictive licensing terms
Higher computational requirements for training and inference
Less focus on specialized domains compared to DeepSeek-R1

Code Comparison

DeepSeek-R1:

from deepseek_r1 import DeepSeekR1Model

model = DeepSeekR1Model.from_pretrained("deepseek-ai/deepseek-r1-base")
output = model.generate("What is the capital of France?")
print(output)

Llama:

from transformers import LlamaForCausalLM, LlamaTokenizer

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b")
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b")
input_ids = tokenizer("What is the capital of France?", return_tensors="pt").input_ids
output = model.generate(input_ids)
print(tokenizer.decode(output[0]))

The code examples show that DeepSeek-R1 has a simpler API for text generation, while Llama requires separate tokenization and model loading steps. However, Llama's approach offers more flexibility for advanced use cases.

pythia

2,600

The hub for EleutherAI's work on interpretability and learning dynamics

Pros of Pythia

More extensive documentation and usage examples
Larger community and contributor base
Broader range of pre-trained models available

Cons of Pythia

Less focus on specific research areas compared to DeepSeek-R1
May require more setup and configuration for specialized tasks
Potentially slower inference speed for certain model sizes

Code Comparison

DeepSeek-R1:

from deepseek_r1 import DeepSeekR1Model

model = DeepSeekR1Model.from_pretrained("deepseek-ai/deepseek-r1-base")
output = model.generate("Hello, how are you?")
print(output)

Pythia:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1.4b")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b")
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
output = model.generate(input_ids)
print(tokenizer.decode(output[0]))

The code comparison shows that DeepSeek-R1 has a more streamlined API for text generation, while Pythia relies on the Hugging Face Transformers library for model loading and tokenization. DeepSeek-R1's approach may be simpler for quick prototyping, but Pythia's integration with Transformers offers more flexibility and compatibility with a wider range of models and tasks.

ChatGLM3

13,716

ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型

Pros of ChatGLM3

Multilingual support with strong performance in Chinese and English
Extensive documentation and examples for various use cases
Active community and frequent updates

Cons of ChatGLM3

Limited model size options compared to DeepSeek-R1
Less focus on specialized domains like scientific research

Code Comparison

ChatGLM3:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()

DeepSeek-R1:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1-7b-base")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1-7b-base")

Both repositories provide easy-to-use interfaces for loading and using their respective models. ChatGLM3 requires the trust_remote_code=True parameter and explicitly moves the model to GPU, while DeepSeek-R1 uses a more standard approach. ChatGLM3's code suggests better out-of-the-box GPU support, which could be beneficial for users with compatible hardware.

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

DeepSeek-R1

Paper Linkðï¸

1. Introduction

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.

2. Model Summary

Post-Training: Large-Scale Reinforcement Learning on the Base Model

We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area.
We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models.

Distillation: Smaller Models Can Be Powerful Too

We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future.
Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.

3. Model Downloads

DeepSeek-R1 Models

Model	#Total Params	#Activated Params	Context Length	Download
DeepSeek-R1-Zero	671B	37B	128K	ð¤ HuggingFace
DeepSeek-R1	671B	37B	128K	ð¤ HuggingFace

DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to DeepSeek-V3 repository.

DeepSeek-R1-Distill Models

Model	Base Model	Download
DeepSeek-R1-Distill-Qwen-1.5B	Qwen2.5-Math-1.5B	ð¤ HuggingFace
DeepSeek-R1-Distill-Qwen-7B	Qwen2.5-Math-7B	ð¤ HuggingFace
DeepSeek-R1-Distill-Llama-8B	Llama-3.1-8B	ð¤ HuggingFace
DeepSeek-R1-Distill-Qwen-14B	Qwen2.5-14B	ð¤ HuggingFace
DeepSeek-R1-Distill-Qwen-32B	Qwen2.5-32B	ð¤ HuggingFace
DeepSeek-R1-Distill-Llama-70B	Llama-3.3-70B-Instruct	ð¤ HuggingFace

DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly change their configs and tokenizers. Please use our setting to run these models.

4. Evaluation Results

DeepSeek-R1-Evaluation

For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.

Category	Benchmark (Metric)	Claude-3.5-Sonnet-1022	GPT-4o 0513	DeepSeek V3	OpenAI o1-mini	OpenAI o1-1217	DeepSeek R1
	Architecture	-	-	MoE	-	-	MoE
	# Activated Params	-	-	37B	-	-	37B
	# Total Params	-	-	671B	-	-	671B
English	MMLU (Pass@1)	88.3	87.2	88.5	85.2	91.8	90.8
	MMLU-Redux (EM)	88.9	88.0	89.1	86.7	-	92.9
	MMLU-Pro (EM)	78.0	72.6	75.9	80.3	-	84.0
	DROP (3-shot F1)	88.3	83.7	91.6	83.9	90.2	92.2
	IF-Eval (Prompt Strict)	86.5	84.3	86.1	84.8	-	83.3
	GPQA-Diamond (Pass@1)	65.0	49.9	59.1	60.0	75.7	71.5
	SimpleQA (Correct)	28.4	38.2	24.9	7.0	47.0	30.1
	FRAMES (Acc.)	72.5	80.5	73.3	76.9	-	82.5
	AlpacaEval2.0 (LC-winrate)	52.0	51.1	70.0	57.8	-	87.6
	ArenaHard (GPT-4-1106)	85.2	80.4	85.5	92.0	-	92.3
Code	LiveCodeBench (Pass@1-COT)	33.8	34.2	-	53.8	63.4	65.9
	Codeforces (Percentile)	20.3	23.6	58.7	93.4	96.6	96.3
	Codeforces (Rating)	717	759	1134	1820	2061	2029
	SWE Verified (Resolved)	50.8	38.8	42.0	41.6	48.9	49.2
	Aider-Polyglot (Acc.)	45.3	16.0	49.6	32.9	61.7	53.3
Math	AIME 2024 (Pass@1)	16.0	9.3	39.2	63.6	79.2	79.8
	MATH-500 (Pass@1)	78.3	74.6	90.2	90.0	96.4	97.3
	CNMO 2024 (Pass@1)	13.1	10.8	43.2	67.6	-	78.8
Chinese	CLUEWSC (EM)	85.4	87.9	90.9	89.9	-	92.8
	C-Eval (EM)	76.7	76.0	86.5	68.9	-	91.8
	C-SimpleQA (Correct)	55.4	58.7	68.0	40.3	-	63.7

Distilled Model Evaluation

Model	AIME 2024 pass@1	AIME 2024 cons@64	MATH-500 pass@1	GPQA Diamond pass@1	LiveCodeBench pass@1	CodeForces rating
GPT-4o-0513	9.3	13.4	74.6	49.9	32.9	759
Claude-3.5-Sonnet-1022	16.0	26.7	78.3	65.0	38.9	717
o1-mini	63.6	80.0	90.0	60.0	53.8	1820
QwQ-32B-Preview	44.0	60.0	90.6	54.5	41.9	1316
DeepSeek-R1-Distill-Qwen-1.5B	28.9	52.7	83.9	33.8	16.9	954
DeepSeek-R1-Distill-Qwen-7B	55.5	83.3	92.8	49.1	37.6	1189
DeepSeek-R1-Distill-Qwen-14B	69.7	80.0	93.9	59.1	53.1	1481
DeepSeek-R1-Distill-Qwen-32B	72.6	83.3	94.3	62.1	57.2	1691
DeepSeek-R1-Distill-Llama-8B	50.4	80.0	89.1	49.0	39.6	1205
DeepSeek-R1-Distill-Llama-70B	70.0	86.7	94.5	65.2	57.5	1633

5. Chat Website & API Platform

You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink"

We also provide OpenAI-Compatible API at DeepSeek Platform: platform.deepseek.com

6. How to Run Locally

DeepSeek-R1 Models

Please visit DeepSeek-V3 repo for more information about running DeepSeek-R1 locally.

NOTE: Hugging Face's Transformers has not been directly supported yet.

DeepSeek-R1-Distill Models

DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.

For instance, you can easily start a service using vLLM:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

You can also easily start a service using SGLang

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2

Usage Recommendations

We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:

Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
Avoid adding a system prompt; all instructions should be contained within the user prompt.
For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
When evaluating model performance, it is recommended to conduct multiple tests and average the results.

Additionally, we have observed that the DeepSeek-R1 series models tend to bypass thinking pattern (i.e., outputting "<think>\n\n</think>") when responding to certain queries, which can adversely affect the model's performance. To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "<think>\n" at the beginning of every output.

Official Prompts

In the official DeepSeek web/app, we don't use system prompts but design two specific prompts for file upload and web search for better user experience. In addition, the temperature in web/app is 0.6.

For file upload, please follow the template to create prompts, where {file_name}, {file_content} and {question} are arguments.

file_template = \
"""[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""

For Web Search, {search_results}, {cur_date}, and {question} are arguments.

For Chinese query, we use the prompt:

search_answer_zh_template = \
'''# ä»¥ä¸åå®¹æ¯åºäºç¨æ·åéçæ¶æ¯çæç´¢ç»æ:
{search_results}
å¨æç»ä½ çæç´¢ç»æä¸ï¼æ¯ä¸ªç»æé½æ¯[webpage X begin]...[webpage X end]æ ¼å¼çï¼Xä»£è¡¨æ¯ç¯æç« çæ°åç´¢å¼ãè¯·å¨éå½çæåµä¸å¨å¥åæ«å°¾å¼ç¨ä¸ä¸æãè¯·æç§å¼ç¨ç¼å·[citation:X]çæ ¼å¼å¨çæ¡ä¸å¯¹åºé¨åå¼ç¨ä¸ä¸æãå¦æä¸å¥è¯æºèªå¤ä¸ªä¸ä¸æï¼è¯·ååºææç¸å³çå¼ç¨ç¼å·ï¼ä¾å¦[citation:3][citation:5]ï¼åè®°ä¸è¦å°å¼ç¨éä¸å¨æåè¿åå¼ç¨ç¼å·ï¼èæ¯å¨çæ¡å¯¹åºé¨åååºã
å¨åçæ¶ï¼è¯·æ³¨æä»¥ä¸å ç¹ï¼
- ä»å¤©æ¯{cur_date}ã
- å¹¶éæç´¢ç»æçææåå®¹é½ä¸ç¨æ·çé®é¢å¯åç¸å³ï¼ä½ éè¦ç»åé®é¢ï¼å¯¹æç´¢ç»æè¿è¡çå«ãçéã
- å¯¹äºåä¸¾ç±»çé®é¢ï¼å¦åä¸¾ææèªçä¿¡æ¯ï¼ï¼å°½éå°çæ¡æ§å¶å¨10ä¸ªè¦ç¹ä»¥åï¼å¹¶åè¯ç¨æ·å¯ä»¥æ¥çæç´¢æ¥æºãè·å¾å®æ´ä¿¡æ¯ãä¼åæä¾ä¿¡æ¯å®æ´ãæç¸å³çåä¸¾é¡¹ï¼å¦éå¿è¦ï¼ä¸è¦ä¸»å¨åè¯ç¨æ·æç´¢ç»ææªæä¾çåå®¹ã
- å¯¹äºåä½ç±»çé®é¢ï¼å¦åè®ºæï¼ï¼è¯·å¡å¿å¨æ£æçæ®µè½ä¸å¼ç¨å¯¹åºçåèç¼å·ï¼ä¾å¦[citation:3][citation:5]ï¼ä¸è½åªå¨æç« æ«å°¾å¼ç¨ãä½ éè¦è§£è¯»å¹¶æ¦æ¬ç¨æ·çé¢ç®è¦æ±ï¼éæ©åéçæ ¼å¼ï¼ååå©ç¨æç´¢ç»æå¹¶æ½åéè¦ä¿¡æ¯ï¼çæç¬¦åç¨æ·è¦æ±ãæå·ææ³æ·±åº¦ãå¯æåé åä¸ä¸ä¸æ§ççæ¡ãä½ çåä½ç¯å¹éè¦å°½å¯è½å»¶é¿ï¼å¯¹äºæ¯ä¸ä¸ªè¦ç¹çè®ºè¿°è¦æ¨æµç¨æ·çæå¾ï¼ç»åºå°½å¯è½å¤è§åº¦çåçè¦ç¹ï¼ä¸å¡å¿ä¿¡æ¯éå¤§ãè®ºè¿°è¯¦å°½ã
- å¦æåçå¾é¿ï¼è¯·å°½éç»æåãåæ®µè½æ»ç»ãå¦æéè¦åç¹ä½çï¼å°½éæ§å¶å¨5ä¸ªç¹ä»¥åï¼å¹¶åå¹¶ç¸å³çåå®¹ã
- å¯¹äºå®¢è§ç±»çé®çï¼å¦æé®é¢ççæ¡éå¸¸ç®çï¼å¯ä»¥éå½è¡¥åä¸å°ä¸¤å¥ç¸å³ä¿¡æ¯ï¼ä»¥ä¸°å¯åå®¹ã
- ä½ éè¦æ ¹æ®ç¨æ·è¦æ±ååçåå®¹éæ©åéãç¾è§çåçæ ¼å¼ï¼ç¡®ä¿å¯è¯»æ§å¼ºã
- ä½ çåçåºè¯¥ç»¼åå¤ä¸ªç¸å³ç½é¡µæ¥åçï¼ä¸è½éå¤å¼ç¨ä¸ä¸ªç½é¡µã
- é¤éç¨æ·è¦æ±ï¼å¦åä½ åççè¯è¨éè¦åç¨æ·æé®çè¯è¨ä¿æä¸è´ã

# ç¨æ·æ¶æ¯ä¸ºï¼
{question}'''

For English query, we use the prompt:

search_answer_en_template = \
'''# The following contents are the search results related to the user's message:
{search_results}
In the search results I provide to you, each result is formatted as [webpage X begin]...[webpage X end], where X represents the numerical index of each article. Please cite the context at the end of the relevant sentence when appropriate. Use the citation format [citation:X] in the corresponding part of your answer. If a sentence is derived from multiple contexts, list all relevant citation numbers, such as [citation:3][citation:5]. Be sure not to cluster all citations at the end; instead, include them in the corresponding parts of the answer.
When responding, please keep the following points in mind:
- Today is {cur_date}.
- Not all content in the search results is closely related to the user's question. You need to evaluate and filter the search results based on the question.
- For listing-type questions (e.g., listing all flight information), try to limit the answer to 10 key points and inform the user that they can refer to the search sources for complete information. Prioritize providing the most complete and relevant items in the list. Avoid mentioning content not provided in the search results unless necessary.
- For creative tasks (e.g., writing an essay), ensure that references are cited within the body of the text, such as [citation:3][citation:5], rather than only at the end of the text. You need to interpret and summarize the user's requirements, choose an appropriate format, fully utilize the search results, extract key information, and generate an answer that is insightful, creative, and professional. Extend the length of your response as much as possible, addressing each point in detail and from multiple perspectives, ensuring the content is rich and thorough.
- If the response is lengthy, structure it well and summarize it in paragraphs. If a point-by-point format is needed, try to limit it to 5 points and merge related content.
- For objective Q&A, if the answer is very brief, you may add one or two related sentences to enrich the content.
- Choose an appropriate and visually appealing format for your response based on the user's requirements and the content of the answer, ensuring strong readability.
- Your answer should synthesize information from multiple relevant webpages and avoid repeatedly citing the same webpage.
- Unless the user requests otherwise, your response should be in the same language as the user's question.

# The user's message is:
{question}'''

7. License

This code repository and the model weights are licensed under the MIT License. DeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:

DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from Qwen-2.5 series, which are originally licensed under Apache 2.0 License, and now finetuned with 800k samples curated with DeepSeek-R1.
DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under Llama3.1 license.
DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under Llama3.3 license.

8. Citation

@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
      title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning}, 
      author={DeepSeek-AI},
      year={2025},
      eprint={2501.12948},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.12948}, 
}

9. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.

Top Related Projects

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot