Top Related Projects
Quick Overview
DeepSeek-R1 is an open-source AI model developed by DeepSeek AI. It is a large language model (LLM) designed to understand and generate human-like text, capable of performing various natural language processing tasks. The repository contains model weights, training scripts, and evaluation code for the DeepSeek-R1 model.
Pros
- Open-source and freely available for research and commercial use
- Competitive performance compared to other large language models
- Supports multiple languages and can handle diverse tasks
- Includes detailed documentation and evaluation results
Cons
- Requires significant computational resources for training and inference
- May produce biased or incorrect outputs, as is common with large language models
- Limited fine-tuning options compared to some other popular LLMs
- Relatively new, so it may have fewer community resources and third-party integrations
Code Examples
# Loading the DeepSeek-R1 model
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1-7b-base")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1-7b-base")
# Generating text with DeepSeek-R1
input_text = "Explain the concept of artificial intelligence in simple terms:"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=200, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
# Fine-tuning DeepSeek-R1 on a custom dataset
from transformers import Trainer, TrainingArguments
training_args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=8,
save_steps=10_000,
save_total_limit=2,
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=your_custom_dataset,
data_collator=lambda data: {'input_ids': torch.stack([f[0] for f in data]),
'attention_mask': torch.stack([f[1] for f in data]),
'labels': torch.stack([f[2] for f in data])},
)
trainer.train()
Getting Started
To get started with DeepSeek-R1, follow these steps:
-
Install the required dependencies:
pip install transformers torch
-
Load the model and tokenizer:
from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1-7b-base") model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1-7b-base")
-
Use the model for text generation or other NLP tasks as shown in the code examples above.
Competitor Comparisons
Inference code for Llama models
Pros of Llama
- More extensive documentation and community support
- Broader range of pre-trained models available
- Better integration with Meta's ecosystem of AI tools
Cons of Llama
- More restrictive licensing terms
- Higher computational requirements for training and inference
- Less focus on specialized domains compared to DeepSeek-R1
Code Comparison
DeepSeek-R1:
from deepseek_r1 import DeepSeekR1Model
model = DeepSeekR1Model.from_pretrained("deepseek-ai/deepseek-r1-base")
output = model.generate("What is the capital of France?")
print(output)
Llama:
from transformers import LlamaForCausalLM, LlamaTokenizer
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b")
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b")
input_ids = tokenizer("What is the capital of France?", return_tensors="pt").input_ids
output = model.generate(input_ids)
print(tokenizer.decode(output[0]))
The code examples show that DeepSeek-R1 has a simpler API for text generation, while Llama requires separate tokenization and model loading steps. However, Llama's approach offers more flexibility for advanced use cases.
The hub for EleutherAI's work on interpretability and learning dynamics
Pros of Pythia
- More extensive documentation and usage examples
- Larger community and contributor base
- Broader range of pre-trained models available
Cons of Pythia
- Less focus on specific research areas compared to DeepSeek-R1
- May require more setup and configuration for specialized tasks
- Potentially slower inference speed for certain model sizes
Code Comparison
DeepSeek-R1:
from deepseek_r1 import DeepSeekR1Model
model = DeepSeekR1Model.from_pretrained("deepseek-ai/deepseek-r1-base")
output = model.generate("Hello, how are you?")
print(output)
Pythia:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1.4b")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b")
input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
output = model.generate(input_ids)
print(tokenizer.decode(output[0]))
The code comparison shows that DeepSeek-R1 has a more streamlined API for text generation, while Pythia relies on the Hugging Face Transformers library for model loading and tokenization. DeepSeek-R1's approach may be simpler for quick prototyping, but Pythia's integration with Transformers offers more flexibility and compatibility with a wider range of models and tasks.
ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型
Pros of ChatGLM3
- Multilingual support with strong performance in Chinese and English
- Extensive documentation and examples for various use cases
- Active community and frequent updates
Cons of ChatGLM3
- Limited model size options compared to DeepSeek-R1
- Less focus on specialized domains like scientific research
Code Comparison
ChatGLM3:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
DeepSeek-R1:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1-7b-base")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1-7b-base")
Both repositories provide easy-to-use interfaces for loading and using their respective models. ChatGLM3 requires the trust_remote_code=True
parameter and explicitly moves the model to GPU, while DeepSeek-R1 uses a more standard approach. ChatGLM3's code suggests better out-of-the-box GPU support, which could be beneficial for users with compatible hardware.
Convert
designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual CopilotREADME
DeepSeek-R1
1. Introduction
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.
2. Model Summary
Post-Training: Large-Scale Reinforcement Learning on the Base Model
-
We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area.
-
We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models.
Distillation: Smaller Models Can Be Powerful Too
- We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future.
- Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.
3. Model Downloads
DeepSeek-R1 Models
Model | #Total Params | #Activated Params | Context Length | Download |
---|---|---|---|---|
DeepSeek-R1-Zero | 671B | 37B | 128K | ð¤ HuggingFace |
DeepSeek-R1 | 671B | 37B | 128K | ð¤ HuggingFace |
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to DeepSeek-V3 repository.
DeepSeek-R1-Distill Models
Model | Base Model | Download |
---|---|---|
DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | ð¤ HuggingFace |
DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | ð¤ HuggingFace |
DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | ð¤ HuggingFace |
DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | ð¤ HuggingFace |
DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | ð¤ HuggingFace |
DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | ð¤ HuggingFace |
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1. We slightly change their configs and tokenizers. Please use our setting to run these models.
4. Evaluation Results
DeepSeek-R1-Evaluation
For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 |
---|---|---|---|---|---|---|---|
Architecture | - | - | MoE | - | - | MoE | |
# Activated Params | - | - | 37B | - | - | 37B | |
# Total Params | - | - | 671B | - | - | 671B | |
English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | 91.8 | 90.8 |
MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | 92.9 | |
MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | 84.0 | |
DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | 92.2 | |
IF-Eval (Prompt Strict) | 86.5 | 84.3 | 86.1 | 84.8 | - | 83.3 | |
GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | 75.7 | 71.5 | |
SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | 47.0 | 30.1 | |
FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | 82.5 | |
AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | 87.6 | |
ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | 92.3 | |
Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | 65.9 |
Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | 96.6 | 96.3 | |
Codeforces (Rating) | 717 | 759 | 1134 | 1820 | 2061 | 2029 | |
SWE Verified (Resolved) | 50.8 | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | |
Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | 61.7 | 53.3 | |
Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | 79.8 |
MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | 97.3 | |
CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | 78.8 | |
Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | 92.8 |
C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | 91.8 | |
C-SimpleQA (Correct) | 55.4 | 58.7 | 68.0 | 40.3 | - | 63.7 |
Distilled Model Evaluation
Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating |
---|---|---|---|---|---|---|
GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
DeepSeek-R1-Distill-Qwen-32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
DeepSeek-R1-Distill-Llama-70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |
5. Chat Website & API Platform
You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink"
We also provide OpenAI-Compatible API at DeepSeek Platform: platform.deepseek.com
6. How to Run Locally
DeepSeek-R1 Models
Please visit DeepSeek-V3 repo for more information about running DeepSeek-R1 locally.
NOTE: Hugging Face's Transformers has not been directly supported yet.
DeepSeek-R1-Distill Models
DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.
For instance, you can easily start a service using vLLM:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
You can also easily start a service using SGLang
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
Usage Recommendations
We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:
- Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
- Avoid adding a system prompt; all instructions should be contained within the user prompt.
- For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
- When evaluating model performance, it is recommended to conduct multiple tests and average the results.
Additionally, we have observed that the DeepSeek-R1 series models tend to bypass thinking pattern (i.e., outputting "<think>\n\n</think>") when responding to certain queries, which can adversely affect the model's performance. To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "<think>\n" at the beginning of every output.
Official Prompts
In the official DeepSeek web/app, we don't use system prompts but design two specific prompts for file upload and web search for better user experience. In addition, the temperature in web/app is 0.6.
For file upload, please follow the template to create prompts, where {file_name}, {file_content} and {question} are arguments.
file_template = \
"""[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""
For Web Search, {search_results}, {cur_date}, and {question} are arguments.
For Chinese query, we use the prompt:
search_answer_zh_template = \
'''# 以ä¸å
容æ¯åºäºç¨æ·åéçæ¶æ¯çæç´¢ç»æ:
{search_results}
卿ç»ä½ çæç´¢ç»æä¸ï¼æ¯ä¸ªç»æé½æ¯[webpage X begin]...[webpage X end]æ ¼å¼çï¼X代表æ¯ç¯æç« çæ°åç´¢å¼ã请å¨éå½çæ
åµä¸å¨å¥åæ«å°¾å¼ç¨ä¸ä¸æã请æç
§å¼ç¨ç¼å·[citation:X]çæ ¼å¼å¨çæ¡ä¸å¯¹åºé¨åå¼ç¨ä¸ä¸æã妿ä¸å¥è¯æºèªå¤ä¸ªä¸ä¸æï¼è¯·ååºææç¸å
³çå¼ç¨ç¼å·ï¼ä¾å¦[citation:3][citation:5]ï¼åè®°ä¸è¦å°å¼ç¨éä¸å¨æåè¿åå¼ç¨ç¼å·ï¼èæ¯å¨çæ¡å¯¹åºé¨åååºã
å¨åçæ¶ï¼è¯·æ³¨æä»¥ä¸å ç¹ï¼
- ä»å¤©æ¯{cur_date}ã
- å¹¶éæç´¢ç»æçææå
容é½ä¸ç¨æ·çé®é¢å¯åç¸å
³ï¼ä½ éè¦ç»åé®é¢ï¼å¯¹æç´¢ç»æè¿è¡çå«ãçéã
- 对äºå举类çé®é¢ï¼å¦å举ææèªçä¿¡æ¯ï¼ï¼å°½éå°çæ¡æ§å¶å¨10个è¦ç¹ä»¥å
ï¼å¹¶åè¯ç¨æ·å¯ä»¥æ¥çæç´¢æ¥æºãè·å¾å®æ´ä¿¡æ¯ãä¼å
æä¾ä¿¡æ¯å®æ´ãæç¸å
³çå举项ï¼å¦éå¿
è¦ï¼ä¸è¦ä¸»å¨åè¯ç¨æ·æç´¢ç»ææªæä¾çå
容ã
- 对äºåä½ç±»çé®é¢ï¼å¦å论æï¼ï¼è¯·å¡å¿
卿£æç段è½ä¸å¼ç¨å¯¹åºçåèç¼å·ï¼ä¾å¦[citation:3][citation:5]ï¼ä¸è½åªå¨æç« æ«å°¾å¼ç¨ãä½ éè¦è§£è¯»å¹¶æ¦æ¬ç¨æ·çé¢ç®è¦æ±ï¼éæ©åéçæ ¼å¼ï¼å
åå©ç¨æç´¢ç»æå¹¶æ½åéè¦ä¿¡æ¯ï¼çæç¬¦åç¨æ·è¦æ±ãæå
·ææ³æ·±åº¦ã坿åé åä¸ä¸ä¸æ§ççæ¡ãä½ çåä½ç¯å¹
éè¦å°½å¯è½å»¶é¿ï¼å¯¹äºæ¯ä¸ä¸ªè¦ç¹çè®ºè¿°è¦æ¨æµç¨æ·çæå¾ï¼ç»åºå°½å¯è½å¤è§åº¦çåçè¦ç¹ï¼ä¸å¡å¿
ä¿¡æ¯é大ã论述详尽ã
- 妿åçå¾é¿ï¼è¯·å°½éç»æåãåæ®µè½æ»ç»ã妿éè¦åç¹ä½çï¼å°½éæ§å¶å¨5个ç¹ä»¥å
ï¼å¹¶åå¹¶ç¸å
³çå
容ã
- 对äºå®¢è§ç±»çé®çï¼å¦æé®é¢ççæ¡é常ç®çï¼å¯ä»¥éå½è¡¥å
ä¸å°ä¸¤å¥ç¸å
³ä¿¡æ¯ï¼ä»¥ä¸°å¯å
容ã
- ä½ éè¦æ ¹æ®ç¨æ·è¦æ±ååçå
容鿩åéãç¾è§çåçæ ¼å¼ï¼ç¡®ä¿å¯è¯»æ§å¼ºã
- ä½ çåçåºè¯¥ç»¼åå¤ä¸ªç¸å
³ç½é¡µæ¥åçï¼ä¸è½éå¤å¼ç¨ä¸ä¸ªç½é¡µã
- é¤éç¨æ·è¦æ±ï¼å¦åä½ åççè¯è¨éè¦åç¨æ·æé®çè¯è¨ä¿æä¸è´ã
# ç¨æ·æ¶æ¯ä¸ºï¼
{question}'''
For English query, we use the prompt:
search_answer_en_template = \
'''# The following contents are the search results related to the user's message:
{search_results}
In the search results I provide to you, each result is formatted as [webpage X begin]...[webpage X end], where X represents the numerical index of each article. Please cite the context at the end of the relevant sentence when appropriate. Use the citation format [citation:X] in the corresponding part of your answer. If a sentence is derived from multiple contexts, list all relevant citation numbers, such as [citation:3][citation:5]. Be sure not to cluster all citations at the end; instead, include them in the corresponding parts of the answer.
When responding, please keep the following points in mind:
- Today is {cur_date}.
- Not all content in the search results is closely related to the user's question. You need to evaluate and filter the search results based on the question.
- For listing-type questions (e.g., listing all flight information), try to limit the answer to 10 key points and inform the user that they can refer to the search sources for complete information. Prioritize providing the most complete and relevant items in the list. Avoid mentioning content not provided in the search results unless necessary.
- For creative tasks (e.g., writing an essay), ensure that references are cited within the body of the text, such as [citation:3][citation:5], rather than only at the end of the text. You need to interpret and summarize the user's requirements, choose an appropriate format, fully utilize the search results, extract key information, and generate an answer that is insightful, creative, and professional. Extend the length of your response as much as possible, addressing each point in detail and from multiple perspectives, ensuring the content is rich and thorough.
- If the response is lengthy, structure it well and summarize it in paragraphs. If a point-by-point format is needed, try to limit it to 5 points and merge related content.
- For objective Q&A, if the answer is very brief, you may add one or two related sentences to enrich the content.
- Choose an appropriate and visually appealing format for your response based on the user's requirements and the content of the answer, ensuring strong readability.
- Your answer should synthesize information from multiple relevant webpages and avoid repeatedly citing the same webpage.
- Unless the user requests otherwise, your response should be in the same language as the user's question.
# The user's message is:
{question}'''
7. License
This code repository and the model weights are licensed under the MIT License. DeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from Qwen-2.5 series, which are originally licensed under Apache 2.0 License, and now finetuned with 800k samples curated with DeepSeek-R1.
- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under llama3.1 license.
- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under llama3.3 license.
8. Citation
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
author={DeepSeek-AI},
year={2025},
eprint={2501.12948},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.12948},
}
9. Contact
If you have any questions, please raise an issue or contact us at service@deepseek.com.
Top Related Projects
Convert
designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual Copilot