DeepSeek API Guide

Last updated: 2026-06-11 18:02:54

Overview

The DeepSeek series of models has been integrated into TokenHub, supporting both the OpenAI Chat Completions and Anthropic protocols. Developers can quickly integrate them without needing to change their SDK. This document introduces general invocation examples and core capabilities specific to DeepSeek, such as its reasoning mode and Function Calling.

Prerequisites

You have registered a Tencent Cloud account and activated the TokenHub service.
You have obtained the API Key in the TokenHub console.
You have installed the SDK for your programming language, or you can make HTTP requests directly.
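The examples in this guide inline YOUR_API_KEY for brevity. A minimal sketch of one alternative, assuming the key is stored in an environment variable (the name TOKENHUB_API_KEY is a hypothetical choice, not something TokenHub defines), reads the key at startup and builds the headers that the HTTP examples send:

```python
import os


def load_api_key(env_var: str = "TOKENHUB_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing.

    The variable name is an illustrative assumption, not a TokenHub convention.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to the API Key created in the TokenHub console")
    return key


def auth_headers(api_key: str) -> dict:
    """Build the two headers used by the HTTP examples in this guide."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

Failing at startup keeps a missing key from surfacing later as an authentication error on the first request.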

Supported Models

TokenHub currently supports the following DeepSeek models (for specifics, refer to the model list):
| Model ID | Type | Reasoning Capability | Context Window | Max Input | Max Output |
| --- | --- | --- | --- | --- | --- |
| deepseek-v4-flash | General Conversation Model | Configurable (Enabled by default) | 1M | 1M | 384K |
| deepseek-v4-pro | General Conversation Model | Configurable (Enabled by default) | 1M | 1M | 384K |
| deepseek-v3.2 | General Conversation Model | Configurable (Enabled by default) | 128K | 96K | 32K |
Note:
Special Notes for the DeepSeek V4-Flash / V4-Pro provided directly by DeepSeek:
For the DeepSeek V4 Pro model service provided directly by DeepSeek, TokenHub does not provide SLA guarantees, and the TokenHub service agreement does not apply. By using these models, you acknowledge and agree to the applicable DeepSeek service agreement. Please read the relevant terms carefully before use. If you do not accept these terms, stop using the service immediately.
The DeepSeek V4-Flash / V4-Pro / V3.2 models support both conversation and reasoning. Unlike with some other vendors, you do not need to switch model IDs between a standard model and a reasoning model; the thinking parameter alone controls whether reasoning is enabled.
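As a rough illustration of the point above, the sketch below builds a Chat Completions request body in which the same model ID serves both modes and only the thinking field changes. build_payload is a hypothetical helper name; the field layout follows the examples later in this guide.

```python
def build_payload(model: str, prompt: str, reasoning: bool, max_tokens: int = 1024) -> dict:
    """Build a request body; `reasoning` only toggles thinking.type, never the model ID."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled" if reasoning else "disabled"},
    }


# Same model ID in both cases; only the thinking switch differs.
chat_body = build_payload("deepseek-v4-flash", "Hello", reasoning=False)
reasoning_body = build_payload(
    "deepseek-v4-flash", "Solve x^2 - 5x + 6 = 0", reasoning=True, max_tokens=2048
)
```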

Key Differences from Other Models

| Aspect | DeepSeek V4-Flash / V4-Pro / V3.2 | OpenAI / Claude / GLM, etc. |
| --- | --- | --- |
| Reasoning capability switch | Explicitly controlled via the thinking.type parameter. | Typically controlled by switching the model or by a separate reasoning parameter. |
| Reasoning process field | Returned separately in the response as reasoning_content. | Most models do not expose the reasoning process. |
| Accessing reasoning fields via the OpenAI SDK | Use hasattr / getattr. | - |
| temperature | 0 to 2, default 1, freely adjustable. | Typically adjustable between 0 and 2. |
| Recommended value for max_tokens | 1024 to 4096 for general tasks; ≥ 2048 recommended for thinking mode. | Typically 1024 to 4096 is sufficient. |
| Context window | Up to 1M tokens | Typically 128K tokens |
| Maximum output | Up to 384K tokens | Typically 16K tokens |
| Writing back messages in multi-turn conversations | Only pass back the content field; do not pass back reasoning_content. | Typically, only content needs to be written back. |
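The numeric rows of the table can be turned into a small pre-flight check. This is only a sketch: check_params is a hypothetical helper, and the thresholds are the recommendations listed above rather than limits that this guide says the API enforces.

```python
def check_params(temperature: float, max_tokens: int, thinking_enabled: bool) -> list:
    """Return human-readable warnings for settings outside the table's guidance."""
    warnings = []
    if not 0 <= temperature <= 2:
        warnings.append("temperature should be between 0 and 2")
    if thinking_enabled and max_tokens < 2048:
        # Reasoning tokens share the output budget, hence the larger recommendation.
        warnings.append("thinking mode: max_tokens of at least 2048 is recommended")
    return warnings
```

For example, check_params(1.0, 1024, True) returns a single warning about max_tokens, while check_params(1.0, 2048, True) returns an empty list.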

Quick Start

The following example demonstrates the simplest single-turn conversation call. Please replace YOUR_API_KEY with the API Key you created.
cURL
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "deepseek-v4-flash",
"messages": [
{"role": "user", "content": "Hello, please introduce yourself"}
],
"max_tokens": 1024
}'

Running Environment

Operating System: Ubuntu 24.04.3 LTS / x86_64

Runtime Version: GNU bash, version 5.2.21(1)-release (x86_64-pc-linux-gnu)

Python
# pip install openai
from openai import OpenAI

client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "user", "content": "Hello, please introduce yourself"}
],
max_tokens=1024,
)
print(response.choices[0].message.content)
Node.js
// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages: [
{ role: "user", content: "Hello, please introduce yourself" }
],
max_tokens: 1024,
});
console.log(response.choices[0].message.content);
Java
// To use OkHttp, add the dependency: implementation("com.squareup.okhttp3:okhttp:4.12.0")
import okhttp3.*;
import org.json.*;

OkHttpClient httpClient = new OkHttpClient();

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 1024);
JSONArray messages = new JSONArray();
JSONObject userMsg = new JSONObject();
userMsg.put("role", "user");
userMsg.put("content", "Hello, please introduce yourself");
messages.put(userMsg);
body.put("messages", messages);

Request request = new Request.Builder()
.url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
.addHeader("Authorization", "Bearer YOUR_API_KEY")
.addHeader("Content-Type", "application/json")
.post(RequestBody.create(body.toString(), MediaType.get("application/json")))
.build();

try (Response response = httpClient.newCall(request).execute()) {
JSONObject result = new JSONObject(response.body().string());
System.out.println(result.getJSONArray("choices")
.getJSONObject(0).getJSONObject("message").getString("content"));
}
Go
package main

import (
"bytes"
"encoding/json"
"fmt"
"io"
"net/http"
)

func main() {
body := map[string]interface{}{
"model": "deepseek-v4-flash",
"messages": []map[string]string{
{"role": "user", "content": "Hello, please introduce yourself"},
},
"max_tokens": 1024,
}
data, _ := json.Marshal(body)

req, _ := http.NewRequest("POST",
"https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
bytes.NewBuffer(data))
req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
req.Header.Set("Content-Type", "application/json")

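resp, err := http.DefaultClient.Do(req)
if err != nil {
panic(err) // resp is nil when the request fails, so stop before using resp.Body
}
defer resp.Body.Close()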
respBody, _ := io.ReadAll(resp.Body)

var result map[string]interface{}
json.Unmarshal(respBody, &result)
choices := result["choices"].([]interface{})
msg := choices[0].(map[string]interface{})["message"].(map[string]interface{})
fmt.Println(msg["content"])
}
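The Quick Start examples index straight into choices[0], which fails with an unhelpful error if the request was rejected. A defensive sketch follows; it assumes that failures arrive as an OpenAI-style body of the form {"error": {"message": ...}}, which this guide does not confirm for TokenHub, and extract_content is a hypothetical helper name.

```python
import json


def extract_content(raw_body: str) -> str:
    """Return the assistant text from a Chat Completions response body.

    Raises ValueError when the body carries an error object (assumed shape)
    or has no choices, instead of failing on a missing key.
    """
    data = json.loads(raw_body)
    if "error" in data:
        err = data["error"]
        message = err.get("message", "unknown error") if isinstance(err, dict) else str(err)
        raise ValueError(f"request failed: {message}")
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # content can be null (for example on tool calls), so fall back to "".
    return choices[0]["message"].get("content") or ""
```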

General Calling Examples

Basic Conversation

Send a single-turn conversation request to obtain the model's response.
cURL
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "deepseek-v4-flash",
"messages": [
{"role": "user", "content": "Introduce large language models"}
],
"max_tokens": 1024,
"thinking": {"type": "disabled"}
}'
Python
from openai import OpenAI

client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "user", "content": "Introduce large language models"}
],
max_tokens=1024,
extra_body={"thinking": {"type": "disabled"}}, # Disable thinking mode to reduce token consumption
)
print(response.choices[0].message.content)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages: [
{ role: "user", content: "Introduce large language models" }
],
max_tokens: 1024,
// @ts-ignore - thinking is a DeepSeek extension field
thinking: { type: "disabled" },
});
console.log(response.choices[0].message.content);
Java
import okhttp3.*;
import org.json.*;

OkHttpClient httpClient = new OkHttpClient();

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 1024);

JSONArray messages = new JSONArray();
JSONObject userMsg = new JSONObject();
userMsg.put("role", "user");
userMsg.put("content", "Introduce large language models");
messages.put(userMsg);
body.put("messages", messages);

JSONObject thinking = new JSONObject();
thinking.put("type", "disabled");
body.put("thinking", thinking);

Request request = new Request.Builder()
.url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
.addHeader("Authorization", "Bearer YOUR_API_KEY")
.addHeader("Content-Type", "application/json")
.post(RequestBody.create(body.toString(), MediaType.get("application/json")))
.build();

try (Response response = httpClient.newCall(request).execute()) {
JSONObject result = new JSONObject(response.body().string());
System.out.println(result.getJSONArray("choices")
.getJSONObject(0).getJSONObject("message").getString("content"));
}
Go
body := map[string]interface{}{
"model": "deepseek-v4-flash",
"messages": []map[string]string{
{"role": "user", "content": "Introduce large language models"},
},
"max_tokens": 1024,
"thinking": map[string]string{"type": "disabled"},
}
// ... The rest of the request code is the same as in the quick start example.

Streaming Output

Set stream to true to receive the response incrementally as server-sent events (SSE). Streaming suits long-text generation: output arrives as it is produced, which reduces the risk of request timeouts and improves perceived responsiveness.
cURL
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "deepseek-v4-flash",
"messages": [
{"role": "user", "content": "Write a short poem about spring"}
],
"max_tokens": 512,
"stream": true,
"thinking": {"type": "disabled"}
}'
Python
from openai import OpenAI

client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

stream = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "user", "content": "Write a short poem about spring"}
],
max_tokens=512,
stream=True,
extra_body={"thinking": {"type": "disabled"}},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const stream = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages: [
{ role: "user", content: "Write a short poem about spring" }
],
max_tokens: 512,
stream: true,
// @ts-ignore
thinking: { type: "disabled" },
});

for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) process.stdout.write(content);
}
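Java
// For streaming output, use OkHttp EventSource.
// It requires an additional dependency: implementation("com.squareup.okhttp3:okhttp-sse:4.12.0")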
import okhttp3.*;
import okhttp3.sse.*;
import org.json.*;

OkHttpClient httpClient = new OkHttpClient();

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 512);
body.put("stream", true);
JSONArray messages = new JSONArray();
JSONObject msg = new JSONObject();
msg.put("role", "user");
msg.put("content", "Write a short poem about spring");
messages.put(msg);
body.put("messages", messages);
body.put("thinking", new JSONObject().put("type", "disabled"));

Request request = new Request.Builder()
.url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
.addHeader("Authorization", "Bearer YOUR_API_KEY")
.addHeader("Content-Type", "application/json")
.post(RequestBody.create(body.toString(), MediaType.get("application/json")))
.build();

EventSources.createFactory(httpClient).newEventSource(request, new EventSourceListener() {
@Override
public void onEvent(EventSource source, String id, String type, String data) {
if ("[DONE]".equals(data)) return;
try {
JSONObject json = new JSONObject(data);
String content = json.getJSONArray("choices").getJSONObject(0)
.getJSONObject("delta").optString("content", "");
if (!content.isEmpty()) System.out.print(content);
} catch (JSONException ignored) {}
}
});
Go
import (
"bufio"
"bytes"
"encoding/json"
"fmt"
"net/http"
"strings"
)

body := map[string]interface{}{
"model": "deepseek-v4-flash",
"messages": []map[string]string{{"role": "user", "content": "Write a short poem about spring"}},
"max_tokens": 512,
"stream": true,
"thinking": map[string]string{"type": "disabled"},
}
data, _ := json.Marshal(body)

req, _ := http.NewRequest("POST",
"https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
bytes.NewBuffer(data))
req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
req.Header.Set("Content-Type", "application/json")

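resp, err := http.DefaultClient.Do(req)
if err != nil {
panic(err) // resp is nil when the request fails, so stop before using resp.Body
}
defer resp.Body.Close()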

scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
line := scanner.Text()
if !strings.HasPrefix(line, "data: ") || line == "data: [DONE]" {
continue
}
var chunk map[string]interface{}
json.Unmarshal([]byte(strings.TrimPrefix(line, "data: ")), &chunk)
choices := chunk["choices"].([]interface{})
delta := choices[0].(map[string]interface{})["delta"].(map[string]interface{})
if content, ok := delta["content"].(string); ok {
fmt.Print(content)
}
}
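For clients that cannot use an SDK, the wire format handled by the Go example can be parsed in a few lines. The sketch below works on an iterable of already-decoded lines so that it can be tried without a network call; iter_content is a hypothetical name, and the data: prefix and [DONE] sentinel are taken from the examples above.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from the SSE lines of a streaming response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # mirrors the `if chunk.choices` guard in the Python example
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


sample = [
    'data: {"choices": [{"delta": {"content": "Spring "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "rain"}}]}',
    "data: [DONE]",
]
poem = "".join(iter_content(sample))
```

In a real client, the lines would come from the HTTP response body read line by line rather than from a list.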

System Prompt

Use system role messages to set the model's behavioral instructions and background information.
cURL
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "deepseek-v4-flash",
"messages": [
{"role": "system", "content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses."},
{"role": "user", "content": "How to read a CSV file"}
],
"max_tokens": 512,
"thinking": {"type": "disabled"}
}'
Python
from openai import OpenAI

client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{
"role": "system",
"content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses.",
},
{"role": "user", "content": "How to read a CSV file"},
],
max_tokens=512,
extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages: [
{
role: "system",
"content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses.",
},
{ role: "user", content: "How to read a CSV file" },
],
max_tokens: 512,
// @ts-ignore
thinking: { type: "disabled" },
});
console.log(response.choices[0].message.content);
Java
JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 512);
body.put("thinking", new JSONObject().put("type", "disabled"));

JSONArray messages = new JSONArray();
messages.put(new JSONObject().put("role", "system")
.put("content", "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses."));
messages.put(new JSONObject().put("role", "user")
.put("content", "How to read a CSV file"));
body.put("messages", messages);
// ... The request sending code is the same as above.
Go
body := map[string]interface{}{
"model": "deepseek-v4-flash",
"messages": []map[string]string{
{"role": "system", "content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses."},
{"role": "user", "content": "How to read a CSV file"},
},
"max_tokens": 512,
"thinking": map[string]string{"type": "disabled"},
}
// ... The request sending code is the same as in the quick start.

Multi-turn Conversation

Pass the conversation history into the messages array to enable multi-turn dialogue with context memory.
cURL
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "deepseek-v4-flash",
"messages": [
{"role": "user", "content": "My name is Xiaoming, and I like playing basketball."},
{"role": "assistant", "content": "Hello, Xiaoming! Playing basketball is a great sport."},
{"role": "user", "content": "Do you remember my name and hobbies?"}
],
"max_tokens": 256,
"thinking": {"type": "disabled"}
}'
Python
from openai import OpenAI

client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

# Maintain Conversation History
conversation = [
{"role": "system", "content": "You are a friendly AI assistant."},
]

def chat(user_input):
    conversation.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=conversation,
        max_tokens=1024,
        extra_body={"thinking": {"type": "disabled"}},
    )
    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})
    return reply

print(chat("My name is Xiaoming, and I like playing basketball."))
print(chat("Do you remember my name and hobbies?"))
Node.js
import OpenAI from "openai";

const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const conversation = [
{"role": "system", "content": "You are a friendly AI assistant."},
];

async function chat(userInput) {
conversation.push({ role: "user", content: userInput });
const response = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages: conversation,
max_tokens: 1024,
// @ts-ignore
thinking: { type: "disabled" },
});
const reply = response.choices[0].message.content;
conversation.push({ role: "assistant", content: reply });
return reply;
}

console.log(await chat("My name is Xiaoming, and I like playing basketball."));
console.log(await chat("Do you remember my name and hobbies?"));
Java
// Core of multi-turn dialogue: Pass the messages array cumulatively.
JSONArray messages = new JSONArray();
messages.put(new JSONObject().put("role", "system").put("content", "You are a friendly AI assistant."));
messages.put(new JSONObject().put("role", "user").put("content", "My name is Xiaoming, and I like playing basketball."));
messages.put(new JSONObject().put("role", "assistant").put("content", "Hello, Xiaoming! Playing basketball is a great sport."));
messages.put(new JSONObject().put("role", "user").put("content", "Do you remember my name and hobbies?"));

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("messages", messages);
body.put("max_tokens", 1024);
body.put("thinking", new JSONObject().put("type", "disabled"));
// ... The request sending code is the same as above.
Go
body := map[string]interface{}{
"model": "deepseek-v4-flash",
"messages": []map[string]string{
{"role": "system", "content": "You are a friendly AI assistant."},
{"role": "user", "content": "My name is Xiaoming, and I like playing basketball."},
{"role": "assistant", "content": "Hello, Xiaoming! Playing basketball is a great sport."},
{"role": "user", "content": "Do you remember my name and hobbies?"},
},
"max_tokens": 1024,
"thinking": map[string]string{"type": "disabled"},
}
// ... The request sending code is the same as in the quick start.

Function Calling (Tool Invocation)

Function Calling enables models to invoke external tools to obtain real-time data. The model itself does not execute functions. Instead, it returns the function names and parameters to be invoked. User code then executes them and returns the results to the model, which ultimately generates a natural language response.
Calling Process:
1. When a user asks a question, the model returns tool_calls (which contain the function name and parameters).
2. User code executes the function, and the result is returned as a role: tool message.
3. The model generates the final natural language response based on the function result.
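Step 2 of the process above can be sketched without calling the API. The code below takes one tool_calls entry in the shape used by the examples in this section, runs a matching local function, and builds the role: tool message for the second request. TOOL_REGISTRY, run_tool_call, and the canned get_weather implementation are all hypothetical.

```python
import json


def get_weather(city: str) -> str:
    """Stand-in implementation; a real tool would query a weather service."""
    return f"{city}: Sunny, temperature 28°C, humidity 50%"


# Maps the function names declared in `tools` to local callables (illustrative).
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and return the message to append to `messages`."""
    name = tool_call["function"]["name"]
    # `arguments` arrives as a JSON-encoded string, not as an object.
    args = json.loads(tool_call["function"]["arguments"] or "{}")
    func = TOOL_REGISTRY.get(name)
    result = func(**args) if func else f"Unknown tool: {name}"
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}


example_call = {
    "id": "call_xxx",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'},
}
tool_message = run_tool_call(example_call)
```

Reporting an unknown tool name back as text, rather than raising, is a design choice of this sketch: it lets the model see the failure and respond to it in the next round.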
cURL
# Round 1: Send the Question + Tool Definitions
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "deepseek-v4-flash",
"messages": [
{"role": "user", "content": "What is the weather like in Beijing today?"}
],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Obtain weather information for a specified city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name, such as Beijing"}
},
"required": ["city"]
}
}
}],
"thinking": {"type": "disabled"}
}'

# Round 2: Return the Tool Execution Result (Replace tool_call_id with the actual returned id)
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "deepseek-v4-flash",
"messages": [
{"role": "user", "content": "What is the weather like in Beijing today?"},
{"role": "assistant", "tool_calls": [{"id": "call_xxx", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"}}]},
{"role": "tool", "tool_call_id": "call_xxx", "content": "Sunny, temperature 28°C, humidity 50%"}
],
"tools": [{"type": "function", "function": {"name": "get_weather", "description": "Obtain weather information for a specified city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}],
"thinking": {"type": "disabled"}
}'
Python
from openai import OpenAI

client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

# Define Tools
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Obtain weather information for a specified city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name, such as Beijing"}
},
"required": ["city"],
},
},
}
]

# Round 1: Send the Question
messages = [{"role": "user", "content": "What is the weather like in Beijing today?"}]
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=messages,
tools=tools,
extra_body={"thinking": {"type": "disabled"}},
)
assistant_message = response.choices[0].message

# The Model Initiates a Tool Call
if response.choices[0].finish_reason == "tool_calls":
    tool_call = assistant_message.tool_calls[0]
    print(f"Model calls tool: {tool_call.function.name}, parameters: {tool_call.function.arguments}")

    # Execute the Tool (This is a simulated return)
    tool_result = "Sunny, temperature 28°C, humidity 50%"

    # Round 2: Return the Tool Result to the Model
    messages.append(assistant_message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": tool_result,
    })

    final_response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        tools=tools,
        extra_body={"thinking": {"type": "disabled"}},
    )
    print(final_response.choices[0].message.content)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const tools = [
{
type: "function",
function: {
name: "get_weather",
"description": "Obtain weather information for a specified city",
parameters: {
type: "object",
properties: {
"city": {"type": "string", "description": "City name, such as Beijing"}
},
required: ["city"],
},
},
},
];

// Round 1
const messages = [{ role: "user", content: "What is the weather like in Beijing today?" }];
const response1 = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages,
tools,
// @ts-ignore
thinking: { type: "disabled" },
});

const assistantMsg = response1.choices[0].message;
if (response1.choices[0].finish_reason === "tool_calls") {
const toolCall = assistantMsg.tool_calls[0];
console.log(`Tool call: ${toolCall.function.name}, parameters: ${toolCall.function.arguments}`);

const toolResult = "Sunny, temperature 28°C, humidity 50%";
messages.push(assistantMsg);
messages.push({ role: "tool", tool_call_id: toolCall.id, content: toolResult });

const response2 = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages,
tools,
// @ts-ignore
thinking: { type: "disabled" },
});
console.log(response2.choices[0].message.content);
}
Java
JSONObject toolFunc = new JSONObject()
.put("name", "get_weather")
.put("description", "Obtain weather information for a specified city")
.put("parameters", new JSONObject()
.put("type", "object")
.put("properties", new JSONObject()
.put("city", new JSONObject().put("type", "string").put("description", "City name")))
.put("required", new JSONArray().put("city")));

JSONArray tools = new JSONArray()
.put(new JSONObject().put("type", "function").put("function", toolFunc));

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("messages", new JSONArray().put(new JSONObject().put("role", "user").put("content", "What is the weather like in Beijing today?")));
body.put("tools", tools);
body.put("thinking", new JSONObject().put("type", "disabled"));
// ... Send the request, parse tool_calls, execute the tool, and construct the second-round request
Go
body := map[string]interface{}{
"model": "deepseek-v4-flash",
"messages": []map[string]string{
{"role": "user", "content": "What is the weather like in Beijing today?"},
},
"tools": []map[string]interface{}{{
"type": "function",
"function": map[string]interface{}{
"name": "get_weather",
"description": "Obtain weather information for a specified city",
"parameters": map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{
"city": map[string]string{"type": "string", "description": "City name"},
},
"required": []string{"city"},
},
},
}},
"thinking": map[string]string{"type": "disabled"},
}
// ... Send the request, parse tool_calls, and construct the second-round request

Thinking Mode

DeepSeek models support controlling whether to enable the reasoning mode via the thinking parameter, eliminating the need to switch model IDs. After the reasoning mode is enabled, the model performs internal reasoning before providing the final answer, making it suitable for complex tasks that require precise reasoning.

thinking Parameter Description

| Field | Type | Default Value | Value Range | Description |
| --- | --- | --- | --- | --- |
| type | string | "enabled" | "enabled" / "disabled" | Controls the thinking mode switch. |
| reasoning_effort | string | "high" | "high" / "max" | Reasoning depth. max is suitable for complex Agent scenarios; low/medium maps to high, and xhigh maps to max. |
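The aliasing described for reasoning_effort can be mirrored on the client side, for example to log the effective setting. The mapping below restates the description above; normalize_effort and thinking_param are hypothetical helpers, and rejecting unknown values is a choice made for this sketch rather than documented API behavior.

```python
# Requested value -> value that the parameter description says is applied.
EFFORT_ALIASES = {
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}


def normalize_effort(value: str) -> str:
    """Map a requested reasoning_effort to its effective value."""
    try:
        return EFFORT_ALIASES[value.lower()]
    except KeyError:
        raise ValueError(f"unsupported reasoning_effort: {value!r}") from None


def thinking_param(enabled: bool = True, effort: str = "high") -> dict:
    """Build the `thinking` object; effort is attached only when thinking is on."""
    if not enabled:
        return {"type": "disabled"}
    return {"type": "enabled", "reasoning_effort": normalize_effort(effort)}
```

With the OpenAI Python SDK, the result would be passed as extra_body={"thinking": thinking_param(True, "max")}, in the same way as the examples in this guide pass the field.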

Enabling or Disabling Thinking

cURL
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "deepseek-v4-flash",
"messages": [
{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"}
],
"max_tokens": 2048,
"thinking": {"type": "enabled", "reasoning_effort": "high"}
}'
Python
from openai import OpenAI

client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"}],
max_tokens=2048,
extra_body={"thinking": {"type": "enabled", "reasoning_effort": "high"}},
)

msg = response.choices[0].message

# Obtain the reasoning process (a field exclusive to thinking mode)
reasoning = getattr(msg, "reasoning_content", None)
if reasoning:
    print("=== Reasoning Process ===")
    print(reasoning)

print("=== Final Answer ===")
print(msg.content)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages: [{ role: "user", content: "Solve the equation x^2 - 5x + 6 = 0" }],
max_tokens: 2048,
// @ts-ignore
thinking: { type: "enabled", reasoning_effort: "high" },
});

const msg = response.choices[0].message;
const reasoning = (msg as any).reasoning_content;
if (reasoning) {
console.log("=== Reasoning Process ===");
console.log(reasoning);
}
console.log("=== Final Answer ===");
console.log(msg.content);
Java
JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 2048);
body.put("messages", new JSONArray()
.put(new JSONObject().put("role", "user").put("content", "Solve the equation x^2 - 5x + 6 = 0")));
body.put("thinking", new JSONObject().put("type", "enabled").put("reasoning_effort", "high"));

// ... Send the request
try (Response response = httpClient.newCall(request).execute()) {
JSONObject result = new JSONObject(response.body().string());
JSONObject message = result.getJSONArray("choices")
.getJSONObject(0).getJSONObject("message");
String reasoning = message.optString("reasoning_content", "");
String content = message.getString("content");
System.out.println("Reasoning Process: " + reasoning);
System.out.println("Final Answer: " + content);
}
Go
body := map[string]interface{}{
"model": "deepseek-v4-flash",
"max_tokens": 2048,
"messages": []map[string]string{
{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"},
},
"thinking": map[string]string{"type": "enabled", "reasoning_effort": "high"},
}
// ... Send the request and parse the reasoning_content and content fields from the response.

Response Structure Examples

After thinking mode is enabled, the reasoning_content field is included in the response's message:
{
"choices": [{
"message": {
"role": "assistant",
"reasoning_content": "I need to solve the quadratic equation x^2 - 5x + 6 = 0.\nFactorization: (x-2)(x-3) = 0\nTherefore, x = 2 or x = 3.",
"content": "The solution to the equation x² - 5x + 6 = 0 is: **x = 2** or **x = 3**"
},
"finish_reason": "stop"
}],
"usage": {
"completion_tokens": 120,
"completion_tokens_details": {
"reasoning_tokens": 80
}
}
}
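To see how the usage block relates to the two text fields, the sketch below parses a response of the shape shown above and splits the completion tokens. It assumes that reasoning_tokens is already included in completion_tokens, as in the OpenAI usage schema; this guide does not state that explicitly. summarize is a hypothetical helper.

```python
import json


def summarize(raw_body: str) -> dict:
    """Split a thinking-mode response into reasoning, answer, and token counts."""
    data = json.loads(raw_body)
    message = data["choices"][0]["message"]
    usage = data.get("usage", {})
    total = usage.get("completion_tokens", 0)
    reasoning_tokens = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "reasoning": message.get("reasoning_content") or "",
        "answer": message.get("content") or "",
        "reasoning_tokens": reasoning_tokens,
        # Assumption: completion_tokens counts reasoning and answer tokens together.
        "answer_tokens": total - reasoning_tokens,
    }


body = json.dumps({
    "choices": [{
        "message": {
            "role": "assistant",
            "reasoning_content": "Factorization: (x-2)(x-3) = 0",
            "content": "x = 2 or x = 3",
        },
        "finish_reason": "stop",
    }],
    "usage": {
        "completion_tokens": 120,
        "completion_tokens_details": {"reasoning_tokens": 80},
    },
})
summary = summarize(body)
```

With the figures from the example response, 120 completion tokens minus 80 reasoning tokens leaves 40 for the visible answer under that assumption.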

Streaming Thinking Output

When streaming output is enabled, the reasoning_content and content are both returned in incremental delta format and must be processed separately:
Python
from openai import OpenAI

client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

stream = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[{"role": "user", "content": "Analyze the advantages and challenges of quantum computing."}],
max_tokens=2048,
stream=True,
extra_body={"thinking": {"type": "enabled"}},
)

print("=== Reasoning Process (Real-time) ===")
answer_started = False

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    reasoning_delta = getattr(delta, "reasoning_content", None)
    if reasoning_delta:
        print(reasoning_delta, end="", flush=True)

    if delta.content:
        if not answer_started:
            print("\n\n=== Final Answer (Real-time) ===")
            answer_started = True
        print(delta.content, end="", flush=True)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const stream = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages: [{ role: "user", content: "Analyze the advantages and challenges of quantum computing." }],
max_tokens: 2048,
stream: true,
// @ts-ignore
thinking: { type: "enabled" },
});

let answerStarted = false;
process.stdout.write("=== Reasoning Process (Real-time) ===\n");

for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta;
if (!delta) continue;

const reasoning = (delta as any).reasoning_content;
if (reasoning) process.stdout.write(reasoning);

if (delta.content) {
if (!answerStarted) {
process.stdout.write("\n\n=== Final Answer (Real-time) ===\n");
answerStarted = true;
}
process.stdout.write(delta.content);
}
}
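When the full reasoning and answer are needed after streaming finishes (for example, to store the answer in the conversation history), the two delta fields can be accumulated into separate buffers. The sketch below uses plain dictionaries in place of SDK chunk objects so that it runs offline; collect_stream is a hypothetical name.

```python
from typing import Iterable, Tuple


def collect_stream(deltas: Iterable[dict]) -> Tuple[str, str]:
    """Accumulate reasoning_content and content deltas into two strings."""
    reasoning_parts, answer_parts = [], []
    for delta in deltas:
        # Either field may be absent or None in any given chunk.
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)


reasoning, answer = collect_stream([
    {"reasoning_content": "Weigh speed-ups "},
    {"reasoning_content": "against error rates.", "content": None},
    {"content": "Quantum computing "},
    {"content": "is promising but noisy."},
])
```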

Using Thinking Mode in Multi-turn Conversations

In multi-turn conversations, you do not need to pass the reasoning_content from the previous round back to the model; only pass the content field.
Python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

messages = [{"role": "user", "content": "What is the 10th term of the Fibonacci sequence?"}]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    max_tokens=1024,
    extra_body={"thinking": {"type": "enabled"}},
)

assistant_msg = response.choices[0].message
print("First Round Answer:", assistant_msg.content)

# Multi-turn conversation: pass back only the content, not the reasoning_content.
messages.append({"role": "assistant", "content": assistant_msg.content})
messages.append({"role": "user", "content": "What about the 20th term?"})

response2 = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    max_tokens=1024,
    extra_body={"thinking": {"type": "enabled"}},
)
print("Second Round Answer:", response2.choices[0].message.content)
Note:
When writing back the assistant message in a multi-turn conversation, only pass the content field; do not pass the reasoning_content field.
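
One way to make this rule hard to violate is to route every assistant reply through a small function before appending it to the history. The following is a minimal sketch: to_history_message is a hypothetical helper, and the sample reply is made-up data shaped like a response message that carries a reasoning_content field.

```python
def to_history_message(assistant_message):
    """Return a history entry that keeps only the role and the content.

    Hypothetical helper: accepts either a dict or an SDK message object.
    """
    if isinstance(assistant_message, dict):
        content = assistant_message.get("content")
    else:
        content = getattr(assistant_message, "content", None)
    # reasoning_content is deliberately dropped here.
    return {"role": "assistant", "content": content or ""}


# Made-up first-round reply, used only for this offline demonstration.
first_reply = {
    "role": "assistant",
    "content": "The 10th term is 55.",
    "reasoning_content": "Counting from F(1) = 1 and F(2) = 1 up to F(10).",
}

history = [{"role": "user", "content": "What is the 10th term of the Fibonacci sequence?"}]
history.append(to_history_message(first_reply))
history.append({"role": "user", "content": "What about the 20th term?"})
print(history[1])
```

The resulting history list can be passed as the messages argument of the next request.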

JSON Mode

Setting response_format to json_object makes the model output a valid JSON string, which suits scenarios that require structured data.
Note:
When using JSON mode, you must explicitly instruct the model to output JSON in the system or user message; otherwise, the model may keep producing empty output.
cURL
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "Return the result in JSON format."},
      {"role": "user", "content": "Return information for three Chinese cities, each containing the name, province, and population fields."}
    ],
    "max_tokens": 512,
    "response_format": {"type": "json_object"},
    "thinking": {"type": "disabled"}
  }'

Python
import json
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Return the result in JSON format."},
        {
            "role": "user",
            "content": "Return information for three Chinese cities, each containing the name, province, and population fields.",
        },
    ],
    max_tokens=512,
    response_format={"type": "json_object"},
    extra_body={"thinking": {"type": "disabled"}},
)

# Parse the JSON string returned in the content field and pretty-print it.
result = json.loads(response.choices[0].message.content)
print(json.dumps(result, ensure_ascii=False, indent=2))

Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "system", content: "Return the result in JSON format." },
    {
      role: "user",
      content: "Return information for three Chinese cities, each containing the name, province, and population fields.",
    },
  ],
  max_tokens: 512,
  response_format: { type: "json_object" },
  // @ts-ignore
  thinking: { type: "disabled" },
});

// content is typed as string | null, so fall back to an empty object.
const result = JSON.parse(response.choices[0].message.content ?? "{}");
console.log(JSON.stringify(result, null, 2));

Java
// Assumes the org.json library (org.json.JSONObject, org.json.JSONArray).
JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 512);
body.put("response_format", new JSONObject().put("type", "json_object"));
body.put("thinking", new JSONObject().put("type", "disabled"));
body.put("messages", new JSONArray()
    .put(new JSONObject().put("role", "system").put("content", "Return the result in JSON format."))
    .put(new JSONObject().put("role", "user").put("content",
        "Return information for three Chinese cities, each containing the name, province, and population fields.")));
// ... Send the request and parse the returned JSON string.

Go
body := map[string]interface{}{
    "model":           "deepseek-v4-flash",
    "max_tokens":      512,
    "response_format": map[string]string{"type": "json_object"},
    "thinking":        map[string]string{"type": "disabled"},
    "messages": []map[string]string{
        {"role": "system", "content": "Return the result in JSON format."},
        {"role": "user", "content": "Return information for three Chinese cities, each containing the name, province, and population fields."},
    },
}
// ... Send the request
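
Because the note above warns that the model may produce empty output when the prompt never mentions JSON, it can help to check the prompt before sending and to parse the reply defensively. The following is a minimal sketch under stated assumptions: mentions_json and parse_json_reply are hypothetical helpers, the keyword check is a simple heuristic rather than a TokenHub requirement, and the handling of unparsable replies (for example, one cut off by max_tokens) is an assumption about how such cases might be treated.

```python
import json


def mentions_json(messages):
    """Return True if any system or user message contains the word 'json'."""
    return any(
        m.get("role") in ("system", "user")
        and "json" in str(m.get("content", "")).lower()
        for m in messages
    )


def parse_json_reply(content):
    """Parse the reply text; return None for empty or unparsable output."""
    if content is None or not content.strip():
        return None
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return None


demo_messages = [
    {"role": "system", "content": "Return the result in JSON format."},
    {"role": "user", "content": "Return information for three Chinese cities."},
]
print(mentions_json(demo_messages))
print(parse_json_reply('{"cities": []}'))
print(parse_json_reply("   "))
```

A caller could refuse to set response_format when mentions_json returns False, and retry or report an error when parse_json_reply returns None.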

Recommended Parameters and Best Practices

Parameter / Practice | Recommendation | Description
--- | --- | ---
max_tokens | 1024 to 4096 for general tasks; at least 2048 for thinking mode | Reasoning content and the answer share the token quota.
thinking | Use disabled for simple Q&A; use enabled for logical reasoning and math problems. | Proper use can reduce costs.
stream | Enable it for long text generation. | Avoids request timeouts and improves the response experience.
temperature | Generally, no modification is required. Use the default value of 1. | For creative writing, increase to 1.3-1.5; for code generation, decrease to 0.2-0.5.
Multi-turn Conversation | Pass back only content; do not pass back reasoning_content. | Reduces token consumption.
Access Reasoning Fields via SDK | Use getattr(msg, "reasoning_content", None) for Python; use (msg as any).reasoning_content for Node.js. | This field is not defined in the OpenAI SDK type definitions.
Model Selection | Use deepseek-v4-flash for daily tasks; use deepseek-v4-pro for high-precision tasks. | Flash has a higher concurrency limit (2500 vs 500).
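
The recommendations above can be collected into one place in application code. The following is a minimal sketch, not an official preset: recommended_params and its task names are hypothetical, and the concrete values are single points picked from the recommended ranges (for example, 0.2 from the 0.2-0.5 range for code generation).

```python
def recommended_params(task):
    """Map a task kind to request parameters based on the recommendations above.

    Hypothetical helper; the task names are illustrative, not API values.
    """
    params = {
        "model": "deepseek-v4-flash",  # daily tasks
        "max_tokens": 1024,
        "stream": False,
        "extra_body": {"thinking": {"type": "disabled"}},
    }
    if task in ("math", "logic"):
        # Reasoning and the answer share the quota, so allow at least 2048.
        params["extra_body"] = {"thinking": {"type": "enabled"}}
        params["max_tokens"] = 2048
        params["stream"] = True
    elif task == "code":
        params["temperature"] = 0.2
    elif task == "creative":
        params["temperature"] = 1.3
        params["max_tokens"] = 4096
        params["stream"] = True  # long text generation
    elif task != "simple_qa":
        raise ValueError(f"unknown task kind: {task}")
    return params


print(recommended_params("math"))
print(recommended_params("code"))
```

The returned dict is shaped so that it could be unpacked into client.chat.completions.create(messages=..., **params); temperature is left unset unless a task calls for a non-default value.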

Use Limits

Restriction Item | Description
--- | ---
Thinking Mode and JSON Mode | Enabling thinking.type=enabled and response_format.type=json_object at the same time is not recommended.
frequency_penalty / presence_penalty | Deprecated. Passing them has no effect.
Timeout risk | When thinking mode is enabled, the response time is longer. Use it with stream=true to avoid timeouts.
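
These limits can be checked on the client side before a request is sent. The following is a minimal sketch: check_request is a hypothetical helper that inspects a raw request body (the same shape as the cURL example, with thinking at the top level). It treats a missing thinking field as enabled, following the "Enabled by default" entry in the Supported Models table.

```python
def check_request(body):
    """Return (cleaned_body, warnings) for a raw chat completion request body.

    Hypothetical helper; it does not contact any service.
    """
    cleaned = dict(body)  # shallow copy so the caller's dict is not modified
    warnings = []

    # Deprecated parameters have no effect, so drop them.
    for name in ("frequency_penalty", "presence_penalty"):
        if name in cleaned:
            cleaned.pop(name)
            warnings.append(f"{name} is deprecated and has no effect; removed.")

    # A missing thinking field is assumed to mean the default (enabled).
    thinking_on = cleaned.get("thinking", {}).get("type", "enabled") == "enabled"
    json_mode = cleaned.get("response_format", {}).get("type") == "json_object"

    if thinking_on and json_mode:
        warnings.append("Thinking mode together with JSON mode is not recommended.")
    if thinking_on and not cleaned.get("stream", False):
        warnings.append("Thinking mode without stream=true may time out.")
    return cleaned, warnings


demo_body = {
    "model": "deepseek-v4-flash",
    "response_format": {"type": "json_object"},
    "thinking": {"type": "enabled"},
    "presence_penalty": 0.5,
}
demo_cleaned, demo_warnings = check_request(demo_body)
for line in demo_warnings:
    print(line)
```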

References

Language Model Invocation Overview: The general TokenHub guide to invoking language models, covering the Base URL, API Key, multi-turn conversations, Function Calling, the Anthropic protocol, and more.
