
Feature | Application in AI Intelligent Customer Service Scenarios |
RTC | Streaming transmission technology ensures the continuity and stability of voice and video data, reduces latency and jitter, and delivers a high-quality experience comparable to human customer service calls. Users can engage in more natural conversations with the intelligent customer service system, similar to talking with a real customer service agent. This interactive experience can significantly improve user satisfaction. |
Conversational AI | Tencent Conversational AI lets businesses connect flexibly to multiple large language models and build real-time audio and video interactions between AI and users. Backed by the global low-latency transmission of Tencent Real-Time Communication (Tencent RTC), Conversational AI achieves voice conversation latency as low as 1 second and delivers natural, realistic conversations. Integration is convenient and works out of the box. |
STT | The STT module captures the user's voice stream in real time, converts it into text, and then sends it to the LLM for processing. Leveraging TRTC's ultra-low-latency audio pipeline and advanced audio processing capabilities, including AI noise reduction and echo cancellation, the STT module delivers clear and accurate transcription even in noisy environments. |
LLM | LLM technology enables the intelligent voice customer service to understand conversational context and keep conversations coherent. An LLM can capture semantic and contextual information, identify user intent, and relate earlier parts of the conversation to the current turn. |
TTS | The integration of third-party TTS is supported. By introducing personalized training data to a model or adjusting model parameters, TTS can generate voice output that meets specific requirements. Intelligent voice customer service can offer different voice styles based on user preferences or the needs of specific scenarios. |



TRTCAppSceneAudioCall or the digital human video customer service: TRTCAppSceneVideoCall is recommended.
// Enable capture via microphone and set the mode to SPEECH mode (strong denoising capability and resistance to poor network conditions).
mCloud.startLocalAudio(TRTCCloudDef.TRTC_AUDIO_QUALITY_SPEECH);

self.trtcCloud = [TRTCCloud sharedInstance];
// Enable capture via microphone and set the mode to SPEECH mode (strong denoising capability and resistance to poor network conditions).
[self.trtcCloud startLocalAudio:TRTCAudioQualitySpeech];

// Enable capture via microphone and set the mode to SPEECH mode (strong denoising capability and resistance to poor network conditions).
trtcCloud.startLocalAudio(TRTCAudioQuality.speech);

await trtc.startLocalAudio({ option: { profile: TRTC.TYPE.AUDIO_PROFILE_STANDARD }});

// Enable capture via microphone and set the mode to SPEECH mode.
// Provide strong denoising capability and resistance to poor network conditions.
ITRTCCloud* trtcCloud = CRTCWindowsApp::GetInstance()->trtc_cloud_;
trtcCloud->startLocalAudio(TRTCAudioQualitySpeech);

// Enable capture via microphone and set the mode to SPEECH mode.
// Provide strong denoising capability and resistance to poor network conditions.
AppDelegate *appDelegate = (AppDelegate *)[[NSApplication sharedApplication] delegate];
[appDelegate.trtcCloud startLocalAudio:TRTCAudioQualitySpeech];
STTConfig, LLMConfig, and TTSConfig with the STT/LLM/TTS-related information from the Prerequisites.
STTConfig by using Tencent ASR as the STT engine.
{
  "Language": "zh",
  "VadSilenceTime": 1000
}
LLMConfig by using an LLM model that follows the OpenAI standard protocol as an example.
Field | Type | Required | Description |
LLMType | String | Yes | The LLM type. For any LLM that complies with the OpenAI API protocol, enter openai. |
Model | String | Yes | Specific LLM name. For example, gpt-4o and deepseek-chat. |
APIKey | String | Yes | The APIKey for the LLM. |
APIUrl | String | Yes | The APIUrl for the LLM. |
Streaming | Boolean | No | Whether streaming is enabled. The default value is false. It is recommended to set it to true. |
SystemPrompt | String | No | System prompt. |
Timeout | Float | No | Timeout period, in seconds. Value range: [1, 50]. Default value: 3. |
History | Integer | No | Set the context rounds for LLM. Default value: 0 (No context management is provided). Maximum value: 50 (Context management is provided for the most recent 50 rounds). |
MaxTokens | Integer | No | Maximum token limit for output text. |
Temperature | Float | No | Sampling temperature. |
TopP | Float | No | Selection range for sampling. This parameter controls the diversity of output tokens. |
UserMessages | Object[] | No | User prompt. |
MetaInfo | Object | No | Custom parameters. These parameters will be contained in the request body and passed to the LLM. |
{
  "LLMType": "openai",
  "Model": "gpt-4o",
  "APIKey": "api-key",
  "APIUrl": "https://api.openai.com/v1/chat/completions",
  "Streaming": true,
  "SystemPrompt": "You are a personal assistant",
  "Timeout": 3.0,
  "History": 5,
  "MetaInfo": {},
  "MaxTokens": 4096,
  "Temperature": 0.8,
  "TopP": 0.8,
  "UserMessages": [
    {"Role": "user", "Content": "content"},
    {"Role": "assistant", "Content": "content"}
  ]
}
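The constraints in the LLMConfig table can be checked before the request is sent. The following is a minimal sketch, assuming a hypothetical helper named `buildLLMConfig` (it is not part of any TRTC SDK). It checks the required fields and the documented ranges for `Timeout` and `History`, then serializes the result to a JSON string.

```javascript
// Hypothetical helper (not a TRTC API): validates an LLMConfig object
// against the constraints listed in the table and returns a JSON string.
function buildLLMConfig(fields) {
  const required = ["LLMType", "Model", "APIKey", "APIUrl"];
  for (const name of required) {
    if (typeof fields[name] !== "string" || fields[name].length === 0) {
      throw new Error(`LLMConfig.${name} is required and must be a non-empty string`);
    }
  }
  // The table lists Streaming as defaulting to false and recommends true,
  // so this sketch enables it unless the caller overrides it.
  const config = { Streaming: true, ...fields };
  if (config.Timeout !== undefined && (config.Timeout < 1 || config.Timeout > 50)) {
    throw new RangeError("LLMConfig.Timeout must be within [1, 50] seconds");
  }
  if (config.History !== undefined) {
    const h = config.History;
    if (!Number.isInteger(h) || h < 0 || h > 50) {
      throw new RangeError("LLMConfig.History must be an integer within [0, 50]");
    }
  }
  return JSON.stringify(config);
}

const llmConfig = buildLLMConfig({
  LLMType: "openai",
  Model: "gpt-4o",
  APIKey: "api-key", // placeholder value
  APIUrl: "https://api.openai.com/v1/chat/completions",
  SystemPrompt: "You are a personal assistant",
  Timeout: 3.0,
  History: 5,
});
console.log(llmConfig);
```

Validating on the business backend surfaces a misconfigured field before the conversation is started, rather than as a failed LLM call later.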
TTSConfig by using the built-in TTS in TRTC as an example.
Field | Type | Required | Description |
TTSType | String | Yes | Fixed value: "flow". |
VoiceId | String | Yes | |
Model | String | Yes | TTS model version. Current default: flow_01_turbo. |
Speed | Float | No | Speech rate. Range: [0.5, 2.0]. Default value: 1.0. |
Volume | Float | No | Volume. Range: [0, 10]. Default value: 1.0. |
Pitch | Integer | No | Pitch adjustment. Range: [-12, 12]. Default value: 0. |
Language | String | No | Language ID: zh (Chinese), en (English), yue (Cantonese). |
{
  "TTSType": "flow",
  "VoiceId": "v-female-R2s4N9qJ",
  "Model": "flow_01_turbo",
  "Speed": 1.0,
  "Volume": 1.0,
  "Pitch": 0,
  "Language": "zh"
}
STTConfig, LLMConfig, and TTSConfig.
RoomId must match the RoomId used by the client to enter the room, and the room ID type (numeric or string) must also be the same. This means the bot and the user must be in the same room.
TargetUserId must match the UserId used by the client user to enter the room.
LLMConfig and TTSConfig are JSON strings and should be properly configured before you can successfully initiate a real-time AI conversation.
{
  "type": 10000, // 10000 indicates the delivery of real-time subtitles.
  "sender": "user_a", // The user ID of the speaker.
  "receiver": [], // List of receiver user IDs. This message is actually broadcast within the room.
  "payload": {
    "text": "", // The text recognized by Automatic Speech Recognition (ASR).
    "translation_text": "", // The translated text.
    "start_time": "00:00:01", // The start time of this sentence.
    "end_time": "00:00:02", // The end time of this sentence.
    "roundid": "xxxxx", // Unique identifier of a conversation round.
    "end": true // If true, it indicates this is a complete sentence.
  }
}
{
  "type": 10001, // The status of the AI chatbot.
  "sender": "user_a", // The user ID of the sender, which represents the chatbot's ID in this case.
  "receiver": [], // List of receiver user IDs. This message is actually broadcast within the room.
  "payload": {
    "roundid": "xxx", // A unique identifier for a single conversation round.
    "timestamp": 123,
    "state": 1 // 1: Listening, 2: Thinking, 3: Speaking, 4: Interrupted
  }
}
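The numeric `state` values in the type 10001 message can be mapped to readable labels on the client. The following is a minimal sketch, assuming a hypothetical helper named `describeBotState` and a simulated message in place of one delivered by the SDK.

```javascript
// State codes as listed in the type 10001 payload comment.
const AI_STATES = { 1: "Listening", 2: "Thinking", 3: "Speaking", 4: "Interrupted" };

// Hypothetical helper: decodes the raw bytes of a custom message and
// returns a readable description, or null for other message types.
function describeBotState(bytes) {
  const msg = JSON.parse(new TextDecoder().decode(bytes));
  if (msg.type !== 10001) return null;
  const label = AI_STATES[msg.payload.state] ?? "Unknown";
  return { botId: msg.sender, roundId: msg.payload.roundid, state: label };
}

// Simulated message; real messages arrive through the SDK's custom message callback.
const sample = new TextEncoder().encode(JSON.stringify({
  type: 10001,
  sender: "user_a",
  receiver: [],
  payload: { roundid: "xxx", timestamp: 123, state: 3 },
}));
console.log(describeBotState(sample));
```

A UI could use the returned label to show, for example, a "thinking" indicator while the LLM is generating its reply.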
@Override
public void onRecvCustomCmdMsg(String userId, int cmdID, int seq, byte[] message) {
    String data = new String(message, StandardCharsets.UTF_8);
    try {
        JSONObject jsonData = new JSONObject(data);
        Log.i(TAG, String.format("receive custom msg from %s cmdId: %d seq: %d data: %s", userId, cmdID, seq, data));
    } catch (JSONException e) {
        Log.e(TAG, "onRecvCustomCmdMsg err");
        throw new RuntimeException(e);
    }
}

func onRecvCustomCmdMsgUserId(_ userId: String, cmdID: Int, seq: UInt32, message: Data) {
    if cmdID == 1 {
        do {
            if let jsonObject = try JSONSerialization.jsonObject(with: message, options: []) as? [String: Any] {
                print("Dictionary: \(jsonObject)")
            } else {
                print("The data is not a dictionary.")
            }
        } catch {
            print("Error parsing JSON: \(error)")
        }
    }
}

trtcClient.on(TRTC.EVENT.CUSTOM_MESSAGE, (event) => {
    let data = new TextDecoder().decode(event.data);
    let jsonData = JSON.parse(data);
    console.log(`receive custom msg from ${event.userId} cmdId: ${event.cmdId} seq: ${event.seq} data: ${data}`);
    if (jsonData.type == 10000 && jsonData.payload.end == false) {
        // Subtitle intermediate state.
    } else if (jsonData.type == 10000 && jsonData.payload.end == true) {
        // That is all for this sentence.
    }
});

void onRecvCustomCmdMsg(const char* userId, int cmdID, int seq,
                        const uint8_t* message, uint32_t msgLen) {
    std::string data;
    if (message != nullptr && msgLen > 0) {
        data.assign(reinterpret_cast<const char*>(message), msgLen);
    }
    if (cmdID == 1) {
        try {
            auto j = nlohmann::json::parse(data);
            std::cout << "Dictionary: " << j.dump() << std::endl;
        } catch (const std::exception& e) {
            std::cerr << "Error parsing JSON: " << e.what() << std::endl;
        }
        return;
    }
}

void onRecvCustomCmdMsg(String userId, int cmdID, int seq, String message) {
    if (cmdID == 1) {
        try {
            final decoded = json.decode(message);
            if (decoded is Map<String, dynamic>) {
                print('Dictionary: $decoded');
            } else {
                print('The data is not a dictionary. Raw: $decoded');
            }
        } catch (e) {
            print('Error parsing JSON: $e');
        }
        return;
    }
}
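The callbacks above only log what they receive. To render subtitles, a client also has to tell intermediate results from completed sentences by using the `end` flag. The following is a minimal sketch, assuming a hypothetical `SubtitleTracker` class and assuming that each intermediate message carries the full text recognized so far for that sentence (the source does not state whether intermediate text is cumulative).

```javascript
// Hypothetical helper: tracks type 10000 subtitle messages per speaker and round.
class SubtitleTracker {
  constructor() {
    this.partials = new Map(); // "sender:roundid" -> latest intermediate text
    this.sentences = [];       // completed sentences, in arrival order
  }

  handle(msg) {
    if (msg.type !== 10000) return; // Ignore non-subtitle messages.
    const key = `${msg.sender}:${msg.payload.roundid}`;
    if (msg.payload.end) {
      // Final result: drop the partial and keep the completed sentence.
      this.partials.delete(key);
      this.sentences.push({ sender: msg.sender, text: msg.payload.text });
    } else {
      // Assumption: the newest intermediate text replaces the previous one.
      this.partials.set(key, msg.payload.text);
    }
  }
}

const tracker = new SubtitleTracker();
tracker.handle({ type: 10000, sender: "user_a", payload: { text: "I want", roundid: "r1", end: false } });
tracker.handle({ type: 10000, sender: "user_a", payload: { text: "I want to return", roundid: "r1", end: false } });
tracker.handle({ type: 10000, sender: "user_a", payload: { text: "I want to return the earbuds.", roundid: "r1", end: true } });
console.log(tracker.sentences);
```

In a Web client, `handle` would be called with the parsed JSON inside the custom message event handler.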
type | Description |
20000 | Send custom text, skip the ASR process, and directly communicate with the AI Service via text. |
20001 | Send an interruption signal to interrupt. |
{
  "type": 20000, // Custom text message sent by the client.
  "sender": "user_a", // Sender userid. The server will check whether the userid is valid.
  "receiver": ["user_bot"], // List of receiver userid. Fill in the chatbot userid. The server will check whether the userid is valid.
  "payload": {
    "id": "uuid", // Message ID. You can use a UUID. The ID is used for troubleshooting.
    "message": "xxx", // Message content.
    "timestamp": 123 // Timestamp, used for troubleshooting.
  }
}
{
  "type": 20001, // Interruption signal sent by the client.
  "sender": "user_a", // Sender userid. The server will check whether the userid is valid.
  "receiver": ["user_bot"], // List of receiver userid. Fill in the chatbot userid. The server will check whether the userid is valid.
  "payload": {
    "id": "uuid", // Message ID. You can use a UUID. The ID is used for troubleshooting.
    "timestamp": 123 // Timestamp, used for troubleshooting.
  }
}
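The two client-to-server message shapes above can be produced by small builder functions. The following is a minimal sketch. The function names are hypothetical, the message ID is supplied by the caller, and the timestamp unit (milliseconds here) is an assumption because the source only shows a placeholder value.

```javascript
// Hypothetical builders for the type 20000 and 20001 messages shown above.
function buildTextMessage(senderId, botUserId, text, messageId, timestamp = Date.now()) {
  return {
    type: 20000,
    sender: senderId,
    receiver: [botUserId],
    payload: { id: messageId, message: text, timestamp },
  };
}

function buildInterruptMessage(senderId, botUserId, messageId, timestamp = Date.now()) {
  return {
    type: 20001,
    sender: senderId,
    receiver: [botUserId],
    payload: { id: messageId, timestamp },
  };
}

// Custom messages travel as bytes, so serialize the object to UTF-8 JSON.
function encodeMessage(message) {
  return new TextEncoder().encode(JSON.stringify(message));
}

const textBytes = encodeMessage(
  buildTextMessage("user_a", "user_bot", "Where is my order?", "msg-001", 123)
);
const interruptBytes = encodeMessage(buildInterruptMessage("user_a", "user_bot", "msg-002", 123));
console.log(new TextDecoder().decode(textBytes));
```

The resulting bytes would then be handed to the SDK's custom message sending API for your platform.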
appkey, accesstoken, and virtualmanProjectId.
AvatarConfig parameter. A sample JSON is as follows:
{
  "AvatarType": "tencent", // Digital human type. Currently, only tencent is supported.
  "Appkey": "appkey", // The appkey for the digital human service.
  "AccessToken": "accesstoken", // The accesstoken for the digital human service.
  "VirtualmanProjectId": "virtualmanProjectId", // The virtualmanProjectId for the digital human service.
  "AvatarUserID": "robot_xxxx", // The user ID for the TRTC digital human user.
  "DriverType": 1, // Digital human driving method (text-driven only).
  "AvatarUserSig": "eJw1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" // The user signature for the TRTC digital human user.
}
trtc.on(TRTC.EVENT.REMOTE_VIDEO_AVAILABLE, ({ userId, streamType }) => {
    // To play the video image, you need to place an HTMLElement in the DOM,
    // which can be a div tag, assuming its id is `${userId}_${streamType}`
    const view = `${userId}_${streamType}`;
    trtc.startRemoteVideo({ userId, streamType, view });
});

onUserVideoAvailable(userId, true) notification, it indicates that playable video frames for that video stream have arrived.
startRemoteView API.
// Play the main video view of the remote user robot_xxxx.
trtcCloud.startRemoteView("robot_xxxx", TRTCVideoStreamType.big, remoteViewId);

onUserVideoAvailable(userId, true) notification, it indicates that playable video frames for that video stream have arrived.
startRemoteView to play the remote video view.
// Play the remote video view.
TXCloudVideoView cameraVideo = findViewById(R.id.txcvv_main_local);
mCloud.startRemoteView("robot_xxxx", TRTCCloudDef.TRTC_VIDEO_STREAM_TYPE_BIG, cameraVideo); // Play the remote video content in high-definition full-screen view.

onUserVideoAvailable(userId, YES) notification, it indicates that playable video frames for that video stream have arrived.
startRemoteView to play the remote video view.
- (void)startRemoteView {
    // Play the remote video view.
    AppDelegate *appDelegate = (AppDelegate *)[[UIApplication sharedApplication] delegate];
    [appDelegate.trtcCloud startRemoteView:@"robot_xxxx" streamType:TRTCVideoStreamTypeBig view:self.remoteVideoView];
}
{
  "type": "function",
  "function": {
    "name": "transfer_to_agent",
    "description": "This function is invoked when a user explicitly requests a human agent, when a problem exceeds the AI's capabilities, or when it involves complaints, refunds, or disputes. It is not invoked for casual chat or complaints.",
    "parameters": {
      "type": "object",
      "properties": {
        "reason": { "type": "string", "description": "Reason for transferring to a human agent" },
        "department": { "type": "string", "enum": ["After-sales", "Technical", "Complaint"], "description": "Target skill group" },
        "urgency": { "type": "string", "enum": ["low", "high"], "description": "Urgency level" }
      },
      "required": ["reason"]
    }
  }
}
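When the model decides to call `transfer_to_agent`, an OpenAI-compatible response carries the arguments as a JSON string that the business backend should validate against the schema above. The following is a minimal sketch, assuming a hypothetical `parseTransferCall` helper. How the transfer itself is carried out is outside its scope.

```javascript
// Allowed enum values, copied from the function definition above.
const DEPARTMENTS = ["After-sales", "Technical", "Complaint"];
const URGENCIES = ["low", "high"];

// Hypothetical helper: validates one tool call in the OpenAI Chat Completions
// shape ({ function: { name, arguments: "<JSON string>" } }).
function parseTransferCall(toolCall) {
  if (toolCall.function.name !== "transfer_to_agent") return null;
  const args = JSON.parse(toolCall.function.arguments);
  if (typeof args.reason !== "string" || args.reason.trim() === "") {
    throw new Error("transfer_to_agent: 'reason' is required");
  }
  if (args.department !== undefined && !DEPARTMENTS.includes(args.department)) {
    throw new Error(`transfer_to_agent: unknown department '${args.department}'`);
  }
  if (args.urgency !== undefined && !URGENCIES.includes(args.urgency)) {
    throw new Error(`transfer_to_agent: unknown urgency '${args.urgency}'`);
  }
  // Optional fields are returned as null when the model omitted them.
  return {
    reason: args.reason,
    department: args.department ?? null,
    urgency: args.urgency ?? null,
  };
}

const transfer = parseTransferCall({
  id: "call_1", // placeholder ID
  type: "function",
  function: {
    name: "transfer_to_agent",
    arguments: '{"reason":"User disputes a refund","department":"After-sales","urgency":"high"}',
  },
});
console.log(transfer);
```

Rejecting values outside the declared enums guards against the model inventing a skill group that the routing system does not know.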
LLMConfig field of the StartAIConversation API. Therefore, the essence of "injecting a custom knowledge base/implementing RAG" is to connect a backend or platform with search capabilities at the LLM stage. TRTC automatically injects the following HTTP request headers into each LLM request, which can be used by the business backend for user-level knowledge base routing, authentication, or logging.
Request header | Description |
X-Task-Id | The unique task identifier for the current AI session. |
X-Request-Id | The request identifier, which remains consistent when the same request is retried. |
X-Sdk-App-Id | The SdkAppId of your TRTC application. |
X-User-Id | The user ID in the current session. |
X-Room-Id | The room ID of the current TRTC session. |
X-Room-Id-Type | The room ID type. "0" = numeric, "1" = string. |
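A business backend can read these headers to decide which knowledge base to query. The following is a minimal sketch, assuming a headers object with lowercase keys (as Node.js exposes incoming HTTP headers) and a hypothetical knowledge base naming scheme. Both helper names are illustrative.

```javascript
// Hypothetical helper: extracts the TRTC-injected headers from a request.
function readTrtcContext(headers) {
  const get = (name) => headers[name.toLowerCase()];
  const roomIdType = get("X-Room-Id-Type");
  return {
    taskId: get("X-Task-Id"),
    requestId: get("X-Request-Id"),
    sdkAppId: get("X-Sdk-App-Id"),
    userId: get("X-User-Id"),
    roomId: get("X-Room-Id"),
    // "0" = numeric room ID, "1" = string room ID (per the table above).
    roomIdKind: roomIdType === "0" ? "numeric" : roomIdType === "1" ? "string" : "unknown",
  };
}

// Hypothetical routing rule: one knowledge base per application and user.
function knowledgeBaseFor(context) {
  return `kb-${context.sdkAppId}-${context.userId}`;
}

const context = readTrtcContext({
  "x-task-id": "task-1",
  "x-request-id": "req-1",
  "x-sdk-app-id": "1400000000",
  "x-user-id": "user_a",
  "x-room-id": "8888",
  "x-room-id-type": "0",
});
console.log(context.roomIdKind, knowledgeBaseFor(context));
```

Because `X-Request-Id` stays the same across retries, it can also serve as a deduplication key in logs.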
Implementation Path | LLMType | RAG/Knowledge Base Location | Transformation Cost | Scenarios |
Self-built OpenAI-compatible middleware layer | openai | business backend (self-hosted vector database/ES) | High | Private deployment, full control, complex RAG |
dify | Dify platform knowledge base | Low | Visual low-code RAG, rapid setup | |
coze | Coze platform knowledge base | Low | No-code bots, built-in plugins |
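For the self-built middleware path in the table above, the step that retrieves knowledge and attaches it to the conversation might look like the following. This is a minimal sketch under stated assumptions: a toy keyword-overlap retriever stands in for a real vector database or Elasticsearch query, and the helper names and sample documents are hypothetical.

```javascript
// Splits text into lowercase alphanumeric terms.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Toy retriever: ranks documents by how many query terms they contain.
function searchKnowledgeBase(question, documents, topK = 2) {
  const queryTerms = new Set(tokenize(question));
  return documents
    .map((doc) => ({ doc, score: tokenize(doc).filter((t) => queryTerms.has(t)).length }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((hit) => hit.doc);
}

// Prepends retrieved passages as a system message before the request is
// forwarded to the real model. Messages use the OpenAI role/content shape.
function augmentMessages(messages, documents) {
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  if (!lastUser) return messages;
  const hits = searchKnowledgeBase(lastUser.content, documents);
  if (hits.length === 0) return messages;
  const reference = hits.map((h, i) => `${i + 1}. ${h}`).join("\n");
  return [
    { role: "system", content: `Answer using the following reference material:\n${reference}` },
    ...messages,
  ];
}

const documents = [
  "Refunds are returned within 1-3 business days after the item is received.",
  "Bluetooth earbuds can be re-paired from the settings menu.",
];
const augmented = augmentMessages([{ role: "user", content: "When are refunds returned?" }], documents);
console.log(augmented);
```

The augmented message list would then be sent to the actual model, and the streamed response relayed back unchanged.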
/v1/chat/completions API that complies with OpenAI specifications, and TRTC calls it as a regular OpenAI model. The sequence of "searching the knowledge base → concatenating context → calling the real large model → streaming return" is completed within the business API, making the RAG logic fully self-managed and controllable.
LLMConfig.
LLMConfig.
LLMConfig field in StartAIConversation and can be used in combination. The following section introduces the three context-related fields in LLMConfig, which constitute the three-layer memory system.
Field | Memory Level | Content | Precision | Time Span | Management Party |
SystemPrompt | Long-term memory | Basic Persona + LLM Summary of User Long-Term Preferences | Medium (Summary) | Long-term | The business side maintains and concatenates summaries. |
UserMessages | Short-term memory | Original text of the last N external historical messages | High (Original Text) | Short-term | The business side injects it when starting a conversation. |
History | In-call memory | Multiple dialogue turns during the current TRTC call | High (Original Text) | Current call | TRTC automatically manages it, with a maximum of 50 rounds. |
SystemPrompt, the user's recent inquiry original text is injected into UserMessages, and History ensures multi-turn coherence within the current voice call.
{
  "LLMType": "openai",
  "Model": "gpt-5.5",
  "APIKey": "<your_openai_api_key>",
  "APIUrl": "https://api.openai.com/v1/chat/completions",
  "Streaming": true,
  "SystemPrompt": "You are an AI customer service assistant for an e-commerce platform, responsible for order inquiries, returns and exchanges, logistics tracking, and product consultations. Provide concise and friendly responses.\n\n[User Long-term Memory Summary]\n- User nickname: Xiao Ming, Black Card member, prefers concise and direct replies.\n- Frequent purchase categories: Digital 3C products, no history of return disputes.\n- In the last call, the user inquired about 'Bluetooth earbud right side has no sound', and re-pairing was suggested.",
  "Timeout": 3.0,
  "History": 10,
  "UserMessages": [
    { "Role": "user", "Content": "I want to return the earbuds I bought last week." },
    { "Role": "assistant", "Content": "Okay. Your order NO.20260528001 (Bluetooth earbuds) is within the 7-day no-reason return period, so a return can be processed for you." },
    { "Role": "user", "Content": "How long does it take for the refund to be processed?" },
    { "Role": "assistant", "Content": "The refund will be returned to your original payment account within 1-3 business days after the returned item is received." }
  ]
}
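The three memory layers can be assembled on the business side when the conversation starts. The following is a minimal sketch, assuming a hypothetical `buildMemoryFields` helper and hypothetical input names (`persona`, `longTermSummary`, `recentMessages`). It returns only the three context-related fields, which would be merged into the rest of LLMConfig.

```javascript
// Hypothetical helper: builds the three context-related LLMConfig fields.
function buildMemoryFields({ persona, longTermSummary, recentMessages, maxRecent = 4, historyRounds = 10 }) {
  // Long-term memory: persona plus a bulleted summary of user preferences.
  const summaryLines = longTermSummary.map((line) => `- ${line}`).join("\n");
  const SystemPrompt = `${persona}\n\n[User Long-term Memory Summary]\n${summaryLines}`;

  // Short-term memory: original text of the most recent external messages.
  const UserMessages = recentMessages
    .slice(-maxRecent)
    .map(({ role, content }) => ({ Role: role, Content: content }));

  // In-call memory: TRTC manages it; the documented maximum is 50 rounds.
  const History = Math.max(0, Math.min(50, Math.trunc(historyRounds)));

  return { SystemPrompt, UserMessages, History };
}

const memory = buildMemoryFields({
  persona: "You are an AI customer service assistant for an e-commerce platform.",
  longTermSummary: ["Prefers concise and direct replies.", "No history of return disputes."],
  recentMessages: [
    { role: "user", content: "I want to return the earbuds I bought last week." },
    { role: "assistant", content: "A return can be processed for you." },
    { role: "user", content: "How long does it take for the refund to be processed?" },
  ],
  maxRecent: 2,
  historyRounds: 80,
});
console.log(JSON.stringify(memory, null, 2));
```

Capping `UserMessages` with `maxRecent` keeps the per-request token cost bounded, since every message injected here is sent to the LLM on each call.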
UserMessages and the long-term summary data from SystemPrompt?
UserMessages results in greater Token consumption and longer processing time for each LLM invocation. Therefore, you should configure it appropriately based on actual requirements and cost considerations.
SystemPrompt summary within 300 Tokens to prevent excessive length from occupying the conversation context window and affecting the LLM's first Token latency.
AgentConfig.InterruptSpeechDuration and STTConfig.VadSilenceTime parameters in the Start AI Conversation API to increase or decrease the interruption latency. It is also recommended to enable the far-field voice suppression capability to reduce the probability of false interruptions.
STTConfig.VadLevel parameter to 2 or 3, which provides effective far-field voice suppression.
Parameter | Type | Description |
AgentConfig.InterruptSpeechDuration | Integer | Used when InterruptMode is 0. Unit: millisecond. Default value: 500. The server interrupts once it detects continuous human speech lasting for the InterruptSpeechDuration period. Example value: 500 |
STTConfig.VadSilenceTime | Integer | ASR VAD time. Range: [240, 2000]. Default value: 1000. Unit: ms. A smaller value makes ASR sentence segmentation faster. Example value: 1000 |
STTConfig.VadLevel | Integer | The far-field human voice suppression capability of VAD (which does not affect ASR recognition performance). Range: [0, 5]. Default value: 0, which means the far-field human voice suppression capability is not enabled. A value of 2 is recommended for good far-field human voice suppression. In a noisy office environment, a value of 3 can be used. In even noisier environments, values of 4 or 5 can be used. Note that a higher VadLevel may filter out single words as noise. Example value: 2 |
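The recommendations in the table can be turned into a small lookup so that VAD settings are chosen consistently. The following is a minimal sketch, assuming hypothetical environment labels (`normal`, `noisy_office`, `very_noisy`) and a hypothetical `buildVadSettings` helper. For "even noisier" environments the table allows 4 or 5, and this sketch picks 4.

```javascript
// Hypothetical environment labels mapped to the VadLevel values
// recommended in the table above.
const VAD_LEVELS = new Map([
  ["normal", 2],
  ["noisy_office", 3],
  ["very_noisy", 4],
]);

// Returns the STTConfig fragment for the given environment, clamping
// VadSilenceTime to the documented range [240, 2000] ms.
function buildVadSettings(environment, silenceMs = 1000) {
  if (!VAD_LEVELS.has(environment)) {
    throw new Error(`Unknown environment '${environment}'`);
  }
  const VadSilenceTime = Math.min(2000, Math.max(240, Math.round(silenceMs)));
  return { VadLevel: VAD_LEVELS.get(environment), VadSilenceTime };
}

console.log(buildVadSettings("noisy_office", 600));
console.log(buildVadSettings("normal", 100)); // silence time is raised to 240
```

Because a higher `VadLevel` may filter out single words as noise, it is worth starting at the lowest level that suppresses background voices in your environment.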
metainfo to the LLM's returned content. After the AI service detects the metainfo, it will push the data to the client SDK via Custom Message, thereby completing the transparent transmission of the metainfo.
chat.completion.chunk objects, a meta.info chunk is returned at the same time.
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-xxxx", "system_fingerprint": "fp_xxxx", "choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-xxxx", "system_fingerprint": "fp_xxxx", "choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}
// Add the following custom message.
{"id":"chatcmpl-123","type":"meta.info","created":1694268190,"metainfo": {}}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-xxxx", "system_fingerprint": "fp_xxxx", "choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
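On a self-built OpenAI-compatible backend, the `meta.info` chunk shown above can be produced alongside the regular chunks. The following is a minimal sketch, assuming hypothetical helper names and the `data:` line framing used by OpenAI-style streaming responses. The `metainfo` content (`intent`) is an illustrative placeholder.

```javascript
// Hypothetical helper: builds the custom chunk in the shape shown above.
function metaInfoChunk(completionId, created, metainfo) {
  return { id: completionId, type: "meta.info", created, metainfo };
}

// Serializes one chunk as a server-sent event, as OpenAI-style streams do.
function toSseEvent(chunk) {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

const contentChunk = {
  id: "chatcmpl-123",
  object: "chat.completion.chunk",
  created: 1694268190,
  model: "gpt-xxxx",
  choices: [{ index: 0, delta: { content: "Hello" }, logprobs: null, finish_reason: null }],
};

// The metainfo body is free-form; "intent" is only an example key.
const events = [
  toSseEvent(contentChunk),
  toSseEvent(metaInfoChunk("chatcmpl-123", 1694268190, { intent: "refund" })),
];
process.stdout.write(events.join(""));
```

Reusing the same `id` and `created` values as the surrounding chunks keeps the custom chunk associated with the same completion.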
metainfo, it will distribute the data via the Custom Message feature of RTC Engine. The client can receive the data through the onRecvCustomCmdMsg API in the SDK callback.
{
  "type": 10002, // Custom message.
  "sender": "user_a", // The user ID of the sender, which represents the chatbot's ID in this case.
  "receiver": [], // List of receiver userid. The message is actually broadcast within the room.
  "roundid": "xxxxxx",
  "payload": {} // metainfo
}
RoomId in StartAIConversation matches the RoomId used by the client to enter the room, and that the room ID type (RoomIdType) also matches.
LLMConfig and TTSConfig is correct.
SecretId / SecretKey) are valid and that you have completed the authorization for the QcloudTRTCFullAccess full read/write access permission.
Service Category | Error Code | Error Description |
STT(ASR) | 30100 | Requests timed out. |
| 30102 | Internal error. |
LLM | 30200 | The LLM request timed out. |
| 30201 | The LLM request was rate-limited. |
| 30202 | The LLM service returned a failure. |
TTS | 30300 | The TTS service request timed out. |
| 30301 | The TTS request was rate-limited. |
| 30302 | The TTS service returned a failure. |
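For logging or alerting, the error codes in the table can be kept in a lookup structure. The following is a minimal sketch, assuming a hypothetical `explainErrorCode` helper. The descriptions are copied from the table, and codes not listed there are reported as unknown.

```javascript
// Error codes and descriptions copied from the table above.
const AI_SERVICE_ERRORS = new Map([
  [30100, { service: "STT (ASR)", description: "Requests timed out." }],
  [30102, { service: "STT (ASR)", description: "Internal error." }],
  [30200, { service: "LLM", description: "The LLM request timed out." }],
  [30201, { service: "LLM", description: "The LLM request was rate-limited." }],
  [30202, { service: "LLM", description: "The LLM service returned a failure." }],
  [30300, { service: "TTS", description: "The TTS service request timed out." }],
  [30301, { service: "TTS", description: "The TTS request was rate-limited." }],
  [30302, { service: "TTS", description: "The TTS service returned a failure." }],
]);

// Hypothetical helper: returns a one-line explanation for a given code.
function explainErrorCode(code) {
  const known = AI_SERVICE_ERRORS.get(code);
  if (!known) return `Unknown error code ${code}`;
  return `${known.service}: ${known.description}`;
}

console.log(explainErrorCode(30200));
console.log(explainErrorCode(39999));
```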
llm error Timeout on reading data from socket, it usually indicates that the LLM request has timed out. You can appropriately increase the value of the Timeout parameter in LLMConfig (the default is 3 seconds). In addition, when the first-token duration of the LLM exceeds 3 seconds, the relatively high conversation latency may impact the AI conversation experience. If there are no special requirements, we recommend optimizing the first-token duration of the LLM. See Conversation Latency Optimization.
AgentConfig.FilterOneWord parameter in the Start AI Conversation API is set to false (the default is true, which filters out sentences where the user only says one word).
Parameter | Type | Description |
FilterOneWord | Boolean | Whether to filter out sentences where the user spoke only one word. true indicates filtering, false indicates no filtering. The default value is true. Example value: true. |
onError callback. Common errors are listed in the following table:
Error | Error Code | Error Description |
ERR_TRTC_USER_SIG_CHECK_FAILED | -100018 | UserSig verification failed. Check whether the signature is correct or has expired. |
ERR_TRTC_CONNECT_SERVER_TIMEOUT | -3308 | Room entry request timed out. Check whether the internet connection is lost or a VPN is enabled. |
ERR_TRTC_INVALID_SDK_APPID | -3317 | Room entry parameter sdkAppId is incorrect. Check whether TRTCParams.sdkAppId is empty. |
ERR_MIC_NOT_AUTHORIZED | -1317 | Microphone device not authorized. |