LLM in telephony
Beta
In the phone channel, you can use the llmRequest reply type so that the bot receives text from an LLM and synthesizes speech in streaming mode.
The bot receives text from the LLM sentence by sentence and synthesizes speech for each sentence as it arrives. The two processes run in parallel, which reduces pauses before bot responses compared to sequential generation and synthesis.
Example of sequential generation and synthesis without llmRequest
// Bot receives text from LLM
var llmResponse = $gpt.createChatCompletion([{ "role": "user", "content": $request.query }]);
var response = llmResponse.choices[0].message.content;
// Bot synthesizes speech for entire text
$reactions.answer(response);
Here, the bot:
- Accesses the LLM via $gpt.createChatCompletion and waits for the LLM to generate the entire text.
- Sends the entire text to speech synthesis and waits until speech is synthesized for the whole text.
- Plays the speech.
In this case, there might be a long pause of several seconds between the user request and the bot response.
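With llmRequest, the same reply is streamed instead. A minimal sketch is shown below; the fields are explained in the sections that follow, and the provider, model, and secret names are the same placeholders used throughout this article:
state: NoMatch
    event!: noMatch
    script:
        // The LLM response is streamed: each sentence is synthesized
        // and played as soon as it arrives
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            model: "gpt-4o",
            tokenSecret: "MY_LLM_TOKEN",
            messages: [{"role": "user", "content": $request.query}]
        });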
llmRequest also lets you specify phrases that the bot will say to fill the pause at the beginning of text generation.
Providers
LLM
To generate texts, you can use one of the following options:
- Caila platform. Use models via the openai-proxy service on the Caila platform. LLMs are only available in the paid plan. To use the models, top up your balance in Caila.
- Third-party LLM provider. Set up a direct connection to your provider. This way you can use models that are not available in the openai-proxy service.
  Usage details:
  - The llmRequest reply only supports LLMs that are compatible with the OpenAI Streaming API. For example, you can connect YandexGPT.
  - Billing for LLM requests is handled by your provider.
  - Some providers might not be available for direct connection from Russia.
For more details on the settings, see the llmRequest article.
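For a third-party provider, the connection is configured directly in the reply. A minimal sketch, based on the function calling example later in this article; the URL, model name, and secret name are placeholders:
state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            // Direct connection to an OpenAI-compatible provider
            provider: "CUSTOM_LLM",
            model: "example/model-1234",
            tokenSecret: "MY_TOKEN",
            // Endpoint and headers of your provider (placeholders)
            url: "https://example.com/api/chat/completions",
            headers: {"Authorization": "Bearer MY_TOKEN", "Content-Type": "application/json"},
            messages: [{"role": "user", "content": $request.query}]
        });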
TTS
Speech synthesis works with any TTS provider.
Use llmRequest in a script
state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            // Text generation model
            model: "gpt-4o",
            // Secret name
            tokenSecret: "MY_LLM_TOKEN",
            // Prompt and user request
            messages: [
                {"role": "system", "content": "Keep answers short. A few sentences at most."},
                {"role": "user", "content": $request.query}
            ]
        });
In this example, llmRequest is used in the NoMatch state:
1. The bot sends a text generation request to the openai-proxy service on the Caila platform. The messages field contains:
   - A prompt that instructs the LLM to generate a short answer.
   - The user request text stored in $request.query.
2. Once the bot receives the first sentence from the LLM, it starts synthesizing speech.
3. The bot plays the first sentence to the user.
4. The bot continues to synthesize and play speech sentence by sentence until it receives the entire text from the LLM.
After transitioning to the state, the bot immediately starts preparing text and speech for llmRequest.
LLM and speech synthesis limits might be spent even if the user ends the call before the bot plays the speech.
Fill pauses
In the script, a pause might occur while the bot is waiting for the first sentence of text from the LLM. There are two ways to fill this pause:
- Use the fillersPhraseConfig setting. You can specify a phrase that the bot says at the beginning of generation. This helps fill the pause if it is too long.
  state: NoMatch
      event!: noMatch
      script:
          $response.replies = $response.replies || [];
          $response.replies.push({
              type: "llmRequest",
              // The bot says a random phrase from the list if the pause exceeds 2000 ms
              fillersPhraseConfig: {
                  "fillerPhrasesList": ["Great question!", "Just a moment"],
                  "activationDelayMs": 2000
              },
              …
          });
- Specify other responses before llmRequest. After transitioning to the state, the bot immediately starts preparing text and speech for llmRequest, so it can perform other reactions before llmRequest while waiting for a response from the LLM.
  state: NoMatch
      event!: noMatch
      # The bot starts generating the llmRequest response right after transitioning to the state
      a: Great question!
      a: Let me think
      # The bot has already said two phrases. During this time, it has prepared part of the answer
      script:
          $response.replies = $response.replies || [];
          $response.replies.push({
              type: "llmRequest",
              …
          });
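Combining the two approaches is not covered above, so treat the sketch below as an assumption: the bot says a fixed phrase first, and a filler phrase is played only if the remaining pause still exceeds the delay.
state: NoMatch
    event!: noMatch
    # Said immediately, while the LLM response is being prepared
    a: Great question!
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            // Assumed to apply only if the pause after the phrase above is still too long
            fillersPhraseConfig: {
                "fillerPhrasesList": ["Just a moment"],
                "activationDelayMs": 2000
            },
            …
        });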
Barge-in
A user can interrupt the bot while it is playing speech via llmRequest.
To allow this, set the forced barge-in mode in the $dialer.bargeInResponse method.
If the user interrupts the bot, the bot stops the speech and does not play the LLM response to the end:
state: NoMatch
    event!: noMatch
    script:
        // Barge-in settings
        $dialer.bargeInResponse({
            bargeIn: "forced",
            bargeInTrigger: "final",
            noInterruptTime: 0
        });
        // llmRequest response
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            model: "gpt-4o",
            tokenSecret: "MY_LLM_TOKEN",
            messages: [{"role": "user", "content": $request.query}]
        });
Conditional barge-in
You can also configure conditional barge-in.
To do this, pass a bargeInReply object to llmRequest.
In the example below:
- The bot creates an empty response with the
bargeInIfparameter. - The bot extracts a
bargeInReplyobject from this response and passes the object tollmRequest. - The barge-in is activated if the user request contains “support agent”.
Example:
state: NoMatch
    event!: noMatch
    # Create an empty response with the bargeInIf parameter
    a: || bargeInIf = "LLM response"
    script:
        // Barge-in settings
        $dialer.bargeInResponse({
            bargeIn: "forced",
            bargeInTrigger: "final",
            noInterruptTime: 0
        });
        // Save bargeInReply from the empty response and delete the empty response
        var bargeInReplyObject = $response.replies.pop().bargeInReply;
        // llmRequest response
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            model: "gpt-4o",
            tokenSecret: "MY_LLM_TOKEN",
            messages: [
                {"role": "user", "content": $request.query}
            ],
            // Pass the bargeInReply object
            bargeInReply: bargeInReplyObject
        });

state: BargeInCondition || noContext = true
    event!: bargeInIntent
    script:
        var text = $dialer.getBargeInIntentStatus().text;
        // The barge-in is activated if the user request contains "support agent"
        if (text.indexOf("support agent") > -1) {
            $dialer.bargeInInterrupt(true);
        }
- Contextual barge-in is not supported for llmRequest.
- If the user interrupted the bot, the llmRequest text is not available in the dialog history, for example via the $jsapi.chatHistoryInLlmFormat method. You can view the response text in analytics.
Limit LLM response length
In the phone channel, relatively short LLM responses are preferred.
You can control the length of responses using the LLM parameter max_completion_tokens.
10 seconds of speech in Russian corresponds to approximately 30–35 tokens, so the 150-token limit in the example below allows roughly 40–50 seconds of speech.
The parameter name, allowed values, and how it affects the response depend on the specified model and version.
Older GPT models may use max_tokens.
Example:
state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            // Secret name in JAICP
            tokenSecret: "MY_TOKEN",
            // Text generation model
            model: "gpt-4o",
            // LLM response length limit
            parameters: { max_completion_tokens: 150 },
            // Dialog history
            messages: $jsapi.chatHistoryInLlmFormat()
        });
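For older models that expect max_tokens, only the parameters field changes. A sketch, assuming your model accepts this parameter:
// LLM response length limit for models that use max_tokens
parameters: { max_tokens: 150 },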
Function calling
Instead of generating a text response, the LLM can call a function.
In this case, an event whose name is specified in the eventName parameter is triggered in the script.
The state that handles this event must contain the code to be executed.
- Function calling is supported only for provider: "CUSTOM_LLM".
- The LLM must support function calling.
- Currently, the bot cannot use a function to end the call. For example, if a function contains $dialer.hangUp, it will not end the call.
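A minimal sketch of the wiring between eventName and the handler state, extracted from the full example below (the state names are illustrative):
state: AskLlm
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CUSTOM_LLM",
            // ... connection settings ...
            // Event triggered in the script when the LLM calls a function
            eventName: "toolUsed"
        });

state: Tools
    event!: toolUsed
    script:
        // Name and arguments of the function the LLM decided to call
        var toolName = $request.data.eventData.tool_call[0].name;
        var toolArgs = JSON.parse($request.data.eventData.tool_call[0].arguments);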
The example below shows a script for a bot that helps customers of an online streaming service. The LLM can use two functions:
- If a user reports an issue, the bot calls the reportIssue function to log the issue for technical support. In this message, the bot includes only the key details and assigns a priority.
- If a user wants to select a subscription plan, the bot asks how many devices the plan needs to support. The bot then calls the findPlan function to find a suitable plan.
The example consists of four files:
- main.sc: the main script
- prompt.yml: the LLM system prompt
- tools.js: function descriptions
- functions.js: function code
main.sc:

# Function descriptions
require: tools.js
# Function code
require: functions.js
# LLM system prompt
require: prompt.yml
    var = prompt

theme: /

    state: Start
        q!: $regex</start>
        a: Hello!
        script: $jsapi.startSession();

    # State for the llmRequest
    state: NoMatch
        event!: noMatch
        script:
            // System prompt
            var systemPrompt = {
                role: "system",
                // Text prompt from the prompt.yml file
                content: prompt.text
            };
            // Dialog history
            var history = $jsapi.chatHistoryInLlmFormat();
            // Dialog history with system prompt
            var historyWithPrompt = [systemPrompt].concat(history);
            // LLM response
            $response.replies = $response.replies || [];
            $response.replies.push({
                type: "llmRequest",
                provider: "CUSTOM_LLM",
                model: "example/model-1234",
                tokenSecret: "MY_TOKEN",
                headers: {"Authorization": "Bearer MY_TOKEN", "Content-Type": "application/json"},
                url: "https://example.com/api/chat/completions",
                // Dialog history with prompt
                messages: historyWithPrompt,
                // Function descriptions from tools.js
                tools: myTools,
                // Event name if the LLM calls one of the functions
                eventName: "toolUsed"
            });

    # State for handling LLM function calls
    state: Tools
        event!: toolUsed
        script:
            // Get the function name
            var toolName = $request.data.eventData.tool_call[0].name;
            // Get the function arguments
            var toolArgs = JSON.parse($request.data.eventData.tool_call[0].arguments);
            // If the user reported an issue:
            if (toolName === "reportIssue") {
                // Function from functions.js
                reportIssue(toolArgs.summary, toolArgs.priority);
            // If the user asked to find a subscription plan:
            } else if (toolName === "findPlan") {
                // Function from functions.js
                findPlan(toolArgs.devices);
            }
            $reactions.answer("Is there anything else I can help you with?");
prompt.yml:

text: |
  You are a voice assistant for an online streaming service.
  The user can make two types of requests.
  1. If the user reports an issue (e.g., a movie won't play, a payment error, etc.), call the reportIssue function.
  2. If the user wants to select a subscription plan, find out how many devices they want to connect.
  Then, call the findPlan function, passing this number as a parameter.
  Keep your responses brief, clear, and friendly. Do not make anything up.
tools.js:

var myTools = [
    {
        "type": "function",
        "function": {
            "name": "reportIssue",
            "description": "Create a support ticket for a user's issue",
            "parameters": {
                "type": "object",
                "properties": {
                    "summary": {
                        "type": "string",
                        "description": "A summary of the issue's most important details."
                    },
                    "priority": {
                        "type": "string",
                        "description": "The priority of the issue. Determine this from the conversation.",
                        "enum": ["low", "medium", "high"]
                    }
                },
                "required": ["summary", "priority"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "findPlan",
            "description": "Finds the appropriate subscription plan for a given number of devices",
            "parameters": {
                "type": "object",
                "properties": {
                    "devices": {
                        "type": "integer",
                        "description": "The number of devices the user wants to connect"
                    }
                },
                "required": ["devices"]
            }
        }
    }
];
functions.js:

function reportIssue(summary, priority) {
    // Log the issue details for analytics
    $analytics.setComment(summary + " | Priority: " + priority);
    // Bot responds to the user
    $reactions.answer("Thank you! We've forwarded your information to our support team.");
}

function findPlan(devices) {
    // List of plans
    var plans = [
        { name: "Basic", max: 1 },
        { name: "Standard", max: 3 },
        { name: "Premium", max: 5 }
    ];
    // Find the first suitable plan
    var plan = null;
    for (var i = 0; i < plans.length; i++) {
        if (plans[i].max >= devices) {
            plan = plans[i];
            break;
        }
    }
    // Respond to the user
    if (plan) {
        $reactions.answer("The most suitable plan for you is: " + plan.name);
    } else {
        $reactions.answer("Sorry, I couldn't find a suitable plan for you.");
    }
}
In the NoMatch state, the llmRequest reply type is used:
- historyWithPrompt stores the system prompt and dialog history. The system prompt instructs the LLM on when to call a function.
- myTools contains descriptions of the functions the LLM can call.
- eventName specifies the event triggered in the script when the LLM calls a function.
The Tools state handles the toolUsed event:
- The state contains conditions based on the function name.
- If reportIssue is called, the bot executes the corresponding function from functions.js, which then writes a comment about the issue to analytics.
- If findPlan is called, the bot executes the corresponding function from functions.js to inform the user about a suitable subscription plan.

Tip: when an LLM calls a function instead of generating a response, the tool call is logged in analytics in the following format: CollectedToolCalls(eventName=toolUsed, toolCalls=[ToolCall(index=0, id=abcde-12345, type=function, name=findPlan, arguments={"devices": 5})]).