LLM in telephony
Beta
In the phone channel, you can use the `llmRequest` reply type so that the bot receives text from the LLM and synthesizes speech in streaming mode.
To get started:
- Add a secret for Caila.
- Use the `llmRequest` reply type in your script.
LLM and TTS providers
- To generate texts, you can currently only access models of the `openai-proxy` service on the Caila platform.

  Caution: LLMs are only available in the paid plan. To use the models, top up your balance in Caila.
- Speech synthesis works with any TTS provider.
`llmRequest` advantages
The `llmRequest` reply type generates text and synthesizes speech in streaming mode: the bot receives text from the LLM sentence by sentence and synthesizes speech for each sentence as it arrives. The two processes run in parallel, which reduces pauses before bot responses compared to sequential generation and synthesis.
Example of sequential generation and synthesis without `llmRequest`
// Bot receives text from LLM
var llmResponse = $gpt.createChatCompletion([{ "role": "user", "content": $request.query }]);
var response = llmResponse.choices[0].message.content;
// Bot synthesizes speech for entire text
$reactions.answer(response);
Here, the bot:

1. Accesses the LLM via `$gpt.createChatCompletion` and waits for the LLM to generate the text completely.
2. Sends the entire text to speech synthesis and waits until speech is synthesized for the whole text.
3. Plays the synthesized speech.
In this case, there might be a long pause of several seconds between the user request and the bot response.
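By contrast, here is a minimal sketch of the same turn using `llmRequest`. It mirrors the full example described later in this article, so the field values (`CAILA_OPEN_AI`, `gpt-4o`, the `MY_LLM_TOKEN` secret) are explained below; the point here is that generation and synthesis run in parallel, so playback can start after the first sentence:

state: NoMatch
    event!: noMatch
    script:
        // The bot streams text generation and speech synthesis in parallel
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            model: "gpt-4o",
            tokenSecret: "MY_LLM_TOKEN",
            messages: [{"role": "user", "content": $request.query}]
        });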
`llmRequest` also lets you specify phrases that the bot will say to fill the pause at the beginning of text generation.
Access token for Caila
To use services and generative models from Caila in third-party applications, including JAICP, you need a personal access token. To issue a token:
1. Go to Caila.

   Tip: Caila and Conversational Cloud use a shared account base, so if you are registered on Conversational Cloud, you do not need to register additionally on Caila.

2. Go to the My space → API tokens section.
3. In the upper right corner, click Create token.
4. Give a name to the token, generate it, and copy it to the clipboard.
Next, add this token to JAICP:
1. Go to JAICP. In the Secrets and variables section, add a new secret.
2. In the `llmRequest` reply, specify the secret name in the `tokenSecret` property.
Use `llmRequest` in a script
state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            // Text generation model
            model: "gpt-4o",
            // Secret name
            tokenSecret: "MY_LLM_TOKEN",
            // Prompt and user request
            messages: [
                {"role": "system", "content": "Keep answers short. A few sentences at most."},
                {"role": "user", "content": $request.query}
            ]
        });
In this example, `llmRequest` is used in the NoMatch state:

1. The bot sends a text generation request to the `openai-proxy` service on the Caila platform. The `messages` field contains:
   - A prompt that instructs the LLM to generate a short answer.
   - The user request text stored in `$request.query`.
2. Once the bot receives the first sentence from the LLM, it starts synthesizing speech.
3. The bot plays the first sentence to the user.
4. The bot continues to synthesize and play speech sentence by sentence until it receives the entire text from the LLM.
After transitioning to the state, the bot immediately starts preparing text and speech for `llmRequest`. Note that Caila and speech synthesis limits can be charged even if the user ended the call and the bot did not play the speech.
Fill pauses
In the script, a pause might occur while the bot is waiting for the first sentence of text from the LLM. There are two ways to fill this pause:
- Use the `fillersPhraseConfig` setting. You can specify a phrase that the bot will say at the beginning of generation. This will help fill the pause if it is too long.

  state: NoMatch
      event!: noMatch
      script:
          $response.replies = $response.replies || [];
          $response.replies.push({
              type: "llmRequest",
              …
              // The bot says the phrase if the pause exceeds 2000 ms.
              fillersPhraseConfig: {"fillerPhrase": "Great question!", "activationDelayMs": 2000}
          });

- Specify other responses before `llmRequest`. After transitioning to the state, the bot immediately starts preparing text and speech for `llmRequest`. The bot can perform other reactions before `llmRequest` while waiting for a response from the LLM.

  state: NoMatch
      event!: noMatch
      # The bot starts generating the llmRequest response right after transitioning to the state
      a: Great question!
      a: Let me think
      # The bot has already said 2 phrases. During this time, it has prepared some part of the answer
      script:
          $response.replies = $response.replies || [];
          $response.replies.push({
              type: "llmRequest",
              …
          });
Barge-in
A user can interrupt the bot if the bot plays speech via `llmRequest`. In the `$dialer.bargeInResponse` method, set the forced barge-in mode. If the user interrupts the bot, the bot stops the speech and does not play the LLM response to the end:
state: NoMatch
    event!: noMatch
    script:
        // Barge-in settings
        $dialer.bargeInResponse({
            bargeIn: "forced",
            bargeInTrigger: "final",
            noInterruptTime: 0
        });
        // llmRequest response
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            model: "gpt-4o",
            tokenSecret: "MY_LLM_TOKEN",
            messages: [{"role": "user", "content": $request.query}]
        });
Conditional barge-in
You can also configure conditional barge-in. To do this, pass a `bargeInReply` object to `llmRequest`.
In the example below:

1. The bot creates an empty response with the `bargeInIf` parameter.
2. The bot extracts a `bargeInReply` object from this response and passes the object to `llmRequest`.
3. The barge-in is activated if the user request contains “support agent”.
state: NoMatch
    event!: noMatch
    # Create an empty response with the bargeInIf parameter
    a: || bargeInIf = "LLM response"
    script:
        // Barge-in settings
        $dialer.bargeInResponse({
            bargeIn: "forced",
            bargeInTrigger: "final",
            noInterruptTime: 0
        });
        // Save bargeInReply from the empty response and delete the empty response
        var bargeInReplyObject = $response.replies.pop().bargeInReply;
        // llmRequest response
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            model: "gpt-4o",
            tokenSecret: "MY_LLM_TOKEN",
            messages: [
                {"role": "user", "content": $request.query}
            ],
            // Pass the bargeInReply object
            bargeInReply: bargeInReplyObject
        });

state: BargeInCondition || noContext = true
    event!: bargeInIntent
    script:
        var text = $dialer.getBargeInIntentStatus().text;
        // The barge-in is activated if the user request contains “support agent”
        if (text.indexOf("support agent") > -1) {
            $dialer.bargeInInterrupt(true);
        }
- Contextual barge-in is not supported for `llmRequest`.
- If the user interrupted the bot, the `llmRequest` text is not available in the dialog history, for example in the `$jsapi.chatHistoryInLlmFormat` method. You can view the response text in analytics.
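For illustration, a minimal sketch of a follow-up turn that passes the dialog history to the LLM. The state name, the `q!` pattern, and the assumption that `$jsapi.chatHistoryInLlmFormat` is called without arguments are hypothetical here; the point is that an interrupted `llmRequest` answer will be missing from the history it returns:

state: FollowUp
    q!: * tell me more *
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            model: "gpt-4o",
            tokenSecret: "MY_LLM_TOKEN",
            // Assumption: the dialog history in LLM message format.
            // If the user interrupted an earlier llmRequest, that answer
            // is not included in this history.
            messages: $jsapi.chatHistoryInLlmFormat()
        });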