
llmRequest

Beta

This reply type lets you use LLMs in the phone channel with minimal pauses: with llmRequest, the bot receives text from the LLM and synthesizes speech in streaming mode.

This reply is supported only in the phone channel.

tip

To learn more about using this reply in a script, see the LLM in telephony section.

Properties

provider (String, required)

LLM provider:

  • Specify CAILA_OPEN_AI to use the Caila platform. In this case, the bot uses models from the openai-proxy service.
  • Specify CUSTOM_LLM to connect directly to an LLM provider. To connect, also specify the url and headers properties.

Usage details:

  • The llmRequest reply only supports LLMs that are compatible with the OpenAI Streaming API. For example, you can connect YandexGPT.
  • Your provider handles billing for LLM requests.
  • If you have an on-premise JAICP installation, some providers might not support a direct connection, for example, due to regional restrictions.

tokenSecret (String, required)

Name of the secret for accessing an LLM:

  1. Get an API key for accessing the LLM from your provider.

    How to get an API key for Caila:

    To use services and generative models from Caila in third-party applications, including JAICP, you need a personal access token. To issue a token:

    1. Go to Caila.

      tip

      Caila and Conversational Cloud use a shared account base, so if you are registered on Conversational Cloud, you do not need to register on Caila again.

    2. Go to the My space → API tokens section.

    3. In the upper right corner, click Create token.

    4. Name the token, generate it, and copy it to the clipboard.

  2. Add a new secret to JAICP and set the API key you received as its value.
  3. In the tokenSecret property, specify the name of the created secret.

If you are using CUSTOM_LLM, you can reference this token in the headers property.

url (String, optional)

API endpoint for an LLM. For example, to send requests to YandexGPT, specify: https://llm.api.cloud.yandex.net/foundationModels/v1/completion.

Specify this property only if you are using CUSTOM_LLM. See the YandexGPT example in the How to use section.

headers (Object, optional)

Request headers. The format of headers and authorization requirements depend on your provider.

Instead of an API key in the headers, specify the name of the secret from the JAICP project. For example: {"Authorization": "Api-Key <TOKEN_NAME>"}, where <TOKEN_NAME> is the name of the secret in JAICP. The same name must be specified in the tokenSecret property.

Specify this property only if you are using CUSTOM_LLM.

model (String, required)

Model for text generation:

  • If you use Caila as the provider, see available models and their prices on the service page.
  • In other cases, refer to your provider's documentation for model identifiers.

temperature (Number, optional)

Adjusts the creativity level of responses: at higher values, the results are more creative and less predictable.

We recommend values between 0.0 and 1.0. All providers support this range, and models perform consistently within it. See your LLM provider's documentation for details about other possible values.

fillersPhraseConfig (Object, optional)

Object with pause filler settings. See the Fill pauses section.

messages (Array, required)

Dialog history. See the Dialog history section.

bargeInReply (Object, optional)

Pass a bargeInReply object to set up conditional barge-in.

To generate a bargeInReply object in the script, create an empty response with the bargeInIf parameter, as in the sketch below. See a complete example in the LLM in telephony article.
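
A minimal sketch of this pattern. The empty text reply with bargeInIf, the "llmBargeIn" condition label, and the way the generated object is read back are assumptions here; refer to the LLM in telephony article for the exact code:

    // Assumption: an empty reply with bargeInIf makes the platform
    // generate a bargeInReply object on that reply.
    $response.replies.push({type: "text", text: "", bargeInIf: "llmBargeIn"});
    var generated = $response.replies.pop();

    $response.replies.push({
        type: "llmRequest",
        // ...other required llmRequest properties...
        bargeInReply: generated.bargeInReply
    });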

Fill pauses

When the LLM starts generating text, there is a pause in the bot's speech: the bot waits for the first sentence of the text before playing it.

You can specify a phrase for the bot to say at the beginning of generation. This helps fill the pause if it is too long.

fillersPhraseConfig.fillerPhrase (String, required)

Phrase text.

fillersPhraseConfig.activationDelayMs (Number, optional)

Pause duration in milliseconds:

  • If the pause lasts longer than the specified time, the bot plays fillerPhrase.
  • If the bot starts playing the text from the LLM earlier, the bot does not play fillerPhrase.

Default: 2000. Values less than 500 might cause errors in llmRequest.
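
For example, this fillersPhraseConfig value inside an llmRequest reply plays a filler only if generation takes longer than one second (the phrase and delay are illustrative):

    fillersPhraseConfig: {
        // Phrase the bot says while waiting for the LLM
        "fillerPhrase": "Let me check that for you.",
        // Play the phrase only if the pause exceeds 1000 ms; keep this at 500 or more
        "activationDelayMs": 1000
    }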

Dialog history

messages.role (String, required)

Participant role:

  • user for a user message.
  • assistant for a bot message.
  • system for a prompt, an instruction for the LLM.

messages.content (String, required)

Message text.

The messages field contains the dialog history that the LLM must take into account. You can get the history of a dialog between the user and the bot in this format using the $jsapi.chatHistoryInLlmFormat method.

History examples:

  • A history with only the last user request:

    [
        {"role": "user", "content": $request.query}
    ]

  • A history with a prompt and previous messages:

    [
        {"role": "system", "content": "Keep answers short. A few sentences at most"},
        {"role": "user", "content": "Recommend a movie"},
        {"role": "assistant", "content": "What genre?"},
        {"role": "user", "content": "Comedy"}
    ]
caution

History size and prompt size can affect LLM response speed. If the LLM takes a long time to respond, try shortening the history or the prompt.
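
Since the history is a plain array of messages, you can trim it and add a prompt before sending it to the LLM. A minimal sketch, where the prompt text and history length are illustrative:

    var history = $jsapi.chatHistoryInLlmFormat();
    // Keep only the last 10 messages to speed up the LLM response
    history = history.slice(-10);
    // Add a prompt at the beginning of the history
    history.unshift({"role": "system", "content": "Keep answers short. A few sentences at most"});

    $response.replies.push({
        type: "llmRequest",
        // ...other required llmRequest properties (provider, tokenSecret, model)...
        messages: history
    });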

Function calling

Instead of generating a text response, the LLM can call a function.

caution
  • Function calling is supported only for provider: "CUSTOM_LLM".
  • The LLM must support function calling.
  • Currently, the bot cannot use a function to end the call. For example, if a function contains $dialer.hangUp, it will not end the call.
tools (Array, optional)

An array with descriptions of available functions.

eventName (String, optional)

The name of the event that is triggered if the LLM calls a function.

Here is an example of a tools array with a single function description:

var myTools = [
    {
        "type": "function",
        "function": {
            "name": "getWeather",
            "description": "Get the current weather in the city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to get the weather for"
                    }
                },
                "required": ["city"]
            }
        }
    }
];

$response.replies.push({
    type: "llmRequest",
    // Other required properties (provider, tokenSecret, model, messages) are omitted here
    tools: myTools,
    eventName: "myEvent"
});

Here:

  • function.name is the function name.
  • function.description is the function description. This description helps the LLM understand the purpose of the function.
  • function.parameters are the function parameters, provided in the JSON Schema format.
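
When the LLM calls a function, the bot triggers the event specified in eventName instead of playing a response. A minimal sketch of a state that catches this event; how the function name and arguments arrive in the request is not shown here, see the LLM in telephony article for the exact payload format:

    state: HandleFunctionCall
        event!: myEvent
        script:
            // Hypothetical handler: read the function call details from the request
            // (see the LLM in telephony article for the payload format),
            // then reply to the user.
            $reactions.answer("One moment, checking the weather.");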
tip

You can find a function calling example in the LLM in telephony article.

How to use

Below are examples of llmRequest for Caila and YandexGPT providers:

state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            // Secret name in JAICP
            tokenSecret: "MY_TOKEN",
            // Text generation model
            model: "gpt-4o",
            // Temperature
            temperature: 0.6,
            // Pause filler
            fillersPhraseConfig: {"fillerPhrase": "Great question!", "activationDelayMs": 1000},
            // Dialog history
            messages: $jsapi.chatHistoryInLlmFormat()
        });
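
And a sketch of a direct connection to YandexGPT via CUSTOM_LLM. The model identifier below is a placeholder; check the YandexGPT documentation for the exact value:

state: NoMatchYandexGPT
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CUSTOM_LLM",
            // YandexGPT API endpoint
            url: "https://llm.api.cloud.yandex.net/foundationModels/v1/completion",
            // Secret name in JAICP instead of the API key
            headers: {"Authorization": "Api-Key MY_TOKEN"},
            // The same secret name as in headers
            tokenSecret: "MY_TOKEN",
            // Model identifier from the provider's documentation (placeholder)
            model: "yandexgpt",
            temperature: 0.6,
            // Dialog history
            messages: $jsapi.chatHistoryInLlmFormat()
        });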