llmRequest
Beta
This reply type lets you use LLMs in the phone channel with minimal pauses.
With the llmRequest reply, the bot receives text from an LLM and synthesizes speech in streaming mode.
This reply is supported only in the phone channel.
To learn more about using this reply in a script, see the LLM in telephony section.
Properties
| Property | Type | Required | Description |
|---|---|---|---|
| provider | String | Yes | LLM provider: CAILA_OPEN_AI for Caila or CUSTOM_LLM for any other provider. |
| tokenSecret | String | Yes | Name of the secret for accessing the LLM. If you are using CUSTOM_LLM, you can reference this token in the headers parameter. |
| url | String | No | URL of the method that returns the LLM response to the user request. For example, if you want to send requests to YandexGPT, specify: https://llm.api.cloud.yandex.net/foundationModels/v1/completion. Fill in this property only if you are using CUSTOM_LLM. |
| headers | Object | No | Headers to pass with the request. The format of headers and authorization requirements depend on your provider. Instead of an API key in the headers, specify the name of the secret from the JAICP project. Fill in this property only if you are using CUSTOM_LLM. |
| model | String | Yes | Model for text generation. The available values depend on the provider: for example, gpt-4o for CAILA_OPEN_AI, or a provider-specific path such as gpt://folder12345/yandexgpt for YandexGPT via CUSTOM_LLM. |
| temperature | Number | No | Adjusts the creativity level of responses. At higher values, the results are more creative and less predictable. We recommend using values between 0.0 and 1.0. These values are supported by all providers and result in consistent model performance. See your LLM provider's documentation for more details about other possible parameter values. |
| parameters | Object | No | Object containing other LLM parameters. |
| fillersPhraseConfig | Object | No | Object with pause fill settings. |
| messages | Array | Yes | Dialog history. |
| bargeInReply | Object | No | Barge-in settings. If you pass a bargeInReply object in the script, create an empty reply with the bargeInIf parameter. See the example in the LLM in telephony article. |
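For reference, a minimal llmRequest reply with only the required properties might look like this (the secret name and model are placeholders; see the full examples in the How to use section below):

$response.replies = $response.replies || [];
$response.replies.push({
    type: "llmRequest",
    provider: "CAILA_OPEN_AI",
    // Placeholder: name of the secret in JAICP
    tokenSecret: "MY_TOKEN",
    model: "gpt-4o",
    // Dialog history with only the current user request
    messages: [
        {"role": "user", "content": $request.query}
    ]
});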
Fill pauses
When the LLM starts generating text, there is a pause in the bot's speech: the bot waits for the first sentence of the generated text before playing it.
You can specify a phrase for the bot to say at the beginning of generation. This helps fill the pause if it is too long.
| Property | Type | Required | Description |
|---|---|---|---|
| fillersPhraseConfig.fillerPhrase | String | Yes | Phrase text. |
| fillersPhraseConfig.fillerPhrasesList | Array | No | List of phrases. The bot chooses one random phrase to fill the pause. Note: specify at least one of the fields: fillerPhrase or fillerPhrasesList. If both are specified, the bot chooses one phrase from all available phrases: fillerPhrase + fillerPhrasesList. |
| fillersPhraseConfig.activationDelayMs | Number | No | Pause duration in milliseconds before the bot says the filler phrase. The default value is 2000. Values less than 500 might cause errors in llmRequest. |
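For example, a pause fill configuration with a single phrase and a one-second delay (the values here are illustrative):

fillersPhraseConfig: {
    fillerPhrase: "Let me think about that",
    activationDelayMs: 1000
}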
Dialog history
| Property | Type | Required | Description |
|---|---|---|---|
messages.role | String | Yes | Participant role:
|
messages.content | String | Yes | Message text. |
The messages field contains the dialog history that the LLM must take into account.
You can get the history of a dialog between the user and the bot in this format using the $jsapi.chatHistoryInLlmFormat method.
History examples:
- The history with only the last user request:

  [
      {"role": "user", "content": $request.query}
  ]

- The history with a prompt and the previous messages:

  [
      {"role": "system", "content": "Keep answers short. A few sentences at most"},
      {"role": "user", "content": "Recommend a movie"},
      {"role": "assistant", "content": "What genre?"},
      {"role": "user", "content": "Comedy"}
  ]
History size and prompt size can affect LLM response speed. If the LLM takes a long time to respond, try shortening the history or the prompt.
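If you need to cap the history yourself, you can trim the array returned by $jsapi.chatHistoryInLlmFormat() before passing it to the model. A minimal sketch (the 10-message limit is an illustrative value, not a platform requirement):

script:
    var history = $jsapi.chatHistoryInLlmFormat();
    // Keep only the 10 most recent messages to reduce prompt size and latency.
    var recentHistory = history.slice(-10);
    // Pass recentHistory as the messages property of llmRequest.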
Function calling
Instead of generating a text response, the LLM can call a function.
- Function calling is supported only for provider: "CUSTOM_LLM".
- The LLM must support function calling.
- Currently, the bot cannot use a function to end the call. For example, if a function contains $dialer.hangUp, it will not end the call.
| Property | Type | Required | Description |
|---|---|---|---|
| tools | Array | No | An array with descriptions of available functions. |
| eventName | String | No | The name of the event triggered if the LLM calls a function. |
Here is an example of a tools array with a single function description:
var myTools = [
{
"type": "function",
"function": {
"name": "getWeather",
"description": "Get the current weather in the city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city to get the weather for"
}
},
"required": ["city"]
}
}
}
];
$response.replies.push({
type: "llmRequest",
…
tools: myTools,
eventName: "myEvent"
});
Here:

- function.name is the function name.
- function.description is the function description. This description helps the LLM understand the purpose of the function.
- function.parameters are the function parameters, provided in JSON Schema format.
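When the LLM calls a function, the bot triggers the event from eventName, which you can handle in a separate state. A rough sketch (the state name is arbitrary, and where exactly the function name and arguments appear in the incoming event data is an assumption to verify against your provider's response):

state: HandleFunctionCall
    event: myEvent
    script:
        // Triggered when the LLM requests a function call instead of a text answer.
        // Assumption: the call details (function name and arguments) arrive with the
        // event; inspect the incoming request in the JAICP logs to find their exact location.
        $reactions.answer("One moment, checking the weather.");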
You can find a function calling example in the LLM in telephony article.
Parameters
The parameters object can contain any parameters supported by the model you specify.
Exceptions. The following LLM parameters cannot be overridden via the parameters object; any values passed for them are ignored:

- audio
- function_call
- functions
- logit_bias
- logprobs
- messages
- modalities
- model
- n
- stream
- stream_options
- tool_choice
- tools
The set of parameters, their allowed values, and how they affect the response depend on the specified model and version. Please refer to the official model documentation. Incorrect parameters may cause an error.
For example, with the gpt-4o model, you can set max_completion_tokens to receive shorter responses from the LLM, and stop to halt generation when unwanted sequences appear.
parameters: {
max_completion_tokens: 150,
stop: ["joke", "###"]
}
In older GPT models, the max_tokens parameter may be used instead of max_completion_tokens.
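For such models, the same limit could be set like this (assuming a model that accepts max_tokens):

parameters: {
    max_tokens: 150
}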
How to use
Below are examples of llmRequest for Caila and YandexGPT providers:
Caila

state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            // Secret name in JAICP
            tokenSecret: "MY_TOKEN",
            // Text generation model
            model: "gpt-4o",
            // Temperature
            temperature: 0.6,
            // LLM response length limit
            parameters: { max_completion_tokens: 150 },
            // Pause filler
            fillersPhraseConfig: {
                "fillerPhrasesList": ["Great question!", "Just a moment"],
                "activationDelayMs": 1000
            },
            // Dialog history
            messages: $jsapi.chatHistoryInLlmFormat()
        });
YandexGPT

state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CUSTOM_LLM",
            // Secret name in JAICP
            tokenSecret: "MY_TOKEN",
            // API endpoint for LLM
            url: "https://llm.api.cloud.yandex.net/v1/chat/completions",
            // Request headers
            headers: {"Authorization": "Api-Key MY_TOKEN"},
            // Model for text generation, path contains folder ID
            model: "gpt://folder12345/yandexgpt",
            // Temperature
            temperature: 0.6,
            // LLM response length limit
            parameters: { maxTokens: 150 },
            // Pause filler
            fillersPhraseConfig: {
                "fillerPhrasesList": ["Great question!", "Just a moment"],
                "activationDelayMs": 1000
            },
            // Dialog history
            messages: $jsapi.chatHistoryInLlmFormat()
        });
In this example:

- MY_TOKEN is the name of the secret in JAICP that contains the Yandex Cloud IAM token.
- folder12345 is the folder ID in Yandex Cloud.
For more details on working with YandexGPT, see the documentation.