llmRequest
Beta
This reply type lets you use LLMs in the phone channel with minimal pauses.
With the llmRequest reply, the bot receives text from an LLM and synthesizes speech in streaming mode.
This reply is supported only in the phone channel.
To learn more about using this reply in a script, see the LLM in telephony section.
Properties
| Property | Type | Required | Description |
|---|---|---|---|
| provider | String | Yes | LLM provider: CAILA_OPEN_AI for Caila or CUSTOM_LLM for any other provider. |
| tokenSecret | String | Yes | Name of the secret for accessing the LLM. If you are using CUSTOM_LLM, you can reference this token in the headers parameter. |
| url | String | No | URL of the method that returns the LLM response to the user request. For example, if you want to send requests to YandexGPT, specify: https://llm.api.cloud.yandex.net/foundationModels/v1/completion. Fill in this property only if you are using CUSTOM_LLM. |
| headers | Object | No | Headers to pass with the request. The format of headers and authorization requirements depend on your provider. Instead of an API key in the headers, specify the name of the secret from the JAICP project. Fill in this property only if you are using CUSTOM_LLM. |
| model | String | Yes | Model for text generation. The available values depend on the provider: for example, gpt-4o for CAILA_OPEN_AI, or a provider-specific path such as gpt://folder12345/yandexgpt for YandexGPT via CUSTOM_LLM. |
| temperature | Number | No | Adjusts the creativity level of responses. At higher values, the results are more creative and less predictable. We recommend using values between 0.0 and 1.0. These values are supported by all providers and result in consistent model performance. See your LLM provider's documentation for more details about other possible parameter values. |
| parameters | Object | No | Object containing other LLM parameters. |
| fillersPhraseConfig | Object | No | Object with pause fill settings. |
| messages | Array | Yes | Dialog history. |
| bargeInReply | Object | No | Barge-in settings. If you pass a bargeInReply object in the script, create an empty reply with the bargeInIf parameter. See the example in the LLM in telephony article. |
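For reference, a minimal llmRequest reply with only the required properties might look like this (the secret name and model are placeholders; see the full examples in the How to use section below):

$response.replies = $response.replies || [];
$response.replies.push({
    type: "llmRequest",
    provider: "CAILA_OPEN_AI",
    // Placeholder: name of the secret in JAICP
    tokenSecret: "MY_TOKEN",
    model: "gpt-4o",
    // Dialog history with only the current user request
    messages: [
        {"role": "user", "content": $request.query}
    ]
});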
Fill pauses
When the LLM starts generating text, there is a pause in the bot's speech: the bot waits for the first sentence of the generated text before playing it.
You can specify a phrase for the bot to say at the beginning of generation. This helps fill the pause if it is too long.
| Property | Type | Required | Description |
|---|---|---|---|
| fillersPhraseConfig.fillerPhrase | String | Yes | Phrase text. |
| fillersPhraseConfig.fillerPhrasesList | Array | No | List of phrases. The bot chooses one random phrase to fill the pause. Note: specify at least one of the fields: fillerPhrase or fillerPhrasesList. If both are specified, the bot chooses one phrase from all available phrases: fillerPhrase + fillerPhrasesList. |
| fillersPhraseConfig.activationDelayMs | Number | No | Pause duration in milliseconds before the bot says the filler phrase. The default value is 2000. Values less than 500 might cause errors in llmRequest. |
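For example, a pause fill configuration with a single phrase and a one-second delay (the values here are illustrative):

fillersPhraseConfig: {
    fillerPhrase: "Let me think about that",
    activationDelayMs: 1000
}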
Dialog history
| Property | Type | Required | Description |
|---|---|---|---|
messages.role | String | Yes | Participant role:
|
messages.content | String | Yes | Message text. |
The messages field contains the dialog history that the LLM must take into account.
You can get the history of a dialog between the user and the bot in this format using the $jsapi.chatHistoryInLlmFormat method.
History examples:
- The history with only the last user request:

  [
      {"role": "user", "content": $request.query}
  ]

- The history with a prompt and the previous messages:

  [
      {"role": "system", "content": "Keep answers short. A few sentences at most"},
      {"role": "user", "content": "Recommend a movie"},
      {"role": "assistant", "content": "What genre?"},
      {"role": "user", "content": "Comedy"}
  ]
History size and prompt size can affect LLM response speed. If the LLM takes a long time to respond, try shortening the history or the prompt.
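If you need to cap the history yourself, you can trim the array returned by $jsapi.chatHistoryInLlmFormat() before passing it to the model. A minimal sketch (the 10-message limit is an illustrative value, not a platform requirement):

script:
    var history = $jsapi.chatHistoryInLlmFormat();
    // Keep only the 10 most recent messages to reduce prompt size and latency.
    var recentHistory = history.slice(-10);
    // Pass recentHistory as the messages property of llmRequest.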
Function calling
Instead of generating a text response, the LLM can call a function.
- Function calling is supported only for provider: "CUSTOM_LLM".
- The LLM must support function calling.
- Currently, the bot cannot use a function to end the call. For example, if a function contains $dialer.hangUp, it will not end the call.
| Property | Type | Required | Description |
|---|---|---|---|
| tools | Array | No | An array with descriptions of available functions. |
| eventName | String | No | The name of the event triggered if the LLM calls a function. |
Here is an example of a tools array with a single function description:
var myTools = [
{
"type": "function",
"function": {
"name": "getWeather",
"description": "Get the current weather in the city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city to get the weather for"
}
},
"required": ["city"]
}
}
}
];
$response.replies.push({
type: "llmRequest",
…
tools: myTools,
eventName: "myEvent"
});
Here:

- function.name is the function name.
- function.description is the function description. This description helps the LLM understand the purpose of the function.
- function.parameters are the function parameters, provided in JSON Schema format.
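When the LLM calls a function, the bot triggers the event from eventName, which you can handle in a separate state. A rough sketch (the state name is arbitrary, and where exactly the function name and arguments appear in the incoming event data is an assumption to verify against your provider's response):

state: HandleFunctionCall
    event: myEvent
    script:
        // Triggered when the LLM requests a function call instead of a text answer.
        // Assumption: the call details (function name and arguments) arrive with the
        // event; inspect the incoming request in the JAICP logs to find their exact location.
        $reactions.answer("One moment, checking the weather.");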
You can find a function calling example in the LLM in telephony article.
Parameters
The parameters object can contain any parameters supported by the model you specify.
Exceptions. The following LLM parameters cannot be overridden via the parameters object; any values passed for them are ignored:

- audio
- function_call
- functions
- logit_bias
- logprobs
- messages
- modalities
- model
- n
- stream
- stream_options
- tool_choice
- tools
The set of parameters, their allowed values, and how they affect the response depend on the specified model and version. Please refer to the official model documentation. Incorrect parameters may cause an error.
For example, with the gpt-4o model, you can set max_completion_tokens to receive shorter responses from the LLM, and stop to halt generation when unwanted sequences appear.
parameters: {
max_completion_tokens: 150,
stop: ["joke", "###"]
}
In older GPT models, the max_tokens parameter may be used instead of max_completion_tokens.
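For such models, the same limit could be set like this (assuming a model that accepts max_tokens):

parameters: {
    max_tokens: 150
}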
How to use
Below are examples of llmRequest for Caila and YandexGPT providers:
Caila

state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            // Secret name in JAICP
            tokenSecret: "MY_TOKEN",
            // Text generation model
            model: "gpt-4o",
            // Temperature
            temperature: 0.6,
            // LLM response length limit
            parameters: { max_completion_tokens: 150 },
            // Pause filler
            fillersPhraseConfig: {
                "fillerPhrasesList": ["Great question!", "Just a moment"],
                "activationDelayMs": 1000
            },
            // Dialog history
            messages: $jsapi.chatHistoryInLlmFormat()
        });
YandexGPT

state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CUSTOM_LLM",
            // Secret name in JAICP
            tokenSecret: "MY_TOKEN",
            // API endpoint for LLM
            url: "https://llm.api.cloud.yandex.net/v1/chat/completions",
            // Request headers
            headers: {"Authorization": "Api-Key MY_TOKEN"},
            // Model for text generation, path contains folder ID
            model: "gpt://folder12345/yandexgpt",
            // Temperature
            temperature: 0.6,
            // LLM response length limit
            parameters: { maxTokens: 150 },
            // Pause filler
            fillersPhraseConfig: {
                "fillerPhrasesList": ["Great question!", "Just a moment"],
                "activationDelayMs": 1000
            },
            // Dialog history
            messages: $jsapi.chatHistoryInLlmFormat()
        });
In this example:

- MY_TOKEN is the name of the secret in JAICP that contains the Yandex Cloud IAM token.
- folder12345 is the folder ID in Yandex Cloud.
For more details on working with YandexGPT, see the documentation.