llmRequest
Beta
This reply type lets you use LLMs in the phone channel with minimal pauses.
With the llmRequest reply, the bot receives text from the LLM and synthesizes speech in streaming mode.
This reply is supported only in the phone channel.
To learn more about using this reply in the script, see the LLM in telephony section.
Properties
Property | Type | Required | Description |
---|---|---|---|
provider | String | Yes | LLM provider. Currently you can use only the Caila platform. Specify the CAILA_OPEN_AI value. |
model | String | Yes | Model for text generation. To access the LLM, the bot uses the openai-proxy service on the Caila platform. You can view available models and their prices on the service page. |
tokenSecret | String | Yes | The name of the secret for accessing LLMs. |
fillersPhraseConfig | Object | No | Settings for pause filling. |
messages | Array | Yes | Dialog history. |
Settings for pause filling
When the LLM starts generating text, a pause occurs in the bot's speech: the bot waits for the first sentence of the text before playing it.
You can specify a phrase for the bot to say at the beginning of generation. This helps fill the pause if it is too long.
Pass the following object in the fillersPhraseConfig field:
Property | Type | Required | Description |
---|---|---|---|
fillerPhrase | String | Yes | Phrase text. |
activationDelayMs | Number | No | Pause duration in milliseconds after which the bot says the phrase. The default value is 2000. |
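For example (the phrase text and delay value here are illustrative):

```
fillersPhraseConfig: {
    // Phrase the bot says while waiting for the LLM
    "fillerPhrase": "Give me a second to think.",
    // Say the phrase only if the pause lasts longer than 1000 ms
    "activationDelayMs": 1000
}
```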
Dialog history
The messages field contains the dialog history that the LLM must take into account.
Specify an array of objects. Each object must have the following properties:
Property | Type | Required | Description |
---|---|---|---|
role | String | Yes | Participant role: system, user, or assistant. |
content | String | Yes | Message text. |
You can get the history of a dialog between the user and the bot in this format using the $jsapi.chatHistoryInLlmFormat method.
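For instance, here is a minimal sketch that prepends a system prompt to the stored dialog history (assuming the method is called without arguments and returns an array of objects in the format above):

```
// A sketch: combine a system prompt with the saved dialog history.
var messages = [
    {"role": "system", "content": "Keep answers short."}
].concat($jsapi.chatHistoryInLlmFormat());
```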
History examples:

- The history with only the last user request:

  ```
  [
      {"role": "user", "content": $request.query}
  ]
  ```

- The history with a prompt and the previous messages:

  ```
  [
      {"role": "system", "content": "Keep answers short. A few sentences at most"},
      {"role": "user", "content": "Recommend a movie"},
      {"role": "assistant", "content": "What genre?"},
      {"role": "user", "content": "Comedy"}
  ]
  ```
The history size and the prompt size can affect how quickly the LLM responds. If the LLM takes a long time to respond, try shortening the history or the prompt.
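For example, a minimal sketch of trimming the history before passing it to the LLM (the limit of 10 messages is an arbitrary illustration):

```
// Keep only the 10 most recent messages to shorten the history.
var history = $jsapi.chatHistoryInLlmFormat();
var recentMessages = history.slice(-10);
```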
How to use
```
state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || []
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            // Text generation model
            model: "gpt-4o",
            // Secret name
            tokenSecret: "MY_LLM_TOKEN",
            // Phrase to fill the pause
            fillersPhraseConfig: {"fillerPhrase": "Great question!", "activationDelayMs": 1000},
            // Dialog history
            messages: [{"role": "user", "content": $request.query}]
        });
```
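In this example, the reply is sent from the NoMatch state, so the bot delegates any request it cannot match to the LLM. The messages array here contains only the last user request; to give the LLM more context, you can pass the full dialog history instead, for example via $jsapi.chatHistoryInLlmFormat().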