llmRequest

Beta

This reply type lets you use LLMs in the phone channel with minimal pauses. With llmRequest, the bot receives text from the LLM and synthesizes speech in streaming mode.

This reply is supported only in the phone channel.

tip

To learn more about using this reply in the script, see the LLM in telephony section.

Properties

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| provider | String | Yes | LLM provider. Currently you can use only the Caila platform. Specify the CAILA_OPEN_AI value. |
| model | String | Yes | Model for text generation. To access the LLM, the bot uses the openai-proxy service on the Caila platform. You can view available models and their prices on the service page. |
| tokenSecret | String | Yes | The name of the secret for accessing LLMs. |
| fillersPhraseConfig | Object | No | Settings for pause filling. |
| messages | Array | Yes | Dialog history. |
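
Taken together, a minimal llmRequest reply with only the required fields might look like this (the model name, secret name, and message text are illustrative):

    {
        "type": "llmRequest",
        "provider": "CAILA_OPEN_AI",
        "model": "gpt-4o",
        "tokenSecret": "MY_LLM_TOKEN",
        "messages": [{"role": "user", "content": "Hello!"}]
    }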

Settings for pause filling

When the LLM starts generating text, a pause occurs in the bot's speech: the bot waits for the first sentence of the text before playing it.

You can specify a phrase for the bot to say at the beginning of generation. This helps fill the pause if it is too long.

Pass the following object in the fillersPhraseConfig field:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| fillerPhrase | String | Yes | Phrase text. |
| activationDelayMs | Number | No | Pause duration in milliseconds. If the pause lasts longer than the specified time, the bot plays fillerPhrase; if the bot starts playing the text from the LLM earlier, it does not play fillerPhrase. Default: 2000. |
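
For example, this configuration (the phrase text is illustrative) makes the bot say the filler only if generation takes longer than 1.5 seconds:

    fillersPhraseConfig: {
        "fillerPhrase": "Let me think about that.",
        "activationDelayMs": 1500
    }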

Dialog history

The messages field contains the dialog history that the LLM must take into account.

Specify an array of objects. Each object must have the following properties:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| role | String | Yes | Participant role: user for a user message, assistant for a bot message, system for a prompt (an instruction for the LLM). |
| content | String | Yes | Message text. |
tip

You can get the history of a dialog between the user and the bot in this format using the $jsapi.chatHistoryInLlmFormat method.
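
Assuming the method takes no arguments and returns the history as an array of {role, content} objects (a sketch; check the method reference for the exact signature), a call could look like this:

    script:
        // Assumption: returns messages in chronological order, for example:
        // [{"role": "user", "content": "Recommend a movie"},
        //  {"role": "assistant", "content": "What genre?"}]
        var history = $jsapi.chatHistoryInLlmFormat();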

History examples:

  • The history with only the last user request:

    [
        {"role": "user", "content": $request.query}
    ]
  • The history with a prompt and the previous messages:

    [
        {"role": "system", "content": "Keep answers short. A few sentences at most"},
        {"role": "user", "content": "Recommend a movie"},
        {"role": "assistant", "content": "What genre?"},
        {"role": "user", "content": "Comedy"}
    ]
caution

History size and prompt size can affect LLM speed. If the LLM takes a long time to respond, try shortening the history or the prompt.
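
One way to bound the history before sending it to the LLM is to keep the prompt and only the most recent messages. A minimal sketch, assuming history is an array in the messages format shown above (the cutoff of 10 messages is arbitrary):

    script:
        // Keep system messages (the prompt) and the 10 most recent other messages
        var systemMessages = history.filter(function (m) { return m.role === "system"; });
        var recent = history.slice(-10).filter(function (m) { return m.role !== "system"; });
        var shortened = systemMessages.concat(recent);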

How to use

state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            // Text generation model
            model: "gpt-4o",
            // Secret name
            tokenSecret: "MY_LLM_TOKEN",
            // Phrase to fill the pause
            fillersPhraseConfig: {"fillerPhrase": "Great question!", "activationDelayMs": 1000},
            // Dialog history
            messages: [{"role": "user", "content": $request.query}]
        });
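
To have the LLM consider the whole conversation rather than only the last request, you could build messages from the dialog history instead. A sketch, assuming $jsapi.chatHistoryInLlmFormat returns the history in the messages format (the state name and prompt text are illustrative):

    state: NoMatchWithHistory
        event!: noMatch
        script:
            var history = $jsapi.chatHistoryInLlmFormat();
            // Put the instruction first so the LLM follows it
            history.unshift({"role": "system", "content": "Keep answers short. A few sentences at most"});
            $response.replies = $response.replies || [];
            $response.replies.push({
                type: "llmRequest",
                provider: "CAILA_OPEN_AI",
                model: "gpt-4o",
                tokenSecret: "MY_LLM_TOKEN",
                messages: history
            });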