LLM in telephony (Beta)
In the phone channel, you can use the llmRequest reply type so that the bot receives text from the LLM and synthesizes speech in streaming mode.
To get started:
- Add a secret for Caila.
- Use the llmRequest reply type in your script.
LLM and TTS providers
- To generate texts, you can currently only access models of the openai-proxy service on the Caila platform.

  Caution: LLMs are only available in the paid plan. To use the models, top up your balance in Caila.

- Speech synthesis works with any TTS provider.
llmRequest advantages
The llmRequest reply type generates text and synthesizes speech in streaming mode.
The bot receives text from the LLM sentence by sentence and synthesizes speech for each sentence as it arrives. The two processes run in parallel. This reduces pauses before bot responses compared to running generation and synthesis sequentially.
Example of sequential generation and synthesis without llmRequest
// Bot receives text from LLM
var llmResponse = $gpt.createChatCompletion([{ "role": "user", "content": $request.query }]);
var response = llmResponse.choices[0].message.content;
// Bot synthesizes speech for entire text
$reactions.answer(response);
Here, the bot:
- Accesses the LLM via $gpt.createChatCompletion and waits until the LLM has generated the entire text.
- Sends the entire text to speech synthesis and waits until speech is synthesized for the whole text.
- Plays the synthesized speech.
In this case, there might be a long pause of several seconds between the user request and the bot response.
llmRequest also lets you specify phrases that the bot will say to fill the pause at the beginning of text generation.
Access token for Caila
To use services and generative models from Caila in third-party applications, including JAICP, you need a personal access token. To issue a token:
- Go to Caila.

  Tip: Caila and Conversational Cloud use a shared account base, so if you are registered on Conversational Cloud, you do not need to register additionally on Caila.

- Go to the My space → API tokens section.
- In the upper right corner, click Create token.
- Give a name to the token, generate it, and copy it to the clipboard.
Next, add this token to JAICP:
- Go to JAICP. In the Secrets and variables section, add a new secret.
- In the llmRequest reply, specify the secret name in the tokenSecret property, as in the sketch below.
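For example, if the secret is named MY_LLM_TOKEN (the placeholder name used throughout this article), reference it like this. A minimal sketch; the full reply with all fields is shown in the next section:

$response.replies = $response.replies || [];
$response.replies.push({
    type: "llmRequest",
    // Secret name as saved in Secrets and variables
    tokenSecret: "MY_LLM_TOKEN",
    …
});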
Use llmRequest in script
state: NoMatch
    event!: noMatch
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            // Text generation model
            model: "gpt-4o",
            // Secret name
            tokenSecret: "MY_LLM_TOKEN",
            // Prompt and user request
            messages: [
                {"role": "system", "content": "Keep answers short. A few sentences at most."},
                {"role": "user", "content": $request.query}
            ]
        });
In this example, llmRequest is used in the NoMatch state:

- The bot sends a request to generate text to the openai-proxy service on the Caila platform. The messages field contains:
  - A prompt for the LLM to make the model generate a short answer.
  - The user request text stored in $request.query.
- Once the bot receives the first sentence from the LLM, it starts synthesizing speech.
- The bot plays the first sentence to the user.
- The bot continues to synthesize and play speech sentence by sentence until it receives the entire text from the LLM.
Keep in mind the following limitations:

- llmRequest does not support barge-in, so the user cannot interrupt the bot.
- After transitioning to the state, the bot immediately starts preparing text and speech for llmRequest. Caila and speech synthesis limits can be charged even if the user ended the call and the bot did not play the speech.
- Currently, the response text from the LLM is not available in the $response variable. The response text is also not added to the dialog history, so it is missing if you get the history, for example, via the $jsapi.chatHistoryInLlmFormat method (see the sketch below).
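For instance, if you build the messages array from the dialog history instead of a single user request, answers generated by previous llmRequest replies will be absent from it. A sketch, assuming $jsapi.chatHistoryInLlmFormat can be called without arguments and returns messages in the {"role", "content"} format:

script:
    // Dialog history in LLM format.
    // Texts generated by earlier llmRequest replies are not included here.
    var history = $jsapi.chatHistoryInLlmFormat();
    $response.replies = $response.replies || [];
    $response.replies.push({
        type: "llmRequest",
        provider: "CAILA_OPEN_AI",
        model: "gpt-4o",
        tokenSecret: "MY_LLM_TOKEN",
        messages: history
    });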
Fill pauses
In the script, a pause might occur while the bot is waiting for the first sentence of text from the LLM. There are two ways to fill this pause:
- Use the fillersPhraseConfig setting. You can specify a phrase that the bot will say at the beginning of generation. This will help fill the pause if it is too long.

  state: NoMatch
      event!: noMatch
      script:
          $response.replies = $response.replies || [];
          $response.replies.push({
              type: "llmRequest",
              …
              // The bot says the phrase if the pause exceeds 2000 ms
              fillersPhraseConfig: {"fillerPhrase": "Great question!", "activationDelayMs": 2000}
          });
- Specify other responses before llmRequest. After transitioning to the state, the bot immediately starts preparing text and speech for llmRequest, so it can perform other reactions while waiting for a response from the LLM. You can also combine this approach with fillersPhraseConfig, as shown after this list.

  state: NoMatch
      event!: noMatch
      # The bot starts generating the llmRequest response right after transitioning to the state
      a: Great question!
      a: Let me think
      # The bot has already said 2 phrases. During this time, it has prepared some part of the answer
      script:
          $response.replies = $response.replies || [];
          $response.replies.push({
              type: "llmRequest",
              …
          });
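A combined sketch, assuming fillersPhraseConfig and preceding reactions can be used together: the bot says the scripted phrase right away, and the filler phrase is played only if the first generated sentence is still not ready after the delay. Only fields shown above are used:

state: NoMatch
    event!: noMatch
    a: Great question!
    script:
        $response.replies = $response.replies || [];
        $response.replies.push({
            type: "llmRequest",
            provider: "CAILA_OPEN_AI",
            model: "gpt-4o",
            tokenSecret: "MY_LLM_TOKEN",
            messages: [
                {"role": "system", "content": "Keep answers short. A few sentences at most."},
                {"role": "user", "content": $request.query}
            ],
            // Played only if the pause after "Great question!" exceeds 2000 ms
            fillersPhraseConfig: {"fillerPhrase": "Let me think.", "activationDelayMs": 2000}
        });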