Select an ASR/TTS provider

You can select ASR and TTS providers when you create a phone channel. Open the ASR tab and select the connection, and then repeat these steps for TTS.

caution

If you select a specific ASR/TTS provider and an incident occurs on its side, you will need to switch your channel to another provider manually.

You can also keep the Default settings, in which case the configuration of the most stable ASR and TTS providers is applied. If an incident occurs on the selected provider’s side, the channel is automatically switched to another provider.

ASR configuration

You can select one of the connections for ASR and specify additional settings when you create a phone channel.

Connection	Settings	Description
Google	Language	The service can recognize speech in multiple languages. You can find the complete list in the Google documentation.
	Model	One of the machine learning models is used for speech recognition. These models were trained by Google for certain sound types and sources. See the table for the list of models available for each language: • Command and search — use this model to recognize speech in short audio files, such as voice commands. • Default — use this model in all other cases. • Phone call — use this model to recognize speech in phone calls. The model is available only if you use your own ASR connection.
Yandex	Language	The service can recognize speech in multiple languages. You can find the complete list in the Yandex documentation.
	Model	One of the machine learning models is used for speech recognition. The models are trained on data from Yandex services and applications.
	Number recognition	If the setting is enabled, numbers in the recognized text are written out as words instead of digits (for example, thirteen rather than 13).
	Reduced sensitivity to noise	Reduces sensitivity to background noise.
Tinkoff		The Tinkoff ASR connection has no additional settings.
Azure	Language	The service can recognize speech in multiple languages. You can find the complete list in the Microsoft documentation.
ASM Solutions	Model	One of the machine learning models is used for speech recognition. These models were trained by ASM Solutions on different domain-specific datasets.
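
The table above can be read as a map from each connection to the settings it accepts. The sketch below encodes that map and reports settings that a connection does not support. The lowercase keys and the `unsupported_settings` helper are hypothetical names chosen for illustration; they are not JAICP field names or API calls.

```python
# Settings each ASR connection accepts, copied from the table above.
# The lowercase keys are hypothetical identifiers, not JAICP field names.
ASR_SETTINGS = {
    "google": {"language", "model"},
    "yandex": {"language", "model", "number_recognition", "reduced_noise_sensitivity"},
    "tinkoff": set(),  # no additional settings
    "azure": {"language"},
    "asm_solutions": {"model"},
}


def unsupported_settings(connection, settings):
    """Return the sorted setting names that the given connection does not accept."""
    allowed = ASR_SETTINGS[connection]
    return sorted(name for name in settings if name not in allowed)


# Azure accepts only a language, so "model" is reported as unsupported.
print(unsupported_settings("azure", {"language": "en-US", "model": "default"}))
```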

TTS configuration

You can select one of the connections for TTS and specify additional settings when you create a phone channel.

Connection	Settings	Description
Google	Language	The service can synthesize speech in multiple languages. You can find the complete list in the Google documentation.
	Voice	You can use multiple voice options in the service (see the Google documentation for the complete list). The following voices are used by default: • `en-US-Wavenet-A` for English; • `ru-RU-Wavenet-B` for Russian; • `cmn-CN-Wavenet-B` for Chinese; • `Wavenet-A` for other languages.
	Speed	Speech tempo or speed. Here `1` is the normal speed for a specific voice.
	Voice pitch	Voice pitch in semitones. Here `20` is 20 semitones above the original pitch, and `-20` is 20 semitones below it.
	Raise volume	Volume increase in dB relative to the normal volume for a specific voice. At `+6.0` dB, the signal amplitude is approximately twice the normal one. We strongly discourage you from exceeding `+10.0` dB.
Yandex	Language	The service can synthesize speech in multiple languages. You can find the complete list in the Yandex documentation.
	Voice	You can use multiple voice options in the service (see the Yandex documentation for the complete list). The following voices are used by default: • `alena` for Russian; • `alyss` for other languages.
	Speed	Speech tempo or speed. Here `1` is the normal speed for a specific voice.
Azure	Voice	You can use multiple voice options in the service (see the Microsoft documentation for the complete list). JAICP supports neural voices only. The names for these voices contain the word “neural”.
Aimyvoice	Voice	Aimyvoice is a platform based on speech synthesis technologies by Just AI. You can use it to find a ready-made voice for your project (such as a game or audiobook), as well as create your own.

tip

Custom voices that you created and trained yourself do not appear in the dropdown list of available voices. To use them, enter the voice name manually.
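
To get a feel for the Voice pitch and Raise volume values in the Google settings above, you can convert them to ratios. This is a minimal sketch that assumes the standard decibel-to-amplitude formula and equal-tempered semitones; the function names are made up for illustration and are not part of any provider API.

```python
def amplitude_ratio(gain_db):
    """Amplitude ratio for a gain in dB: 10 ** (dB / 20)."""
    return 10 ** (gain_db / 20)


def pitch_ratio(semitones):
    """Frequency ratio for a shift in equal-tempered semitones: 2 ** (n / 12)."""
    return 2 ** (semitones / 12)


# +6.0 dB is close to doubling the amplitude; +10.0 dB is about 3.16 times.
print(round(amplitude_ratio(6.0), 3))
print(round(amplitude_ratio(10.0), 3))

# 12 semitones is one octave (double the frequency).
print(pitch_ratio(12))
```

Because the scale is logarithmic, every additional 6 dB roughly doubles the amplitude again, which is one reason large gains are discouraged.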

Yandex v3

Yandex TTS settings include an additional switch: Enable Yandex v3. If the switch is on, the third version of the Yandex SpeechKit protocol is used for speech synthesis.

info

Switching the protocol version is currently in beta. Contact us at client@just-ai.com for details.

The following settings become available with the third version:

Volume — loudness relative to full scale (LUFS), ranging from −145 to 0. The recommended range is between −20 and −16 LUFS.
Use variables — if the switch is on, speech synthesis is done via Yandex SpeechKit Brand Voice Adaptive, which supports variables.
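
As an illustration of the Volume range, the sketch below classifies a value as invalid, allowed, or recommended. It assumes that both ranges include their endpoints, which the text does not state; `check_volume` is a hypothetical helper, not a JAICP or Yandex function.

```python
# Limits taken from the Volume setting description; inclusive bounds are an assumption.
LUFS_MIN, LUFS_MAX = -145.0, 0.0
RECOMMENDED_MIN, RECOMMENDED_MAX = -20.0, -16.0


def check_volume(lufs):
    """Classify a LUFS value as 'invalid', 'recommended', or 'allowed'."""
    if not LUFS_MIN <= lufs <= LUFS_MAX:
        return "invalid"
    if RECOMMENDED_MIN <= lufs <= RECOMMENDED_MAX:
        return "recommended"
    return "allowed"


print(check_volume(-18))   # inside the recommended range
print(check_volume(-40))   # valid, but outside the recommended range
print(check_volume(3))     # above 0, so out of range
```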

danger

When the third version of the protocol is enabled, a single synthesis request must not exceed 250 characters of text, including whitespace and punctuation marks, or 24 seconds of synthesized speech. Otherwise the provider will return an error.
When the Use variables switch is on, the `a` tag and the `$reactions.answer` method do not work in the script. Only audio playback via `audio` and `$reactions.audio` is allowed, and TTS is done via `$reactions.ttsWithVariables`.
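
One way to stay under the 250-character limit is to split long replies into smaller pieces before they are sent for synthesis. The sketch below splits on whitespace only. `split_for_tts` is a hypothetical helper, not part of JAICP. It does not check the 24-second limit, because speech duration depends on the voice and speed.

```python
MAX_CHARS = 250  # character limit stated above, counting whitespace and punctuation


def split_for_tts(text, limit=MAX_CHARS):
    """Split text on whitespace into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word exceeds the limit")
        # Try to append the word to the current chunk with a single space.
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


long_reply = "word " * 100
for chunk in split_for_tts(long_reply):
    print(len(chunk))
```

Splitting at sentence boundaries would usually sound more natural. Whitespace splitting is used here only to keep the example short.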
