Select an ASR/TTS provider
You can select ASR and TTS providers when you create a phone channel. Open the ASR tab and select the connection, and then repeat these steps for TTS.
You can also keep the Default settings. In this case, the configuration of the most stable ASR and TTS providers is applied. If an incident occurs on the selected provider’s side, the channel automatically switches to another provider.
ASR configuration
You can select one of the connections for ASR and specify additional settings when you create a phone channel.
Connection | Settings | Description |
---|---|---|
ASM Solutions | Model | One of the machine learning models is used for speech recognition. These models were trained by ASM Solutions on different domain-specific datasets. |
Azure | Language | The service can recognize speech in multiple languages. You can find the complete list in the Microsoft documentation. |
Google | Language | The service can recognize speech in multiple languages. You can find the complete list in the Google documentation. |
Google | Model | One of the machine learning models is used for speech recognition. These models were trained by Google for certain sound types and sources. See the table for the list of models available for each language. • Command and search: use this model to recognize speech in short audio files, such as voice commands. • Default: use this model in all other cases. • Phone call: use this model to recognize speech in phone calls. This model is available only if you use your own ASR connection. |
T-Bank | | The T-Bank ASR connection has no additional settings. |
Yandex | Language | The service can recognize speech in multiple languages. You can find the complete list in the Yandex documentation. |
Yandex | Model | One of the machine learning models is used for speech recognition. The models are trained on data from Yandex services and applications. |
Yandex | Number recognition | If the setting is enabled, numbers in the recognized text are spelled out as words instead of digits (for example, thirteen rather than 13). |
Yandex | Reduced sensitivity to noise | Reduces sensitivity to background noise. |
TTS configuration
You can select one of the connections for TTS and specify additional settings when you create a phone channel.
Connection | Settings | Description |
---|---|---|
Aimyvoice | Voice | Aimyvoice is a platform based on speech synthesis technologies by Just AI. You can use it to find a ready-made voice for your project (such as a game or audiobook), as well as create your own. |
Azure | Voice | You can use multiple voice options in the service (see the Microsoft documentation for the complete list). JAICP supports neural voices only. The names of these voices contain the word “neural”. |
ElevenLabs | | A cloud-based service that synthesizes realistic speech in different languages. To use the service, connect your own account. Note: The ElevenLabs website is not available from Russian IP addresses. |
Google | Language | The service can synthesize speech in multiple languages. You can find the complete list in the Google documentation. |
Google | Voice | You can use multiple voice options in the service (see the Google documentation for the complete list). The following voices are used by default: • en-US-Wavenet-A for English; • ru-RU-Wavenet-B for Russian; • cmn-CN-Wavenet-B for Chinese; • Wavenet-A for other languages. |
Google | Speed | Speech rate. A value of 1 is the normal speed for the voice. |
Google | Voice pitch | Pitch shift in semitones. A value of 20 raises the voice 20 semitones above the original pitch, and -20 lowers it by 20 semitones. |
Google | Raise volume | Volume gain in dB relative to the normal volume for the voice. At +6.0 dB, the playback amplitude is approximately twice the normal one. We strongly discourage exceeding +10.0 dB. |
Yandex v1 | Language | The service can synthesize speech in multiple languages. You can find the complete list in the Yandex documentation. |
Yandex v1 | Voice | You can use multiple voice options in the service. See the Yandex documentation for the complete list. |
Yandex v1 | Speed | Speech rate. A value of 1 is the normal speed for the voice. |
Yandex v3 | Voice | You can use multiple voice options in the service. See the Yandex documentation for the complete list. |
Yandex v3 | Role | A characteristic of the voice. For example, the speaker can sound friendlier or whisper. Not all voices have roles, and the available roles vary between voices. See the list of roles in the Yandex documentation. |
Yandex v3 | Speed | Speech rate. A value of 1 is the normal speed for the voice. |
Yandex v3 | Volume | Loudness in LUFS (loudness units relative to full scale), ranging from −145 to 0. The recommended range is from −20 to −16 LUFS. |
Yandex v3 | Use variables | If the switch is on, speech synthesis is done via Yandex SpeechKit Brand Voice Adaptive, which supports variables. |
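
To get a feel for the Voice pitch and Raise volume values, the sketch below converts them to linear ratios. It is only an illustration and not part of JAICP or any provider API: the function names are hypothetical, and it assumes the standard conventions that a gain in dB maps to an amplitude ratio of 10^(dB/20) and that 12 semitones (halftones) equal one octave.

```python
def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a volume gain in dB to a linear amplitude ratio (assumes 20*log10 convention)."""
    return 10 ** (gain_db / 20)


def semitones_to_frequency_ratio(semitones: float) -> float:
    """Convert a pitch shift in semitones to a frequency ratio (12 semitones = 1 octave)."""
    return 2 ** (semitones / 12)


print(round(db_to_amplitude_ratio(6.0), 3))         # 1.995, roughly twice the amplitude
print(round(db_to_amplitude_ratio(10.0), 3))        # 3.162
print(round(semitones_to_frequency_ratio(20), 3))   # 3.175
print(round(semitones_to_frequency_ratio(-20), 3))  # 0.315
```

Under these assumptions, +6.0 dB is close to a doubling of amplitude, and +10.0 dB already more than triples it, which is consistent with the advice not to exceed that value.
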
Yandex ASR and TTS versions
Yandex SpeechKit has multiple versions of ASR and TTS.
You can combine different versions, for example ASR v3 with TTS v1. The ASR and TTS versions do not affect each other.
ASR
- In the cloud JAICP version, you can only use v3.
- If you have an on-premise JAICP installation, v2 and v3 are available.
The list of available settings in $dialer.setAsrProperty and the list of fields in speech recognition results depend on the ASR version.
TTS
- By default, v1 and v3 are available. A protocol version switch is available in the Yandex TTS connection settings. If the switch is active, v3 is used for speech synthesis.
- If you are using an on-premise Yandex SpeechKit Hybrid installation for TTS, only v3 is available.
Text markup
- In v1, SSML markup and simplified TTS markup are available.
- In v3, only simplified TTS markup is available.
Synthesis with variables
Synthesis with variables is available only in v3.
Phrase length
When you use v3, the length of the phrase:
- Must not exceed 250 characters, including spaces and punctuation.
- Must not exceed 24 seconds.
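
One way to stay within the 250-character limit is to split long texts before sending them for synthesis. The sketch below is a minimal illustration and not a JAICP or Yandex SpeechKit function: `split_phrase` and `MAX_CHARS` are hypothetical names, and the sentence-boundary heuristic is an assumption. It cannot check the 24-second limit, because the duration depends on the voice and speed and is only known from the synthesized audio.

```python
import re

MAX_CHARS = 250  # v3 limit from the text, counting spaces and punctuation


def split_phrase(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space to break on, cut mid-word
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        # Append the sentence to the current chunk if it still fits.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_text = ("Your order has been shipped. " * 20).strip()
    for chunk in split_phrase(long_text):
        print(len(chunk), chunk[:40])
```

Each resulting chunk can then be synthesized as a separate phrase. If a chunk still runs longer than 24 seconds at a slow speed setting, a smaller `max_chars` value would be needed.
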