Speech synthesis with variables

In the phone channel, there are two main ways to generate bot replies: speech synthesis and audio playback.

| Technology | Advantages | Drawbacks |
| --- | --- | --- |
| Speech synthesis (TTS, Text-to-Speech) | You can voice any text without live speakers. The text is voiced automatically, and editing phrases incurs no additional costs. | The text is voiced by a robotic voice. It is difficult to make the text sound good and convey the appropriate emotions and intonation. |
| Audio playback | A live speaker always sounds more lively and engaging, which improves the overall user experience. | Audio playback alone cannot handle context-dependent variables that have to be mentioned during the dialog, such as the client’s personal information or loan amount. In this case, you have to cut the audio files into pieces and insert synthesized segments between them, which degrades playback quality. |
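
To illustrate the drawback of mixing audio playback with synthesized variables, here is a minimal sketch of how a reply would have to be cut into recorded and synthesized pieces. The `{name}` placeholder syntax and the function names are assumptions made for this illustration and are not part of any platform API.

```python
import re

# Hypothetical placeholder syntax: {name} marks a context-dependent variable.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def split_into_segments(template, values):
    """Return (kind, text) pairs for a reply template.

    'recorded' marks fixed text that a live speaker would record,
    'synthesized' marks variable values that would need TTS.
    """
    segments = []
    position = 0
    for match in PLACEHOLDER.finditer(template):
        fixed = template[position:match.start()].strip()
        if fixed:
            segments.append(("recorded", fixed))
        segments.append(("synthesized", values[match.group(1)]))
        position = match.end()
    tail = template[position:].strip()
    if tail:
        segments.append(("recorded", tail))
    return segments


def count_joins(segments):
    """Number of places where two audio pieces have to be glued together."""
    return max(len(segments) - 1, 0)


example_segments = split_into_segments(
    "Hello, {name}! Your loan amount is {amount} dollars.",
    {"name": "Anna", "amount": "5000"},
)
for kind, text in example_segments:
    print(f"{kind:12} {text}")
print("joins:", count_joins(example_segments))
```

For this template, the reply breaks into five pieces with four joins, and each join is a point where the listener may hear the voice change.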

A third option is speech synthesis with variables, a technology that lets you replace individual words (variables) in recordings of a live speaker.

Variable replacement is carried out by a special voice model trained on recordings of the same speaker. This model can adapt to the pronunciation context so the variable parts sound natural and integrate seamlessly into the original recording.
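
As a rough sketch of the idea, a reply with variables can be modeled as a speaker recording paired with a text template, with the variable values supplied while the dialog is running. The class, field names, placeholder syntax, and request shape below are hypothetical and do not reflect any provider’s actual API.

```python
import re
from dataclasses import dataclass

# Hypothetical placeholder syntax: {name} marks a variable in the transcript.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


@dataclass
class VariableReply:
    recording_id: str  # hypothetical identifier of the live speaker's recording
    template: str      # transcript of the recording with {placeholders}

    def variable_names(self):
        """Names of all variables in the template, in order of appearance."""
        return PLACEHOLDER.findall(self.template)

    def build_request(self, values):
        """Pair the recording with the variable values known at this point in the dialog."""
        missing = [name for name in self.variable_names() if name not in values]
        if missing:
            raise KeyError(f"missing variable values: {missing}")
        return {
            "recording_id": self.recording_id,
            "template": self.template,
            "variables": [
                {"name": name, "value": values[name]}
                for name in self.variable_names()
            ],
        }


reply = VariableReply(
    recording_id="loan_offer",
    template="Hello, {name}! Your loan amount is {amount}.",
)
# The values only become known during the dialog.
request = reply.build_request({"name": "Anna", "amount": "5000 dollars"})
print(request["variables"])
```

In this sketch, the fixed part of the reply is never re-recorded or cut. Only the variable values change from one dialog to the next, and a missing value is reported before any request is sent.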

Advantages

Here’s why you should consider using speech synthesis with variables:

  • You do not need to join recordings manually. Replies with variables are voiced automatically in the live speaker’s voice.

  • The voiced variables do not grate on the ear and sound less robotic. This improves the customer experience with the bot and can increase conversion rates.

  • Customers feel comfortable when communicating with a bot. They ask to be transferred to an agent less often, which saves your employees’ time.

  • Any variable can be voiced, even one whose value is unknown before the dialog starts.

Providers

Yandex supports speech synthesis with variables via Yandex SpeechKit Brand Voice Adaptive.