Skip to main content

Yandex SpeechKit Brand Voice Adaptive

Beta

Yandex supports speech synthesis with variables via Yandex SpeechKit Brand Voice Adaptive.

info

Yandex SpeechKit Brand Voice Adaptive integration via JAICP is currently in beta. Send an email to client@just-ai.com for more details.

How to use

The following steps are needed to use this technology in JAICP projects.

  1. Input data set preparation. Prepare a data set containing the templates and audio recordings for training the voice model. Be sure to follow all audio and text requirements. The data set will be passed on to Yandex.
  1. Voice model preparation. Model training is done on Yandex’s side. One training cycle takes approximately one month.

  2. Voice model deployment. Yandex deploys the trained model to Yandex.Cloud and provides the model ID. This ID can be used in external projects.

  3. TTS configuration in JAICP. In the phone channel settings, you will need to configure Yandex TTS settings. Paste the obtained model ID instead of the voice name:

TTS configuration for the phone channel

You will now be able to use speech synthesis with variables in the script to generate bot replies. Refer to the $reactions.ttsWithVariables documentation to find out how.

tip
Just AI is a partner of Yandex SpeechKit, so we can help you register an account, create a project, and prepare your data set. Let us know about your needs in the email.
We also have several of our own pretrained voice models and would be happy to share and retrain them using your data. This way the preparation stage will go much faster.

Restrictions

Answer length limit

The total length of speech synthesized with variables should not exceed:

  • 24 seconds of voiced text;
  • 250 characters, including whitespace and punctuation signs.

Otherwise the provider will return an error.

Bot script restrictions

When TTS with variables is used, the a tag and the $reactions.answer method do not work in the script. Along with $reactions.ttsWithVariables, only audio playback via audio or $reactions.audio is allowed.

Model retraining

During speech synthesis, you can try using templates which are not part of the original data set. However, in this case the quality of the output is not guaranteed.

tip
If you use many such templates, consider adding them to the input data set and sending a request to retrain the model.