Bot quality evaluation

Bot quality evaluation is a tool for testing a bot against dialog sets.

Each dialog contains user requests and expected bot reactions. During a test, the tool compares the received bot reactions with the expected ones.

You can view test history and download detailed reports with the results.

Tool limitations
  • During a test, the bot makes real requests, so a test may exceed JAICP limits or third-party service quotas.
  • You cannot test scripts that have conditions for different channel types. During a test, the $request.channelType field is always set to bot_scorer_api.
  • You can only check text responses and bot states.
  • You cannot check the bot’s reactions to events.

To evaluate the bot quality:

  1. Go to the Bot quality evaluation section.
  2. Prepare a file with the dialog set.
  3. Upload the dialog set.
  4. Run a test.
  5. View test history and the report.

File with dialog set

The dialog file must be in one of the formats:

  • XLS

  • XLSX

  • CSV

    CSV file requirements:
    • Use one of the following characters as the field delimiter:
      • comma (,)
      • semicolon (;)
      • vertical bar (|)
      • tab character
    • If a value contains the delimiter, enclose the value in double quotation marks (").
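As a sketch of the CSV rules above, the following Python snippet writes a small dialog file with the standard csv module. The column names come from the example file in this article; the rows are illustrative. With a comma delimiter, csv.QUOTE_MINIMAL wraps the value well, hello in double quotation marks because it contains the delimiter; with a semicolon delimiter, the same value needs no quoting.

```python
import csv
import io

# Column names taken from the example dialog file in this article.
FIELDS = ["testCase", "comment", "request", "expectedResponse",
          "expectedState", "skip", "preActions"]

# Illustrative rows; missing columns are written as empty cells (restval="").
rows = [
    {"testCase": "hello", "request": "/start", "expectedState": "/Start"},
    {"testCase": "hello", "request": "well, hello",
     "expectedResponse": "Hello there!", "expectedState": "/Hello"},
]

def to_csv(rows, delimiter=","):
    """Serialize rows; values containing the delimiter get double quotation marks."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter=delimiter,
                            quoting=csv.QUOTE_MINIMAL, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(rows))                  # comma: "well, hello" is quoted
print(to_csv(rows, delimiter=";"))   # semicolon: no quoting needed
```

The same function works with delimiter="|" or delimiter="\t" for the other allowed delimiters.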

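| testCase | comment | request | expectedResponse | expectedState | skip | preActions |
|---|---|---|---|---|---|---|
| hello | | /start | | /Start | | |
| hello | | well, hello | Hello there! | /Hello | | |
| weather | | What’s the weather? | | /Weather | | |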

The file contains the test cases used to evaluate the bot quality. A test case can consist of several steps; each row of the file is one step of a test case.

Each test case runs as a new session with a new client.
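A minimal sketch of how file rows map to test cases, assuming that the steps of one test case are consecutive rows sharing the same testCase value. The clientId below is a hypothetical stand-in for the new client created for each test case; it is not a JAICP field.

```python
import uuid
from itertools import groupby

def plan_test_cases(rows):
    """Group consecutive rows with the same testCase value into test cases.

    Each test case gets a fresh, hypothetical client ID to mirror the rule
    that every test case runs as a new session with a new client.
    """
    plan = []
    for name, steps in groupby(rows, key=lambda row: row["testCase"]):
        plan.append({
            "testCase": name,
            "clientId": str(uuid.uuid4()),  # hypothetical, not a JAICP field
            "requests": [step["request"] for step in steps],
        })
    return plan

# Rows as they might look after reading the example file (illustrative only).
rows = [
    {"testCase": "hello", "request": "/start"},
    {"testCase": "hello", "request": "well, hello"},
    {"testCase": "weather", "request": "What's the weather?"},
]

for case in plan_test_cases(rows):
    print(case["testCase"], case["requests"])
```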

Tip: In the Dialog set fields article, you can learn more about which fields the file must contain and how to fill them in.

To download an example file, select Download example of file with dialogs in the top right corner.

Upload dialog set

To upload a dialog set:

  1. Select Upload a file with dialogs.
  2. Specify the set name.
  3. Attach a file with dialogs.
  4. Select Save.

Run test

To run a test, select Run test on the panel with the dialog set.

Test history

To open the test history, select a dialog set panel.

For each test, the history shows its Success rate: the percentage of successful steps among all non-skipped steps.
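The Success rate formula can be sketched in Python as follows. The result values OK, FAILED, and SKIPPED are taken from the report example in this article; any other non-skipped value is counted as unsuccessful here, and returning 0 when every step is skipped is an assumption, not documented behavior.

```python
def success_rate(results):
    """Percentage of OK steps among all steps whose result is not SKIPPED."""
    checked = [r for r in results if r != "SKIPPED"]
    if not checked:
        return 0.0  # assumption: all steps skipped -> 0%
    return 100.0 * sum(r == "OK" for r in checked) / len(checked)

# result column of the report example: 2 successful steps out of 3 checked
print(success_rate(["SKIPPED", "OK", "OK", "FAILED"]))  # 66.66666666666667
```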

To view the chart with test success dynamics:

  1. On the dialog set panel, select .

  2. Select View dynamics.

    Chart example: [image: test success chart]

Report

To download a detailed test report, select Download report.

The report is an XLSX file. It contains the dialog set and the following additional fields:

  • actualResponse is the received bot response.

  • actualState is the first state that the bot transitioned to.

  • result is the result of the step check, for example OK (successful step), FAILED (unsuccessful step), or SKIPPED (skipped step).

  • transition is the history of bot state transitions in the step.

  • responseTime is the duration of the step in milliseconds.
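The transition and actualState fields are related: actualState is the first state in the transition history. A minimal sketch of extracting it, assuming the states in transition are separated by the → arrow as in the report example below (the separator in the actual XLSX file may differ, so it is a parameter here).

```python
def split_transition(transition, separator="→"):
    """Split a transition value into the list of states visited in the step."""
    return [state.strip() for state in transition.split(separator) if state.strip()]

def first_state(transition, separator="→"):
    """Return the first state of the transition, or "" if the field is empty."""
    states = split_transition(transition, separator)
    return states[0] if states else ""

print(split_transition("/Hello/Meeting→/Hello/Time"))  # ['/Hello/Meeting', '/Hello/Time']
print(first_state("/Hello/Meeting→/Hello/Time"))       # /Hello/Meeting
```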

Report example
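| testCase | comment | request | expectedResponse | expectedState | skip | preActions | actualResponse | actualState | result | transition | responseTime |
|---|---|---|---|---|---|---|---|---|---|---|---|
| hello | | /start | | /Start | true | | | | SKIPPED | | 0 |
| hello | | well, hello | Hello there! | /Hello | false | | Hello there! | /Hello | OK | /Hello | 1424 |
| weather | | What’s the weather? | | /Hello/Weather | false | hello | In what city? | /Hello/Weather | OK | /Hello/Weather | 468 |
| alarm | | Set an alarm | | /Hello/Alarm | false | hello | For when? | /Hello/Meeting | FAILED | /Hello/Meeting → /Hello/Time | 491 |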

The hello test case:

  1. The user says “/start”. Expected reaction: the bot transitions to the /Start state. The step is skipped and not checked because skip is set to true.
  2. The user says “well, hello”. Expected reaction: the bot transitions to the /Hello state and replies “Hello there!”. The received state and response match the expected ones. The step is considered successful.

The weather test case:

  1. The steps of the hello test case, referenced in preActions, are performed first.
  2. The user says “What’s the weather?”. Expected reaction: the bot transitions to the /Hello/Weather state. The received state matches the expected one. The response text is not checked because the expectedResponse field is empty. The step is considered successful.
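The preActions mechanism can be sketched as replaying the steps of another test case before the test case's own steps, within the same session. This is a hypothetical model: the DIALOGS and PRE_ACTIONS mappings are illustrative, only one level of preActions is resolved, and replayed steps are only sent, not checked, in this sketch.

```python
# Hypothetical dialog set: test case name -> its requests, in order.
DIALOGS = {
    "hello": ["/start", "well, hello"],
    "weather": ["What's the weather?"],
    "alarm": ["Set an alarm"],
}

# Hypothetical preActions values: test case -> test case replayed first.
PRE_ACTIONS = {"weather": "hello", "alarm": "hello"}

def requests_to_send(test_case):
    """Return every request sent in the session: pre-action steps, then own steps.

    Only one level of preActions is resolved in this sketch.
    """
    pre = PRE_ACTIONS.get(test_case)
    replayed = DIALOGS[pre] if pre else []
    return replayed + DIALOGS[test_case]

print(requests_to_send("weather"))
```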

The alarm test case:

  1. The steps of the hello test case, referenced in preActions, are performed first.
  2. The user says “Set an alarm”. Expected reaction: the bot transitions to the /Hello/Alarm state. The bot transitions to /Hello/Meeting instead, so the received state does not match the expected one. The step is considered unsuccessful.
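The checks in these walkthroughs can be summarized as a small Python function that assigns a result to one report row. It is a sketch under assumptions: responses and states are compared as exact strings, an empty expectedState disables the state check the same way an empty expectedResponse disables the response check, and skip may be read either as text or as a boolean. The tool's real matching rules may differ.

```python
def check_step(row):
    """Return "SKIPPED", "OK", or "FAILED" for one report row given as a dict."""
    # skip may arrive as True or as the text "true", depending on how the file was read
    if str(row.get("skip", "")).strip().lower() == "true":
        return "SKIPPED"
    expected_state = row.get("expectedState", "")
    expected_response = row.get("expectedResponse", "")
    # An empty expected value means that part of the reaction is not checked.
    if expected_state and row.get("actualState") != expected_state:
        return "FAILED"
    if expected_response and row.get("actualResponse") != expected_response:
        return "FAILED"
    return "OK"

# Rows adapted from the report example in this article
alarm = {"skip": "false", "expectedState": "/Hello/Alarm",
         "actualState": "/Hello/Meeting", "actualResponse": "For when?"}
weather = {"skip": "false", "expectedState": "/Hello/Weather",
           "actualState": "/Hello/Weather", "actualResponse": "In what city?"}
print(check_step(alarm), check_step(weather))  # FAILED OK
```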