Algorithm ranking
Transformer ru or Transformer multi:
- Offers good accuracy and high classification speed.
- Uses a large language model to classify input, analyzing semantic relationships between words rather than just the words themselves (see the sketch after this list).
- Performs well on small datasets.
- Can recognize more languages than are officially supported by JAICP.
- Non-essential words in a phrase might reduce its intent weight. See a possible solution.
- Transformer multi might assign high weights to negative examples. See a possible solution.
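To make the idea concrete, here is a minimal sketch of embedding-based intent matching, the general technique behind transformer classifiers. The model name, intents, and training phrases are illustrative assumptions, not JAICP's actual implementation.

```python
# Minimal sketch of embedding-based intent matching (illustrative only;
# the model, intents, and phrases are assumptions, not JAICP internals).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

intents = {
    "check_balance": ["What is my account balance?", "How much money do I have?"],
    "block_card": ["Block my card", "My card was stolen"],
}

def classify(request: str) -> tuple[str, float]:
    """Return the intent whose training phrases are semantically closest."""
    request_emb = model.encode(request, convert_to_tensor=True)
    best_intent, best_score = "", -1.0
    for intent, phrases in intents.items():
        phrase_embs = model.encode(phrases, convert_to_tensor=True)
        # Cosine similarity in embedding space captures meaning,
        # not surface word overlap
        score = util.cos_sim(request_emb, phrase_embs).max().item()
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score

# "I lost my credit card" shares almost no words with "My card was stolen",
# yet their embeddings are close, so block_card wins.
print(classify("I lost my credit card"))
```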
Two algorithms are tied for second place:
Classic ML:
- Analyzes dictionary forms (lemmas) and word stems.
- Maintains good performance even under high load conditions.
- Unlike Deep Learning, it works with negative examples.
- Requires a large dataset.
- Requires a similar number of phrases for each intent. See a possible solution.
- Training is time-consuming.
- Does not consider the semantics of words and synonyms, as the sketch after this list illustrates. See a possible solution.
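For contrast, here is a rough sketch of the stem-based approach, using scikit-learn and NLTK as stand-ins. The pipeline and training data are assumptions for illustration, not JAICP's internal code.

```python
# Rough sketch of stem-based classification in the spirit of Classic ML
# (scikit-learn and NLTK are stand-ins, not JAICP's internal stack).
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

stemmer = SnowballStemmer("english")

def stem_tokens(text: str) -> list[str]:
    # Reduce words to stems so "blocked" and "blocking" map to one feature
    return [stemmer.stem(token) for token in text.lower().split()]

phrases = [
    "block my card", "my card was blocked",
    "what is my balance", "how much money do I have",
]
labels = ["block_card", "block_card", "check_balance", "check_balance"]

clf = make_pipeline(
    TfidfVectorizer(tokenizer=stem_tokens, token_pattern=None),
    LogisticRegression(),
)
clf.fit(phrases, labels)

# Stems carry no semantics: "freeze" is a synonym for "block", but it adds
# nothing here because it never appeared in the training phrases.
print(clf.predict(["freeze my card"]))
```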
Deep Learning:
- Considers the semantics of words when forming hypotheses.
- Distinguishes similar intents better than Classic ML.
- Requires a large dataset. See a possible solution for small datasets.
- Training is time-consuming.
- Does not handle negative examples well; the sketch after this list shows why. See a possible solution.
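The trouble with negative examples follows from the output layer itself: a softmax distributes probability over the known intents, so "none of the above" has no natural slot. A common workaround, sketched below with made-up scores, is a confidence threshold.

```python
# Why negative examples are awkward for a softmax intent classifier.
# The logits and threshold here are made-up illustrative values.
import torch
import torch.nn.functional as F

logits = torch.tensor([1.2, 0.9, 1.0])  # scores for 3 intents from some network
probs = F.softmax(logits, dim=0)        # sums to 1 even for off-topic input

THRESHOLD = 0.6
if probs.max().item() < THRESHOLD:
    print("no intent matched")          # treat low confidence as a rejection
else:
    print(f"intent {probs.argmax().item()} with p={probs.max().item():.2f}")
```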
STS:
- Performs well on small datasets. The algorithm can function even if each intent has only one training phrase.
- Lets you use entities in training phrases.
- Has the most flexible and interpretable settings.
- Distinguishes semantically similar intents better than Classic ML.
- Limited to 1,000 training phrases, so it is suitable only for small projects that are not NLU-focused.
- Classification is slow. Speed depends on the length of the request, as well as the number and length of the training phrases; the sketch below illustrates why.
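The speed profile follows from the algorithm's shape: every request is scored against every training phrase, so classification time grows with the number and length of those phrases. The toy sketch below uses Jaccard similarity as a stand-in metric; JAICP's actual STS scoring and weights differ.

```python
# Toy illustration of STS-style scoring: the request is compared against
# every training phrase. Jaccard similarity is a stand-in metric here;
# the real algorithm uses its own configurable weights.
def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

training = {
    "greet": ["hello there", "good morning"],
    "bye": ["goodbye", "see you later"],
}

def classify(request: str) -> tuple[str, float]:
    tokens = set(request.lower().split())
    scores = [
        (intent, jaccard(tokens, set(phrase.lower().split())))
        for intent, phrases in training.items()
        for phrase in phrases                     # every phrase is scored
    ]
    return max(scores, key=lambda pair: pair[1])  # best single match wins

print(classify("good morning to you"))  # ('greet', 0.5)
```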