Troubleshooting
Transformer
Performs poorly with negative examples
The classifier assumes that it knows all intents. As a result, the classifier gives a high weight to one of the intents, even though the phrase does not match any of them.
This problem is especially common with the Transformer multi classifier.
Solution
- Create a `NoMatch` intent group.
- Add intents to the group for phrases that the classifier mistakenly classifies as other intents.
  Fill this group of intents based on real conversation logs. This approach lets you gradually collect training phrases for new topics, which can later be added to the bot.
- Add an activation to the script using `intentGroup: /NoMatch`.
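The steps above can be sketched as a catch-all state in the script. This is a minimal sketch: the state name and the answer text are placeholders, the `!` marks a global activation, and the exact syntax may vary by platform version:

```
state: NoMatch
    intentGroup!: /NoMatch
    a: Sorry, I didn't understand that. Could you rephrase?
```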
Performs poorly with similar intents
The classifier might confuse intents that differ by just one entity, for example: "apply for a credit card" and "apply for a debit card", or "consent to a conversation" and "consent to an offer".
The classifier might also distribute weight across intents, which can result in none of them reaching the activation threshold.
Solution
- Use patterns instead of intents in local transitions and clarifications in the script.
- Combine similar intents into one. In the script, determine the client intent based on the triggered entity.
- Avoid using classification rules. After you filter the results, there might not be any intents left with sufficient weight.
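The first two options can be sketched as a pattern-based clarification inside the script. This is a hedged sketch: the state names, patterns, and replies are illustrative, not taken from a real project:

```
state: ApplyForCard
    intent!: /ApplyForCard
    a: Which card would you like to apply for?

    state: Credit
        q: * credit *
        a: Starting the credit card application.

    state: Debit
        q: * debit *
        a: Starting the debit card application.
```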
Does not recognize synonyms
The model supports general language synonyms, but might not recognize domain-specific ones relevant to your project.
Solution
Add more variations with different synonyms and word combinations to your training phrases.
Requests match the wrong intents due to non-essential words
A possible cause is an imbalance of non-essential words in intents.
If there are few non-essential words in the training phrases, the classifier treats those words as important for the intent.
Solution
Options:
- Add many different non-essential words to your phrases. Non-essential words should be evenly distributed across intents.
- Remove all non-essential words from the dataset and make the phrases as semantically loaded as possible.
- Create a stop word dictionary and clean user requests in the `preMatch` handler.
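The stop-word option can be prototyped as a plain function before wiring it into the `preMatch` handler. The dictionary below is a placeholder to fill from your own logs, and the handler integration itself is assumed, not shown:

```javascript
// Placeholder stop-word dictionary; populate it from real conversation logs.
const STOP_WORDS = new Set(["hello", "hi", "please", "kindly", "thanks"]);

// Remove stop words from a user request before classification.
// Punctuation attached to a stop word is stripped together with it.
function stripStopWords(query) {
  return query
    .split(/\s+/)
    .filter((word) => {
      const cleaned = word.toLowerCase().replace(/[^\p{L}\p{N}]/gu, "");
      return !STOP_WORDS.has(cleaned);
    })
    .join(" ");
}
```

Inside a real `preMatch` handler you would assign the cleaned string back to the incoming request, so the classifier only sees the meaningful words.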
Deep Learning
Long phrases are recognized with very low weight (0–0.1)
The algorithm takes into account all tokens present in the request. The more tokens there are, the more difficult it is to classify.
Solution
Train the algorithm to ignore non-essential words in a request, such as greetings and polite expressions.
- Add non-essential words and phrases that you want to ignore to your training phrases. They should be evenly distributed across intents so that the algorithm does not consider them important.
- Increase the value of the `emb_drp` parameter to artificially deactivate some weights and avoid overfitting.
Performs poorly with negative examples
The classifier assumes that it knows all intents. As a result, the classifier gives a high weight to one of the intents, even though the phrase does not match any of them.
Solution
- Create a `NoMatch` intent group.
- Add intents to the group for phrases that the classifier mistakenly classifies as other intents.
  Fill this group of intents based on real conversation logs. This approach lets you gradually collect training phrases for new topics, which can later be added to the bot.
- Add an activation to the script using `intentGroup: /NoMatch`.
Performs poorly with similar intents
The classifier might confuse intents that differ by just one entity, for example: "apply for a credit card" and "apply for a debit card", or "consent to a conversation" and "consent to an offer".
Solution
- Use patterns instead of intents in local transitions and clarifications in the script.
- Combine similar intents into one. In the script, determine the client intent based on the triggered entity.
- Avoid using classification rules. After you filter the results, there might not be any intents left with sufficient weight.
Letters of another alphabet always match the same intent
For example, requests typed in Latin letters might always match the same intent. This happens because the algorithm uses embeddings for a single language only.
Solution
- Enable the `multi` parameter. This might reduce the weight of all requests.
- Use the `$nlp.fixKeyboardLayout` method if the user has chosen the wrong keyboard layout.
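To illustrate what a keyboard-layout fix does (this is only a sketch of the idea, not the implementation of `$nlp.fixKeyboardLayout`), a request typed in the wrong layout can be remapped character by character. The QWERTY-to-ЙЦУКЕН mapping below is deliberately simplified:

```javascript
// Simplified QWERTY → ЙЦУКЕН remapping; real layout handling
// also covers uppercase letters and layout detection.
const QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,.";
const RU     = "йцукенгшщзхъфывапролджэячсмитьбю";

function fixLayout(text) {
  return [...text.toLowerCase()]
    .map((ch) => {
      const i = QWERTY.indexOf(ch);
      return i >= 0 ? RU[i] : ch; // leave unmapped characters as-is
    })
    .join("");
}
```

For example, `fixLayout("ghbdtn")` returns "привет" ("hello" typed with a Latin keyboard layout).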
Low accuracy on a small dataset
The classifier needs to see enough examples of phrases to form an understanding of the intent.
Solution
Increase the `n_epochs` value. This parameter defines how many times the classifier sees the training phrases during training.
- It affects training speed, so increasing it is not recommended for large datasets.
- If the value is too high, the model may overfit: it will perform well on dataset examples but poorly on any other input.
Does not recognize synonyms
Embeddings are trained on subwords. The model supports general language synonyms, but might not recognize domain-specific ones relevant to your project. The model might also have a poor understanding of word combinations.
Solution
Add more variations with different synonyms and word combinations to your training phrases.
Requests match the wrong intents due to non-essential words
A possible cause is an imbalance of non-essential words in intents.
If there are few non-essential words in the training phrases, the classifier treats those words as important for the intent.
Solution
Options:
- Add many different non-essential words to your phrases. Non-essential words should be evenly distributed across intents.
- Remove all non-essential words from the dataset and make the phrases as semantically loaded as possible.
- Create a stop word dictionary and clean user requests in the `preMatch` handler.
Classic ML
Performs poorly with similar intents
Only one intent can have a high enough weight.
For example, the classifier might confuse intents that differ by just one entity, such as "apply for a credit card" and "apply for a debit card", or "consent to a conversation" and "consent to an offer".
Solution
- Combine similar intents into one. In the script, determine the client intent based on the triggered entity.
- Use patterns instead of intents in local transitions and clarifications in the script.
- Use classification rules.
Most requests match the same intent
The classifier needs to see enough examples of phrases to form an understanding of the intent. Classes with more examples tend to get more weight.
Solution
Make your intents balanced:
- The dataset should not contain intents with significantly more or fewer phrases than the others.
- If a large intent covers several meanings or formulations, split it into several smaller intents.
- If smaller intents differ only by the presence of an entity, merge them into a single intent. In the script, determine the client intent based on the triggered entity.
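The entity-based approach from the last point can be sketched in the script. This is an illustrative sketch only: `CardType` is a hypothetical entity, and the state names and transitions are placeholders:

```
state: ApplyForCard
    intent!: /ApplyForCard
    script:
        // CardType is a hypothetical entity extracted by the NLU.
        if ($parseTree._CardType === "credit") {
            $reactions.transition("/ApplyForCard/Credit");
        } else {
            $reactions.transition("/ApplyForCard/Debit");
        }
```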
Does not recognize synonyms
Embeddings are built based on the dataset, so the model is not aware of general-language synonyms.
Solution
Add more variations with different synonyms and word combinations to your training phrases.
Requests match the wrong intents due to non-essential words
A possible cause is an imbalance of non-essential words in intents.
If there are few non-essential words in the training phrases, the classifier treats those words as important for the intent.
Solution
Options:
- Add many different non-essential words to your phrases. Non-essential words should be evenly distributed across intents.
- Remove all non-essential words from the dataset and make the phrases as semantically loaded as possible.
- Create a stop word dictionary and clean user requests in the `preMatch` handler.