Named entity dictionaries
Named entity dictionaries are stored in .csv files. They are used to process multiple named entities with patterns.
A named entity is a word or phrase that distinguishes an object or phenomenon from other objects or phenomena of a similar type. Examples include the names of cities, countries, and currencies.
Dictionary structure
Dictionaries are stored in files with the .csv extension, where each line has the following format:
id; name; value
ID
The entity identifier.
Name
A name or phrase designating the entity. Several synonyms can be listed, separated by commas.
When you import the dictionary into the script, the bot will recognize the entity in requests that contain any of its synonyms.
Value
The data associated with the entity.
The data can be either a string without quotes or a JavaScript object. Use this field to store any information you want to access from the script.
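To make the three fields concrete, here is a minimal Python sketch that splits one dictionary line into its parts. It is an illustration of the file format, not the platform's own parser. The names Entry and parse_line are hypothetical, and the sketch assumes that object values are written as JSON (as in the example below) while any other value is kept as a plain string.

```python
import json
from typing import Any, List, NamedTuple


class Entry(NamedTuple):
    """One parsed dictionary line (hypothetical helper type)."""
    id: str
    names: List[str]
    value: Any


def parse_line(line: str) -> Entry:
    """Split an id;name;value line into its three fields."""
    # maxsplit=2 keeps any ";" inside the value (e.g. in a JSON string) intact.
    # A line with fewer than three fields raises ValueError on unpacking.
    entity_id, names, value = line.split(";", 2)
    # Synonyms in the name field are separated by commas.
    synonyms = [n.strip() for n in names.split(",") if n.strip()]
    raw = value.strip()
    # Assumption: values starting with "{" are objects written as JSON;
    # anything else is kept as an unquoted string.
    parsed: Any = json.loads(raw) if raw.startswith("{") else raw
    return Entry(entity_id.strip(), synonyms, parsed)


entry = parse_line('150;John, Johnny;{"name": "John", "sex": "m"}')
print(entry.names)         # ['John', 'Johnny']
print(entry.value["sex"])  # m
```

Splitting only on the first two semicolons means the value field may itself contain semicolons without breaking the line apart.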
Example
For example, here is a fragment of a dictionary of proper names:
150;John, Johnny;{"name": "John", "sex": "m"}
151;Jane;{"name": "Jane", "sex": "f"}
152;Jennifer, Jenniffer;{"name": "Jennifer", "sex": "f"}
153;Joaquin;{"name": "Joaquin", "sex": "m"}
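The following Python sketch shows the idea behind synonym recognition: every synonym in a line points back to the same entity ID, so a request mentioning either John or Johnny resolves to entity 150. It assumes simple case-insensitive whole-word matching and ignores morphological forms; the names NAMES and find_entity are hypothetical and do not describe how the platform actually matches requests.

```python
import re
from typing import List, Optional, Tuple

# The dictionary fragment above, as (id, synonyms) pairs.
NAMES: List[Tuple[str, List[str]]] = [
    ("150", ["John", "Johnny"]),
    ("151", ["Jane"]),
    ("152", ["Jennifer", "Jenniffer"]),
    ("153", ["Joaquin"]),
]


def find_entity(text: str, entries: List[Tuple[str, List[str]]]) -> Optional[str]:
    """Return the id of the synonym that appears earliest in text, or None."""
    best: Optional[Tuple[int, str]] = None  # (match position, entity id)
    for entity_id, synonyms in entries:
        for synonym in synonyms:
            # \b ensures whole-word matches, so "John" does not match inside "Johnny".
            m = re.search(r"\b" + re.escape(synonym) + r"\b", text, re.IGNORECASE)
            if m and (best is None or m.start() < best[0]):
                best = (m.start(), entity_id)
    return best[1] if best else None


print(find_entity("Tell Johnny I called", NAMES))  # 150
```

Because the search is done per synonym, multi-word synonyms such as phrases are handled the same way as single words.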
Importing a dictionary
Dictionaries are imported with the require tag. Specify the path to the file, as well as the values of the name and var parameters:
require: dicts/names.csv
    name = Names
    var = $Names
Dictionary name
The dictionary name, specified in the name parameter, is used to define a named entity based on the dictionary:
patterns:
    $name = $entity<Names>
Variable name
The variable name, specified in the var parameter, is used to access the dictionary contents directly from script code, in particular from converters.
For example, if you pass the $Names variable to the log function, an object with the following structure is written to the log:
{
    "150": {
        "id": "150",
        "alternameNames": [
            "John",
            "Johnny"
        ],
        "value": {
            "name": "John",
            "sex": "m"
        }
    },
    "151": {
        // ...
    }
}
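As an illustration of how the file maps onto this object, the Python sketch below builds the same shape from dictionary lines: a map keyed by entity ID, where each entry holds the ID, the list of synonyms and the parsed value. The field names are copied verbatim from the log output above; the helper load_dictionary is hypothetical, and the sketch assumes object values are valid JSON. It is not the platform's loader.

```python
import json
from typing import Any, Dict

CSV_TEXT = """\
150;John, Johnny;{"name": "John", "sex": "m"}
151;Jane;{"name": "Jane", "sex": "f"}
"""


def load_dictionary(text: str) -> Dict[str, Dict[str, Any]]:
    """Build an object keyed by entity id, mirroring the logged $Names structure."""
    result: Dict[str, Dict[str, Any]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entity_id, names, value = line.split(";", 2)
        entity_id = entity_id.strip()
        raw = value.strip()
        result[entity_id] = {
            "id": entity_id,
            # Field name copied verbatim from the log output above.
            "alternameNames": [n.strip() for n in names.split(",")],
            # Assumption: object values are JSON; other values stay strings.
            "value": json.loads(raw) if raw.startswith("{") else raw,
        }
    return result


names = load_dictionary(CSV_TEXT)
print(json.dumps(names, indent=2))
```

Keying the object by ID means a script that already knows an entity's ID can read its value directly, without scanning the whole dictionary.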
Morphological variance
By default, entity synonyms are recognized in all their morphological forms. For example, consider the following dictionary of programming languages:
1;Python;{"developers": ["Guido van Rossum"], "date_appeared": 1991}
2;Go;{"developers": ["Robert Griesemer", "Rob Pike", "Ken Thompson"], "date_appeared": 2009}
An entity based on this dictionary will match the language names in any morphological form: for example, pythons will match Python, and went will match Go.
If this behavior is undesirable, you can specify an additional strict parameter when importing the dictionary:
require: dicts/languages.csv
    name = Language
    var = $Language
    strict = true
With this parameter on, synonyms will only be recognized in the forms they have in the dictionary.
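The difference between the default and strict modes can be sketched in Python. The real platform uses its own morphological analysis; here a small hand-written lemma table (LEMMAS, hypothetical) stands in for it, and the sketch also assumes case-insensitive comparison, which this page does not specify. With strict=False each word is reduced to its lemma before lookup, so went is treated as go; with strict=True only the forms written in the dictionary match.

```python
import re
from typing import Dict, List

# Hypothetical stand-in for a morphological analyzer: maps word forms to lemmas.
LEMMAS: Dict[str, str] = {"pythons": "python", "went": "go", "goes": "go", "going": "go"}

# Synonyms from the languages dictionary above, keyed by entity id.
LANGUAGES: Dict[str, List[str]] = {"1": ["Python"], "2": ["Go"]}


def match(text: str, strict: bool) -> List[str]:
    """Return ids of dictionary entities mentioned in text, in order of appearance."""
    synonym_to_id = {s.lower(): eid for eid, syns in LANGUAGES.items() for s in syns}
    found = []
    for word in re.findall(r"\w+", text.lower()):
        # strict=True: the word must match a synonym exactly as written.
        # strict=False: the word is first reduced to its lemma, if known.
        key = word if strict else LEMMAS.get(word, word)
        if key in synonym_to_id:
            found.append(synonym_to_id[key])
    return found


print(match("I went home", strict=False))  # ['2']
print(match("I went home", strict=True))   # []
```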