Named entity dictionaries

tip
.csv files are named entity dictionaries.

Named entity dictionaries are used for processing multiple named entities via patterns.

info

A named entity is a word or a phrase that distinguishes an object or a phenomenon from other objects or phenomena of a similar type. Examples include the names of cities, countries, and currencies.

Dictionary structure

Dictionaries are specified in files with the .csv extension, where every line has the following format:

id; name; value

ID

The entity identifier.

caution
IDs within one dictionary must be unique.

Name

A name or a phrase designating the entity.

tip
You can add multiple synonyms inside the name, separated by commas.

When you import the dictionary into the script, the bot will recognize the entity in requests containing any of these synonyms.

Value

The data associated with the entity.

The data may be represented by a string without quotes or a JavaScript object. Here you can add any relevant information in order to access it from the script.

tip
For example, when an entity name contains multiple synonyms, you may specify its primary, normalized name in the entity value.

Example

Here is a fragment of a dictionary of proper names:

150;John, Johnny;{"name": "John", "sex": "m"}
151;Jane;{"name": "Jane", "sex": "f"}
152;Jennifer, Jenniffer;{"name": "Jennifer", "sex": "f"}
153;Joaquin;{"name": "Joaquin", "sex": "m"}
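To make the line format concrete, here is a minimal JavaScript sketch of how such a line could be split into its three fields and how duplicate IDs could be detected before uploading a dictionary. It is not the platform's own parser: the function names parseDictionaryLine and findDuplicateIds are hypothetical, and the sketch assumes that a value starting with { is a JSON object and that anything else is a plain string.

```javascript
// Illustrative only: not the platform's own parser.
function parseDictionaryLine(line) {
  // Split on the first two semicolons only, so the value may itself contain ";".
  var first = line.indexOf(";");
  var second = line.indexOf(";", first + 1);
  if (first === -1 || second === -1) {
    throw new Error("Expected the format id;name;value: " + line);
  }
  var id = line.slice(0, first).trim();
  var synonyms = line
    .slice(first + 1, second)
    .split(",")
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
  var rawValue = line.slice(second + 1).trim();

  // Assumption: a value starting with "{" is a JSON object; anything else stays a string.
  var value = rawValue.charAt(0) === "{" ? JSON.parse(rawValue) : rawValue;
  return { id: id, synonyms: synonyms, value: value };
}

// Returns the IDs that occur more than once, since IDs must be unique within a dictionary.
function findDuplicateIds(text) {
  var seen = {};
  var duplicates = [];
  text.split("\n").forEach(function (line) {
    if (line.trim() === "") return; // skip blank lines
    var id = parseDictionaryLine(line).id;
    if (seen[id] === true && duplicates.indexOf(id) === -1) duplicates.push(id);
    seen[id] = true;
  });
  return duplicates;
}

var sample = [
  '150;John, Johnny;{"name": "John", "sex": "m"}',
  '151;Jane;{"name": "Jane", "sex": "f"}'
].join("\n");

console.log(parseDictionaryLine(sample.split("\n")[0]));
console.log(findDuplicateIds(sample));
```

Splitting on the first two semicolons only keeps a string value intact even when it contains a semicolon of its own.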

Importing a dictionary

Dictionaries can be imported via the require tag. Specify the path to the file, as well as the values of the name and var parameters:

require: dicts/names.csv
    name = Names
    var = $Names

Dictionary name

The dictionary name, specified after name, is used for defining a named entity which uses the dictionary:

patterns:
    $name = $entity<Names>

Variable name

The variable name, specified after var, is used for accessing the dictionary contents directly from script snippets, in particular from converters.

For example, if we pass the $Names variable into the log function, an object with the following structure will be printed into the log:

{
    "150": {
        "id": "150",
        "alternateNames": [
            "John",
            "Johnny"
        ],
        "value": {
            "name": "John",
            "sex": "m"
        }
    },
    "151": {
        // ...
    }
}
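As an illustration of working with this structure, the following JavaScript sketch looks up an entry by its ID and returns the normalized name stored in its value. The names object is a hand-written mock of the logged structure (the list of synonyms is omitted for brevity), and the helpers getEntityValue and normalizedName are hypothetical, not part of any platform API.

```javascript
// Mock of the logged structure; in a real script this would be the imported dictionary variable.
var names = {
  "150": { id: "150", value: { name: "John", sex: "m" } },
  "151": { id: "151", value: { name: "Jane", sex: "f" } }
};

// Returns the value stored for the given entity ID, or null if the ID is unknown.
function getEntityValue(dictionary, id) {
  var key = String(id);
  if (!Object.prototype.hasOwnProperty.call(dictionary, key)) return null;
  return dictionary[key].value;
}

// Returns the normalized name: the value itself if it is a string,
// or its "name" field if it is an object (null when neither is available).
function normalizedName(dictionary, id) {
  var value = getEntityValue(dictionary, id);
  if (value === null) return null;
  if (typeof value === "string") return value;
  return value.name !== undefined ? value.name : null;
}

console.log(normalizedName(names, "150")); // John
console.log(normalizedName(names, 999));   // null
```

Keeping the normalized name in the value means the script does not need to know which synonym the user actually typed.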

Morphological variance

By default, entity synonyms are recognized in all their morphological forms. For example, consider the following dictionary of programming languages:

1;Python;{"developers": ["Guido van Rossum"], "date_appeared": 1991}
2;Go;{"developers": ["Robert Griesemer", "Rob Pike", "Ken Thompson"], "date_appeared": 2009}

An entity using this dictionary will match the language names in all their morphological forms: not only pythons for Python, but also went, which is a form of go.

caution
All synonyms must be in their canonical forms.

If such behavior is undesirable, you can specify an additional strict parameter when importing the dictionary:

require: dicts/languages.csv
    name = Language
    var = $Language
    strict = true

With strict = true, synonyms will only be recognized in the exact forms they have in the dictionary.
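The difference between the two modes can be sketched in JavaScript with a toy model. This is only an illustration, not how the platform performs morphological analysis: the TOY_LEMMAS table and the functions toyLemma and matchesSynonym are hypothetical, and ignoring letter case in both modes is an assumption of this sketch.

```javascript
// Hypothetical toy lemma table; a real system would use a full morphological analyzer.
var TOY_LEMMAS = { pythons: "python", went: "go", goes: "go" };

// Maps a word to its canonical form according to the toy table.
function toyLemma(word) {
  var lower = word.toLowerCase();
  return Object.prototype.hasOwnProperty.call(TOY_LEMMAS, lower) ? TOY_LEMMAS[lower] : lower;
}

// Non-strict mode compares canonical forms; strict mode compares the words as written.
// Assumption: case is ignored in both modes.
function matchesSynonym(word, synonym, strict) {
  var lowerWord = word.toLowerCase();
  var lowerSynonym = synonym.toLowerCase();
  if (strict) return lowerWord === lowerSynonym;
  return toyLemma(lowerWord) === toyLemma(lowerSynonym);
}

console.log(matchesSynonym("went", "Go", false));       // true
console.log(matchesSynonym("went", "Go", true));        // false
console.log(matchesSynonym("pythons", "Python", true)); // false
console.log(matchesSynonym("Go", "Go", true));          // true
```

In this model, strict mode removes the false match on went at the cost of no longer matching inflected forms such as pythons.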