Small Dictionary of Contemporary isiZulu




By Charly Moerth

Vienna 2023

This dictionary project grew out of a university language course and is highly experimental in nature. Over time, it has also served as a test-bed for the development of the Viennese Lexicographic Editor (VLE) and for semi-automatic data acquisition for digital dictionaries. Recent versions have been edited with the dictionary module of the XML editor TEI-Enricher.

In compiling the dictionary, the focus has been on contemporary language. In the beginning, all lexical items needed in the classes were entered into the dictionary. Later, we started to integrate data not available in other dictionaries. Particular attention has been paid to neologisms such as 'street bump', 'virtual reality', 'hard disk' and 'charger' (for a mobile phone), which cannot be found in most print dictionaries, as these are usually older. For the time being, we do not attempt to achieve anything like complete coverage.

The dictionary draws on a corpus of digital isiZulu texts compiled from various Internet resources, in particular online newspapers. A few contemporary literary works have also been used. The TEI-encoded data has been made searchable via NoSke. In October 2021, the corpus contained 12,727,434 tokens.

As of 1 October 2023, the published dictionary contained roughly 13,280 lemmata (single- and multi-word units) and 13,000 sample sentences.

Through the query interface, you can search for words or groups of words in the dictionary. To run a query, enter a word and press the ENTER key on your keyboard. Results matching your query are displayed below the input field.

Note that all queries are case-sensitive.

When you start typing in the input field, the preview option shows a list of tokens that start with the characters you have entered so far.
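
The preview behaviour described above can be pictured as a prefix lookup over a sorted token list. The following is only a minimal sketch under stated assumptions: the function name `preview`, the `limit` parameter and the sample tokens are hypothetical, and nothing here reflects how the dictionary's interface is actually implemented.

```python
from bisect import bisect_left


def preview(tokens, prefix, limit=10):
    """Return up to `limit` tokens from a sorted list that start with `prefix`.

    The comparison is case-sensitive, like the queries themselves.
    """
    start = bisect_left(tokens, prefix)  # first position where prefix could occur
    matches = []
    for token in tokens[start:]:
        if not token.startswith(prefix):
            break  # sorted order: no later token can share the prefix
        matches.append(token)
        if len(matches) == limit:
            break
    return matches


# Hypothetical token index; it must be sorted for the binary search to work.
TOKENS = sorted(["hamba", "hambisa", "uhambo", "umhambi", "hlala"])

print(preview(TOKENS, "ham"))
```

Because the list is sorted, all tokens sharing a prefix sit next to each other, so the lookup can stop at the first token that no longer matches.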

It is possible to restrict a search to particular fields of the dictionary. Wildcards are applied at the token level.

Query string        Explanation
hamba               Find the string hamba.

The interface also supports a simple query language. The field names can be found in the field selector below the input field.

lem=uhambo          Find the lemma with the string uhambo.
pos=conjunction     Find all conjunctions.
dom=kinship term    Find all entries with the domain label kinship term.
dom=botany          Find all entries with the domain label botany.
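
To make the `field=value` notation above more concrete, here is a rough sketch of how such a query string could be split and matched. It is an illustration only: the sample entries, the helper names `parse_query` and `search`, and the choice of exact string comparison are all assumptions, not a description of the dictionary's backend.

```python
# Hypothetical sample entries; the field names follow the examples above.
ENTRIES = [
    {"lem": "uhambo", "pos": "noun"},
    {"lem": "hamba", "pos": "verb"},
    {"lem": "ubaba", "pos": "noun", "dom": "kinship term"},
]


def parse_query(query):
    """Split 'field=value' into (field, value); a bare string has no field."""
    field, sep, value = query.partition("=")
    if not sep:
        return None, query.strip()
    return field.strip(), value.strip()


def search(entries, query):
    """Return the lemmas of all entries matching the query (exact comparison)."""
    field, value = parse_query(query)
    hits = []
    for entry in entries:
        # With a field name, look only at that field; otherwise at every field.
        candidates = [entry.get(field)] if field else list(entry.values())
        if value in candidates:
            hits.append(entry["lem"])
    return hits


print(search(ENTRIES, "lem=uhambo"))
print(search(ENTRIES, "dom=kinship term"))
```

Only the first `=` separates field from value, so a value may itself contain spaces, as in `dom=kinship term`.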

It is possible to use wildcards in the queries.

ham.*               Find all entries with the string ham.
infl=hambo.*        Find all inflected forms (e.g. plurals) containing the string hambo.
^ham.*              The circumflex (^) anchors the term at the beginning of a token. The query should then yield results containing 'housekeeping' or 'household'.
.*yu$               The dollar sign ($) anchors the string at the end of the token.
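
The wildcard examples above behave like regular expressions applied to individual tokens. As a hedged illustration, the sketch below uses Python's `re` module, on the assumption that its semantics are close to those of the dictionary's query engine, whose actual regex dialect is not stated here. The helper `matching_tokens` and the sample string are hypothetical.

```python
import re


def matching_tokens(pattern, text):
    """Return the whitespace-separated tokens of `text` that match `pattern`.

    Each token is tested on its own, so ^ and $ refer to the start and end
    of a token, not of the whole text. Matching is case-sensitive.
    """
    regex = re.compile(pattern)
    return [token for token in text.split() if regex.search(token)]


# Hypothetical sample text.
SAMPLE = "hamba uhambo umhambi Hamba"

print(matching_tokens("ham.*", SAMPLE))   # the string ham anywhere in a token
print(matching_tokens("^ham.*", SAMPLE))  # ham at the beginning of a token
print(matching_tokens(".*bo$", SAMPLE))   # bo at the end of a token
```

In this sketch, the capitalised `Hamba` is not matched by any of the three patterns, which mirrors the case sensitivity of the queries.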

You can also combine queries.

dom=zoology+lem=^ub.*           Find all animals starting with the string ub.
[pos=adjective]+[lem=an.*]      Find all adjectives containing the string an.
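
One way to read the combined queries above is as a list of conditions that must all hold for an entry. The sketch below follows that reading. It assumes that `+` means logical AND and that the square brackets are optional grouping, both inferred from the two examples rather than from any documented specification. The sample entries and function names are hypothetical.

```python
import re

# Hypothetical sample entries.
ENTRIES = [
    {"lem": "ubhejane", "pos": "noun", "dom": "zoology"},
    {"lem": "inja", "pos": "noun", "dom": "zoology"},
    {"lem": "ubaba", "pos": "noun", "dom": "kinship term"},
]


def parse_combined(query):
    """Split a combined query into a list of (field, compiled regex) pairs."""
    conditions = []
    for part in query.split("+"):
        part = part.strip()
        # Strip the optional surrounding brackets, e.g. [pos=adjective].
        if part.startswith("[") and part.endswith("]"):
            part = part[1:-1]
        field, _, pattern = part.partition("=")
        conditions.append((field.strip(), re.compile(pattern.strip())))
    return conditions


def search(entries, query):
    """Return the lemmas of entries that satisfy every condition (assumed AND)."""
    conditions = parse_combined(query)
    return [
        entry["lem"]
        for entry in entries
        if all(regex.search(entry.get(field, "")) for field, regex in conditions)
    ]


print(search(ENTRIES, "dom=zoology+lem=^ub.*"))
print(search(ENTRIES, "[pos=noun]+[lem=^ub.*]"))
```

A limitation of this simple splitting is that a pattern containing a literal `+` (the regex quantifier) would be cut apart, so a real parser would need to treat that case separately.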

Abbreviations used in the dictionary:

imp.    imperative
inf.    infinitive
loc.    locative
NP      nominal phrase
pl.     plural
sg.     singular
VP      verbal phrase (used for any phrase containing a verb)