Automated Development of a Grammatical Dictionary for Georgian Dialects

Proceedings of the 7th International Conference on Academic Research in Science, Technology and Engineering

Year: 2025

Liana Lortkipanidze, Anna Chutkerashvili

 

ABSTRACT:

This study presents an automated system for compiling grammatical dictionaries of the Georgian language and its dialects. Unlike traditional dictionaries, grammatical dictionaries include not only base word forms but also their entire paradigms, providing morphological and syntactic characteristics that are particularly critical for understanding agglutinative-inflectional languages like Georgian. The proposed system employs a dictionary-based method to expand the vocabulary by identifying and adding words with similar grammatical markers to those already present in the dictionary. The expansion process is guided by linguistic corpora and leverages a novel lemmatization approach for unknown words, enabling the system to derive base forms and paradigms for words not initially included.

The work builds on prior projects, such as the “Dialect Dictionaries with the Functions of Representativeness and Morphological Annotation in Georgian Dialect Corpus” and “Syntax Annotation of the Georgian Literary Corpus,” and introduces a tool to address the lack of grammatical dictionary compilers for Georgian. It integrates a morphological processor developed in earlier studies to acquire the characteristics of new words from text corpora. The system also employs automatic training techniques to enhance lemmatization, ensuring continuous enrichment of the dictionary. By bridging the gap in computational resources for Georgian dialects, this study offers a scalable and efficient solution for linguistic research, language processing, and annotation tasks.
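
The expansion idea described above, proposing a base form and paradigm for an unknown corpus word because it carries the same grammatical markers as words already in the dictionary, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`build_paradigm`, `candidate_stems`, `expand_dictionary`), the `min_forms` threshold, and the toy ending table are assumptions made for illustration. The table covers only the regular case endings of consonant-stem nouns in standard Georgian and ignores vowel stems, stem alternations, verbs, and dialect-specific markers.

```python
# Toy case endings for consonant-stem Georgian nouns (a simplifying assumption;
# a real system would take its markers from the existing grammatical dictionary).
CASE_ENDINGS = {
    "nom": "ი", "erg": "მა", "dat": "ს", "gen": "ის",
    "ins": "ით", "adv": "ად", "voc": "ო",
}


def build_paradigm(stem):
    """Generate the full (toy) case paradigm for a stem."""
    return {case: stem + ending for case, ending in CASE_ENDINGS.items()}


def candidate_stems(form):
    """Return every stem obtained by stripping one known ending from a form."""
    return {
        form[: -len(ending)]
        for ending in CASE_ENDINGS.values()
        if form.endswith(ending) and len(form) > len(ending)
    }


def expand_dictionary(dictionary, corpus_tokens, min_forms=2):
    """Propose new lemma -> paradigm entries for unknown corpus tokens.

    A candidate stem is accepted only if at least `min_forms` distinct
    unknown tokens are attested in its hypothetical paradigm.
    """
    known_forms = {f for paradigm in dictionary.values() for f in paradigm.values()}
    unknown = set(corpus_tokens) - known_forms

    stems = set()
    for token in unknown:
        stems |= candidate_stems(token)

    added = {}
    for stem in stems:
        paradigm = build_paradigm(stem)
        attested = unknown & set(paradigm.values())
        if len(attested) >= min_forms:
            # The nominative form serves as the lemma (base form).
            added[paradigm["nom"]] = paradigm
    return added


if __name__ == "__main__":
    dictionary = {"კაცი": build_paradigm("კაც")}
    corpus = ["კაცმა", "სახლი", "სახლმა", "სახლის", "წიგნად"]
    for lemma, paradigm in expand_dictionary(dictionary, corpus).items():
        print(lemma, paradigm)
```

Requiring several attested forms before accepting a stem is one simple way to use corpus evidence to filter out spurious segmentations. For example, the form სახლის can be split as სახლ + ის or as სახლი + ს, and only the first stem is supported by other tokens in the toy corpus.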

Keywords: Acquisition of Lexicon, Agglutinative Languages, Language Modelling, Lemmatization Rules, Morphological Analysis