A Spell Checker for Esperanto

Introduction

The objective of this project was to design and implement a spell checking dictionary (i.e. an affix file and a word list) for the Esperanto language using the Hunspell spell checker. The result was meant to work both as a stand-alone spell checker and within the framework of the grammar checker project developed by the organization E@I and funded by the Esperantic Studies Foundation. The work started in October 2007 as a Bachelor's thesis at the Faculty of Informatics of Masaryk University in Brno, Czech Republic. The thesis was defended on June 26, 2008 and resulted in a proof-of-concept implementation.

Bachelor's Thesis

An annotation of the thesis can be found below; the full text (in English) may be downloaded from the thesis archive. There is also an experimental online demo.

Annotation of the Thesis

This thesis provides a brief overview of spell checking software and describes the process of constructing a spell checker for the Esperanto language and its implementation as a dictionary (i.e. an affix file and a word list) for the Hunspell spell checker. The word list is an adaptation of the word roots found in the renowned Esperanto dictionary PIV. Morphologically complex words, which are common in Esperanto due to its agglutinative nature, are recognized by means of the affix file, which was built from the ready-made morpheme segmentation of word derivations appearing in the same source. The rules derived in this way were then refined by a semantic classification of all roots involved. The classification system was created from corpus analysis and several specialized dictionaries, and it was combined with knowledge of which semantic classes of roots each affix can accept, taken from the PMEG reference grammar. The resulting spell checker is a working proof of concept, to be further improved and integrated into the grammar checker project of the E@I organization.
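The interplay of word list, affix rules and semantic classes described in the annotation can be illustrated with a minimal Python sketch. It is not the thesis implementation, which expresses its rules in Hunspell's affix file format. The roots, the class names, the sets of classes each suffix accepts, and the function name `analyze` are all invented here for illustration and are not taken from PIV or PMEG. The sketch also assumes at most one suffix per word and a simplified set of grammatical endings.

```python
# Toy model of root + suffix + ending recognition with semantic filtering.
# All data below is hypothetical and only serves the illustration.

# "Word list": root -> assumed semantic class
ROOTS = {
    "hund": "animal",   # hundo = dog
    "kat": "animal",    # kato = cat
    "tabl": "object",   # tablo = table
    "skrib": "action",  # skribi = to write
}

# "Affix rules": suffix -> semantic classes it is assumed to accept
SUFFIXES = {
    "in": {"animal", "person"},  # female, e.g. hundino
    "id": {"animal", "person"},  # offspring, e.g. hundido
    "il": {"action"},            # tool, e.g. skribilo
}

# Simplified grammatical endings (no accusative, no verb tenses)
ENDINGS = ("oj", "o", "a", "e", "i")


def analyze(word):
    """Return (root, suffix, ending) if the toy rules accept the word, else None."""
    for ending in ENDINGS:
        if not word.endswith(ending):
            continue
        stem = word[: len(word) - len(ending)]
        # Case 1: bare root followed by an ending
        if stem in ROOTS:
            return (stem, None, ending)
        # Case 2: root + one suffix, allowed only if the suffix
        # accepts the semantic class of the root
        for suffix, accepted_classes in SUFFIXES.items():
            if stem.endswith(suffix):
                root = stem[: len(stem) - len(suffix)]
                if ROOTS.get(root) in accepted_classes:
                    return (root, suffix, ending)
    return None


if __name__ == "__main__":
    for candidate in ("hundo", "hundino", "skribiloj", "tablino", "hundilo"):
        print(candidate, analyze(candidate))
```

In this toy model, "hundino" and "skribiloj" are accepted, while "tablino" and "hundilo" are rejected because the suffix does not accept the assumed class of the root. That is the kind of over-generation the semantic classification is meant to limit.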

Continuation

Since December 2008, the project has outgrown the scope of a Bachelor's thesis and has entered a second phase, in which it is financed by the Students' Research and Development Projects scholarship of the same university and faculty. More information about this project may be found on a separate website.

Contact Information

The thesis was written by Marek Blahuš, a student at Masaryk University, and supervised by doc. RNDr. Petr Sojka, Ph.D.

The author and the supervisor of the thesis may be contacted through their respective personal pages on the Masaryk University servers.

Download

Files related to the Bachelor's thesis may be downloaded from the thesis archive. It is also possible to inspect the word list and the affix file that form the spell checking dictionary produced by the thesis.

