A Spell Checker for Esperanto

Introduction

The objective of this project is to design and implement a spell-checking dictionary (i.e. an affix file and a word list) for the Esperanto language using the Hunspell spell checker. The result is meant to work both as a stand-alone spell checker and within the framework of the grammar checker project developed by the organization E@I and funded by the Esperantic Studies Foundation. The work started in October 2007 as a Bachelor's thesis at the Faculty of Informatics of Masaryk University in Brno, Czech Republic. The thesis was defended on June 26, 2008 and produced a proof-of-concept implementation.

In December 2008, the project entered a second phase, in which it was financed by the Students' Research and Development Projects scholarship of the same university and faculty (project no. MUNI33/212008). The objectives of this second phase included:

  • enlarging the set of recognized morphological phenomena,
  • enhancing functionality by testing and modification,
  • improving user experience,
  • distributing the work as an official spell-checking package for OpenOffice.org that is downloaded and installed automatically with the Esperanto localization of the software.

The project is run by Bc. Marek Blahuš, a student at Masaryk University, and mentored by doc. RNDr. Petr Sojka, Ph.D., who was also the patron of the thesis.

Previous Work

A Bachelor's thesis titled "A Spell Checker for Esperanto", written in English, was defended on June 26, 2008 at Masaryk University. Its English abstract is quoted below; the full text may be downloaded from the thesis archive. There is also an experimental online demo.

Abstract of the Thesis

This thesis provides a brief overview of spell checking software and describes the process of constructing a spell checker for the Esperanto language and its implementation as a dictionary (i.e. an affix file and a word list) for the Hunspell spell checker. The word list is an adaptation of word roots coming from the renowned Esperanto dictionary PIV. Recognition of morphologically complex words, which are common in Esperanto due to its agglutinative nature, is made possible by the affix file which has been built based on ready-made morpheme segmentation of word derivations appearing in the same source. Rules derived in the latter process have been improved by semantic classification of all involved roots, for which a system has been created based on corpus analysis and several specialized dictionaries, in combination with knowledge on the capability of each affix to accept roots from different semantic classes, acquired from the PMEG reference grammar. The resulting spell checker is a working proof of concept, to be further improved and integrated in the grammar checker project of the E@I organization.
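The division of labour between the word list and the affix file can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the project's actual dictionary and not real Hunspell file syntax: the flag letters, the two roots, the rule table, and the function name `is_accepted` are all hypothetical and chosen only for illustration. Each root in the word list carries flags, and each flag names a set of endings that the root may take.

```python
# Toy model of a Hunspell-style dictionary: a word list whose entries carry
# flags, plus suffix rules keyed by those flags. Everything here is a
# hypothetical simplification; the real affix file is far richer.

# Suffix rules per flag: each rule is an ending appended to the root.
SUFFIX_RULES = {
    "N": ["o", "oj", "on", "ojn"],  # noun endings
    "A": ["a", "aj", "an", "ajn"],  # adjective endings
}

# Word list: root -> flags naming the rule sets the root may combine with.
WORD_LIST = {
    "hund": "N",   # hundo = dog
    "bel": "AN",   # bela = beautiful, belo = beauty
}


def is_accepted(word: str) -> bool:
    """Return True if the word is a listed root plus a permitted ending."""
    for root, flags in WORD_LIST.items():
        if not word.startswith(root):
            continue
        ending = word[len(root):]
        # The ending must belong to a rule set enabled by one of the flags.
        if any(ending in SUFFIX_RULES[flag] for flag in flags):
            return True
    return False
```

In this sketch, `is_accepted("hundojn")` holds because the root `hund` carries the flag `N`, whose rule set contains the ending `ojn`, whereas `is_accepted("hunda")` does not, because the toy word list grants `hund` no adjective flag. Restricting endings per root in this way loosely mirrors the idea of limiting affixes by the semantic class of the root.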

Completed Work

Work on the second phase of the project started in December 2008. To make the project's development easier to follow, the scholarship application set out a time plan consisting of three stages and three checkpoints.

  • The 1st stage focused on enhancing the spell checker's functionality. Expected results were input-output samples showing the enhancements made, and preliminary research on the OpenOffice.org integration. Checkpoint 1 was reached by March 2009.
  • The 2nd stage focused on integrating the spell checker in the OpenOffice.org environment. Expected results were a user-ready spell checker distributed as a part of OpenOffice.org, and input-output samples for all new spell-checking enhancements. Checkpoint 2 was reached by May 2009.
  • The 3rd stage focused on collecting user feedback and debugging. Expected results were a functioning bug-report system along with a description of received feedback and input-output samples of changes made in reaction to it. Checkpoint 3 was reached by the end of November 2009.

At the end of the development period, a Final Report was produced. It was translated into Czech and used as the basis for completing the official final report form required to close the project in accordance with the scholarship's rules.

The resulting software has been published and is now available for download: see the download section.

Future Work

In December 2009, development within the scholarship project was finished, yet the results are still to be defended before a jury in a 10-minute public presentation (in Czech) to be held in Brno on May 17, 2010.

As of May 2010, there is no complete OpenOffice.org Esperanto localization, as the advent of OpenOffice.org 3.0 placed a new burden on the translation team and the help files remain untranslated. Nevertheless, it is already possible to download from the official website a complete build of OpenOffice.org that includes the Esperanto localization (in its current state) together with the developed spell checker, which was one of the goals of the scholarship project (see the download section for details). Once a full Esperanto localization of the package appears, the spell checker will be part of it as well.

Having achieved the goals set in the scholarship application, the project has now entered its third phase. Development will continue, with an expected focus on technical aspects that may not have been solved optimally so far, particularly because of inherent limitations of the software involved, which in some cases make fine-grained code tuning necessary to increase efficiency. Additional user feedback and new Hunspell releases are also expected to stimulate the project's further development in the upcoming months.

Presentations

The project has been presented on several occasions, both to the scientific community and to the community of prospective users. The following is a list of all such presentations. At a later time, some presentations may be made downloadable from this webpage:

  • 2008-04-02 Brno (CZ), Faculty of Informatics, Seminar of the Natural Language Processing Laboratory
  • 2008-06-26 Brno (CZ), Faculty of Informatics, Bachelor's Thesis Defense
  • 2008-07-25 Rotterdam (NL), 93rd World Congress of Esperanto, Meeting of Computer Linguists
  • 2008-09-27 Stockholm (SE), E@I Grammar Checker (Lingvohelpilo) Developers Meeting
  • 2008-11-22 Berlin (DE), Gesellschaft für Interlinguistik, 18. Tagung
  • 2009-02-06 Antwerpen (BE), La Verda Stelo, local Esperanto speakers group
  • 2009-02-09 Ottignies-Louvain-la-Neuve (BE), Kvinfolio, local Esperanto speakers group
  • 2009-11-25 Brno (CZ), Faculty of Informatics, Seminar of the Natural Language Processing Laboratory
  • 2009-12-04 Karlova Studánka (CZ), Third Workshop on Recent Advances in Slavonic Natural Language Processing RASLAN 2009
  • 2010-11-21 Modra (SK), Conference on Application of Esperanto in Science and Technology (KAEST)

Several publications related to the project have also appeared:

  • Blahuš, Marek. A Spell Checker for Esperanto. Brno : Masaryk University, Faculty of Informatics, 2008. 40 pp. Bachelor's thesis. Text in English. Patron of the thesis RNDr. Petr Sojka, Ph.D. Available online.
  • Blahuš, Marek. Rechtschreibprüfung für Esperanto und andere Sprachen. In Fiedler, Sabine [Hrsg.]. Esperanto und andere Sprachen im Vergleich : Beiträge der 18. Jahrestagung der Gesellschaft für Interlinguistik e.V., 21.-23. November 2008, in Berlin. Berlin : Gesellschaft für Interlinguistik e.V., 2009. pp. 131-136. Text in German. ISSN 1432-3567.
  • Blahuš, Marek. Morphology-Aware Spell-Checking Dictionary for Esperanto. In Sojka, Petr & Horák, Aleš (Eds.). Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2009. Brno : Masaryk University, 2009. pp. 3-8. Text in English. ISBN 978-80-210-5048-8. Available online.

At least one scholarly work is known to have cited one of these publications:

  • Kück, Andreas. Quantitative Studien zum Esperanto unter besonderer Berücksichtigung der Wortbekanntheit. Trier : Universität Trier, Fachbereich 2, 2009. 212 pp. Ph.D. thesis. Text in German. Patrons of the thesis Professor Dr. Reinhard Köhler & Professor Dr. Peter Grzybek. Available online.

Download

Files related to the Bachelor's thesis may be downloaded from the thesis archive. It is also possible to inspect the word list and the affix file that form the spell-checking dictionary produced by the thesis.

The OpenOffice.org dictionary extension (named "Esperanto-literumilo de E@I") may be downloaded from the official OpenOffice.org repository for Extensions. It is also featured in the list of OpenOffice.org's supported dictionary extensions at the same site. The software is open source and is released under the GNU Lesser General Public Licence Version 3 (LGPLv3), in compliance with OpenOffice.org's requirements for extensions to be included by default. For the purpose of its future integration in the Mozilla Firefox browser, it is also available under the triple GPL/LGPL/MPL licence.

The official website of the OpenOffice.org Esperanto localization provides information about the project on its download and Esperanto and OpenOffice.org pages. It also features a link to the current OpenOffice.org Esperanto build with the spell checker included.

Contact Information

The project's author and mentor may be contacted through their respective personal pages on the Masaryk University servers.

