Free natural language morphology
This page provides Majka, a very fast (approx. 1M words per second) free morphological analyzer, together with databases for Czech, Slovak, Polish, Swedish, German, French, Italian, English, Portuguese, Catalan, Welsh, Spanish, Galician, Asturian and Russian.
Free morphological analyzer Majka
Binaries
Download Majka binaries for Linux / for Windows.
Usage: the program expects one entry per line on its standard input
(a word, a lemma, or a lemma:tag string, according to the data file in use)
and prints the requested information on its standard output. An example
of usage (for other options, see majka -h):
$ echo test | majka -f majka.w-lt
test:k1gInSc1
test:k1gInSc4
test:k1gMnSc1
testa:k1gFnPc2
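The same call can be made from a script. The following is a minimal Python sketch, not part of Majka itself: it assumes that the `majka` binary is on the PATH, that the data file `majka.w-lt` is in the current directory, and that every output line has the `lemma:tag` form shown in the example above. The function names `parse_majka_output` and `analyze` are hypothetical helpers introduced only for illustration.

```python
import subprocess
from typing import List, Tuple


def parse_majka_output(output: str) -> List[Tuple[str, str]]:
    """Split each 'lemma:tag' output line into a (lemma, tag) pair."""
    pairs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            # Skip blank lines and anything that is not 'lemma:tag'.
            continue
        # Split on the last colon (assumption: tags contain no colon).
        lemma, _, tag = line.rpartition(":")
        pairs.append((lemma, tag))
    return pairs


def analyze(word: str,
            majka_bin: str = "majka",
            data_file: str = "majka.w-lt") -> List[Tuple[str, str]]:
    """Run majka on a single word and return its (lemma, tag) analyses.

    Assumption: majka_bin and data_file point to a local installation.
    """
    result = subprocess.run(
        [majka_bin, "-f", data_file],
        input=word + "\n",      # one entry per line on standard input
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_majka_output(result.stdout)


# Parsing the sample output from the example above (no binary needed here).
sample = """test:k1gInSc1
test:k1gInSc4
test:k1gMnSc1
testa:k1gFnPc2"""
pairs = parse_majka_output(sample)
lemmata = sorted({lemma for lemma, _ in pairs})
print(pairs)
print(lemmata)
```

With Majka installed, `analyze("test")` would run the command shown above and return its output in the same parsed form.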
Source code
References
When using majka for research purposes, please cite:
Pavel Šmerk. Fast Morphological Analysis of Czech. In Petr Sojka and Aleš Horák (eds.). Proceedings of Third Workshop on Recent
Advances in Slavonic Natural Language Processing, RASLAN 2009. Brno: Masaryk University, 2009. p. 13–16. ISBN
978-80-210-5048-8.
Free morphological databases for Majka
- Czech:
- data for assigning lemmata and tags to analyzed word forms,
- data for generating all word forms and tags of a given lemma,
- data for generating word forms according to a given lemma and tag.
- The data cover 90% of the CZES corpus (around 500 M words). They contain
3,393,080 word form + lemma + tag triples, comprising 903,888 distinct word forms and
46,000 lemmata. The data also contain parts of almost 8 billion compound
words.
- You may also use our web service API to obtain the information.
- For the full version of the data, please contact us (see below).
- source vendor: Natural Language Processing Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic
- LICENSE: Creative Commons Attribution-ShareAlike 3.0 Unported License
- Slovak
- Polish
- Swedish
- German
- French:
- data for assigning lemmata and tags to analyzed word forms,
- source vendor: Lefff (Lexique des Formes Fléchies du Français / Lexicon of French inflected forms)
- LICENSE: LGPL-LR free software license
- Italian
- English
- Portuguese
- Catalan
- Welsh
- Spanish
- Galician
- Asturian
- Russian
Contact
Pavel Šmerk, Ph.D.
ma@nlp.fi.muni.cz
Natural Language Processing Centre
Faculty of Informatics, Masaryk University
Botanická 68a, 602 00, Brno, Czech Republic
Developer information (in Czech)