Building a Valency List

In our research we use the valency list for Czech described in [3]. The list was compiled from three main sources, with no significant use of automatic processing: the Dictionary of Czech Synonyms [4] and two other representative Czech explanatory dictionaries [5,6]. All of these dictionaries contain some information on verb valencies, which was extracted and unified. The resulting list was then supplemented with additional verbs that we found in a corpus; a linguist added them manually. After some cleaning, the list now contains valency lists for 15,022 Czech verb forms. Its main deficiency is that its items do not yet distinguish the various meanings of a verb. This has some drawbacks for the verb classification, as we will mention later.
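The unification step described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the procedure actually used (the list was built largely by hand): the function name `merge_valency_sources`, the toy entries, and the frame notation are all hypothetical.

```python
from collections import defaultdict


def merge_valency_sources(*sources):
    """Union the valency frames recorded for each verb across all sources.

    Each source is assumed to be a dict mapping a verb to a list of
    frame strings (a hypothetical representation).
    """
    merged = defaultdict(set)
    for source in sources:
        for verb, frames in source.items():
            # Normalise case and whitespace so that trivially different
            # spellings of the same frame collapse into one entry.
            merged[verb].update(" ".join(f.lower().split()) for f in frames)
    return {verb: sorted(frames) for verb, frames in merged.items()}


# Hypothetical toy entries; the frame notation is invented for illustration.
synonyms_dict = {"dát": ["komu co"], "čekat": ["na koho"]}
explanatory_a = {"dát": ["Komu  co", "co kam"]}
explanatory_b = {"čekat": ["na co"], "věřit": ["komu"]}
corpus_extras = {"mailovat": ["komu"]}

valency_list = merge_valency_sources(
    synonyms_dict, explanatory_a, explanatory_b, corpus_extras
)
```

Because the sketch keys entries by verb only, it reproduces the deficiency noted above: frames belonging to different meanings of one verb end up in the same entry.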

A more sophisticated technique for building a valency list is to explore the language through its representative, the corpus (see [7]). The main obstacle is that this technique requires a good tool for syntactic analysis of the language, which is still an open task for Czech. Pavel Smrz has developed a syntactic parser based on LALR(1) analysis with backtracking and has put together a grammar of Czech. His system covers a steadily growing range of Czech texts and thus provides the means for the desired syntactic analysis. We expect that it will soon be possible to obtain a "live" valency list directly from the corpus and to compare the results of the verb classification with those presented in this paper.
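As a rough sketch of what deriving a "live" valency list from parsed text might involve, the Python fragment below counts the complement patterns observed with each verb and keeps those seen often enough. Everything in it is an assumption made for illustration: the input format (pairs of a verb and its complement labels), the labels themselves, the function name `frames_from_parses`, and the `min_count` threshold. It does not describe the output of the parser mentioned above.

```python
from collections import Counter, defaultdict


def frames_from_parses(parsed_clauses, min_count=2):
    """Collect complement patterns per verb from (verb, complements) pairs.

    Patterns observed fewer than min_count times are dropped as likely
    noise; the threshold is an illustrative assumption.
    """
    counts = defaultdict(Counter)
    for verb, complements in parsed_clauses:
        # Sort the complements so that word order alone does not
        # produce two distinct frames.
        counts[verb][tuple(sorted(complements))] += 1
    return {
        verb: [frame for frame, n in frame_counts.items() if n >= min_count]
        for verb, frame_counts in counts.items()
    }


# Hypothetical parser output: one (verb, complement labels) pair per clause.
parsed = [
    ("čekat", ["na+acc"]),
    ("čekat", ["na+acc"]),
    ("čekat", ["dat"]),
    ("dát", ["dat", "acc"]),
    ("dát", ["acc", "dat"]),
]

live_list = frames_from_parses(parsed)
```

A frequency threshold is one simple way to suppress parser errors; a real system would also need to separate obligatory complements from free adjuncts, which this sketch does not attempt.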

Pavel Smrz 2001-03-18