Next: Determining TIL Construction Representing Up: Exploitation of Valency List Previous: Exploitation of Valency List

Semantic Classification of Verbs

Now we want to partition the set of all verbs into equivalence classes. We regard as equivalent those verbs that share the same valency list, or whose valency lists are similar. The algorithm for finding similar valency lists is implemented in four levels. Each successive level defines the similarity of valency lists in such a way that the number of resulting classes gradually decreases:

Level 1
-- verbs are equivalent only if they share the same valency list.

Level 2
-- valency expressions formed by a noun group with a preposition (hPr or hTr) are replaced, where possible, by one of the expressions hL (location), hF (direction from), hA (direction to), hD (way description) or hW (time). The valency lists are then compared as at the first level.

Level 3
-- the same expression replacement as at the previous level is performed, and then the valency expressions of location and time are deleted from the valency patterns. The reason is that these expressions often represent adjuncts with circumstantial meaning.

Level 4
-- after the valency list is processed in the same way as at the third level, the expressions of person and thing are ``depersonified'': they are replaced by a two-faced expression meaning ``either person or thing''.

The merging of person and thing at the fourth level has two motivations. First, person and thing share the same type of logical entity (as described in the next paragraph). Second, it reflects the difficulty of distinguishing these two kinds of noun groups in text: in Czech this is possible for masculine noun groups but difficult for feminine and neuter ones.
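The four levels can be sketched in code as successive normalizations applied before valency lists are compared. This is only an illustration under stated assumptions, not the implementation used here: it assumes a valency list is written as comma-separated patterns with hyphen-separated expressions (as in the class names quoted in this section), that person and thing expressions begin with hP and hT, and that the replacement of prepositional expressions is given as a lookup table. The names `parse`, `normalize`, `classify`, `TOY_PREP_MAP` and the token `hTr_toy` are hypothetical.

```python
from collections import defaultdict

LOCATION_AND_TIME = {"hL", "hW"}  # expressions deleted from level 3 on


def parse(text):
    """Assumed format: patterns separated by ',', expressions by '-'."""
    return frozenset(tuple(p.split("-")) for p in text.split(",") if p)


def merge_person_thing(expr):
    """Level 4: map hP... and hT... to the two-faced hPT... expression."""
    if expr.startswith("hPT"):
        return expr
    if expr.startswith(("hP", "hT")):
        return "hPT" + expr[2:]
    return expr


def normalize(valency_list, level, prep_map):
    """Apply the cumulative transformations of levels 2 to 4."""
    result = set()
    for pattern in valency_list:
        exprs = list(pattern)
        if level >= 2:  # replace prepositional expressions where a rule exists
            exprs = [prep_map.get(e, e) for e in exprs]
        if level >= 3:  # drop location and time expressions
            exprs = [e for e in exprs if e not in LOCATION_AND_TIME]
        if level >= 4:  # person or thing -> "either person or thing"
            exprs = [merge_person_thing(e) for e in exprs]
        result.add(tuple(exprs))
    return frozenset(result)


def classify(verbs, level, prep_map):
    """Group verbs whose normalized valency lists are identical."""
    classes = defaultdict(list)
    for verb, text in verbs.items():
        classes[normalize(parse(text), level, prep_map)].append(verb)
    return dict(classes)


# Toy data: the verbs and the prepositional token are invented for illustration.
TOY_PREP_MAP = {"hTr_toy": "hL"}
TOY_VERBS = {
    "verb_a": "hTc4",
    "verb_b": "hPc4",
    "verb_c": "hTc4-hL",
    "verb_d": "hTc4-hTr_toy",
}
class_counts = [len(classify(TOY_VERBS, lvl, TOY_PREP_MAP)) for lvl in (1, 2, 3, 4)]
```

On this toy input the number of classes falls from four at level 1 to one at level 4, mirroring the gradual decrease described above. A pattern emptied by the level 3 deletion is kept as an empty pattern in this sketch; whether the original procedure does the same is an assumption.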

The number of classes obtained at each level, together with additional statistical information, is shown in the table below.


Table: Statistics of verb classification

                                 Level 1       Level 2       Level 3       Level 4
Number of classes                4,537         3,188         2,773         2,011
Number of verbs                  15,022        15,022        15,022        15,022
Number of valencies              49,566        43,175        39,978        38,726
Three biggest classes            hTc4 (1,420)  hTc4 (1,420)  hTc4 (1,607)  hPTc4 (2,668)
                                 hPTc4 (812)   hPTc4 (812)   empty (924)   empty (924)
                                 hTc7 (402)    hA (553)      hPTc4 (919)   hPTc4,hPTc4-hPTc7,hPTc7 (642)
Classes with 1 verb              2,699 (59%)   1,780 (56%)   1,521 (55%)   1,065 (53%)
Classes with 2 verbs             1,223 (27%)   884 (28%)     771 (28%)     510 (25%)
Classes with 3 verbs             219 (5%)      155 (5%)      135 (5%)      111 (6%)
Classes with more than 3 verbs   396 (9%)      369 (12%)     346 (12%)     325 (16%)
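As a quick consistency check on the table, the four class-size rows at each level should add up to the total number of classes, and the percentages should follow from those counts. The short sketch below recomputes them from the figures in the table; the variable names are illustrative only.

```python
# Figures copied from the table, one entry per level (1 to 4).
total_classes = [4537, 3188, 2773, 2011]
# Classes with 1, 2, 3 and more than 3 verbs at each level.
size_rows = [
    [2699, 1223, 219, 396],
    [1780, 884, 155, 369],
    [1521, 771, 135, 346],
    [1065, 510, 111, 325],
]

# The size rows partition the classes, so each must sum to the total.
sums_match = [sum(row) == total for row, total in zip(size_rows, total_classes)]

# Share of each size bucket, rounded to whole percent.
shares = [
    [round(100 * count / total) for count in row]
    for row, total in zip(size_rows, total_classes)
]

for level, (ok, row) in enumerate(zip(sums_match, shares), start=1):
    print(f"level {level}: sums match = {ok}, shares (%) = {row}")
```

The recomputed shares agree with the percentages printed in the table, which confirms that the dotted figures in the original layout are thousands separators rather than decimals.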

The large number of classes with only a few verbs arises because our list merges all meanings of a verb into a single valency list. Thus only verbs that share similar valency patterns in all their meanings can fall into the same equivalence class. This is not an error, merely an inconvenient feature of the list in its current form.
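The effect can be seen in a small sketch. It assumes, purely for illustration, that the patterns of each meaning were available separately; the two verbs and their patterns are invented, and `merged_list` is a hypothetical helper that mimics how the current list pools all meanings.

```python
# Hypothetical per-meaning valency patterns for two toy verbs.
senses = {
    "verb_a": [{"hTc4"}],                 # a single meaning
    "verb_b": [{"hTc4"}, {"hPc3-hTc4"}],  # the same meaning plus one more
}


def merged_list(sense_patterns):
    """Pool the patterns of all meanings into one valency list."""
    return frozenset().union(*sense_patterns)


# The verbs share a meaning with identical patterns ...
shared_sense = any(s in senses["verb_b"] for s in senses["verb_a"])
# ... but their pooled valency lists differ, so they land in different classes.
same_class = merged_list(senses["verb_a"]) == merged_list(senses["verb_b"])
```

Here `shared_sense` is true while `same_class` is false: one extra meaning of a verb is enough to separate it from verbs it otherwise resembles.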

The main reason for constructing this decomposition is that verbs in the same class are more likely to be similar in meaning. For example, the class with valency list hPTc4ro,hPTc4ro-hPTc7rs,hPTc7rs contains the verbs:

poprat se       soupeřit    podělit se    vsadit se
porvat se       zápasit
poškádlit se    zápolit
svářit se

where the verbs in each column are similar in meaning (synonymous).


Pavel Smrz 2001-03-18