We now want to partition the set of all verbs into equivalence classes. We regard as equivalent those verbs that share the same valency list, or whose valency lists are similar. The algorithm for finding similar valency lists is implemented in four levels. Each successive level defines the similarity of valency lists in such a way that the number of resulting classes gradually decreases:
Firstly, this corresponds to the fact that person and thing share the same type of logical entity (as described in the next paragraph); secondly, it reflects the difficulty of distinguishing these two kinds of noun groups in text. In Czech this is possible for masculine noun groups but difficult for feminine and neuter noun groups.
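The partitioning idea can be sketched as follows. This is a minimal illustration, not the implementation used here: the function names, the toy lexicon with placeholder verb names, and the `merge_person_thing` rule are all assumptions made for the example. In particular, the rule guesses that labels such as `hTc4` and `hPTc4` differ only in the person/thing marker, which this text does not confirm.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List


def partition_verbs(
    lexicon: Dict[str, List[str]],
    normalize: Callable[[str], str] = lambda frame: frame,
) -> Dict[FrozenSet[str], List[str]]:
    """Group verbs whose (normalized) valency lists are identical."""
    classes: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for verb, frames in lexicon.items():
        # The class key is the set of normalized frames of the verb.
        key = frozenset(normalize(frame) for frame in frames)
        classes[key].append(verb)
    return dict(classes)


def merge_person_thing(frame: str) -> str:
    """Hypothetical coarsening rule: treat hP, hT and hPT alike."""
    return re.sub(r"h(PT|P|T)(?=c\d)", "hPT", frame)


# Invented toy data; verb names are placeholders, not real lexicon entries.
toy_lexicon = {
    "verb_a": ["hTc4"],
    "verb_b": ["hPTc4"],
    "verb_c": ["hTc4"],
    "verb_d": ["hPTc4", "hPTc7"],
}

exact_classes = partition_verbs(toy_lexicon)
coarse_classes = partition_verbs(toy_lexicon, merge_person_thing)
print(len(exact_classes), len(coarse_classes))
```

Passing a coarser `normalize` function can only merge classes, never split them, which mirrors how each successive level yields fewer classes.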
The number of classes obtained at each level, together with additional statistical information, is shown in the following table.
| Level | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Number of classes | 4,537 | 3,188 | 2,773 | 2,011 |
| Number of verbs | 15,022 | 15,022 | 15,022 | 15,022 |
| Number of valencies | 49,566 | 43,175 | 39,978 | 38,726 |
| Three biggest classes | hTc4 (1,420) | hTc4 (1,420) | hTc4 (1,607) | hPTc4 (2,668) |
| | hPTc4 (812) | hPTc4 (812) | empty (924) | empty (924) |
| | hTc7 (402) | hA (553) | hPTc4 (919) | hPTc4,hPTc4-hPTc7,hPTc7 (642) |
| No. of classes with 1 verb | 2,699 (59%) | 1,780 (56%) | 1,521 (55%) | 1,065 (53%) |
| No. of classes with 2 verbs | 1,223 (27%) | 884 (28%) | 771 (28%) | 510 (25%) |
| No. of classes with 3 verbs | 219 (5%) | 155 (5%) | 135 (5%) | 111 (6%) |
| No. of classes with more than 3 verbs | 396 (9%) | 369 (12%) | 346 (12%) | 325 (16%) |
The large number of classes containing only a few verbs arises because our list merges all the meanings of a verb into a single valency list. Consequently, only verbs that share similar valency patterns in all their meanings can fall into the same equivalence class. This is not an error, merely an inconvenient feature of the list in its current form.
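The class-size rows of the table can be derived from any partition by bucketing classes by their number of verbs. The sketch below assumes that percentages are rounded to the nearest whole number and are taken relative to the number of classes at that level. The function name and the toy partition are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def size_profile(classes: Dict[str, List[str]]) -> Dict[str, Tuple[int, int]]:
    """Return (count, rounded percent) of classes with 1, 2, 3 and >3 verbs."""
    total = len(classes)
    # Sizes above 3 are folded into a single bucket, stored under key 4.
    buckets = Counter(min(len(verbs), 4) for verbs in classes.values())
    labels = {1: "1 verb", 2: "2 verbs", 3: "3 verbs", 4: "more than 3 verbs"}
    profile = {}
    for size, label in labels.items():
        count = buckets.get(size, 0)
        percent = round(100 * count / total) if total else 0
        profile[label] = (count, percent)
    return profile


# Invented toy partition: class key -> verbs in that class.
toy_classes = {
    "k1": ["a"],
    "k2": ["b", "c"],
    "k3": ["d"],
    "k4": ["e", "f", "g", "h", "i"],
}

profile = size_profile(toy_classes)
print(profile)
```

Because each verb contributes one merged valency list, a verb with several meanings tends to end up in a very small class, which this profile makes visible.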
The main reason for constructing this decomposition is that verbs in the same class are more likely to be similar in meaning. For example, the class with the valency list hPTc4ro,hPTc4ro-hPTc7rs,hPTc7rs contains the following verbs:
| poprat se | soupeřit | podělit se | vsadit se |
| porvat se | zápasit | | |
| poškádlit se | zápolit | | |
| svářit se | | | |
where the verbs in each column are similar in meaning (synonymous).