Semantic Classification of Verbs



Now we want to partition the set of all verbs into equivalence classes. We regard as equivalent those verbs that share the same valency list, or whose valency lists are similar. The algorithm for finding similar valency lists is implemented in four levels. Each successive level defines the similarity of valency lists more loosely, so that the number of resulting classes gradually decreases:

Level 1: verbs are equivalent only if they share the same valency list.

Level 2: in the valency list, the valency expressions formed by a noun group with a preposition (hPr or hTr) are, where possible, replaced by one of the expressions hL (location), hF (direction from), hA (direction to), hD (way description) or hW (time). The valency lists are then compared as at the first level.

Level 3: the same replacement as at the previous level is performed, and then the valency expressions of location and time are deleted from the valency patterns. The reason is that these expressions often represent adjuncts with a circumstantial meaning.

Level 4: after the valency list has been processed as at the third level, the expressions of person and thing are "depersonified": they are replaced by a combined expression meaning "either person or thing".

This step is motivated by two facts. First, person and thing share the same type of logical entity (as described in the next paragraph). Second, it reflects how hard it is to tell these two kinds of noun groups apart in text: in Czech, the distinction can be made for masculine noun groups, whose grammatical gender distinguishes animate from inanimate, but it is difficult for feminine and neuter noun groups.
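The four levels can be read as four increasingly coarse normalizations of a valency list, followed by grouping on the normalized form. The sketch below illustrates that reading; it is not the authors' implementation. Several details are assumptions: the notation `hTr:<preposition>+c<case>` for prepositional groups and the small `PREP_TO_ADVERBIAL` rule table are hypothetical examples, since the real level 2 rules are not given. The codes hP..., hT... and hPT... are taken to mark person, thing and "person or thing" (the hPTc4 class in the table suggests this). Frames left empty after level 3 are assumed to disappear. The verbs `verb1` to `verb5` are invented.

```python
import re

# Hypothetical notation: 'hPr:<prep>+c<case>' / 'hTr:<prep>+c<case>'.
# These rules are illustrative only, not the table used for the real list.
PREP_TO_ADVERBIAL = {
    "v+c6": "hL",      # v + locative ("in"): location
    "na+c6": "hL",     # na + locative ("on"): location
    "z+c2": "hF",      # z + genitive ("from"): direction from
    "do+c2": "hA",     # do + genitive ("into"): direction to
    "přes+c4": "hD",   # přes + accusative ("across"): way
    "během+c2": "hW",  # během + genitive ("during"): time
}

def replace_prepositional(expr):
    """Level 2: map an hPr/hTr group to hL/hF/hA/hD/hW if a rule exists."""
    head, _, prep = expr.partition(":")
    if head in ("hPr", "hTr") and prep in PREP_TO_ADVERBIAL:
        return PREP_TO_ADVERBIAL[prep]
    return expr  # no rule applies: keep the expression unchanged

def depersonify(expr):
    """Level 4 (assumed codes): hP... or hT... becomes hPT... (person or thing)."""
    return re.sub(r"^h(?:PT|P|T)", "hPT", expr)

def normalize(valency_list, level):
    """Comparison key of a valency list (a set of frames) at levels 1-4."""
    frames = set()
    for frame in valency_list:
        if level >= 2:
            frame = tuple(replace_prepositional(e) for e in frame)
        if level >= 3:
            # Delete location (hL) and time (hW); assume empty frames vanish.
            frame = tuple(e for e in frame if e not in ("hL", "hW"))
            if not frame:
                continue
        if level >= 4:
            frame = tuple(depersonify(e) for e in frame)
        frames.add(frame)
    return frozenset(frames)

def count_classes(lexicon, level):
    """Number of equivalence classes = number of distinct normalized lists."""
    return len({normalize(vl, level) for vl in lexicon.values()})

# Toy lexicon with invented verbs; each valency list is a set of frames.
lexicon = {
    "verb1": {("hTc4",)},
    "verb2": {("hPc4",)},
    "verb3": {("hTc4", "hTr:v+c6")},
    "verb4": {("hTc4", "hTr:na+c6")},
    "verb5": {("hTc4", "hTr:během+c2")},
}
print([count_classes(lexicon, level) for level in (1, 2, 3, 4)])  # [5, 4, 2, 1]
```

Each level reuses the previous level's processing before adding its own step, so classes can only merge as the level increases. This matches the steadily falling class counts reported for the real list.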

The number of classes obtained at each level, together with additional statistics, is given in the following table.


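**Table:** Statistics of verb classification

| | Level 1 | Level 2 | Level 3 | Level 4 |
|---|---|---|---|---|
| Number of classes | 4,537 | 3,188 | 2,773 | 2,011 |
| Number of verbs | 15,022 | 15,022 | 15,022 | 15,022 |
| Number of valencies | 49,566 | 43,175 | 39,978 | 38,726 |
| Largest class | hTc4 (1,420) | hTc4 (1,420) | hTc4 (1,607) | hPTc4 (2,668) |
| 2nd largest class | hPTc4 (812) | hPTc4 (812) | empty (924) | empty (924) |
| 3rd largest class | hTc7 (402) | hA (553) | hPTc4 (919) | hPTc4,hPTc4-hPTc7,hPTc7 (642) |
| Classes with 1 verb | 2,699 (59%) | 1,780 (56%) | 1,521 (55%) | 1,065 (53%) |
| Classes with 2 verbs | 1,223 (27%) | 884 (28%) | 771 (28%) | 510 (25%) |
| Classes with 3 verbs | 219 (5%) | 155 (5%) | 135 (5%) | 111 (6%) |
| Classes with more than 3 verbs | 396 (9%) | 369 (12%) | 346 (12%) | 325 (16%) |

The three largest classes are identified by their valency list, with the number of verbs in parentheses. The percentages in the last four rows are shares of the number of classes at that level.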
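The last four rows of the table are a histogram of class sizes. The short sketch below recomputes their percentage columns from the published bucket counts, as a consistency check on the table. Only the bucket counts are given, so each class with more than 3 verbs is represented by a placeholder size of 4. That is enough for the class counts and percentages but not for the verb totals. The function name `size_distribution` is hypothetical.

```python
from collections import Counter

def size_distribution(class_sizes):
    """Count classes with 1, 2, 3 and more than 3 verbs, with each bucket's
    share of all classes rounded to whole percent, as in the table."""
    total = len(class_sizes)
    buckets = Counter(min(size, 4) for size in class_sizes)  # 4 stands for '>3'
    return {
        label: (buckets[key], round(100 * buckets[key] / total))
        for key, label in ((1, "1"), (2, "2"), (3, "3"), (4, ">3"))
    }

# Level 1 bucket counts from the table; 4 is a placeholder size for '>3'.
level1_sizes = [1] * 2699 + [2] * 1223 + [3] * 219 + [4] * 396
print(len(level1_sizes))  # 4537, the number of level 1 classes
print(size_distribution(level1_sizes))
# {'1': (2699, 59), '2': (1223, 27), '3': (219, 5), '>3': (396, 9)}
```

The same check works for the other levels. At every level the four bucket counts add up to the number of classes in the first row.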

The large number of classes containing only a few verbs is caused by the fact that our list merges all the meanings of a verb into a single valency list. Consequently, only verbs whose valency patterns are similar across all their meanings can fall into the same equivalence class. This is not an error; it is merely an inconvenient property of the list in its current form.
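A toy example shows why merging senses fragments the classes. It assumes, hypothetically, that the senses of each verb were available separately. The verbs `verbX` and `verbY` and their frames (including `hPTc3`) are invented for illustration. The merged list is modelled as the union of the frames of all senses.

```python
# Hypothetical data: each verb maps to a list of senses, and each sense
# is a set of frames. verbX, verbY and their frames are invented.
senses = {
    "verbX": [frozenset({("hPTc4",)})],
    "verbY": [frozenset({("hPTc4",)}), frozenset({("hPTc3",)})],
}

def merged_list(sense_frames):
    """Mimic the lexicon: one valency list holding the frames of all senses."""
    return frozenset().union(*sense_frames)

merged = {verb: merged_list(s) for verb, s in senses.items()}

# The merged lists differ, so the two verbs land in different classes ...
print(merged["verbX"] == merged["verbY"])  # False
# ... although a sense-by-sense comparison finds the sense they share.
print(set(senses["verbX"]) & set(senses["verbY"]))
```

One extra sense of verbY is enough to separate it from verbX, even though the two verbs agree on a shared meaning.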

The main motivation for constructing this decomposition is that verbs in the same class are more likely to be similar in meaning. For example, the class with the valency list hPTc4ro,hPTc4ro-hPTc7rs,hPTc7rs contains the following verbs:

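| fight, quarrel | compete, contend | share | bet |
|---|---|---|---|
| poprat se | soupeřit | podělit se | vsadit se |
| porvat se | zápasit | | |
| poškádlit se | zápolit | | |
| svářit se | | | |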

The verbs in each column are similar in meaning (synonymous).
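To make the notation concrete, the following sketch parses the valency list quoted above and groups verbs at level 1. It assumes that commas separate alternative frames and that hyphens join the expressions within one frame. The text does not define the notation explicitly, so this is a reading, not a specification. Frames are stored in a `frozenset`, so the order in which they are written does not matter. The reordered entry for "vsadit se" and the verb `verbX` are hypothetical illustrations.

```python
from collections import defaultdict

def parse_valency_list(text):
    """Split a valency list into a canonical set of frames.

    Assumed notation: frames are separated by ',' and the expressions
    within one frame by '-'. A frozenset makes frame order irrelevant.
    """
    if not text:
        return frozenset()  # a verb with an empty valency list
    return frozenset(tuple(frame.split("-")) for frame in text.split(","))

def group_by_valency(lexicon):
    """Level 1: verbs share a class iff their parsed lists are equal."""
    classes = defaultdict(list)
    for verb, valency_text in lexicon.items():
        classes[parse_valency_list(valency_text)].append(verb)
    return list(classes.values())

EXAMPLE = "hPTc4ro,hPTc4ro-hPTc7rs,hPTc7rs"  # the list quoted in the text
lexicon = {
    "poprat se": EXAMPLE,
    "zápasit": EXAMPLE,
    "vsadit se": "hPTc7rs,hPTc4ro-hPTc7rs,hPTc4ro",  # same frames, reordered for illustration
    "verbX": "hTc4",  # hypothetical verb from another class
}
print(sorted(parse_valency_list(EXAMPLE)))
# [('hPTc4ro',), ('hPTc4ro', 'hPTc7rs'), ('hPTc7rs',)]
print(group_by_valency(lexicon))
# [['poprat se', 'zápasit', 'vsadit se'], ['verbX']]
```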


Pavel Smrz 2001-03-18
