Opened 16 years ago
Closed 15 years ago
#13 closed task (fixed)
comparison with other parsers and corpora
| Reported by: | Miloš Jakubíček | Owned by: | Vojtěch Kovář |
|---|---|---|---|
| Priority: | major | Milestone: | Phase #3 |
| Component: | set | Keywords: | |
| Cc: | | | |
Description
comparison with other syntactic parsers for Czech and with annotated corpus data
Comparison with annotated corpus data (manually disambiguated morphological tagging) -- current results:
BPT2000:
PDT etest:
To compare with other dependency parsers, we automatically tagged the PDT etest sentences with the desamb tagger developed in Brno. The parser precision on this tagged test set is as follows:
average: 72.93, median: 74.07
which is an average result in comparison with the other dependency parsers for Czech listed at http://ufal.mff.cuni.cz/czech-parsing/ .
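For reference, the average and median above are aggregates of per-sentence precision scores. Below is a minimal sketch of how such figures could be computed from a gold and a parsed dependency file; the file names, the head column index and the exact precision definition are assumptions for illustration, not the actual evaluation script used here.
{{{#!python
# Minimal sketch: per-sentence head-attachment precision, averaged over a test set.
# Assumes two CoNLL-like files with one token per line, sentences separated by
# blank lines, and the head index in a fixed tab-separated column (hypothetical).
import statistics

def read_heads(path, head_column=6):
    """Yield one list of head indices per sentence."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                if sentence:
                    yield sentence
                    sentence = []
            else:
                sentence.append(line.split("\t")[head_column])
    if sentence:
        yield sentence

def per_sentence_precision(gold_path, parsed_path):
    """Percentage of tokens with a correctly assigned head, per sentence."""
    scores = []
    for gold, parsed in zip(read_heads(gold_path), read_heads(parsed_path)):
        correct = sum(1 for g, p in zip(gold, parsed) if g == p)
        scores.append(100.0 * correct / len(gold))
    return scores

scores = per_sentence_precision("pdt_etest_gold.conll", "pdt_etest_set_output.conll")
print("average: %.2f median: %.2f" % (statistics.mean(scores), statistics.median(scores)))
}}}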
The main disadvantage of the SET parser on the PDT test sets seems to be the fact that its set of patterns is designed manually: many errors and inconsistencies in the data were revealed during the development of the parser (to be published), which complicates the manual specification of the patterns. For machine learning approaches (on which the most successful dependency parsers are based), this does not pose a big problem. However, it is an open question whether precision with respect to the PDT data is a representative measure -- in our opinion, other, application-based techniques for measuring parser accuracy are needed.