Opened 15 years ago

Closed 14 years ago

#13 closed task (fixed)

comparison with other parsers and corpora

Reported by: Miloš Jakubíček
Owned by: Vojtěch Kovář
Priority: major
Milestone: Phase #3
Component: set
Keywords:
Cc:

Description

comparison with other syntactic parsers for Czech and with annotated corpus data

Change History (1)

comment:1 Changed 14 years ago by Vojtěch Kovář

Resolution: fixed
Status: new → closed

Comparison with annotated corpus data (manually disambiguated morphological tagging) -- current results:

BPT2000:

average: 83.53 median: 87.50

PDT etest:

average: 76.78 median: 78.57

To compare with other dependency parsers, we automatically tagged the PDT etest sentences with the desamb tagger developed in Brno. The parser's precision on this automatically tagged test set is as follows:

average: 72.93 median: 74.07

which is an average result in comparison with other dependency parsers for Czech as listed at http://ufal.mff.cuni.cz/czech-parsing/ .
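For readers who want to reproduce figures of this kind, the following is a minimal sketch of how a per-sentence average and median could be computed. It assumes that precision is measured per sentence as the percentage of tokens whose predicted head matches the annotated head; the ticket does not state the exact metric, so this definition, the function names and the toy data are all illustrative assumptions and not the evaluation script actually used.

```python
from statistics import mean, median


def sentence_precision(predicted_heads, gold_heads):
    """Percentage of tokens whose predicted head equals the gold head.

    Assumption: one head index per token, with 0 standing for the root.
    """
    if len(predicted_heads) != len(gold_heads):
        raise ValueError("sentences must have the same number of tokens")
    if not gold_heads:
        raise ValueError("empty sentence")
    correct = sum(p == g for p, g in zip(predicted_heads, gold_heads))
    return 100.0 * correct / len(gold_heads)


def summarize(pairs):
    """Return (average, median) of per-sentence precision over a test set."""
    scores = [sentence_precision(pred, gold) for pred, gold in pairs]
    return mean(scores), median(scores)


# Hypothetical toy test set: (predicted heads, gold heads) per sentence.
toy_set = [
    ([2, 0, 2], [2, 0, 2]),        # 3 of 3 heads correct
    ([0, 1, 1, 3], [0, 1, 2, 3]),  # 3 of 4 heads correct
    ([2, 0], [0, 1]),              # 0 of 2 heads correct
]

average, med = summarize(toy_set)
print(f"average: {average:.2f} median: {med:.2f}")
```

Reporting both values is useful because a few very badly parsed sentences pull the average down while leaving the median largely unaffected, which is consistent with the median being higher than the average in all three results above.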

The main disadvantage of the SET parser on the PDT test sets seems to be that its set of patterns is designed manually: many errors and inconsistencies in the data were revealed during the development of the parser (to be published), which complicates the manual specification of the patterns. For machine learning approaches, on which the most successful dependency parsers are based, this does not pose a big problem. However, it is an open question whether precision with respect to PDT data is a representative measurement -- our opinion is that other, application-based techniques for measuring parser accuracy are needed.
