Opened 16 years ago
Closed 15 years ago
#13 closed task (fixed)
comparison with other parsers and corpora
| Reported by: | Miloš Jakubíček | Owned by: | Vojtěch Kovář |
|---|---|---|---|
| Priority: | major | Milestone: | Phase #3 |
| Component: | set | Keywords: | |
| Cc: | | | |
Description
comparison with other syntactic parsers for Czech and with annotated corpus data
Comparison with annotated corpus data (manually disambiguated morphological tagging) -- current results:
BPT2000:
PDT etest:
To compare with other dependency parsers, we automatically tagged the PDT etest sentences with the desamb tagger developed in Brno. The parser precision on this tagged test set is as follows:
average: 72.93, median: 74.07
which is an average result in comparison with the other dependency parsers for Czech listed at http://ufal.mff.cuni.cz/czech-parsing/ .
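For reference, the average and median above are aggregates of per-sentence precision scores. Below is a minimal sketch of how such figures could be computed from a gold and a parsed dependency file; the file names, the head column index and the exact precision definition are assumptions for illustration, not the actual evaluation script used here.
{{{#!python
# Minimal sketch: per-sentence head-attachment precision, averaged over a test set.
# Assumes two CoNLL-like files with one token per line, sentences separated by
# blank lines, and the head index in a fixed tab-separated column (hypothetical).
import statistics

def read_heads(path, head_column=6):
    """Yield one list of head indices per sentence."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                if sentence:
                    yield sentence
                    sentence = []
            else:
                sentence.append(line.split("\t")[head_column])
    if sentence:
        yield sentence

def per_sentence_precision(gold_path, parsed_path):
    """Percentage of tokens with a correctly assigned head, per sentence."""
    scores = []
    for gold, parsed in zip(read_heads(gold_path), read_heads(parsed_path)):
        correct = sum(1 for g, p in zip(gold, parsed) if g == p)
        scores.append(100.0 * correct / len(gold))
    return scores

scores = per_sentence_precision("pdt_etest_gold.conll", "pdt_etest_set_output.conll")
print("average: %.2f median: %.2f" % (statistics.mean(scores), statistics.median(scores)))
}}}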
The main disadvantage of the SET parser on the PDT test sets seems to be the fact that its set of patterns is designed manually: many errors and inconsistencies in the data were revealed during the development of the parser (to be published), which complicates the manual specification of the patterns. For machine learning approaches (on which the most successful dependency parsers are based), this does not pose a big problem. However, it is an open question whether precision with respect to the PDT data is a representative measure -- in our opinion, other, application-based techniques for measuring parser accuracy are needed.