Automatic syntactic analysis for real-world applications
PhD thesis

Vojtěch Kovář
Faculty of Informatics, Masaryk University
Botanická 68a, 60200 Brno, Czech Republic


Syntactic analysis (parsing) of natural languages is a subfield of natural language processing (NLP) that is often claimed to be a “cornerstone” of the area, a necessary basis for any advanced language processing and real understanding. Syntactic analysis deals with revealing sentence structure, the language units that bear meaning, and the relationships among them; it is hard to imagine real language understanding without this information. On the other hand, in current practical “intelligent” applications, syntactic processing is often replaced by purely stochastic methods. Some voices in the NLP community even claim that syntactic analysis is not really needed in practical applications.

In this work, we analyse the current state of the field and identify the particular problems it suffers from. Based on this analysis, we propose directions that parsing research should take next. We discuss methodology and evaluation, manual annotation procedures, and approaches to the design and implementation of natural language parsing tools.

Then, we describe the results of our research in these directions. They include a new format for manual syntactic annotation and two application-oriented tools for automatic syntactic analysis. We report many applications of these tools, which supports our methodological considerations. We evaluate the practical outputs of the work and discuss problems of evaluation methodology.


