Parsing of Czech: Between Rules and Stats

IV161 NLP in Practice Course, Course Guarantor: Aleš Horák

Prepared by: Aleš Horák, Miloš Jakubíček

State of the Art

Revealing the structural organization of words in a sentence is crucial to understanding its meaning. For a morphologically rich language such as Czech, disambiguating word forms and their meanings is also part of the task. Parsing (syntactic analysis) and word-level analysis (morphological analysis) allow us to represent the complex structure of phrases and sub-phrases in a form suitable for further processing in applications. State-of-the-art results come from models trained on large datasets, but grammar-based approaches offer fine-grained control over which analyses are treated as correct and which as incorrect.

References

Bauer, J., & Manning, C. D. (2025). High-accuracy transition-based constituency parsing. In Proceedings of the 18th International Conference on Parsing Technologies (IWPT, SyntaxFest 2025).
Fernández-González, D., & Gómez-Rodríguez, C. (2023). Dependency parsing with bottom-up hierarchical pointer networks. Information Fusion, 91, 494-503.
Arps, D., Samih, Y., Kallmeyer, L., & Sajjad, H. (2022). Probing for constituency structure in neural language models. arXiv preprint arXiv:2204.06201.
Baisa, V., & Kovář, V. (2014). Information extraction for Czech based on syntactic analysis. In Z. Vetulani & J. Mariani (Eds.), Human Language Technology Challenges for Computer Science and Linguistics (pp. 155–165). Springer International Publishing.

Practical Session

We will develop or adjust the grammar of the SET parser (for English or Czech).

Open the Google Colab notebook IV161-Parsing Czech and follow the text and code in it.

Upload the resulting grammar file, which should achieve an improved UAS (unlabeled attachment score), to the homework vault.
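
For orientation, UAS is the share of tokens whose predicted syntactic head matches the gold-standard head, with dependency labels ignored. The sketch below only illustrates that definition; it is not the evaluation code from the notebook. The function name `unlabeled_attachment_score`, the head-list representation (1-based head indices, 0 for the root), and the sample sentence are all assumptions made for this example.

```python
from typing import Sequence


def unlabeled_attachment_score(
    gold_heads: Sequence[int], predicted_heads: Sequence[int]
) -> float:
    """Return the share of tokens whose predicted head equals the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    # Count tokens attached to the correct head; labels play no role in UAS.
    correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
    return correct / len(gold_heads)


# Hypothetical 5-token sentence: heads are 1-based token indices, 0 marks the root.
gold = [2, 0, 4, 2, 2]
predicted = [2, 0, 2, 2, 4]

uas = unlabeled_attachment_score(gold, predicted)
print(f"UAS = {uas:.2f}")  # 3 of 5 heads match, so this prints "UAS = 0.60"
```

Over a whole test set, the score is usually computed across all tokens of all sentences rather than averaged per sentence, so a grammar change that fixes attachments in long sentences moves the number more than one that only affects short sentences.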