= Bulky

== Description

Bulky is a list of 9109 Czech sentences in which interlingual homographs cause tagging problems. We observed that interlingual homographs, e.g., Czech-English homographs such as ''step'', ''drop'', ''barely'', ''car'', ''copy'', are often tagged incorrectly in Czech corpora. This subcorpus can serve as a test set for enhanced taggers.

More information about the corpus can be found in ''PELIKÁNOVÁ, Zuzana and Zuzana NEVĚŘILOVÁ. Corpus Annotation Pipeline for Non-standard Texts. In P. Sojka, A. Horák, I. Kopeček, K. Pala. Text, Speech, and Dialogue, 21st International Conference, TSD 2018. Switzerland: Springer International Publishing, 2018, pp. 304-312. ISBN 978-3-030-00794-2. Available at: https://dx.doi.org/10.1007/978-3-030-00794-2_32''.

== LINDAT handle

http://hdl.handle.net/11234/1-2822

== Acknowledgements

If you use the corpus, please cite the related publication as well as the LINDAT/CLARIAH infrastructure: http://hdl.handle.net/11234/1-2822

== Publication info

https://www.muni.cz/vyzkum/publikace/1471077

{{{
@InProceedings{10.1007/978-3-030-00794-2_32,
  author="Pelikánová, Zuzana and Nevěřilová, Zuzana",
  editor="Sojka, Petr and Hor{\'a}k, Ale{\v{s}} and Kope{\v{c}}ek, Ivan and Pala, Karel",
  title="Corpus Annotation Pipeline for Non-standard Texts",
  booktitle="Text, Speech, and Dialogue",
  year="2018",
  publisher="Springer International Publishing",
  pages="295--303",
  isbn="978-3-030-00794-2"
}
}}}

== License

Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)