| 1 | = Bulky |
| 2 | |
| 3 | == Description |
| 4 | |
| 5 | Bulky is a list of 9109 Czech sentences in which interlingual homographs (words spelled identically in two languages) cause tagging problems. We observed that such homographs, e.g., the Czech-English homographs ''step'', ''drop'', ''barely'', ''car'', and ''copy'', are often tagged incorrectly in Czech corpora. This subcorpus can serve as a test set for enhanced taggers. |
| 6 | |
| 7 | More information about the corpus can be found in ''PELIKÁNOVÁ, Zuzana and Zuzana NEVĚŘILOVÁ. Corpus Annotation Pipeline for Non-standard Texts. In P. Sojka, A. Horák, I. Kopeček, K. Pala. Text, Speech, and Dialogue, 21st International Conference, TSD 2018. Switzerland: Springer International Publishing, 2018, pp. 304-312. ISBN 978-3-030-00794-2. Available from: https://dx.doi.org/10.1007/978-3-030-00794-2_32''. |
| 8 | |
| 9 | == LINDAT handle |
| 10 | |
| 11 | http://hdl.handle.net/11234/1-2822 |
| 12 | |
| 13 | == Acknowledgements |
| 14 | This corpus was developed within the projects LC536 and 2C06009 and is owned by Masaryk University, Faculty of Informatics, NLP Centre. |
| 15 | |
| 16 | If you use this corpus, please cite the related publication as well as the LINDAT/CLARIAH infrastructure: http://hdl.handle.net/11234/1-2822 |
| 17 | |
| 18 | == Publication info |
| 19 | |
| 20 | https://www.muni.cz/vyzkum/publikace/1471077 |
| 21 | |
| 22 | {{{ |
| 23 | @InProceedings{10.1007/978-3-030-00794-2_32, |
| 24 | author="Pelikánová, Zuzana and Nevěřilová, Zuzana", |
| 25 | editor="Sojka, Petr and Hor{\'a}k, Ale{\v{s}} and Kope{\v{c}}ek, Ivan and Pala, Karel", |
| 26 | title="Corpus Annotation Pipeline for Non-standard Texts", |
| 27 | booktitle="Text, Speech, and Dialogue", |
| 28 | year="2018", |
| 29 | publisher="Springer International Publishing", |
| 30 | pages="295--303", |
| 31 | isbn="978-3-030-00794-2" |
| 32 | } |
| 33 | }}} |
| 34 | |
| 35 | == Licence |
| 36 | |
| 37 | Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) |
| 38 | |
| 39 | |
| 40 | |
| 41 | |