Changes between Initial Version and Version 1 of BulkyCorpus


Timestamp:
May 18, 2024, 10:57:42 AM
Author:
xpopelk
Comment:

--

  • BulkyCorpus

    v1 v1  
     1= Bulky
     2
     3== Description
     4
     5Bulky is a list of 9109 Czech sentences in which interlingual homographs cause tagging problems. We observed that interlingual homographs, e.g., Czech-English homographs such as ''step'', ''drop'', ''barely'', ''car'', ''copy'', are often tagged incorrectly in Czech corpora. This subcorpus can serve as a test set for enhanced taggers.
     6
     7More information about the corpus can be found in ''PELIKÁNOVÁ, Zuzana and Zuzana NEVĚŘILOVÁ. Corpus Annotation Pipeline for Non-standard Texts. In P. Sojka, A. Horák, I. Kopeček, K. Pala (eds.). Text, Speech, and Dialogue, 21st International Conference, TSD 2018. Switzerland: Springer International Publishing, 2018, pp. 304-312. ISBN 978-3-030-00794-2. Available from: https://dx.doi.org/10.1007/978-3-030-00794-2_32''.
     8
     9== LINDAT handle
     10
     11http://hdl.handle.net/11234/1-2822
     12
     13== Acknowledgements
     14This software was developed within the projects LC536 and 2C06009 and is owned by Masaryk University, Faculty of Informatics, NLP Centre.
     15
     16If you use the system, please cite the related publication as well as the LINDAT/CLARIAH infrastructure: [link to the repository (handle of the given submission)]
     17
     18== Publication info
     19
     20https://www.muni.cz/vyzkum/publikace/1471077
     21
     22{{{
     23@InProceedings{10.1007/978-3-030-00794-2_32,
     24   author="Pelikánová, Zuzana and Nevěřilová, Zuzana",
     25   editor="Sojka, Petr and Hor{\'a}k, Ale{\v{s}} and Kope{\v{c}}ek, Ivan and Pala, Karel",
     26   title="Corpus Annotation Pipeline for Non-standard Texts",
     27   booktitle="Text, Speech, and Dialogue",
     28   year="2018",
     29   publisher="Springer International Publishing",
     30   pages="295--303",
     31   isbn="978-3-030-00794-2"
     32}
     33}}}
     34
     35== Licence
     36
     37Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
     38
     39
     40
     41