Changes between version 21 and version 22 of OcrDataset


Timestamp:
30 November 2022, 15:22:43 (20 months ago)
Author:
xnovot32@fi.muni.cz
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • OcrDataset

    v21 v22
      5   5
      6   6  == Contents ==
      7      The dataset from 2021 is structured as follows:
          7  [https://hdl.handle.net/11234/1-4615 The dataset from 2021] is structured as follows:
      8   8
      9   9   * The archive [https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-4615/scanned-images.zip?sequence=7&isAllowed=y scanned-images.zip] (47.13 GB) contains 51,351 high-resolution scanned images.

     17  17   * The archive [https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-4615/annotations-language-identification.zip?sequence=3&isAllowed=y annotations-language-identification.zip] (1.1 MB) contains 122 annotations for the evaluation of language identification.
     18  18
     19      The supplementary materials from 2022 are structured as follows:
         19  [https://nlp.fi.muni.cz/projects/ahisto/ocr-texts-supplementary.zip The supplementary materials from 2022] are structured as follows:
     20  20
     21  21   * The archive [https://nlp.fi.muni.cz/projects/ahisto/ocr-texts-supplementary.zip ocr-texts-supplementary.zip] (24.39 MB) contains 110 OCR texts for which we have both high-resolution scanned images and annotations for OCR evaluation.[[BR]]The archive is divided into a number of subdirectories with outputs of different OCR engines:

     32  32  If you use our dataset in your work, please cite the following articles:
     33  33
     34        Novotný, V., Seidlová, K., Vrabcová, T., Horák, A.: When Tesseract Brings Friends: Layout Analysis, Language Identification, and Super-Resolution in the Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) ''                Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2021''  . pp. 91–100. ISSN 2336-4289. ISBN 978-80-263-1600-8. Tribun EU (2021). Available also from WWW: https://nlp.fi.muni.cz/raslan/2021/paper10.pdf
         34    Novotný, V., Seidlová, K., Vrabcová, T., Horák, A.: When Tesseract Brings Friends: Layout Analysis, Language Identification, and Super-Resolution in the Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) ''                 Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2021''   . pp. 91–100. ISSN 2336-4289. ISBN 978-80-263-1600-8. Tribun EU (2021). Available also from WWW: https://nlp.fi.muni.cz/raslan/2021/paper10.pdf
     35  35
     36        Novotný, V., Horák, A.: When Tesseract Meets PERO: Open-Source Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) ''  Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2022''  . pp. 157–160. ISSN 2336-4289. ISBN 978-80-263-1752-4. Tribun EU (2022). Available also from WWW: https://nlp.fi.muni.cz/raslan/2022/paper12.pdf
         36    Novotný, V., Horák, A.: When Tesseract Meets PERO: Open-Source Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) ''   Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2022''   . pp. 157–160. ISSN 2336-4289. ISBN 978-80-263-1752-4. Tribun EU (2022). Available also from WWW: https://nlp.fi.muni.cz/raslan/2022/paper12.pdf
     37  37
     38  38  If you use LaTeX, you can use the following BibTeX entries: