Changes between version 28 and version 29 of OcrDataset
- Timestamp:
- 5 January 2023, 16:16:43 (19 months ago)
OcrDataset
v28  v29
2    2      This is an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
3    3
4         - You can [https://hdl.handle.net/11234/1-4615 download the dataset from 2021] and [https://hdl.handle.net/11234/1-4935 supplementary materials from 2022] in the LINDAT/CLARIAH-CZ repository.
     4    + You can download [https://hdl.handle.net/11234/1-4615 the dataset from 2021] and [https://hdl.handle.net/11234/1-4935 supplementary materials from 2022] in the LINDAT/CLARIAH-CZ repository.
5    5
6    6      == Contents ==
…    …
32   32     If you use our dataset in your work, please cite the following articles:
33   33
34        - Novotný, V., Seidlová, K., Vrabcová, T., Horák, A.: When Tesseract Brings Friends: Layout Analysis, Language Identification, and Super-Resolution in the Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) '' Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2021''. pp. 91–100. ISSN 2336-4289. ISBN 978-80-263-1600-8. Tribun EU (2021). Available also from WWW: https://nlp.fi.muni.cz/raslan/2021/paper10.pdf
     34   + Novotný, V., Seidlová, K., Vrabcová, T., Horák, A.: When Tesseract Brings Friends: Layout Analysis, Language Identification, and Super-Resolution in the Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) '' Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2021'' . pp. 91–100. ISSN 2336-4289. ISBN 978-80-263-1600-8. Tribun EU (2021). Available also from WWW: https://nlp.fi.muni.cz/raslan/2021/paper10.pdf
35   35
36        - Novotný, V., Horák, A.: When Tesseract Meets PERO: Open-Source Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) '' Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2022''. pp. 157–160. ISSN 2336-4289. ISBN 978-80-263-1752-4. Tribun EU (2022). Available also from WWW: https://nlp.fi.muni.cz/raslan/2022/paper12.pdf
     36   + Novotný, V., Horák, A.: When Tesseract Meets PERO: Open-Source Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) '' Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2022'' . pp. 157–160. ISSN 2336-4289. ISBN 978-80-263-1752-4. Tribun EU (2022). Available also from WWW: https://nlp.fi.muni.cz/raslan/2022/paper12.pdf
37   37
38   38     If you use LaTeX, you can use the following BibTeX entries: