Changes between Version 24 and Version 25 of OcrDataset
- Timestamp:
- 2022-11-30 15:37:30
OcrDataset
v24   v25
19    19    [https://nlp.fi.muni.cz/projects/ahisto/ocr-texts-supplementary.zip The supplementary materials from 2022] are structured as follows:
20    20
21          * The archive [https://nlp.fi.muni.cz/projects/ahisto/ocr-texts-supplementary.zip ocr-texts-supplementary.zip] (23.2 MB) contains 110 OCR texts for which we have both high-resolution scanned images and annotations for OCR evaluation.[[BR]]The archive is divided into a number of subdirectories with outputs of different OCR engines:
      21    * The archive [https://nlp.fi.muni.cz/projects/ahisto/ocr-texts-supplementary.zip ocr-texts-supplementary.zip] (23.26 MB) contains 110 OCR texts for which we have both high-resolution scanned images and annotations for OCR evaluation.[[BR]]The archive is divided into a number of subdirectories with outputs of different OCR engines:
22    22    * The subdirectory `google-vision-ai-old` contains JSON and TXT documents from the Google Vision AI OCR engine from 2020-10-02.
23    23    * The subdirectory `google-vision-ai` contains JSON and TXT documents from the Google Vision AI OCR engine from 2022-08-11.
…     …
32    32    If you use our dataset in your work, please cite the following articles:
33    33
34          Novotný, V., Seidlová, K., Vrabcová, T., Horák, A.: When Tesseract Brings Friends: Layout Analysis, Language Identification, and Super-Resolution in the Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) '' Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2021''. pp. 91–100. ISSN 2336-4289. ISBN 978-80-263-1600-8. Tribun EU (2021). Available also from WWW: https://nlp.fi.muni.cz/raslan/2021/paper10.pdf
      34    Novotný, V., Seidlová, K., Vrabcová, T., Horák, A.: When Tesseract Brings Friends: Layout Analysis, Language Identification, and Super-Resolution in the Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) '' Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2021'' . pp. 91–100. ISSN 2336-4289. ISBN 978-80-263-1600-8. Tribun EU (2021). Available also from WWW: https://nlp.fi.muni.cz/raslan/2021/paper10.pdf
35    35
36          Novotný, V., Horák, A.: When Tesseract Meets PERO: Open-Source Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) '' Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2022''. pp. 157–160. ISSN 2336-4289. ISBN 978-80-263-1752-4. Tribun EU (2022). Available also from WWW: https://nlp.fi.muni.cz/raslan/2022/paper12.pdf
      36    Novotný, V., Horák, A.: When Tesseract Meets PERO: Open-Source Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) '' Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2022'' . pp. 157–160. ISSN 2336-4289. ISBN 978-80-263-1752-4. Tribun EU (2022). Available also from WWW: https://nlp.fi.muni.cz/raslan/2022/paper12.pdf
37    37
38    38    If you use LaTeX, you can use the following BibTeX entries: