* The archive `annotations-language-identification.zip` contains 122 annotations for the evaluation of language identification.

The supplementary materials from 2022 are structured as follows:

* The archive `ocr-texts-supplementary.zip` contains 110 OCR texts for which we have both high-resolution scanned images and annotations for the evaluation of OCR.
* The subdirectory `google-vision-ai-old` contains JSON and TXT documents from the Google Vision AI OCR engine from 2020-10-02.
* The subdirectory `google-vision-ai` contains JSON and TXT documents from the Google Vision AI OCR engine from 2022-08-11.
* The subdirectory `pero-demo` contains PAGE and TXT documents from [https://pero-ocr.fit.vutbr.cz/ the web demo of the PERO OCR engine].
* The subdirectory `pero-github` contains PAGE and TXT documents from [https://github.com/DCGM/pero-ocr the open-source variant of the PERO OCR engine] using [https://www.fit.vut.cz/~ihradis/pero/pero_eu_cz_print_newspapers_2020-10-09.tar.gz public pretrained models].
* The subdirectory `tesseract` contains HOCR and TXT documents from the Tesseract 4 OCR engine.
* The subdirectory `tesseract-and-google-vision-ai-old` contains TXT documents that combine `tesseract` and `google-vision-ai-old` documents.
* The subdirectory `tesseract-and-google-vision-ai` contains TXT documents that combine `tesseract` and `google-vision-ai` documents.
* The subdirectory `tesseract-and-pero-github` contains TXT documents that combine `tesseract` and `pero-github` documents.
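Before comparing engines, it helps to check what each output subdirectory actually contains. The sketch below does that with the standard `zipfile` module. It groups the archive members by their immediate parent directory and counts file extensions. The function name `summarize_archive` is hypothetical. The sketch also assumes the engine outputs sit in directories named after the engines, as listed above, possibly below a common top-level folder. The exact archive layout and file extensions are not documented here.

```python
import zipfile
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def summarize_archive(path):
    """Count files per (immediate parent directory, extension) in a ZIP archive.

    Hypothetical helper: assumes each OCR engine's output lives in a
    directory named after the engine (e.g. ``tesseract/``), possibly
    nested below a common top-level folder inside the archive.
    """
    summary = defaultdict(Counter)
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip explicit directory entries
                continue
            member = PurePosixPath(name)  # ZIP member names always use "/"
            # Files at the archive root have an empty parent name; report them as "."
            subdirectory = member.parent.name or "."
            summary[subdirectory][member.suffix.lower()] += 1
    return summary
```

For example, `summarize_archive("ocr-texts-supplementary.zip")` returns a mapping from each directory name, such as `tesseract` or `pero-github`, to a `Counter` of file extensions. That is a quick way to confirm that every engine directory holds the document types described above.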
Novotný, V., Seidlová, K., Vrabcová, T., Horák, A.: When Tesseract Brings Friends: Layout Analysis, Language Identification, and Super-Resolution in the Optical Character Recognition of Medieval Texts. In: Horák, A., Rychlý, P., Rambousek, A. (eds.) ''Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2021''. pp. 91–100. ISSN 2336-4289. ISBN 978-80-263-1600-8. Tribun EU (2021). Available also from WWW: https://nlp.fi.muni.cz/raslan/2021/paper10.pdf