Changes between Initial Version and Version 1 of MedievalNamedEntities


Timestamp: May 24, 2024, 11:30:41 PM
Author: xpopelk

= A Human-Annotated Dataset for Language Modeling and Named Entity Recognition in Medieval Documents

== Description

This is an open dataset of sentences from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).

=== Example

{{{
Král/B-PER Zikmund/I-PER dává/O Petrovi/B-PER z/I-PER Michalovic/I-PER ,/O
který/O mu/O prokazoval/O věrné/O služby/O a/O kterého/O chce/O Zikmund/B-PER
touto/O odměnou/O povzbuditi/O k/O ještě/O usilovnější/O službě/O ,/O
vesnici/O Předměřice/B-LOC nad/I-LOC Jizerou/I-LOC s/O alody/O ,/O
poplužími/O ,/O obdělávanými/O i/O neobdělávanými/O poli/O ,/O platy/O ,/O
službami/O ,/O robotami/O ,/O loukami/O ,/O pastvinami/O ,/O vodami/O ,/O
vodními/O toky/O ,/O mlýny/O ,/O všemi/O příjmy/O a/O vším/O
příslušenstvím/O ./O
}}}
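
The example above writes each token as `token/TAG`, with tags that appear to follow the BIO (IOB2) scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. The following Python sketch assumes exactly this inline format; the files distributed through LINDAT may use a different layout, and the helper names `parse_tagged` and `extract_entities` are hypothetical, not part of the dataset.

{{{
#!python
# A minimal sketch, assuming whitespace-separated "token/TAG" pairs as in the
# example above and BIO tags (B-XXX opens an entity, I-XXX continues it,
# O is outside). parse_tagged and extract_entities are hypothetical helpers.
from typing import List, Optional, Tuple


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split each 'token/TAG' item; rsplit keeps any '/' inside the token."""
    pairs = []
    for item in text.split():
        token, tag = item.rsplit("/", 1)
        pairs.append((token, tag))
    return pairs


def extract_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group B-/I- runs into (entity_type, surface_text) tuples."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in pairs:
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts; an I- tag of a different type is treated as B-.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Same-type I- tag: extend the open entity.
            current_tokens.append(token)
        else:
            # "O" closes any open entity.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    # Close an entity that runs to the end of the input.
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


sample = (
    "Král/B-PER Zikmund/I-PER dává/O Petrovi/B-PER z/I-PER Michalovic/I-PER ,/O "
    "vesnici/O Předměřice/B-LOC nad/I-LOC Jizerou/I-LOC s/O alody/O ./O"
)
for entity_type, surface in extract_entities(parse_tagged(sample)):
    print(entity_type, surface)
}}}

On the shortened sample, the sketch yields two PER entities ("Král Zikmund" and "Petrovi z Michalovic") and one LOC entity ("Předměřice nad Jizerou"). An `I-` tag that does not continue an entity of the same type is treated as the start of a new entity, a common lenient decoding choice.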

More information about the dataset can be found on the [https://nlp.fi.muni.cz/trac/ahisto/wiki/NerDataset AHISTO project page].

== LINDAT handle

http://hdl.handle.net/11234/1-5024

== Acknowledgements

If you use the dataset, please cite the related publication as well as the LINDAT/CLARIAH infrastructure: http://hdl.handle.net/11234/1-5024.

=== LINDAT/CLARIAH-CZ

Project code: LM2018101

Project name: LINDAT/CLARIAH-CZ: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy (Digital Research Infrastructure for Language Technologies, Arts and Humanities)

=== TAČR (Technology Agency of the Czech Republic)

Project code: TL03000365

Project name: Accessible historical sources. Making medieval written documents available in the form of a contextual database

== Publication info

 - Mikuláš Bankovič, Vít Novotný, and Petr Sojka: Application of Super-Resolution Models in Optical Character Recognition of Czech Medieval Texts. In Aleš Horák, Pavel Rychlý, and Adam Rambousek (eds.): Recent Advances in Slavonic Natural Language Processing (RASLAN 2021). Brno: Tribun EU, 2021, pp. 11-18. ISBN 978-80-263-1670-1.
 - Vít Novotný, Kristýna Seidlová, Tereza Vrabcová, and Aleš Horák: When Tesseract Brings Friends: Layout Analysis, Language Identification, and Super-Resolution in the Optical Character Recognition of Medieval Texts. In Aleš Horák, Pavel Rychlý, and Adam Rambousek (eds.): Recent Advances in Slavonic Natural Language Processing (RASLAN 2021). Brno: Tribun EU, 2021, pp. 29-39.

If you cite the dataset, please use this citation:

{{{
@inproceedings{DBLP:conf/raslan/NovotnySVH21,
  author       = {V{\'{\i}}t Novotn{\'{y}} and
                  Krist{\'{y}}na Seidlov{\'{a}} and
                  Tereza Vrabcov{\'{a}} and
                  Ales Hor{\'{a}}k},
  editor       = {Ales Hor{\'{a}}k and
                  Pavel Rychl{\'{y}} and
                  Adam Rambousek},
  title        = {When Tesseract Brings Friends: Layout Analysis, Language Identification,
                  and Super-Resolution in the Optical Character Recognition of Medieval
                  Texts},
  booktitle    = {The 15th Workshop on Recent Advances in Slavonic Natural Languages
                  Processing, {RASLAN} 2021, Karlova Studanka, Czech Republic, December
                  10-12, 2021},
  pages        = {29--39},
  publisher    = {Tribun {EU}},
  year         = {2021},
  url          = {http://nlp.fi.muni.cz/raslan/2021/paper10.pdf},
  timestamp    = {Tue, 18 Jan 2022 17:52:53 +0100},
  biburl       = {https://dblp.org/rec/conf/raslan/NovotnySVH21.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
}}}

== License

Public Domain Dedication (CC Zero)
