Změny mezi verzí 25 a verzí 26 u NerDataset


Ignorovat:
Časová značka:
5. 1. 2023 16:50:15 (před 19 měsíci)
Autor:
xnovot32@fi.muni.cz
Komentář:

--

Vysvětlivky:

Nezměněno
Přidáno
Odstraněno
Změněno
  • NerDataset

    v25 v26  
    122122||=dataset_ner_manatee_non-crossing_only-relevant_testing_401-500 =|| 38.5 kB|| 100|| 4,507|| 110|| 55|| 55|| 2,449||
    123123
    124  * The archive [https://nlp.fi.muni.cz/projekty/ahisto/named-entity-recognition-annotations-large.zip named-entity-recognition-annotations-large.zip] (1.3 GB) contains 16 tuples of files named `*.sentences.txt` and `.ner_tags.txt`. These files contain sentences and NER tags for supervised training, validation, and testing of language models.[[BR]]Here are the four variables that we used to produce the different files:
     124 * The archive [https://nlp.fi.muni.cz/projekty/ahisto/named-entity-recognition-annotations-large.zip named-entity-recognition-annotations-large.zip] (1.31 GB) contains 16 tuples of files named `*.sentences.txt` and `.ner_tags.txt`. These files contain sentences and NER tags for supervised training, validation, and testing of language models.[[BR]]Here are the four variables that we used to produce the different files:
    125125   1. The sentences are extracted from book OCR texts and may therefore span several pages.[[BR]]However, page boundaries contain pollutants such as running heads, footnotes, and page numbers.[[BR]]We either allow the sentences to cross page boundaries (`all`) or not (`non-crossing`).
    126126   1. The sentences come from all book pages (`all`) or just those considered relevant by human annotators (`only-relevant`).