Changes between version 41 and version 42 of NerDataset


Timestamp:
May 29, 2023, 9:13:20 (14 months ago)
Author:
xnovot32@fi.muni.cz
Comment:

--

  • NerDataset

    v41 v42  
    33 33 ''^1 ^The `.docx` files were authored by human annotators and contain extra details missing from files `.sentences.txt` and `.ner_tags.txt`. The extra details include nested entities such as locations in person names (e.g. “Blažek z __Kralup__”) and people in location names (e.g. “Kostel __sv. Martina__”).''
    34 34
    35 '''Table 2:''' Dataset statistics from the archive named-entity-recognition-annotations-small.zip, ordered by the number of B-* tags. In the article describing the dataset, the files `dataset_ner_regests_training_*` are referred to as '''Abstracts-Tiny''', the files `dataset_ner_manatee_non-crossing_only-relevant_*` are referred to as '''Books-Small''', and the files `dataset_ner_manatee_non-crossing_only-relevant_*_automatically_tagged` are referred to as '''Books-Medium'''.
     35'''Table 2:''' Dataset statistics from the archive named-entity-recognition-annotations-small.zip, ordered by the number of B-* tags. In the article describing the dataset, the files `dataset_ner_regests_training_*` are referred to as '''Abstracts-Tiny''', the files `dataset_ner_manatee_non-crossing_only-relevant_*` are referred to as '''Books-Small''', and the files `dataset_ner_manatee_non-crossing_only-relevant_*_automatically_tagged` are referred to as '''Books-Medium'''.
    36 36
    37 37 || ||= file size =||= # sentences =||= # tokens =||= # B-* tags =||= # B-PER tags =||= # B-LOC tags =||= # types =||
     
    125 125   1. We use an ensemble of a baseline model and weak fourth-generation NER models (`004`) or the final seventh-generation NER model (`007`).
    126 126
    127 '''Table 3:''' Dataset statistics from the archive named-entity-recognition-annotations-large.zip, ordered by the number of B-* tags. In the article describing the dataset, the files `dataset_mlm_non-crossing_only-relevant_*_automatically_tagged_007` are referred to as '''Books-Large''' and the files `dataset_mlm_all_all_training_automatically_tagged_007` are referred to as '''Books-Huge'''.
     127'''Table 3:''' Dataset statistics from the archive named-entity-recognition-annotations-large.zip, ordered by the number of B-* tags. In the article describing the dataset, the files `dataset_mlm_non-crossing_only-relevant_*_automatically_tagged_007` are referred to as '''Books-Large''' and the files `dataset_mlm_all_all_training_automatically_tagged_007` are referred to as '''Books-Huge'''.
    128 128
    129 129 || ||= file size =||= # sentences =||= # tokens =||= # B-* tags =||= # B-PER tags =||= # B-LOC tags =||= # types =||
     
    149 149
    150 150 == Citing ==
    151 An article describing our dataset is currently under review. Preprint is available [mailto:witiko@mail.muni.cz on request].
     151An article describing our dataset is currently under review. Preprint is [https://arxiv.org/abs/2305.16718 available on ArXiv].