Changes between Version 40 and Version 41 of NerDataset
- Timestamp: 25 May 2023 22:31:54 (2 years ago)
Legend:
- Unmodified
- Added
- Removed
- Modified
NerDataset
v40 v41
33 33 ''^1 ^The `.docx` files were authored by human annotators and contain extra details missing from files `.sentences.txt` and `.ner_tags.txt`. The extra details include nested entities such as locations in person names (e.g. “Blažek z __Kralup__”) and people in location names (e.g. “Kostel __sv. Martina__”).''
34 34
35 Removed (v40): '''Table 2:''' Dataset statistics from the archive named-entity-recognition-annotations-small.zip, ordered by the number of B-* tags. In the article describing the dataset, the files dataset_ner_regests_training_* are referred to as '''Abstracts-Tiny''', the files dataset_ner_manatee_non-crossing_only-relevant_* are referred to as '''Books-Small''', and the files dataset_ner_manatee_non-crossing_only-relevant_*_automatically_taggedare referred to as '''Books-Medium'''.
35 Added (v41): '''Table 2:''' Dataset statistics from the archive named-entity-recognition-annotations-small.zip, ordered by the number of B-* tags. In the article describing the dataset, the files `dataset_ner_regests_training_*` are referred to as '''Abstracts-Tiny''', the files `dataset_ner_manatee_non-crossing_only-relevant_*` are referred to as '''Books-Small''', and the files `dataset_ner_manatee_non-crossing_only-relevant_*_automatically_tagged` are referred to as '''Books-Medium'''.
36 36
37 37 || ||= file size =||= # sentences =||= # tokens =||= # B-* tags =||= # B-PER tags =||= # B-LOC tags =||= # types =||
… …
125 125 1. We use an ensemble of a baseline model and weak fourth-generation NER models (`004`) or the final seventh-generation NER model (`007`).
126 126
127 Removed (v40): '''Table 3:''' Dataset statistics from the archive named-entity-recognition-annotations-large.zip, ordered by the number of B-* tags. In the article describing the dataset, the files dataset_mlm_non-crossing_only-relevant_*_automatically_tagged_007 are referred to as '''Books-Large''' and the files dataset_mlm_all_all_training_automatically_tagged_007are referred to as '''Books-Huge'''.
127 Added (v41): '''Table 3:''' Dataset statistics from the archive named-entity-recognition-annotations-large.zip, ordered by the number of B-* tags. In the article describing the dataset, the files `dataset_mlm_non-crossing_only-relevant_*_automatically_tagged_007` are referred to as '''Books-Large''' and the files `dataset_mlm_all_all_training_automatically_tagged_007` are referred to as '''Books-Huge'''.
128 128
129 129 || ||= file size =||= # sentences =||= # tokens =||= # B-* tags =||= # B-PER tags =||= # B-LOC tags =||= # types =||
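Tables 2 and 3 report, per file, the number of sentences, tokens, B-* tags, B-PER tags, B-LOC tags, and types. The Python sketch below shows one way such counts could be computed from a parallel `.sentences.txt` / `.ner_tags.txt` pair. It rests on assumptions that this page does not state: one sentence per line with whitespace-separated tokens, a tag file with exactly one IOB-style tag per token on the matching line, and "# types" meaning the number of distinct token forms. The names `compute_stats` and `Stats` are hypothetical. This is not the tooling used to produce the tables.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Counts mirroring the table columns (file size is omitted)."""
    sentences: int
    tokens: int
    b_tags: int   # all tags starting with "B-"
    b_per: int
    b_loc: int
    types: int    # assumption: number of distinct token forms


def compute_stats(sentence_lines, tag_lines):
    """Hypothetical helper; the file format is assumed, not documented.

    sentence_lines: iterable of lines, one sentence per line, tokens split by whitespace.
    tag_lines: iterable of lines, one whitespace-separated tag per token.
    """
    sentence_lines = list(sentence_lines)
    tag_lines = list(tag_lines)
    if len(sentence_lines) != len(tag_lines):
        raise ValueError("sentence and tag files have a different number of lines")

    sentences = 0
    tokens = 0
    tag_counts = Counter()
    vocabulary = set()
    for number, (sentence, tags) in enumerate(zip(sentence_lines, tag_lines), start=1):
        words, labels = sentence.split(), tags.split()
        if len(words) != len(labels):
            raise ValueError(f"line {number}: {len(words)} tokens but {len(labels)} tags")
        if not words:
            continue  # skip blank lines instead of counting them as sentences
        sentences += 1
        tokens += len(words)
        vocabulary.update(words)
        tag_counts.update(labels)

    return Stats(
        sentences=sentences,
        tokens=tokens,
        b_tags=sum(n for tag, n in tag_counts.items() if tag.startswith("B-")),
        b_per=tag_counts["B-PER"],
        b_loc=tag_counts["B-LOC"],
        types=len(vocabulary),
    )
```

Open file objects (opened with `encoding="utf-8"`) can be passed directly, because iterating a text file yields its lines. Sorting several files' `Stats` by `b_tags` would give the ordering used in the table captions.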