Changes between version 25 and version 26 of NerDataset

v25	v26
122	122	\|\|=dataset_ner_manatee_non-crossing_only-relevant_testing_401-500 =\|\| 38.5 kB\|\| 100\|\| 4,507\|\| 110\|\| 55\|\| 55\|\| 2,449\|\|
123	123
124		* The archive [https://nlp.fi.muni.cz/projekty/ahisto/named-entity-recognition-annotations-large.zip named-entity-recognition-annotations-large.zip] (1.3 GB) contains 16 tuples of files named `*.sentences.txt` and `.ner_tags.txt`. These files contain sentences and NER tags for supervised training, validation, and testing of language models.[[BR]]Here are the four variables that we used to produce the different files:
	124	* The archive [https://nlp.fi.muni.cz/projekty/ahisto/named-entity-recognition-annotations-large.zip named-entity-recognition-annotations-large.zip] (1.31 GB) contains 16 tuples of files named `*.sentences.txt` and `.ner_tags.txt`. These files contain sentences and NER tags for supervised training, validation, and testing of language models.[[BR]]Here are the four variables that we used to produce the different files:
125	125	1. The sentences are extracted from book OCR texts and may therefore span several pages.[[BR]]However, page boundaries contain pollutants such as running heads, footnotes, and page numbers.[[BR]]We either allow the sentences to cross page boundaries (`all`) or not (`non-crossing`).
126	126	1. The sentences come from all book pages (`all`) or just those considered relevant by human annotators (`only-relevant`).