Changes between Version 22 and Version 23 of NerDataset


Timestamp:
2022-12-15 12:09:35 (19 months ago)
Author:
xnovot32@fi.muni.cz
Comment:

--

  • NerDataset

    v22 v23  
    15 15
    16 16 || ||= file size =||= # sentences =||= # tokens =||= # types =||
    17 ||=dataset_mlm_all_all_training =|| 630.7 MB|| 3228077|| 96556612|| 6198957||
    18 ||=dataset_mlm_non-crossing_all_training =|| 524.1 MB|| 3009931|| 80220907|| 5362515||
    19 ||=dataset_mlm_all_all_validation =|| 81.8 MB|| 402184|| 12374044|| 1273737||
    20 ||=dataset_mlm_non-crossing_all_validation =|| 67.3 MB|| 372885|| 10157799|| 1105583||
    21 ||=dataset_mlm_all_only-relevant_training =|| 8.1 MB|| 47958|| 1286573|| 181845||
    22 ||=dataset_mlm_non-crossing_only-relevant_training =|| 6.7 MB|| 44278|| 1074734|| 157354||
    23 ||=dataset_mlm_all_only-relevant_validation =|| 736.7 kB|| 2791|| 108364|| 26986||
    24 ||=dataset_mlm_non-crossing_only-relevant_validation =|| 549.4 kB|| 2489|| 81293|| 22090||
     17||=dataset_mlm_all_all_training =|| 630.7 MB|| 3,228,077|| 96,556,612|| 6,198,957||
     18||=dataset_mlm_non-crossing_all_training =|| 524.1 MB|| 3,009,931|| 80,220,907|| 5,362,515||
     19||=dataset_mlm_all_all_validation =|| 81.8 MB|| 402,184|| 12,374,044|| 1,273,737||
     20||=dataset_mlm_non-crossing_all_validation =|| 67.3 MB|| 372,885|| 10,157,799|| 1,105,583||
     21||=dataset_mlm_all_only-relevant_training =|| 8.1 MB|| 47,958|| 1,286,573|| 181,845||
     22||=dataset_mlm_non-crossing_only-relevant_training =|| 6.7 MB|| 44,278|| 1,074,734|| 157,354||
     23||=dataset_mlm_all_only-relevant_validation =|| 736.7 kB|| 2,791|| 108,364|| 26,986||
     24||=dataset_mlm_non-crossing_only-relevant_validation =|| 549.4 kB|| 2,489|| 81,293|| 22,090||
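
The only change in this hunk is that the count columns gain thousands separators. A minimal sketch of how such a rewrite could be scripted, assuming rows in the Trac `||` table syntax shown above. The helper name `add_thousands_separators` is hypothetical and is not part of the wiki or the dataset tooling; the revision may just as well have been edited by hand.

```python
import re

# A cell qualifies only if it is a bare integer of four or more digits
# with no leading zero, so sizes such as "630.7 MB" are left alone.
_BARE_INT = re.compile(r"^[1-9]\d{3,}$")


def add_thousands_separators(row: str) -> str:
    """Return one Trac table row with bare integer cells comma-grouped."""
    cells = row.split("||")
    rewritten = []
    for cell in cells:
        stripped = cell.strip()
        if _BARE_INT.match(stripped):
            # Replace only the digits so the cell keeps its original padding.
            cell = cell.replace(stripped, f"{int(stripped):,}")
        rewritten.append(cell)
    return "||".join(rewritten)
```

Applied to a v22 row such as `||=dataset_mlm_all_only-relevant_validation =|| 736.7 kB|| 2791|| 108364|| 26986||`, the function leaves the name and file size cells untouched and regroups only the three counts.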
    25 25
    26 26 * The archive [https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-4936/named-entity-recognition-annotations.zip?sequence=2&isAllowed=y named-entity-recognition-annotations.zip] (978.29 MB) contains 82 tuples of files named `*.sentences.txt`, `*.ner_tags.txt`, and in one case also `*.docx`.^1^[[BR]]These files contain sentences and NER tags for supervised training, validation, and testing of language models.[[BR]]Here are the five variables that we used to produce the different files:
     
    36 36
    37 37 || ||= file size =||= # sentences =||= # tokens =||= # B-* tags =||= # B-PER tags =||= # B-LOC tags =||= # types =||
    38 ||=dataset_ner_fuzzy-regex_all_all_training_automatically_tagged =|| 230.4 MB|| 407395|| 24585832|| 2669582|| 1403789|| 1265793|| 2420836||
    39 ||=dataset_ner_fuzzy-regex+regests_all_all_training_automatically_tagged =|| 231.6 MB|| 411715|| 24735069|| 2640803|| 1378804|| 1261999|| 2427135||
    40 ||=dataset_ner_fuzzy-regex+regests_non-crossing_all_training_automatically_tagged =|| 164.4 MB|| 353301|| 17387149|| 2065805|| 1100245|| 965560|| 1850210||
    41 ||=dataset_ner_fuzzy-regex_non-crossing_all_training_automatically_tagged =|| 162.9 MB|| 348981|| 17237912|| 2049537|| 1089768|| 959769|| 1843163||
    42 ||=dataset_ner_manatee+regests_all_all_training_automatically_tagged =|| 95.4 MB|| 158759|| 10155332|| 1175031|| 563912|| 611119|| 1267107||
    43 ||=dataset_ner_manatee_all_all_training_automatically_tagged =|| 93.8 MB|| 154439|| 10006095|| 1158763|| 553435|| 605328|| 1258983||
    44 ||=dataset_ner_manatee+regests_non-crossing_all_training_automatically_tagged =|| 64.5 MB|| 134909|| 6795014|| 870613|| 423345|| 447268|| 932654||
    45 ||=dataset_ner_manatee_non-crossing_all_training_automatically_tagged =|| 63.0 MB|| 130589|| 6645777|| 854345|| 412868|| 441477|| 923554||
    46 ||=dataset_ner_fuzzy-regex+regests_all_all_validation_automatically_tagged =|| 58.3 MB|| 81651|| 6211198|| 685020|| 356017|| 329003|| 910379||
    47 ||=dataset_ner_fuzzy-regex_all_all_validation_automatically_tagged =|| 58.1 MB|| 81149|| 6193356|| 682993|| 354671|| 328322|| 908885||
    48 ||=dataset_ner_fuzzy-regex+regests_all_all_training =|| 218.0 MB|| 411715|| 24735069|| 606807|| 290530|| 316277|| 2427135||
    49 ||=dataset_ner_fuzzy-regex_all_all_training =|| 217.7 MB|| 407395|| 24585832|| 592822|| 281497|| 311325|| 2420836||
    50 ||=dataset_ner_fuzzy-regex+regests_non-crossing_all_training =|| 153.8 MB|| 353301|| 17387149|| 494302|| 238381|| 255921|| 1850210||
    51 ||=dataset_ner_fuzzy-regex+regests_non-crossing_all_validation_automatically_tagged =|| 37.9 MB|| 67971|| 3989670|| 487724|| 259777|| 227947|| 651387||
    52 ||=dataset_ner_fuzzy-regex_non-crossing_all_validation_automatically_tagged =|| 37.7 MB|| 67469|| 3971828|| 485697|| 258431|| 227266|| 649698||
    53 ||=dataset_ner_fuzzy-regex_non-crossing_all_training =|| 153.1 MB|| 348981|| 17237912|| 480318|| 229349|| 250969|| 1843163||
    54 ||=dataset_ner_manatee+regests_all_all_validation_automatically_tagged =|| 21.0 MB|| 28727|| 2249037|| 261612|| 120358|| 141254|| 427057||
    55 ||=dataset_ner_manatee_all_all_validation_automatically_tagged =|| 20.8 MB|| 28225|| 2231195|| 259585|| 119012|| 140573|| 425088||
    56 ||=dataset_ner_manatee+regests_all_all_training =|| 88.9 MB|| 158759|| 10155332|| 214566|| 79924|| 134642|| 1267107||
    57 ||=dataset_ner_manatee_all_all_training =|| 87.9 MB|| 154439|| 10006095|| 200582|| 70892|| 129690|| 1258983||
    58 ||=dataset_ner_manatee+regests_non-crossing_all_validation_automatically_tagged =|| 12.8 MB|| 23643|| 1348859|| 176809|| 83699|| 93110|| 293119||
    59 ||=dataset_ner_manatee+regests_non-crossing_all_training =|| 59.8 MB|| 134909|| 6795014|| 174902|| 65897|| 109005|| 932654||
    60 ||=dataset_ner_manatee_non-crossing_all_validation_automatically_tagged =|| 12.6 MB|| 23141|| 1331017|| 174782|| 82353|| 92429|| 290894||
    61 ||=dataset_ner_manatee_non-crossing_all_training =|| 58.6 MB|| 130589|| 6645777|| 160918|| 56865|| 104053|| 923554||
    62 ||=dataset_ner_fuzzy-regex+regests_all_all_validation =|| 54.2 MB|| 81651|| 6211198|| 92485|| 46038|| 46447|| 910379||
    63 ||=dataset_ner_fuzzy-regex_all_all_testing =|| 54.2 MB|| 80929|| 6167375|| 90747|| 45176|| 45571|| 908276||
    64 ||=dataset_ner_fuzzy-regex_all_all_validation =|| 54.4 MB|| 81149|| 6193356|| 90719|| 44878|| 45841|| 908885||
    65 ||=dataset_ner_fuzzy-regex+regests_non-crossing_all_validation =|| 35.0 MB|| 67971|| 3989670|| 75207|| 37496|| 37711|| 651387||
    66 ||=dataset_ner_fuzzy-regex+regests_all_only-relevant_training_automatically_tagged =|| 6.6 MB|| 14942|| 694242|| 73757|| 41838|| 31919|| 119272||
    67 ||=dataset_ner_fuzzy-regex_non-crossing_all_testing =|| 34.8 MB|| 67208|| 3938611|| 73476|| 36506|| 36970|| 644220||
    68 ||=dataset_ner_fuzzy-regex_non-crossing_all_validation =|| 35.1 MB|| 67469|| 3971828|| 73441|| 36336|| 37105|| 649698||
    69 ||=dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training_automatically_tagged =|| 5.3 MB|| 13456|| 548928|| 61522|| 35007|| 26515|| 99275||
    70 ||=dataset_ner_fuzzy-regex_all_only-relevant_training_automatically_tagged =|| 5.1 MB|| 10622|| 545005|| 57489|| 31361|| 26128|| 98843||
    71 ||=dataset_ner_manatee+regests_all_only-relevant_training_automatically_tagged =|| 4.6 MB|| 11813|| 490147|| 51653|| 28315|| 23338|| 88535||
    72 ||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_training_automatically_tagged =|| 3.7 MB|| 9136|| 399691|| 45254|| 24530|| 20724|| 77963||
    73 ||=dataset_ner_manatee+regests_non-crossing_only-relevant_training_automatically_tagged =|| 3.8 MB|| 10813|| 401164|| 44213|| 24435|| 19778|| 74376||
    74 ||=dataset_ner_manatee_all_only-relevant_training_automatically_tagged =|| 3.1 MB|| 7493|| 340910|| 34247|| 17193|| 17054|| 66659||
    75 ||=dataset_ner_manatee+regests_all_all_validation =|| 19.5 MB|| 28727|| 2249037|| 32546|| 12999|| 19547|| 427057||
    76 ||=dataset_ner_manatee_all_all_testing =|| 19.9 MB|| 29516|| 2279822|| 32234|| 12555|| 19679|| 437414||
    77 ||=dataset_ner_manatee_all_all_validation =|| 19.4 MB|| 28225|| 2231195|| 30780|| 11839|| 18941|| 425088||
    78 ||=dataset_ner_fuzzy-regex+regests_all_only-relevant_training =|| 6.3 MB|| 14942|| 694242|| 30455|| 19214|| 11241|| 119272||
    79 ||=dataset_ner_manatee_non-crossing_only-relevant_training_automatically_tagged =|| 2.3 MB|| 6493|| 251927|| 27945|| 13958|| 13987|| 51600||
    80 ||=dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training =|| 5.0 MB|| 13456|| 548928|| 27324|| 17257|| 10067|| 99275||
    81 ||=dataset_ner_manatee+regests_non-crossing_all_validation =|| 11.8 MB|| 23643|| 1348859|| 26287|| 10498|| 15789|| 293119||
    82 ||=dataset_ner_manatee_non-crossing_all_testing =|| 12.2 MB|| 24420|| 1384547|| 25937|| 10068|| 15869|| 300862||
    83 ||=dataset_ner_manatee_non-crossing_all_validation =|| 11.7 MB|| 23141|| 1331017|| 24521|| 9338|| 15183|| 290894||
    84 ||=dataset_ner_manatee+regests_all_only-relevant_training =|| 4.4 MB|| 11813|| 490147|| 24212|| 13626|| 10586|| 88535||
    85 ||=dataset_ner_manatee+regests_non-crossing_only-relevant_training =|| 3.7 MB|| 10813|| 401164|| 22583|| 12909|| 9674|| 74376||
    86 ||=dataset_ner_fuzzy-regex+regests_all_only-relevant_validation_automatically_tagged =|| 1.5 MB|| 2776|| 158548|| 16901|| 9936|| 6965|| 44018||
    87 ||=dataset_ner_fuzzy-regex_all_only-relevant_training =|| 4.8 MB|| 10622|| 545005|| 16471|| 10182|| 6289|| 98843||
    88 ||=dataset_ner_regests_training_automatically_tagged =|| 1.5 MB|| 4320|| 149237|| 16268|| 10477|| 5791|| 29166||
    89 ||=dataset_ner_fuzzy-regex_all_only-relevant_validation_automatically_tagged =|| 1.3 MB|| 2274|| 140706|| 14874|| 8590|| 6284|| 39612||
    90 ||=dataset_ner_regests_training =|| 1.5 MB|| 4320|| 149237|| 13984|| 9032|| 4952|| 29166||
    91 ||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_training =|| 3.5 MB|| 9136|| 399691|| 13340|| 8225|| 5115|| 77963||
    92 ||=dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation_automatically_tagged =|| 1.1 MB|| 2420|| 110376|| 12902|| 7592|| 5310|| 33352||
    93 ||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation_automatically_tagged =|| 885.1 kB|| 1918|| 92534|| 10875|| 6246|| 4629|| 28676||
    94 ||=dataset_ner_manatee_all_only-relevant_training =|| 2.9 MB|| 7493|| 340910|| 10228|| 4594|| 5634|| 66659||
    95 ||=dataset_ner_manatee+regests_all_only-relevant_validation_automatically_tagged =|| 913.3 kB|| 1972|| 97069|| 10180|| 5592|| 4588|| 28324||
    96 ||=dataset_ner_manatee_non-crossing_only-relevant_training =|| 2.2 MB|| 6493|| 251927|| 8599|| 3877|| 4722|| 51600||
    97 ||=dataset_ner_manatee_all_only-relevant_validation_automatically_tagged =|| 730.1 kB|| 1470|| 79227|| 8153|| 4246|| 3907|| 23569||
    98 ||=dataset_ner_manatee+regests_non-crossing_only-relevant_validation_automatically_tagged =|| 683.4 kB|| 1751|| 71948|| 8136|| 4501|| 3635|| 22133||
    99 ||=dataset_ner_manatee_non-crossing_only-relevant_validation_automatically_tagged =|| 500.3 kB|| 1249|| 54106|| 6109|| 3155|| 2954|| 17138||
    100 ||=dataset_ner_fuzzy-regex+regests_all_only-relevant_validation =|| 1.4 MB|| 2776|| 158548|| 4421|| 2817|| 1604|| 44018||
    101 ||=dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation =|| 998.7 kB|| 2420|| 110376|| 3938|| 2519|| 1419|| 33352||
    102 ||=dataset_ner_manatee+regests_all_only-relevant_validation =|| 862.5 kB|| 1972|| 97069|| 3347|| 1887|| 1460|| 28324||
    103 ||=dataset_ner_manatee+regests_non-crossing_only-relevant_validation =|| 646.8 kB|| 1751|| 71948|| 3094|| 1774|| 1320|| 22133||
    104 ||=dataset_ner_fuzzy-regex_all_only-relevant_testing =|| 1.3 MB|| 2405|| 144684|| 2780|| 1784|| 996|| 39977||
    105 ||=dataset_ner_fuzzy-regex_all_only-relevant_validation =|| 1.2 MB|| 2274|| 140706|| 2655|| 1657|| 998|| 39612||
    106 ||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_testing =|| 867.0 kB|| 2034|| 98659|| 2292|| 1455|| 837|| 29874||
    107 ||=dataset_ner_regests_testing =|| 261.7 kB|| 799|| 26148|| 2182|| 1422|| 760|| 8978||
    108 ||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation =|| 818.4 kB|| 1918|| 92534|| 2172|| 1359|| 813|| 28676||
    109 ||=dataset_ner_regests_validation_automatically_tagged =|| 183.1 kB|| 502|| 17842|| 2027|| 1346|| 681|| 6445||
    110 ||=dataset_ner_regests_validation =|| 181.7 kB|| 502|| 17842|| 1766|| 1160|| 606|| 6445||
    111 ||=dataset_ner_manatee_all_only-relevant_validation =|| 681.8 kB|| 1470|| 79227|| 1581|| 727|| 854|| 23569||
    112 ||=dataset_ner_manatee_all_only-relevant_testing =|| 678.8 kB|| 1420|| 78751|| 1529|| 695|| 834|| 23949||
    113 ||=dataset_ner_manatee_non-crossing_only-relevant_validation =|| 465.9 kB|| 1249|| 54106|| 1328|| 614|| 714|| 17138||
    114 ||=dataset_ner_manatee_non-crossing_only-relevant_testing =|| 469.1 kB|| 1208|| 54391|| 1283|| 587|| 696|| 17713||
    115 ||=dataset_ner_regests_testing_001-400 =|| 129.8 kB|| 400|| 12811|| 1164|| 789|| 375|| 5121||
    116 ||=dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_tagged =|| 41.6 kB|| 100|| 4507|| 530|| 287|| 243|| 2449||
    117 ||=dataset_ner_manatee_non-crossing_only-relevant_testing_001-400 =|| 169.0 kB|| 400|| 19554|| 439|| 201|| 238|| 7928||
    118 ||=dataset_ner_manatee_non-crossing_only-relevant_testing_401-500 =|| 38.5 kB|| 100|| 4507|| 110|| 55|| 55|| 2449||
     38||=dataset_ner_fuzzy-regex_all_all_training_automatically_tagged =|| 230.4 MB|| 407,395|| 24,585,832|| 2,669,582|| 1,403,789|| 1,265,793|| 2,420,836||
     39||=dataset_ner_fuzzy-regex+regests_all_all_training_automatically_tagged =|| 231.6 MB|| 411,715|| 24,735,069|| 2,640,803|| 1,378,804|| 1,261,999|| 2,427,135||
     40||=dataset_ner_fuzzy-regex+regests_non-crossing_all_training_automatically_tagged =|| 164.4 MB|| 353,301|| 17,387,149|| 2,065,805|| 1,100,245|| 965,560|| 1,850,210||
     41||=dataset_ner_fuzzy-regex_non-crossing_all_training_automatically_tagged =|| 162.9 MB|| 348,981|| 17,237,912|| 2,049,537|| 1,089,768|| 959,769|| 1,843,163||
     42||=dataset_ner_manatee+regests_all_all_training_automatically_tagged =|| 95.4 MB|| 158,759|| 10,155,332|| 1,175,031|| 563,912|| 611,119|| 1,267,107||
     43||=dataset_ner_manatee_all_all_training_automatically_tagged =|| 93.8 MB|| 154,439|| 10,006,095|| 1,158,763|| 553,435|| 605,328|| 1,258,983||
     44||=dataset_ner_manatee+regests_non-crossing_all_training_automatically_tagged =|| 64.5 MB|| 134,909|| 6,795,014|| 870,613|| 423,345|| 447,268|| 932,654||
     45||=dataset_ner_manatee_non-crossing_all_training_automatically_tagged =|| 63.0 MB|| 130,589|| 6,645,777|| 854,345|| 412,868|| 441,477|| 923,554||
     46||=dataset_ner_fuzzy-regex+regests_all_all_validation_automatically_tagged =|| 58.3 MB|| 81,651|| 6,211,198|| 685,020|| 356,017|| 329,003|| 910,379||
     47||=dataset_ner_fuzzy-regex_all_all_validation_automatically_tagged =|| 58.1 MB|| 81,149|| 6,193,356|| 682,993|| 354,671|| 328,322|| 908,885||
     48||=dataset_ner_fuzzy-regex+regests_all_all_training =|| 218.0 MB|| 411,715|| 24,735,069|| 606,807|| 290,530|| 316,277|| 2,427,135||
     49||=dataset_ner_fuzzy-regex_all_all_training =|| 217.7 MB|| 407,395|| 24,585,832|| 592,822|| 281,497|| 311,325|| 2,420,836||
     50||=dataset_ner_fuzzy-regex+regests_non-crossing_all_training =|| 153.8 MB|| 353,301|| 17,387,149|| 494,302|| 238,381|| 255,921|| 1,850,210||
     51||=dataset_ner_fuzzy-regex+regests_non-crossing_all_validation_automatically_tagged =|| 37.9 MB|| 67,971|| 3,989,670|| 487,724|| 259,777|| 227,947|| 651,387||
     52||=dataset_ner_fuzzy-regex_non-crossing_all_validation_automatically_tagged =|| 37.7 MB|| 67,469|| 3,971,828|| 485,697|| 258,431|| 227,266|| 649,698||
     53||=dataset_ner_fuzzy-regex_non-crossing_all_training =|| 153.1 MB|| 348,981|| 17,237,912|| 480,318|| 229,349|| 250,969|| 1,843,163||
     54||=dataset_ner_manatee+regests_all_all_validation_automatically_tagged =|| 21.0 MB|| 28,727|| 2,249,037|| 261,612|| 120,358|| 141,254|| 427,057||
     55||=dataset_ner_manatee_all_all_validation_automatically_tagged =|| 20.8 MB|| 28,225|| 2,231,195|| 259,585|| 119,012|| 140,573|| 425,088||
     56||=dataset_ner_manatee+regests_all_all_training =|| 88.9 MB|| 158,759|| 10,155,332|| 214,566|| 79,924|| 134,642|| 1,267,107||
     57||=dataset_ner_manatee_all_all_training =|| 87.9 MB|| 154,439|| 10,006,095|| 200,582|| 70,892|| 129,690|| 1,258,983||
     58||=dataset_ner_manatee+regests_non-crossing_all_validation_automatically_tagged =|| 12.8 MB|| 23,643|| 1,348,859|| 176,809|| 83,699|| 93,110|| 293,119||
     59||=dataset_ner_manatee+regests_non-crossing_all_training =|| 59.8 MB|| 134,909|| 6,795,014|| 174,902|| 65,897|| 109,005|| 932,654||
     60||=dataset_ner_manatee_non-crossing_all_validation_automatically_tagged =|| 12.6 MB|| 23,141|| 1,331,017|| 174,782|| 82,353|| 92,429|| 290,894||
     61||=dataset_ner_manatee_non-crossing_all_training =|| 58.6 MB|| 130,589|| 6,645,777|| 160,918|| 56,865|| 104,053|| 923,554||
     62||=dataset_ner_fuzzy-regex+regests_all_all_validation =|| 54.2 MB|| 81,651|| 6,211,198|| 92,485|| 46,038|| 46,447|| 910,379||
     63||=dataset_ner_fuzzy-regex_all_all_testing =|| 54.2 MB|| 80,929|| 6,167,375|| 90,747|| 45,176|| 45,571|| 908,276||
     64||=dataset_ner_fuzzy-regex_all_all_validation =|| 54.4 MB|| 81,149|| 6,193,356|| 90,719|| 44,878|| 45,841|| 908,885||
     65||=dataset_ner_fuzzy-regex+regests_non-crossing_all_validation =|| 35.0 MB|| 67,971|| 3,989,670|| 75,207|| 37,496|| 37,711|| 651,387||
     66||=dataset_ner_fuzzy-regex+regests_all_only-relevant_training_automatically_tagged =|| 6.6 MB|| 14,942|| 694,242|| 73,757|| 41,838|| 31,919|| 119,272||
     67||=dataset_ner_fuzzy-regex_non-crossing_all_testing =|| 34.8 MB|| 67,208|| 3,938,611|| 73,476|| 36,506|| 36,970|| 644,220||
     68||=dataset_ner_fuzzy-regex_non-crossing_all_validation =|| 35.1 MB|| 67,469|| 3,971,828|| 73,441|| 36,336|| 37,105|| 649,698||
     69||=dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training_automatically_tagged =|| 5.3 MB|| 13,456|| 548,928|| 61,522|| 35,007|| 26,515|| 99,275||
     70||=dataset_ner_fuzzy-regex_all_only-relevant_training_automatically_tagged =|| 5.1 MB|| 10,622|| 545,005|| 57,489|| 31,361|| 26,128|| 98,843||
     71||=dataset_ner_manatee+regests_all_only-relevant_training_automatically_tagged =|| 4.6 MB|| 11,813|| 490,147|| 51,653|| 28,315|| 23,338|| 88,535||
     72||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_training_automatically_tagged =|| 3.7 MB|| 9,136|| 399,691|| 45,254|| 24,530|| 20,724|| 77,963||
     73||=dataset_ner_manatee+regests_non-crossing_only-relevant_training_automatically_tagged =|| 3.8 MB|| 10,813|| 401,164|| 44,213|| 24,435|| 19,778|| 74,376||
     74||=dataset_ner_manatee_all_only-relevant_training_automatically_tagged =|| 3.1 MB|| 7,493|| 340,910|| 34,247|| 17,193|| 17,054|| 66,659||
     75||=dataset_ner_manatee+regests_all_all_validation =|| 19.5 MB|| 28,727|| 2,249,037|| 32,546|| 12,999|| 19,547|| 427,057||
     76||=dataset_ner_manatee_all_all_testing =|| 19.9 MB|| 29,516|| 2,279,822|| 32,234|| 12,555|| 19,679|| 437,414||
     77||=dataset_ner_manatee_all_all_validation =|| 19.4 MB|| 28,225|| 2,231,195|| 30,780|| 11,839|| 18,941|| 425,088||
     78||=dataset_ner_fuzzy-regex+regests_all_only-relevant_training =|| 6.3 MB|| 14,942|| 694,242|| 30,455|| 19,214|| 11,241|| 119,272||
     79||=dataset_ner_manatee_non-crossing_only-relevant_training_automatically_tagged =|| 2.3 MB|| 6,493|| 251,927|| 27,945|| 13,958|| 13,987|| 51,600||
     80||=dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training =|| 5.0 MB|| 13,456|| 548,928|| 27,324|| 17,257|| 10,067|| 99,275||
     81||=dataset_ner_manatee+regests_non-crossing_all_validation =|| 11.8 MB|| 23,643|| 1,348,859|| 26,287|| 10,498|| 15,789|| 293,119||
     82||=dataset_ner_manatee_non-crossing_all_testing =|| 12.2 MB|| 24,420|| 1,384,547|| 25,937|| 10,068|| 15,869|| 300,862||
     83||=dataset_ner_manatee_non-crossing_all_validation =|| 11.7 MB|| 23,141|| 1,331,017|| 24,521|| 9,338|| 15,183|| 290,894||
     84||=dataset_ner_manatee+regests_all_only-relevant_training =|| 4.4 MB|| 11,813|| 490,147|| 24,212|| 13,626|| 10,586|| 88,535||
     85||=dataset_ner_manatee+regests_non-crossing_only-relevant_training =|| 3.7 MB|| 10,813|| 401,164|| 22,583|| 12,909|| 9,674|| 74,376||
     86||=dataset_ner_fuzzy-regex+regests_all_only-relevant_validation_automatically_tagged =|| 1.5 MB|| 2,776|| 158,548|| 16,901|| 9,936|| 6,965|| 44,018||
     87||=dataset_ner_fuzzy-regex_all_only-relevant_training =|| 4.8 MB|| 10,622|| 545,005|| 16,471|| 10,182|| 6,289|| 98,843||
     88||=dataset_ner_regests_training_automatically_tagged =|| 1.5 MB|| 4,320|| 149,237|| 16,268|| 10,477|| 5,791|| 29,166||
     89||=dataset_ner_fuzzy-regex_all_only-relevant_validation_automatically_tagged =|| 1.3 MB|| 2,274|| 140,706|| 14,874|| 8,590|| 6,284|| 39,612||
     90||=dataset_ner_regests_training =|| 1.5 MB|| 4,320|| 149,237|| 13,984|| 9,032|| 4,952|| 29,166||
     91||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_training =|| 3.5 MB|| 9,136|| 399,691|| 13,340|| 8,225|| 5,115|| 77,963||
     92||=dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation_automatically_tagged =|| 1.1 MB|| 2,420|| 110,376|| 12,902|| 7,592|| 5,310|| 33,352||
     93||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation_automatically_tagged =|| 885.1 kB|| 1,918|| 92,534|| 10,875|| 6,246|| 4,629|| 28,676||
     94||=dataset_ner_manatee_all_only-relevant_training =|| 2.9 MB|| 7,493|| 340,910|| 10,228|| 4,594|| 5,634|| 66,659||
     95||=dataset_ner_manatee+regests_all_only-relevant_validation_automatically_tagged =|| 913.3 kB|| 1,972|| 97,069|| 10,180|| 5,592|| 4,588|| 28,324||
     96||=dataset_ner_manatee_non-crossing_only-relevant_training =|| 2.2 MB|| 6,493|| 251,927|| 8,599|| 3,877|| 4,722|| 51,600||
     97||=dataset_ner_manatee_all_only-relevant_validation_automatically_tagged =|| 730.1 kB|| 1,470|| 79,227|| 8,153|| 4,246|| 3,907|| 23,569||
     98||=dataset_ner_manatee+regests_non-crossing_only-relevant_validation_automatically_tagged =|| 683.4 kB|| 1,751|| 71,948|| 8,136|| 4,501|| 3,635|| 22,133||
     99||=dataset_ner_manatee_non-crossing_only-relevant_validation_automatically_tagged =|| 500.3 kB|| 1,249|| 54,106|| 6,109|| 3,155|| 2,954|| 17,138||
     100||=dataset_ner_fuzzy-regex+regests_all_only-relevant_validation =|| 1.4 MB|| 2,776|| 158,548|| 4,421|| 2,817|| 1,604|| 44,018||
     101||=dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation =|| 998.7 kB|| 2,420|| 110,376|| 3,938|| 2,519|| 1,419|| 33,352||
     102||=dataset_ner_manatee+regests_all_only-relevant_validation =|| 862.5 kB|| 1,972|| 97,069|| 3,347|| 1,887|| 1,460|| 28,324||
     103||=dataset_ner_manatee+regests_non-crossing_only-relevant_validation =|| 646.8 kB|| 1,751|| 71,948|| 3,094|| 1,774|| 1,320|| 22,133||
     104||=dataset_ner_fuzzy-regex_all_only-relevant_testing =|| 1.3 MB|| 2,405|| 144,684|| 2,780|| 1,784|| 996|| 39,977||
     105||=dataset_ner_fuzzy-regex_all_only-relevant_validation =|| 1.2 MB|| 2,274|| 140,706|| 2,655|| 1,657|| 998|| 39,612||
     106||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_testing =|| 867.0 kB|| 2,034|| 98,659|| 2,292|| 1,455|| 837|| 29,874||
     107||=dataset_ner_regests_testing =|| 261.7 kB|| 799|| 26,148|| 2,182|| 1,422|| 760|| 8,978||
     108||=dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation =|| 818.4 kB|| 1,918|| 92,534|| 2,172|| 1,359|| 813|| 28,676||
     109||=dataset_ner_regests_validation_automatically_tagged =|| 183.1 kB|| 502|| 17,842|| 2,027|| 1,346|| 681|| 6,445||
     110||=dataset_ner_regests_validation =|| 181.7 kB|| 502|| 17,842|| 1,766|| 1,160|| 606|| 6,445||
     111||=dataset_ner_manatee_all_only-relevant_validation =|| 681.8 kB|| 1,470|| 79,227|| 1,581|| 727|| 854|| 23,569||
     112||=dataset_ner_manatee_all_only-relevant_testing =|| 678.8 kB|| 1,420|| 78,751|| 1,529|| 695|| 834|| 23,949||
     113||=dataset_ner_manatee_non-crossing_only-relevant_validation =|| 465.9 kB|| 1,249|| 54,106|| 1,328|| 614|| 714|| 17,138||
     114||=dataset_ner_manatee_non-crossing_only-relevant_testing =|| 469.1 kB|| 1,208|| 54,391|| 1,283|| 587|| 696|| 17,713||
     115||=dataset_ner_regests_testing_001-400 =|| 129.8 kB|| 400|| 12,811|| 1,164|| 789|| 375|| 5,121||
     116||=dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_tagged =|| 41.6 kB|| 100|| 4,507|| 530|| 287|| 243|| 2,449||
     117||=dataset_ner_manatee_non-crossing_only-relevant_testing_001-400 =|| 169.0 kB|| 400|| 19,554|| 439|| 201|| 238|| 7,928||
     118||=dataset_ner_manatee_non-crossing_only-relevant_testing_401-500 =|| 38.5 kB|| 100|| 4,507|| 110|| 55|| 55|| 2,449||
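
The columns of the table above could be recomputed from a `*.sentences.txt` / `*.ner_tags.txt` pair roughly as follows. This is a sketch under assumptions that the page does not confirm: one sentence per line, whitespace-separated tokens, one tag per token on the matching line of the tags file, and "# types" meaning distinct token strings. The function name `dataset_stats` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def dataset_stats(sentences: Iterable[str], ner_tags: Iterable[str]) -> Dict[str, int]:
    """Count sentences, tokens, B-* tags, and token types for one file pair."""
    tag_counts: Counter = Counter()
    types = set()
    n_sentences = 0
    n_tokens = 0
    for line_no, (sentence, tags) in enumerate(zip(sentences, ner_tags), start=1):
        tokens = sentence.split()
        labels = tags.split()
        # Assumed invariant: every token carries exactly one tag.
        if len(tokens) != len(labels):
            raise ValueError(f"line {line_no}: {len(tokens)} tokens, {len(labels)} tags")
        n_sentences += 1
        n_tokens += len(tokens)
        types.update(tokens)
        tag_counts.update(labels)
    return {
        "sentences": n_sentences,
        "tokens": n_tokens,
        # Each B-* tag opens one entity span, so this counts entities.
        "B-*": sum(n for tag, n in tag_counts.items() if tag.startswith("B-")),
        "B-PER": tag_counts["B-PER"],
        "B-LOC": tag_counts["B-LOC"],
        "types": len(types),
    }
```

Since open text files iterate line by line, the two files of a pair can be passed in directly, for example `dataset_stats(open(sentences_path, encoding="utf-8"), open(tags_path, encoding="utf-8"))`, where both paths are placeholders for files extracted from the archive.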
    119 119
    120 120 == Citing ==