Changes between Version 20 and Version 21 of private/AdvancedNlpCourse/NamedEntityRecognition


Ignore:
Timestamp:
Oct 2, 2019, 10:46:49 AM (13 months ago)
Author:
Zuzana Nevěřilová
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • private/AdvancedNlpCourse/NamedEntityRecognition

    v20 v21  
    3636In this workshop, we train a new NER application for the Czech language. We work with free resources & software tools: the Czech NE Corpus (CNEC) and the Stanford NER application.
    3737
    38 Requirements: Java 8, python, gigabytes of memory, [raw-attachment:convert_cnec_stanford.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition convert_cnec_stanford.py], [raw-attachment:named_ent_dtest_unknown.tsv:wiki:en/AdvancedNlpCourse/NamedEntityRecognition named_ent_dtest_unknown.tsv], [raw-attachment:cnec.prop:wiki:en/AdvancedNlpCourse/NamedEntityRecognition cnec.prop]
     38Requirements: Java 8, python, gigabytes of memory, [raw-attachment:convert_cnec_stanford.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition convert_cnec_stanford.py], [raw-attachment:get_unknown.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition get_unknown.py], [raw-attachment:cnec.prop:wiki:en/AdvancedNlpCourse/NamedEntityRecognition cnec.prop]
    3939
    40401. Create `<YOUR_FILE>`, a text file named `ia161-UCO-04.txt` where ''UCO'' is your university ID.
     
    8484}}}
    8585 In the output, the first column is the input tokens, the second column is the correct (gold) answers. Observe the differences. Copy the training result to `<YOUR_FILE>`. Try to estimate in how many cases the model missed an entity, detected incorrectly the boundaries, or classified an entity incorrectly.
    86 10. evaluate the model on `dtest` with only NEs that are not present in the train data:
     8610. evaluate the model on `dtest` with only NEs that are not present in the train data. First, you need to filter out only those documents that do not contain NERs from the training data. Use the script `get_uknown.py`, then run the NER:
    8787 {{{
    8888java -cp stanford-ner-2018-10-16/stanford-ner.jar \