
NamedEntityRecognition

Timestamp:: Oct 2, 2019, 10:46:49 AM
Author:: Zuzana Nevěřilová
Comment:: --

private/NlpInPracticeCourse/NamedEntityRecognition

-                      v20
+                      v21
 In this workshop, we train a new NER application for the Czech language. We work with free resources & software tools: the Czech NE Corpus (CNEC) and the Stanford NER application.
-Requirements: Java 8, python, gigabytes of memory, [raw-attachment:convert_cnec_stanford.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition convert_cnec_stanford.py], [raw-attachment:named_ent_dtest_unknown.tsv:wiki:en/AdvancedNlpCourse/NamedEntityRecognition named_ent_dtest_unknown.tsv], [raw-attachment:cnec.prop:wiki:en/AdvancedNlpCourse/NamedEntityRecognition cnec.prop]
+Requirements: Java 8, Python, gigabytes of memory, [raw-attachment:convert_cnec_stanford.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition convert_cnec_stanford.py], [raw-attachment:get_unknown.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition get_unknown.py], [raw-attachment:cnec.prop:wiki:en/AdvancedNlpCourse/NamedEntityRecognition cnec.prop]
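The attached `get_unknown.py` is not reproduced on this page. As a rough illustration of the kind of filtering needed for the evaluation on unseen NEs, here is a minimal Python sketch. It assumes the converted data uses the Stanford NER TSV layout (one `token<TAB>label` pair per line, blocks separated by blank lines, `O` for tokens outside any entity) and that an entity is a maximal run of tokens sharing the same non-`O` label. The function names and file names are hypothetical, and entities are compared by surface string only.

```python
"""Hypothetical sketch: keep only test blocks whose named entities are unseen in training.

Assumes the Stanford NER TSV layout: one "token<TAB>label" pair per line,
blocks (sentences or documents) separated by blank lines, "O" = no entity.
"""
from typing import Iterable, List, Set, Tuple

Block = List[Tuple[str, str]]  # list of (token, label) pairs


def read_blocks(lines: Iterable[str]) -> List[Block]:
    """Split TSV lines into blank-line-separated blocks of (token, label)."""
    blocks: List[Block] = []
    current: Block = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            if current:
                blocks.append(current)
                current = []
            continue
        token, label = line.split("\t")[:2]
        current.append((token, label))
    if current:
        blocks.append(current)
    return blocks


def entities(block: Block) -> Set[str]:
    """Return entity strings: maximal runs of tokens sharing the same non-O label."""
    found: Set[str] = set()
    run: List[str] = []
    run_label = "O"
    for token, label in block + [("", "O")]:  # sentinel flushes the last run
        if label != run_label:
            if run_label != "O" and run:
                found.add(" ".join(run))
            run, run_label = [], label
        if label != "O":
            run.append(token)
    return found


def filter_unknown(train_lines: Iterable[str], test_lines: Iterable[str]) -> List[Block]:
    """Keep test blocks none of whose entity strings occur in the training data."""
    seen: Set[str] = set()
    for block in read_blocks(train_lines):
        seen |= entities(block)
    return [b for b in read_blocks(test_lines) if not (entities(b) & seen)]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names):
    #   python filter_sketch.py train.tsv dtest.tsv > dtest_unknown.tsv
    with open(sys.argv[1], encoding="utf-8") as tr, open(sys.argv[2], encoding="utf-8") as te:
        for block in filter_unknown(tr, te):
            print("\n".join(f"{tok}\t{lab}" for tok, lab in block))
            print()
```

Blocks with no entity at all are kept here, since they contain no known NE; the real script may handle them differently. With plain IO labels, two adjacent entities of the same type merge into one run, so the sketch can under-count entities in such cases.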
 . Create `<YOUR_FILE>`, a text file named `ia161-UCO-04.txt` where ''UCO'' is your university ID.
 …
 }}}
  In the output, the first column contains the input tokens, the second column the correct (gold) answers, and the third column the model's predictions. Observe the differences. Copy the result to `<YOUR_FILE>`. Try to estimate in how many cases the model missed an entity, incorrectly detected the boundaries, or classified an entity incorrectly.
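The estimate can also be made automatically. The sketch below is a hypothetical helper, not part of the course materials. It assumes the test output is tab-separated `token`, `gold`, `guess` per line with blank lines between sentences, and plain IO labels, so an entity is a maximal run of identical non-`O` labels. For each gold entity it reports whether the prediction is correct, has the wrong type (same boundaries, different label), has wrong boundaries (overlapping but a different span), or is missed (no overlapping prediction).

```python
"""Hypothetical helper: count NER error types in Stanford NER test output.

Assumes "token<TAB>gold<TAB>guess" lines, blank lines between sentences,
and IO labels ("O" = outside any entity).
"""
from collections import Counter
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def spans(labels: List[str]) -> List[Span]:
    """Return maximal runs of identical non-O labels as (start, end, label)."""
    result: List[Span] = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != "O":
                result.append((start, i, labels[start]))
            start = i
    return result


def categorize(gold: List[str], guess: List[str]) -> Counter:
    """Classify each gold entity of one sentence against the predicted entities."""
    counts: Counter = Counter()
    predicted = spans(guess)
    for g_start, g_end, g_label in spans(gold):
        overlapping = [p for p in predicted if p[0] < g_end and g_start < p[1]]
        if (g_start, g_end, g_label) in predicted:
            counts["correct"] += 1
        elif any(p[0] == g_start and p[1] == g_end for p in overlapping):
            counts["wrong type"] += 1  # same boundaries, different label
        elif overlapping:
            counts["wrong boundaries"] += 1  # partial overlap only
        else:
            counts["missed"] += 1  # no predicted entity touches it
    return counts


def read_sentences(lines: Iterable[str]) -> List[Tuple[List[str], List[str]]]:
    """Group the gold and guessed labels of each sentence."""
    sentences: List[Tuple[List[str], List[str]]] = []
    gold: List[str] = []
    guess: List[str] = []
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) < 3:  # a blank line ends the sentence
            if gold:
                sentences.append((gold, guess))
                gold, guess = [], []
            continue
        gold.append(parts[1])
        guess.append(parts[2])
    if gold:
        sentences.append((gold, guess))
    return sentences


def summarize(lines: Iterable[str]) -> Counter:
    """Sum the per-sentence error categories over the whole output."""
    total: Counter = Counter()
    for gold, guess in read_sentences(lines):
        total.update(categorize(gold, guess))
    return total


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_errors.py ner_output.tsv
    with open(sys.argv[1], encoding="utf-8") as f:
        for category, n in sorted(summarize(f).items()):
            print(f"{category}\t{n}")
```

Only gold entities are classified; spurious predictions with no gold counterpart are not counted, so the numbers complement rather than replace the precision and recall that Stanford NER reports.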
-. evaluate the model on `dtest` with only NEs that are not present in the train data:
+. evaluate the model on `dtest` with only NEs that are not present in the training data. First, keep only those documents that contain no NEs occurring in the training data; the script `get_unknown.py` does this. Then run the NER:
  {{{
 java -cp stanford-ner-2018-10-16/stanford-ner.jar \