Changes between Version 20 and Version 21 of private/NlpInPracticeCourse/NamedEntityRecognition
- Timestamp: Oct 2, 2019, 10:46:49 AM (4 years ago)
private/NlpInPracticeCourse/NamedEntityRecognition
v20 v21
36  36    In this workshop, we train a new NER application for the Czech language. We work with free resources & software tools: the Czech NE Corpus (CNEC) and the Stanford NER application.
37  37
38      - Requirements: Java 8, python, gigabytes of memory, [raw-attachment:convert_cnec_stanford.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition convert_cnec_stanford.py], [raw-attachment: named_ent_dtest_unknown.tsv:wiki:en/AdvancedNlpCourse/NamedEntityRecognition named_ent_dtest_unknown.tsv], [raw-attachment:cnec.prop:wiki:en/AdvancedNlpCourse/NamedEntityRecognition cnec.prop]
    38  + Requirements: Java 8, Python, gigabytes of memory, [raw-attachment:convert_cnec_stanford.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition convert_cnec_stanford.py], [raw-attachment:get_unknown.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition get_unknown.py], [raw-attachment:cnec.prop:wiki:en/AdvancedNlpCourse/NamedEntityRecognition cnec.prop]
39  39
40  40    1. Create `<YOUR_FILE>`, a text file named `ia161-UCO-04.txt` where ''UCO'' is your university ID.
…   …
84  84    }}}
85  85    In the output, the first column contains the input tokens and the second column contains the correct (gold) answers. Observe the differences. Copy the training result to `<YOUR_FILE>`. Try to estimate in how many cases the model missed an entity, detected the boundaries incorrectly, or classified an entity incorrectly.
86      - 10. evaluate the model on `dtest` with only NEs that are not present in the train data :
    86  + 10. Evaluate the model on `dtest` with only NEs that are not present in the training data. First, select only those documents that do not contain NEs from the training data. Use the script `get_unknown.py`, then run the NER:
87  87    {{{
88  88    java -cp stanford-ner-2018-10-16/stanford-ner.jar \