Changes between Version 6 and Version 7 of private/NlpInPracticeCourse/NamedEntityRecognition
Timestamp: Oct 11, 2015, 4:13:16 PM
private/NlpInPracticeCourse/NamedEntityRecognition
v6  v7
13  13    === Example from IE ===
14  14
15      - In 2003, Hannibal Lecter (as portrayed by Hopkins) was chosen by the American Film Institute as the #1movie villain.
    15  + In 2003, Hannibal Lecter (as portrayed by Hopkins) was chosen by the American Film Institute as the number one movie villain.
16  16
17  17    Hannibal Lecter <-> Hopkins
…   …
36  36    In this workshop, we train a new NER application for the Czech language. We work with free resources & software tools: the Czech NE Corpus (CNEC) and the Stanford NER application.
37  37
38      - Requirements: Java 8, python, gigabytes of memory
    38  + Requirements: Java 8, python, gigabytes of memory, [raw-attachment:convert_cnec_stanford.py:wiki:en/AdvancedNlpCourse convert_cnec_stanford.py], [raw-attachment:named_ent_dtest_unknown.tsv:wiki:en/AdvancedNlpCourse named_ent_dtest_unknown.tsv]
39  39
40  40    1. Create `<YOUR_FILE>`, a text file named ia161-UCO-03.txt where UCO is your university ID.
…   …
56  56    Totals 0.7814 0.7711 0.7763 994 278 295
57  57    }}}
58      - In the output, the first column is the input tokens, the second column is the correct (gold) answers. Observe the differences. Copy the training result to `<YOUR_FILE>`.
    58  + In the output, the first column is the input tokens, the second column is the correct (gold) answers. Observe the differences. Copy the training result to `<YOUR_FILE>`. Try to estimate in how many the model missed an entity, detected incorrectly the boundaries, or classified an entity incorrectly.
59  59    10. evaluate the model on `dtest` with only NEs that are not present in the train data: `java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier cnec-3class-model.ser.gz -testFile named_ent_dtest_unknown.tsv`. Copy the result to `<YOUR_FILE>`.
60  60    11. test on your own input: `java -mx600m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier cnec-3class-model.ser.gz -textFile sample.txt`. Copy the result to `<YOUR_FILE>`.
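
The sentence added to line 58 in v7 asks for an estimate of how often the model missed an entity, got its boundaries wrong, or assigned the wrong class. One way to make that estimate is sketched below. It is not part of the course materials and rests on several assumptions: each output line has three whitespace-separated columns (token, gold label, predicted label), the background label is `O`, and consecutive tokens with the same label form one entity. The function names `spans`, `read_columns` and `classify_errors` and the sample labels `PER`, `LOC`, `ORG` are hypothetical.

```python
from collections import Counter


def spans(labels, background="O"):
    """Group runs of identical non-background labels into (start, end, label)."""
    result = []
    start = None
    # A trailing background label closes any entity still open at the end.
    for i, lab in enumerate(list(labels) + [background]):
        if start is not None and lab != labels[start]:
            result.append((start, i, labels[start]))
            start = None
        if start is None and lab != background:
            start = i
    return result


def read_columns(lines, background="O"):
    """Return (gold, pred) label lists from 'token gold pred' lines.

    Any line without exactly three columns (blank lines, the results
    table) is treated as a break so entities never span across it.
    """
    gold, pred = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            gold.append(background)
            pred.append(background)
            continue
        gold.append(parts[1])
        pred.append(parts[2])
    return gold, pred


def classify_errors(gold, pred):
    """Count, per gold entity, whether it was found and how it went wrong."""
    pred_spans = spans(pred)
    pred_set = set(pred_spans)
    counts = Counter()
    for gs, ge, gl in spans(gold):
        if (gs, ge, gl) in pred_set:
            counts["correct"] += 1
            continue
        overlapping = [p for p in pred_spans if p[0] < ge and gs < p[1]]
        if not overlapping:
            counts["missed"] += 1
        elif any(p[0] == gs and p[1] == ge for p in overlapping):
            counts["wrong_type"] += 1      # same boundaries, other class
        else:
            counts["wrong_boundary"] += 1  # overlap, different boundaries
    return counts


# Made-up sample in the assumed three-column format.
sample = [
    "Hannibal PER PER",
    "Lecter PER O",
    "was O O",
    "in O O",
    "Brno LOC ORG",
    "and O O",
    "Praha LOC O",
]
gold_labels, pred_labels = read_columns(sample)
result = classify_errors(gold_labels, pred_labels)
print(dict(result))
```

To use it on real output, redirect the `-testFile` run to a file and pass its lines to `read_columns`. The counts are per gold entity, so spurious predictions that overlap no gold entity are not counted here.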