Changes between Version 6 and Version 7 of private/NlpInPracticeCourse/NamedEntityRecognition


Timestamp: Oct 11, 2015, 4:13:16 PM (8 years ago)
Author: Zuzana Nevěřilová
Comment: --

  • private/NlpInPracticeCourse/NamedEntityRecognition

    v6  v7
    13  13  === Example from IE ===
    14  14
    15      In 2003, Hannibal Lecter (as portrayed by Hopkins) was chosen by the American Film Institute as the #1 movie villain.
        15  In 2003, Hannibal Lecter (as portrayed by Hopkins) was chosen by the American Film Institute as the number one movie villain.
    16  16
    17  17  Hannibal Lecter <-> Hopkins
     
    36  36  In this workshop, we train a new NER application for the Czech language. We work with free resources & software tools: the Czech NE Corpus (CNEC) and the Stanford NER application.
    37  37
    38      Requirements: Java 8, Python, gigabytes of memory
        38  Requirements: Java 8, Python, gigabytes of memory, [raw-attachment:convert_cnec_stanford.py:wiki:en/AdvancedNlpCourse convert_cnec_stanford.py], [raw-attachment:named_ent_dtest_unknown.tsv:wiki:en/AdvancedNlpCourse named_ent_dtest_unknown.tsv]
    39  39
    40  40  1. Create `<YOUR_FILE>`, a text file named ia161-UCO-03.txt where UCO is your university ID.
     
    56  56          Totals 0.7814  0.7711  0.7763  994     278     295
    57  57  }}}
    58      In the output, the first column contains the input tokens and the second column the correct (gold) answers. Observe the differences. Copy the training result to `<YOUR_FILE>`.
        58  In the output, the first column contains the input tokens and the second column the correct (gold) answers. Observe the differences. Copy the training result to `<YOUR_FILE>`. Try to estimate in how many cases the model missed an entity, detected the boundaries incorrectly, or classified an entity incorrectly.
    59  59  10. Evaluate the model on `dtest` with only the NEs that are not present in the training data: `java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier cnec-3class-model.ser.gz -testFile named_ent_dtest_unknown.tsv`. Copy the result to `<YOUR_FILE>`.
    60  60  11. Test on your own input: `java -mx600m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier cnec-3class-model.ser.gz -textFile sample.txt`. Copy the result to `<YOUR_FILE>`.
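The error estimate requested above (missed entities, wrong boundaries, wrong types) can be partly automated. The following Python sketch is not part of the course materials and rests on assumptions: that the tagged `-testFile` output lists one token per line as token<TAB>gold<TAB>predicted, with blank lines between sentences, and that labels use IO encoding with `O` outside entities. All function names are hypothetical. A gold entity that is not matched exactly is counted as missed (no overlapping prediction), wrong type (same span, other label) or wrong boundaries (overlapping but different span); predictions that overlap no gold entity are counted as spurious.

```python
import sys
from collections import Counter


def spans(labels):
    """Group consecutive tokens sharing a non-O label into (start, end, label) spans.

    Assumes IO encoding, so two adjacent entities of the same type merge into one span.
    """
    result = []
    start = None
    for i, label in enumerate(labels + ["O"]):  # trailing "O" closes the last span
        if start is not None and label != labels[start]:
            result.append((start, i, labels[start]))
            start = None
        if start is None and label != "O" and i < len(labels):
            start = i
    return result


def read_sentences(lines):
    """Yield (gold_labels, predicted_labels) per sentence from token<TAB>gold<TAB>guess lines."""
    gold, pred = [], []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if not line.strip():  # a blank line ends a sentence
            if gold:
                yield gold, pred
            gold, pred = [], []
        elif len(fields) == 3:  # skip anything that is not a tagged token line
            gold.append(fields[1])
            pred.append(fields[2])
    if gold:
        yield gold, pred


def classify_errors(sentences):
    """Count exact matches and error types at the entity level."""
    counts = Counter()
    for gold_labels, pred_labels in sentences:
        gold, pred = spans(gold_labels), spans(pred_labels)
        counts["gold"] += len(gold)
        counts["pred"] += len(pred)
        for g_start, g_end, g_label in gold:
            if (g_start, g_end, g_label) in pred:
                counts["correct"] += 1
                continue
            overlapping = [p for p in pred if p[0] < g_end and g_start < p[1]]
            if not overlapping:
                counts["missed"] += 1  # no predicted entity touches this gold entity
            elif any(p[0] == g_start and p[1] == g_end for p in overlapping):
                counts["wrong_type"] += 1  # same span, different label
            else:
                counts["wrong_boundaries"] += 1  # overlaps, span differs (label may differ too)
        for p_start, p_end, _ in pred:
            if not any(g[0] < p_end and p_start < g[1] for g in gold):
                counts["spurious"] += 1  # predicted entity overlapping no gold entity
    return counts


def prf(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        counts = classify_errors(read_sentences(handle))
    for kind in ("correct", "missed", "wrong_boundaries", "wrong_type", "spurious"):
        print(f"{kind}: {counts[kind]}")
    tp = counts["correct"]
    p, r, f1 = prf(tp, counts["pred"] - tp, counts["gold"] - tp)
    print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

For example, assuming the tagged tokens are written to standard output, redirecting the `-testFile` command above to a file such as `result.tsv` and running `python ner_errors.py result.tsv` (both file names hypothetical) prints the counts. The `prf` helper can be checked against the Totals row above: with TP = 994, FP = 278 and FN = 295 it gives P = 994/1272 ≈ 0.7814, R = 994/1289 ≈ 0.7711 and F1 = 1988/2561 ≈ 0.7763. The script's own P/R/F1 may differ somewhat from the Stanford scorer, since the span grouping here is a simplification.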