
NamedEntityRecognition

Timestamp:: Oct 9, 2017, 11:36:03 AM
Author:: Ales Horak
Comment:: --

private/NlpInPracticeCourse/NamedEntityRecognition

- v14
+ v15
 Requirements: Java 8, python, gigabytes of memory, [raw-attachment:convert_cnec_stanford.py:wiki:en/AdvancedNlpCourse/NamedEntityRecognition convert_cnec_stanford.py], [raw-attachment:named_ent_dtest_unknown.tsv:wiki:en/AdvancedNlpCourse/NamedEntityRecognition named_ent_dtest_unknown.tsv], [raw-attachment:cnec.prop:wiki:en/AdvancedNlpCourse/NamedEntityRecognition cnec.prop]
 . Create `<YOUR_FILE>`, a text file named ia161-UCO-03.txt where UCO is your university ID.
+. Create `<YOUR_FILE>`, a text file named `ia161-UCO-04.txt` where ''UCO'' is your university ID.
 . get the data: download CNEC from the LINDAT/CLARIN repository (https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-1B22-8)
 . open the NE hierarchy:
+ `evince cnec2.0/doc/ne-type-hierarchy.pdf`
+{{{
+evince cnec2.0/doc/ne-type-hierarchy.pdf
+}}}
 . the data is organized into 3 disjoint datasets: the training data is called `train`, the development test data is called `dtest` and the final evaluation data is called `etest`.
 . convert the train data to the Stanford NER format:
+ `python convert_cnec_stanford.py cnec2.0/data/xml/named_ent_train.xml > named_ent_train.tsv`
+{{{
+python convert_cnec_stanford.py cnec2.0/data/xml/named_ent_train.xml > named_ent_train.tsv
+}}}
  Note that we removed documents that do not contain any NEs; you can experiment with keeping them later.
 . download the Stanford NE recognizer http://nlp.stanford.edu/software/CRF-NER.shtml (and read about it)
 . train the model using the default settings (cnec.prop). N.B.: `convert_cnec_stanford.py` only recognizes PERSON, LOCATION and ORGANIZATION; you can extend the markup conversion later:
+ `java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop cnec.prop`
+{{{
+java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop cnec.prop
+}}}
 . convert the test data to the Stanford NER format:
+ `python convert_cnec_stanford.py cnec2.0/data/xml/named_ent_dtest.xml > named_ent_dtest.tsv`
+{{{
+python convert_cnec_stanford.py cnec2.0/data/xml/named_ent_dtest.xml > named_ent_dtest.tsv
+}}}
 . evaluate the model on `dtest`:
 {{{