NamedEntityRecognition

Timestamp:: Jul 23, 2015, 4:17:32 PM
Author:: Zuzana Nevěřilová
Comment:: --

private/NlpInPracticeCourse/NamedEntityRecognition

-                      v1
+                      v2
 === References ===
+Approx. 3 current papers (preferably from the best NLP conferences/journals, e.g. [[https://www.aclweb.org/anthology/|ACL Anthology]]) that will be used as a source for the one-hour lecture:
+. paper1
+. paper2
+. paper3
+. David Nadeau, Satoshi Sekine: A survey of named entity recognition and classification. In Satoshi Sekine and Elisabete Ranchhod (eds.) Named Entities: Recognition, classification and use. Lingvisticæ Investigationes 30:1. 2007. pp. 3–26 [[http://brown.cl.uni-heidelberg.de/~sourjiko/NER_Literatur/survey.pdf]]
+. Charles Sutton and Andrew !McCallum: An Introduction to Conditional Random Fields. Foundations and Trends in Machine Learning 4 (4). 2012. [[http://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf]]
 == Practical Session ==
+A concrete description of the student assignment for the second one-hour part of the lecture. The work consists of tasks involving practical implementations of algorithms related to the current topic (probably not the state-of-the-art algorithms mentioned in the first part), applied to real data. Students can test the algorithms, evaluate them, and possibly try short adaptations for various subtasks.
+Try the naive gazetteer method (implement substring search) on the prepared data.
+Observe the recognition:
+. what happens to every string present in the gazetteer?
+. what happens to NEs not present in the gazetteer?
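The naive gazetteer method can be sketched as follows. This is only a minimal illustration, not the course's reference solution: the function name `gazetteer_tag`, the toy gazetteer, and the example sentence are all hypothetical, and the prepared data may use a different format.

```python
def gazetteer_tag(text, gazetteer):
    """Tag every occurrence of every gazetteer entry found as a substring.

    Returns a list of (start, end, surface, label) tuples sorted by position.
    """
    matches = []
    for entry, label in gazetteer.items():
        start = text.find(entry)
        while start != -1:
            matches.append((start, start + len(entry), entry, label))
            # Continue searching after the current match start.
            start = text.find(entry, start + 1)
    return sorted(matches)


# Hypothetical toy data, for illustration only.
gazetteer = {
    "Brno": "LOC",
    "Masaryk University": "ORG",
    "Reading": "LOC",  # a town in England, but also a common word
}
text = "Reading helps. Jan Novak studies at Masaryk University in Brno."

matches = gazetteer_tag(text, gazetteer)
for start, end, surface, label in matches:
    print(start, end, surface, label)
```

In this toy run, every gazetteer string is tagged wherever it occurs, including "Reading" at the start of the sentence, where it is not a location. The person name "Jan Novak" is not in the gazetteer and is therefore missed entirely.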
+Students can also be required to produce some results of their work and hand them in as proof of completing the tasks.
+Try a machine learning approach (use Stanford NER) with the prepared data.
+Observe the recognition:
+. measure precision, recall, and F1-score on the test data
+. find NEs not present in the training data
+. find NEs that were not recognized
+. discuss what types of NEs are easy/difficult to recognize
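The evaluation steps above can be sketched in a few lines. This sketch assumes exact-match, entity-level scoring, which is one common convention; the session may use token-level scoring instead. It does not call Stanford NER itself: the tuples `(sentence_id, start_token, surface, label)`, the function names, and the toy data are all hypothetical stand-ins for the tagger's output and the prepared data.

```python
def evaluate(gold, predicted):
    """Entity-level precision, recall and F1 with exact matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def unseen_entities(gold, train_surfaces):
    """Gold test entities whose surface form never occurs in the training data."""
    return {e for e in gold if e[2] not in train_surfaces}


def missed_entities(gold, predicted):
    """Gold test entities that the tagger did not recognize."""
    return set(gold) - set(predicted)


# Hypothetical toy data: (sentence_id, start_token, surface, label).
train_surfaces = {"Brno", "Masaryk University"}
gold = {
    (0, 0, "Jan Novak", "PER"),
    (0, 4, "Masaryk University", "ORG"),
    (0, 7, "Brno", "LOC"),
}
predicted = {
    (0, 4, "Masaryk University", "ORG"),
    (0, 7, "Brno", "LOC"),
    (1, 0, "Reading", "LOC"),  # a false positive
}

precision, recall, f1 = evaluate(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
print("unseen in training:", unseen_entities(gold, train_surfaces))
print("not recognized:", missed_entities(gold, predicted))
```

Comparing the unseen and the missed sets is a simple starting point for the discussion of which NE types are easy or difficult: in this toy example the only missed entity is also the only one absent from the training data.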