Changes between Version 1 and Version 2 of private/NlpInPracticeCourse/NamedEntityRecognition
- Timestamp: Jul 23, 2015, 4:17:32 PM (8 years ago)
private/NlpInPracticeCourse/NamedEntityRecognition
@@ -14,13 +14,18 @@
 === References ===
 
-Approx 3 current papers (preferably from best NLP conferences/journals, eg. [[https://www.aclweb.org/anthology/|ACL Anthology]]) that will be used as a source for the one-hour lecture:
-
- 1. paper1
- 1. paper2
- 1. paper3
+ 1. David Nadeau, Satoshi Sekine: A survey of named entity recognition and classification. In Satoshi Sekine and Elisabete Ranchhod (eds.) Named Entities: Recognition, classification and use. Lingvisticæ Investigationes 30:1. 2007. pp. 3–26 [[http://brown.cl.uni-heidelberg.de/~sourjiko/NER_Literatur/survey.pdf]]
+ 1. Charles Sutton and Andrew !McCallum: An Introduction to Conditional Random Fields. Foundations and Trends in Machine Learning 4 (4). 2012. [[http://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf]]
 
 == Practical Session ==
 
-Concrete description of work assignment for students for the second one-hour part of the lecture. The work will consist of tasks connected with practical implementations of algorithms connected with the current topic (probably not the state-of-the-art algorithms mentioned in the first part) and with real data. Students can test the algorithms, evaluate them and possibly try some short adaptations for various subtasks.
+Try naive gazetteer method (implement substring search) on prepared data.
+Observe the recognition:
+ 1. what happens to every string present in the gazetteer?
+ 1. what happens to NE not present in the gazetteer?
 
-Students can also be required to generate some results of their work and hand them in to prove completing the tasks.
+Try machine learning approach (use the Stanford NER) with prepared data.
+Observe the recognition:
+ 1. measure precision, recall, and F1-score on the test data
+ 1. find NEs not present in the train data
+ 1. find NEs that were not recognized
+ 1. discuss what types of NE are easy/difficult to recognize
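
The two measurable tasks in the Practical Session text of v2 (naive gazetteer substring search, and precision/recall/F1) could be sketched roughly as below. This is only an illustrative sketch, not the course's own material: the function names `gazetteer_tag` and `precision_recall_f1`, the toy gazetteer, the sample sentence, and the gold annotations are all hypothetical, and exact span match is an assumed scoring scheme. The prepared data and the Stanford NER setup are not shown on this page, so they are left out.

```python
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, matched text)


def gazetteer_tag(text: str, gazetteer: Iterable[str]) -> List[Span]:
    """Naive gazetteer NER: report every substring occurrence of every entry."""
    spans: List[Span] = []
    for entry in gazetteer:
        if not entry:
            continue  # skip empty entries; str.find("") would match everywhere
        start = text.find(entry)
        while start != -1:
            spans.append((start, start + len(entry), entry))
            # Resume one character later so overlapping hits are found too.
            start = text.find(entry, start + 1)
    return sorted(spans)


def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of entity spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only (not the course's prepared data).
gazetteer = ["Paris", "Brno"]
text = "Paris Hilton flew from Brno to Vienna."
predicted = set(gazetteer_tag(text, gazetteer))
gold = {(0, 12, "Paris Hilton"), (23, 27, "Brno"), (31, 37, "Vienna")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

On this toy input, "Paris" is tagged even though it is only part of a longer name, and "Vienna" is missed because it is absent from the gazetteer. The same scoring function could be applied to the output of a trained tagger, provided its predictions are first converted to the same span representation.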