NamedEntityRecognition

Timestamp: Aug 30, 2022, 10:39:57 AM
Author: Ales Horak
Comment: copied from private/NlpInPracticeCourse/NamedEntityRecognition

en/NlpInPracticeCourse/2021/NamedEntityRecognition

                       v1
+= Named Entity Recognition =
+[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/NlpInPracticeCourse|NLP in Practice Course]], Course Guarantor: Aleš Horák
+Prepared by: Zuzana Nevěřilová
+== State of the Art ==
+Named entity recognition (NER) aims to ''recognize'' and ''classify'' names of people, locations, organizations, products, and artworks, and sometimes also dates, money amounts, measurements (numbers with units), law or patent numbers, etc. Known issues are the ambiguity of words (e.g. ''May'' can be a month, a verb, or a name), the ambiguity of classes (e.g. ''HMS Queen Elizabeth'' is a ship, although it contains the name of a person), and the inherent incompleteness of lists of NEs.
+NER is used mainly in information extraction (IE), but it can also significantly improve other NLP tasks such as syntactic parsing.
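The ambiguity and incompleteness issues described above can be made concrete with a small sketch of list-based (gazetteer) lookup. The gazetteer entries, the label names, and the function `gazetteer_tag` are hypothetical and chosen only for illustration; this is not the method used later in the practical session.

```python
# Toy gazetteer: a hypothetical, deliberately incomplete list of names.
GAZETTEER = {
    "May": "DATE",
    "Pink Floyd": "ORG",
    "Hopkins": "PERSON",
}

def gazetteer_tag(tokens, gazetteer=GAZETTEER, max_len=3):
    """Greedy longest-match lookup; returns (start, end, label) spans."""
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                match = (i, i + n, gazetteer[candidate])
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans

# Word ambiguity: the modal verb "May" is tagged like the month.
print(gazetteer_tag("May I leave in May ?".split()))
# Class ambiguity: the surname in "Theresa May" is tagged as a date.
print(gazetteer_tag("Theresa May resigned".split()))
# Incompleteness: names missing from the list are not found at all.
print(gazetteer_tag("Hannibal Lecter".split()))
```

A lookup like this ignores context entirely, which is why statistical and neural taggers such as those in the references are preferred in practice.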
+=== Example from IE ===
+|| In 2003, Hannibal Lecter (as portrayed by Hopkins) was chosen by the American Film Institute as the number one movie villain. ||
+Extracted relation: Hannibal Lecter (character) <-> Hopkins (actor)
+=== Example concerning syntactic parsing ===
+|| Wish You Were Here is the ninth studio album by the English progressive rock group Pink Floyd. ||
+vs.
+|| Wish_You_Were_Here is the ninth studio album by the English progressive rock group Pink Floyd. ||
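The second variant above can be produced by joining the tokens of each recognized entity into a single token before parsing. The following is a minimal sketch, assuming the entity spans are already known as `(start, end)` token offsets; the function name `merge_entities` is hypothetical.

```python
def merge_entities(tokens, spans, joiner="_"):
    """Join the tokens of each (start, end) span into one token.

    Spans are half-open token offsets and are assumed not to overlap.
    """
    ends = {start: end for start, end in spans}
    merged = []
    i = 0
    while i < len(tokens):
        if i in ends:
            end = ends[i]
            merged.append(joiner.join(tokens[i:end]))
            i = end  # skip the tokens that were merged
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = "Wish You Were Here is the ninth studio album".split()
print(merge_entities(tokens, [(0, 4)]))
# ['Wish_You_Were_Here', 'is', 'the', 'ninth', 'studio', 'album']
```

With the album title collapsed into one token, a parser no longer has to treat ''Wish'' or ''Were'' as verbs of the sentence.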
+=== References ===
+. Charles Sutton and Andrew !McCallum: An Introduction to Conditional Random Fields. Foundations and Trends in Machine Learning 4 (4). 2012. [[http://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf]]
+. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. [[https://arxiv.org/abs/1810.04805]]
+. Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, and Kewei Tu: Automated Concatenation of Embeddings for Structured Prediction. Accepted to Proceedings of ACL-IJCNLP 2021. 17 pages. [[https://arxiv.org/abs/2010.05006]]
+== Practical Session ==
+=== Czech Named Entity Recognition ===
+In this workshop, we train a new NER model for the Czech language. We work with free resources and software tools: the Czech NE Corpus (CNEC) and the !FastText pre-trained word embeddings. We build a neural network to solve the problem.
+. Create `<YOUR_FILE>`, a text file named `ia161-UCO-04.txt` where ''UCO'' is your university ID.
+. Open Google Colab at [[https://colab.research.google.com/drive/1mnz-P30CLxrxQ0yyqpcLwVJgi7e59shi?usp=sharing]]
+. Follow the instructions in the notebook. There are three obligatory tasks. Write your answers in `<YOUR_FILE>`.
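As background for the notebook, a neural tagger is commonly trained on one label per token. The sketch below shows the widely used BIO encoding of entity spans. It is an assumption that the notebook uses a similar encoding; the label names `PER` and `LOC`, the example sentence, and the function `spans_to_bio` are illustrative only and are not the CNEC tag set.

```python
def spans_to_bio(n_tokens, spans):
    """Convert (start, end, label) spans into per-token BIO labels.

    Spans are half-open token offsets and are assumed not to overlap.
    """
    labels = ["O"] * n_tokens
    for start, end, label in spans:
        labels[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{label}"      # remaining tokens of the entity
    return labels

tokens = ["Aleš", "Horák", "učí", "v", "Brně"]
spans = [(0, 2, "PER"), (4, 5, "LOC")]  # hypothetical annotation
for token, label in zip(tokens, spans_to_bio(len(tokens), spans)):
    print(token, label)
```

Each token's pre-trained embedding then serves as the network input and the BIO label as the prediction target.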