Changes between Initial Version and Version 1 of en/NlpInPracticeCourse/2022/NamedEntityRecognition

Sep 13, 2023, 2:45:09 PM (10 months ago)
Ales Horak

copied from private/NlpInPracticeCourse/NamedEntityRecognition


= Named Entity Recognition =

[[|IA161]] [[en/NlpInPracticeCourse|NLP in Practice Course]], Course Guarantor: Aleš Horák

Prepared by: Zuzana Nevěřilová
== State of the Art ==

Named entity recognition (NER) aims to ''recognize'' and ''classify'' names of people, locations, organizations, products, artworks, and sometimes also dates, money, measurements (numbers with units), law or patent numbers, etc. Known issues are the ambiguity of words (e.g. ''May'' can be a month, a verb, or a name), the ambiguity of classes (e.g. ''HMS Queen Elizabeth'' names a ship, although it contains a person's name), and the inherent incompleteness of lists of NEs.

NER is used mainly in information extraction (IE), but it can also significantly improve other NLP tasks such as syntactic parsing.
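
The issues named above can be illustrated with a minimal sketch of a list-based (gazetteer) tagger. The gazetteer contents, the label names, and the function `gazetteer_tag` are hypothetical and chosen only for illustration; this is not the method used in the course notebook.

```python
# Toy gazetteer: a hypothetical, deliberately tiny list of known names.
GAZETTEER = {
    "May": "PERSON",
    "Pink Floyd": "ORGANIZATION",
    "Florence": "LOCATION",
}


def gazetteer_tag(tokens, gazetteer=GAZETTEER, max_len=3):
    """Greedy longest-match lookup.

    Returns a list of (start, end, text, label) tuples; end is exclusive.
    """
    entities = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "Pink Floyd" beats "Pink".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                match = (i, i + n, candidate, gazetteer[candidate])
                break
        if match:
            entities.append(match)
            i = match[1]  # continue after the matched entity
        else:
            i += 1
    return entities


tokens = "Theresa May visited Florence in May".split()
for entity in gazetteer_tag(tokens):
    print(entity)
```

On this sentence the lookup labels both occurrences of ''May'' as a person, including the month (word ambiguity), and it misses ''Theresa'' because the name is not in the list (incompleteness).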
=== Example from IE ===

|| In 2003, Hannibal Lecter (as portrayed by Hopkins) was chosen by the American Film Institute as the number one movie villain. ||

Hannibal Lecter <-> Hopkins
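
A rough sketch of how such a character/actor pair might be pulled out of the sentence: a hand-written regular expression stands in for a real NER component, and the pattern `PORTRAYAL` and the function `extract_portrayals` are hypothetical names used only for this illustration.

```python
import re

# A capitalized word sequence is a crude stand-in for a recognized name.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical pattern for the phrasing used in the example sentence.
PORTRAYAL = re.compile(
    rf"(?P<character>{NAME}) \(as portrayed by (?P<actor>{NAME})\)"
)


def extract_portrayals(text):
    """Return (character, actor) pairs found in the text."""
    return [(m.group("character"), m.group("actor"))
            for m in PORTRAYAL.finditer(text)]


sentence = ("In 2003, Hannibal Lecter (as portrayed by Hopkins) was chosen by "
            "the American Film Institute as the number one movie villain.")
print(extract_portrayals(sentence))
```

The pattern only covers this one phrasing. In a real IE pipeline, the name spans would come from the NER model, and the relation would be extracted over those spans.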
=== Example concerning syntactic parsing ===

|| Wish You Were Here is the ninth studio album by the English progressive rock group Pink Floyd. ||

|| Wish_You_Were_Here is the ninth studio album by the English progressive rock group Pink Floyd. ||
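
The second row above treats the album title as a single token, so a parser no longer sees ''Were'' as the main verb of the sentence. A minimal sketch of that preprocessing step follows, assuming the entity spans are already known (for example, from a NER model). The function `merge_entities` and the span format are hypothetical.

```python
def merge_entities(tokens, spans, joiner="_"):
    """Join each (start, end) token span into one token; end is exclusive.

    Spans are assumed not to overlap; a span that starts inside another
    span is silently skipped.
    """
    ends = {}
    for start, end in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span: {(start, end)}")
        ends[start] = end

    merged, i = [], 0
    while i < len(tokens):
        end = ends.get(i, i + 1)  # single tokens form a span of length 1
        merged.append(joiner.join(tokens[i:end]))
        i = end
    return merged


tokens = ("Wish You Were Here is the ninth studio album by the English "
          "progressive rock group Pink Floyd .").split()
print(" ".join(merge_entities(tokens, [(0, 4)])))
```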
=== References ===

 1. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. [[]]
 1. Afshin Rahimi, Yuan Li, and Trevor Cohn. 2019. Massively Multilingual Transfer for NER. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 151–164, Florence, Italy. Association for Computational Linguistics. [[]]
== Practical Session ==

=== Multilingual Named Entity Recognition ===

In this workshop, we train an NER model for any of the languages supported by WikiAnn. We work with the Hugging Face library, its multilingual BERT model for token classification, and the WikiAnn training data.
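
Two bookkeeping steps typically surround token-classification training of this kind, and the sketch below shows them in plain Python. It is not the notebook's code. It assumes the seven IOB2 labels listed in `WIKIANN_LABELS`, and it assumes `word_ids` is a list in the form returned by the `word_ids()` method of Hugging Face fast tokenizers (one word index per subword piece, `None` for special tokens). The function names are hypothetical.

```python
# Assumed label inventory (IOB2 scheme with PER, ORG and LOC classes).
WIKIANN_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def align_labels(word_ids, word_labels, ignore_index=-100):
    """Spread word-level label ids over subword pieces.

    Only the first piece of each word keeps its label; special tokens and
    later pieces get ignore_index, which PyTorch's cross-entropy loss
    skips by default.
    """
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(ignore_index)
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


def decode_iob2(tokens, tag_ids, labels=WIKIANN_LABELS):
    """Turn per-token tag ids into a list of (label, entity text) pairs."""
    entities, start, label = [], None, None
    for i, tag_id in enumerate(tag_ids):
        tag = labels[tag_id]
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:  # close the entity built so far
                entities.append((label, " ".join(tokens[start:i])))
            # "B-X" opens an entity; a stray "I-X" is treated as opening too.
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    if label is not None:
        entities.append((label, " ".join(tokens[start:])))
    return entities


# "Florence" is assumed to be split into two pieces (word index 1 twice).
print(align_labels([None, 0, 1, 1, 2, None], [3, 5, 0]))
print(decode_iob2(["Pink", "Floyd", "played", "in", "Florence"], [3, 4, 0, 0, 5]))
```

Labelling only the first piece of each word is one common convention; labelling every piece is an equally valid choice, as long as evaluation uses the same convention.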
1. Create `<YOUR_FILE>`, a text file named `ia161-UCO-04.txt`, where ''UCO'' is your university ID.
1. Open Google Colab at [[]]
1. Follow the instructions in the notebook. There are four obligatory tasks. Write your answers in `<YOUR_FILE>`.
1. Submit the file to the homework vault (Odevzdavarna).