= Extracting structured information from text =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/NlpInPracticeCourse|NLP in Practice Course]], Course Guarantor: Aleš Horák

Prepared by: Zuzana Nevěřilová


== State of the Art ==

Information extraction (IE) is a technology that analyzes
natural language in order to extract snippets of information.
The process takes text (and sometimes speech) as input and
produces fixed-format, unambiguous data as output. This data
may be displayed directly to users, stored in a database or
spreadsheet for later analysis, or used for indexing in
information retrieval (IR) applications such as Internet
search engines (e.g. Google).

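To make the idea of "text in, fixed-format data out" concrete, here is a minimal sketch using a single regular expression. The pattern, the function name {{{extract_births}}}, and the example sentences are illustrative assumptions; real IE systems rely on far more robust linguistic analysis than one hand-written pattern.

```python
import re

# Hypothetical pattern for sentences of the form "<Name> was born in <year>".
BORN = re.compile(
    r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in (?P<year>\d{4})"
)


def extract_births(text):
    """Return one fixed-format record per matching sentence fragment."""
    return [
        {"name": m.group("name"), "year": int(m.group("year"))}
        for m in BORN.finditer(text)
    ]


text = "Ada Lovelace was born in 1815. Alan Turing was born in 1912 in London."
records = extract_births(text)
print(records)
```

Each record has the same two fields regardless of how the surrounding sentence is phrased, which is what makes the output suitable for a database or an index.
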
=== References ===

1. Piskorski, J. and Yangarber, R. Information Extraction: Past, Present and Future, pages 23–49. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
1. Aydar, M., Bozal, O. and Ozbay, F. [https://arxiv.org/abs/2007.04247 Neural relation extraction: a survey.] arXiv e-prints, 2020.
1. Li, Q., et al. A comprehensive exploration of semantic relation extraction via pre-trained CNNs. Knowledge-Based Systems, 2020, 105488.


== Practical Session ==


The task uses a Python notebook run in a web browser in the [https://colab.research.google.com/ Google Colaboratory] environment
with access to the MU G-Suite disk.

To run the code in a local environment instead, you need
Python 3 and the NLTK module.

1. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}}, where '''UCO''' is your university ID.
1. Access the [https://colab.research.google.com/drive/1KSfOy8KwKQ6De45ah3JMxP0BfQa-80RD?usp=sharing Python notebook in the Google Colab environment] and make your own copy. Do not forget to save your work if you want to see your changes later; leaving the browser will throw away all changes!
1. The Colab notebook reads the file {{{input.txt}}} (each line has the form {{{word|definition}}}) and outputs a hypernym for each word.
1. The default approach is naive: ''the first noun in the definition is the hypernym''.
1. Using the gold standard, evaluate the naive approach.
1. Improve the {{{find_hyper()}}} function to provide better results. Evaluate the new version.
1. Copy the updated {{{find_hyper()}}} function and the output into {{{<YOUR_FILE>}}}. Please don't submit the whole notebook.

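The naive "first noun" heuristic from the steps above can be sketched as follows. This is not the notebook's actual {{{find_hyper()}}} code: the helper names {{{parse_line}}}, {{{tag_definition}}}, and {{{first_noun}}}, as well as the example line, are assumptions made for illustration. The NLTK import is kept inside {{{tag_definition}}} so the rest of the sketch runs without NLTK data; calling it requires the tokenizer and tagger resources, whose names differ between NLTK versions.

```python
def parse_line(line):
    """Split one 'word|definition' line into its two parts."""
    word, definition = line.rstrip("\n").split("|", 1)
    return word.strip(), definition.strip()


def tag_definition(definition):
    """POS-tag a definition with NLTK (needs tokenizer and tagger data installed)."""
    import nltk  # imported lazily so the sketch loads without NLTK
    return nltk.pos_tag(nltk.word_tokenize(definition))


def first_noun(tagged_tokens):
    """Return the first token whose Penn Treebank tag starts with 'NN', or None."""
    for token, tag in tagged_tokens:
        if tag.startswith("NN"):
            return token.lower()
    return None


# Made-up input line, hand-tagged so the example runs without NLTK data.
word, definition = parse_line("dog|a domesticated animal kept as a pet")
tagged = [("a", "DT"), ("domesticated", "JJ"), ("animal", "NN"),
          ("kept", "VBN"), ("as", "IN"), ("a", "DT"), ("pet", "NN")]
print(word, first_noun(tagged))  # prints "dog animal"
```

The heuristic is likely to fail on definitions such as "a kind of tree", where the first noun ("kind") is not the hypernym; cases like this are natural starting points for improving the function.
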
Gold standard to evaluate your result: [[raw-attachment:gold_en.txt|gold_en.txt]]

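One simple way to evaluate against the gold standard is exact-match accuracy. The sketch below assumes the gold file uses the same {{{word|hypernym}}} layout as the input file, which is not confirmed here; the function names and the sample data are hypothetical, so adjust the parsing to the actual format of {{{gold_en.txt}}}.

```python
def load_pairs(lines):
    """Parse 'word|value' lines into a dict, skipping blank or malformed lines."""
    pairs = {}
    for line in lines:
        line = line.strip()
        if "|" not in line:
            continue
        word, value = line.split("|", 1)
        pairs[word.strip()] = value.strip().lower()
    return pairs


def accuracy(predicted, gold):
    """Fraction of gold words whose predicted hypernym matches exactly."""
    if not gold:
        return 0.0
    correct = sum(1 for word, hyper in gold.items() if predicted.get(word) == hyper)
    return correct / len(gold)


# Made-up data: two of the three predictions agree with the gold standard.
gold = load_pairs(["dog|animal", "oak|tree", "hammer|tool"])
predicted = {"dog": "animal", "oak": "tree", "hammer": "hand"}
print(f"accuracy: {accuracy(predicted, gold):.2f}")  # prints "accuracy: 0.67"
```

Running the same evaluation before and after changing {{{find_hyper()}}} gives a direct comparison of the two versions.
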