InformationExtraction

Timestamp: Sep 3, 2024, 2:49:40 PM (11 months ago)
Author: Ales Horak
Comment: copied from private/NlpInPracticeCourse/InformationExtraction

en/NlpInPracticeCourse/2023/InformationExtraction

                       v1
+= Extracting structured information from text =
+[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/NlpInPracticeCourse|NLP in Practice Course]], Course Guarantor: Aleš Horák
+Prepared by: Zuzana Nevěřilová
+== State of the Art ==
+Information extraction (IE) is a technology based on
+analyzing natural language in order to extract snippets
+of information. The process takes texts (and sometimes
+speech) as input and produces fixed-format, unambiguous
+data as output. This data may be used directly for
+display to users, or may be stored in a database or
+spreadsheet for later analysis, or may be used for
+indexing purposes in information retrieval (IR) applications
+such as Internet search engines like Google.
+=== References ===
+. Piskorski, J. and Yangarber, R. Information Extraction: Past, Present and Future, pages 23–49. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
+. Aydar, Mehmet, Ozge Bozal, and Furkan Ozbay. [https://arxiv.org/abs/2007.04247 Neural relation extraction: a survey.] arXiv e-prints (2020).
+. Li, Qing, et al. "A comprehensive exploration of semantic relation extraction via pre-trained CNNs." Knowledge-Based Systems (2020): 105488.
+== Practical Session ==
+The task uses a Python notebook run in a web browser in the [https://colab.research.google.com/ Google Colaboratory] environment
+with MU G-Suite disk access.
+To run the code in a local environment instead, you need
+Python 3 and the NLTK module.
+. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}} where '''UCO''' is your university ID.
+. Access the [https://colab.research.google.com/drive/1KSfOy8KwKQ6De45ah3JMxP0BfQa-80RD?usp=sharing Python notebook in the Google Colab environment] and make your own copy. Do not forget to save your work if you want to see your changes later; leaving the browser will throw away all unsaved changes!
+. The notebook reads the file {{{input.txt}}} (each line has the form {{{word|definition}}}) and outputs a hypernym for each word.
+. The default approach is naive: ''the first noun in the definition is the hypernym''.
+. Using the gold standard, evaluate the naive approach.
+. Improve the {{{find_hyper()}}} function to provide better results. Evaluate the new version.
+. Copy the updated function {{{find_hyper()}}} and the output into {{{<YOUR_FILE>}}}. Please don't submit the whole notebook.
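The naive baseline from the steps above can be sketched roughly as follows. This is only an illustration, not the notebook's actual code: the names `parse_line`, `first_noun_hypernym`, and `toy_tagger` are hypothetical, and the real signature of `find_hyper()` may differ. The toy tagger is a stand-in so that the sketch runs without any downloads; with NLTK and its tagger data installed, `nltk.pos_tag` takes a token list and returns `(token, tag)` pairs in the same shape.

```python
import re
from typing import Callable, List, Optional, Tuple

# A tagger maps a list of tokens to (token, POS tag) pairs.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def parse_line(line: str) -> Tuple[str, str]:
    """Split one 'word|definition' input line into its two parts."""
    word, definition = line.rstrip("\n").split("|", 1)
    return word.strip(), definition.strip()


def first_noun_hypernym(definition: str, tagger: Tagger) -> Optional[str]:
    """Naive heuristic: the first token tagged as a noun (NN*) is the hypernym."""
    tokens = re.findall(r"[A-Za-z]+", definition)  # crude tokenization, letters only
    for token, tag in tagger(tokens):
        if tag.startswith("NN"):
            return token.lower()
    return None  # no noun found in the definition


# Hypothetical stand-in tagger: marks a few known words as nouns.
TOY_NOUNS = {"animal", "fruit", "tree"}


def toy_tagger(tokens: List[str]) -> List[Tuple[str, str]]:
    return [(t, "NN" if t.lower() in TOY_NOUNS else "XX") for t in tokens]


word, definition = parse_line("apple|the fruit of a tree of the rose family\n")
print(word, "->", first_noun_hypernym(definition, toy_tagger))
```

The heuristic picks the first noun it sees, so a definition such as "a kind of animal" fails as soon as the tagger treats "kind" as a noun. Cases like this are where an improved version has room to do better.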
+Gold standard to evaluate your result: [[raw-attachment:gold_en.txt|gold_en.txt]]
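Evaluation against the gold standard can be as simple as counting exact matches. The sketch below assumes that the gold file uses the same `word|hypernym` line layout as the input file. That is an assumption, so check `gold_en.txt` before relying on it. The helper names `load_pairs` and `accuracy` and the sample data are hypothetical.

```python
from typing import Dict, Iterable


def load_pairs(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'word|hypernym' lines into a dict (layout is an assumption)."""
    pairs: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        word, value = line.split("|", 1)
        pairs[word.strip()] = value.strip().lower()
    return pairs


def accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold words whose predicted hypernym matches exactly."""
    if not gold:
        return 0.0
    correct = sum(1 for word, hyper in gold.items() if predicted.get(word) == hyper)
    return correct / len(gold)


# Made-up sample data, for illustration only.
gold = load_pairs(["dog|animal", "apple|fruit", "oak|tree"])
predicted = {"dog": "animal", "apple": "fruit", "oak": "genus"}
score = accuracy(predicted, gold)
print(f"accuracy: {score:.2f}")  # prints "accuracy: 0.67"
```

Running the naive and the improved predictions through the same function gives directly comparable numbers. With a real file, `load_pairs(open("gold_en.txt", encoding="utf-8"))` would replace the inline list.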