Changes between Initial Version and Version 1 of en/NlpInPracticeCourse/2023/InformationExtraction


Timestamp: Sep 3, 2024, 2:49:40 PM
Author: Ales Horak
Comment: copied from private/NlpInPracticeCourse/InformationExtraction

= Extracting structured information from text =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/NlpInPracticeCourse|NLP in Practice Course]], Course Guarantor: Aleš Horák

Prepared by: Zuzana Nevěřilová


== State of the Art ==

Information extraction (IE) is a technology that analyzes natural language
text in order to extract snippets of information. The process takes texts
(and sometimes speech) as input and produces fixed-format, unambiguous
data as output. This data may be displayed directly to users, stored in
a database or spreadsheet for later analysis, or used for indexing in
information retrieval (IR) applications such as Internet search engines
(e.g. Google).
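As a toy illustration of text going in and fixed-format data coming out, the sketch below extracts (city, country) records with one hand-written regular expression and serializes them as CSV rows, ready for a spreadsheet or database. The pattern and the {{{extract_capitals()}}} and {{{to_csv()}}} helpers are hypothetical and chosen only for this example; treating capitalized word sequences as names is a deliberately crude assumption, and practical IE systems typically use far richer patterns or learned models.

```python
import csv
import io
import re

# Hypothetical pattern for illustration: "<City> is the capital of <Country>".
# Capitalized word sequences stand in for proper names (a crude assumption).
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
CAPITAL_OF = re.compile(rf"({NAME}) is the capital of ({NAME})")


def extract_capitals(text: str) -> list[tuple[str, str]]:
    """Return a (city, country) tuple for every match of the pattern."""
    return CAPITAL_OF.findall(text)


def to_csv(rows: list[tuple[str, str]]) -> str:
    """Serialize the extracted records as fixed-format CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["city", "country"])
    writer.writerows(rows)
    return buffer.getvalue()


text = (
    "Prague is the capital of Czechia. "
    "Everyone knows that Vienna is the capital of Austria, "
    "and Brno is a city in Moravia."
)
records = extract_capitals(text)
print(records)  # → [('Prague', 'Czechia'), ('Vienna', 'Austria')]
print(to_csv(records))
```

The sentence about Brno yields no record: a pattern-based extractor only finds the relations its patterns describe.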

=== References ===

 1. Piskorski, J. and Yangarber, R. "Information Extraction: Past, Present and Future." pp. 23–49. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
 1. Aydar, M., Bozal, O. and Ozbay, F. [https://arxiv.org/abs/2007.04247 "Neural relation extraction: a survey."] arXiv e-prints, 2020.
 1. Li, Q., et al. "A comprehensive exploration of semantic relation extraction via pre-trained CNNs." Knowledge-Based Systems, 2020, 105488.


== Practical Session ==

The task uses a Python notebook run in a web browser in the [https://colab.research.google.com/ Google Colaboratory] environment,
with access to your MU G Suite drive.

To run the code in a local environment, you need
Python 3 and the NLTK module.

 1. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}} where '''UCO''' is your university ID.
 1. Open the [https://colab.research.google.com/drive/1KSfOy8KwKQ6De45ah3JMxP0BfQa-80RD?usp=sharing Python notebook in the Google Colab environment] and make your own copy. Do not forget to save your work if you want to keep your changes; leaving the browser discards all unsaved changes!
 1. The notebook reads the file {{{input.txt}}} (each line has the form {{{word|definition}}}) and outputs a hypernym for each word.
 1. The default approach is naive: ''the first noun in the definition is the hypernym''.
 1. Using the gold standard, evaluate the naive approach.
 1. Improve the {{{find_hyper()}}} function to provide better results. Evaluate the new version.
 1. Copy the updated {{{find_hyper()}}} function and its output into {{{<YOUR_FILE>}}}. Please do not submit the whole notebook.
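One way to picture the naive baseline from the steps above is the following sketch. It is not the notebook's code: the names are hypothetical ({{{find_hyper_naive()}}} rather than the notebook's {{{find_hyper()}}}, whose signature may differ). The POS tagger is passed in as a parameter, i.e. any function that turns a token list into {{{(word, tag)}}} pairs with Penn Treebank tags, which is the shape {{{nltk.pos_tag}}} returns; a toy lexicon tagger stands in for it so the sketch runs without downloading NLTK models. The {{{find_hyper_skip_empty()}}} variant shows one possible direction for improvement, skipping assumed "empty head" nouns such as ''kind'' in ''a kind of fruit''.

```python
import re
from typing import Callable, Container, Iterable, List, Optional, Tuple

# A tagger maps a list of tokens to (token, Penn Treebank tag) pairs,
# the same shape nltk.pos_tag returns. Passing it in keeps this sketch
# independent of any downloaded tagger model.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]

# Hypothetical "empty head" nouns: in "a kind of fruit" the useful
# hypernym follows "of", so the improved variant skips these nouns.
EMPTY_HEADS = {"kind", "type", "sort", "variety", "form"}


def read_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse lines of the form word|definition; skip malformed lines."""
    pairs = []
    for line in lines:
        word, sep, definition = line.strip().partition("|")
        if sep and word:
            pairs.append((word, definition))
    return pairs


def first_noun(definition: str, tagger: Tagger,
               skip: Container[str] = frozenset()) -> Optional[str]:
    """Return the first noun (tag NN*) not listed in skip, lowercased."""
    tokens = re.findall(r"[A-Za-z]+", definition)
    for token, tag in tagger(tokens):
        if tag.startswith("NN") and token.lower() not in skip:
            return token.lower()
    return None


def find_hyper_naive(definition: str, tagger: Tagger) -> Optional[str]:
    """Baseline: the first noun in the definition is the hypernym."""
    return first_noun(definition, tagger)


def find_hyper_skip_empty(definition: str, tagger: Tagger) -> Optional[str]:
    """Possible improvement: ignore empty-head nouns such as "kind"."""
    return first_noun(definition, tagger, skip=EMPTY_HEADS)


# Toy lexicon tagger for this demo only; unknown words default to NN.
# In the notebook, nltk.pos_tag would play this role.
TOY_LEXICON = {"a": "DT", "of": "IN", "domestic": "JJ", "large": "JJ"}


def toy_tagger(tokens: List[str]) -> List[Tuple[str, str]]:
    return [(t, TOY_LEXICON.get(t.lower(), "NN")) for t in tokens]


# With the real file: with open("input.txt", encoding="utf-8") as f: pairs = read_pairs(f)
lines = ["apple|a kind of fruit", "dog|a domestic animal", "oak|a large tree"]
for word, definition in read_pairs(lines):
    print(word, find_hyper_naive(definition, toy_tagger),
          find_hyper_skip_empty(definition, toy_tagger))
```

With the toy tagger this prints {{{apple kind fruit}}}, {{{dog animal animal}}} and {{{oak tree tree}}}: only the ''kind of'' definition changes. The empty-head list is a guess, and its real effect should be checked with the evaluation against the gold standard.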

Gold standard for evaluating your results: [[raw-attachment:gold_en.txt|gold_en.txt]]
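A minimal sketch of the evaluation step follows. The format of gold_en.txt is an assumption here: each line is taken to be {{{word|hypernym}}}, with any further {{{|}}}-separated fields treated as alternative correct answers, so check the actual file and adjust {{{read_gold()}}} (a hypothetical helper, like {{{accuracy()}}}) if it differs. Accuracy is computed over all gold words, so a missing or empty prediction counts as an error.

```python
from typing import Dict, Iterable, Optional, Set


def read_gold(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse gold lines in the ASSUMED format word|hypernym[|alternative...]."""
    gold: Dict[str, Set[str]] = {}
    for line in lines:
        fields = [f.strip().lower() for f in line.strip().split("|")]
        answers = {f for f in fields[1:] if f}
        if fields[0] and answers:
            gold.setdefault(fields[0], set()).update(answers)
    return gold


def accuracy(predictions: Dict[str, Optional[str]],
             gold: Dict[str, Set[str]]) -> float:
    """Fraction of gold words whose predicted hypernym is a gold answer.

    Comparison is case-insensitive; missing or None predictions are errors.
    """
    if not gold:
        return 0.0
    normalized = {w.lower(): (h or "").lower() for w, h in predictions.items()}
    correct = sum(1 for word, answers in gold.items()
                  if normalized.get(word, "") in answers)
    return correct / len(gold)


# With the real file: with open("gold_en.txt", encoding="utf-8") as f: gold = read_gold(f)
gold = read_gold(["apple|fruit", "dog|animal|mammal", "oak|tree"])
predictions = {"apple": "kind", "dog": "mammal", "oak": "tree"}
print(f"accuracy = {accuracy(predictions, gold):.2f}")  # → accuracy = 0.67
```

A single accuracy figure hides whether errors come from wrong answers or from no answer at all; counting the {{{None}}} predictions separately shows which of the two your changes to {{{find_hyper()}}} affect.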