Changes between Initial Version and Version 1 of en/AdvancedNlpCourse2015/InformationExtraction


Timestamp:
Sep 11, 2017, 4:38:35 PM (7 years ago)
Author:
Ales Horak
Comment:

copied from private/AdvancedNlpCourse/InformationExtraction

= Extracting structured information from text =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/AdvancedNlpCourse|Advanced NLP Course]], Course Guarantor: Aleš Horák

Prepared by: Vojtěch Kovář


== State of the Art ==

Information extraction (IE) is a technology that analyzes
natural language in order to extract snippets of
information. The process takes text (and sometimes
speech) as input and produces fixed-format, unambiguous
data as output. This data may be displayed directly to
users, stored in a database or spreadsheet for later
analysis, or used for indexing in information retrieval (IR)
applications such as Google and other Internet search engines.

=== References ===

 1. Cunningham, Hamish. [https://gate.ac.uk/sale/ell2/ie/ An Introduction to Information Extraction]. Encyclopedia of Language and Linguistics, 2nd Edition. Elsevier, 2005.
 1. Chang, Chia-Hui, et al. [https://www.researchgate.net/profile/Khaled_Shaalan/publication/200110627_A_Survey_of_Web_Information_Extraction_Systems/links/0912f50abd8c6b314d000000.pdf A Survey of Web Information Extraction Systems]. IEEE Transactions on Knowledge and Data Engineering 18.10 (2006).
 1. Banko, Michele, et al. [http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-429.pdf Open information extraction for the web]. IJCAI. Vol. 7. 2007.
 1. Fader, Anthony, Soderland, Stephen and Etzioni, Oren. [http://dl.acm.org/citation.cfm?id=2145596 Identifying relations for open information extraction]. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11). Association for Computational Linguistics, Stroudsburg, PA, USA, 2011.

== Practical Session ==

You are given a few [raw-attachment:wiki.txt:wiki:private/AdvancedNlpCourse/InformationExtraction short excerpts from Czech Wikipedia] as plain text. They were processed by automatic sentence detection, tokenization (unitok tool), morphological analysis and tagging (desamb tool), and syntactic analysis (SET tool, with the --long-phrases option); [raw-attachment:wiki.phrases this is the result].

Write a short Python program that extracts simple information about who was who from the parsed file. The result should look like [raw-attachment:wiki.output this file].

You may modify or draw inspiration from [raw-attachment:demo.py this demo script].
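
As a starting point, the core "who was who" step could be sketched roughly as follows. This is only an illustration under stated assumptions: it does not read wiki.phrases, and the input structure (one list of `(phrase_type, text)` pairs per sentence, with the labels `np` and `vp`) is hypothetical, not the actual SET output format. The function name `who_was_who`, the copula list, the example sentence, and the tab-separated output are likewise made up for the sketch; check wiki.output and demo.py for the real formats.

```python
# Hypothetical input: one list of (phrase_type, text) pairs per sentence.
# This is NOT the real SET output format; the loading step has to be
# adapted to whatever wiki.phrases actually contains.

# A few Czech copula forms ("was"/"is"); illustrative, not exhaustive.
COPULAS = {"byl", "byla", "bylo", "je"}


def who_was_who(sentence):
    """Return (person, description) pairs for 'NP copula NP' patterns."""
    pairs = []
    # Slide a window of three adjacent phrases over the sentence.
    for i in range(len(sentence) - 2):
        (t1, subj), (t2, verb), (t3, desc) = sentence[i:i + 3]
        if t1 == "np" and t2 == "vp" and t3 == "np" and verb.lower() in COPULAS:
            pairs.append((subj, desc))
    return pairs


# Made-up example sentence: "Karel Čapek byl český spisovatel".
example = [
    ("np", "Karel Čapek"),
    ("vp", "byl"),
    ("np", "český spisovatel"),
]

for person, description in who_was_who(example):
    # Tab-separated output is an assumption; match wiki.output instead.
    print(f"{person}\t{description}")
```

A three-phrase window keeps the sketch simple, but it misses sentences where other material stands between the subject and the copula, so a real solution will probably need to use more of the syntactic information in the parsed file.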