= Extracting structured information from text =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161 Advanced NLP Course]], Course Guarantor: Aleš Horák

Prepared by: Vojtěch Kovář

== TODO until 31.5.2015 ==

 1. choose particular papers for [[#References|References]] below (that will serve as input for the lecture later on)
 1. prepare the [[#PracticalSession|Practical Session]]

== State of the Art ==

Information extraction (IE) is a technology based on analyzing natural language in order to extract snippets of information. The process takes texts (and sometimes speech) as input and produces fixed-format, unambiguous data as output. This data may be displayed directly to users, stored in a database or spreadsheet for later analysis, or used for indexing in information retrieval (IR) applications such as Internet search engines like Google.

=== References ===

 1. Cunningham, Hamish. [https://gate.ac.uk/sale/ell2/ie/ An Introduction to Information Extraction]. Encyclopedia of Language and Linguistics, 2nd Edition. Elsevier, 2005.
 1. Chang, Chia-Hui, et al. [https://www.researchgate.net/profile/Khaled_Shaalan/publication/200110627_A_Survey_of_Web_Information_Extraction_Systems/links/0912f50abd8c6b314d000000.pdf A Survey of Web Information Extraction Systems]. IEEE Transactions on Knowledge and Data Engineering 18.10 (2006).
 1. Banko, Michele, et al. [http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-429.pdf Open information extraction for the web]. IJCAI. Vol. 7. 2007.
 1. Fader, Anthony, Soderland, Stephen and Etzioni, Oren. [http://dl.acm.org/citation.cfm?id=2145596 Identifying relations for open information extraction]. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11). Association for Computational Linguistics, Stroudsburg, PA, USA, 2011.

== Practical Session ==

Concrete description of the work assignment for students for the second one-hour part of the lecture.
The work will consist of tasks involving practical implementations of algorithms related to the current topic (probably not the state-of-the-art algorithms mentioned in the first part), applied to real data. Students can test the algorithms, evaluate them, and possibly try short adaptations for various subtasks. Students may also be required to produce results of their work and hand them in as proof of completing the tasks.
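
As a possible warm-up, the input/output contract of information extraction (free text in, fixed-format records out) can be illustrated with a minimal pattern-based sketch. Everything here is an assumption made for illustration: the `Affiliation` record, the `extract_affiliations` function, the hand-written "works at/for" pattern, and the sample sentences are hypothetical and are not taken from any of the systems in the references.

```python
import re
from typing import List, NamedTuple


class Affiliation(NamedTuple):
    """Fixed-format output record (hypothetical schema)."""
    person: str
    organization: str


# Assumption: a name is a run of capitalized words separated by single spaces.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"
# Hand-written pattern: "<Person> works at/for <Organization>".
PATTERN = re.compile(rf"(?P<person>{NAME}) works (?:at|for) (?P<org>{NAME})")


def extract_affiliations(text: str) -> List[Affiliation]:
    """Return one record per pattern match found in the free text."""
    return [Affiliation(m["person"], m["org"]) for m in PATTERN.finditer(text)]


sample = (
    "Yesterday we learned that Jane Doe works at Example Labs. "
    "According to the report, John Smith works for Acme Corp since 2010."
)
records = extract_affiliations(sample)
for record in records:
    print(record.person, "|", record.organization)
```

A single hand-written pattern like this is precise but brittle: it misses paraphrases such as "is employed by" and names that do not start with an ASCII capital letter. Measuring such misses on real data is the kind of evaluation the tasks above refer to.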