InformationExtraction

Timestamp:: Nov 15, 2017, 7:09:21 PM (8 years ago)
Author:: Zuzana Nevěřilová
Comment:: --

private/NlpInPracticeCourse/InformationExtraction

-                      v11
+                      v12
 . Banko, Michele, et al. [http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-429.pdf Open information extraction for the web]. IJCAI. Vol. 7. 2007.
 . Fader, Anthony, Soderland, Stephen and Etzioni, Oren. [http://dl.acm.org/citation.cfm?id=2145596 Identifying relations for open information extraction]. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11). Association for Computational Linguistics, Stroudsburg, PA, USA, 2011.
+. Piskorski, J. and Yangarber, R. Information Extraction: Past, Present and Future, pages 23–49. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
 == Practical Session ==
 You are given a few [raw-attachment:wiki.txt:wiki:private/AdvancedNlpCourse/InformationExtraction short excerpts from Czech Wikipedia] as [attachment:wiki.txt plain text]. They were processed by automatic sentence detection, tokenization (unitok tool), morphological analysis and tagging (desamb tool), and syntactic analysis (SET tool, with the --long-phrases option), and [raw-attachment:wiki.phrases this is the result].
+We will extract information from news articles using GATE.
+Write a short program in Python which will extract simple information about who was who, from the parsed file. The result should look like [attachment:wiki.output this file].
+. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}} where '''UCO''' is your university ID.
+. Download and install GATE (Java 8 is necessary) from https://gate.ac.uk/download/
+. Run GATE
+. Load ANNIE (with defaults)
+. Create language resources:
+   * right click on Language !Resources/New/GATE Document in the left menu
+   * change {{{markupAware}}} to {{{false}}}
+   * change {{{sourceUrl}}} to {{{stringContent}}} and paste some news text
+   * you can find three sample texts here:
+. Create corpus:
+   * right click on Language !Resources/New/GATE Corpus in the left menu
+   * drag and drop the documents into the corpus
+. Run ANNIE: Click on !Applications/Annie in the left menu, select Corpus
+. Observe the annotated results: click on a document, then on Annotation Sets and/or Annotation List.
+So far, GATE has done little more than Stanford NER did in lecture 04. Note, however, that all tokens are annotated and POS-tagged.
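To make the idea of annotated tokens concrete: GATE stores its results as standoff annotations, each with a type, a start and end offset into the text, and a map of features. The following is only a minimal Python sketch of that data model. The `Annotation` class, the `tokenize` function and the regular expression are hypothetical illustrations, not GATE's API, and no POS tagging is performed.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A standoff annotation: it points into the text instead of copying it."""
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


def tokenize(text):
    """Return one Token annotation per word or punctuation mark (toy tokenizer)."""
    return [
        Annotation("Token", m.start(), m.end(), {"string": m.group()})
        for m in re.finditer(r"\w+|[^\w\s]", text)
    ]


text = "Dr. Smith met Jane."
tokens = tokenize(text)
for a in tokens:
    # The covered text can always be recovered from the offsets.
    print(a.type, a.start, a.end, text[a.start:a.end])
```

Because annotations only reference offsets, several layers (tokens, named entities, job titles) can coexist over the same text without modifying it.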
+Next, we add rules for extracting job titles and the corresponding person names:
+. Right click Processing !Resources/New/JAPE Transducer in the left menu
+. Click on {{{grammarURL}}} and choose the grammar {{{jobtitle.jape}}}
+. Click on !Applications/Annie in the left menu and add the JAPE Transducer to the ANNIE pipeline
+. Run ANNIE again: Click on !Applications/Annie in the left menu, select Corpus
+. Observe the annotated results: click on a document, then on Annotation Sets and/or Annotation List. If the text contains job titles, you will see the new annotation type JobTitle.
+. Examine the grammars {{{jobtitle.jape}}} and {{{jobtitleperson.jape}}}
+Add the new grammar {{{jobtitleperson.jape}}} and observe the results.
+Optionally, you can add further documents and observe how universal the {{{jobtitleperson.jape}}} grammar is.
+Write your observations to {{{<YOUR_FILE>}}}.
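As a rough analogue of what a rule pairing a job title with a person name does, here is a minimal Python sketch over plain text. It is not a translation of {{{jobtitleperson.jape}}}: the `JOB_TITLES` list, the `job_title_person` function and the approximation of a person name as one to three capitalized ASCII words are all assumptions made for illustration.

```python
import re

# Hypothetical, tiny job title list; the vocabulary of the real grammar may differ.
JOB_TITLES = ["president", "prime minister", "director", "professor"]

# Longer titles first, so that multi-word titles are preferred.
TITLE = "|".join(re.escape(t) for t in sorted(JOB_TITLES, key=len, reverse=True))

# Title matched case-insensitively, followed by a crude "person":
# one to three capitalized ASCII words (an assumption, not real NER).
PATTERN = re.compile(
    rf"\b(?P<title>(?i:{TITLE}))\s+(?P<person>[A-Z][a-z]+(?:\s+[A-Z][a-z]+){{0,2}})"
)


def job_title_person(text):
    """Return (job title, person) pairs found in the text."""
    return [(m.group("title"), m.group("person")) for m in PATTERN.finditer(text)]


sample = "French president Emmanuel Macron met Prime Minister Theresa May in Paris."
for title, person in job_title_person(sample):
    print(title, "->", person)
```

A sketch like this misses names with diacritics and titles that follow the name, which is the kind of limitation worth looking for when judging how universal a hand-written grammar is.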
 You may modify or draw inspiration from [raw-attachment:demo.py this demo script].