Changes between Version 11 and Version 12 of private/NlpInPracticeCourse/InformationExtraction


Timestamp:
Nov 15, 2017, 7:09:21 PM (6 years ago)
Author:
Zuzana Nevěřilová
Comment:

--

  • private/NlpInPracticeCourse/InformationExtraction

    v11 v12  
    2424 1. Banko, Michele, et al. [http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-429.pdf Open information extraction for the web]. IJCAI. Vol. 7. 2007.
    2525 1. Fader, Anthony, Soderland, Stephen and Etzioni, Oren. [http://dl.acm.org/citation.cfm?id=2145596 Identifying relations for open information extraction]. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11). Association for Computational Linguistics, Stroudsburg, PA, USA, 2011.
     26 1. Piskorski, J. and Yangarber, R. Information Extraction: Past, Present and Future, pages 23–49. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
    2627
    2728== Practical Session ==
    2829
    29 You are given a few [raw-attachment:wiki.txt:wiki:private/AdvancedNlpCourse/InformationExtraction short excerpts from Czech Wikipedia] as [attachment:wiki.txt plain text]. They were analyzed by automatic sentence detection, tokenization (unitok tool), morphological analysis and tagging (desamb tool), and syntactic analysis (SET tool, with --long-phrases option) and [raw-attachment:wiki.phrases this is the result].
     30We will extract information from news articles using GATE.
    3031
    31 Write a short program in Python which will extract simple information about who was who, from the parsed file. The result should look like [attachment:wiki.output this file].
     32 1. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}} where '''UCO''' is your university ID.
     33 1. Download and install GATE (Java 8 is required) from https://gate.ac.uk/download/
     34 1. Run GATE
     35 1. Load ANNIE (with defaults)
     36 1. Create language resources:
     37   * right click on Language !Resources/New/GATE Document in the left menu
     38   * change {{{markupAware}}} to {{{false}}}
     39   * change {{{sourceUrl}}} to {{{stringContent}}} and paste some news text
     40   * you can find three sample texts here:
     41 1. Create corpus:
     42   * right click on Language !Resources/New/GATE Corpus in the left menu
     43   * drag and drop the documents to put them into the corpus
     44 1. Run ANNIE: Click on !Applications/Annie in the left menu, select Corpus
     45 1. Observe the annotated results, click on a document, then Annotation Sets and/or Annotation List.
     46
     47So far, GATE has not done much more than Stanford NER did in lecture 04. Note, however, that all tokens are annotated and POS-tagged.
     48
     49We add rules for extracting job titles and the respective person names:
     50
     51 1. Right click Processing !Resources/New/JAPE Transducer in the left menu
     52 1. Click on {{{grammarURL}}} and choose the grammar {{{jobtitle.jape}}}
     53 1. Click on !Applications/Annie in the left menu and add the JAPE Transducer to the ANNIE pipeline
     54 1. Run ANNIE again: Click on !Applications/Annie in the left menu, select Corpus
     55 1. Observe the annotated results: click on a document, then on Annotation Sets and/or Annotation List. Where the grammar matches, you can see the new annotation JobTitle.
     56 1. Inspect the grammars {{{jobtitle.jape}}} and {{{jobtitleperson.jape}}}
     57
     58Add the new grammar {{{jobtitleperson.jape}}} and observe the results.
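To get a feeling for what a rule that links job titles to person names might do, here is a minimal Python sketch. It does not use GATE or JAPE, and the contents of {{{jobtitleperson.jape}}} are not shown on this page, so the matching rule below (a JobTitle annotation followed almost immediately by a Person annotation) is only an assumption. The names {{{Annotation}}}, {{{annotate}}}, {{{pair_titles_with_persons}}} and {{{max_gap}}} are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: a type plus character offsets into the text."""
    type: str   # e.g. "JobTitle" or "Person"
    start: int
    end: int


def annotate(text, phrase, ann_type):
    """Hypothetical helper: annotate the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Annotation(ann_type, start, start + len(phrase))


def pair_titles_with_persons(text, annotations, max_gap=2):
    """Pair each JobTitle with the Person annotation that directly follows it.

    Assumption: the two annotations may be separated only by whitespace
    and an optional comma, at most max_gap characters in total.
    """
    ordered = sorted(annotations, key=lambda a: a.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.type == "JobTitle" and right.type == "Person":
            gap = text[left.end:right.start]
            if len(gap) <= max_gap and gap.strip() in ("", ","):
                pairs.append((text[left.start:left.end],
                              text[right.start:right.end]))
    return pairs


# Made-up sample sentence with hand-made annotations.
TEXT = "Chief executive Jane Doe met finance minister John Smith."
ANNOTATIONS = [
    annotate(TEXT, "Chief executive", "JobTitle"),
    annotate(TEXT, "Jane Doe", "Person"),
    annotate(TEXT, "finance minister", "JobTitle"),
    annotate(TEXT, "John Smith", "Person"),
]

print(pair_titles_with_persons(TEXT, ANNOTATIONS))
```

The sketch works on offsets rather than on strings, which mirrors how the Annotation List in GATE shows each annotation with a type and a start and end position. When you inspect the real grammar, compare its pattern with this assumed adjacency rule.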
     59
     60Optionally, you can add further documents and observe how universal the {{{jobtitleperson.jape}}} grammar is.
     61
     62Write your observations to {{{<YOUR_FILE>}}}.
     63
    3264
    3365You may modify or draw inspiration from [raw-attachment:demo.py this demo script].