Changes between Version 13 and Version 14 of private/NlpInPracticeCourse/InformationExtraction


Timestamp:
Nov 17, 2017, 9:24:19 AM (6 years ago)
Author:
Zuzana Nevěřilová
Comment:

--

  • private/NlpInPracticeCourse/InformationExtraction

    v13 v14  
    3131
    3232 1. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}} where '''UCO''' is your university ID.
    33  1. Download and install GATE (Java 8 is necessary) from https://gate.ac.uk/download/
    34  1. Run GATE
    35  1. Load ANNIE (with defaults)
    36  1. Create language resources:
     33 1. Download and install GATE (Java 8 is necessary) from https://gate.ac.uk/download/ ({{{java -jar gate-<VERSION>-installer.jar}}})
     34 1. Run GATE ({{{GATE_Developer_<VERSION>/bin/gate.sh}}})
     35 1. Load ANNIE (with defaults), read about its components
     36 1. Create document(s):
    3737   * right click on Language !Resources/New/GATE Document in the left menu
    3838   * change {{{markupAware}}} to {{{false}}}
    3939   * change {{{sourceUrl}}} to {{{stringContent}}} and paste some news text
     40   * repeat these steps
    4041   * you can find three sample texts here: [raw-attachment:text1.txt text1.txt], [raw-attachment:text2.txt text2.txt], [raw-attachment:text3.txt text3.txt]
    4142 1. Create corpus:
     
    4546 1. Observe the annotated results, click on a document, then Annotation Sets and/or Annotation List.
    4647
    47 So far, GATE did not much more than Stanford NER in lecture 04. Note, however, that all tokens are annotated and POS-tagged.
     48 So far, GATE did not much more than Stanford NER in lecture 04. Note, however, that all tokens are annotated and POS-tagged. Also note the annotation type Lookup.
    4849
    4950 We add rules for extracting job titles and the respective person names. The rules are defined in the grammars [raw-attachment:jobtitle.jape] and [raw-attachment:jobtitleperson.jape]
    5051
    5152 1. Right click Processing !Resources/New/JAPE Transducer in the left menu
    52  1. Click on {{{grammmarUrl}}} and choose grammar {{{jobtitle.jape}}}
     53 1. Download the grammar(s).
     54 1. Click on {{{grammmarUrl}}} and choose the grammar file {{{jobtitle.jape}}}
    5355 1. Click on !Applications/Annie in the left menu and add the JAPE Transducer to the ANNIE pipeline
    54  1. Run ANNIE again: Click on !Applications/Annie in the left menu, select Corpus
     56 1. Run ANNIE again: Click on !Applications/Annie in the left menu
    5557 1. Observe the annotated results, click on a document, then Annotation Sets and/or Annotation List. If applicable, you can see new annotation JobTitle.
    56  1. Observer the grammars {{{jobtitle.jape}}} and {{{jobtitleperson.jape}}}
     58 1. Observe the grammars {{{jobtitle.jape}}} and {{{jobtitleperson.jape}}}
    5759
    5860 Add new transducer with the grammar {{{jobtitleperson.jape}}} and observe the results.
     
    6062 Optionally, you can add further documents and observe how universal the {{{jobtitleperson.jape}}} grammar is.
    6163
    62 Write your observations to {{{<YOUR_FILE>}}}.
     64 Write your observations to {{{<YOUR_FILE>}}}: Particularly, comment how well the  Gazetteer and NE Transducer perform, describe how well the grammar works. Note that no coreference resolution is used (optionally, you can try one).
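
The contents of {{{jobtitle.jape}}} and {{{jobtitleperson.jape}}} are not shown in this diff. As a rough illustration only of what a "job title followed by a person name" rule might do, here is a minimal Python sketch. It is not GATE or JAPE code. The title list {{{JOB_TITLES}}}, the function {{{extract_job_title_person}}}, and the adjacency pattern are all hypothetical assumptions, and the capitalised-word heuristic is a crude stand-in for ANNIE's Person annotation.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-gazetteer; a real gazetteer list would be far larger.
JOB_TITLES = ["prime minister", "chief executive", "president", "director", "spokesman"]


@dataclass
class JobTitlePerson:
    title: str
    person: str


def extract_job_title_person(text):
    """Return (title, person) pairs where a known job title directly precedes a name."""
    # Longest titles first so multi-word titles win over shorter alternatives.
    titles = "|".join(re.escape(t) for t in sorted(JOB_TITLES, key=len, reverse=True))
    # The title is matched case-insensitively; the person is assumed to be
    # one or more capitalised ASCII words (names with diacritics are missed).
    pattern = re.compile(
        r"\b(?i:(?P<title>" + titles + r"))\s+"
        r"(?P<person>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
    )
    return [JobTitlePerson(m.group("title"), m.group("person"))
            for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "Yesterday, Prime Minister Theresa May met director John Smith in London."
    for hit in extract_job_title_person(sample):
        print(hit.title, "->", hit.person)
```

Comparing the output of a sketch like this with the JobTitle annotations GATE produces on the same text can help when writing the observations, for example in cases where the title follows the name or where the person is referred to only by a pronoun.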