Changes between Version 14 and Version 15 of private/NlpInPracticeCourse/InformationExtraction
- Timestamp: Nov 20, 2017, 11:15:41 AM
private/NlpInPracticeCourse/InformationExtraction
--- v14
+++ v15
@@ -31,9 +31,15 @@
 
  1. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}} where '''UCO''' is your university ID.
- 1. Download and install GATE (Java 8 is necessary) from https://gate.ac.uk/download/ ({{{java -jar gate-<VERSION>-installer.jar}}})
- 1. Run GATE ({{{GATE_Developer_<VERSION>/bin/gate.sh}}})
+ 1. Download and install GATE (Java 8 is necessary) from https://gate.ac.uk/download/
+ {{{
+ java -jar gate-<VERSION>-installer.jar
+ }}}
+ 1. Run GATE
+ {{{
+ GATE_Developer_<VERSION>/bin/gate.sh
+ }}}
  1. Load ANNIE (with defaults), read about its components
  1. Create document(s):
-   * right click on Language !Resources/New/GATE Document in the left menu
+   * right click on `Language Resources/New/GATE Document` in the left menu
    * change {{{markupAware}}} to {{{false}}}
    * change {{{sourceUrl}}} to {{{stringContent}}} and paste some news text
@@ -41,19 +47,19 @@
    * you can find three sample texts here: [raw-attachment:text1.txt text1.txt], [raw-attachment:text2.txt text2.txt], [raw-attachment:text3.txt text3.txt]
  1. Create corpus:
-   * right click on Language !Resources/New/GATE Corpus in the left menu
+   * right click on `Language Resources/New/GATE Corpus` in the left menu
    * drag and drop the document in order to put them into the corpus
- 1. Run ANNIE: Click on !Applications/Annie in the left menu, select Corpus
- 1. Observe the annotated results, click on a document, then Annotation Sets and/or Annotation List.
+ 1. Run ANNIE: Click on `Applications/Annie` in the left menu, select `Corpus`
+ 1. Observe the annotated results, click on a document, then `Annotation Sets` and/or `Annotation List`.
 
 So far, GATE did not do much more than Stanford NER in lecture 04. Note, however, that all tokens are annotated and POS-tagged. Also note the annotation type Lookup.
 
-We add rules for extracting job titles and the respective person names. The rules are defined in the grammars [raw-attachment:jobtitle.jape] and [raw-attachment:jobtitleperson.jape]
+We add rules for extracting ''job titles'' and the respective ''person names''. The rules are defined in the grammars [raw-attachment:jobtitle.jape] and [raw-attachment:jobtitleperson.jape]
 
- 1. Right click Processing !Resources/New/JAPE Transducer in the left menu
+ 1. Right click `Processing Resources/New/JAPE Transducer` in the left menu
  1. Download the grammar(s).
  1. Click on {{{grammarURL}}} and choose the grammar file {{{jobtitle.jape}}}
- 1. Click on !Applications/Annie in the left menu and add the JAPE Transducer to the ANNIE pipeline
- 1. Run ANNIE again: Click on !Applications/Annie in the left menu
- 1. Observe the annotated results, click on a document, then Annotation Sets and/or Annotation List. If applicable, you can see new annotation JobTitle.
+ 1. Click on `Applications/Annie` in the left menu and add the JAPE Transducer to the ANNIE pipeline
+ 1. Run ANNIE again: Click on `Applications/Annie` in the left menu
+ 1. Observe the annotated results, click on a document, then `Annotation Sets` and/or `Annotation List`. If applicable, you can see new annotation `JobTitle`.
  1. Observe the grammars {{{jobtitle.jape}}} and {{{jobtitleperson.jape}}}
 
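
For orientation, the `JobTitle` annotation mentioned in the steps is the kind of result a JAPE rule over `Lookup` annotations produces. The attached {{{jobtitle.jape}}} is not reproduced in this diff, so the following is only a minimal sketch of what such a grammar might look like. It assumes that the ANNIE gazetteer marks job titles with `majorType == jobtitle`; the phase name, rule name and label are illustrative and may differ from the attachment.

{{{
// Hypothetical sketch; the attached jobtitle.jape may differ.
Phase: JobTitle
Input: Lookup
Options: control = appelt

// Annotate every gazetteer Lookup whose majorType is "jobtitle".
Rule: JobTitle
(
  {Lookup.majorType == jobtitle}
):jobtitle
-->
  :jobtitle.JobTitle = {rule = "JobTitle"}
}}}

Restricting `Input` to `Lookup` means the rule only sees gazetteer matches, which is why the `Lookup` annotation type is worth noting in the ANNIE output. The right-hand side creates a new `JobTitle` annotation over the matched span and records which rule fired.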