Changes between Version 13 and Version 14 of private/NlpInPracticeCourse/InformationExtraction
- Timestamp:
- Nov 17, 2017, 9:24:19 AM (6 years ago)
private/NlpInPracticeCourse/InformationExtraction
v13  v14
 31   31
 32   32    1. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}} where '''UCO''' is your university ID.
 33       - 1. Download and install GATE (Java 8 is necessary) from https://gate.ac.uk/download/
 34       - 1. Run GATE
 35       - 1. Load ANNIE (with defaults)
 36       - 1. Create language resources:
      33  + 1. Download and install GATE (Java 8 is necessary) from https://gate.ac.uk/download/ ({{{java -jar gate-<VERSION>-installer.jar}}})
      34  + 1. Run GATE ({{{GATE_Developer_<VERSION>/bin/gate.sh}}})
      35  + 1. Load ANNIE (with defaults), read about its components
      36  + 1. Create document(s):
 37   37     * right click on Language !Resources/New/GATE Document in the left menu
 38   38     * change {{{markupAware}}} to {{{false}}}
 39   39     * change {{{sourceUrl}}} to {{{stringContent}}} and paste some news text
      40  +  * repeat these steps
 40   41     * you can find three sample texts here: [raw-attachment:text1.txt text1.txt], [raw-attachment:text2.txt text2.txt], [raw-attachment:text3.txt text3.txt]
 41   42    1. Create corpus:
  …    …
 45   46    1. Observe the annotated results, click on a document, then Annotation Sets and/or Annotation List.
 46   47
 47       - So far, GATE did not much more than Stanford NER in lecture 04. Note, however, that all tokens are annotated and POS-tagged.
      48  + So far, GATE did not much more than Stanford NER in lecture 04. Note, however, that all tokens are annotated and POS-tagged. Also note the annotation type Lookup.
 48   49
 49   50    We add rules for extracting job titles and the respective person names. The rules are defined in the grammars [raw-attachment:jobtitle.jape] and [raw-attachment:jobtitleperson.jape]
 50   51
 51   52    1. Right click Processing !Resources/New/JAPE Transducer in the left menu
 52       - 1. Click on {{{grammmarUrl}}} and choose grammar {{{jobtitle.jape}}}
      53  + 1. Download the grammar(s).
      54  + 1. Click on {{{grammmarUrl}}} and choose the grammar file {{{jobtitle.jape}}}
 53   55    1. Click on !Applications/Annie in the left menu and add the JAPE Transducer to the ANNIE pipeline
 54       - 1. Run ANNIE again: Click on !Applications/Annie in the left menu , select Corpus
      56  + 1. Run ANNIE again: Click on !Applications/Annie in the left menu
 55   57    1. Observe the annotated results, click on a document, then Annotation Sets and/or Annotation List. If applicable, you can see new annotation JobTitle.
 56       - 1. Observe rthe grammars {{{jobtitle.jape}}} and {{{jobtitleperson.jape}}}
      58  + 1. Observe the grammars {{{jobtitle.jape}}} and {{{jobtitleperson.jape}}}
 57   59
 58   60    Add new transducer with the grammar {{{jobtitleperson.jape}}} and observe the results.
  …    …
 60   62    Optionally, you can add further documents and observe how universal the {{{jobtitleperson.jape}}} grammar is.
 61   63
 62       - Write your observations to {{{<YOUR_FILE>}}} .
      64  + Write your observations to {{{<YOUR_FILE>}}}: Particularly, comment how well the Gazetteer and NE Transducer perform, describe how well the grammar works. Note that no coreference resolution is used (optionally, you can try one).
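To make the idea behind the job-title rules more concrete, here is a minimal sketch of the same kind of matching in plain Python. It is not the content of {{{jobtitle.jape}}} or {{{jobtitleperson.jape}}} (those attachments are not reproduced here) and it does not use GATE at all: a regular expression stands in for the JAPE pattern, the list {{{JOB_TITLES}}} is a hypothetical stand-in for a gazetteer, and "two or more capitalised words" is a naive stand-in for a Person annotation. The function name {{{find_job_title_persons}}} is likewise made up for this illustration.

```python
import re

# Hypothetical mini-gazetteer of job titles (an assumption for illustration;
# a real gazetteer list would be much larger).
JOB_TITLES = ["chief executive", "prime minister", "president", "spokesman", "director"]

# Naive stand-in for a Person annotation: two or more capitalised words.
PERSON = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"


def find_job_title_persons(text):
    """Return (job_title, person) pairs where a known title directly precedes a name."""
    # Try longer titles first so multi-word titles are preferred.
    titles = "|".join(
        re.escape(t) for t in sorted(JOB_TITLES, key=len, reverse=True)
    )
    # The title is matched case-insensitively; the person name is case-sensitive.
    pattern = re.compile(
        r"\b(?P<title>(?i:" + titles + r"))\b,? (?P<person>" + PERSON + r")"
    )
    return [(m.group("title"), m.group("person")) for m in pattern.finditer(text)]


sample = (
    "The company said chief executive John Smith will resign. "
    "President Emmanuel Macron met director Anna Lee."
)
for title, person in find_job_title_persons(sample):
    print(title, "->", person)
```

A sentence such as "He was named president." yields no pair, because the sketch only links a title to a name that immediately follows it. This is the same kind of gap the note about missing coreference resolution points at, and it is worth looking for when you judge how well the grammar works on your documents.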