30 | | 1. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}} where '''UCO''' is your university ID. |
31 | | 1. Download and install GATE (Java 8 is required) from https://gate.ac.uk/download/. Use either the MS installer or the Java installer, which you can run as an app or from the command line:
32 | | {{{ |
33 | | java -jar gate-<VERSION>-installer.jar |
34 | | }}} |
35 | | 1. Run GATE |
36 | | {{{ |
37 | | GATE_Developer_<VERSION>/bin/gate.sh |
38 | | }}} |
39 | | 1. Load ANNIE (with defaults) and read about its components [[br]]
40 | | [[Image(annie.png)]] |
41 | | 1. Create document(s): |
42 | | * right click on `Language Resources/New/GATE Document` in the left menu |
43 | | * change {{{markupAware}}} to {{{false}}} |
44 | | * change {{{sourceUrl}}} to {{{stringContent}}} and paste some news text |
45 | | * repeat these steps |
46 | | * you can find three sample texts here: [raw-attachment:text1.txt text1.txt], [raw-attachment:text2.txt text2.txt], [raw-attachment:text3.txt text3.txt] |
47 | | 1. Create corpus: |
48 | | * right click on `Language Resources/New/GATE Corpus` in the left menu |
49 | | * drag and drop the documents to put them into the corpus
50 | | 1. Run ANNIE: Click on `Applications/Annie` in the left menu, select `Corpus` |
51 | | 1. Observe the annotated results: click on a document, then `Annotation Sets` and/or `Annotation List`.
| 31 | The task uses a Python notebook run in a web browser in the [https://colab.research.google.com/ Google Colaboratory] environment
| 32 | with MU G-Suite disk access.
55 | | We add rules for extracting ''job titles'' and the respective ''person names''. The rules are defined in the grammars [raw-attachment:jobtitle.jape] and [raw-attachment:jobtitleperson.jape] |
| 37 | 1. Create {{{<YOUR_FILE>}}}, a text file named {{{ia161-UCO-08.txt}}} where '''UCO''' is your university ID. |
| 38 | 1. Access the [https://colab.research.google.com/drive/1lHphWGR-i6P7HqTJ_39Eo8FnPe8OJuSD Python notebook in the Google Colab environment] and make your own copy. Do not forget to save your work if you want to see your changes later; leaving the browser will throw away all changes!
| 39 | 1. The notebook reads the file {{{input.txt}}} (each line is word|definition) and outputs a hypernym for each word.
| 40 | 1. The default approach is naive: ''the first noun in the definition is the hypernym''.
| 41 | 1. Using the gold standard, evaluate the naive approach. |
| 42 | 1. Improve the {{{find_hyper()}}} function to provide better results. Evaluate the new version. |
| 43 | 1. Copy the updated function {{{find_hyper()}}} and the output into {{{<YOUR_FILE>}}}. Please don't submit the whole notebook. |
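The naive rule from the steps above can be sketched as follows. This is only an illustration under stated assumptions, not the notebook's actual code: the helper names `parse_line` and `first_noun` are hypothetical, and the definition is assumed to be already tokenized and tagged with Penn Treebank style tags (noun tags start with `NN`). How the notebook tags words, and the real signature of {{{find_hyper()}}}, are not shown here.

```python
def parse_line(line):
    """Split one 'word|definition' input line into its two parts."""
    word, definition = line.rstrip("\n").split("|", 1)
    return word.strip(), definition.strip()


def first_noun(tagged_tokens):
    """Return the first token whose tag starts with 'NN', or None.

    tagged_tokens is a list of (token, tag) pairs; the tagging itself
    is assumed to be done elsewhere (hypothetical input format).
    """
    for token, tag in tagged_tokens:
        if tag.startswith("NN"):
            return token
    return None


# Hand-tagged example, so that no tagger is needed to run the sketch.
word, definition = parse_line("dog|a domesticated animal")
tagged = [("a", "DT"), ("domesticated", "JJ"), ("animal", "NN")]
hypernym = first_noun(tagged)
```

A typical weakness of this rule is a definition such as "a kind of tree", where the first noun ("kind") is not the hypernym; handling such patterns is one possible direction for improving {{{find_hyper()}}}.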
57 | | 1. Right click `Processing Resources/New/JAPE Transducer` in the left menu |
58 | | 1. Download the grammar(s). |
59 | | 1. Click on {{{grammarURL}}} and choose the grammar file {{{jobtitle.jape}}}
60 | | 1. Click on `Applications/Annie` in the left menu and add the JAPE Transducer to the ANNIE pipeline (Selected Processing Resources) |
61 | | 1. Run ANNIE again: Click on `Applications/Annie` in the left menu |
62 | | 1. Observe the annotated results: click on a document, then `Annotation Sets` and/or `Annotation List`. If applicable, you can see the new annotation `JobTitle`.
63 | | 1. Observe the grammars {{{jobtitle.jape}}} and {{{jobtitleperson.jape}}} |
64 | | 1. Add a new transducer with the grammar {{{jobtitleperson.jape}}} and observe the results.
65 | | 1. Optionally, you can add further documents and observe how universal the {{{jobtitleperson.jape}}} grammar is. |
66 | | 1. Using the above grammars as a model, write your own grammar that extracts new relations (e.g. job title in company or person works in company).
| 45 | Gold standard to evaluate your result: [[raw-attachment:gold_en.txt|gold_en.txt]] |
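One simple way to evaluate against the gold standard is exact-match accuracy, sketched below. This is an assumption about the evaluation, not the notebook's own procedure: the function name `accuracy` is hypothetical, and both the predictions and the gold data are assumed to be already loaded into dictionaries mapping each word to one hypernym (the layout of {{{gold_en.txt}}} is not described here).

```python
def accuracy(predicted, gold):
    """Fraction of gold words whose predicted hypernym matches exactly.

    Both arguments are dicts mapping word -> hypernym (assumed format).
    Words missing from `predicted` count as wrong.
    """
    if not gold:
        return 0.0
    correct = sum(1 for word, hyper in gold.items()
                  if predicted.get(word) == hyper)
    return correct / len(gold)


# Toy data for illustration only; these are not entries from the gold file.
gold = {"dog": "animal", "oak": "tree"}
predicted = {"dog": "animal", "oak": "kind"}
score = accuracy(predicted, gold)  # 0.5: one of two hypernyms matches
```

Computing the same score before and after changing {{{find_hyper()}}} gives a direct comparison between the naive approach and the improved version.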