SummarizationEvaluationManual, version 1 (Aug 29, 2022, 10:52:52 AM), author: Ales Horak
= Evaluation of the output of GPT-2 abstract summarization =

[[Image(VyhodnoceniSumarizaceManual:sum_anot2.png,width=50%,right)]]

== Annotation Manual ==

The goal is to find and classify errors in machine-generated summarizations of Czech newspaper articles. We are therefore not evaluating the quality of the summarization in terms of conciseness; we are concerned only with the mechanism and nature of any errors.

=== Technical assumptions ===

The annotation is performed on the Qualtrics questionnaire platform (on either a desktop or a mobile device).

Each assignment consists of three sections: Input Text, Gold, and Generated. We evaluate only the Generated summarization with respect to the Input Text. The Gold summarization can provide context for better understanding, but note that it was not used to produce the Generated summarization, so it must not influence the evaluation.

The answer table has four columns and either 1 row (for a generated title) or 3 rows (for a generated abstract). The rows "Sentence1", "Sentence2", ... refer to the corresponding sentences in the Generated section, each marked with "•".

! Each of the columns Special cases, Mapping, and Meaning may have at most one checkbox checked (e.g. only one of OK, Repetitive, or Sentence missing) !
! For each sentence, fill in either the first column (Special cases) OR the remaining ones (Mapping, Meaning) !

(Unfortunately, the form cannot enforce these rules, so please be careful; otherwise the answer will not be valid.)
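
Because the form cannot enforce the two rules above, a quick self-check of each row may help. The following Python sketch is only an illustration under stated assumptions: `SentenceAnnotation` and `validation_errors` are hypothetical names and do not belong to Qualtrics or to any export format. Treating a completely empty row as invalid is our reading of the rules, not something the manual states explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SentenceAnnotation:
    """Hypothetical record for one answer-table row (one generated sentence).

    Each column holds the labels of the checkboxes ticked in it.
    """
    special_cases: List[str] = field(default_factory=list)
    mapping: List[str] = field(default_factory=list)
    meaning: List[str] = field(default_factory=list)
    mistake_explanation: Optional[str] = None


def validation_errors(row: SentenceAnnotation) -> List[str]:
    """Return the rule violations found in one row; an empty list means valid."""
    errors: List[str] = []

    # Rule 1: at most one checkbox per column.
    for name, ticked in (("Special cases", row.special_cases),
                         ("Mapping", row.mapping),
                         ("Meaning", row.meaning)):
        if len(ticked) > 1:
            errors.append(f"{name}: more than one checkbox checked")

    # Rule 2: Special cases OR (Mapping, Meaning), never both.
    uses_special = bool(row.special_cases)
    uses_error_columns = bool(row.mapping or row.meaning)
    if uses_special and uses_error_columns:
        errors.append("both Special cases and Mapping/Meaning are filled in")

    # Assumption (our reading, not stated in the manual): an empty row is invalid.
    if not uses_special and not uses_error_columns:
        errors.append("row is empty")

    return errors


if __name__ == "__main__":
    mixed = SentenceAnnotation(special_cases=["OK"], mapping=["Omission"])
    print(validation_errors(mixed))
```

For example, a row with `OK` under Special cases and `Omission` under Mapping is reported as invalid because it mixes the two alternatives.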

After processing all the texts, submit the results using the [[Image(VyhodnoceniSumarizaceManual:button.png)]] button.

=== Explanation of annotation values ===
`Special cases`:
   - used when classifying an error does not make sense
   - ! if this column is filled in, the other columns must be left empty for that sentence
     1. `OK`: we found no grammatical or factual error in the sentence, given the Input Text and the rest of the Generated summarization.
     2. `Repetitive`: the sentence has already occurred in the Generated summarization, or one of the previous sentences of the summarization had exactly the SAME meaning. Apart from the repetition, the sentence contains no factual or grammatical errors.
     3. `Sentence missing`: the Generated summarization has the wrong number of sentences (e.g. the abstract has only two sentences (•), so the row for Sentence3 is marked with the special case `Sentence missing`)
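
Whether a row gets `Sentence missing` depends on how many "•" sentences the Generated section contains compared with the expected number of rows (1 for a title, 3 for an abstract). A minimal Python sketch, assuming each generated sentence is preceded by "•" in plain text; `split_generated` and `rows_to_fill` are hypothetical helpers for illustration only:

```python
from typing import List, Tuple

BULLET = "\u2022"  # the "•" marker assumed to precede each generated sentence


def split_generated(generated: str) -> List[str]:
    """Split the Generated section into sentences at the bullet markers."""
    return [part.strip() for part in generated.split(BULLET) if part.strip()]


def rows_to_fill(generated: str, is_title: bool) -> List[Tuple[str, str]]:
    """Pair each answer-table row with 'annotate' or 'Sentence missing'."""
    expected_rows = 1 if is_title else 3  # 1 row for a title, 3 for an abstract
    sentences = split_generated(generated)
    rows = []
    for i in range(expected_rows):
        status = "annotate" if i < len(sentences) else "Sentence missing"
        rows.append((f"Sentence{i + 1}", status))
    return rows


if __name__ == "__main__":
    abstract = "\u2022 First sentence. \u2022 Second sentence."
    for row, status in rows_to_fill(abstract, is_title=False):
        print(row, status)
```

For an abstract with only two "•" sentences, the sketch marks the `Sentence3` row as `Sentence missing`, matching the example given for that special case.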
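
`Mapping`:
   - helps to detect the CAUSE of the error
   - operates at the surface level
   - describes how the summarizer manipulates words and sentences of the Input Text so that an error arises in the summary

     1. `Omission`: copying a sentence/phrase but omitting a word/phrase
         - e.g.:
              - Input: (...) ''Trenér Nigel Pearson se obává dalších zranění, zatímco jeho mužstvo pokračuje v boji o přežití **v Premiere League**.'' (...) (EN: ''Coach Nigel Pearson fears further injuries while his team continues its fight for survival in the Premier League.'')
              - Generated:   ''Trenér Nigel Pearson se obává dalších zranění, zatímco jeho mužstvo pokračuje v boji o přežití.'' (EN: ''Coach Nigel Pearson fears further injuries while his team continues its fight for survival.'')
     2. `Wrong combination`: copying parts of several different sentences and combining them incorrectly
         - e.g.:
              - Input: (...) ''Hráči musí házet jídlo na dívku, která se objeví v jedné z devíti děr, a následně zmizí. Pokud hráč dívku mine, začne dívka ztrácet na váze, až nakonec zemře.'' (...) (EN: ''Players must throw food at a girl who appears in one of nine holes and then disappears. If the player misses the girl, she starts losing weight until she eventually dies.'')
              - Generated: ''Hráči musí házet jídlo na dívku, která se objeví v jedné z devíti děr, a následně **zemře**.'' (EN: ''Players must throw food at a girl who appears in one of nine holes and then dies.'')
     3. `Fabrication`: adding one or more new words that cause an error (the words do not appear in the Input Text, so this is not a `Wrong combination`)
         - e.g.:
              - Input: (...) ''Mauresmo, která by měla v srpnu porodit, bude zhruba v osmém měsíci během Wimbledonu toto léto.'' (...) (EN: ''Mauresmo, who is due to give birth in August, will be about eight months pregnant during Wimbledon this summer.'')
              - Generated: ''Mauresmo bude v osmém měsíci těhotenství **se svým prvním dítětem**.'' (EN: ''Mauresmo will be eight months pregnant with her first child.'')
     4. `Lack of rewriting`: sentences are not rewritten correctly (e.g. the sentence lacks necessary context, or a referring expression is replaced by an object other than the original referent)
         - e.g.:
              - Input: (...) ''**Ukázalo se, že korporace může být skutečně stíhána jako osoba.** Je to praxe, kterou Nejvyšší soud prosazuje již více než století.''   (...) (EN: ''It turned out that a corporation can indeed be prosecuted as a person. It is a practice the Supreme Court has upheld for more than a century.'')
              - Generated: ''Je to praxe, kterou Nejvyšší soud prosazuje již více než století.'' (EN: ''It is a practice the Supreme Court has upheld for more than a century.'')
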
`Meaning`:
   - describes the EFFECT of the error
   - ! `Malformed` takes precedence over `Misleading` (`Malformed` errors are less common)
   - categories and types:
    1. `Malformed`: the quality of the sentence puzzles the reader, but the sentence is neither misleading nor false
       a. `Ungrammatical`: a syntactically broken or unnatural sentence; a native speaker would not phrase it that way
       b. `Semantically implausible`: a sentence whose meaning is nonsensical or unnatural
       c. `No meaning can be inferred`:
           - a grammatically correct sentence to which no meaning can be assigned
           - usually associated with `Lack of rewriting`: context is missing, so the sentence loses its meaning
           - e.g.:
                - Input: (...) ''**Ukázalo se, že korporace může být skutečně stíhána jako osoba.** Je to praxe, kterou Nejvyšší soud prosazuje již více než století.''   (...) (EN: ''It turned out that a corporation can indeed be prosecuted as a person. It is a practice the Supreme Court has upheld for more than a century.'')
                - Generated: ''Je to praxe, kterou Nejvyšší soud prosazuje již více než století.'' (EN: ''It is a practice the Supreme Court has upheld for more than a century.'')
    2. `Misleading`: the sentence may lead the reader to incorrect beliefs that cannot be inferred from the article
       a. `Meaning changed, not entailed`: the meaning of the sentence cannot be inferred from the article (in the context of the summarization)
        - e.g.:
             - Input: (...) ''Mauresmo, která by měla v srpnu porodit, bude zhruba v osmém měsíci během Wimbledonu toto léto.'' (...) (EN: ''Mauresmo, who is due to give birth in August, will be about eight months pregnant during Wimbledon this summer.'')
             - Generated: ''Mauresmo bude v osmém měsíci těhotenství **se svým prvním dítětem**.'' (EN: ''Mauresmo will be eight months pregnant with her first child.'')
       b. `Meaning changed, contradiction`: the meaning of the sentence is reversed or DIFFERS from what we infer from the article (in the context of the summarization)
        - e.g.:
             - Input: (...) ''Hráči musí házet jídlo na dívku, která se objeví v jedné z devíti děr, a následně zmizí. Pokud hráč dívku mine, začne dívka ztrácet na váze, až nakonec zemře.'' (...) (EN: ''Players must throw food at a girl who appears in one of nine holes and then disappears. If the player misses the girl, she starts losing weight until she eventually dies.'')
             - Generated: ''Hráči musí házet jídlo na dívku, která se objeví v jedné z devíti děr, a následně **zemře**.'' (EN: ''Players must throw food at a girl who appears in one of nine holes and then dies.'')
       c. `Pragmatic meaning changed`: the sentence acquires a PRAGMATIC meaning that is not present in the article, or a PRAGMATIC meaning present in the article disappears (in the context of the summarization); e.g. the article uses a figurative sentence whose meaning changes or disappears in the summarization (it sounds as if it were meant literally)
        - e.g.:
             - Input: (...) ''Trenér Nigel Pearson se obává dalších zranění, zatímco jeho mužstvo pokračuje v boji o přežití **v Premiere League**.'' (...) (EN: ''Coach Nigel Pearson fears further injuries while his team continues its fight for survival in the Premier League.'')
             - Generated:   ''Trenér Nigel Pearson se obává dalších zranění, zatímco jeho mužstvo pokračuje v boji o přežití.'' (EN: ''Coach Nigel Pearson fears further injuries while his team continues its fight for survival.'')
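
The grouping of `Meaning` types into the two categories, together with the precedence rule, can be written down as a small lookup table. This Python sketch is illustrative only: `MeaningCategory`, `MEANING_TYPES`, and `apply_precedence` are hypothetical names. The manual gives no order among types within one category, so the function only narrows the candidates and leaves the final choice to the annotator.

```python
from enum import Enum
from typing import Dict, Set


class MeaningCategory(Enum):
    MALFORMED = "Malformed"
    MISLEADING = "Misleading"


# Meaning types from the manual, grouped by their category.
MEANING_TYPES: Dict[str, MeaningCategory] = {
    "Ungrammatical": MeaningCategory.MALFORMED,
    "Semantically implausible": MeaningCategory.MALFORMED,
    "No meaning can be inferred": MeaningCategory.MALFORMED,
    "Meaning changed, not entailed": MeaningCategory.MISLEADING,
    "Meaning changed, contradiction": MeaningCategory.MISLEADING,
    "Pragmatic meaning changed": MeaningCategory.MISLEADING,
}


def apply_precedence(candidates: Set[str]) -> Set[str]:
    """Narrow the Meaning types that seem to apply, following the precedence rule.

    If any Malformed type applies, only the Malformed types are kept; otherwise
    the candidates are returned unchanged. Unknown labels raise KeyError.
    """
    malformed = {t for t in candidates
                 if MEANING_TYPES[t] is MeaningCategory.MALFORMED}
    return malformed if malformed else set(candidates)


if __name__ == "__main__":
    print(apply_precedence({"Ungrammatical", "Meaning changed, contradiction"}))
```

A sentence that is both ungrammatical and contradicts the article would thus be annotated as `Ungrammatical`, because `Malformed` takes precedence.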

`Mistake explanation`
   - a field for a more detailed textual description of the error, used to compare how each annotator approaches the evaluation
   - it is not checked automatically, but it will substantially help us assess the consistency of the responses
   - e.g. the sentence about Coach Nigel Pearson (above)
        - Mistake explanation: omitting the words "v Premiere League (in the Premier League)" changes the meaning of the phrase "boj o přežití (fight for survival)".


More practical examples can be found in the [https://aclanthology.org/2020.eval4nlp-1.1.pdf original article].

== Possible problems ==
We have encountered the following possible problems while filling in the form:
   - check that the displayed questionnaire has checkboxes in the shape of a SQUARE and not a CIRCLE (i.e. multiple answers, not a single answer)
      => SOLUTION: if it displays incorrectly, use a browser other than Chrome. Mozilla Firefox should work, and the mobile version of Chrome also worked for me. I have no way to test the bug further.
   - Although filling in **Mistake explanation** is not mandatory (when the sentence does not contain an error), the system requires it for some questions and does not let the user continue (noted for INPUT 575)
      => SOLUTION: if this happens, fill in the text fields with any text (e.g. OK for error-free sentences); we will deal with it during the evaluation.
     103