Changes between Version 1 and Version 2 of private/NlpInPracticeCourse/GenerativeLanguageModels


Timestamp: Dec 2, 2023, 6:07:42 PM
Author: foltynek

== State of the Art ==

Generating text is, in principle, the same as predicting the following word. Given a seed (typically the start of a text), the model can predict/generate a following word that fits the context. Current state-of-the-art models are based on the Generative Pre-trained Transformer (GPT) architecture, which uses a multi-head attention mechanism to capture contextual features. The models stack several attention blocks to perform higher-order cognitive tasks.
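
To make the prediction step concrete, here is a minimal sketch of next-word prediction with a GPT-style model. It assumes the Hugging Face `transformers` library and the small `gpt2` checkpoint (both assumptions; the paragraph above is toolkit-agnostic):

```python
# Minimal sketch: next-word prediction with GPT-2 (assumes the
# Hugging Face `transformers` library; not part of the lecture text).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

seed = "The students opened their"          # illustrative seed
input_ids = tokenizer(seed, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits        # (batch, seq_len, vocab_size)

# The distribution over the next token comes from the last position.
probs = torch.softmax(logits[0, -1], dim=-1)
next_id = int(torch.argmax(probs))
print(tokenizer.decode([next_id]))          # most probable continuation
```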

Language models generate text regardless of factual correctness, which means they may produce wrong, misleading, or biased output. Some bias is deeply rooted in the training data, which are heavily unbalanced with respect to genre and domain, as well as the writers' gender, age, and cultural background. In some applications, this bias may cause harmful outputs.

=== References ===

 1. Vaswani, A. et al. (2017). Attention Is All You Need. arXiv preprint. https://arxiv.org/abs/1706.03762
 1. Radford, A. et al. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
 1. Navigli, R., Conia, S., & Ross, B. (2023). Biases in Large Language Models: Origins, Inventory, and Discussion. J. Data and Information Quality, 15(2). https://doi.org/10.1145/3597307

== Practical Session ==

We will be working with the [[https://colab.research.google.com/drive/19wZxHV6GLsRNvTdfVWbSK_vaoyEECHLj#scrollTo=PVXofXV4Ft7z|Google Colab Notebook]]. First, we load the GPT2-Large model and experiment with text generation. To get a more objective view of the probabilities of the following tokens, we adjust the generating function to return the top k most probable words and their probabilities. Then, we learn how to calculate the perplexity of a given text.
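
The snippet below is a rough sketch of those two steps (top-k next-token probabilities and perplexity of a given text), again assuming the Hugging Face `transformers` library; the actual notebook code may differ in its details:

```python
# Sketch of the two notebook steps: top-k next-token probabilities and
# perplexity (assumes Hugging Face `transformers`; details may differ
# from the Colab notebook).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large")
model.eval()

def top_k_next_words(seed, k=10):
    """Return the k most probable next tokens with their probabilities."""
    input_ids = tokenizer(seed, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tokenizer.decode([i]), float(p))
            for i, p in zip(top.indices, top.values)]

def perplexity(text):
    """Perplexity = exp of the mean per-token negative log-likelihood."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy.
        loss = model(input_ids, labels=input_ids).loss
    return float(torch.exp(loss))

print(top_k_next_words("The capital of France is", k=5))
print(perplexity("The capital of France is Paris."))
```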

**Task 1: Exploring perplexity**

Generate various text samples using different temperatures. Observe the relationship between the temperature (a parameter of the generator) and the perplexity of the resulting text.
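
One possible way to set up this experiment, reusing `model`, `tokenizer`, and `perplexity()` from the sketch above (the seed and temperature values are only examples):

```python
# Temperature experiment (builds on the previous sketch; the seed and
# temperature values are illustrative).
seed_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids

for temperature in (0.5, 0.8, 1.0, 1.5):
    out = model.generate(
        seed_ids,
        do_sample=True,                 # sampling, so temperature has an effect
        temperature=temperature,
        max_new_tokens=50,
        pad_token_id=tokenizer.eos_token_id,
    )
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    print(f"T={temperature}: perplexity={perplexity(text):.2f}")
```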

**Task 2: Exploring bias**

We will experiment with several prompts/seeds that are likely to produce biased output; a minimal probing loop is sketched below.
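
This sketch reuses `top_k_next_words()` from above; the two seeds here are illustrative examples, not the notebook's prompts:

```python
# Bias probing sketch (reuses top_k_next_words() from above; the
# seeds are illustrative examples, not the notebook's prompts).
seeds = [
    "The man worked as a",
    "The woman worked as a",
]
for seed in seeds:
    print(seed)
    for word, prob in top_k_next_words(seed, k=5):
        print(f"  {word!r}: {prob:.3f}")
```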

Your task will be to design more seeds and generate text or obtain predictions of subsequent words. Then, annotate the predictions (positive/negative/neutral) and answer the following questions:

* Toward which groups do the GPT-2 model outputs exhibit positive bias?
* Toward which groups do the GPT-2 model outputs exhibit negative bias?
* Was there anything you expected to be biased, but the experiments showed fairness in the model outputs?
* Conversely, was there anything you expected to be fair, but the model showed bias?