Changes between Initial Version and Version 1 of cs/LaboratorniSeminar/2019SyntacticDifferencesKroon

Nov 4, 2019, 4:34:33 PM
Ales Horak



= Towards the automatic detection of syntactic differences

**Author: Martin Kroon**, PhD candidate, Leiden University, The Netherlands[[br]]

**Tuesday 12:00, November 12, 2019**[[br]]
**NLP lab, room B203**[[br]]

=== Abstract:

The field of comparative syntax aims at developing a theoretical model
of the syntactic properties all languages have in common and of the
range and limits of syntactic variation. Massive automatic comparison of
languages in parallel corpora will greatly speed up and enhance the
development of such a model. In this talk I will discuss previously
obtained results, as well as briefly touch on future research ideas.

First I will discuss a preprocessing tool that selects parallel sentence
pairs that are suitable for comparative syntactic research, filtering
out sentence pairs that are syntactically too different. Results were
obtained through experiments on Dutch, German and English, and suggest
that a graph edit distance on parse trees yields the best results.
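
To make the filtering idea concrete, here is a minimal Python sketch. It is not the tool presented in the talk: the abstract does not specify the distance, its cost model, or the cut-off, so the sketch swaps in a simple top-down tree edit distance (whole subtrees are inserted or deleted, and node labels are relabelled at unit cost) in place of a full graph edit distance. The `(label, children)` tree encoding, the function names, the toy trees and the threshold `0.2` are all assumptions made for illustration.

```python
# Hypothetical sketch: a tree is a (label, [children]) tuple.
# Function names, toy trees and the threshold are illustrative assumptions.

def tree_size(tree):
    """Number of nodes in the tree."""
    _, children = tree
    return 1 + sum(tree_size(child) for child in children)


def tree_distance(a, b):
    """Top-down edit distance between two ordered, labelled trees.

    Relabelling a node costs 1; inserting or deleting a child costs the
    size of the whole subtree rooted at that child.
    """
    label_a, kids_a = a
    label_b, kids_b = b
    relabel = 0 if label_a == label_b else 1

    m, n = len(kids_a), len(kids_b)
    # table[i][j]: cost of aligning the first i children of a
    # with the first j children of b.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        table[i][0] = table[i - 1][0] + tree_size(kids_a[i - 1])
    for j in range(1, n + 1):
        table[0][j] = table[0][j - 1] + tree_size(kids_b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = min(
                table[i - 1][j] + tree_size(kids_a[i - 1]),      # delete subtree
                table[i][j - 1] + tree_size(kids_b[j - 1]),      # insert subtree
                table[i - 1][j - 1]
                + tree_distance(kids_a[i - 1], kids_b[j - 1]),   # match children
            )
    return relabel + table[m][n]


def is_comparable(a, b, threshold=0.2):
    """Keep a sentence pair if its size-normalised distance is small."""
    normalised = tree_distance(a, b) / (tree_size(a) + tree_size(b))
    return normalised <= threshold


# Toy parse trees (made up for this example, not corpus data).
tree_a = ("S", [("NP", []), ("VP", [("V", []), ("NP", [])])])
tree_b = ("S", [("NP", []), ("VP", [("V", [])])])
tree_c = ("X", [])

print(tree_distance(tree_a, tree_b))   # 1: one leaf deleted
print(is_comparable(tree_a, tree_b))   # True
print(is_comparable(tree_a, tree_c))   # False
```

Normalising by the combined tree size is one possible design choice; it keeps long sentence pairs from being rejected merely because they have more nodes.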

I will furthermore discuss recent results in extracting syntactic
differences from parallel corpora. We build on Wiersma et al.'s (2011)
method, and apply the Minimum Description Length principle to the task.
After mining for characteristic part-of-speech patterns by compressing
the data, we extract differences in the distribution of the patterns
found between languages. Results were obtained through experiments on
Dutch, English and Czech, and show useful and meaningful differences,
which can guide linguists in their comparative syntactic research.
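
The last step, comparing how patterns are distributed across languages, can be sketched roughly as follows. This is only an illustration under stated assumptions: plain part-of-speech n-gram counting is swapped in for the compression-based pattern mining described above, and the function names, the tag sequences and the choice of bigrams are hypothetical.

```python
# Hypothetical sketch: POS n-gram counting stands in for MDL-based pattern mining.
from collections import Counter


def ngram_counts(tagged_sentences, n=2):
    """Count part-of-speech n-grams over a list of tag sequences."""
    counts = Counter()
    for tags in tagged_sentences:
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def relative_frequencies(counts):
    """Turn raw counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pattern: count / total for pattern, count in counts.items()}


def distribution_differences(sentences_a, sentences_b, n=2):
    """Rank patterns by how much their relative frequency differs."""
    freq_a = relative_frequencies(ngram_counts(sentences_a, n))
    freq_b = relative_frequencies(ngram_counts(sentences_b, n))
    patterns = set(freq_a) | set(freq_b)
    diffs = {p: freq_a.get(p, 0.0) - freq_b.get(p, 0.0) for p in patterns}
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy tag sequences (made up for this example, not corpus data).
language_a = [["DET", "ADJ", "NOUN"], ["DET", "NOUN", "VERB"]]
language_b = [["DET", "NOUN", "ADJ"], ["NOUN", "VERB", "DET"]]

differences = distribution_differences(language_a, language_b)
for pattern, diff in differences:
    print(" ".join(pattern), round(diff, 2))
```

A positive value means the pattern is relatively more frequent in the first language, a negative value that it is more frequent in the second; patterns with a difference near zero are distributed alike in both.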