Version 7 (modified by Jan Rygl, 8 years ago)

Stylometry

IA161 Advanced NLP Course, Course Guarantor: Aleš Horák

Prepared by: Honza Rygl

State of the Art

The analysis of an author's characteristic writing style and vocabulary has been used to determine the authorship of documents and to uncover author traits such as age or gender, using both manual linguistic approaches and automatic algorithmic methods.

The most common approach to stylometry problems is to combine stylistic analysis with machine learning techniques: 1) specific style markers are extracted from the text, and 2) a classification procedure is applied to the extracted markers.
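
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a method taken from the course or the references: the marker set (`FUNCTION_WORDS`), the nearest-centroid classifier, the helper names, and the toy texts are all hypothetical choices made for this example.

```python
from collections import Counter
import math
import re

# Assumption: a tiny, hand-picked set of English function words as style markers.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "i"]


def extract_markers(text):
    """Step 1: relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def train_centroids(labelled_texts):
    """Average the marker vectors of each author's training texts."""
    grouped = {}
    for author, text in labelled_texts:
        grouped.setdefault(author, []).append(extract_markers(text))
    return {
        author: [sum(column) / len(vectors) for column in zip(*vectors)]
        for author, vectors in grouped.items()
    }


def classify(text, centroids):
    """Step 2: assign the text to the author with the nearest centroid."""
    vector = extract_markers(text)
    return min(centroids, key=lambda author: math.dist(vector, centroids[author]))


# Toy data, invented for illustration only.
training = [
    ("A", "The cat sat on the mat and the dog lay in the sun."),
    ("A", "The end of the road is the start of the hill."),
    ("B", "I think that it is a pity that I missed it."),
    ("B", "It is a fact that I like it a lot."),
]
centroids = train_centroids(training)
predicted = classify("The bird flew to the top of the tree.", centroids)
print(predicted)
```

In practice the marker set would be much larger (for example, several hundred function words, character n-grams, or syntactic features), and the nearest-centroid rule would typically be replaced by a standard classifier trained on far more text per author.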

References

  1. Stamatatos, E. (2009). A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology, 60(3), 538-556.
  2. Kestemont, M. (2014). Function Words in Authorship Attribution: From Black Magic to Theory? Proceedings of the 3rd Workshop on Computational Linguistics for Literature, EACL 2014, 59–66.
  3. Daelemans, W. Explanation in Computational Stylometry.

Practical Session

Concrete description of the work assignment for the second, one-hour part of the lecture. The work will consist of tasks involving practical implementations of algorithms related to the current topic (probably not the state-of-the-art algorithms mentioned in the first part), applied to real data. Students can test the algorithms, evaluate them, and possibly try some short adaptations for various subtasks.

Students may also be required to produce some results from their work and hand them in to show that they completed the tasks.