= Opinion mining, sentiment analysis =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/AdvancedNlpCourse|Advanced NLP Course]], Course Guarantor: Aleš Horák

Prepared by: Zuzana Nevěřilová

== State of the Art ==

Sentiment analysis can be seen as a text categorization task (i.e. is the writer's opinion on a discussed topic X or Y?). It consists of detecting the topic (which can be easy in focused reviews) and detecting the sentiment (which is generally difficult). Opinions are sometimes expressed in a very subtle manner (e.g. the sentence ''How could anyone sit through this movie?'' contains no negative word) ![3].

Sentiments are usually classified simply by their polarity (positive, negative), but they can also be recognized in more depth (e.g. strongly negative). Recognized opinions can also be summarized (e.g. how many people like this new iPhone design?).

=== References ===

 1. Bing Liu. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. 2012, 5(1): 1-167. DOI: 10.2200/s00416ed1v01y201204hlt016. Draft version available at [[http://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf]]
 1. Bing Liu. Sentiment Analysis Tutorial. AAAI-2011, August 8, 2011. Slides available at [[http://www.cs.uic.edu/~liub/FBS/Sentiment-Analysis-tutorial-AAAI-2011.pdf]]
 1. Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of EMNLP 2002. [[http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf]]
 1. Liviu P. Dinu and Iulia Iuga. The Naive Bayes classifier in opinion mining: In search of the best feature set. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 7181 of Lecture Notes in Computer Science, pages 556–567. Springer Berlin Heidelberg, 2012.
 1. Lei Zhang, Shuai Wang, and Bing Liu. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2018, 8.

Bing Liu's tutorial references: http://www.cs.uic.edu/~liub/FBS/AAAI-2011-tutorial-references.pdf

== Practical Session ==

=== Technical Requirements ===

Requirements: Python 3, Jupyter Notebook, and the Python modules NLTK, SciPy, NumPy, pandas, and scikit-learn (sklearn).

=== Sentiment Analysis ===

In this workshop, we try two methods for opinion mining. First, we use Liu's Opinion Lexicon; for Czech sentiment analysis, we use its automatically translated version. Next, we try to compensate for the drawbacks of the lexicon by computing word vectors in a simple way.

Files: [raw-attachment:cestina20.csv cestina20.csv], [raw-attachment:cestina20_annotation.csv cestina20_annotation.csv], [raw-attachment:urban_dictionary.csv urban_dictionary.csv], [raw-attachment:Word_Vectors_and_Sentiment.ipynb Word_Vectors_and_Sentiment.ipynb], [raw-attachment:negative-words-en.txt negative-words-en.txt], [raw-attachment:negative-words-cs.txt negative-words-cs.txt], [raw-attachment:positive-words-en.txt positive-words-en.txt], [raw-attachment:positive-words-cs.txt positive-words-cs.txt]

 1. Create ``, a text file named ia161-UCO-01.txt where UCO is your university ID.
 1. Enter the name of the dataset you were working on.
 1. Complete the tasks marked as TASK X in the Python notebook. Optional tasks are not required.

=== Upload `` ===

Do not forget to upload your resulting file to the [wiki:en/AdvancedNlpCourse homework vault (odevzdávárna)].
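The first method of the practical session, lexicon-based polarity with Liu's Opinion Lexicon, can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes the lexicon files contain one word per line with comment lines starting with `;` (the header format of the English files; the Czech translations may differ), and it uses a simple regex tokenizer instead of NLTK so that no tokenizer models need to be downloaded. The names `load_lexicon`, `tokenize`, and `lexicon_polarity` are hypothetical.

{{{#!python
import re
from pathlib import Path

# Letters only (Unicode-aware, so Czech diacritics work), optionally joined
# by an apostrophe or hyphen, e.g. "don't", "well-made".
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def load_lexicon(path):
    """Read one word per line, skipping blank lines and ';' comments.

    Assumption: the file follows the layout of Liu's English lexicon files;
    errors="replace" guards against stray non-UTF-8 bytes.
    """
    words = set()
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(";"):
            words.add(line.lower())
    return words


def tokenize(text):
    """Lowercase word tokens (a stand-in for nltk.word_tokenize)."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def lexicon_polarity(text, positive, negative):
    """Count positive minus negative lexicon hits and map the score to a label."""
    tokens = tokenize(text)
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score
}}}

With real data you would call, e.g., `positive = load_lexicon("positive-words-en.txt")`. With a toy lexicon such as `{"great", "enjoyable"}` / `{"boring", "bad"}`, the sentence "A great, enjoyable film." scores `("positive", 2)`, while ''How could anyone sit through this movie?'' scores `("neutral", 0)`, which shows exactly the limitation of pure lexicon matching described in the State of the Art section.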
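The second method, compensating for words missing from the lexicon by computing word vectors "in a simple way", might look like the following sketch. It assumes count-based co-occurrence vectors built from a small context window and a hypothetical scoring rule (`seed_polarity`): an unknown word is compared by cosine similarity with positive and negative lexicon words used as seeds. The notebook's actual procedure may differ; all function names here are illustrative.

{{{#!python
import numpy as np


def cooccurrence_vectors(sentences, window=2):
    """Build a word-by-word co-occurrence count matrix.

    sentences: list of token lists; window: number of neighbours counted
    on each side of a word.
    """
    vocab = sorted({w for sent in sentences for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[word], index[sent[j]]] += 1
    return vocab, index, counts


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm else 0.0


def seed_polarity(word, index, counts, positive, negative):
    """Mean similarity to positive seeds minus mean similarity to negative seeds.

    Returns 0.0 when no seed word of one of the classes occurs in the corpus.
    """
    vec = counts[index[word]]
    pos = [cosine(vec, counts[index[p]]) for p in positive if p in index and p != word]
    neg = [cosine(vec, counts[index[n]]) for n in negative if n in index and n != word]
    if not pos or not neg:
        return 0.0
    return sum(pos) / len(pos) - sum(neg) / len(neg)
}}}

On a toy corpus where "awesome" appears in the same contexts as the seed "great" and "dull" in the same contexts as the seed "boring", `seed_polarity` returns a positive score for "awesome" and a negative one for "dull", even though neither word is in the seed sets. Raw counts are used here for simplicity; weighting schemes such as PPMI or dimensionality reduction are common refinements.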