Changes between Version 2 and Version 3 of en/TopicSimilarity
- Timestamp: Jun 6, 2014, 1:15:44 PM
en/TopicSimilarity
v2  v3
5   5     [[Image(/trac/research/raw-attachment/wiki/en/TopicSimilarity/sim_articles.png)]]
6   6
7       -  * dif
8       - ferent machine learning methods as Random Projections, TFIDF word weighting, Latent Semantic Indexing/Analysis, Latent Dirichlet Allocation
    7   +  * different machine learning methods as Random Projections, TFIDF word weighting, Latent Semantic !Indexing/Analysis, Latent Dirichlet Allocation
9   8
10  9      * 50,000+ fulltexts on http://dml.cz
…   …
18  17    Similarity types: from '''plagiarism''' (similarity on n-grams, narrative similarity, evolved into http://theses.cz) to '''thematic, topical similarity'''.
19  18
20      - == Prehistoric Example: Project Ott uv Slovnk naucny, 1998 ==
    19  + == Prehistoric Example: Project Ottuv Slovnk naucny, 1998 ==
21  20
22  21    Levels of content processing: strings -> words and collocations -> semantics (word meaning) -> information (knowledge).
…   …
57  56    architecture.
58  57
59      - Developed by NLPlab PG student Radim Reh urek (awarded in Ceska hlava competition in 2011).
    58  + Developed by NLPlab PG student Radim Rehurek (awarded in Ceska hlava competition in 2011).
60  59
61  60    Leading edge machine learning methods implemented.
…   …
63  62    Used in 40+ local, EU or worldwide projects.
64  63
65      - Typical deployment and
66      - ne-tuning scenario: expressing data as words (features) -> con
67      - guration of topic modeling of
    64  + Typical deployment and ne-tuning scenario: expressing data as words (features) -> conguration of topic modeling of
68  65    features -> setting of gensim methods and tuning parameters -> usage in an application with proper vizualization interface.
69  66
…   …
73  70     * similarity: plagiarism
74  71     * topical modeling
75      -  * thematic document
76      - ltering
    72  +  * thematic document ltering
77  73     * visualization
78  74     * semantic, meaning computations and modeling of natural language texts
79  75
80  76
81      - Credits: Ji riFranek (illustrations)
    77  + Credits: Jiri Franek (illustrations)
82  78
83  79
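
The page under revision describes a pipeline of expressing documents as words (features), weighting them (TFIDF is one of the listed methods), and then comparing documents by topical similarity. The following is a minimal sketch of that idea in plain Python. It is not gensim code and not the page authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`) and the three toy documents are assumptions made up for illustration, and the weighting is the simplest raw-count times log-inverse-document-frequency variant.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (hypothetical, deliberately naive)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {word: weight} TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word occurs in.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Words present in every document get weight log(1) = 0.
        vectors.append(
            {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy documents, invented for this sketch.
docs = [
    "latent semantic indexing of mathematical papers",
    "latent dirichlet allocation of mathematical papers",
    "illustrations of a prehistoric encyclopedia",
]
vectors = tfidf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])  # documents sharing topical words
sim_02 = cosine(vectors[0], vectors[2])  # documents sharing only "of"
print(sim_01, sim_02)
```

In this toy setup the first two documents share several informative words and score higher than the unrelated pair, whose only common word occurs in every document and therefore carries zero weight. A real deployment of the kind the page describes would use a library such as gensim and would add a topic model (for example LSI or LDA) on top of the weighted features.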