Changes between Version 2 and Version 3 of en/TopicSimilarity


Timestamp:
Jun 6, 2014, 1:15:44 PM
Author:
xkocinc

  • en/TopicSimilarity

    v2 v3  
    55[[Image(/trac/research/raw-attachment/wiki/en/TopicSimilarity/sim_articles.png)]]
    66
    7  * dif
    8 ferent machine learning methods such as Random Projections, TF-IDF word weighting, Latent Semantic Indexing/Analysis, Latent Dirichlet Allocation
     7 * different machine learning methods such as Random Projections, TF-IDF word weighting, Latent Semantic !Indexing/Analysis, Latent Dirichlet Allocation
    98
    109 * 50,000+ fulltexts on http://dml.cz
     
    1817Similarity types: from '''plagiarism''' (similarity on n-grams, narrative similarity, evolved into http://theses.cz) to '''thematic, topical similarity'''.
    1918
    20 == Prehistoric Example: Project Ottuv Slovnik naucny, 1998 ==
     19== Prehistoric Example: Project Ottuv Slovnik naucny, 1998 ==
    2120
    2221Levels of content processing: strings -> words and collocations -> semantics (word meaning) -> information (knowledge).
     
    5756architecture.
    5857
    59 Developed by NLPlab PG student Radim Rehurek (awarded in the Ceska hlava competition in 2011).
     58Developed by NLPlab PG student Radim Rehurek (awarded in the Ceska hlava competition in 2011).
    6059
    6160Leading edge machine learning methods implemented.
     
    6362Used in 40+ local, EU or worldwide projects.
    6463
    65 Typical deployment and
    66 fine-tuning scenario: expressing data as words (features) -> con
    67 figuration of topic modeling of
     64Typical deployment and fine-tuning scenario: expressing data as words (features) -> configuration of topic modeling of
    6865features -> setting of gensim methods and tuning parameters -> usage in an application with proper visualization interface.
    6966
     
    7370 * similarity: plagiarism
    7471 * topical modeling
    75  * thematic document
    76 filtering
     72 * thematic document filtering
    7773 * visualization
    7874 * semantic, meaning computations and modeling of natural language texts
    7975
    8076
    81 Credits: Jiri Franek (illustrations)
     77Credits: Jiri Franek (illustrations)
    8278
    8379
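
The deployment scenario described in the page text (data expressed as word features, then a weighting or topic model, then a similarity query) can be illustrated with a minimal sketch. It assumes plain TF-IDF weighting and cosine similarity written with the Python standard library only. It does not use gensim's API, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) and the sample documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: word -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word occurs in.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (one common variant, assumed here) keeps weights positive.
        vectors.append(
            {w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1.0)
             for w, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_index, vectors):
    """Rank all other documents by similarity to the query document."""
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample documents, not taken from any real collection.
    docs = [
        "Latent semantic indexing of mathematical papers",
        "Topic modeling with latent semantic analysis of papers",
        "Recipes for baking bread at home",
    ]
    vectors = tfidf_vectors(docs)
    for score, index in most_similar(0, vectors):
        print(f"{score:.3f}  {docs[index]}")
```

In a real deployment the weighting step would typically be replaced or followed by one of the methods named on the page (Random Projections, Latent Semantic Indexing/Analysis, Latent Dirichlet Allocation), with the similarity query left unchanged.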