Tag: gensim

  • Computing string similarity with TF-IDF and Python

    “The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.” [Wikipedia] It is also the weight I use to measure similarity between texts, for these two…

  • Simple keyword extraction in Python: choices, choices.

    As explained in an earlier post, I am working on a simple method of extracting ‘important words’ from a text entry. The methods I am using at the moment are frequency distributions and word collocations. I’ve run into some issues while fine-tuning these methods. Read on for a short explanation of my approaches and some issues…
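The TF-IDF post in this list describes using tf–idf weights to measure similarity between texts. Below is a minimal sketch of that idea in plain Python. It is not the code from the post. It assumes a deliberately crude regex tokenizer, length-normalized term frequency, and the smoothed idf variant ln((1 + N) / (1 + df)) + 1 (the default in scikit-learn's `TfidfVectorizer`), and it compares documents with cosine similarity. The helper names `tokenize`, `tfidf_vectors` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / len(tokens)  # term frequency, normalized by length
            # Smoothed idf (assumed variant): never zero, even for terms in every doc.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat",
        "The cat sat on the hat",
        "Stock markets fell sharply",
    ]
    vecs = tfidf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # shares most terms: high score
    print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms: 0.0
```

The smoothing matters on tiny corpora. With the plain ln(N / df), a word that occurs in every document gets weight zero. In that case, two documents compared only against each other would always score 0, however alike they are.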
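The keyword-extraction post in this list mentions frequency distributions and word collocations. The following is a rough sketch of both, using only the standard library; it is not the author's implementation. It counts non-stopword tokens and counts adjacent word pairs, which is a crude stand-in for collocation detection. The stopword list, the `min_count` threshold and the function names are assumptions for illustration. NLTK offers `FreqDist` and `BigramCollocationFinder` for the same tasks, and those support proper association measures such as PMI.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def tokenize(text):
    # Crude tokenizer (assumption): lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, n=5):
    """Frequency distribution of non-stopword tokens, most common first."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return Counter(tokens).most_common(n)


def top_bigrams(text, n=5, min_count=2):
    """Adjacent word pairs seen at least min_count times, ignoring stopword pairs."""
    tokens = tokenize(text)
    # Pair up neighbours in the full token stream first, then drop any pair
    # containing a stopword, so filtering never creates false adjacencies.
    pairs = Counter(
        (a, b)
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    )
    return [(pair, c) for pair, c in pairs.most_common() if c >= min_count][:n]


if __name__ == "__main__":
    entry = (
        "Keyword extraction finds important words. "
        "Keyword extraction uses word frequency and word collocations."
    )
    print(top_words(entry, 3))
    print(top_bigrams(entry))  # [(('keyword', 'extraction'), 2)]
```

The `min_count` cut-off is one simple knob for the kind of fine-tuning the post describes. Raising it discards pairs that co-occur only by chance, and the cost is missing rarer but genuine phrases.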