#OccupyAmsterdam wordle
October 16, 2011
Wordle van de 200 meest voorkomende woorden in tweets met hashtag #OccupyAmsterdam. Gemaakt van 5.239 tweets van tussen zaterdag 8 oktober 09:55 uur en 16 oktober 15:50 uur. Handmatig gefilterd op nicknames en nietszeggende woorden. Hier is de lijst van de 1000 meest voorkomende woorden: OccupyAmsterdam-woorden.
Computing string similarity with TF-IDF and Python
October 3, 2011
“The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.”[wikipedia] It is also the weight I use to measure similarity between texts, for these two […]
Simple keyword extraction in Python: choices, choices.
September 12, 2011
As explained in an earlier post, I am working on a simple method of extracting ‘important words’ from a text-entry. The methods I am using at the moment are frequency distributions and word collocations. I’ve bumped into some issues regarding finetuning my methods. Read on for a short explanation of my approaches, and some issues […]
Graduation pt. 4: What’s next
August 19, 2011
Just a quick update to let myself know what’s going to happen next: It’s time to produce some results! While I was getting quite stuck in figuring out the best – or rather, most practical – way to extract keywords from a text (and not just any text, mind you, but notes of biologists), my supervisor […]
Graduation project
July 8, 2011
Currently I am working on my final project of the Media Technology MSc. Programme of Leiden University. With the goal of structuring my thoughts and process so far, and because I’ve promised on Twitter, I decided to write a small and simple summary of what my project is about, how I got here and what I’m expecting to do in the next 2-3months.