Abstract
March 10, 2012
Below is the first draft of the abstract of my paper. It doesn’t yet include the results or conclusion. Word count: 127. Semantic annotation uses human knowledge formalized in ontologies to enrich texts by providing structured, machine-understandable information about their content. This paper proposes an approach for automatically annotating texts of the Cyttron Scientific Image Database, using the […]
Geomapping the Bible and Herman Melville’s Moby Dick
October 23, 2011
For a small dataviz experiment I wanted to create maps of books, by extracting locations (cities, countries, continents, whatever is mentioned in the text) and drawing these on a map. I used the Stanford Named Entity Recognizer to extract the locations from two books: the Bible and Herman Melville’s Moby Dick. I then wrote a […]
#OccupyAmsterdam wordle
October 16, 2011
Wordle of the 200 most frequent words in tweets with the hashtag #OccupyAmsterdam. Made from 5,239 tweets posted between Saturday, October 8, 09:55 and October 16, 15:50. Manually filtered to remove nicknames and meaningless words. Here is the list of the 1000 most frequent words: OccupyAmsterdam-woorden.
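A word list like this can be produced with a simple frequency count. The snippet below is only a minimal sketch, not the script used for the wordle: the function name `top_words` and the tiny `STOP_WORDS` set are hypothetical, and the automatic stop list merely stands in for the manual filtering described above.

```python
from collections import Counter
import re

# Hypothetical stop list; the original filtering was done by hand.
STOP_WORDS = {"de", "het", "een", "en", "rt"}


def top_words(tweets, n=200, stop_words=STOP_WORDS):
    """Return the n most frequent words across tweets as (word, count) pairs."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase the tweet and drop @nicknames before counting.
        text = re.sub(r"@\w+", " ", tweet.lower())
        # Keep plain words and hashtags (an optional leading '#').
        for word in re.findall(r"#?\w+", text):
            if word not in stop_words:
                counts[word] += 1
    return counts.most_common(n)
```

Calling `top_words(tweets, n=1000)` on the same list of tweet texts would give the longer list.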
More text-mining. Popularity contest: Drosophila Melanogaster vs. C. elegans
October 9, 2011
While waiting on several word-counting scripts to finish counting, I picked up my cancerCounter script to count something else. This time, I wanted to see which organism was more popular and more frequently mentioned in biomedical studies: the ever-present Drosophila Melanogaster, aka common fruit fly, or the aptly named Caenorhabditis elegans (one cannot deny […]
Textmining BioMedCentral: Cancer – a trending topic?
September 28, 2011
*Update* I added a graph which shows the ratio of articles containing the word ‘Cancer’ to total articles per year. It sadly still suffers from the incomplete data of earlier years: *Original post* This is my first attempt to get some data out of the BioMedCentral dataset, the freely available, Open […]
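The ratio in the update is the number of articles mentioning the word divided by the total number of articles for that year. A rough sketch of that calculation, assuming the articles are available as `(year, text)` pairs; the function name and input format are hypothetical and not taken from the cancerCounter script:

```python
from collections import defaultdict


def keyword_ratio_by_year(articles, keyword="cancer"):
    """Map each year to the fraction of its articles that contain keyword."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in articles:
        totals[year] += 1
        # Case-insensitive substring match, so 'cancerous' also counts as a hit.
        if keyword.lower() in text.lower():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```

Years with only a handful of articles give unstable ratios, which is the incomplete-data problem mentioned in the update.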