It’s a wrap!

Photo by Thijs Niks This category can soon be archived ;)! Earlier this week I handed in my final paper, and yesterday was the day of my final presentation. It was a great day and I’m really excited about embarking on my next adventure. I will soon start as a PhD candidate at the University of Amsterdam, on a very exciting project in ‘Semantic Search in e-Discovery’ at the Information and Language Processing Systems group. Naturally, this blog will keep the world informed of my work and projects ;). Exciting times!


Download my paper: Automatic Annotation of Cyttron Entries using the NCIthesaurus [PDF – 328 KB] Download the supplementary data (graphs, tables and viz): Supplementary Data [PDF – 2.27 MB]


Similarity graph demo Check out the D3.js-powered demo of a similarity graph (comparing expert & computer-generated annotations) Continue reading “It’s a wrap!”


Below the first draft of the abstract of my paper. It doesn’t yet include the results/conclusion. Word count: 127

Semantic annotation uses human knowledge formalized in ontologies to enrich texts, by providing structured and machine-understandable information of its content. This paper proposes an approach for automatically annotating texts of the Cyttron Scientific Image Database, using the NCI Thesaurus ontology. Several frequency-based keyword extraction algorithms, aiming to extract core concepts and exclude less relevant concepts, were implemented and evaluated. Furthermore, text classification algorithms were applied to identify important concepts which do not occur in the text. The algorithms were evaluated by comparing them to annotations provided by experts. Semantic networks were generated from these annotations and an ontology-based similarity metric was used to cross-compare them. Finally the networks were visualized to provide further insights into the differences of the semantic structure generated by humans, and the algorithms.

Tags: Semantic annotation, ontology-based semantic similarity, semantic networks, keyword extraction, text classification, network visualization, text mining