Joby Gorillapod SLR Zoom + Ballhead

dvdgrs [graus.nu] posted a photo:

spaceship EYE

dvdgrs [graus.nu] posted a photo:

Inside EYE

dvdgrs [graus.nu] posted a photo:

Outside EYE

dvdgrs [graus.nu] posted a photo:

New project: dataminen.nl

I have started a Dutch blog on data mining, as I haven’t really come across one, and figured the time is right given the growing interest in data journalism, dataviz, big data, etc. The idea is to provide a general, human-readable overview of the (academic) field of data mining and its innovations.

I will still use this blog to keep the world informed of my personal endeavours ;-).

It’s a wrap!

Photo by Thijs Niks

This category can soon be archived ;)! Earlier this week I handed in my final paper, and yesterday was the day of my final presentation. It was a great day and I’m really excited about embarking on my next adventure. I will soon start as a PhD candidate at the University of Amsterdam, on a very exciting project in ‘Semantic Search in e-Discovery’ at the Information and Language Processing Systems group. Naturally, this blog will keep the world informed of my work and projects ;). Exciting times!

Paper

Download my paper: Automatic Annotation of Cyttron Entries using the NCIthesaurus [PDF – 328 KB]

Download the supplementary data (graphs, tables and viz): Supplementary Data [PDF – 2.27 MB]

Demo

Similarity graph demo

Check out the D3.js-powered demo of a similarity graph (comparing expert & computer-generated annotations)

text graphs

Subgraph Similarity Example

“(a) A sagittal reconstruction of a coronally acquired magnetic resonance imaging (MRI) scan, at the level on which the cingulate gyrus was measured. The area outlined represents the portion of the scan used to orient the operator to the landmarks of the cingulate. A box has been placed over the region of interest in one hemisphere. (b) A diagram of the cingulate gyrus divided into the rostral portion of the anterior cingulate (RAC), the caudal portion of the anterior cingulate (CAC), and the posterior cingulate (PC). Adjoining landmarks include the corpus callosum (CC), the lateral ventricle (Lat. Vent.), and the thalamus (Thal.). (c) The region of the cingulate gyrus measured in the present study, as delineated on the MRI scan of a control subject. […] ” (snippet)

» Check out the d3js demo here «

Two sets of annotations (Expert 1 & Expert 3) result in the following similarity graph:

Moon

dvdgrs [graus.nu] posted a photo:

Moon through Nikkor 70-300mm@300mm… Bit noisy

Moon & Venus

dvdgrs [graus.nu] posted a photo:

Where’d he go?!

dvdgrs [graus.nu] posted a photo:

in ‘t Twiske

Kickstarting

dvdgrs [graus.nu] posted a photo:

in ‘t Twiske

Sippin’ slootwater

dvdgrs [graus.nu] posted a photo:

in ‘t Twiske

Measure and Visualize Semantic Similarity Between Subgraphs

As I blogged previously, I am working on measuring the performance of my keyword extraction algorithms. The confusion matrix approach I have implemented is quite ‘harsh’. It ignores any semantic information and simply treats the concepts as words, and counts hits and misses between two sets of concepts.
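To make the 'harshness' concrete, here is a minimal sketch of that kind of set-based counting. It is not my actual implementation: the function names and the example concept labels are made up for illustration, and true negatives are left out since they would need the size of the full concept vocabulary.

```python
def confusion_counts(expert, predicted):
    """Count hits and misses between two sets of concept labels.

    Concepts are compared as plain strings, so a near miss
    (e.g. "Brain" vs. "Cingulate Gyrus") counts as a full miss.
    """
    expert, predicted = set(expert), set(predicted)
    return {
        "tp": len(expert & predicted),   # concepts both lists agree on
        "fp": len(predicted - expert),   # predicted, but not in the expert list
        "fn": len(expert - predicted),   # in the expert list, but not predicted
    }


def precision_recall(counts):
    """Derive precision and recall from the counts (0.0 when undefined)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example lists, not real annotation data.
expert = ["Cingulate Gyrus", "Magnetic Resonance Imaging", "Thalamus"]
predicted = ["Thalamus", "Brain", "Magnetic Resonance Imaging", "Corpus Callosum"]

counts = confusion_counts(expert, predicted)
precision, recall = precision_recall(counts)
```

In this toy case "Brain" is scored exactly like a completely unrelated concept, even though it sits right above "Cingulate Gyrus" in any anatomical hierarchy.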

To benefit from the semantic information described in the NCI Thesaurus, and thus produce more detailed results, I will measure the algorithm’s performance by measuring the semantic similarity between the lists of concepts. The two lists (expert data & algorithm) are treated as subgraphs within the main graph: the NCI Thesaurus. Their similarity is measured with a path-based semantic similarity metric, of which there are several. I have implemented Leacock & Chodorow’s measure, as in the literature I found that it consistently outperforms similar path-based metrics in the biomedical domain. Speaking of domains: this measure was originally designed for WordNet (as were many of the other metrics), but it has also been used and validated in the biomedical domain. Hooray for domain-independent, unsupervised and corpus-free approaches to similarity measurement ;-).
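For the curious, a rough sketch of the Leacock & Chodorow measure: sim(a, b) = -log(len(a, b) / (2 * D)), where len is the number of nodes on the shortest is-a path between two concepts and D is the maximum depth of the taxonomy. Everything below is an assumption for illustration: the tiny single-parent taxonomy is hypothetical (the real NCI Thesaurus is far larger and allows multiple parents), and the best-match averaging used to compare two lists is just one plausible way to aggregate pairwise scores, not necessarily what I ended up using.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); NOT the real NCI Thesaurus.
PARENT = {
    "Organ": "Anatomic Structure",
    "Brain": "Organ",
    "Heart": "Organ",
    "Cingulate Gyrus": "Brain",
    "Thalamus": "Brain",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root] by following is-a links."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


# Depth counted in nodes, so the root has depth 1.
MAX_DEPTH = max(len(path_to_root(c)) for c in PARENT)


def path_length(a, b):
    """Number of nodes on the shortest path between a and b.

    In a single-parent tree this path runs through the lowest common ancestor.
    """
    steps_from_b = {node: i for i, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_to_root(a)):
        if node in steps_from_b:
            return steps_from_a + steps_from_b[node] + 1
    raise ValueError("concepts do not share a root")


def lch(a, b):
    """Leacock & Chodorow similarity: -log(len / (2 * D)). Higher is more similar."""
    return -math.log(path_length(a, b) / (2.0 * MAX_DEPTH))


def best_match_average(xs, ys):
    """Assumed aggregation: mean, over xs, of the best lch score found in ys."""
    return sum(max(lch(x, y) for y in ys) for x in xs) / len(xs)


same = lch("Thalamus", "Thalamus")            # identical concepts, path of 1 node
siblings = lch("Cingulate Gyrus", "Thalamus")  # path of 3 nodes via "Brain"
cousins = lch("Cingulate Gyrus", "Heart")      # path of 4 nodes via "Organ"
```

Unlike the hit-or-miss counting, sibling concepts now score higher than more distant ones, and only identical concepts get the maximum score. Note that the best-match average is not symmetric, so swapping the two lists can change the result.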