As I blogged previously, I am working on measuring the performance of my keyword extraction algorithms. The confusion matrix approach I have implemented is quite ‘harsh’. It ignores any semantic information and simply treats the concepts as words, and counts hits and misses between two sets of concepts.
To benefit from the semantic information described in the NCI Thesaurus, and thus produce more detailed results, I will measure the algorithm’s performance by measuring the semantic similarity between the lists of concepts. The two lists (expert data & algorithm) are treated as subgraphs within the main graph: the NCI Thesaurus. Their similarity is measured with a path-based semantic similarity metric, of which there are several. I have implemented Leacock & Chodorow’s measure, as in the literature I found it consistently outperforms similar path-based metrics in the Biomedical domain. Speaking of domain; this measure has originally been designed for WordNet (as many of the other metrics), but has also been used and validated in the Biomedical domain. Hooray for domain-independent, unsupervised and corpus-free approaches to similarity measurement ;-).
I evaluate an algorithm’s output by measuring the similarity of each node across the subgraphs of the expert and my algorithm (as shown in the figure right). By determining the average semantic similarity of the two graphs I can pick the best performing algorithm: the one which produces the semantically ‘closest’ subgraph.
For visualization purposes, I include the semantic similarity measurement between each node within a subgraph. This allows me to draw graphs where concepts are grouped by similarity: by mapping the similarity between two nodes to the edge’s (pull) force, a force-directed graph will group concepts by similarity; higher similarity means stronger pull. An example (created with the excellent d3.js library) below:
Blue nodes are from an expert.
Red nodes are from one of my algorithms.
The similarity between two nodes are mapped to:
- edge-length (short is most similar, long least)
- edge-color (white is most similar, black least)
The concept’s specifity is mapped to the size of the node (bigger is more specific).
Hovering over an edge displays the similarity between the two connected nodes.
Zoom in and out with mousewheel, pan by clicking and dragging.
This graph by itself already offers some quick insights, for example, the expert nodes are more specific in this graph, the expert nodes are semantically more similar (only Sagittal Plane and MRI are outside the main cluster). But only if these kind of patterns persist among different experts and texts, they could provide valuable insights in how to fine-tune the keyword extraction algorithms, and what specific information to take into account. I found that visualizing helps a lot in identifying these kind of patterns. On to analysis!