Measure and Visualize Semantic Similarity Between Subgraphs
March 19, 2012
As I blogged previously, I am working on measuring the performance of my keyword extraction algorithms. The confusion matrix approach I have implemented is quite ‘harsh’. It ignores any semantic information and simply treats the concepts as words, and counts hits and misses between two sets of concepts. To benefit from the semantic information described […]
Pathfinder finds paths!
November 20, 2011
First results are in! My computer spent about 20 hours to retrieve and store neighboring concepts of over 10.000 concepts which my Breadth-First-Search algorithm passed through to find the shortest paths between 6 nodes. But here is the result, first the ‘before’ graph, which I showed earlier: all retrieved concepts with their parent relations. Below that is […]
Add some #dataviz in the mix
October 25, 2011
All this time I have been working on my ontology-powered topic identification system: combining semantic web technologies with natural language processing and text-mining techniques to extract (a) topic(s) from a text. But I never really decided what my program should output: a topic? A list of potential topics with their ‘likeliness’? That would mean the […]
Computing string similarity with TF-IDF and Python
October 3, 2011
“The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.”[wikipedia] It is also the weight I use to measure similarity between texts, for these two […]
Ontology-based semantic similarity measurements: an overview
September 9, 2011
My thesis is about keyword extraction of biological notes, using semantic ‘dictionaries’ called ontologies. These ontologies are large networks, where each node stands for a concept, and each connection between nodes for relations. See the picture on the right for a visual representation of an ontology. To identify the subject of a text, I need […]