Below the first draft of the abstract of my paper. It doesn’t yet include the results/conclusion. Word count: 127
Semantic annotation uses human knowledge formalized in ontologies to enrich texts, by providing structured and machine-understandable information of its content. This paper proposes an approach for automatically annotating texts of the Cyttron Scientific Image Database, using the NCI Thesaurus ontology. Several frequency-based keyword extraction algorithms, aiming to extract core concepts and exclude less relevant concepts, were implemented and evaluated. Furthermore, text classification algorithms were applied to identify important concepts which do not occur in the text. The algorithms were evaluated by comparing them to annotations provided by experts. Semantic networks were generated from these annotations and an ontology-based similarity metric was used to cross-compare them. Finally the networks were visualized to provide further insights into the differences of the semantic structure generated by humans, and the algorithms.
Tags: Semantic annotation, ontology-based semantic similarity, semantic networks, keyword extraction, text classification, network visualization, text mining