In this project we propose to advance the state of the art in ontology matching by exploring two kinds of external resources: unstructured text and annotation corpora. In element-level techniques, unstructured text will be used to address the lack of background knowledge, namely synonyms and context. We will develop novel methods to calculate the similarity between ontology concepts based on documents related to them. This approach involves annotating text with ontology concepts, expanding the relevant terms found therein through Wikipedia, and then using the expanded terms to calculate the similarity between concepts. To support these tasks we will explore the concept of the evidence content of a word, which was proposed by our team and successfully applied to identify biomedical terms in text. This approach can be applied to virtually any ontology, since it relies only on textual properties, and it is domain-independent, because text is widely available for all domains.
We define annotation corpora as resources that provide mappings (i.e. annotations) between ontology concepts and other entities. These will be explored by structural-level techniques, following two parallel lines: global similarity computation and inter-ontology semantic similarity. Both approaches will address the semantics of edges in the ontology graph, an issue neglected by current techniques. To this end we will explore the concept of information content (IC) based on annotation corpora, which our team has extensively applied to develop biomedical semantic similarity measures. Our element-level and structural-level matching strategies can be applied in sequence and constitute the main components of the ontology matching system to be developed (see architecture figure attached). To ease the integration of our algorithms with existing ones, we will build our system on top of AgreementMaker, a widely used ontology matching framework.
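To illustrate what "information content based on annotation corpora" typically means, the sketch below uses one common formulation: IC(c) = -log p(c), where p(c) is the fraction of annotated entities annotated with c or any of its descendants, combined with Resnik's similarity (the IC of the most informative common ancestor). This is an assumption for illustration only, not necessarily the measure the project will adopt, and it shows the intra-ontology case rather than the inter-ontology similarity proposed above. The hierarchy, the annotation data, and the functions `ancestors`, `information_content` and `resnik` are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy is-a hierarchy: concept -> list of parent concepts.
PARENTS = {
    "entity": [],
    "binding": ["entity"],
    "protein binding": ["binding"],
    "dna binding": ["binding"],
    "transport": ["entity"],
}

# Hypothetical annotation corpus: entity -> concepts it is directly annotated with.
ANNOTATIONS = {
    "P1": {"protein binding"},
    "P2": {"protein binding", "transport"},
    "P3": {"dna binding"},
    "P4": {"transport"},
}


def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def information_content():
    """IC(c) = -log p(c), with p(c) the fraction of annotated entities
    annotated with c or a descendant (annotations propagate upward)."""
    counts = defaultdict(int)
    for concepts in ANNOTATIONS.values():
        covered = set()
        for c in concepts:
            covered |= ancestors(c)
        for c in covered:  # count each entity at most once per concept
            counts[c] += 1
    total = len(ANNOTATIONS)
    # log(total / n) equals -log(n / total) and avoids printing -0.0 for the root.
    return {c: math.log(total / n) for c, n in counts.items()}


def resnik(c1, c2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(c1) & ancestors(c2)
    return max(ic.get(c, 0.0) for c in common)


if __name__ == "__main__":
    ic = information_content()
    print(round(resnik("protein binding", "dna binding", ic), 3))  # 0.288
    print(resnik("protein binding", "transport", ic))  # 0.0
```

In this toy corpus, "protein binding" and "dna binding" share the moderately specific ancestor "binding", whereas "protein binding" and "transport" share only the root, whose IC is zero because every entity is annotated with it.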
We intend to outperform state-of-the-art systems in various domains by exploiting background knowledge more effectively through evidence content and information content. To the best of our knowledge, these ideas have never been applied to ontology matching, and their successful application in related tasks supports our claims. To demonstrate the effectiveness of our work, we will assess it in community evaluations such as the OAEI (Ontology Alignment Evaluation Initiative), and we will also apply it to real-world applications based on two existing ontologies: the Gene Ontology and Geo-Net-PT. We expect this project to contribute directly to biomedical and geospatial knowledge engineering, as well as to other areas that need to coordinate heterogeneous resources, such as catalog integration, peer-to-peer (P2P) communication and web service composition.
LASIGE is supported by FCT, project UID/CEC/00408/2013