Improving Ontology Matching Towards Achieving Semantic Interoperability

ABSTRACT

This work proposes an algorithm for concept matching, applied in the ontology mapping domain. The basic idea is to seek the effective semantics embedded in a concept name by analyzing the contexts in which it appears. Through simple interaction with the well-known lexical database WordNet, the correct meaning of a concept is elicited unequivocally by exploring its local semantic context. This approach yields interesting results for word sense disambiguation when polysemy problems require a semantic interpretation. Although the algorithm takes longer to run, it produces better matchings, because by the end of the first and second steps of the matching process the concepts in the ontology trees are populated with much more semantic information.

TABLE OF CONTENTS

DECLARATION
CERTIFICATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
ABBREVIATIONS, DEFINITIONS, GLOSSARIES AND SYMBOLS
CHAPTER ONE
GENERAL INTRODUCTION
1.1 Background of the study
1.2 Research motivations and goals
Motivating Example
1.3 Research questions
1.4 Research objectives
1.5 Methodology
1.6 Contribution to Knowledge
CHAPTER TWO
LITERATURE REVIEW
2.1 Designing and classifying ontologies
2.1.1 Why develop ontologies
2.1.2 Steps in developing ontologies
2.1.3 Problems and solutions to ontology development
2.1.4 Ontologies in the semantic web language
Example
2.1.5 Owlvisualizer (Owlviz)
2.2 Ontology interoperability
2.3 Ontology matching towards semantic interoperability
2.3.1 What is ontology matching?
2.3.2 Why is ontology matching needed/interesting?
Language or meta-model level
Ontology or model level
2.3.3 What is semantic interoperability, why is it needed?
2.4 Review of ontology matching techniques
2.4.1 Glue
2.4.2 Coma++
2.4.3 Automatch
2.4.4 Duma
2.4.5 Scalable knowledge composition
2.5 Comparisons of ontology matching techniques
2.6 Limitations of ontology matching techniques
CHAPTER THREE
SYSTEM DEVELOPMENT
3.1 System Requirement
3.1.1 Protégé Ontology Software Tool
3.1.2 S-match
3.2 Introduction to graphs
3.2.1 Textual representation of graphs
3.3 Pattern in graphs
3.3.1 Pattern specification and detection
3.3.2 Pattern visualization and navigation
CHAPTER FOUR
SYSTEM IMPLEMENTATION
4.1 WordNet Overview
4.2 The semantic matching algorithm
4.3 The tree matching algorithm: Step I Computing concepts at labels
4.4 Dealing with ambiguity using concept sense discrimination algorithm
4.5 The computation of the CL matrix
4.6 The computation of the CN matrix
4.7 Discussion
4.8 An Architecture of our enhanced S-match implementation
4.9 Conclusion
CHAPTER FIVE
CONCLUSION AND RECOMMENDATION
5.1 Conclusion and future work
5.2 Recommendation
REFERENCE
APPENDIX ONE

CHAPTER ONE

GENERAL INTRODUCTION

This chapter introduces the thesis. It covers the background of the study, the research motivations and goals, the research questions the thesis sets out to answer, the methodology used to answer those questions and, finally, a summary of the thesis's contribution to knowledge.

1.1 Background of the study

The World Wide Web is the greatest repository of information ever assembled by man. It contains documents and multimedia resources concerning almost every imaginable subject, and all of these data are instantly available to anyone with an Internet connection. The web's success is largely due to its decentralized design: web pages are hosted on numerous computers, and each document can point to other documents, either on the same computer or on different ones.

As a result, individuals all over the world can provide content on the web, allowing it to grow exponentially as more and more people learn how to use it. However, the web's size has also become its limitation: because of the sheer volume of available information, it is becoming increasingly difficult to locate useful information. Although directories (such as Yahoo!) and search engines (such as Google and AltaVista) provide some assistance, they are far from perfect. For many users, locating the right document is still like trying to find a needle in a haystack.

Ontologies are a promising solution for data representation and knowledge sharing, aimed at integrating web content into a unique and coherent view. However, because of the decentralized nature of the web, many ontologies have been defined and disseminated on the Internet; they often describe overlapping application domains, and some are specialized for a specific domain.

Noy et al. (2010) point out that there is a clear need to find semantic correspondences among concepts belonging to different ontologies in order to achieve semantic reconciliation, with the aim of establishing interoperability between semantic web applications and a more homogeneous integration of information.

The main obstacle is the fact that the web was not designed to be processed by machines. Although web pages include special information that tells a computer how to display a particular piece of text or where to go when a link is clicked, they do not provide any information that helps the machine to determine what the text means. Thus, to process a web page intelligently, a computer must understand text, but natural language understanding is known to be an extremely difficult and unsolved problem. In order for machines to be able to integrate information that commits to heterogeneous ontologies, there need to be primitives that allow ontologies to map terms to their equivalents in other ontologies.

1.2 Research motivations and goals

The goal of the semantic web is to take advantage of formalized knowledge (expressed in languages such as RDF) at the scale of the World Wide Web. In particular, it relies on ontologies, which define the concepts used to represent knowledge on the web, e.g., for annotating a picture, specifying a web service interface or expressing the relationship between two persons.

Some researchers and web developers have proposed augmenting the web with languages that make the meaning of web pages explicit. Tim Berners-Lee, the inventor of the web, coined the term semantic web to describe this approach. Wang et al. (2008) provide the following definition: "The semantic web is not a separate web but an extension of the current one, in which information is given a well-defined meaning, better enabling computers and people to work in cooperation."

Among other things, the semantic web is meant to:

1. Allow users to organize and browse the web in ways that better suit the problems they have at hand.

2. Impose a conceptual filter on a set of web pages and display their relationships based on that filter.

3. Allow visualization of complex content.

Motivating Example

Suppose you want to find out more about someone you met at a conference. You know that his last name is Aminu, and that he teaches Computer Science at a nearby university, but you do not know which one. You also know that he just moved to Nigeria from Niger, where he had been an associate professor at his alma mater.

On the World Wide Web of today, you will have trouble finding this person. The above information is not contained within a single web page, which makes keyword search ineffective.

On the semantic web, however, you should be able to quickly find the answers. A marked-up directory service makes it easy for your personal software to find nearby computer science departments.

Here the data is organized into a taxonomy, as in Figure 1 and Figure 2, that includes Staff, Academic and Non-Academic. Associate professors have attributes such as name, degree and degree-granting institution. Such marked-up data makes it easy for your software to find a professor with the last name Aminu. Then, by examining the degree-granting institution attribute, the software quickly finds the CS department of his alma mater in Niger. There, the software learns that the data has been marked up using an ontology specific to universities in Niger, and that there are many entities named Aminu. However, knowing that associate professor is equivalent to senior lecturer, the software can select the right subtree in the departmental taxonomy and zoom in on the old homepage of your conference acquaintance.

Figure 1: Computer Science Department ontology for Niger

Figure 2: Computer Science Department ontology for Nigeria

The semantic web depends on the ability to associate formal meaning with content. The field of knowledge representation provides a good starting point for the design of a semantic web language because it offers insight into the design and use of languages that attempt to formalize meaning.

It should be clear that software agents are not a replacement for humans in the use of the semantic web: they cannot make certain decisions on their own, and their only role is to harmonize information on the web so that users can make decisions easily.

This means that the semantic web is intended not only to represent knowledge and formal semantics but also to perform reasoning based on artificial intelligence concepts. Figure 3 shows the flow of information between users, personal agents, intelligent infrastructure services and the web documents we create.

Figure 3: Architectural operation of the semantic web

One of the driving factors in the proliferation of the web is the freedom from a centralized authority. However, since the web is the product of many individuals, the lack of central control presents many challenges for reasoning with its information. First, different communities will use different vocabularies, resulting in problems of synonymy (when two different words have the same meaning) and polysemy (when the same word is used with different meanings). Second, the lack of editorial review or quality control means that each page's reliability must be questioned.

An intelligent web agent simply cannot assume that all of the information it gathers is correct and consistent. There are quite a number of well-known “web hoaxes”, where information was published on the web with the intent to amuse or mislead. Furthermore, since there can be no global enforcement of integrity constraints on the web, information from different sources may be in conflict. Some of these conflicts may be due to philosophical disagreement: different political groups, religious groups or nationalities may have fundamental differences of opinion that will never be resolved.

In order for information from different sources, such as websites, to be integrated, there needs to be a shared understanding of the relevant domain. Knowledge representation formalisms provide structures for organizing this knowledge but no mechanisms for sharing it. Ontologies provide a common vocabulary to support the sharing and reuse of knowledge. As discussed by Guarino and Giaretta (1995), the meaning of the term ontology is often vague. It was first used to describe the philosophical study of the nature and organization of reality. In AI, the most cited definition is due to Gruber et al. (2009): “An ontology is an explicit specification of a conceptualization.”

The semantic web thus offers a compelling vision, but it also raises many difficult challenges.

Researchers have been actively working on these challenges, focusing on fleshing out the basic architecture, developing expressive and efficient ontology languages, building techniques for efficiently marking up data, and learning ontologies. A key challenge in building the semantic web, and one that has received relatively little attention, is finding semantic mappings among ontologies. Given the decentralized nature of semantic web development, there will be an explosion in the number of ontologies. Many of these ontologies will describe similar domains using different terminologies, and others will have overlapping domains. To integrate data from disparate ontologies, we must know the semantic correspondences between their elements.

In the last decade, the semantic web initiative has grown from the personal vision of a small group of individuals into a large research field with several international conferences, multiple journals and a wide range of interesting research projects all over the world. The semantic web can be conceived of as a collection of ontologies. Ontologies are generally used to specify and communicate domain knowledge in a generic way.

Ontologies are the key technology for realizing the semantic web. They impose taxonomic hierarchies that describe what things are and what they are used for. Current ontology technology is mature enough to support development, management and reasoning within a single ontology of a particular organization. However, the same concepts may be modeled in different ways in different ontologies. Although shared ontologies and ontology extension allow a certain degree of interoperability between different organizations and domains, there are often multiple ways to model the same information. This may be due to differences in the perspectives of different organizations, professions, nationalities, etc.

As noted above, for machines to integrate information that commits to heterogeneous ontologies, there must be primitives that allow ontologies to map terms to their equivalents in other ontologies. Ontology matching is one of the core tasks for ontology interoperability. It aims to find semantic relationships between the entities (i.e., concepts, attributes and relations) of two ontologies.
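To make the idea of an alignment concrete, the following minimal sketch represents the result of matching two ontologies as a list of correspondences, each relating an entity of one ontology to an entity of the other through a semantic relation and a confidence value. The class names, relation symbols and example entities (prefixed O1 and O2) are hypothetical and do not belong to any particular matching tool.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Semantic relations a matcher may return between two entities."""
    EQUIVALENT = "="
    MORE_GENERAL = ">"   # entity1 subsumes entity2
    LESS_GENERAL = "<"   # entity1 is subsumed by entity2
    DISJOINT = "_|_"


@dataclass(frozen=True)
class Correspondence:
    """One element of an alignment (hypothetical structure)."""
    entity1: str         # entity (concept, attribute or relation) of ontology O1
    entity2: str         # entity of ontology O2
    relation: Relation
    confidence: float    # in [0, 1]


# A toy alignment between two hypothetical university ontologies.
alignment = [
    Correspondence("O1:AssociateProfessor", "O2:SeniorLecturer", Relation.EQUIVALENT, 0.9),
    Correspondence("O1:AcademicStaff", "O2:SeniorLecturer", Relation.MORE_GENERAL, 0.8),
]


def equivalents(alignment, entity, threshold=0.5):
    """Entities of O2 declared equivalent to `entity` with enough confidence."""
    return [c.entity2 for c in alignment
            if c.entity1 == entity
            and c.relation is Relation.EQUIVALENT
            and c.confidence >= threshold]


print(equivalents(alignment, "O1:AssociateProfessor"))  # ['O2:SeniorLecturer']
```

Keeping the relation explicit, rather than storing only a similarity score, leaves room for relations other than equivalence, such as subsumption.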

Many different matching solutions have been proposed so far, including the following.

Similarity Flooding. The approach of Melnik et al. (2002) uses a hybrid matching algorithm based on the idea of similarity propagation. Schemas are represented as directed labeled graphs; building on the OIM specification (Meta et al., 1999), the algorithm manipulates them in an iterative fixed-point computation to produce an alignment between the nodes of the input graphs.
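The propagation idea can be illustrated with a heavily simplified sketch, assuming that each graph is given as a list of (source, label, target) edges. It builds the pairwise connectivity graph, weights each propagation edge by 1/n for the n equally labeled edges leaving a node, and iterates the basic fixed-point formula sigma(i+1) = normalize(sigma(i) + phi(sigma(i))). Using difflib string similarity as the initial similarity sigma(0) is an assumption made here, and the filters and formula variants of Melnik et al. (2002) are omitted.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity_flooding(edges1, edges2, iterations=50, eps=1e-6):
    """Simplified similarity flooding over two edge-labeled graphs.

    edges1, edges2: lists of (source, label, target) triples.
    Returns a dict mapping (node1, node2) pairs to a similarity in [0, 1].
    """
    # 1. Pairwise connectivity graph: (a, b) -> (a2, b2) when both graphs
    #    contain an edge with the same label between the corresponding nodes.
    pcg = [((a, b), label, (a2, b2))
           for (a, label, a2) in edges1
           for (b, label2, b2) in edges2 if label == label2]
    nodes = {n for src, _, dst in pcg for n in (src, dst)}
    if not nodes:
        return {}

    # 2. Propagation graph: each PCG edge is used in both directions and
    #    weighted 1 / (number of same-labeled edges leaving that node).
    out_count = defaultdict(int)
    for src, label, dst in pcg:
        out_count[(src, label)] += 1
        out_count[(dst, label)] += 1
    incoming = defaultdict(list)  # node -> [(neighbor, weight)]
    for src, label, dst in pcg:
        incoming[dst].append((src, 1.0 / out_count[(src, label)]))
        incoming[src].append((dst, 1.0 / out_count[(dst, label)]))

    # 3. Initial similarity sigma(0) from label string similarity (assumption).
    sigma = {(a, b): SequenceMatcher(None, a.lower(), b.lower()).ratio()
             for a, b in nodes}

    # 4. Fixed point: sigma(i+1) = normalize(sigma(i) + phi(sigma(i))).
    for _ in range(iterations):
        nxt = {n: sigma[n] + sum(sigma[m] * w for m, w in incoming[n])
               for n in nodes}
        top = max(nxt.values()) or 1.0
        nxt = {n: v / top for n, v in nxt.items()}
        delta = max(abs(nxt[n] - sigma[n]) for n in nodes)
        sigma = nxt
        if delta < eps:
            break
    return sigma


edges1 = [("Staff", "has", "Academic"), ("Academic", "has", "Professor")]
edges2 = [("Personnel", "has", "Academic"), ("Academic", "has", "Lecturer")]
sim = similarity_flooding(edges1, edges2)
print(max(sim, key=sim.get))  # ('Academic', 'Academic')
```

In this toy example the pair (Academic, Academic) receives similarity from both neighboring pairs, (Staff, Personnel) and (Professor, Lecturer), and therefore ends up as the strongest match.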

COMA. COMA (Combination of Matching algorithms), by Do et al. (2002), is a composite schema matching tool. It provides an extensible library of matching algorithms, a framework for combining the obtained results, and a platform for evaluating the effectiveness of the different matchers.
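A composite matcher in the spirit of COMA can be sketched as several independent matchers whose results are aggregated and then filtered by a selection step. In this minimal sketch, the two matchers (a difflib name similarity and a trigram Dice coefficient), the averaging aggregation and the threshold-plus-best-candidate selection are illustrative choices, not COMA's actual matcher library.

```python
from difflib import SequenceMatcher


def name_matcher(a, b):
    """String similarity of two element names (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trigram_matcher(a, b):
    """Dice coefficient over the character trigrams of two names."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}
    ga, gb = grams(a), grams(b)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def combine(elements1, elements2, matchers, threshold=0.5):
    """Run every matcher, average their scores (aggregation), then keep for
    each element of the first schema its best candidate above `threshold`
    (selection)."""
    result = {}
    for a in elements1:
        scores = {b: sum(m(a, b) for m in matchers) / len(matchers)
                  for b in elements2}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            result[a] = (best, round(scores[best], 2))
    return result


result = combine(["CustomerName", "Address"], ["custName", "address", "phone"],
                 [name_matcher, trigram_matcher])
print(result)  # {'CustomerName': ('custName', 0.65), 'Address': ('address', 1.0)}
```

Separating matchers from aggregation and selection is what makes such a library extensible: a new matcher only has to return a score for a pair of elements.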

S-Match. S-Match, by Giunchiglia and Shvaiko (2003), is a schema-based matching system. It takes two graph-like structures (e.g., XML schemas or ontologies) and returns semantic relations (e.g., equivalence, subsumption) between those nodes of the graphs that correspond semantically to each other. The relations are determined by analyzing the meaning (concepts, not labels) codified in the elements and structures of the schemas/ontologies.
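S-Match reduces the computation of these relations to propositional satisfiability problems over concepts derived from WordNet senses. The sketch below is a much simpler stand-in, assuming that each node concept is just a conjunction (a set) of atomic concepts and that background knowledge contains only atomic subsumption and disjointness axioms; under those assumptions the relations can be read off with set operations instead of a SAT solver. The ASCII relation symbols and the atom names are illustrative, and unsatisfiable node concepts are ignored.

```python
def ancestors(atom, parents):
    """`atom` plus every atom it is subsumed by (reflexive-transitive closure).
    `parents` maps an atom to its direct superconcepts."""
    seen, stack = {atom}, [atom]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entailed(concept, parents):
    """All atoms entailed by a conjunction (set) of atoms."""
    result = set()
    for atom in concept:
        result |= ancestors(atom, parents)
    return result


def relation(c1, c2, parents, disjoint):
    """Relation between two node concepts: '=', '<' (c1 less general),
    '>' (c1 more general), '!' (disjoint) or 'idk'."""
    e1, e2 = entailed(c1, parents), entailed(c2, parents)
    if any(frozenset((x, y)) in disjoint for x in e1 for y in e2):
        return "!"
    less_general = set(c2) <= e1   # every conjunct of c2 follows from c1
    more_general = set(c1) <= e2   # every conjunct of c1 follows from c2
    if less_general and more_general:
        return "="
    if less_general:
        return "<"
    if more_general:
        return ">"
    return "idk"


# Hypothetical background knowledge over atomic concepts.
parents = {"lecturer": {"academic"}, "academic": {"staff"},
           "clerk": {"non_academic"}, "non_academic": {"staff"}}
disjoint = {frozenset(("academic", "non_academic"))}

# Concept at a node = conjunction of the concepts on its path from the root.
print(relation({"staff", "academic", "lecturer"}, {"staff", "academic"}, parents, disjoint))  # <
print(relation({"staff", "non_academic", "clerk"}, {"staff", "academic"}, parents, disjoint))  # !
print(relation({"staff", "academic"}, {"academic"}, parents, disjoint))  # =
```

The last call shows why meaning rather than labels matters: the path "staff, academic" and the single label "academic" denote the same concept once the background axiom academic-is-a-staff is taken into account.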

The goal of this thesis is to extend the S-Match ontology matching algorithm with automatic learning techniques in order to make data more shareable and to facilitate communication between ontology applications through an enhanced matching technique.

1.3 Research questions

The study is set up to answer the following questions:

1. What automatic learning techniques are most appropriate for incorporation into ontology matching algorithms?

Ontology matching is a major problem when it comes to ontology integration. Many ontology matching techniques have been developed based on certain reasoning algorithms.

Notably, almost all of these algorithms use some form of similarity measure, often computed from a joint probability distribution between concepts. A uniform textual representation can be used to enhance ontology matching, and this type of representation can be incorporated into graph-based approaches to ontology matching.

2. Which ontology matching algorithm admits learning techniques with minimal overhead?

Graph-based ontology matching algorithms represent input ontologies as labeled graphs made of nodes and edges. Each node has a label that provides additional information about the class it represents. This makes such algorithms well suited to the addition of machine learning techniques, since edges can also be associated with labels that represent particular patterns.

3. How can textual annotations be added to ontology matching algorithms to make data between ontologies more sharable?

The idea is to add textual labels to the nodes of the graphs representing the ontologies, so that software systems can deduce additional meaning from the input ontologies. Our approach uses simple logical comparisons (rather than the joint probability calculations used in many other studies) to make inferences.
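The difference between the two styles of comparison discussed in these answers can be sketched in a few lines. The first part annotates node labels with terms (the normalized label plus synonyms from a small hand-made lexicon that stands in for WordNet) and decides a match with a plain Boolean test; the second part computes, for contrast, a probabilistic similarity of the kind used by GLUE, the Jaccard coefficient P(A,B) / (P(A,B) + P(A,¬B) + P(¬A,B)) estimated from instance counts. The lexicon, node labels and instance sets are hypothetical, and the sketch illustrates the principle only.

```python
from fractions import Fraction

# Hypothetical synonym lexicon standing in for a resource such as WordNet.
LEXICON = {
    "associate professor": {"senior lecturer"},
    "senior lecturer": {"associate professor"},
    "staff": {"personnel"},
    "personnel": {"staff"},
}


def annotate(label):
    """Textual annotation of a node: its normalized label plus known synonyms."""
    key = " ".join(label.lower().replace("_", " ").split())
    return {key} | LEXICON.get(key, set())


def same_meaning(label1, label2):
    """Logical comparison: True when the two annotations share a term."""
    return bool(annotate(label1) & annotate(label2))


def jaccard(instances, in_a, in_b):
    """Probabilistic alternative (GLUE-style Jaccard coefficient):
    P(A,B) / (P(A,B) + P(A,~B) + P(~A,B)), estimated from instance counts."""
    n = len(instances)
    if n == 0:
        return 0.0
    p_ab = Fraction(sum(1 for x in instances if in_a(x) and in_b(x)), n)
    p_a_not_b = Fraction(sum(1 for x in instances if in_a(x) and not in_b(x)), n)
    p_not_a_b = Fraction(sum(1 for x in instances if not in_a(x) and in_b(x)), n)
    denom = p_ab + p_a_not_b + p_not_a_b
    return float(p_ab / denom) if denom else 0.0


# Node labels of two small hypothetical ontology graphs.
niger = {"n1": "Staff", "n2": "Senior_Lecturer"}
nigeria = {"m1": "Personnel", "m2": "Associate Professor", "m3": "Academic"}
matches = [(a, b) for a, la in niger.items() for b, lb in nigeria.items()
           if same_meaning(la, lb)]
print(matches)  # [('n1', 'm1'), ('n2', 'm2')]

# Instances (staff names) classified under each concept, for the Jaccard measure.
senior_lecturers = {"aminu", "bello", "sani"}
associate_profs = {"aminu", "bello", "musa"}
everyone = senior_lecturers | associate_profs | {"ibrahim"}
print(jaccard(everyone, senior_lecturers.__contains__, associate_profs.__contains__))  # 0.5
```

The Boolean test needs no instance data at all, whereas the probabilistic measure depends on how instances are distributed over both concepts.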

1.4 Research objectives

The main objectives of this research are:

1. To define an architectural framework for adding textual annotations to hierarchical trees representing ontologies of different domains.

2. To enhance the S-Match algorithm so that it understands those textual annotations.

3. To implement the enhanced S-Match.

4. To evaluate the two implementations.

1.5 Methodology

The following steps were taken to answer the above research questions:

1. Review the literature to understand why it is necessary to match ontologies.

2. Conduct an extensive literature review on ontology matching techniques.

3. Create and read simple ontologies using Protégé, one of the most widely used ontology editing tools.

4. Determine unsatisfiable classes and provide explanations for unsatisfiability in terms of matching ontologies.

5. Develop an algorithm to enhance ontology matching based on a machine learning approach.
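Step 4 can be illustrated for the simplest case, assuming a class hierarchy that contains only subclass and disjointness axioms. There, a class is unsatisfiable when it inherits from two classes declared disjoint, and a mapping between two ontologies can create exactly this situation. The sketch below, with hypothetical class names, flags such classes and reports the offending disjoint pair as an explanation; the OWL reasoners used from Protégé handle far richer axioms.

```python
def superclasses(cls, parents):
    """`cls` plus all of its direct and indirect superclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable(parents, disjoint):
    """Map each unsatisfiable class to the disjoint pair that explains it."""
    result = {}
    for cls in parents:
        supers = superclasses(cls, parents)
        for pair in disjoint:
            if pair <= supers:   # cls would have to belong to both classes
                result[cls] = tuple(sorted(pair))
                break
    return result


# Two hypothetical ontologies merged through the (wrong) mapping
# SeniorLecturer = Administrator, encoded as subclass links in both directions.
parents = {
    "SeniorLecturer": {"Academic", "Administrator"},
    "Administrator": {"NonAcademic", "SeniorLecturer"},
    "Academic": {"Staff"},
    "NonAcademic": {"Staff"},
}
disjoint = {frozenset(("Academic", "NonAcademic"))}
print(unsatisfiable(parents, disjoint))
# {'SeniorLecturer': ('Academic', 'NonAcademic'), 'Administrator': ('Academic', 'NonAcademic')}
```

Reporting the disjoint pair, rather than only the class name, gives the user a starting point for deciding which correspondence in the mapping to reject.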

1.6 Contribution to Knowledge

The thesis contributes to the accuracy and methods of ontology interoperability by matching concepts in a domain of interest. Furthermore, it demonstrates that human intervention in ontology matching can be minimized by adding automatic learning techniques to ontology matching algorithms.
