Notes on ontologies (and their relevance to service trading in an internet service market)

Venu Vasudevan

Note about the title: at this point it is more the former than the latter, but we'll get there.
 

Why the interest in ontologies

To paraphrase [Chandra], "the current interest in ontologies is the latest version of AI's alternation of focus between mechanism theories and content theories. The ontology realization amounts to the fact that no matter what magic you have in the problem solver (fuzzy logic, neural nets, frame language etc.), it cannot do much good without a content theory of the domain. Further, given a good content theory, different mechanisms may be used to build effective systems". Similar conclusions have been reached independently by the software engineering community, the database community, the workflow community and others. Domain modelling has become an important research topic in software engineering because tools with different domain models can't talk to each other: they say the same thing in different (incompatible) ways. For the database community, the problem is that it is hard to integrate data from databases that aren't consistent about how they model the world. Multiple workflow systems need to agree on the meaning of concepts such as processes, actors, and resource "usage" by activities in order to exchange workflow and workflow execution information across workflow agents (PIF, for instance, defines a workflow ontology for this purpose). All of these are versions of the ontology problem among human and programmed agents.

Ontologies: Various perspectives

There is an ontology problem with ontologies. While everybody agrees ontologies are important, there is debate about where the dividing line lies between ontologies and a number of other approaches (e.g. object models) for representing concepts and conceptualizations. The most frequently quoted definition is Gruber's: an ontology is a specification of a conceptualization. The quibbling arises when you dig deeper and ask: "how formal or rich does this specification need to be before one can call it an ontology?" The AI community views ontologies as formal logical theories, in which you not only define terms and relationships, but also formally define the context in which a term (or relationship) applies, along with the facts and relationships it implies. These ontological theories are formal enough to be tested for soundness and completeness by theorem provers. An AI ontology can fully define an "attack aircraft" as: "An attack aircraft is a fixed wing aircraft that has been assigned to a combat mission that is one of those in the group SEAD". In contrast, the database and other communities view ontologies more as object models, taxonomies and schemas, and do not explicitly express constraints such as the context in which a fixed wing aircraft turns into an attack aircraft. Linguistic ontologies (e.g. WordNet) and thesauri express various relationships between concepts (e.g. synonymy, antonymy, is_a, contains_a), but do not explicitly and formally describe what a concept means.
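
A minimal Python sketch of the contrast, assuming hypothetical names throughout (`Aircraft`, `is_attack_aircraft`, and the mission labels in `SEAD_MISSIONS` are illustrative, not from any real system). A real AI ontology would state the definition as a logical axiom (e.g. in KIF); Python only mimics the idea. In the taxonomy style the category is a label someone asserts; in the definitional style membership is derived from the definition, so it changes when the mission assignment changes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mission group standing in for "the group SEAD" in the definition.
SEAD_MISSIONS = {"SEAD-1", "SEAD-2"}


@dataclass
class Aircraft:
    tail_number: str
    fixed_wing: bool
    mission: Optional[str] = None  # current combat mission assignment, if any
    # Taxonomy style: the category is simply asserted and never re-checked.
    asserted_category: str = "aircraft"


def is_attack_aircraft(a: Aircraft) -> bool:
    """Definitional style: a fixed wing aircraft assigned to a combat mission
    in the SEAD group. Membership is inferred from the definition."""
    return a.fixed_wing and a.mission in SEAD_MISSIONS


jet = Aircraft("AF-101", fixed_wing=True, mission="SEAD-1")
helo = Aircraft("AH-202", fixed_wing=False, mission="SEAD-1")
print(is_attack_aircraft(jet), is_attack_aircraft(helo))  # True False

jet.mission = None  # mission reassigned: the inferred category follows
print(is_attack_aircraft(jet), jet.asserted_category)  # False aircraft
```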

In addition to the formality and rigor dimensions, ontologies can be classified along the dimensions of coverage, guiding principle, and point of view. Upper ontologies such as CYC aim to cover concepts that are common across domains (e.g. common sense reasoning in the case of CYC), while domain ontologies focus on a single domain (evacuation operations, medicine etc.). The point of view of an ontology is the kind of concepts for which the ontology forms a theory, and it need not be a domain such as mission planning. For instance, problem solving ontologies describe the strategies problem solvers use to attack domain problems, and theory ontologies might describe concepts of space, time, causality, plans etc. The guiding principle of an ontology is the principle by which its concepts and relationships were chosen. WordNet and other ontologies use linguistics as a guide for identifying the concepts that should be in the ontology (particularly the upper level concepts). Linguistics is not the only possible guide; conceptual clustering, for example, might be used to guide ontology structure.

How are ontologies used

One of the first motivators for ontologies was sharing domain knowledge across problem solvers. As Patil points out, if two problem solvers use different conceptualizations of the same world, their knowledge bases cannot be shared, and mapping concepts from one KB to the other can be tricky business. So the idea was to use a logic (e.g. KIF) to define an ontological theory, which could then be translated into a particular problem solver's formalism (KEE, CLIPS, OPS-5 etc.). For this kind of knowledge reuse, the ontology needs to be semantically rich.

Even if a single kind of problem solver (e.g. KEE) is being used, ontologies are considered a useful knowledge structuring mechanism. A group of KEE users could design their knowledge bases so that each consists of an ontology (a shared KB containing only the sharable assertions) plus non-sharable sets of facts and assertions that apply only to a particular situation that not everybody is interested in.
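
The shared-plus-local structure can be sketched in a few lines. This is a Python stand-in for a KEE knowledge base, assuming facts are plain (subject, relation, object) triples; `KnowledgeBase`, `SHARED_ONTOLOGY` and all the fact names are hypothetical. Each KB sees the shared ontology plus its own local facts, and local assertions never leak into another team's view.

```python
# Facts are (subject, relation, object) triples; all names are hypothetical.
Fact = tuple[str, str, str]

# The shared ontology: assertions every KB in the group imports unchanged.
SHARED_ONTOLOGY: frozenset[Fact] = frozenset({
    ("strut", "is_a", "structural_member"),
    ("structural_member", "part_of", "airframe"),
})


class KnowledgeBase:
    """A KB = the shared ontology plus situation-specific local facts."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.local: set[Fact] = set()

    def assert_fact(self, fact: Fact) -> None:
        # Local assertions never touch the shared ontology.
        self.local.add(fact)

    def facts(self) -> frozenset[Fact]:
        # The view a problem solver reasons over: shared plus local.
        return SHARED_ONTOLOGY | self.local


kb_a, kb_b = KnowledgeBase("team_a"), KnowledgeBase("team_b")
kb_a.assert_fact(("strut_17", "is_a", "strut"))
print(("strut", "is_a", "structural_member") in kb_b.facts())  # True
print(("strut_17", "is_a", "strut") in kb_b.facts())  # False
```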

Closer to the world of tool integration, enterprise integration people realized that tools used internal models of the enterprise that were overlapping and incompatible. An enterprise might use multiple workflow systems, each of which used the concepts "resource", "completion time", "owner of a step" etc. in different ways, or lacked a common terminology. The solution was to a) define an enterprise ontology, and b) exchange results between workflow systems A and B by translating A's state into the enterprise ontology, which was then read in by B. The ontology became a common format (an interlingua), with each tool providing readers of and writers to this ontology.
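
A minimal sketch of the interlingua pattern, assuming each tool's state is a flat record and that translation is just field renaming. All field names (`res`, `done_at`, `resource_id`, etc.) are hypothetical. A's writer maps its fields into ontology terms; B's reader maps ontology terms into its own fields. Neither tool needs to know about the other.

```python
# Hypothetical field names: workflow system A, the enterprise ontology, system B.
A_TO_ONTOLOGY = {"res": "resource", "done_at": "completion_time",
                 "owner": "owner_of_step"}
ONTOLOGY_TO_B = {"resource": "resource_id", "completion_time": "finish_time",
                 "owner_of_step": "assignee"}


def translate(record: dict[str, object], mapping: dict[str, str]) -> dict[str, object]:
    """Rename fields per mapping; fields with no counterpart are dropped."""
    return {mapping[f]: v for f, v in record.items() if f in mapping}


state_a = {"res": "press-3", "done_at": "17:00", "owner": "alice",
           "local_priority": 2}
interlingua = translate(state_a, A_TO_ONTOLOGY)  # A's writer
state_b = translate(interlingua, ONTOLOGY_TO_B)  # B's reader
print(state_b)
# {'resource_id': 'press-3', 'finish_time': '17:00', 'assignee': 'alice'}
```

Note that `local_priority`, which A does not share with the ontology, is silently lost; a real reader/writer would also have to convert values (units, codes), not just rename fields.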

A more basic version of the interlingua idea is the database view of ontologies. Here information sources, their capabilities, and their pedigrees are mapped to a common ontology (e.g. UML). This then allows these repositories to be federated by a federation system.  A slew of mediator based multi-database architectures thus use ontologies in one form or another.

Ontologies: Principles, Methods and Applications (Mike Uschold, Michael Gruninger)

People and programs use different models of the world in solving the same problem. This hampers communication both among people and among programs. Ontologies specify a domain/world model, often in concepts/relationships/processes/... lingo, which, if expressed (and pointed to) explicitly, allows programs and people to avoid misunderstandings. Ontology is an overloaded term: everything from object models to database schemas to logical theories (not just objects and relationships, but assertions and inferences) serves an ontological purpose for a given domain and a given set of applications. For a single domain and a narrow set of problems, a relational database schema is an ontology, in that applications that share the schema's "view of the world" can interoperate. As you try to build a knowledge model that cuts across domains and unifies radically different world views (say a logistics application and a military course-of-action application), concepts, relationships and contexts get hairy, and you might have to "say a lot to say what you mean". In this multi-domain situation, one might need to define not only concepts but also contexts, in some formal manner.

Ontologies have several different uses. For those who build AI problem solvers, they allow problem solvers to reuse each other's knowledge bases, or at least minimize the pain of translation (Patil makes the same point). If two knowledge bases agree that they mean the same thing when they talk about "strut", the problem of translating from a CLIPS KB to a KEE KB is reduced to a data translation problem (as opposed to a "gee, are we even talking about the same thing?" problem). For repository people, an ontology provides a common representation into which they can translate their repositories, thus avoiding the O(n^2) translator problem.
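
The O(n^2) argument is simple arithmetic. Assuming one directed translator per ordered pair of repositories in the direct case, versus one writer to and one reader from the common ontology per repository, the counts are:

```python
def pairwise_translators(n: int) -> int:
    """Direct translation: one translator per ordered pair of repositories."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """Via a common ontology: one writer and one reader per repository."""
    return 2 * n


for n in (2, 3, 4, 10):
    print(n, pairwise_translators(n), interlingua_translators(n))
# 2 2 4
# 3 6 6
# 4 12 8
# 10 90 20
```

This counts translators only and treats them as equally costly; it ignores the cost of building the ontology itself. Even so, the counts only favor the ontology from n = 4 onward, which is in line with Uschold's cost rule of thumb for data translation.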

Clearly, if the ontology is the glue for all your tools, building a bad ontology is a bad idea. The rest of this paper gives guidelines for building good ontologies, which are not of immediate interest to me.
 

Scalable Knowledge Composition (Gio's work - see http://www-db.stanford.edu/LIC/SK.html)

Gio has a series of papers on the topic of resolving semantic heterogeneity in information systems. These include: The collective summary of these papers is below.

The semantics of information sources are captured by their ontologies (i.e. the terms and relationships they use, which make up their domain of discourse). To support coherent querying of multiple overlapping information sources, we need to use ontologies to understand and compensate for the overlap of "world views", or of actual data, between the information sources. Integrating information from different sources without an understanding of their ontologies leads to duplicate data that just looks different, missing data that is actually there in the information source, multiple inconsistent views of the same information (e.g. the same information at different fidelities), information that belongs to different points on the timeline, etc. So "unintelligent I**2" will give you more data, less knowledge, and lead you to draw wrong conclusions.
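
The "duplicate data that just looks different" failure can be shown with a toy example. Everything here is hypothetical: two sources describe the same shoe with different field names and color vocabularies, and we assume `sku` and `part_no` share one identifier space. A naive union yields two records; mapping both sources onto a common (item id, color) form first yields one.

```python
# Two hypothetical sources describing the same shoe in different vocabularies.
store_rows = [{"sku": "A1", "colour": "pink"}]
factory_rows = [{"part_no": "A1", "color_code": 43423}]

# Hypothetical articulation: factory color code -> store color name.
FACTORY_COLORS = {43423: "pink"}


def store_to_common(row: dict) -> tuple[str, str]:
    return (row["sku"], row["colour"])


def factory_to_common(row: dict) -> tuple[str, str]:
    return (row["part_no"], FACTORY_COLORS[row["color_code"]])


naive = store_rows + factory_rows  # no ontology: two "different" shoes
aware = ({store_to_common(r) for r in store_rows}
         | {factory_to_common(r) for r in factory_rows})
print(len(naive), len(aware))  # 2 1
```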

Gio assumes that ontologies are pre-developed for non-collaborating repositories. These ontologies define not only the concepts that model the repository content (e.g. shoes, their manufacturers etc.) but also the pedigree (is the information authoritative, how recent is it), wrapper smarts etc. Given that an I**3 application has to be built on multiple non-collaborating repositories (each with its own ontology, the ontology being relevant in a context), we have an ontology composition problem, which is what Gio is addressing. Contexts provide guarantees about the exported knowledge and the inferences feasible over that knowledge. Context would then include items such as the schema of the source, supported queries, pedigree of the data (and/or authoritativeness of the data provider), latency and accuracy of data etc. The application now has to operate across a third ontology (with its own context), which is some subset of the combined ontologies of the information sources it operates on. Gio proposes an ontology algebra by which an application ontology can be defined over multiple resource ontologies using set operators. The papers deal with implementing such an algebra using lower level rules that relate concepts in one ontology to concepts in another (e.g. factory shoe color number #43423 is shoe store color "pink").
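
A toy sketch of set operators defined through articulation rules. It assumes the simplest possible rule, a one-to-one equivalence between a term in each ontology; the rules in Gio's papers are richer (conditions, value conversions). The ontologies, terms and rules below are all hypothetical, and the operator definitions are our own reading of intersection, difference and union, not the papers' exact semantics.

```python
# Hypothetical resource ontologies, each reduced to a set of term names.
factory = {"shoe", "color_43423", "manufacturer", "lot_number"}
store = {"footwear", "pink", "brand", "shelf_price"}

Rule = tuple[str, str]

# Hypothetical articulation rules: (factory term, store term) pairs that
# denote the same concept, e.g. factory color #43423 is store color "pink".
ARTICULATION: list[Rule] = [
    ("shoe", "footwear"),
    ("color_43423", "pink"),
    ("manufacturer", "brand"),
]


def intersection(o1: set[str], o2: set[str], rules: list[Rule]) -> set[Rule]:
    """Concepts the two ontologies share, as articulated term pairs."""
    return {(a, b) for a, b in rules if a in o1 and b in o2}


def difference(o1: set[str], o2: set[str], rules: list[Rule]) -> set[str]:
    """Terms of o1 that have no articulated counterpart in o2."""
    return o1 - {a for a, _ in intersection(o1, o2, rules)}


def union(o1: set[str], o2: set[str], rules: list[Rule]) -> set[tuple[str, ...]]:
    """All concepts, with each articulated pair collapsed into one entry."""
    shared = intersection(o1, o2, rules)
    only_o1 = {(t,) for t in difference(o1, o2, rules)}
    only_o2 = {(t,) for t in o2 - {b for _, b in shared}}
    return shared | only_o1 | only_o2


app_ontology = intersection(factory, store, ARTICULATION)
print(len(app_ontology), difference(factory, store, ARTICULATION))  # 3 {'lot_number'}
print(len(union(factory, store, ARTICULATION)))  # 5
```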

Anyway, the execution of these operators (which are a bunch of rules underneath) allows the application to:

The Ontology of Tasks and Methods (B. Chandrasekaran et al.)

The current interest in ontologies is the latest version of AI's alternation of focus between mechanism theories and content theories. The ontology realization amounts to the fact that no matter what magic you have in the problem solver (fuzzy logic, neural nets, frame language etc.), it cannot do much good without a content theory of the domain. Further, given a good content theory, different mechanisms may be used to build effective systems. A bad ontology can make the reasoner draw wrong conclusions, while a good ontology can be reused across problem solvers. E.g. if you model students and employees as "is_a" humans rather than as "roles_of" humans, you will draw wrong conclusions, since one person can be a student and an employee at the same time, and can stop being either without ceasing to be the same human.
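
A minimal Python illustration of the is_a versus roles_of distinction; the class names and people are hypothetical. With is_a subclasses, an object's class is fixed, so a student who takes a job ends up as a second object, and the KB "sees" two humans. With roles attached to a single human, roles can be added and dropped while identity is preserved.

```python
from dataclasses import dataclass, field


# "is_a" modelling: Student and Employee are subclasses of a human class.
class HumanIsA:
    def __init__(self, name: str) -> None:
        self.name = name


class Student(HumanIsA):
    pass


class Employee(HumanIsA):
    pass


bob = Student("bob")
# Bob takes a job. Under is_a his class is fixed, so the KB ends up holding a
# second object for the same person (or needs a StudentEmployee class).
bob_at_work = Employee("bob")
print(bob is bob_at_work)  # False: two "humans" where there is one


# "roles_of" modelling: roles are attached to one Human and can change.
@dataclass
class Human:
    name: str
    roles: set[str] = field(default_factory=set)


alice = Human("alice")
alice.roles |= {"student", "employee"}  # one person, two simultaneous roles
alice.roles.discard("student")          # she graduates but is the same human
print(sorted(alice.roles))  # ['employee']
```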

The rest of the paper distinguishes between a "domain ontology" (what you know about) and a "problem solving ontology" (the strategies you use to solve a problem using the domain ontology), and expands on the elements of a problem solving ontology.

Ontologies: Where are the killer apps (ECAI-98 Workshop on Applications of Ontologies and Problem-Solving Methods)

While ontology technology has been "ready" for a while, practical applications of ontologies are hard to pin down. Part of the reason is that ontologies come in many flavors with many uses, and nobody has categorized the (ontology application) design space so that people can index their ontology applications into it. Once such a design space is defined, and applications are slotted into it, we can get a perspective on the common uses of ontologies, and on why they are not being used in more ambitious ways. Uschold proposes the following design space:
 
Category: Variations
Purpose: Knowledge reuse, interoperability between heterogeneous software applications, reduced s/w maintenance costs
Formality: Is the ontology a taxonomy (object model) or a highly formal specification of the meaning of terms?
Breadth (subject matter): Narrow domain ontology or broad upper ontology
Scale: Ontology size (100, 1000, a million concepts?)
Conceptual architecture: Is the ontology used for repository federation, as an interchange language for multiple KBs, ...?
Mechanisms: What operators are used on the ontology, and why (inferences, articulation rules, tracing mappings etc.)? This clarifies how the ontology adds value.
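
One way to make "indexing applications into the design space" concrete is a record type with one field per dimension. This is a sketch under our own assumptions: the field names, enum values and the three sample applications are hypothetical, not Uschold's. Once applications are recorded this way, simple counts show which regions of the space are populated.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Formality(Enum):
    TAXONOMY = "taxonomy / object model"
    FORMAL = "formal specification of term meaning"


class Breadth(Enum):
    DOMAIN = "narrow domain ontology"
    UPPER = "broad upper ontology"


@dataclass(frozen=True)
class OntologyApplication:
    """One point in the design space (field names are our own)."""
    name: str
    purpose: str                 # e.g. "knowledge reuse", "interoperability"
    formality: Formality
    breadth: Breadth
    scale: int                   # rough number of concepts
    architecture: str            # e.g. "repository federation"
    mechanisms: tuple[str, ...]  # operators applied to the ontology


# Hypothetical entries, for illustration only.
apps = [
    OntologyApplication("app-1", "interoperability", Formality.TAXONOMY,
                        Breadth.DOMAIN, 1_000, "repository federation",
                        ("articulation rules",)),
    OntologyApplication("app-2", "interoperability", Formality.TAXONOMY,
                        Breadth.DOMAIN, 100, "interchange language",
                        ("mapping",)),
    OntologyApplication("app-3", "knowledge reuse", Formality.FORMAL,
                        Breadth.UPPER, 1_000_000, "shared KB",
                        ("inference",)),
]

# Slotting applications into the space shows which regions are populated.
by_purpose = Counter(app.purpose for app in apps)
print(by_purpose.most_common(1))  # [('interoperability', 2)]
```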

Uschold's view is that AI applications of ontologies are few, and that most fielded applications are in databases, CORBA and workflows. There are growing applications of ontologies in query term expansion (the application closest to service trading) and cluster purification. Although data warehouses are not viewed as ontology applications, they could be. Based on his experience, Uschold attributes the lack of fielded ontologies to the fact that applications have to be very large before an ontology can be justified from a cost viewpoint. In the case of data translation, until you have at least 4 different complex object models, it is more cost effective to write direct translators than to translate all these models into a common ontology.

Toward Distributed Use of Large-Scale Ontologies (Swartout, Patil et al., USC-ISI)

Currently, people who build knowledge bases not only use different problem solvers, but also model the same domain differently. See, for example, the two models of the concept "strut" used by different knowledge bases. Such diversity makes it hard to share (or merge) knowledge bases, as there is a mismatch in the intermediate concepts. Knowledge becomes more shareable if knowledge bases addressing the same problem share a common skeletal domain model, i.e. an ontology. This paper deals with how to build a large ontology, proposing some guiding principles and a methodology.

The guiding principles are:

How to build an ontology, say for a family of air campaign planning operations:

SHOE papers (TB summarized)

Notes

The WordNet papers (TB summarized)

Ontology.org notes

The Role of Shared Ontology in XML-Based Trading Architectures
The main barrier to electronic commerce lies in the need for applications to share information, not in the reliability or security of the Internet. Because of the wide range of enterprise and electronic commerce systems deployed by businesses and the way these systems are configured, the problem is particularly acute among large electronic trading groups, yet this is precisely where the greatest return on investment (RoI) can be achieved. While companies are beginning to organise, standardise and stabilise their digital services in order to create and maintain sustainable network relationships with their trading partners, they are doing this only in conjunction with their immediate trading partners. This severely limits the RoI opportunities.
 

RosettaNet

RosettaNet: The lack of electronic business interfaces in the IT supply chain puts a huge burden on manufacturers, distributors, resellers, and end-users, creating tremendous inefficiencies and ultimately inhibiting our ability to leverage the Internet as a business-to-business commerce tool. Here are a few examples: Resellers must learn and maintain different ordering/return procedures. What is missing in order to scale eBusiness are the "dictionaries," the "framework," the "Partner Interface Processes - PIPs" and the "eBusiness processes."
Note: RosettaNet has standard property specifications for laptops, memory, software etc.
 

Ontology Problems

Attic