Archive for the ‘Technology’ Category

Fast, Accurate and Easy to Use

July 17, 2011

Ultimately, we (at Semantic Insights) are engineers and not researchers. Our goal is to build something useful, and something that will sustain itself (and us) financially. We recognize that whatever we develop needs to perform fast enough, produce accurate enough results, and be easy enough to use to warrant end user investment.

We knew that the system we envisioned needed to understand whole sentences in context. Existing technologies fail to meet one or more of our requirements. “Statistical proximity matching” is clearly not accurate enough. “Keyword search” misses anaphora, and “key-phrase search” misses intervening terms and other equivalent sentence structures. Traditional Natural Language Processing (NLP) appears too slow. Statistically based part-of-speech taggers are wholly inadequate. Building “ontologies” has all the problems of standards, in addition to requiring time, expertise, and significant up-front investment. “Text mining” results are too limited to warrant the implementation costs.

In short, we found current technology significantly lacking for our purpose. So, we began with first principles and built our approach from the bottom up.

Categories: Requirements, Technology

Semantic-enabled Applications vs. non-Semantic-enabled Applications

July 17, 2011

This entry is offered as background.

Semantic-based applications operate on data. This can be data in a database, entries in a blog or wiki, text in documents or web pages. The data can be real-time data such as clicks in a browser, streaming video, or a stock ticker. The data can be anything. This is of course true of non-semantic-based applications as well.

The difference between semantic-based applications and non-semantic-based applications comes from their use of ontologies.

Semantic-based applications use special information models (semantic models) that describe the data they operate on. These semantic models are commonly known as “ontologies”. Note: an ontology is different from a database schema, and creating ontologies is not the same as “data modeling”.

Ontologies are usually maintained and accessed separately from the data they describe. To accomplish this, semantic-based applications need some way to relate the data to the concepts in the ontology (or ontologies) that describe it. The simplest form of this relationship is the “semantic tag”.

Semantic tags are used to identify data as representing or containing information related to one or more concepts in the ontology. More sophisticated systems employ classification functions that dynamically identify data as representing one or more concepts. Still more sophisticated systems create and maintain mappings of relationships as well as the concepts. And even more sophisticated systems maintain these mappings on multiple dimensions (for example, mapping different “Points of View”).
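As a sketch of the simplest case, semantic tags plus a small “is-a” hierarchy are enough to answer concept-level queries. The ontology, tag assignments, and document ids below are hypothetical toys, not a description of any real system:

```python
# A tiny ontology fragment: concept -> parent concept (a simple "is-a" hierarchy).
ontology = {
    "Beagle": "Dog",
    "Dog": "Animal",
    "Cat": "Animal",
    "Animal": None,
}

# Semantic tags: document id -> set of ontology concepts it is tagged with.
tags = {
    "doc1": {"Beagle"},
    "doc2": {"Cat"},
    "doc3": {"Dog", "Cat"},
}

def ancestors(concept):
    """All concepts a given concept falls under, including itself."""
    seen = []
    while concept is not None:
        seen.append(concept)
        concept = ontology.get(concept)
    return seen

def docs_about(concept):
    """Find documents tagged with the concept or any specialization of it."""
    return sorted(
        doc for doc, concepts in tags.items()
        if any(concept in ancestors(c) for c in concepts)
    )

print(docs_about("Animal"))
print(docs_about("Dog"))
```

Tagging `doc1` with only “Beagle” is enough for it to be found under “Dog” and “Animal”; that inherited reachability is what distinguishes a semantic tag from a plain keyword.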

Semantic-based applications enable reasoning about data by examining the related ontologies. This can be as simple as maintaining a hierarchical index of the semantic contents of a document and then querying the index to find semantically related data. Logic can also be applied to find inferences where, for example, the existence of concepts A, B, and C in one or more documents implies the existence of concept D. Still more sophisticated systems can determine the semantic overlap between the information sought and the information available, or between different “Points of View”. And even more sophisticated systems can discover new data and new concepts.
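The concepts-A-and-B-and-C-imply-D pattern just described can be sketched as forward chaining over sets of found concepts. The rules below are hypothetical illustrations, not drawn from any real ontology:

```python
# Each rule: (set of required concepts, concept to infer).  A and B and C => D.
rules = [
    ({"fever", "cough", "fatigue"}, "flu-related"),
    ({"flu-related", "outbreak"}, "public-health"),
]

def infer(found):
    """Apply rules to a set of found concepts until a fixed point is reached."""
    found = set(found)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= found and conclusion not in found:
                found.add(conclusion)
                changed = True
    return found

concepts = infer({"fever", "cough", "fatigue", "outbreak"})
```

Note that the second rule fires only because the first one did: chaining is what lets implied concepts feed further inferences.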

What is important here is this:

  1. Semantic-based applications use an external information model (ontology) to semantically describe data in terms of concepts and their relationships.
  2. There is some way to relate the data to the items in the ontology.
  3. Reasoning can be performed about the data based on its relationship to the ontology.

There are many ways all this can happen. Each has its own pros and cons. This will become evident in later blog entries.

Natural Language Understanding

July 17, 2011

Understanding Natural Language is complex. In particular, extracting the meaning from all but the simplest Natural Language communication presents a number of challenges for machine understanding. Let me explain.

In Natural Language there are many ways to say the same thing. Subjects, objects, and verbs can change position (as well as form) to produce sentences with equivalent meaning. A subject, object, or verb may appear as a single word or a complex phrase. For that matter, whole sentences can be simple or complex, festooned with coordinating conjunctions, subordinate clauses, and all manner of punctuation (which is frequently missing or incorrect).

In Natural Language flexibility abounds. There are many ways to express notions of time, identity, location, possession, and quantity. There are many ways to express characteristics, values, and units of measure. There are many ways to express relationships between two things, concepts, or other relationships. In addition, there are many ways to express negation.

In Natural Language the opportunity for ambiguity abounds. For a given grammar, there may be more than one way to correctly divide a sentence into terms and phrases. In fact, there may be more than one valid part of speech for a given word in a sentence. Punctuation may help, but more often than not it is missing or incorrect.

A sentence can contain words that stand for, or reference, other words in the same sentence or in previous sentences. This includes pronouns (e.g., “he”, “she”, “it”, “those”, “they”, “him”, “her”, “them”, etc.) and possessive pronouns (e.g., “his”, “hers”, “theirs”, etc.). Further, a sentence can contain indexicals that refer to one or more concepts expressed in the current sentence, in one or more previous sentences, or in sentences yet to come.
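To make the difficulty concrete, here is a deliberately naive pronoun-resolution sketch: resolve each pronoun to the nearest preceding referent with compatible gender and number. The toy lexicon is an assumption for illustration only; real anaphora resolution needs syntax and semantic context:

```python
# Pronoun -> (gender, number); None means "unspecified".
PRONOUNS = {
    "he": ("masc", "sing"), "him": ("masc", "sing"), "his": ("masc", "sing"),
    "she": ("fem", "sing"), "her": ("fem", "sing"),
    "they": (None, "plur"), "them": (None, "plur"),
}

# Hypothetical toy lexicon of known referents: name -> (gender, number).
LEXICON = {"Joe": ("masc", "sing"), "Sue": ("fem", "sing")}

def resolve(tokens):
    """Return {pronoun_index: referent_name} for a token list."""
    resolved = {}
    for i, tok in enumerate(tokens):
        feats = PRONOUNS.get(tok.lower())
        if feats is None:
            continue
        gender, number = feats
        # Scan backwards for the nearest compatible referent.
        for j in range(i - 1, -1, -1):
            ref = LEXICON.get(tokens[j])
            if ref and (gender is None or ref[0] == gender) and ref[1] == number:
                resolved[i] = tokens[j]
                break
    return resolved

tokens = "Joe took his medicine because Sue told him to".split()
print(resolve(tokens))
```

Even this tiny example shows the core move (match features, prefer recency) and why it is fragile: “Sue” is skipped for “him” only because the lexicon marks her feminine.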

Determining the sense of a word (or multi-word term) in a sentence can be tricky. The sense of a given word can change based on how it is used in a sentence. It can change based on the presence or absence of other words in the sentence. It can change based on the content of one or more previous sentences or sentences yet to come.

There are many kinds of sentences. There are declarations, questions, commands, and exclamations. Sentences can contain quotes. Sentences can be quotes. Sentences can contain conditionals (e.g. “when”, “if”, etc.) or statements of probability (e.g. “may” vs “will”). Sentences can directly reference other sentences.

When it comes to understanding Natural Language, context is crucial. Different readers (or listeners) derive the meaning of a communication based on their individual skill with the language, as well as their individual background knowledge of the content of the conversation. Much of our understanding of language is rooted in both a local semantic context (this document or set of documents) and one or more larger semantic contexts (domain-specific and general knowledge).

Many ways of saying the same thing

July 17, 2011

Subjects and objects can move around. For example, “Joe gave Bob the ball” can also be stated as “The ball was given to Bob by Joe.” Verbs can change “direction”: “Joe gave Bob the ball” can be restated as “Bob received the ball from Joe.” Substitution of equivalent words or phrases (e.g. synonyms) is often not simple and may be constrained by the semantic context. For example, you may say that “Joe ran for office” is the same as “Joe campaigned for office”; here “ran” and “campaigned” may be valid synonyms in this context. However, it may not be correct to substitute “sprinted” for “ran” to produce “Joe sprinted for office.” Yet if you add a determiner, “the”, before “office”, that substitution may make sense, as in “Joe sprinted for the office.” Note: you will need to consult the semantic context to determine what is probably meant. There will be more on this later.
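The equivalences above can be made concrete by normalizing each surface form to one canonical relation. The hand-written patterns below stand in for a full parser and cover only these example sentences:

```python
import re

def to_triple(sentence):
    """Map a few active/passive/converse forms onto one canonical
    (giver, "give", recipient, item) tuple; None if no pattern applies."""
    s = sentence.rstrip(".")
    m = re.fullmatch(r"(\w+) gave (\w+) the (\w+)", s)
    if m:
        return (m.group(1), "give", m.group(2), m.group(3))
    m = re.fullmatch(r"The (\w+) was given to (\w+) by (\w+)", s)
    if m:
        return (m.group(3), "give", m.group(2), m.group(1))
    m = re.fullmatch(r"(\w+) received the (\w+) from (\w+)", s)
    if m:
        return (m.group(3), "give", m.group(1), m.group(2))
    return None

# All three surface forms normalize to the same canonical relation.
assert to_triple("Joe gave Bob the ball") == \
       to_triple("The ball was given to Bob by Joe.") == \
       to_triple("Bob received the ball from Joe")
```

The point is not the regexes (a real system derives structure from a parse) but the target: many surface forms, one stored meaning.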

Subjects and Objects

July 17, 2011

Subjects and objects of a sentence can be simple or complex:

  - A single word (e.g. “Joe”)
  - A pronoun (e.g. “he”, “she”, “it”, …)
  - An indexical (e.g. “that” as in “That made all the difference”); an indexical is a reference to something previously stated
  - A noun phrase (e.g. “President Lincoln”)
  - A gerund phrase (e.g. “going to town”)
  - An infinitive phrase (e.g. “to go to town”)
  - Conjoined subjects or objects (e.g. “Joe and Bob”, “Joe or Bob”, …)

And there is more.
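One illustrative way to represent this range of forms is a small set of record types, one per alternative; the names below are hypothetical, not the authors' design:

```python
from dataclasses import dataclass

@dataclass
class Word:          # a single word, e.g. "Joe"
    text: str

@dataclass
class Pronoun:       # "he", "she", "it", ...
    text: str

@dataclass
class Indexical:     # "that" as in "That made all the difference"
    text: str

@dataclass
class Phrase:        # a noun, gerund, or infinitive phrase
    kind: str        # "noun" | "gerund" | "infinitive"
    words: list

@dataclass
class Conjoined:     # "Joe and Bob", "Joe or Bob", ...
    op: str          # "and" | "or"
    parts: list      # each part is itself any of the forms above

# A subject built from the list above: "Joe and Bob".
subject = Conjoined("and", [Word("Joe"), Word("Bob")])
```

Making `Conjoined.parts` recursive matters: “Joe and going to town” is a legal (if odd) conjunction, so each part must itself be any of the forms.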

Word Sense in context

July 17, 2011

Word sense can depend on other words in the same sentence. Consider the word “took”:

  - “Joe took his medicine.” (“took” may mean “swallowed”)
  - “Joe took his medicine with him.” (“took” may mean “brought”)
  - “Joe took his medicine like a man.” (“took his medicine” may mean “endured”)
  - “Joe took his medicine and hid it.” (“took” may simply mean “moved”)

Word sense can also depend on words in other sentences. Consider “took” again in this paragraph:

“Joe was wrong and he knew it. Finally, it was time. Joe took his medicine.”
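A deliberately naive sketch of the sense selection above: choose a sense of “took” from cue words in the same sentence. The cue lists are hypothetical illustrations:

```python
# Each entry: (cue words that must all appear in the sentence, chosen sense).
SENSES = [
    (["with", "him"], "brought"),
    (["like", "a", "man"], "endured"),
    (["hid"], "moved"),
]

def sense_of_took(sentence):
    """Pick a sense of "took" from sentence-local cue words."""
    words = sentence.lower().strip(".").split()
    if "took" not in words:
        return None
    for cues, sense in SENSES:
        if all(c in words for c in cues):
            return sense
    return "swallowed"  # hypothetical default when no other cue matches

print(sense_of_took("Joe took his medicine with him."))
print(sense_of_took("Joe took his medicine like a man."))
```

Note that the bare sentence “Joe took his medicine.” defaults to “swallowed” here, while the closing paragraph above shows the surrounding sentences shifting the reading toward “endured”: sentence-local cues alone are not enough.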

World View, Meaning and Understanding

July 17, 2011

Let us begin with some basic definitions.

An individual or collective World View is an operating model composed of concepts, instances, and relationships, along with how they are known to inter-relate (or may possibly inter-relate). These inter-relations may be constrained in time, space, identity, quantity, negation, and possession.

When we read, we experience the Natural Language text. The relationship between what we experience and our “World View” is the meaning we ascribe to what we read. Meaning is often a complex relationship between what we experience and our World View.

Understanding is gained as a result of determining the meaning of our experiences in terms of our World View.

From these definitions we can create a form of Machine Understanding by:

  1. creating and populating a persistent model of one or more World Views,
  2. establishing a model to represent the relationships between what is experienced (read) and the World View,
  3. providing the ability to render Natural Language sources to a common form for processing,
  4. providing the ability to map rendered Natural Language to a World View and store that mapping as meaning.

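The four steps above can be sketched end to end. Every class, function, and triple below is a hypothetical illustration of the shape of such a system, not the authors' implementation:

```python
class WorldView:
    """Step 1: a persistent model of concepts, instances, and relationships."""
    def __init__(self):
        self.concepts = set()
        self.relations = []          # (subject, relation, object) triples

    def add(self, subj, rel, obj):
        self.concepts.update([subj, obj])
        self.relations.append((subj, rel, obj))

def render(text):
    """Step 3: render a Natural Language source to a common form.
    Here that form is just lowercase tokens; a real renderer parses sentences."""
    return text.lower().strip(".").split()

def map_to_world_view(tokens, wv):
    """Steps 2 and 4: map rendered text to World View concepts, and keep the
    resulting mapping as the stored 'meaning' of the text."""
    return [tok for tok in tokens if tok in wv.concepts]

wv = WorldView()
wv.add("joe", "instance-of", "person")
wv.add("medicine", "instance-of", "substance")

meaning = map_to_world_view(render("Joe took his medicine."), wv)
```

The interesting design question, deferred here, is the mapping model of step 2: which tokens bind to which concepts, and how ambiguity and anaphora are recorded in the stored meaning.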
Categories: Definitions, Technology