Natural Language Understanding

July 17, 2011

Understanding Natural Language is complex. In particular, extracting the meaning from all but the simplest Natural Language communication presents a number of challenges for machine understanding. Let me explain.

In Natural Language there are many ways to say the same thing. Subjects, objects, and verbs can change position (as well as form) to produce sentences that each carry an equivalent meaning. A subject, object, or verb may appear as a single word or a complex phrase. For that matter, whole sentences can be simple or complex, festooned with coordinating conjunctions, subordinate clauses, and all manner of punctuation (or, frequently, missing or incorrect punctuation).

In Natural Language flexibility abounds. There are many ways to express notions of time, identity, location, possession, and quantity. There are many ways to express characteristics, values, and units of measure. There are many ways to express relationships between two things, concepts, or other relationships. In addition, there are many ways to express negation.

In Natural Language the opportunity for ambiguity abounds. For a given grammar, there may be more than one way to correctly divide the sentence into terms and phrases. In fact, there may be more than one valid part-of-speech for a given word in a sentence. Punctuation may help but, more often than not, it is missing or incorrect.
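To make the part-of-speech ambiguity concrete, here is a minimal sketch in Python. The lexicon, the example sentence, and the function name `tag_sequences` are all assumptions made for illustration. The code does no parsing; it only counts how many tag assignments a system would have to choose between before grammar or context narrows them down.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the parts of speech it can take.
# A real system would use a far larger dictionary or a trained tagger.
LEXICON = {
    "time": ["noun", "verb"],
    "flies": ["noun", "verb"],
    "like": ["verb", "preposition"],
    "an": ["determiner"],
    "arrow": ["noun"],
}

def tag_sequences(sentence):
    """Return every possible (word, tag) assignment for the sentence."""
    words = sentence.lower().split()
    options = [LEXICON[word] for word in words]
    # The Cartesian product enumerates one tag choice per word.
    return [list(zip(words, tags)) for tags in product(*options)]

candidates = tag_sequences("Time flies like an arrow")
print(len(candidates))  # 2 * 2 * 2 * 1 * 1 = 8 candidate taggings
```

Most of the eight candidates are not grammatical readings, but a machine cannot know that without a grammar and, often, the surrounding context.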

A sentence can contain words that stand for, or reference, other words in the same sentence or in previous sentences. This includes pronouns (e.g., “he”, “she”, “it”, “those”, “they”, “him”, “her”, “them”, etc.) and possessive pronouns (e.g., “his”, “hers”, “theirs”, etc.). Further, a sentence can contain indexicals that refer to one or more concepts expressed in the current sentence, in one or more previous sentences, or in sentences yet to come.

Determining the sense of a word (or multi-word term) in a sentence can be tricky. The sense of a given word can change based on how it is used in a sentence. It can change based on the presence or absence of other words in a sentence. It can change based on the content of one or more previous sentences or sentences yet to come.

There are many kinds of sentences. There are declarations, questions, commands, and exclamations. Sentences can contain quotes. Sentences can be quotes. Sentences can contain conditionals (e.g. “when”, “if”, etc.) or statements of probability (e.g. “may” vs “will”). Sentences can directly reference other sentences.

When it comes to understanding Natural Language, context is crucial. Different readers (or listeners) derive the meaning of a communication based on their individual skill with the language as well as their individual background knowledge of the content of the conversation. Much of our understanding of language is rooted in both a local semantic context (this document or set of documents) and one or more larger semantic contexts (domain-specific and general knowledge).

Many ways of saying the same thing

July 17, 2011

Subjects and objects can move around. For example, “Joe gave Bob the ball” can also be stated as “The ball was given to Bob by Joe.” Verbs can change “direction.” For example, “Joe gave Bob the ball” can be restated as “Bob received the ball from Joe.” Substituting equivalent words or phrases (e.g., synonyms) is often not simple and may be constrained by the semantic context. For example, you may say that “Joe ran for office” is the same as “Joe campaigned for office.” Here “ran” and “campaigned” may be valid synonyms in this context. However, it may not be correct to substitute “sprinted” for “ran” to produce “Joe sprinted for office.” Yet if you add the determiner “the” before “office”, that substitution may make sense, as in “Joe sprinted for the office.” Note: You will need to consult the semantic context to determine what is probably meant. There will be more on this later.
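One way to picture "equivalent meaning" is to map each surface form onto the same underlying record. The sketch below is only an illustration under narrow assumptions: the three regular expressions, the role names (`giver`, `recipient`, `thing`), and the function `normalize` are hypothetical and cover just the three example sentences above, not English in general.

```python
import re

# Hypothetical patterns, one per surface form discussed above.
# Named groups capture the same three roles regardless of word order.
PATTERNS = [
    # Active voice: "Joe gave Bob the ball"
    re.compile(r"^(?P<giver>\w+) gave (?P<recipient>\w+) (?P<thing>the \w+)$",
               re.IGNORECASE),
    # Passive voice: "The ball was given to Bob by Joe"
    re.compile(r"^(?P<thing>the \w+) was given to (?P<recipient>\w+) by (?P<giver>\w+)$",
               re.IGNORECASE),
    # Reversed verb direction: "Bob received the ball from Joe"
    re.compile(r"^(?P<recipient>\w+) received (?P<thing>the \w+) from (?P<giver>\w+)$",
               re.IGNORECASE),
]

def normalize(sentence):
    """Return (giver, recipient, thing) for a recognized sentence, else None."""
    text = sentence.strip().rstrip(".")
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return (match["giver"], match["recipient"], match["thing"].lower())
    return None

for s in ["Joe gave Bob the ball",
          "The ball was given to Bob by Joe.",
          "Bob received the ball from Joe."]:
    print(normalize(s))  # each line prints ('Joe', 'Bob', 'the ball')
```

Hand-written patterns like these break down quickly, which is the point of the section: the number of ways to say the same thing grows far faster than any list of templates.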

Subjects and Objects

July 17, 2011

Subjects and objects of a sentence can be simple or complex.
A single word (e.g. “Joe”)
A pronoun (e.g. “he”, “she”, “it”,…)
An indexical (e.g. “that” as in “That made all the difference”)
An indexical is a reference to something stated elsewhere, usually earlier.
A noun phrase (e.g. “President Lincoln”)
A gerund phrase (e.g. “going to town”)
An infinitive phrase (e.g. “to go to town”)
Conjoined (e.g. “Joe and Bob”, “Joe or Bob”,…)
And there is more.

Word Sense in context

July 17, 2011

Depending on other words in the same sentence
Consider the word “took”
“Joe took his medicine.”
“took” may mean “swallowed”
“Joe took his medicine with him.”
“took” may mean “brought”
“Joe took his medicine like a man.”
“took his medicine” may mean “endured”
“Joe took his medicine and hid it.”
“took” may simply mean “moved”
Depending on other words in other sentences
Consider the word “took” again in this paragraph:
“Joe was wrong and he knew it.  Finally, it was time.  Joe took his medicine.”
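The same-sentence cases listed above can be sketched as a small set of ordered cue rules. This is a toy illustration, not a real word-sense disambiguation method: the cue phrases, the default sense, and the function name `sense_of_took` are assumptions chosen to match the four example sentences only. It also ignores neighboring sentences entirely, so it cannot handle the multi-sentence case.

```python
# Hypothetical cue phrases paired with the sense they suggest for "took".
# Order matters: the first cue found in the sentence wins.
CUES = [
    ("like a man", "endured"),
    ("with him", "brought"),
    ("and hid", "moved"),
]

# Assumed fallback when no cue is present, as in "Joe took his medicine."
DEFAULT_SENSE = "swallowed"

def sense_of_took(sentence):
    """Guess the sense of "took" from other words in the same sentence."""
    text = sentence.lower().rstrip(".")
    if "took" not in text.split():
        return None  # the word does not occur, so there is nothing to decide
    for cue, sense in CUES:
        if cue in text:
            return sense
    return DEFAULT_SENSE

print(sense_of_took("Joe took his medicine."))             # swallowed
print(sense_of_took("Joe took his medicine with him."))    # brought
print(sense_of_took("Joe took his medicine like a man."))  # endured
print(sense_of_took("Joe took his medicine and hid it."))  # moved
```

Given only the final sentence of the paragraph above, this sketch would answer "swallowed", because it never sees the preceding sentences.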