Natural Language Understanding

July 17, 2011

Understanding Natural Language is complex. In particular, extracting the meaning from all but the simplest Natural Language communication presents a number of challenges for machine understanding. Let me explain.

In Natural Language there are many ways to say the same thing. Subjects, objects, and verbs can change position (as well as form) to produce sentences that all carry the same meaning. A subject, object, or verb may appear as a single word or as a complex phrase. For that matter, whole sentences can be simple or complex, festooned with coordinating conjunctions, subordinate clauses, and all manner of punctuation (or, frequently, missing or incorrect punctuation).
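
As a toy illustration of two forms sharing one meaning, the Python sketch below reduces an active sentence and its passive counterpart to the same (subject, verb, object) triple. The verb table, the determiner list, and the two fixed sentence shapes it recognizes are hypothetical simplifications; any real sentence with an adjective, a longer phrase, or a subordinate clause falls outside them.

```python
# Hypothetical verb table: inflected form -> base form (lemma).
VERB_LEMMAS = {"chased": "chase", "bit": "bite", "bitten": "bite"}
DETERMINERS = {"the", "a", "an"}


def to_triple(sentence):
    """Return (subject, verb, object) for two toy sentence shapes, else None."""
    words = [w for w in sentence.lower().strip(".").split() if w not in DETERMINERS]
    if len(words) == 5 and words[1] in {"was", "were"} and words[3] == "by":
        # Passive shape: OBJECT was/were VERB by SUBJECT
        obj, verb, subj = words[0], words[2], words[4]
    elif len(words) == 3:
        # Active shape: SUBJECT VERB OBJECT
        subj, verb, obj = words
    else:
        return None
    return (subj, VERB_LEMMAS.get(verb, verb), obj)


print(to_triple("The dog chased the cat."))         # ('dog', 'chase', 'cat')
print(to_triple("The cat was chased by the dog."))  # ('dog', 'chase', 'cat')
```

Even this tiny normalizer needs to know that "was ... by" reverses the roles and that "chased" and "chase" are the same verb; each new construction means another hand-written rule.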

In Natural Language flexibility abounds. There are many ways to express notions of time, identity, location, possession, and quantity. There are many ways to express characteristics, values, and units of measure. There are many ways to express relationships between two things, concepts, or other relationships. In addition, there are many ways to express negation.
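
Negation alone shows how much surface forms can vary. The Python sketch below flags words from a hypothetical, deliberately incomplete list of negation cues. It catches "not" and the contraction "n't", but it misses prefixes such as "un-" and says nothing about which part of the sentence is actually being negated.

```python
import re

# Hypothetical, incomplete set of negation cues, for illustration only.
# Both straight and curly apostrophes are accepted in "n't".
NEGATION_CUES = re.compile(
    r"\b(?:not|no|never|none|nobody|nothing|neither|nor|without)\b|n['’]t\b",
    re.IGNORECASE,
)


def negation_cues(sentence):
    """Return the negation cue strings found in a sentence, in order."""
    return [m.group(0).lower() for m in NEGATION_CUES.finditer(sentence)]


print(negation_cues("I didn't see anyone."))  # ["n't"]
print(negation_cues("Nobody was unhappy."))   # ['nobody']
```

A phrase like "not without merit" yields two cues whose combined meaning is roughly positive, something a cue list by itself cannot interpret.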

In Natural Language the opportunity for ambiguity abounds. For a given grammar, there may be more than one correct way to divide a sentence into terms and phrases. In fact, a given word in a sentence may have more than one valid part of speech. Punctuation can help, but more often than not it is missing or incorrect.
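
The classic prepositional-phrase attachment example, "I saw the man with the telescope", shows both kinds of ambiguity. Below is a minimal Python sketch of a CYK chart parser that counts parse trees rather than merely accepting or rejecting the sentence. The grammar and lexicon are a hypothetical toy in Chomsky Normal Form; "saw" is listed as both a verb and a noun, although only the verb reading leads to a complete parse here.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the verb phrase
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical toy lexicon: word -> possible parts of speech.
LEXICON = {
    "i": ["NP"],
    "saw": ["V", "N"],    # verb or noun (the tool)
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(sentence, start="S"):
    """Count distinct parse trees using a CYK chart that stores counts."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][cat] = number of ways category cat can span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[i][i + 1][cat] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


print(count_parses("I saw the man with the telescope"))  # 2
```

Storing counts instead of booleans in each chart cell is what lets the same dynamic program report how ambiguous a sentence is; the two parses correspond to "with the telescope" modifying "saw" or modifying "the man".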

A sentence can contain words that stand for, or reference, other words in the same sentence or in previous sentences. This includes pronouns (e.g., “he”, “she”, “it”, “those”, “they”, “him”, “her”, “them”, etc.) and possessive pronouns (e.g., “his”, “hers”, “theirs”, etc.). Further, a sentence can contain indexicals that refer to one or more concepts expressed in the current sentence, in one or more previous sentences, or in sentences yet to come.
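
A common baseline for resolving pronouns links each one to the most recent preceding noun that agrees with it in gender and number. The Python sketch below assumes hypothetical feature tables and looks only backward, so it cannot handle references to sentences yet to come. It is also easily fooled: in "John told Bill that he won", agreement cannot decide between John and Bill, and recency simply guesses Bill.

```python
# Hypothetical feature tables: word -> (gender, number).
# A gender of None means the pronoun agrees with any gender.
NOUN_FEATURES = {
    "john": ("masc", "sg"),
    "mary": ("fem", "sg"),
    "book": ("neut", "sg"),
    "books": ("neut", "pl"),
}
PRONOUN_FEATURES = {
    "he": ("masc", "sg"), "him": ("masc", "sg"), "his": ("masc", "sg"),
    "she": ("fem", "sg"), "her": ("fem", "sg"), "hers": ("fem", "sg"),
    "it": ("neut", "sg"),
    "they": (None, "pl"), "them": (None, "pl"), "theirs": (None, "pl"),
}


def resolve_pronouns(text):
    """Link each known pronoun to the most recent earlier noun that agrees with it."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    seen_nouns = []  # (noun, (gender, number)) in order of appearance
    links = []
    for tok in tokens:
        if tok in NOUN_FEATURES:
            seen_nouns.append((tok, NOUN_FEATURES[tok]))
        elif tok in PRONOUN_FEATURES:
            gender, number = PRONOUN_FEATURES[tok]
            # Search backward from the most recent noun; unresolved pronouns are skipped.
            for noun, (g, n) in reversed(seen_nouns):
                if n == number and (gender is None or g == gender):
                    links.append((tok, noun))
                    break
    return links


print(resolve_pronouns("John gave Mary a book. She thanked him for it."))
# [('she', 'mary'), ('him', 'john'), ('it', 'book')]
```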

Determining the sense of a word (or multi-word term) in a sentence can be tricky. The sense of a given word can change based on how it is used in a sentence. It can change based on the presence or absence of other words in a sentence. It can change based on the content of one or more previous sentences or of sentences yet to come.
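
One well-known approach to this problem is the simplified Lesk algorithm: pick the sense whose dictionary definition (gloss) shares the most words with the surrounding sentence. In the Python sketch below, the two senses of "bank", their glosses, and the stopword list are made-up assumptions for illustration; practical systems draw glosses from a lexical resource such as WordNet.

```python
# Hypothetical stopword list and sense inventory, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "by", "at",
             "that", "is", "was", "he", "she", "we", "they", "it", "for", "with"}

SENSES = {
    "bank": {
        "bank/finance": "a financial institution that accepts deposits and lends money",
        "bank/river": "the sloping land beside a body of water such as a river",
    }
}


def content_words(text):
    """Lowercased words with surrounding punctuation and stopwords removed."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS


def simplified_lesk(word, sentence):
    """Return the sense of `word` whose gloss overlaps most with the sentence."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the earlier-listed sense
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "She deposits money at the bank"))        # bank/finance
print(simplified_lesk("bank", "We fished from the bank of the river"))  # bank/river
```

The same word gets a different sense purely because different neighbors are present, which is the effect described above; it also fails as soon as the context and the gloss use different words for the same idea.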

There are many kinds of sentences. There are declarations, questions, commands, and exclamations. Sentences can contain quotes. Sentences can be quotes. Sentences can contain conditionals (e.g., “when”, “if”, etc.) or statements of probability (e.g., “may” vs. “will”). Sentences can directly reference other sentences.
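
A first pass at telling these apart often relies on surface cues. The Python sketch below guesses a sentence's kind from its final punctuation and first word, and flags conditional and modal words; all cue lists and the modality labels are hypothetical. The cues are themselves ambiguous: "when" opens a conditional in "When it rains..." but a question in "When will it rain?", and this sketch cannot tell the two apart.

```python
# Hypothetical cue lists and labels; far from complete.
IMPERATIVE_VERBS = {"close", "open", "send", "stop", "read", "please"}
CONDITIONAL_CUES = {"if", "when", "unless"}
MODAL_CUES = {"may": "possible", "might": "possible", "will": "definite"}


def analyze(sentence):
    """Guess a sentence's kind and flag conditional and modal cue words."""
    text = sentence.strip()
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    if text.endswith("?"):
        kind = "question"
    elif words and words[0] in IMPERATIVE_VERBS:
        kind = "command"
    elif text.endswith("!"):
        kind = "exclamation"
    else:
        kind = "declaration"
    # Report the first modal cue found, if any.
    modality = next((MODAL_CUES[w] for w in words if w in MODAL_CUES), None)
    return {
        "kind": kind,
        "conditional": any(w in CONDITIONAL_CUES for w in words),
        "modality": modality,
    }


print(analyze("When it rains, the road may flood."))
# {'kind': 'declaration', 'conditional': True, 'modality': 'possible'}
```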

When it comes to understanding Natural Language, context is crucial. Different readers (or listeners) derive the meaning of a communication based on their individual skill with the language, as well as their individual background knowledge of the subject of the conversation. Much of our understanding of language is rooted in both a local semantic context (this document or set of documents) and one or more larger semantic contexts (domain-specific and general knowledge).