Massachusetts Institute of Technology

Statistical Natural Language Parsing: Reliable Models of Language?

10/19/2007 3:15 PM Wong Auditorium
Sandiway Fong, SM '86, PhD '91, Associate Professor, University of Arizona

Description: The statistical natural language linguist owes much to the University of Pennsylvania's famous Treebank project. But this giant corpus of one million words -- actually, 49 thousand sentences from the Wall Street Journal, all carefully labeled for their syntactic and semantic components -- is both a "blessing and a curse," says Sandiway Fong. This "gold standard" list of parsed sentences, the result of more than a decade of work, has become "the only game in town," according to Fong. Linguists developing natural language algorithms often rely on the complex Penn Treebank to construct and train probabilistic, context-free grammars, and Fong acknowledges the Treebank's revolutionary impact on the field. But he also thinks it's worthwhile to examine how systems that rely on the Penn Treebank actually perform.

He has been exploring three basic questions: Do such systems attain cognitively plausible knowledge of language, such as distinguishing between grammatical and ungrammatical components of sentences? How brittle are these systems: if you misspell a word or flip one part of the sentence, will the system still "give you back some parse"? Can these systems learn non-natural languages?

Fong has unearthed some interesting issues. For instance, two well-known parsing systems couldn't score more than 50% figuring out the right way to pronounce the word "read" in eight sentences that deployed the past and present tenses (e.g., The girls will read the paper; The girls have read the paper). And the two systems didn't get the same sentences wrong. Fong wonders if "reading the Wall Street Journal is not a good way to learn how to pronounce 'read' or 'red.'" Fong also demonstrated that a parsing system's behavior could turn on the presence (or absence) of a single example involving the phrase "milk with 4% butterfat," calling into question whether such systems are truly robust.

While Treebank-based parsing systems demonstrably perform well on Treebank-like sentences, one cannot infer they have necessarily achieved grammatical competence or linguistic stability. We must understand, says Fong, that 40 thousand training samples do not really provide enough parameters for computational systems to cover the broad range of linguistic cases that ordinary people pick up nearly effortlessly. "We expect statistical systems to be able to deal with noise. But they are extremely fragile, despite their statistical nature and training over a large data set."

About the Speaker(s): Sandiway Fong received his B.Sc. in Computing Science at Imperial College of Science and Technology, University of London. He received an S.M. in 1986 at MIT, where he worked in the Artificial Intelligence Laboratory.

After working at IBM's Watson Research Center, he returned to MIT for his Ph.D.

In 1991, he joined the NEC Research Institute to work on natural language processing and machine translation. In 2003, he moved to the University of Arizona, where his research interests are at the intersection of computer science and formal linguistics, with a focus on multilingual parsing, ontolinguistics, computational lexical semantics, and computational morphology.

Host(s): School of Engineering, Laboratory for Information and Decision Systems

MIT World — special events and lectures

Category: Events
Created: December 14, 2011 14:22
License: All Rights Reserved
