Wednesday, August 5, 2009

Chomsky v Shannon

There is a whole lot of literature on Chomsky and generative grammar. There is also quite a bit written on Claude Shannon's Information Theory and the development of stochastic linguistics. On this subject I just read a really great paper by Steven Abney that outlines Chomsky's arguments against statistical models in linguistics, and in computational linguistics in particular. Please read the paper in its entirety at http://www.vinartus.net/spa/95c.pdf.

My favorite is the following argument by Chomsky (sorry, Steve, you missed this point in your paper), because in making it, Chomsky in essence disproves himself. Here is the passage, from Chomsky, Noam (1957). Syntactic Structures. The Hague/Paris: Mouton. p. 15:
  1. Colorless green ideas sleep furiously.
  2. Furiously sleep ideas green colorless.

It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) had ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally "remote" from English. Yet (1), though nonsensical, is grammatical, while (2) is not.


If you have just read the above and you are an English speaker, then both (1) and (2) are now part of your experience; they have indeed become part of English discourse. Thus Chomsky actually disproves himself on that point.
Furthermore, since Chomsky wrote the sentences, their probability of occurring is not zero, nor even remotely close to zero. Sentence (1) apparently carries enough informational value that Wikipedia has a page dedicated to it. In fact, any nonsensical string of words Chomsky might have written for (1) and (2) would likewise acquire a non-zero probability of occurring. Secondly, because the sentences are used as part of an argument, they now carry information (even though, independent of the argument, (1) and (2) may have no informational content), and so they have grammatical value. Any sentences an English speaker substituted for (1) and (2) in that argument would give the same result. Therefore, the Markov approach would actually work, assigning a non-zero probability, since the 'grammaticalness' of the sentences is established by their use in the argument. And (2) is thus grammatical, even though it cannot be derived by the rules of generative grammar.

Hence, in any statistical model for grammaticalness, these sentences will be ruled in on identical grounds as equally "proximate" to English.
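To make the Markov point concrete, here is a minimal sketch in Python of an unsmoothed, maximum-likelihood bigram (first-order Markov) model. The toy corpus, which stands in for "English discourse", and the helper names `tokenize`, `train_bigrams` and `sentence_probability` are hypothetical, chosen only for illustration. Trained on a corpus that contains neither (1) nor (2), the model assigns both sentences probability zero, which is Chomsky's "equally remote" point; once the two sentences are added to the training data, as they are the moment someone reads them, both receive a non-zero probability, and in this tiny example the same one.

```python
from collections import Counter


def tokenize(sentence):
    """Lowercase, drop a trailing period, and add sentence-boundary markers."""
    return ["<s>"] + sentence.lower().rstrip(".").split() + ["</s>"]


def train_bigrams(sentences):
    """Count each word as a context and each (previous word, word) pair (hypothetical helper)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def sentence_probability(sentence, context_counts, bigram_counts):
    """Unsmoothed maximum-likelihood estimate: product of count(prev, word) / count(prev)."""
    tokens = tokenize(sentence)
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # context word never seen in training: no evidence at all
        prob *= bigram_counts[(prev, word)] / context_counts[prev]
    return prob


# Hypothetical toy corpus standing in for "English discourse".
toy_corpus = ["The old dog sleeps quietly.", "Green leaves grow slowly."]
s1 = "Colorless green ideas sleep furiously."
s2 = "Furiously sleep ideas green colorless."

# Before: neither sentence has occurred, so both are "equally remote".
before_counts = train_bigrams(toy_corpus)
before_s1 = sentence_probability(s1, *before_counts)
before_s2 = sentence_probability(s2, *before_counts)

# After: the reader has now encountered both sentences.
after_counts = train_bigrams(toy_corpus + [s1, s2])
after_s1 = sentence_probability(s1, *after_counts)
after_s2 = sentence_probability(s2, *after_counts)

print("before:", before_s1, before_s2)
print("after: ", after_s1, after_s2)
```

A practical model would also smooth its counts (for example with add-one or Kneser-Ney smoothing) so that unseen word pairs do not force a sentence's probability to zero; the unsmoothed version is used here only because it mirrors the reasoning in Chomsky's passage.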

Aside from my argument, Steven Abney's paper does a great job of explaining that Chomsky's arguments bear not on the probabilistic nature of Markov models but only on the fact that they are finite-state, and that Chomsky's arguments are not, by any stretch of the imagination, a sweeping condemnation of statistical methods.