Research Article

Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution

  • Ramon Ferrer-i-Cancho

    rferrericancho@lsi.upc.edu

    Affiliation: Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain

  • Brita Elvevåg

    Affiliation: Clinical Brain Disorders Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America

  • Published: March 09, 2010
  • DOI: 10.1371/journal.pone.0009411

Reader Comments (15)

what is random and reasonable?

Posted by beckon on 18 Mar 2010 at 17:23 GMT

It may be of interest to the authors that a couple of decades ago biogeographers when through an extended discussion over how to populate islands randomly as a baseline for comparison with the species composition on real islands. How much should the selection of candidate species for random populations be guided by the characteristics of real populations? The analogy seems to be close with choosing the constraints on random text. It might be useful to go back over that literature. Linguists don't seem to have thought about this as much as biogeographers did.

No competing interests declared.

RE: what is random and reasonable?

beckon replied to beckon on 18 Mar 2010 at 17:27 GMT

Sorry... I meant to say "biogeographers WENT through..."

No competing interests declared.

RE: what is random and reasonable?

rferrericancho replied to beckon on 22 Mar 2010 at 18:25 GMT

In my opinion, the discussion about the relevance or meaningfulness of Zipf’s law lacks a proper null hypothesis. “Random typing” could be considered a null hypothesis, but no cognitive scientist would agree that this is how words are produced when we speak or write. This connects with the discussion at the end of our article.
If we assumed, more realistically, that there is a mental lexicon, a possible null hypothesis would be “words are chosen uniformly at random from a mental lexicon”. But this would not give Zipf’s law (with a typical exponent): the rank histogram would be flat (if the sample were large enough).
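
The flatness of the rank histogram under this null hypothesis can be checked with a short simulation. What follows is only a minimal sketch and not part of the article: the lexicon size (100 words), the sample size (200,000 tokens), the seed and the function names are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def uniform_lexicon_sample(lexicon_size, n_tokens, seed=0):
    """Draw n_tokens word ids uniformly at random from a lexicon.

    Words are represented by integers 0..lexicon_size-1 (an assumption;
    any labels would do, since only the frequencies matter here).
    """
    rng = random.Random(seed)
    return [rng.randrange(lexicon_size) for _ in range(n_tokens)]


def rank_frequencies(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


# Hypothetical settings: 100 equiprobable words, a large sample.
freqs = rank_frequencies(uniform_lexicon_sample(lexicon_size=100, n_tokens=200_000))

# Ratio between the frequency at rank 1 and at the last rank.
# A flat histogram gives a ratio close to 1.
ratio = freqs[0] / freqs[-1]
print(f"ranks observed: {len(freqs)}, top/bottom frequency ratio: {ratio:.2f}")
```

With these assumed settings the most frequent word is only slightly more frequent than the least frequent one, whereas Zipf’s law with an exponent near 1 would put roughly a factor of 100 between rank 1 and rank 100.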
Another problem is that many researchers look at Zipf’s law as a null hypothesis; see, for instance:

Miller, G. A. & Chomsky, N. 1963. Finitary models of language users. In: Handbook of Mathematical Psychology (Ed. by R. D. Luce, R. R. Bush & E. Galanter), pp. 419–492. New York: J. Wiley.

(or, more recently, Nowak, M. A. 2000. The basic reproductive ratio of a word, the maximum size of a lexicon. Journal of Theoretical Biology, 204 (2), 179–189.)

The point is that a null hypothesis is a possible explanation for a phenomenon but not the phenomenon itself.

No competing interests declared.