Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution

Ramon Ferrer-i-Cancho; Brita Elvevåg

doi:10.1371/journal.pone.0009411


Open Access

Peer-reviewed

Research Article


Ramon Ferrer-i-Cancho
* E-mail: rferrericancho@lsi.upc.edu
Affiliation: Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain

Brita Elvevåg
Affiliation: Clinical Brain Disorders Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America


Published: March 9, 2010
https://doi.org/10.1371/journal.pone.0009411

Reader Comments


what is random and reasonable?

Posted by beckon on 18 Mar 2010 at 17:23 GMT

It may be of interest to the authors that a couple of decades ago biogeographers when through an extended discussion over how to populate islands randomly as a baseline for comparison with the species composition on real islands. How much should the selection of candidate species for random populations be guided by the characteristics of real populations? The analogy seems to be close with choosing the constraints on random text. It might be useful to go back over that literature. Linguists don't seem to have thought about this as much as biogeographers did.

No competing interests declared.

RE: what is random and reasonable?

beckon replied to beckon on 18 Mar 2010 at 17:27 GMT

Sorry... I meant to say "biogeographers WENT through..."

No competing interests declared.

RE: what is random and reasonable?

rferrericancho replied to beckon on 22 Mar 2010 at 18:25 GMT

In my opinion, the discussion about the relevance or meaningfulness of Zipf’s law lacks a proper null hypothesis. “Random typing” could be considered a null hypothesis, but no cognitive scientist would agree that this is how words are produced when we speak or write. This relates to the discussion at the end of our article.
If we assumed, more realistically, that there is a mental lexicon, a possible null hypothesis would be “words are chosen uniformly at random from a mental lexicon”. But this would not give Zipf’s law (with a typical exponent): the rank histogram would be flat (if the sample were large enough).
Another problem is that many researchers regard Zipf’s law itself as a null hypothesis; see, for instance,

Miller, G. A. & Chomsky, N. 1963. Finitary models of language users. In: Handbook of Mathematical Psychology (Ed. by R. D. Luce, R. R. Bush & E. Galanter), pp. 419–492. New York: J. Wiley.

(or, more recently, Nowak, M. A. 2000. The basic reproductive ratio of a word, the maximum size of a lexicon. Journal of Theoretical Biology, 204(2), 179–189.)

The point is that a null hypothesis is a possible explanation for a phenomenon but not the phenomenon itself.

No competing interests declared.
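The contrast drawn in the reply above, between random typing and choosing words uniformly at random from a mental lexicon, can be illustrated with a small simulation. This is only a sketch, not the method of the article: the alphabet size, space probability, lexicon size, sample sizes and seeds are arbitrary assumptions, and the function names are hypothetical.

```python
import random
from collections import Counter


def random_typing_tokens(n_keystrokes, alphabet="abcde", space_prob=0.2, seed=1):
    """Random typing: each keystroke is a space with probability space_prob,
    otherwise a letter chosen uniformly from the alphabet (assumed parameters)."""
    rng = random.Random(seed)
    keys = [" " if rng.random() < space_prob else rng.choice(alphabet)
            for _ in range(n_keystrokes)]
    # str.split() with no argument drops the empty strings left by repeated spaces
    return "".join(keys).split()


def uniform_lexicon_tokens(n_tokens, lexicon_size=1000, seed=1):
    """Hypothetical null model: every token is drawn uniformly from a fixed lexicon."""
    rng = random.Random(seed)
    return ["w%d" % rng.randrange(lexicon_size) for _ in range(n_tokens)]


def rank_frequencies(tokens):
    """Word frequencies in decreasing order; position 0 corresponds to rank 1."""
    return sorted(Counter(tokens).values(), reverse=True)


typed = rank_frequencies(random_typing_tokens(200_000))
uniform = rank_frequencies(uniform_lexicon_tokens(200_000))

# Frequency of rank 1 relative to rank 100: large for random typing,
# close to 1 for uniform sampling (a nearly flat rank histogram).
print("random typing:  ", typed[0] / typed[99])
print("uniform lexicon:", uniform[0] / uniform[99])
```

In this random-typing sketch all words of the same length are equally likely, so the rank curve falls in steps rather than following a smooth power law, while the uniform-lexicon model gives the flat histogram described in the reply. Neither output says anything by itself about real texts; that requires a proper statistical comparison.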
