Zipf’s law of word distribution states the following:
Take a large corpus of text, count the frequency of every word in the
corpus, and then rank these frequencies in decreasing order. Let $f_{I}$
be the $I$th largest frequency in this list; that is, $f_{1}$ is the
frequency of the most common word (usually “the”), $f_{2}$ is the
frequency of the second most common word, and so on. Zipf’s law states
that $f_{I}$ is approximately equal to $\alpha / I$ for some constant
$\alpha$. The law tends to be highly accurate except for very small and
very large values of $I$.
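
As a rough illustration of the procedure described above, the following Python sketch counts word frequencies, ranks them in decreasing order, and computes the product $I \cdot f_{I}$ for each rank. If Zipf's law holds, these products should stay roughly constant and close to $\alpha$ over the middle ranks. The function names `ranked_frequencies` and `zipf_products` are hypothetical, and the tokenization rule (lowercase runs of letters and apostrophes) is a simplifying assumption rather than part of the law.

```python
import re
from collections import Counter


def ranked_frequencies(text):
    """Return word counts sorted in decreasing order: [f_1, f_2, ...]."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_products(freqs):
    """Return I * f_I for each rank I (ranks start at 1).

    Under Zipf's law, f_I is about alpha / I, so each product
    should be approximately alpha.
    """
    return [rank * f for rank, f in enumerate(freqs, start=1)]


# Toy input only; a real check needs a large corpus.
freqs = ranked_frequencies("the cat and the dog and the bird")
print(freqs)                 # [3, 2, 1, 1, 1]
print(zipf_products(freqs))  # [3, 4, 3, 4, 5]
```

A text this small says nothing about whether the law holds; it only shows the mechanics. On a large corpus, one would expect the products to drift at very small and very large ranks, where the law is least accurate.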