Exercise 22.4

Choose a corpus of at least 20,000 words of online text, and verify Zipf’s law experimentally. Define an error measure and find the value of $\alpha$ where Zipf’s law best matches your experimental data. Create a log–log graph plotting $f_{I}$ vs. $I$ and $\alpha/I$ vs. $I$. (On a log–log graph, the function $\alpha/I$ is a straight line.) In carrying out the experiment, be sure to eliminate any formatting tokens (e.g., HTML tags) and normalize upper and lower case.