### BIOGRAPHY 14.1 *Karl Pearson* (1857-1936)

Karl Pearson was born in London, England, the son of a successful trial lawyer. While attending University College, London, and then the universities of Heidelberg, Berlin, and Cambridge (where he received a law degree), he exhibited a phenomenal range of interests, moving from mathematics, physics, and philosophy to religion, history, and law to German folklore, socialism, and Darwinism! Much of this study had little to do with things for which Pearson is now remembered, but early in his life he was simply overwhelmed with all there is to know and noted that "not one subject in the universe is unworthy of study."

Pearson's interest in analytical statistics was kindled only in the late 1880s after he had become a professor of applied mathematics and mechanics at University College. (Later, he was named the first Galton professor of eugenics there.) His 1892 book, *The Grammar of Science*, illustrates his growing conviction that analytical statistics lies at the foundation of all knowledge. Going beyond Adolphe Quetelet (Biography 4.2), the "father of statistics," who believed that almost all phenomena can be described by the normal distribution (provided only that the number of cases examined is large enough), Pearson derived a system of generalized frequency curves that recognized the importance of asymmetrical distributions. (In that connection, he noted that true variability among individuals was a concept very different from chance variation among errors made while estimating a single value. He introduced the term *standard deviation* and the symbol σ for the former and called the latter *probable error*.) Having established ways of fitting all kinds of curves to all kinds of observations, Pearson searched for a criterion that could measure the *goodness of fit*. Thus, in a famous paper published in 1900, he introduced *chi square*.

When Quetelet and his followers wanted to demonstrate the closeness of agreement between the frequencies in a distribution of observed data and the frequencies calculated on the assumption of a normal distribution, they merely printed the two series side by side, and that was that! Readers could look at these and reach their own conclusions. They had no measure of discrepancy between the observed and the expected. By introducing chi square, Pearson provided such a measure, and he worked out its distribution as well. As text Chapter 14 shows, chi square has turned out to be an enormously useful statistic, and it now occupies a major position in statistical theory.

However, unbeknownst to Pearson, a German, Friedrich Helmert, had discovered a chi-square distribution in 1875 when studying the sampling distribution of the sample variance while sampling from a normal population. Pearson discovered it in a different context (namely, that of a goodness-of-fit problem), and he later extended its application to the analysis of frequencies in contingency tables. Pearson himself was unclear, however, on the proper number of degrees of freedom to use. He always used the number of classes minus 1. As was later shown by Ronald A. Fisher (Biography 13.1), a more accurate result is achieved when this number is reduced by 1 for each estimated parameter.
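The measure of discrepancy and the degrees-of-freedom rule described above can be illustrated with a minimal sketch. The function names and the die-roll counts below are hypothetical and chosen only for illustration; the statistic is the usual sum of squared differences between observed and expected frequencies, each divided by the expected frequency.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(num_classes, num_estimated_params=0):
    """Number of classes minus 1, reduced by 1 for each estimated parameter
    (the correction attributed to Fisher in the text)."""
    df = num_classes - 1 - num_estimated_params
    if df < 1:
        raise ValueError("too few classes for the number of estimated parameters")
    return df


# Hypothetical data: 60 rolls of a die, with 10 rolls expected per face
# if the die is fair. No parameters are estimated from the data here.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed))
print(f"chi square = {statistic:.2f} with {df} degrees of freedom")
```

If, say, the mean and standard deviation of a fitted normal curve had first been estimated from the same data, the sketch would be called as `degrees_of_freedom(num_classes, 2)`, giving two fewer degrees of freedom than Pearson's own rule.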

Pearson is also responsible for greatly extending the idea of correlation introduced by Francis Galton (Biography 16.1). He generalized Galton's conclusions and methods, derived the formula now called "Pearson's product-moment correlation coefficient" (discussed in Chapter 16), derived a simple routine for the computation of regression equations, and much more.
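As a rough illustration of the product-moment formula mentioned above, the following sketch computes the coefficient as the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name and the sample values are hypothetical, not taken from the text.

```python
import math


def product_moment_correlation(xs, ys):
    """Pearson's r for two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations with a perfect linear relationship.
heights = [1, 2, 3, 4]
weights = [2, 4, 6, 8]
print(product_moment_correlation(heights, weights))
```

The coefficient lies between -1 and +1; values near either extreme indicate a close linear relationship, and values near 0 indicate little linear association.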

Most important, perhaps, is the fact that Pearson aroused the scientific world from a state of total indifference to statistical studies and convinced thousands in all fields of the necessity to gather and analyze data. He showed statistics to be a general method that was applicable to all sciences. Undoubtedly, there was no science for which Pearson himself demonstrated this fact more conclusively than biology. In 1901, Pearson became the cofounder (with Galton and Weldon) of *Biometrika*, a journal devoted to the statistical study of biological problems. Pearson edited the journal until his death and made it into the world's leading medium for the discussion of statistical theory and practice. The first issue of the journal carried a picture of a statue of Charles Darwin with the words, "*Ignoramus, in hoc signo laboremus*" (We are ignorant; so let us work). These words pretty much sum up Pearson's own philosophy of life. In the pursuit of that vast universe of knowledge that so overwhelmed him in his youth, he published hundreds of articles, and he facilitated the work of other statisticians by creating (and publishing in *Biometrika*) the types of tables that are found in the Appendix of the text and whose existence we all now take for granted.

In his relentless pursuit of the truth, with the help of newly developing statistical techniques, Pearson also became embroiled in many controversies, often bitter and prolonged. As a young man, Pearson was something of a crusader, battling for such causes as socialism, the emancipation of women, and the ethics of free thought. This spirit was still evident later in his life when, as head of the Eugenics Laboratory, he battled the medical profession on such issues as the causes of tuberculosis or the effect of alcoholism on future generations. (Using statistical analysis, Pearson showed tuberculosis to be more closely related to hereditary than to environmental factors, which made the widespread use of sanatoria look foolish. He also showed that, contrary to common opinion, alcoholic parents were not producing mentally or physically deficient children.)

*Sources*: *Dictionary of Scientific Biography*, vol. 10 (New York: Scribner's, 1974), pp. 447-473; *International Encyclopedia of Statistics*, vol. 2 (New York: Free Press, 1978), pp. 691-698.