Reflections on the early 19th c. French military hat-shaped curve
At some point in our illuminating conversation with Mike Acree (episode 57), I brought up the fact that the normal distribution curve, on which much of statistical inference rests, is open to infinity at both ends. The tails are supposed to hug but never cross the x-axis, and therefore never define any limits to the normal range.
That bothered me because I misunderstood how the label “normal” came to be affixed to that curve. If the curve is to serve as a normative model for human height (as Quetelet first proposed in the 1830s), then discovering a 2-inch-tall Lilliputian would be a perfectly normal, albeit vanishingly rare, occurrence.
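To make the point concrete, here is a minimal sketch of that reasoning. The mean and standard deviation below are illustrative placeholders for adult male height, not Quetelet’s actual figures; the point is only that a normal model assigns a strictly positive probability to any height, however absurd.

```python
import math

def normal_tail_below(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), computed via the
    complementary error function, which stays accurate deep in the tail."""
    z = (mu - x) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

# Illustrative (assumed) parameters for adult male height, in cm
mu, sigma = 175.0, 7.0

two_inches_cm = 2 * 2.54  # 5.08 cm, our Lilliputian
p = normal_tail_below(two_inches_cm, mu, sigma)

print(p > 0)  # True: the curve never touches the x-axis, so nothing is impossible
print(p)      # astronomically small, but still a positive number
```

Note that the naive route, `0.5 * (1 + math.erf(...))`, would round this probability to exactly zero in floating point; `erfc` is used precisely because the tail value, while unimaginably small, is not zero — which is the whole puzzle.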
After the show, Mike explained to me that the term “normal” was not necessarily there to denote normativity. Wikipedia elaborates:
Since its introduction, the normal distribution has been known by many different names: the law of error, the law of facility of errors, Laplace’s second law, Gaussian law, etc. Gauss himself apparently coined the term with reference to the “normal equations” involved in its applications, with normal having its technical meaning of orthogonal rather than “usual”. However, by the end of the 19th century some authors had started using the name normal distribution, where the word “normal” was used as an adjective – the term now being seen as a reflection of the fact that this distribution was seen as typical, common – and thus “normal”. Peirce (one of those authors) once defined “normal” thus: “…the ‘normal’ is not the average (or any other kind of mean) of what actually occurs, but of what would, in the long run, occur under certain circumstances.” Around the turn of the 20th century Pearson popularized the term normal as a designation for this distribution.
So, it took decades after its formulation before the curve was labeled normal in the sense of “typical” and, in many contexts, the reference was to the long run occurrence of some events under study.
Karl Pearson clarified further that the term “normal” in this context should not (necessarily) be understood in opposition to “abnormal.”
Many years ago I called the Laplace–Gaussian curve the normal curve, which name, while it avoids an international question of priority, has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another ‘abnormal’. –Pearson (1920)
All well and good but, interestingly, there is one statistician who indeed thought of the curve in a normative sense, and that’s Adolphe Quetelet himself.
Like many scientists of his time, Quetelet was a polymath, and his early training and focus were in mathematics and astronomy, which is the context in which he became familiar with the normal distribution.
As Mike mentioned during the podcast, he was the first scientist to apply statistical distribution curves to the context of human biology and human affairs after he discovered that the normal curve was a good fit for the distribution of human height.
Quetelet, who had been assigned by the French government to analyze the heights of 100,000 male conscripts, discovered that the Gaussian curve was a good fit for that dataset, and that discovery served as the impetus for his development of a science of “social physics.”
But for him, the bell-shaped curve was more than a descriptive tool. Rather, he believed that its mean value was an indication of the ideal blueprint for mankind and deviations from the mean were indications of manufacturing defects “from on high,” as Mike jokingly put it to me. Quetelet believed, in effect, that the mathematical distribution indicates what’s normal and what deviates from normal.
Now, I don’t know the exact circumstances that gave Quetelet access to the data from 100,000 conscripts, or to what extent he chose to confine his analysis to this homogeneous sample. But, as a thought experiment, I wonder what would have happened if height measurements for the entire French population had been made available to him to analyze.
Before constructing his distribution curve, he would have had to make decisions about who to include and who to exclude: Should he include children or only fully-developed adults? Should he include women? Should he include the elderly who might have lost some height from osteoporosis or disc disease? What about dwarves or people suffering from gigantism? What about people with deformities and traumatic injuries of the spine or legs? What about the malnourished?
Depending on those decisions, the resultant distribution curve may not have been as compellingly “normative” as the bell-shaped curve ultimately seemed to him.
Also, if in the process of collecting his dataset, Quetelet had excluded achondroplastic dwarves, malnourished folks, people with deformities, etc., that process of exclusion should have made him realize that the human mind can judge normality and deviation from normality without the need to refer to any known statistical distribution. And that should have made him less sanguine about the prospect of the Gaussian curve yielding “normative” information.
I also wonder how he would have reacted if he had been given the dataset for heights of male adults in ancient Gaul conscripted into Caesar’s army. That set would undoubtedly have shown a different mean height and standard deviation than the one Quetelet discovered in modern-day French conscripts.
And if he had been aware of the short stature of African Pygmies (the cause of which remains somewhat obscure), that awareness could also have given him pause about defining normal on the basis of numbers. Likewise for the tall Dinka people of South Sudan or, for that matter, the fact that Dutch people to this day are measurably taller on average than their German or Belgian neighbors. Normal height is obviously a multidimensional concept that must take into account the historical and geographic context.
On one of our earlier podcast episodes with guest Dr. Saurabh Jha (“Normal is Fuzzy,” episode 5), we touched on some of these questions in reference to a paper published last year by the famous methodologist John Ioannidis and his colleagues (“In the Era of Precision Medicine and Big Data, Who Is Normal?” published in JAMA). They too are grappling with the problem of defining normal values.
Perhaps one thing to bear in mind is that “normality” can only be judged in the context of the whole organism. It is not simply a feature of the trait itself. There is no such thing as a normal range of human height per se; there is a vague range of heights that belongs to normal human beings. It’s the whole being that must be deemed normal first of all.
What does that say about the prospect for artificial intelligence to help us decide what’s normal or not? Ioannidis and his colleagues were uncharacteristically optimistic in their editorial. I, for one, believe that the capacity to determine normality is irreducibly human and cannot be learned by machines (but machines could fool us into thinking they can).
So much for Quetelet and normal humans.
Mike also mentioned to me that, in the 19th century, the shape of the normal curve was often referred to in the French-language scientific literature as the chapeau de gendarme (“military policeman’s hat”) rather than forme de cloche (“bell shape”). That puzzled me until I realized that it was a reference to the Napoleonic outfit.
And that headdress has really captured the imagination of French industrial designers. The French Wikipedia site has an entire entry on that chapeau, and here are some of its illustrations.