Saturday, June 1, 2013

The limit of statistics

In French: La limite de la statistique.

We know that statistics is ill-suited to describing a small population. We can certainly count the individuals who compose it, but it will be virtually impossible to go from description to explanation.

In fact, explanation requires that we find in the statistical observation clues that guide us to causal hypotheses, among which we then choose by drawing on the accumulation of past interpretations that theory provides.

We find these clues by comparing the distribution of a characteristic across different populations (e.g., comparing the age structure of two countries, or of the same country at two different times) and by looking at the correlations between characteristics within the same population.

One can always draw a representative sample from a large population: that is to say, the distributions and correlations observed in this sample do not differ substantially from those that would be observed in the entire population, because the clues they provide lead to the same hypotheses.

Here is the test that tells whether a population is large enough for its statistical description to be interpreted: it must be possible to regard that population as a representative sample drawn from a virtual population of infinite size whose structure is explained by the same causes as the population considered.
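The test above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the "virtual population" is a hypothetical mechanism in which one characteristic depends linearly on another plus noise, and the function names, the slope and the population sizes are invented for the example. It generates many finite populations from that same mechanism and measures how much the observed correlation varies from one to the next.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def draw_population(n, rng, slope=0.5):
    """Hypothetical causal mechanism (an assumption of this sketch):
    y depends linearly on x, plus independent noise."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def correlation_spread(n, trials=200, seed=0):
    """Standard deviation of the observed correlation across `trials`
    populations of size n, all generated by the same mechanism."""
    rng = random.Random(seed)
    observed = [pearson(*draw_population(n, rng)) for _ in range(trials)]
    return statistics.stdev(observed)


if __name__ == "__main__":
    for n in (8, 50, 2000):
        print(n, round(correlation_spread(n), 3))
```

With only a handful of individuals, the correlation observed in one population says little about the underlying mechanism, since another population produced by the same causes could show a very different value. With a large population the observed correlation is stable, which is what makes it usable as a clue.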

*     *

Some populations are therefore not "statistisable" (please forgive this neologism). To be sure, we can count their individuals and calculate totals, averages, dispersions and correlations, then publish it all in tables and graphs. But this morass will be impossible to interpret: we cannot move from this description to an explanation.

This is the case, for example, for much of business statistics: it often happens that the production of a branch or sector is concentrated in a few large companies, whose number is too low for this population to be "statistisable".

There is a remedy: when a statistical description is impossible to interpret, we turn to the monograph. The search for the causal relationships at work in the population no longer relies on distributions and correlations, but examines each individual case in its particular history.

Of course history never provides more than hypotheses, because the past is essentially enigmatic; but then statistics, too, provides at best only hypotheses. The two kinds are not of the same nature, however, and the monograph requires a depth of investigation that statistics does not.
