“If your experiment needs statistics, you ought to have done a better experiment,” Ernest Rutherford once declared. But when you work at the frontier of detection, as astronomers and particle physicists often do, you rely on statistical analysis to extract results. Indeed, if your experiment doesn’t need statistics, then you might be too far from the frontier to make an important discovery.
Despite such statistical triumphs as last year’s discovery of the Higgs boson, Rutherford’s disdain for—or at least suspicion of—statistics remains widespread. A recent statistical analysis demonstrated that visiting your doctor every year for a checkup doesn’t significantly prolong life. Of course, the practice doesn’t harm any individual patient, but its prevalence in the US raises the total cost of medical care, which harms society. Will the study make a difference? I doubt it.
I’m not sure what evidence would convince physicians to refrain from insisting on annual checkups, but they and anyone else who is skeptical of statistical analysis might be persuaded by a simmering scandal that boiled over recently in Atlanta, Georgia.
On 29 March the superintendent of the Atlanta school district, Beverly Hall, and 34 other educators were indicted in what a New York Times news story characterized as “the most widespread public school cheating scandal in memory.”
According to the indictment, the 35 educators conspired to raise students' test scores by altering the tests after the students had taken them. Meeting in secret and wearing gloves to avoid leaving incriminating fingerprints, groups of teachers at various schools rubbed out wrong answers and replaced them with the correct ones.
Besides acclaim for appearing to fix badly performing schools, the conspirators also received cash bonuses. Hall’s totaled $500 000, according to the Times. One school, Parks Middle School, “improved” so much that it forfeited $750 000 in state and federal aid.
To gather evidence of a conspiracy that might convince a jury, Georgia state investigator Richard Hyde persuaded one of the teachers who was allegedly part of the scheme to wear a secret recording device. But evidence of a different kind had come to light five years earlier. In December 2008, the Atlanta Journal-Constitution drew attention to what seemed like suspiciously large and abrupt jumps in test scores. That initial investigation expanded into a five-year project in which three reporters and two database specialists gathered and analyzed test scores from 69 000 schools in 14 743 districts in 49 states.
The scores from Atlanta and a few other districts stuck out as anomalous. As reported last June, some of those school districts are taking advantage of the Atlanta Journal-Constitution study to identify cheating educators.
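To give a flavor of how a suspiciously large jump can be picked out of a pile of ordinary ones, here is a minimal sketch in Python. It is not the newspaper's method, which is not described here. It swaps in a standard robust outlier test based on the median and the median absolute deviation. The function name, the threshold of 3.5, and the school names and score changes are all hypothetical, made up for illustration.

```python
from statistics import median


def flag_anomalous_jumps(score_changes, threshold=3.5):
    """Return the names whose year-over-year score change is an unusually large gain.

    score_changes maps a (hypothetical) school name to its change in mean score.
    The threshold of 3.5 is an assumed, conventional cutoff, not a value from the article.
    """
    values = list(score_changes.values())
    center = median(values)
    # Median absolute deviation: a measure of spread that a few outliers barely move.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be called anomalous
    flagged = []
    for name, change in score_changes.items():
        # 0.6745 rescales the MAD so the score is comparable to a z-score for normal data.
        robust_z = 0.6745 * (change - center) / mad
        if robust_z > threshold:  # only large gains are of interest here
            flagged.append(name)
    return flagged


# Invented data: six ordinary schools and one with an abrupt jump.
changes = {
    "School A": 1.2, "School B": -0.8, "School C": 0.5, "School D": 2.1,
    "School E": -1.5, "School F": 0.9, "School G": 24.0,
}
print(flag_anomalous_jumps(changes))
```

A flag of this kind is only a reason to look closer. A school can improve sharply for honest reasons, which is why the investigators in Atlanta also needed evidence of the other sort.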
Organized crime and electoral fraud
Similar statistical investigations can be found on the arXiv e-print server. Last month two physicists, Salvatore Catanese and Giacomo Fiumara, and mathematician Emilio Ferrara, all from the University of Messina in Sicily, demonstrated that they could pick out organized criminal activity from cell phone records by looking for statistically anomalous behavior.
My favorite example—because it’s so similar to the Atlanta cheating scandal—was the study posted last year by Dmitry Kobak of the electrical and electronic engineering department of Imperial College London and two unaffiliated coauthors, Sergey Shpilkin and Maxim Pshenichnikov. Here’s the abstract:
Here we perform a statistical analysis of the official data from recent Russian parliamentary and presidential elections (held on December 4th, 2011 and March 4th, 2012, respectively). A number of anomalies are identified that persistently skew the results in favour of the pro-government party, United Russia (UR), and its leader Vladimir Putin. The main irregularities are: (i) remarkably high correlation between turnout and voting results; (ii) a large number of polling stations where the UR/Putin results are given by a round number of percent; (iii) constituencies showing improbably low or (iv) anomalously high dispersion of results across polling stations; (v) substantial difference between results at paper-based and electronic polling stations. These anomalies, albeit less prominent in the presidential elections, hardly conform to the assumptions of fair and free voting. The approaches proposed here can be readily extended to quantify fingerprints of electoral fraud in any other problematic elections.
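The first two irregularities in that abstract are simple enough to sketch in a few lines of Python. What follows is an illustration under stated assumptions, not the authors' code: the function names, the choice of multiples of 5 percent as "round," the tolerance, and the station figures are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def round_share_fraction(votes_for, votes_cast, step=5, tol=0.05):
    """Fraction of stations whose percent result lies within tol of a multiple of step.

    step and tol are assumed values chosen for this sketch.
    """
    hits = 0
    for favorable, cast in zip(votes_for, votes_cast):
        pct = 100.0 * favorable / cast
        nearest_round = step * round(pct / step)
        if abs(pct - nearest_round) <= tol:
            hits += 1
    return hits / len(votes_cast)


# Invented polling stations: registered voters, ballots cast, votes for one party.
registered = [1000, 1200, 900, 1500, 1100]
cast = [520, 700, 610, 1350, 1000]
votes_for = [210, 330, 350, 1080, 850]

turnout = [c / r for c, r in zip(cast, registered)]
share = [f / c for f, c in zip(votes_for, cast)]

print(pearson(turnout, share))                # irregularity (i): turnout vs. result
print(round_share_fraction(votes_for, cast))  # irregularity (ii): round percentages
```

Neither number means much on its own. A correlation between turnout and result can have innocent causes, and a small station lands on a round percentage by chance fairly often, so either figure would have to be compared with what honest voting is expected to produce.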
As for Rutherford, I remain puzzled by his attitude toward statistics. The famous experiment that Hans Geiger and Ernest Marsden performed in 1909 at the University of Manchester under his direction revealed the existence of the atomic nucleus—after Geiger and Marsden had laboriously tallied the rare backward reflections of alpha particles from gold foil.