Ten Rules of Statistics
August 1, 2016
There's a statistic (I suppose) that any article about statistics will likely start with the observation that "there are three kinds of lies: lies, damned lies, and statistics." This saying, which can be traced back to 1891, was popularized by Mark Twain (1835-1910), but its origin is unknown. I wrote about one type of statistical lie in an earlier article (Hacking the p-Value, May 4, 2015).
Statistics are an important part of science. Very low level statistical analysis is used to derive a best value for a measured quantity found through several experimental trials, and for assessing the quality of a curve fit to data (called "goodness of fit"). Most curve-fitting programs give this as the coefficient of determination, called r-squared in a simple linear regression (see graph).
The physical science and mathematics preprint website, arXiv, publishes statistics papers in several categories, as follow:
|Regression fit of a straight line to noisy data.|
In this example, the noise is just a few percent of full scale, so r2 is nearly 99%.
(Analysis using Gnumeric.)
stat.AP - Applications: Biology, Education, Epidemiology, Engineering, Environmental Sciences, Medical Research, Physical Sciences, Quality Control, Social Sciences.
Statistical analysis is especially important in the area of experimental high energy physics, where some types of elementary particles are few and far in between. So as to not fool themselves too often, elementary particle physicists have set a certainty of five standard deviations (5-σ) as the threshold of truth. This sets a p-value, the likelihood that the result of an experiment is not as predicted by the hypothesis, of about 0.000028%. Few would argue about such a standard.
A recent editorial in PLoS Computational Biology by authors from Carnegie Mellon University (Pittsburgh, Pennsylvania), Johns Hopkins University (Baltimore, Maryland), North Carolina State University (Raleigh, North Carolina), Harvard University (Cambridge, Massachusetts), the University of California Berkeley (Berkeley, California), and the University of Toronto (Toronto, Ontario) offers ten rules to remove most of the "lies," and all of the "damned lies," from statistics.[1-2]
Their article follows in a long tradition of "Ten Rules" articles on PLoS, of which there are about sixty. There's even a PLoS article entitled, "Ten Simple Rules for Writing a PLoS Ten Simple Rules Article." These articles seem to evoke considerable interest, as the following graph indicates.
As the authors write in their "Ten Simple Rules for Effective Statistical Practice" editorial, "Statisticians are not shy about reminding administrators that statistical science has an impact on nearly every part of almost all organizations." This particular article has generated more than 43,000 views, a statistic that puts it in the top twenty of these "Ten Rules" articles. Says Michael J. Tarr, head of CMU's Department of Psychology,
stat.CO - Computation: Algorithms, Simulation, Visualization.
stat.ML - Machine Learning: Classification, Graphical Models, High Dimensional Inference.
stat.ME - Methodology: Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal Processing and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods.
stat.OT - Other Statistics: Work in statistics that does not fit into the other stat classifications.
stat.TH - Statistics Theory: Asymptotics, Bayesian Inference, Decision Theory, Estimation, Foundations, Inference, Testing.
"The sciences, and, particular the fields of psychology and neurobiology, have come under increasing scrutiny in recent years for sometimes poor statistical practices... Straightforward and understandable guidelines as articulated by (Robert E.) Kass and colleagues will help tremendously in reminding both students and faculty as to the importance of statistically well-grounded research. Their paper is an instant 'must-read' for anyone who cares about good and reproducible science."
The following is a summary of the "Ten Simple Rules for Effective Statistical Practice."[1-2]
Statistical Methods Should Enable Data to Answer Scientific Questions
While scientists are skilled at collecting data, they are typically not skilled in the many ways that information can be extracted from the data. The statistician authors of these rules, not unexpectedly, propose that statisticians be consulted at all stages of investigation.
Signals Always Come With Noise
Noise is present in all experimental data, so it's important to accurately express your uncertainty (for example, the error bars in the orbit figure, above), and to identify sources of systematic error.
Plan Ahead, Really Ahead
It's always important to design your experiment carefully. You want to change the variables appropriately, and you also want to simplify your data analysis when it's over.
Worry About Data Quality
Computer programmers are not the only ones who face the "garbage in -> garbage out" problem. Automated data acquisition, and laboratory instruments that perform signal filtering and other preprocessing, might shade your data in unknown ways.
Statistical Analysis Is More Than a Set of Computations
Statistical software may speed analysis, but it's important to do the types of statistical analysis appropriate to your experiment. Are you fitting to a straight line because that's what theory predicts, or does theory predict a different type of curve you should be fitting to?
Keep it Simple
As I learned in the design of industrial experiments (see my earlier article, Tea Party Technologists, November 18, 2011), most experiments can be considerably simplified, since the extra data is, in fact, redundant. One chemist I knew didn't need to fit his data points to a curve, since his data points were so close together that they were the curve. All that data-taking was wasted effort.
Provide Assessments of Variability
When your hypothesis is proven with a weight change of the order of milligrams, and your available analytical balance measures to 10 micrograms, you know that your measurement error is important. In your published paper, it's important to calculate how this uncertainty propagates to your final result.
Check Your Assumptions
Since you're a materials scientist, and not an astronomer, why did you think that it was necessary to do your experiments on the night of the full moon? The argument that "we've always done things that way" doesn't carry much weight in scientific circles.
When Possible, Replicate!
Replication is at the heart of scientific investigation. It is only through replication that your experiments are validated; but, for this to be possible, replicant experiments must be done in the same way. Science advances, also, when experiments are done with slight changes in variables. Statistics will identify how much a variable needs to be changed to expect a different experimental outcome.
Make Your Analysis Reproducible
Data is one thing, but experimental results are presented in aggregate form. When data are shared, details on the statistical analysis should be shared, also, so that the tables, figures and statistical inferences in your publication can be reproduced exactly.
Most of the above is done routinely by most senior scientists, and they will require as much from their junior team members. Funding for the authors of the "Ten Simple Rules for Effective Statistical Practice" came from the National Institutes of Health, the Natural Sciences and Engineering Research Council Council of Canada, and the National Science Foundation.
- Robert E. Kass, Brian S. Caffo, Marie Davidian, Xiao-Li Meng, Bin Yu, and Nancy Reid, "Editorial - Ten Simple Rules for Effective Statistical Practice," PLoS Comput. Biol., vol. 12, no. 6 (June 9, 2016), Article no. e1004961, doi:10.1371/journal.pcbi.1004961. This is an open access publication with a PDF file available here.
- Shilo Rea, "Kass Co-Authors 10 Simple Rules To Use Statistics Effectively," Carnegie Mellon University Press Release, June 20, 2016.
- Harriet Dashnow, Andrew Lonsdale, and Philip E. Bourne, "Ten Simple Rules for Writing a PLOS Ten Simple Rules Article," PLoS Comput. Biol., vol. 10, no. 10 (October 23, 2014), Article no. e1003858, doi:10.1371/journal.pcbi.1003858.
Permanent Link to this article
Linked Keywords: Statistic; statistics; there are three kinds of lies: lies, damned lies, and statistics; Mark Twain (1835-1910); p-hacking; science; data analysis; measurement; measure; experiment; experimental; curve fitting; curve fit; data; goodness of fit; computer program; coefficient of determination; simple linear regression; Gnumeric; physical science; mathematics; preprint; website; arXiv; stat.AP - Applications; Biology; Education; Epidemiology; Engineering; Environmental Sciences; Medical Research; Physical Sciences; Quality Control; Social Sciences; stat.CO - Computation; Algorithms; Simulation; Visualization; stat.ML - Machine Learning; Classification; Graphical Models; High Dimensional Inference; stat.ME - Methodology; Design; Surveys; Model Selection; Multiple Testing; Multivariate Methods; Signal Processing; Image Processing; Time Series; Smoothing; Spatial Statistics; Survival Analysis; Nonparametric; Semiparametric Methods; stat.OT - Other Statistics; stat.TH - Statistics Theory; Asymptotics; Bayesian Inference; Decision Theory; Estimation; Foundations of Statistics; Inference; Statistical Hypothesis Testing; high energy physics; elementary particles; deception; physicist; standard deviation; truth; p-value; hypothesis; astronomy; orbit; star; Sagittarius A*; error bar; position; measurement; ellipse; elliptical; data; European Southern Observatory; Wikimedia Commons; editorial; PLoS Computational Biology; author; Carnegie Mellon University (Pittsburgh, Pennsylvania); Johns Hopkins University (Baltimore, Maryland); North Carolina State University (Raleigh, North Carolina); Harvard University (Cambridge, Massachusetts); University of California Berkeley (Berkeley, California); University of Toronto (Toronto, Ontario); "Ten Rules" articles on PLoS; Cartesian coordinate system; graph; citation; statistician; administrator; Michael J. Tarr; CMU's Department of Psychology; science; psychology; neuroscience; neurobiology; Robert E. Kass; collaboration; colleague; undergraduate education; student; faculty; research; academic publishing; paper; reproducibility; reproducible; information; statistical noise"; systematic error; variable; computer programmer; garbage in -> garbage out; automation; automated; data acquisition; laboratory instrument; digital filter; signal filtering; preprocessor; preprocessing; software; straight line; theory; design of industrial experiments; chemist; weight; milligram; analytical balance; microgram; materials scientist; astronomer; full moon; replication; funding of science; National Institutes of Health; Natural Sciences and Engineering Research Council Council of Canada; National Science Foundation; punch line; cartoon; Randall Munroe; xkcd Comics; Creative Commons Attribution-NonCommercial 2.5 License; xkcd 882.
Latest Books by Dev Gualtieri
Thanks to Cory Doctorow of BoingBoing for his favorable review of Secret Codes!
Blog Article Directory on a Single Page
- Levitation - March 27, 2017
- Soybean Graphene - March 23, 2017
- Income Inequality and Geometrical Frustration - March 20, 2017
- Wireless Power - March 16, 2017
- Trilobite Sex - March 13, 2017
- Freezing, Outside-In - March 9, 2017
- Ammonia Synthesis - March 6, 2017
- High Altitude Radiation - March 2, 2017
- C.N. Yang - February 27, 2017
- VOC Detection with Nanocrystals - February 23, 2017
- Molecular Fountains - February 20, 2017
- Jet Lag - February 16, 2017
- Highly Flexible Conductors - February 13, 2017
- Graphene Friction - February 9, 2017
- Dynamic Range - February 6, 2017
- Robert Boyle's To-Do List for Science - February 2, 2017
- Nanowire Ink - January 30, 2017
- Random Triangles - January 26, 2017
- Torricelli's law - January 23, 2017
- Magnetic Memory - January 19, 2017
- Graphene Putty - January 16, 2017
- Seahorse Genome - January 12, 2017
- Infinite c - January 9, 2017
- 150 Years of Transatlantic Telegraphy - January 5, 2017
- Cold Work on the Nanoscale - January 2, 2017
- Holidays 2016 - December 22, 2016
- Ballistics - December 19, 2016
- Salted Frogs - December 15, 2016
- Negative Thermal Expansion - December 12, 2016
- Verbal Cues and Stereotypes - December 8, 2016
- Capacitance Sensing - December 5, 2016
- Gallium Nitride Tribology - December 1, 2016
- Lunar Origin - November 27, 2016
- Pumpkin Propagation - November 24, 2016
- Math Anxiety - November 21, 2016
- Borophene - November 17, 2016
- Forced Innovation - November 14, 2016
- Combating Glare - November 10, 2016
- Solar Tilt and Planet Nine - November 7, 2016
- The Proton Size Problem - November 3, 2016
- Coffee Acoustics and Espresso Foam - October 31, 2016
- SnIP - An Inorganic Double Helix - October 27, 2016
- Seymour Papert (1928-2016) - October 24, 2016
- Mapping the Milky Way - October 20, 2016
- Electromagnetic Shielding - October 17, 2016
- The Lunacy of the Cows - October 13, 2016
- Random Coprimes and Pi - October 10, 2016
- James Cronin (1931-2016) - October 6, 2016
- The Ubiquitous Helix - October 3, 2016
- The Five-Second Rule - September 29, 2016
- Resistor Networks - September 26, 2016
- Brown Dwarfs - September 22, 2016
- Intrusion Rheology - September 19, 2016
- Falsifiability - September 15, 2016
- Fifth Force - September 12, 2016
- Renal Crystal Growth - September 8, 2016
- The Normality of Pi - September 5, 2016
- Metering Electrical Power - September 1, 2016
Deep Archive 2006-2008