Hacking the p-Value
May 4, 2015
Important things are represented by single characters. Beyond the primary pronoun, "I," we have many physical and mathematical objects. In math, we have π; the base of natural logarithms, e; the imaginary unit, i; and the golden ratio, φ.
In chemistry, we have the elements represented by single-character symbols; namely, hydrogen, boron, carbon, nitrogen, oxygen, fluorine, phosphorus, sulfur, potassium, vanadium, yttrium, iodine, tungsten, and uranium. In physics we have the unit of elementary charge, e; the speed of light, c; the gravitational constant, G; the Planck constant, h; the Stefan-Boltzmann constant, σ; and the gas constant, R.
Psychologists and biologists have their own important single-character quantity, p, for probability, since their experiments are unlike the generally quantitative experiments of the physical sciences. Often, the only way that they can make sense of their results is through the use of statistics. Not that statistics is the savior of just life science experiments; particle physics experiments rely on statistics, too.
Physicists strive to attain a result with a high confidence level. The existence of the Higgs boson was confirmed at the 4.9-sigma level, meaning that there's only about a one-in-a-million chance that random background fluctuations alone would produce a signal that strong. Biological systems are affected by too many environmental influences to give such clear results.
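The link between a sigma level and a probability can be checked numerically. The following is a minimal sketch, assuming the fluctuations follow a normal (Gaussian) distribution; the function name `sigma_to_p` is a hypothetical helper, not something from the experiments themselves. At 4.9 sigma, the two-sided tail probability comes out at roughly one in a million, and the one-sided probability is half of that.

```python
import math

def sigma_to_p(z, two_sided=False):
    """Tail probability of a standard normal beyond z standard deviations."""
    # erfc gives the two-sided tail; halve it for the one-sided tail.
    one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided

for z in (2.0, 3.0, 4.9, 5.0):
    print(f"{z:.1f} sigma: one-sided p = {sigma_to_p(z):.2e}, "
          f"two-sided p = {sigma_to_p(z, two_sided=True):.2e}")
```

The same function shows why 0.05 is a much weaker standard: a two-sided p of 0.05 corresponds to only about 1.96 sigma.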
In many scientific disciplines, Fisher's null hypothesis significance test is the path to statistical truth. The null hypothesis, as its name indicates, is the hypothesis that your experimental variable had no effect on the observed outcome. The experimenter computes a probability, p, that an effect at least as large as the one observed would still appear even if the null hypothesis is true. If this p-value is small, say, less than 0.05, then the null hypothesis is rejected, and it's claimed that the experimental variable does affect the observed outcome.
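One concrete way to compute such a p-value is a permutation test: if the null hypothesis is true, the group labels are arbitrary, so we can shuffle them many times and count how often a difference at least as large as the observed one shows up by chance. This is a sketch only; the measurements are made up for illustration, and the function name `permutation_p_value` and its parameters are hypothetical.

```python
import random

def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(treated) / len(treated) - sum(control) / len(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, any relabeling of the data is equally likely.
        rng.shuffle(pooled)
        a, b = pooled[:n_treated], pooled[n_treated:]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        # Small tolerance guards against floating-point rounding in the sums.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Made-up measurements, for illustration only.
treated = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
control = [4.2, 4.8, 4.5, 5.0, 4.1, 4.6]

p = permutation_p_value(treated, control)
print(f"p = {p:.4f}")
print("reject null at 0.05" if p < 0.05 else "fail to reject null at 0.05")
```

Note that a small p says the data would be surprising if the null hypothesis were true; it is not the probability that the null hypothesis is true.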
This p-value validation has been used historically in published papers, but the method can be misleading. In February 2015, the journal Basic and Applied Social Psychology declared that it won't publish papers that rely on, or even mention, the p-value method. The journal warned authors in 2014 of its belief that the null hypothesis significance testing procedure (NHSTP) is invalid, but it allowed a grace period until this year. In answer to the question of whether manuscripts that mention p-values will be rejected automatically, the journal responded,
"No. If manuscripts pass the preliminary inspection, they will be sent out for review. But prior to publication, authors will have to remove all vestiges of the NHSTP (p-values, t-values, F-values, statements about 'significant' differences or lack thereof, and so on)."
The journal cited the p-test as an "important obstacle to creative thinking" that has dominated psychology for decades, and it hoped that other journals would join in this ban on what's seen as an unneeded crutch. Shortly thereafter, the American Statistical Association (ASA) posted a comment on its web site saying it was wary that such a p-value ban might have its own negative consequences. The ASA has formed a group of more than two dozen "distinguished statistical professionals" to develop a statement on p-values.
Tom Siegfried, in his blog at Science News, quotes William Rozeboom, a philosopher of science, as saying that the p-test was "surely the most bone-headedly misguided procedure ever institutionalized in the rote training of science students."
A recent paper in PLoS Biology by biologists at the Australian National University (Canberra, Australia) and Macquarie University (New South Wales, Australia) concludes that scientists will sometimes "tweak" experiments and analysis methods to obtain a better p-value and thereby increase the likelihood of publication. The authors call this technique "p-hacking," and it appears to be common in the life sciences. This conclusion is based on an analysis of more than 100,000 research papers in such diverse scientific disciplines as medicine, biology and psychology.
Lead author Megan Head of the Australian National University says that you can't put too much blame on the scientists who perform p-hacking, as
"Many researchers are not aware that certain methods could make some results seem more important than they are. They are just genuinely excited about finding something new and interesting."
Typical research practices leading to p-hacking include doing analyses in the middle of an experiment to decide whether to continue the experiment; recording many variables, then deciding afterward which are significant enough to report; dropping outliers; excluding, combining, or splitting groups after analysis; and stopping data collection once an analysis gives a significant p-value.
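The last of these practices, stopping as soon as the result looks significant, is easy to simulate. The sketch below generates experiments in which the null hypothesis is true by construction (normally distributed data with zero mean), and compares testing once at a fixed sample size against checking after every batch of ten observations. All of the names and settings (`z_test_p`, `false_positive_rate`, batch size, number of trials) are assumptions chosen for illustration, and the test assumes a known unit variance to keep the code short.

```python
import math
import random

def z_test_p(sample):
    """Two-sided p-value for mean == 0, assuming known unit variance."""
    n = len(sample)
    z = sum(sample) / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2.0))

def false_positive_rate(peek, trials=2000, max_n=100, step=10, alpha=0.05, seed=1):
    """Fraction of null experiments that end up 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = []
        significant = False
        for _ in range(max_n // step):
            data.extend(rng.gauss(0.0, 1.0) for _ in range(step))
            # With peeking, test after every batch and stop at the first p < alpha.
            if peek and z_test_p(data) < alpha:
                significant = True
                break
        if not peek:
            # Without peeking, test exactly once, at the planned sample size.
            significant = z_test_p(data) < alpha
        hits += significant
    return hits / trials

print("fixed sample size :", false_positive_rate(peek=False))
print("stop when p < 0.05:", false_positive_rate(peek=True))
```

With a single planned test, the false positive rate should sit near the nominal 5%. With repeated peeking it comes out several times higher, even though no individual step looks like cheating.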
One reason for p-hacking is publication pressure. Prestigious journals accept papers that have statistically significant ("positive") results, and this appears to generate papers with false positive results that hinder scientific progress.[4-5] Early positive studies receive a lot of attention, while contradicting negative studies receive much less. In multiple studies on the effectiveness of a pharmaceutical drug, too many p-hacked findings would make the drug look more effective than it is.
Not surprisingly, the study found an excess of papers reporting p-values that just clear the conventional threshold of significance, that is, values just below 0.05. This is evidence that some scientists have adjusted their experiments and analyses to cross that important threshold. Says Head,
"This suggests that some scientists adjust their experimental design, datasets or statistical methods until they get a result that crosses the significance threshold... They might look at their results before an experiment is finished, or explore their data with lots of different statistical methods, without realizing that this can lead to bias."
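One way to look for such a pile-up is to count reported p-values in two adjacent, equal-width bins just below 0.05. If nothing is nudging results across the line, values in the bin nearest 0.05 should be no more common than those in the bin just below it, so a one-sided binomial test can ask whether the upper bin is over-populated. The sketch below is in the spirit of this kind of analysis and is not necessarily the authors' exact procedure; the bin edges and the counts are made up for illustration, and `binomial_upper_tail` is a hypothetical helper.

```python
from math import comb

def binomial_upper_tail(k, n, prob=0.5):
    """P(X >= k) for X ~ Binomial(n, prob)."""
    return sum(comb(n, i) * prob**i * (1 - prob)**(n - i) for i in range(k, n + 1))

# Hypothetical counts of reported p-values in two adjacent, equal-width bins.
lower_bin = 120   # 0.040 < p <= 0.045
upper_bin = 160   # 0.045 < p <  0.050

# Null model: a p-value in this range is equally likely to land in either bin.
p = binomial_upper_tail(upper_bin, lower_bin + upper_bin)
print(f"one-sided binomial p = {p:.4f}")
```

A small result here would indicate more p-values crowded just under 0.05 than chance alone would explain, which is the signature the study looked for.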
Funding for this research was provided by the Australian Research Council.
1. David Trafimow and Michael Marks, "Editorial," Basic and Applied Social Psychology, vol. 37, no. 1 (February 12, 2015), pp. 1-2, DOI: 10.1080/01973533.2015.1012991.
2. ASA Comment on a Journal's Ban on Null Hypothesis Statistical Testing.
3. Tom Siegfried, "P value ban: small step for a journal, giant leap for science," Science News, March 17, 2015.
4. Megan L. Head, Luke Holman, Rob Lanfear, Andrew T. Kahn, and Michael D. Jennions, "The Extent and Consequences of P-Hacking in Science," PLOS Biology, vol. 13, no. 3 (March 13, 2015), DOI: 10.1371/journal.pbio.1002106. This is an open access paper.
5. Scientists unknowingly tweak experiments, Australian National University Press Release, March 18, 2015.