
P-values – in Nature (Regina Nuzzo. Scientific method: statistical errors. Nature, 12 February 2014; 506: 150-52)

Most epidemiologists are aware of some of the problems with P-values: they combine effect size and sample size, they are not easy to interpret, in most studies they have no real meaning, and in many cases they are grossly misleading, causing mistakes that can be serious. Misinterpretation of P-values has probably led to some of the most frequent and serious iatrogenic mistakes. Thousands, probably millions, have suffered from misinterpretations of the P-value. How does this arise? ‘Overlooking valuable treatment or risky exposures because the results were not statistically significant is not the fault of the p-value itself, but is a common problem that arises out of the use of p-values without adequate thought and scrutiny by scientific authors, peer reviewers and editors’, as stated in our blog from December 2009. The main problem is taking absence of evidence to be evidence of the absence of an effect: accepting the null if P>0.05, perhaps even believing you have proven the null.
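The point that a P-value mixes effect size with sample size can be made concrete with a small sketch. The numbers below are hypothetical and chosen only for illustration: a fixed difference in means of 0.2 standard deviations between two equal-sized groups, a common standard deviation assumed known, and a normal (z) approximation. The function names `standard_error`, `two_sided_p` and `ci95` are ours, not from the Nature paper.

```python
import math


def standard_error(sd, n_per_group):
    """Standard error of a difference in means, two equal-sized groups,
    assuming a known common standard deviation."""
    return sd * math.sqrt(2.0 / n_per_group)


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided P-value from a normal (z) approximation."""
    z = mean_diff / standard_error(sd, n_per_group)
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def ci95(mean_diff, sd, n_per_group):
    """Approximate 95% confidence interval for the difference in means."""
    se = standard_error(sd, n_per_group)
    return (mean_diff - 1.96 * se, mean_diff + 1.96 * se)


if __name__ == "__main__":
    # Same hypothetical effect (0.2 SD) in three studies of different size.
    for n in (20, 200, 2000):
        p = two_sided_p(0.2, 1.0, n)
        low, high = ci95(0.2, 1.0, n)
        print(f"n per group = {n:5d}  P = {p:.3g}  95% CI = ({low:.2f}, {high:.2f})")
```

Under these assumptions the estimated effect is identical in all three studies, yet the smallest study gives P well above 0.05 and the larger ones give P below it. Reading the first result as 'no effect' would be exactly the mistake described above; the wide confidence interval shows that the small study is simply uninformative.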

Over the years, P-values have come to be used with somewhat more care. Reporting effect sizes and confidence intervals has often replaced P-values in epidemiologic journals. The International Journal of Epidemiology prefers manuscripts to be purged of statements about ‘significant’ or ‘insignificant’ findings, substituting ‘strong evidence for’ or ‘no strong evidence for’ findings, and guides authors to a readable explanatory paper (Sterne J, Davey Smith G. Sifting the evidence. What’s wrong with significance tests. BMJ. Jan 27, 2001; 322(7280): 226–231). But relying on a single number still makes life easier and appeals to many, including many editors and reviewers. The Nature article may help to stop this malpractice, and the author (Regina Nuzzo) is not shy about it. She states that P-values have been likened to mosquitoes, the emperor’s new clothes and the tool of a ‘sterile intellectual rake’. She even puts forward the rechristening to ‘statistical hypothesis inference testing’ because the acronym for that spells SHIT.

P-values may be justified in some cases, as in the trials for which Fisher developed the concept, especially if a novel treatment is being examined for the first time. But, as stated in Nuzzo’s paper, P-values were never meant to be used the way they are being used today.

Nature does recognize that Rothman, when he started the journal Epidemiology back in 1989, did his best to discourage P-values in its pages, and that Allen Wilcox has followed that practice.

A Bayesian approach to data analysis is also a way of avoiding some of the mistakes generated by P-values. It is unreasonable to start with a null hypothesis when existing evidence points toward a non-null association.

Richard Royall states in the paper that there are three questions a reader should ask after a study: 1) what is the evidence? 2) what should I believe? and 3) what should I do? A P-value will not help much in answering any of these questions and could often lead to serious misunderstandings and misguided actions.

But do read the Nature paper and use it in your teaching, particularly if you are prone to ‘statistical hypothesis inference testing’.

—Jørn Olsen, Shah Ebrahim, Neil Pearce, Cesar Victora

