## P Value – Is this Gold Standard for Statistical Validity Really Valid? – Philosophy of Healing

Majid Ali, M.D.

When Wisdom Is Sacrificed at the Altar of False Science

The P value is considered the gold standard for establishing the statistical validity of data presented in scientific papers. From the 1970s through the 1990s, we used P values to establish the statistical significance of our data published in prestigious journals, including JAMA (The Journal of the American Medical Association), Lancet, American Heart Journal, American Journal of Clinical Pathology, and others (with me as the lead author). During those years, I often wondered why we had to bother calculating P values when the data spoke elegantly for themselves. However, the reviewers and editors had to be satisfied, and P values did just that.

**Lies, Damn Lies, and Statistics**

I do not know if Samuel Clemens (Mark Twain) knew about P values when he weighed in on the subject of statistics. But if he did, the Old Man Mark was clearly on the mark. The P value has been stripped of its exalted stature, though it continues to be widely used to sell drugs of dubious value.

In the 1920s, the British statistician Ronald Fisher introduced the P value as a preliminary way to see if evidence was significant in the commonsense way, not as a definitive test for statistical validity (as it is now considered). Fisher’s approach was to run an experiment and see if the results were consistent with what random chance might produce. The assumption that chance alone is at work is now called the null hypothesis. Next came playing the devil’s advocate: assuming the null hypothesis to be true, one calculates the probability of getting results at least as extreme as those actually observed. That probability is the P value.
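Fisher's procedure can be illustrated with a small sketch. The coin-toss scenario, the function name `binomial_p_value`, and the numbers used are illustrative assumptions, not part of the Nature article. The null hypothesis here is that a coin is fair, and the P value is the probability, under that assumption, of a head count at least as far from the expected count as the one observed.

```python
from math import comb


def binomial_p_value(heads: int, tosses: int) -> float:
    """Two-sided P value for a head count, assuming the coin is fair.

    The null hypothesis is that heads and tails are equally likely.
    The result is the probability of a head count at least as far
    from tosses/2 as the observed one, if the null hypothesis holds.
    """
    expected = tosses / 2
    observed_deviation = abs(heads - expected)
    # Count the equally likely toss sequences whose head count is
    # at least as extreme as the observed one.
    extreme_sequences = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - expected) >= observed_deviation
    )
    return extreme_sequences / 2 ** tosses


if __name__ == "__main__":
    # Hypothetical experiment: 60 heads in 100 tosses.
    print(f"P value: {binomial_p_value(60, 100):.4f}")
```

Note what the number does and does not say: it is the chance of data this extreme if the coin is fair, not the chance that the coin is fair.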

**The ‘Gold Standard’ of Statistical Validity Not Valid Anymore**

On February 12, 2014, the journal Nature published a searing indictment of P value methods that must be read by all doctors who claim to practice evidence-based medicine. Nature, most scientists would agree, is the most highly esteemed science journal in the world. Please brace yourself as you read the following quotes from Nature’s article entitled “Statistical Errors – P values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume.”

* In 2005, epidemiologist John Ioannidis of Stanford University in California suggested that most published findings are false; since then, a string of high-profile replication problems has forced scientists to rethink how they evaluate results.

* “P values are not doing their job, because they can’t,” says Stephen Ziliak, an economist at Roosevelt University in Chicago, Illinois, and a frequent critic of the way statistics are used.

* “Change your statistical philosophy and all of a sudden different things become important,” says Steven Goodman, a physician and statistician at Stanford. “Then ‘laws’ handed down from God are no longer handed down from God. They’re actually handed down to us by ourselves, through the methodology we adopt.”

* “What does it [P value] all mean? One result is an abundance of confusion about what the P value means.”

* “Critics also bemoan the way that P values can encourage muddled thinking.”

* “The P value was never meant to be used the way it’s used today.”

* “Perhaps the worst fallacy is the kind of self-deception for which psychologist Uri Simonsohn of the University of Pennsylvania and his colleagues have popularized the term P-hacking; it is also known as data-dredging, snooping, fishing, significance-chasing and double-dipping. “P-hacking,” says Simonsohn, “is trying multiple things until you get the desired result” — even unconsciously.”

* “there are three questions a scientist might want to ask after a study: ‘What is the evidence?’ ‘What should I believe?’ and ‘What should I do?’ One method cannot answer all these questions, Goodman says: “The numbers are where the scientific discussion should start, not end.””

* “These are sticky concepts, but some statisticians have tried to provide general rule-of-thumb conversions (see ‘Probable cause’). According to one widely used calculation, a P value of 0.01 corresponds to a false-alarm probability of at least 11%, depending on the underlying probability that there is a true effect; a P value of 0.05 raises that chance to at least 29%. So Motyl’s finding had a greater than one in ten chance of being a false alarm. Likewise, the probability of replicating his original result was not 99%, as most would assume, but something closer to 73% — or only 50%, if he wanted another ‘very significant’ result. In other words, his inability to replicate the result was about as surprising as if he had called heads on a coin toss and it had come up tails.”
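The 11% and 29% figures in the last quote can be reproduced with a short calculation. The sketch below assumes the conversion is the minimum Bayes factor bound, -e · p · ln(p), usually credited to Sellke, Bayarri, and Berger, combined with 50:50 prior odds that a true effect exists. That this is the exact calculation the Nature article cites is an assumption, and the function name `min_false_alarm_probability` is hypothetical.

```python
from math import e, log


def min_false_alarm_probability(p_value: float,
                                prior_prob_effect: float = 0.5) -> float:
    """Lower bound on the probability that a 'significant' result is a false alarm.

    Assumes the bound -e * p * ln(p) on the Bayes factor in favor of
    the null hypothesis, which applies only for p below 1/e.
    prior_prob_effect is the assumed probability, before the study,
    that a true effect exists (0.5 means a toss-up).
    """
    if not 0 < p_value < 1 / e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    if not 0 < prior_prob_effect < 1:
        raise ValueError("prior_prob_effect must lie strictly between 0 and 1")
    # Smallest possible Bayes factor for the null against the alternative.
    min_bayes_factor = -e * p_value * log(p_value)
    prior_odds_null = (1 - prior_prob_effect) / prior_prob_effect
    posterior_odds_null = min_bayes_factor * prior_odds_null
    # Convert odds back to a probability.
    return posterior_odds_null / (1 + posterior_odds_null)


if __name__ == "__main__":
    for p in (0.01, 0.05):
        print(f"P = {p}: false-alarm probability of at least "
              f"{min_false_alarm_probability(p):.0%}")
```

Lowering `prior_prob_effect`, as one would for a long-shot hypothesis, raises the false-alarm probability further, which is the sense of “depending on the underlying probability that there is a true effect” in the quote.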

**Evidence-Based Medicine and Toxicities of Foods, Environments, and Thought**

Next time you hear about claims of “evidence-based medicine” by doctors who limit their work to prescribing drugs, please do not engage them in a philosophical discourse. Simply suggest that they read this article or the Nature article cited here.

