Research integrity, and why bad science in biomedicine and agriculture has become such a problem

Science depends on corroboration: researchers verify others’ results, often making incremental advances as they do so. The nature of science dictates that no research paper is ever considered to be the final word, but increasingly, there are too many whose results are not reproducible. Explanations include the complexity of experimental systems, misunderstanding (and often, misuse) of statistics, pressures on researchers to publish, and the proliferation of shoddy pay-to-play “predatory” journals.

In 2011 and 2012, two articles rocked the scientific world. One reported the attempt to reproduce the results of 53 preclinical research papers that were considered “landmark” studies. The scientific findings were confirmed in only six (11%) of them. Astonishingly, even the researchers making the claims could not replicate their own work.

The second article found that claims made using observational data could not be replicated in randomized clinical trials (which is why the latter are known as the “gold standard”). Overall, 52 claims were tested and none replicated in the expected direction, although most had very strong statistical support in the original papers.

Subsequently, there has been more evidence of a crisis in scientific research: In a survey of ~1500 scientists, 90% said there were major or minor problems with the replication of experiments.

More recently, in 2015, 270 co-investigators published the results of their systematic attempt to replicate work reported in 98 original papers from three psychology journals, to see how their results would compare. According to the replicators’ qualitative assessments, only 39 of the 100 replication attempts were successful.

Around the same time, a multinational group attempted to replicate 21 systematically selected experimental studies in the social sciences published in the journals Nature and Science between 2010 and 2015. They found “a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size.”

These failure rates for reports in prominent journals are astonishing — and worrisome, because false claims can become canonized.

Of course, technical problems with laboratory experiments (contamination of cell lines or reagents; unreliable equipment; the difficulty of doing a complex, multi-step experiment the same way, time after time; etc.) are one explanation, but another is statistical sleight-of-hand. One technique for that is called p-hacking: Scientists try one statistical or data manipulation after another until they get a small p-value that qualifies as “statistical significance,” although the finding is the result of chance, not reality.
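To see why trying one analysis after another is so effective at manufacturing “significance,” consider a minimal sketch in Python. It assumes that no real effect exists, so each p-value is uniformly distributed between 0 and 1, and that the analyses are independent of one another. Analyses run on the same data are usually correlated, so the numbers are illustrative only, and the function names are hypothetical.

```python
import random


def chance_of_false_positive(num_analyses, alpha=0.05):
    """Probability that at least one of `num_analyses` independent tests
    falls below `alpha` when there is no real effect."""
    return 1 - (1 - alpha) ** num_analyses


def simulate_p_hacking(num_analyses, num_studies=10_000, alpha=0.05, seed=0):
    """Fraction of simulated no-effect studies that can report p < alpha
    when only the smallest of `num_analyses` p-values is kept."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        # Assumption: under the null hypothesis a p-value is uniform on [0, 1].
        best_p = min(rng.random() for _ in range(num_analyses))
        if best_p < alpha:
            hits += 1
    return hits / num_studies


for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 3), simulate_p_hacking(k))
```

Under these assumptions, a single analysis yields a false positive about 5% of the time, whereas a researcher who tries 20 analyses and reports the best one will find “significance” in roughly two studies out of three.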

Australian researchers examined all the publicly available literature and found evidence that p-hacking was common in almost every scientific field. Peer review and editorial oversight are inadequate to ensure that articles in scientific publications represent reality instead of statistical chicanery. Another problem is that competing scientists often do not retest questions, or if they do, they don’t make known their failure to replicate, so there are significant lacunae, or gaps, in the published literature, which, of course, mostly comes from universities and is funded by taxpayers.

Many claims appearing in the literature do, of course, replicate, but even those may not be reliable. Many claims in the psychology literature, for example, are only “indirectly” replicated. If X is true, then Y, a consequence, should also be true. Often Y is accepted as correct, but it turns out that neither X nor Y replicates when tested anew.

Understandably, editors and referees are biased against papers that report negative results; they greatly prefer positive, statistically significant results. Researchers know this and often don’t even submit negative results, the so-called “file drawer effect.” Once enough nominally positive, confirmatory papers appear, the claim becomes canonized, making it even more difficult to publish an article that reports a contrary result.
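The file drawer effect can be illustrated with a small simulation. The sketch below is hypothetical: it assumes the true effect is exactly zero, that every study has the same standard error, and that only studies with a positive, statistically significant estimate (z greater than 1.96) are published.

```python
import random
import statistics


def simulate_file_drawer(num_studies=5000, std_error=1.0, seed=1):
    """Simulate studies of a nonexistent effect and return all estimates
    plus the subset that would be published."""
    rng = random.Random(seed)
    # The true effect is zero, so each estimate is pure sampling noise.
    estimates = [rng.gauss(0.0, std_error) for _ in range(num_studies)]
    # Assumed publication rule: only positive, "significant" results appear.
    published = [e for e in estimates if e / std_error > 1.96]
    return estimates, published


all_studies, published = simulate_file_drawer()
print("mean of all studies:", round(statistics.mean(all_studies), 3))
print("share published:", round(len(published) / len(all_studies), 3))
print("mean of published studies:", round(statistics.mean(published), 3))
```

Across all the simulated studies the average estimate is close to zero, as it should be. The small published fraction, however, shows a sizable positive effect, and a reader who sees only the journals has no way to know about the studies left in the drawer.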

The system thus perverts the method, the value of accumulated data, and the dogma of science. It makes us wonder whether scientists who practice statistical trickery fail to understand statistics, or whether they’re so confident of the correct outcome that they take shortcuts to get to it. If the latter, it would bring to mind the memorable observation about science by the late, great physicist and science communicator Richard Feynman, “The first principle is that you must not fool yourself – and you are the easiest person to fool.”

Part of the canonization process often involves a meta-analysis, which is defined as “a method for systematically combining pertinent qualitative and quantitative study data from several selected studies to develop a single conclusion that has greater statistical power [and that] is statistically stronger than the analysis of any single study, due to increased numbers of subjects, greater diversity among subjects, or accumulated effects and results.”

This is how it’s done: A computer search finds published articles that address a particular question, say, whether taking large amounts of vitamin C prevents colds. From those that are considered to be methodologically sound, the data are consolidated and carried over to the meta-analysis. If the weight of evidence, based on a very stylized analysis, favors the claim, it is determined to be real, or canonized.
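One common way of consolidating results is fixed-effect, inverse-variance pooling, in which each study’s estimate is weighted by its precision. The sketch below shows that technique only as an illustration; the article does not say which pooling method any particular meta-analysis used, and the study numbers and function names are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical effect estimates and standard errors from four studies.
estimates = [0.30, 0.10, -0.05, 0.20]
std_errors = [0.15, 0.10, 0.20, 0.25]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
print("pooled estimate:", round(pooled, 3))
print("pooled standard error:", round(pooled_se, 3))
print("two-sided p-value:", round(two_sided_p(pooled / pooled_se), 3))
```

The pooled standard error is smaller than that of any single study, which is the source of the “greater statistical power.” It is also why the method cannot rescue flawed inputs: if the individual estimates are biased, pooling merely produces a more precise version of the same bias.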

The problem is that there may not be safety in numbers because many of the individual base papers are very likely wrong, the result of p-hacking and publication bias. Potential p-hacking can be detected by creating a “p-curve,” i.e., ordering the p-values for the papers included in the meta-analysis from smallest to largest and plotting them against the “rank” (the integers 1, 2, 3, etc., up to the number of papers). The first figure below, for example, plots a meta-analysis in which there were 19 papers; in the second figure, the meta-analysis included 14 papers.


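Recall that a p-value is the probability of obtaining a result at least as extreme as the one observed if there were no real effect and chance alone were at work. The smaller the p-value, the harder the result is to attribute to chance; by convention, a p-value below 0.05 is called “statistically significant.”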

If the resulting p-curve looks like a hockey stick, with small p-values on the blade and larger p-values on the handle (as in the two figures below), there is a good case to be made for p-hacking.
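A p-curve of this kind is simple to construct. The following minimal sketch sorts a set of p-values, pairs each with its rank, and draws a crude text plot. The p-values are made up for illustration and are not the data behind the figures; the function names are hypothetical.

```python
def p_curve(p_values):
    """Return (rank, p_value) pairs with the p-values in ascending order."""
    return [(rank, p) for rank, p in enumerate(sorted(p_values), start=1)]


def count_significant(p_values, alpha=0.05):
    """Number of p-values below the conventional significance threshold."""
    return sum(1 for p in p_values if p < alpha)


def text_plot(points, width=40):
    """Render each (rank, p_value) pair as a bar whose length tracks p."""
    lines = []
    for rank, p in points:
        bar = "#" * round(p * width)
        lines.append(f"{rank:>2} {p:5.3f} {bar}")
    return "\n".join(lines)


# Hypothetical p-values from ten papers in an imagined meta-analysis.
reported = [0.62, 0.001, 0.45, 0.03, 0.88, 0.002, 0.27, 0.71, 0.15, 0.54]

points = p_curve(reported)
print(text_plot(points))
print("below 0.05:", count_significant(reported), "of", len(reported))
```

In this made-up example, a few p-values sit near zero while the rest climb steadily toward 1, which is the hockey-stick shape described above.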

The figures below are derived from meta-analyses of the supposedly beneficial effects of omega-3 fatty acids and of the alleged direct relationship between sulfur dioxide in the air and mortality, respectively. Both meta-analyses were presented in a major medical journal and claimed a positive effect. There are, indeed, several small p-values reported and, taken alone, they would indicate a real effect. But there are more p-values greater than 0.05, which indicate no effect. Both cannot be correct. Inasmuch as there are many more negative studies and p-hacking is the logical explanation for the presence of a small number of low p-values, the most likely conclusion is that there is no effect. Thus, the meta-analyses yield false-positive results.

These examples are all too common. The sad truth is that much of published science and the canonized claims resulting from it are likely wrong, and it is incumbent on the scientific community to find solutions. Without research integrity, we don’t know what we know.


Dr. S. Stanley Young is a statistician who has worked at pharmaceutical companies and the National Institute of Statistical Sciences on questions of applied statistics. He is an adjunct professor at several universities and a member of the EPA’s Science Advisory Board.

Henry I. Miller, a physician and molecular biologist, is a Senior Fellow at the Pacific Research Institute in San Francisco. He was the founding director of the FDA’s Office of Biotechnology. Follow him on Twitter @henryimiller
