
The NIH Takes On Reproducibility

Here’s more on the problems with non-reproducible results in the literature (see here for previous blog entries on this topic). Various reports over the last few years indicate that about half of the attention-getting papers can’t actually be replicated by other research groups, and the NIH seems to be getting worried about that:

The growing problem is threatening the reputation of the US National Institutes of Health (NIH) based in Bethesda, Maryland, which funds many of the studies in question. Senior NIH officials are now considering adding requirements to grant applications to make experimental validations routine for certain types of science, such as the foundational work that leads to costly clinical trials. As the NIH pursues such top-down changes, one company is taking a bottom-up approach, targeting scientists directly to see if they are willing to verify their experiments. . .
. . .Last year, the NIH convened two workshops that examined the issue of reproducibility, and last October, the agency’s leaders and others published a call for higher standards in the reporting of animal studies in grant applications and journal publications. At a minimum, they wrote, studies should report on whether and how animals were randomized, whether investigators were blind to the treatment, how sample sizes were estimated and how data were handled.

The article says that the NIH is considering adding some sort of independent verification step for some studies – those that point towards clinical trials or new modes of treatment, most likely. Tying funding (or renewed funding) to that seems to make some people happy, and others, well:

The very idea of a validation requirement makes some scientists queasy. “It’s a disaster,” says Peter Sorger, a systems biologist at Harvard Medical School in Boston, Massachusetts. He says that frontier science often relies on ideas, tools and protocols that do not exist in run-of-the-mill labs, let alone in companies that have been contracted to perform verification. “It is unbelievably difficult to reproduce cutting-edge science,” he says.
But others say that independent validation is a must to counteract the pressure to publish positive results and the lack of incentives to publish negative ones. Iorns doubts that tougher reporting requirements will make any real impact, and thinks that it would be better to have regular validations of results, either through random audits or selecting the highest-profile papers.

I understand the point that Sorger is trying to make. Some of this stuff really is extremely tricky, even when it’s real. But at some point, reproducibility has to be a feature of any new scientific discovery. Otherwise, well, we throw it aside, right? And I appreciate that there’s often a lot of grunt work involved in getting some finicky, evanescent result to actually appear on command, but that’s work that has to be done by someone before a discovery has value.
For new drug ideas, especially, those duties have traditionally landed on the biopharma companies themselves – you’ll note that the majority of reports about trouble with reproducing papers come from inside the industry. And it’s a lot of work to bring these things along to the point where they can hit their marks every time, biologically and chemically. Academic labs don’t spend too much time trying to replicate each other’s studies; they’re too busy working on their own things. When a new technique catches on, it spreads from lab to lab, but target-type discoveries, the kind that lead to a potential human therapy, often end up in the hands of those of us who are hoping to be able to eventually sell it. We have a big interest in making sure they work.
Here’s some of the grunt work that I was talking about:

On 30 July, Science Exchange launched a programme with a reagent supplier based in Aachen, Germany, to independently validate research antibodies. These are used, for example, to probe gene function in biomedical experiments, but their effects are notoriously variable. “Having a third party validate every batch would be a fabulous thing,” says Peter Park, a computational biologist at Harvard Medical School. He notes that the consortium behind ENCODE — a project aimed at identifying all the functional elements in the human genome — tested more than 200 antibodies targeting modifications to proteins called histones and found that more than 25% failed to target the advertised modification.

I have no trouble believing that. Checking antibodies, at least, is relatively straightforward, but that’s because they’re merely tools to find the things that point towards the things that might be new therapies. It’s a good place to start, though. Note that in this case, too, there are commercial considerations at work, which do help to focus things and move them along. They’re not the magic answer to everything, but market forces sure do have their place.
The big question, at all these levels, is who’s going to do the follow-up work and who’s going to pay for it. It’s a question of incentives: venture capital firms want to be sure that they’re launching a company whose big idea is real. The NIH wants to be sure that they’re funding things that actually work and advance the state of knowledge. Drug companies want to be sure that the new ideas they want to work on are actually based in reality. From what I can see, the misalignment comes in the academic labs. It’s not that researchers are indifferent to whether their new discoveries are real, of course – it’s just that by the time all that’s worked out, they may have moved on to something else, and it might all just get filed away as Just One Of Those Things. You know, cutting-edge science is hard to reproduce, just like that guy from Harvard was saying a few paragraphs ago.
So it would help, I think, to have some rewards for producing work that turned out to be solid enough to be replicated. That might slow down the rush to publish a little bit, to everyone’s benefit.

28 comments on “The NIH Takes On Reproducibility”

  1. Pete says:

    Reproducibility was a major theme at the computer-aided drug design Gordon conference last month. Ensuring that data is openly shared is essential and for CADD it would also help to make software available.
    When $$ and IP are involved, there may be advantages in taking the risk that a study may prove irreproducible rather than waiting for the validation study.

  2. qetzal says:

    Let’s not forget John Ioannidis’ argument that most published research findings are false. If his analysis is correct, that’s even more reason we need to verify new, cutting edge results. And we should not be surprised when most of them fail that verification. Not necessarily because the original authors did anything wrong, but for the unavoidable reasons that Ioannidis describes.

  3. Jumbo says:

    I worked in pharma licensing and ‘wet’ due diligence was central to the process for one group I worked with. Our metrics were that two-thirds of data we had shared with us could not be fully reproduced. Sometimes it was a potency/tox issue, but a fair amount of the time (I reckon 50% of the failed studies, or one third of total) the results couldn’t be repeated. We were looking at both academic and startup projects, and both suffered from this. We never thought people were intentionally trying to mislead. Instead it was more a projection-type problem. They so wanted something to work, they discounted the data that didn’t support their hypothesis. As licensors, we could afford rose-colored glasses.

  4. Teddy Z says:

    Why is reproducibility so controversial? When I was in grad school, I would get an interesting new result and take it to my boss. Glancing up from his desk, he would say, “That’s nice. Does it reproduce?” I understand that with some very big studies, especially with animals and cells and such, that is very hard. Tough. Extraordinary results demand extraordinary proof, and sometimes that proof means n > 2.

  5. Hap says:

    The very idea of a validation requirement makes some scientists queasy. “It’s a disaster”, says Peter Sorger, a systems biologist at Harvard Medical School in Boston, Massachusetts. He says that frontier science often relies on ideas, tools and protocols that do not exist in run-of-the-mill labs, let alone in companies that have been contracted to perform verification. “It is unbelievably difficult to reproduce cutting-edge science”, he says.

    But isn’t that the point of the game? A neat observation probably isn’t worth a glamour mag article – only things that have been seen again and again. If you can’t explain how to reproduce something, how do you know that you aren’t seeing a side effect of your fancy method or apparatus rather than a real phenomenon? To make things that can stand the test of time or at least that won’t blow down in the first breeze of spring?
    I can see this making an already competitive funding environment worse, but since the point of research grants is to make useful results that can be used to make knowledge, things, and jobs and not to make pretty pictures, CVs, and future retractions, I don’t see that much of a problem.

  6. Stuart says:

    There are probably many cases where the results are wrong, but not intentionally fraudulent. As Richard Feynman said in his talk about “Cargo Cult” science:
    “The first principle is that you must not fool yourself–and you are the easiest person to fool. So you have to be very careful about that. After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.”

  7. Chrispy says:

    We really need a journal of negative results.
    It is hard to publish negative data — often it really isn’t a complete story. Most scientists working for any amount of time have had the experience of not being able to reproduce something in the literature. But usually the inclination is just to walk away from those results.

  8. JoJo says:

    They should have two types of journals: those that require results to be reproduced, and those that do not. Over time, the reliability of results from either journal type will factor into “impact”. Organic Syntheses does the job in chemistry. Is there a similar journal in the biological arena?

  9. DLIB says:

    Requirement that in new grant applications, only papers whose results have been replicated can be put in your BIO section??

  10. Anonymous says:

    “The very idea of a validation requirement makes some scientists queasy. “It’s a disaster”, says Peter Sorger, a systems biologist at Harvard Medical School in Boston, Massachusetts. He says that frontier science often relies on ideas, tools and protocols that do not exist in run-of-the-mill labs, let alone in companies that have been contracted to perform verification. “It is unbelievably difficult to reproduce cutting-edge science”, he says.”
    I understand that some tools will be unavailable in “run-of-the-mill” labs, but it is fundamentally your job to communicate the ideas and protocols. To be honest, I’m suspicious of anyone that is hesitant to support second-party verification requirements.

  11. NJBiologist says:

    @5 Hap, 9 Anonymous–Amen. Too many cutting edge (but not reproducible) findings become not cutting edge (but still not reproducible) after, say, five years. During those five years, the originator lab gets the cutting edge reputation; other labs get to waste time and money. Heads, the originator wins; tails, everyone else loses.

  12. Jim D says:

    Doesn’t basic statistical reasoning dictate that, given a large pool of studies and an accepted P value of ~0.01, around 1 in 100 will be false results? Up to 1 in 20 if the P value threshold is 0.05?
    And this presumes that everything else, e.g., DOE, data collection, analytical techniques, etc., are all on the up & up.

  13. RM says:

    Anonymous@9 “To be honest, I’m suspicious of anyone that is hesitant to support second-party verification requirements.”
    I honestly doubt any scientist is against second-party verification. What they’re against is
    1) Requirements that *I’ll* be forced to spend time and effort being the second-party for someone else’s verification.
    2) Requirements that I sit on a publication because I can’t find a willing second party to do the verification, or because the second party I could find keeps fumbling the results.
    3) Requirements that make me pay out of my shoestring budget for a second party verification, either to a commercial lab or to another academic lab.
    Anonymous@9, it’s very easy to say “this is what *you* should do”, but it’s very different when you have to say “this is what *I’m* willing to do to support it”. (If you pay attention to the industry grumbles on the topic, you’ll notice “It’s a travesty that the literature isn’t very accurate”, often takes a backseat to “We wasted time and money on this!”)

  14. David says:

    As someone who has worked for Pete Sorger’s collaborators, I can understand part of what he’s describing. At one point, we were working with 11-color flow cytometry data[1], and there was only one lab in the world that could produce that for us.

  15. Hap says:

    The problem with that argument is that authors want the credit for having found robust and useful results, but apparently don’t want to have to do the work to assure that they are actually robust and reproducible. Instead, they want someone else to pay with their time and money for the original authors not having done their job in the first place.
    If most authors had done good work in the first place, so that we didn’t have to guess which big results were real, or talk about failure rates of >25%, we wouldn’t be talking about reproducibility requirements in the first place. People are tired of having to take care of the misbegotten spawn from the “publish and run” philosophy. If researchers aren’t willing to think hard enough about their work before submitting it, well, then this is what happens.
    Journal articles aren’t supposed to be a medium for enhancing profs’ CVs, or for telling people how you wish the world were; they’re an honest try at understanding how it is. If people aren’t willing to try honestly to understand the world, then perhaps they should find something more appropriate to do with their talents, like telemarketing.

  16. Anon says:

    “It is unbelievably difficult to reproduce cutting-edge science,” he says.
    So how do I know “cutting-edge science” isn’t analogous to basketball trick shots? I don’t know you aren’t taking 20 shots and cutting the rest out. I want to see your blooper reel (which is impossible to regulate in science) or see it replicated.

  17. qetzal says:

    @Jim D (#11):
    No, because that doesn’t factor in the prior probability that the thing you’re trying to show is true.
    To illustrate, suppose your general investigational approach to testing a hypothesis is sufficiently rigorous to achieve an overall false positive rate of 5%, and an overall false negative rate of 0%. Now apply that approach to 1000 different hypotheses, of which 20 are actually true, and 980 are actually false. Your approach will correctly confirm the 20 that are actually true, but it will incorrectly tell you that 5% of the 980 are also true. That’s an additional 49 ‘confirmed’ hypotheses that are actually false – more than 70% of the total ‘confirmed’ hypotheses.
    Note that this applies even assuming no bias whatsoever. You might argue that the rate of true hypotheses among all those tested is higher than the 2% in this example, which would indeed reduce the magnitude of the problem. But in the real world, we do have to contend with publication bias, investigator bias, etc. Moreover, cutting edge results will certainly have a lower prior probability by their very nature.
    Ioannidis’ work on this subject takes time to grasp. (At least it did for me.) But once you understand it, it’s not so easy to ignore.
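qetzal’s worked example can be checked with a few lines of arithmetic. The sketch below uses a hypothetical helper (the name `false_share_of_confirmed` and its parameters are illustrative, not taken from Ioannidis’ paper). It reproduces the 20-true/980-false case and then varies the share of true hypotheses, assuming the same 5% false positive and 0% false negative rates throughout.

```python
def false_share_of_confirmed(n_true, n_false, false_pos_rate, false_neg_rate):
    """Fraction of 'confirmed' hypotheses that are actually false.

    Hypothetical helper for illustration; rates are fractions (0.05 means 5%).
    """
    confirmed_true = n_true * (1 - false_neg_rate)  # real effects that get confirmed
    confirmed_false = n_false * false_pos_rate      # false hypotheses that slip through
    return confirmed_false / (confirmed_true + confirmed_false)


# The comment's example: 1000 hypotheses, 20 true, 5% false positives, 0% false negatives.
print(f"{false_share_of_confirmed(20, 980, 0.05, 0.0):.1%}")

# Same error rates, but an assumed larger share of true hypotheses among the 1000 tested.
for n_true in (20, 100, 500):
    share = false_share_of_confirmed(n_true, 1000 - n_true, 0.05, 0.0)
    print(f"{n_true / 10:.0f}% true hypotheses -> {share:.1%} of confirmations are false")
```

With 2% of hypotheses true, 49 of the 69 confirmations (about 71%) are false, matching the comment. Under the same error rates the share drops to roughly 31% and 5% as the assumed prior rises to 10% and 50%, which is the point about prior probability.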

  18. Anon says:

    I’d also like to follow this up with: How many people will be wanting to do this?
    1. Job will by definition not have any originality.
    2. Job will require ability and knowledge of latest experimental techniques
    3. Job will likely pay little because of 1 and 2.
    From an economic/social behavior perspective, you are going to have some low-paid individuals who lack drive (were not able to “make it”).
    This sort of spells disaster… Unless the NIH has some extra money lying around for people to sacrifice their careers?

  19. PUI Prof says:

    @14 David
    There is still a way to reproduce data when only one lab can do it: give the notebook to student number two and have them go through the experiment. Isn’t the reproducibility of science the thing that used to set science above snake oil? And how many of the retractions due to fraud that we read about were discovered when someone from the same lab couldn’t reproduce the findings?

  20. Cellbio says:

    @13 RM,
    As someone who has looked at many technologies and done “validation” work (and seen >50% failure to repeat), I can confirm that indeed most academic scientists I’ve interacted with are quite hesitant to accept the message that their cure-for-death is bogus. I think it is safe to extend this observation to mean there is resistance to validation on a fundamental level. In the abstract, no one may be “against” it, whatever that means in detail, but they are not incentivized to correct the story (or get it right in the first place) or stop the publication, press, and fame that get them recognition, grants, speaker money, etc.
    The problem is the system. It rewards high profile ah-hah findings as both academics and publishers compete to be first to press with no meaningful down-side to being wrong.
    As an industry person, I’m not sure what you mean by the “grumbles” comment, as I think the two are linked, different sides of the same coin. In industry, almost all I do will fail, and wasting time is a pain, but it comes with the job. For me, the issue is not my time wasted, but a larger one: the transfer of scientific cachet from strong scientists to snake-oil salesmen with scientific training, with the accompanying transfer of public funding to these sellers of hope. The bigger issue is the failing of our ivory towers. Do they meet, with acceptable standards, the goals of educating the public well, being accessible to the broad population, and contributing to society through technological advancement? I think they still do, but I don’t like the direction things appear to be heading. Defending publication without validation by calling validation a disaster leaves me scratching my head.

  21. pgwu says:

    #12 and #17. I got another reference on the topic from the class I am taking: Sterne JA and Smith GD. Sifting through the evidence: what’s wrong with significance tests? BMJ 2001; 322: 226-31. It puts the chance of a false positive at about 50% when p is set to 0.05. One caveat is that many of those studies were looking for something small and were relying on statistics to differentiate. Chance or random errors dominate in those cases. I think that systematic errors dominate in biology work: wrong or inactive reagents, impurities, limited materials, and so on, on the wet lab side, and misplaced data, cherry-picked data, or too much Photoshop art on the analysis side.
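A false-positive share near 50% sounds at odds with a 0.05 threshold until statistical power is taken into account. A minimal sketch, assuming for illustration only (these are not presented as the paper’s stated inputs) that 10% of tested hypotheses are true and that a real effect reaches significance only half the time, i.e. 50% power. The function name is hypothetical.

```python
def false_share_of_significant(prior_true, power, alpha):
    """Expected fraction of results with p < alpha that come from false hypotheses.

    prior_true: assumed fraction of tested hypotheses that are actually true
    power: assumed probability that a real effect reaches p < alpha
    alpha: significance threshold
    """
    true_hits = prior_true * power          # real effects detected
    false_hits = (1 - prior_true) * alpha   # null effects crossing the threshold by chance
    return false_hits / (true_hits + false_hits)


# Illustrative assumptions only: 10% of hypotheses true, 50% power.
for alpha in (0.05, 0.01):
    frac = false_share_of_significant(0.10, 0.50, alpha)
    print(f"alpha = {alpha}: {frac:.0%} of 'significant' findings are false")
```

Under these assumed inputs, nearly half of the results passing p < 0.05 are false, and tightening the threshold to 0.01 helps without removing the problem. None of this captures the systematic errors listed above, which no significance threshold can catch.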

  22. Dr. Manhattan says:

    “It’s a question of incentives: venture capital firms want to be sure that they’re launching a company whose big idea is real.”

  23. annonie says:

    22. Dr. Manhattan: Regarding Sirtris, GSK scientists at RTP in NC were not able to reproduce the reported data, and also noted that resveratrol was not selective for the presumed target(s). They did their proper due diligence, and were correct. Based on their results, at a presentation to the company’s new, naive, and academically biased research management, the internal researchers recommended that the data did not warrant a deal by GSK. They got it right. But the recently appointed senior managers felt that work done inside a commercial company must be inferior to that done by the more creative biotechs, particularly if led in part by academics. So the deal was done anyway, despite the inability of the internal scientists to support the reported claims.
    So, validation is indeed important, but WHAT and WHO one wants to believe can rule the day.
    And it seems the same GSK leaders learned little from the Sirtris experience, considering their current issues in China.

  24. lzhi says:

    Last year’s Higgs boson is an illuminating case. Yeah, lots of money, 6000 scientists, and huge government bureaucracy, but they would have been laughed out of the room if they’d come in with 3 sigma results, let alone two. Isn’t it the peers in the peer review process who *should be* best able to decide what level of confidence a paper needs, based on complete disclosure to them of the data, methods, and analysis? Isn’t the failure due not simply to over-eager PhD candidates, or an inadequate supply of MS students, but rather to an old-boys network which operates on a wink-and-a-nod approach: I’ll scratch your back if you scratch mine? That is: the problem is deep and systematic. To me, it seems very analogous to the banking/mortgage crisis of 2008. It was the senior executives who were willing to allow greed to overwhelm the system. I do not believe they didn’t know better; I do believe they were too weak to demand that “certain standards” be met. Industry has gone to a cookbook approach: ISO standards that define procedures. These standards replace vague and incompetent management with by-the-numbers approaches. I know that one counterargument is: how do we standardize it? If we knew what we were doing, then it wouldn’t be research! It seems to me that Bayesian methods would often work, and that such evaluations could be appended to the work as a quality control. I think some of this is familiar: surgeons and fighter pilots both were (and are) dismissive of the checklist approach to their jobs. And yet, many studies show that blind, stupid algorithms often have statistically better success rates than the experts in the “art”. I suggest laboratory experimentation is not so cutting edge that error analysis cannot be done, and done much better than we do now.
    First, establish a database of prior probabilities. Confidence limits will be wide/low unless sufficient documented, replicated data exist to narrow them. It’s still going to come down to who you trust, and how much, but it might be a step in the right direction.
    What process did the investigators follow? What documentation do they have to confirm that? What calibrations? What documentation of those calibrations? Once it’s routine, it will be only a minor annoyance, if it’s feasible at all.
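For readers more used to p-values, the sigma levels mentioned above can be translated with the tail of a standard normal distribution. A small sketch, assuming a one-sided test (the usual convention when particle physicists quote discovery significance, where 5 sigma is the customary bar); the helper name is illustrative.

```python
import math


def one_sided_p_value(sigma):
    """Probability that a standard normal variable exceeds `sigma` (upper tail only)."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))


for sigma in (2, 3, 5):
    print(f"{sigma} sigma -> p = {one_sided_p_value(sigma):.2e}")
```

This gives roughly 0.023 for 2 sigma, 0.0013 for 3 sigma, and 2.9 × 10⁻⁷ for 5 sigma, so even the 2 sigma level that would be laughed out of a physics talk already clears the p < 0.05 bar that much biology treats as sufficient.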

  25. petros says:

    Some of the worst examples can be the reporting of what appear to be useful (to pharma) in vivo models that can be used to assess potential drug candidates.
    The asthma field is littered with models that proved to be highly unreproducible.

  26. Jonathan says:

    @Chrispy –
    There *are* already journals of negative results:

  27. NJBiologist says:

    @13 RM, @18 anon: your points make sense. Unfortunately, they are also a justification for the behavior that has brought us here.

  28. Morten G says:

    We don’t need journals of negative results; we need the journals that published the cutting-edge bogus papers in the first place to accept articles that refute the result and link them up on their websites. What’s the incentive to write up your article about how you don’t see anything like what was reported? Well, if it was in Science and you get a Science Replication paper out of it? There are already multiple sections in most journals, especially the high-impact ones.
