
Of Mice (Studies) and Men

Here’s an article from Science on the problems with mouse models of disease.

For years, researchers, pharmaceutical companies, drug regulators, and even the general public have lamented how rarely therapies that cure animals do much of anything for humans. Much attention has focused on whether mice with different diseases accurately reflect what happens in sick people. But Dirnagl and some others suggest there’s another equally acute problem. Many animal studies are poorly done, they say, and if conducted with greater rigor they’d be a much more reliable predictor of human biology.

The problem is that the rigor of animal studies varies widely. There are, of course, plenty of well-thought-out, well-controlled ones. But there are also a lot of studies with sample sizes that are far too small, that are poorly randomized, unblinded, etc. As the article mentions (just to give one example), sticking your gloved hand into the cage and pulling out the first mouse you can grab is not an appropriate randomization technique. They aren’t lottery balls – although some of the badly run studies might as well have used those instead.
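For contrast with the grab-the-nearest-mouse approach, here is a minimal sketch of what a proper allocation could look like: shuffle the animal IDs with a random number generator and deal them into groups. The function name, the group labels, the mouse IDs, and the seed are all illustrative assumptions of mine, not anything taken from the Science article.

```python
import random


def randomize_to_groups(animal_ids, groups=("treatment", "control"), seed=None):
    """Shuffle the animals, then deal them into groups round-robin.

    Hypothetical helper for illustration only. Recording the seed lets the
    allocation be reproduced and audited later.
    """
    rng = random.Random(seed)
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)

    allocation = {group: [] for group in groups}
    for index, animal in enumerate(shuffled):
        # Round-robin dealing keeps group sizes balanced (within one animal).
        allocation[groups[index % len(groups)]].append(animal)
    return allocation


# Twelve made-up mouse IDs, split into two groups of six.
mice = [f"mouse-{n:02d}" for n in range(1, 13)]
allocation = randomize_to_groups(mice, seed=2013)
for group, members in allocation.items():
    print(group, sorted(members))
```

The point is that no one's hand (or a mouse's willingness to be caught) decides which group an animal lands in. Blinding would be a separate step, for example by having someone else hold the key that maps IDs to groups.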

After lots of agitating and conversation within the National Institutes of Health (NIH), in the summer of 2012 [Shai] Silberberg and some allies went outside it, convening a workshop in downtown Washington, D.C. Among the attendees were journal editors, whom he considers critical to raising standards of animal research. “Initially there was a lot of finger-pointing,” he says. “The editors are responsible, the reviewers are responsible, funding agencies are responsible. At the end of the day we said, ‘Look, it’s everyone’s responsibility, can we agree on some core set of issues that need to be reported’ ” in animal research?
In the months since then, there’s been measurable progress. The scrutiny of animal studies is one piece of an NIH effort to improve openness and reproducibility in all the science it funds. Several institutes are beginning to pilot new approaches to grant review. For an application based on animal results, this might mean requiring that the previous work describe whether blinding, randomization, and calculations about sample size were considered to minimize the risk of bias. . .
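To give a feel for the "calculations about sample size" mentioned there, here is a rough sketch using the standard normal-approximation formula for comparing two group means. The function name and the default values (two-sided alpha of 0.05, 80% power) are my own assumptions for illustration. A real study would want a statistician and probably a t-based calculation, which gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def animals_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate animals needed per group for a two-group comparison.

    effect_size is the expected difference between group means divided by
    the standard deviation (Cohen's d). Uses the normal approximation
    n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2, rounded up.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(animals_per_group(1.0))  # a very large effect
print(animals_per_group(0.5))  # a moderate effect
```

Under these assumptions, a very large effect (one full standard deviation) needs about 16 animals per group, and a moderate one (half a standard deviation) needs about 63. Studies with a handful of mice per arm can only reliably detect enormous effects.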

Not everyone thinks that these new rules are going to work, though, or are even the right way to approach the problem:

Some in the field consider such requirements uncalled for. “I am not pessimistic enough to believe that the entire scientific community is obfuscating results, or that there’s a systematic bias,” says Joseph Bass, who studies mouse models of obesity and diabetes at Northwestern University in Chicago, Illinois. Although Bass agrees that mouse studies often aren’t reproducible—a problem he takes seriously—he believes that’s not primarily because of statistics. Rather, he suggests the reasons vary by field, even by experiment. For example, results in Bass’s area, metabolism, can be affected by temperature, to which animals are acutely sensitive. They can also be skewed if a genetic manipulation causes a side effect late in life, and researchers try to use older mice to replicate an effect observed in young animals. Applying blanket requirements across all of animal research, he argues, isn’t realistic.

I think, though, that there must be some minimum requirements that could be usefully set, even with every field having its own peculiarities. After all, the same variables that Bass mentions above – which are most certainly real ones – could affect studies in completely different fields. This, of course, is one of the biggest reasons that drug companies restrict access to their animal facilities. There’s always a separate system to open those doors, and if you don’t have the card to do it, you’re not supposed to be in there. Pace the animal rights activists, that’s not because it’s so terrible in there that the rest of us wouldn’t be able to take it. It’s because they don’t want anyone coming in there and turning on lights, slamming doors, sneezing, or doing any of four dozen less obvious things that could screw up the data. This stuff is expensive, and it can be ruined quite easily. It’s like waiting for a four-week-long soufflé to rise.
That brings up another question – how do the animal studies done in industry compare to those done in academia? The Science article mentions some work done recently by Lisa Bero of UCSF. She was looking at animal studies on the effects of statins, and found, actually, that industry-sponsored research was less likely to find that the drug under investigation was beneficial. The explanation she advanced is a perfectly good one: if your animal study is going to lead you to spend the big money in the clinic, you want to be quite sure that you can believe the data. That’s not to say that there aren’t animal studies in the drug industry that could be (or could have been) run better. It’s just that there are, perhaps, more incentives to make sure that the answer is right, rather than just being interesting and publishable.
Doesn’t the same reasoning apply to human studies? It certainly should. The main complicating factor I can think of is that once a company, particularly a smaller one, has made the big leap into human clinical trials, it also has an incentive to find something that’s good enough to keep going with, and/or good enough to attract more investment. So perverse incentives are, I’d guess, more of a problem once you get to human trials, because it’s such a make-or-break situation. People are probably more willing to get the bad news from an animal study and just groan and say “Oh well, let’s try something else”. Saying that after an unsuccessful Phase II trial is something else again, and takes a bit more sang-froid than most of us have available. (And, in fact, Bero’s previous work on human trials of statins seems to show various forms of bias at work, although publication bias is surely not the least of them).

37 comments on “Of Mice (Studies) and Men”

  1. Boghog says:

    Another thing to watch out for: variations in gut flora can have a big impact on the outcome of an animal study. See for example PMID 21262117.

  2. anon1 says:

    In my experience, rarely are outbred animals used in animal models, sometimes/often by necessity (e.g., knockouts, Brattleboro rats, etc.) but also because inbred lines give “cleaner, more reproducible” results. If you have to stack the deck so much for your model to work in the lab, it probably won’t work in the clinic.
    Also, it seems to be the exception rather than the rule that the scientists/technicians scoring the model are blinded to drug vs. control.

  3. Anonymous says:

    As a scientist, I find the backlash against certain reporting requirements disturbing to say the least. People seem to think that such requirements mean that you must adhere to a certain protocol, rather than a requirement to have a methods section with sufficient detail. It’s hard to reproduce conditions that you do not know. This must have something to do with the difficulty in repeating published work…

  4. Anonymous says:

    Could the overreliance on animal models be attributed to what you described in your previous post on Lipinski’s anchor?
    If it were up to me I would get rid of them all except for basic tox.

  5. Charlie Kilian says:

    “industry-sponsored research was less likely to find that the drug under investigation was beneficial”
    That also tracks with the view that academia is morphing into a model where it is selling its intellectual property. It makes a certain amount of sense that if academics are under pressure to produce more intellectual property, they would be less likely to set up the more rigorous trials. The incentives just aren’t there. If the university can make money licensing research that produced results at p = .05, then what incentive do they have to improve?
    Disclaimer: I am neither in the industry, nor in academia. Just an interested outside observer. I could be way off base.

  6. Hap says:

    How do you know that something is due to species- and topic-dependent variability and not sloppiness in performing the studies if you haven’t tried to eliminate the methodological sloppiness as a factor?

  7. Erebus says:

    @4: That’s a totally inane comment. The early stages of drug discovery have no replacement for animal models. Clearly, some are better and some are worse, and we should certainly endeavor to work towards more relevant, more robust, more reproducible models… but we certainly can’t do away with animal studies entirely. There’s a lot to learn from them. Knockout mice are very useful tools, as well.

  8. Jack Shaftoe says:

    Another facet on top of animal studies not done with sufficient rigor (a serious problem – I agree), is the misrepresentation of pharmacological results from animal studies.
    When labs claim validation of their target from treating their animals with a compound and seeing a result, too often I find that they have not accounted for the PK of the compound at all (they have no idea how much of the compound was present in the poor animal) nor the selectivity of the compound.
    PK is important. If you dose an aldehyde-containing compound with a half-life of 1 h only once in a 12-day study, I am not at all surprised that it gives a different result than a compound similarly dosed that has a half-life of 15 days, but I do not buy the stated conclusions that the results say anything at all about the targets the compounds are hitting.
    As examples of overselling the selectivity of a molecule, I continue to see that massive doses of valproic acid seem to prove that HDACx was involved in driving the phenotype of a model when the selectivity of that compound is suspect at best. Once I saw resveratrol used as an antagonist of AhR.
    Please pay attention to the properties of the small molecule tools that are used.

  9. Jim says:

    @#6 Hap: I don’t even think it’s an issue of sloppiness, just a matter of reproducibility. As a behavioral pharmacologist, I have seen many “standard” assays run in fairly different ways. I don’t think one way is necessarily “right”, but if I don’t understand how two assays differ, I cannot interpret data that are not equal (or similar, or close, or whatever).
    Clearly, there is lots of sloppy research that is done out there – the standards should be a step towards improving that across the board and it wouldn’t be prohibitive to implement them.

  10. Anonymous says:

    Why use animal models at all, when molecular dynamics simulations will tell you everything you need to know? In other words, just flip a coin, or consult the planets.

  11. Anon says:

    I think we should all be reading this as
    “work done recently by [the technicians that are willing to work] for Lisa Bero of UCSF”
    IMO, this is the same story we always hear. Reproducibility is an issue. We can always trace this back to the low pay, no career development, no benefits, borderline abused individuals that give results they are expected to give to their PI.
    But that is never addressed and hand-waved away.

  12. Virgil says:

    What the post misses, is that for many people (particularly outside the US), animal studies are difficult and expensive in the same way that clinical trials are.
    Sure, if you’re a physiologist in a big US University it’s easy to say “make the knockout mouse already”. Such people typically have no problem bemoaning their lack of access to clinical samples and the difficulties in translating their results to humans. They would do well to realize their privileged position and respect that those further down the ladder may be equally perplexed at the difficulties in translating INTO the whole animal, from cell culture.
    There are a TON of scientists working with (and publishing on) cell culture or even simpler in-vitro biological systems. For them, doing a mouse study is a major undertaking. We’re talking about people with degrees in very basic sciences such as biochemistry, who often have no training whatsoever in whole animal physiology. For them, “translational” means doing it in any in-vivo model. It requires dealing with the IACUC and other such regulatory entities. For a basic in-vitro scientist (e.g. cell biologist), this can be just as scary as the IRB is to a large mammalian physiologist. When you run a lab with all cell lines and engineered proteins and expression constructs, moving everything to a whole animal model is expensive, and not something you do on a whim.
    So yes, animals don’t often translate to humans, but let’s not forget there’s another massive obstacle further back down the chain, where cell culture results don’t translate into a whole animal system. How many things kill cancer cells in a dish? How many of those have been proven in mouse tumor models?

  13. Hap says:

    Sloppiness was the wrong word to use, because I assumed that there was one right way to do the studies that wasn’t being followed, when there’s not.
    It would still help for people to know how much variability is present after accounting for the basic methods before assuming that the variability is inherent in the nature of the study and the phenomenon.

  14. MILFshake says:

    Another compelling argument made in this paper:
    http://www.pnas.org/content/early/2013/10/28/1313476110.abstract
    by Valen Johnson recently is that our standards for significance are much too lax. He advocates P

  15. NJBiologist says:

    @5 Charlie Kilian: “If the university can make money licensing research that produced results at p = .05, then what incentive do they have to improve?”
    I think you’re on the right track, but you’re underestimating the scope of the issue. Grants and tenure come from publications; publications come from positive results. The result is a massive bias for positive results.

  16. Reverend J says:

    Whoa, wait a minute…are you telling me that mice aren’t people?
    *Mind blown*

  17. ScientistSailor says:

    Lennie said, “I might jus’ as well go away. George ain’t gonna let me tend no rabbits now.”

  18. RKN says:

    “Whoa, wait a minute…are you telling me that mice aren’t people?
    *Mind blown*”

    It’s worse — mice held in captivity. The argument we hear so often is, “Surely we can learn something from all these mouse experiments.”
    No doubt, just very little in the way of translatable human biology. If you ask me the goal is (or ought to be) translation to humans, not merely reproducibility of the experiments.
    It’s like proposing to study human sexual psychology and your cohort is a prison population.
    And if coming into the animal room and turning on lights, slamming doors, or sneezing is all it takes to foul an experiment, that doesn’t say much for the robustness of any result, and even less for translatability. Can you say over-fitted?

  19. Cellbio says:

    Don’t agree, Erebus. For the reasons mentioned above, and because most models are highly dynamic induced states of exaggerated physiology rather than a disease process, there is not a lot to learn about human disease, only something to learn about the pressure points of the constructed dynamic process. Is an adjuvant-induced arthritis model anything more than an immunization model? No, not much more. Has curing mice of tumors helped pick winners? No. How about asthma models? Nope. On and on.
    If successful model treatments fail to predict clinical success, why would we assume failure in the model (non-trivial failures) means we should not go forward into humans? If the mechanism is supported by human biology or genetics, then to hell with the animal models.

  20. Lyle Langley says:

    @#8, Jack Shaftoe..
    Couldn’t agree more. I can’t tell you how many grants I’ve reviewed that do not have any PK data attached to their “stunning” in vivo efficacy slides. This isn’t isolated to academia either; I’ve reviewed many a grant for a major foundation, and industry (large and small) is just as much at fault. It is such an issue that even before I start my critique the scientific officers know what my first criticism is going to be. It is especially painful in the CNS arena. People look at no PK, or simply at plasma levels with no other correlating piece of information, and take that as proof it works in the brain.

  21. Jack Shaftoe says:

    @#20 – Lyle – Please keep holding the grant writers accountable. PD without PK just doesn’t mean much of anything.
    Is it that PK analysis is not widely available?

  22. newnickname says:

    VERY interesting problem but I haven’t read the cited articles yet. 1. Do they say anything about animals and tox studies? 2. In the Pipeline had a story about animals and immunology on February 13, 2013. “Mouse Models of Inflammation Are Basically Worthless. Now We Know.” (Add a hotlink?) 3. I often bring up Gerald B Dermer who made a case about how worthless and misleading cancer screening in cell culture can be in “The Immortal Cell” (Avery Press, 1995). 4. @18 RKN: “It’s like proposing to study human sexual psychology and your cohort is a prison population.” I volunteer to be a non-captive control as long as I get to choose my personal cohort.

  23. Erebus says:

    @19: I believe that what you’re suggesting would absolutely paralyze pharmaceutical research. We don’t need to test everything which clears tox on humans (which cannot be done; it is impractical in the extreme,) what we need is better models. There’s lots of room for improvement in those models, the community realizes that we’ve got a problem with a lot of ’em, and it’s an issue that’s being worked on.
    …Obviously the xenograft rodent tumor model is garbage, and I don’t suppose that the OVA-induced asthma models are much better, but that doesn’t mean that all animal models of disease are therefore worthless. They simply need to be improved for relevance towards human disease, and standards need to be set for future reproducibility.

  24. Hap says:

    If none of the animal models work, why do we use them? I am assuming that people’s lack of desire to admit they don’t know something and institutional inertia are significant causes, but not sufficient or exclusive ones.
    If we don’t get any useful data from animals (other than brute toxicity), then couldn’t we just wave a dead chicken over a drug candidate and get the same success rate a lot more cheaply?

  25. bacillus says:

    What this article fails to address is why rodents have become the overwhelming models of choice for life scientists. I contend that it has been done in part to assuage the moderate animal welfare organizations, and to save money rather than because rodent models are superior to all other alternatives. Therefore, I think it is more important to determine on a disease by disease basis what animal models are most likely to mimic the human disease and its response to treatment. Scientifically, if it turns out that dogs, cats, pigs, or monkeys are more appropriate for developing therapies against many human diseases, is the scientific community prepared for the consequences? I don’t think so since it was our own capitulation and convenience that got us into this mess in the first place. I shudder to think that this law of unintended consequences has meant that many millions of mice and rats have been squandered needlessly when exponentially fewer “higher” mammals could have led to better translational odds.
    FTR. I posted this up for comment at Science 5 hours ago, so I assume it was binned there.

  26. Anonymous says:

    Sigh, drug discovery was so much easier when we could just inject random stuff into slaves and prisoners. Damn human rights folks!

  27. Anonymous says:

    “why rodents have become the overwhelming models of choice for life scientists?”
    Time.
    Because they breed, grow and die a lot faster than dogs, cats, pigs, or monkeys.

  28. Hap says:

    The whole point is to avoid testing in people first, without knowing as much as you can about what you’ll get – it’s expensive, and people are unpredictable. Animals are supposed to act like us, so we’d rather use them than people, but if they aren’t, then we need something else – human cell lines or artificial organs or something that acts like us. I don’t think anyone wants to carpet-bomb candidates into people, because then you won’t have anyone to test your good candidates on.
    Besides, slaves and prisoners probably don’t look physiologically like the populations they would be used to represent, so that testing in them wouldn’t be helpful (assuming helpful doesn’t involve yet another 8th Amendment impalement)

  29. Jim says:

    Yes, animal models need improvement, and at least in my field have made significant improvements over the past decade. Clinical trials also need improvement, but that’s another topic. For anyone who says they’re worthless, the questions I have are 1.) how else do you select your lead candidate from the 10-100 NMEs that have met all criteria in your screening tree? and 2.) can you tell me of any drugs that have been shown to work in the clinic that failed in animal models? I know that seems counterintuitive, but there are lots of drugs that were approved for indication A and later were found to be effective for treating indication B, and then were also shown to be effective in animal models. (Take for example, gabapentin.) In the end, there may be some false positives with animal models, but if there aren’t false negatives, their use is still appropriate.

  30. Anonymous says:

    “Animals are supposed to act like us, so we’d rather use them than people … slaves and prisoners probably don’t look physiologically like the populations they would be used to represent, so that testing in them wouldn’t be helpful”.
    Are you saying that animals are more similar to human patients than slaves and prisoners are?

  31. Anonymous says:

    “can you tell me of any drugs that have been shown to work in the clinic that failed in animal models?”
    That’s a dumb question, given that nobody would ever try to test any compound in the clinic if it failed in animals.

  32. Hap says:

    No, but if you’re skipping to testing in people, then the people you’re testing had better look like the people you want to buy your drugs, otherwise you’re wasting lots of time and money and people’s lives doing something that doesn’t help you.

  33. Cellbio says:

    The criteria for testing agents in clinical trials would not just be that a compound clears tox. There always has to be significant data supporting potential benefit to warrant testing in humans. Those data can, and should, and from a regulatory sense must include testing in animals, just not disease models. One can show a drug has a biochemical or pathway impact after dosing and associated with blood or tissue levels distinct from adverse events. Whether this pharmacological impact “cures” disease in animals is irrelevant to moving forward in humans.

  34. hmmmmm.... says:

    @#31 SSRIs are used to treat anxiety disorders in the clinic but don’t work in most animal models of anxiety.
    Another point: have the ARRIVE guidelines (see http://www.nc3rs.org.uk/page.asp?id=1357) fallen into a black hole in the US? This whole area has been debated, published on and signed up for by a broad range of journals, and now the NIH wants to do it again? I’ve not read the Science article (paywall), but if the author has not mentioned ARRIVE then it will be a shoddy, poorly researched piece of work.
    Yes, these problems need to be highlighted and discussed but don’t ignore the fact that this was all being seriously discussed and acted upon 3 years ago.

  35. Pete Kissinger says:

    I very much like what Jack Shaftoe has to say here. It is difficult to generalize across the landscape of diseases and toxic reactions to xenobiotics. Animals are excellent for many macroscopic features, but are less reliable on quantitative details. One area I have worked on for 20 years is the question of how to get a sample from a mouse or rat or pig or monkey without the stress response to sampling dramatically distorting the observables in the sample. This effect is large and often ignored because it cost more than traditional manual ways that were once “the only way” but are not today.
    We get complete PK data in a single mouse with no human in sight – this was impossible prior to about 2005.

  36. NJBiologist says:

    @35 hmmmmmmmmmmm….: I’ve run some SSRIs in stress-induced hypothermia and marble burying; they worked beautifully. Better than benzos, in fact.
