
Mouse Models of Inflammation: Wrong Or Not?

I wrote here about a study that suggested that mice are a poor model for human inflammation. That paper created quite a splash – many research groups had experienced problems in this area, and this work seemed to offer a compelling reason for why that should happen.
Well, let the arguing commence, because now there’s another paper out (also in PNAS) that analyzes the same data set and comes to the opposite conclusion. The authors of this new paper are specifically looking at the genes whose expression changed the most in both mice and humans, and they report a very high correlation. (The previous paper looked at the mouse homologs of human genes, among other things).
I’m not enough of a genomics person to say immediately who’s correct here. Could they both be right: most gene and pathway changes are different in human and mouse inflammation, but the ones that change the most are mostly the same? But there’s something a bit weird in this new paper: the authors report P values that are so vanishingly small that I have trouble believing them. How about ten to the minus thirty-fifth? Have you ever in your life heard of such a thing? In whole-animal biology, yet? That alone makes me wonder what’s going on. Seeing a P-value the size of Planck’s constant just seems wrong, somehow.

28 comments on “Mouse Models of Inflammation: Wrong Or Not?”

  1. Chad Orzel says:

    I’ve seen a paper in laser physics that reported a violation of a classical prediction by 100 standard deviations. That’s so big that Mathematica couldn’t convert it to a p-value…
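For a rough sense of the scale Chad is describing (one-sided, normal approximation, nothing to do with the actual laser paper): a 100-sigma deviation corresponds to a p-value far below anything a double-precision number can hold, which is presumably why a direct conversion fails. A minimal sketch:

```python
# What does a 100-sigma result look like as a p-value?
# A direct conversion underflows; working in log space avoids that.
from scipy.stats import norm
import numpy as np

print(norm.sf(100))                      # 0.0 -- underflows in double precision
log10_p = norm.logsf(100) / np.log(10)   # log10 of the one-sided p-value
print(f"p ~ 10^{log10_p:.0f}")           # roughly 10^-2174
```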

  2. RegularReader says:

    “My apple is better than your orange.”
    Derek, you (on 02/17/14) and others have already commented on the over-use, misuse, and abuse of p-values. Perhaps this reflects an educational deficit–how many life science graduate programs require their students to take applied statistics courses?
    Hopefully, advances in primary cell culture, patient-derived xenografts, and organs/humans-on-a-chip will address many issues with current whole animal models.
    Quoting a former professor:
    “Where applicable, the best analog of your compound is the enantiomer.”

  3. Anonymous says:

    #2 RR: You attribute this statement to your prof: “Where applicable, the best analog of your compound is the enantiomer.” I would suggest that the “where applicable” criterion must be pretty wide open, since most enantiomers are likely to have activity of their own, and not necessarily in a positive or easily detectable fashion in early preclinical testing (thalidomide being the classic example). I for one would be very careful about the context before applying that principle.

  4. a. nonymaus says:

    Of course your correlations improve when you suppress all the data points that are near an axis, which is what the authors did. Basically, they have established that things that are correlated are correlated. On the other hand, if there are a lot of other things that are not correlated, those things will lead to a bad model.

  5. Oblarg says:

    @2
    It most certainly is an educational deficit. It’s the same reason you can’t trust any published results in nutrition – the people writing them have no fundamental understanding of statistics or experimental design.
    Math education in general is in tatters, and our science is worse for it.

  6. kjk says:

    Most papers just do p

  7. Oblarg says:

    @4
    That’s not strictly true – one can easily generate pathological data sets which have almost no data near an axis yet have zero correlation.
    It’s worth noting that correlation coefficients are, to some degree, wonky and misleading whenever you’re working with any system that isn’t described by a simple linear relation.

  8. Biff says:

    I’ve done a lot of gene expression analyses over the years. From the beginning, I’ve been impressed with how wildly different the results can be when comparing rat data with mouse data (collected in the same set of experiments by the same set of hands). Obviously a rat is not a mouse and vice versa, but the difficulty of extrapolating between closely related rodents suggests the need for real caution when extrapolating between rodents and humans. Well, at least some humans.

  9. Barry says:

    Upjohn spent a few hundred million dollars (sounded like a lot of money at the time) on a small-molecule blocker of degranulation. Worked impressively in mouse models of inflammation. Showed zero efficacy in human disease, except for one family group in Finland.

  10. Argon says:

    @8 Biff
    Hmmm. Sequencing papers suggest the house mouse / Norway rat split was about 12-24 mya and that the human and mouse lineages split about 75 mya. So if the mouse/rat comparison experiments are a bit dodgy, multiply that by 3 or so for comparisons with humans…?

  11. Harrison says:

    @9
    That’s more a criticism of where the hypothesis for the model comes from than of the animal itself. I would argue that some of the failed Alzheimer’s drugs may work in early-onset Swedish Alzheimer’s patients. At least this hypothesis is being tested by the DIAN trials.

  12. luysii says:

    Hopefully the inflammation work in mice is more applicable to man than similar work on stroke in animals, which was an unmitigated disaster: no treatment showing efficacy in animals was of any use when tried in humans.
    Even 24+ years ago, it was noted that of 25 different compounds with proven efficacy for treating focal and global brain ischemia (based on articles published in refereed journals over the preceding 10 years), NONE had worked in clinical trials in man [Stroke vol. 21 pp. 1-3 ’90].

  13. Cellbio says:

    Inflammation models are really pretty close in my opinion. That is, close to inflammation in humans. The problem is that most auto-immune models are essentially inflammation models (acute stimulation of immune cell migration and activation) and therefore many actives in these models do not address pathophysiology of more complex auto-immune disease. They work for mechanistic components which can then be tested for therapeutic benefit in humans.

  14. RKN says:

    Haven’t read either paper, but it has been my experience working with commercial pathway/network software that the p-values associated with this or that discovered pathway/network/disease map are often extraordinarily small, as Derek mentioned. If the paper in question used commercial pathway software, the authors may have just reported the p-values the software returned and never questioned them. You can get some fantastically small p-values when you evaluate significance on a hypergeometric distribution, which at least one software package I used did.
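A minimal sketch of the kind of hypergeometric enrichment calculation RKN describes, with made-up but plausible numbers (none of these figures come from either paper):

```python
# Pathway-enrichment style test on a hypergeometric distribution.
# All counts below are hypothetical, for illustration only.
from scipy.stats import hypergeom

total_genes   = 20000   # genes on the array
pathway_genes = 200     # genes annotated to the pathway
de_genes      = 1000    # differentially expressed genes
overlap       = 60      # DE genes that land in the pathway

# Expected overlap under the null is 1000 * 200 / 20000 = 10,
# so an observed overlap of 60 yields a fantastically small p-value.
p = hypergeom.sf(overlap - 1, total_genes, pathway_genes, de_genes)
print(f"enrichment p-value: {p:.2e}")   # far below 1e-20
```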

  15. Anonymous says:

    Look in the supplemental. The most extreme p values are e**-323!
    I smell a rat, no, a mouse, well certainly not a human.
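One possible reading of that number: 1e-323 sits at the very bottom of what a double-precision float can represent, so p-values like this may simply mean the software's calculation underflowed rather than that anything was measured to that precision. A quick check:

```python
# 1e-323 is right at the floor of double precision.
import sys

print(sys.float_info.min)   # ~2.2e-308, smallest normal double
print(5e-324)               # smallest subnormal double
print(1e-323)               # still representable...
print(1e-324)               # ...but this rounds to 0.0
```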

  16. matt says:

    #2 RR writes:
    “Hopefully, advances in primary cell culture, patient-derived xenografts, and organs/humans-on-a-chip will address many issues with current whole animal models.”
    How?

  17. pete says:

    I briefly looked at the 1st paper & don’t have immediate access to the 2nd so I’m in no position to comment on the data.
    Still, I’m stunned that there could be such a radical disconnect in microarray data interpretation. After all, there’s a common focus on immune system transcripts in particular. And both papers were sponsored for submission by heavy-hitters in immunology, so you’d think that the data handling by both groups would be sound & thorough.
    How could this be ??
    Is the analysis of gene transcript data really so treacherous ? Should we all go back to good old Northern Blots ?

  18. Anonymous says:

    “look in the supplemental. The most extreme p values are e**-323!”
    If p values are supposed to be the probability of obtaining a test result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true, then wouldn’t they have had to test at least 10^323 samples to get such a value?

  19. captaingraphene says:

    People seem to assume that the authors have a sufficient grasp of statistics (and of mathematics in general), which in my experience is nowhere near warranted. A big name in, say, immunology unfortunately does not automatically equal a ‘big time’ understanding of statistical principles. Even a ‘star’ PI may make quantitative mistake after mistake, yet still get the paper published, solely because of his/her name and reputation.

  20. newnickname says:

    @2. RegularReader, Quoting a former professor:
    “Where applicable, the best analog of your compound is the enantiomer.”: That concept is usually attributed (Matthew Effect) to RB Woodward IN THE CONTEXT OF ORGANIC SYNTHESIS. In those days, there were frequent large-scale resolutions to obtain chiral materials. What is the best way to test and optimize reaction conditions? On the precious compound with the natural configuration, or on the 50% “waste” compound with identical chemical properties (other than chirality)?

  21. Neo says:

    The real problem is not that p-values are small or that many leading chemists/biologists do not have a sufficient understanding of statistics.
    The real problem is that p-values are often an answer to a different and/or far easier problem of little practical significance.

  22. PUI Prof says:

    @10 Argon
    Interesting diversion, and I have not thought through the calculation. But my gut says a factor of 2^3 to 10^3 would be more realistic (probably closer to 2 than 10).

  23. clinicaltrialist says:

    This second paper is deeply flawed. As #4 said, they select only the genes that are correlated and run a p value. Duh!
    The goal of a model is to mimic the human conditions/diseases. If you filter out most of the genes that are upregulated in inflammation in humans, as these authors did, the p values mean nothing.
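A toy simulation of the selection effect that #4 and #23 describe, with entirely made-up numbers: if only a minority of genes respond concordantly in both species, then keeping only the genes that change strongly in both largely restricts the comparison to that concordant minority, and the reported correlation rises accordingly.

```python
# Toy illustration of selection bias in cross-species correlation.
# Gene counts and effect sizes are invented for the example.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_shared, n_indep = 500, 9500          # hypothetical gene counts

# Genes with a genuinely shared response (plus species-specific noise)
shared = rng.normal(0, 3, n_shared)
human_shared = shared + rng.normal(0, 1, n_shared)
mouse_shared = shared + rng.normal(0, 1, n_shared)

# Genes whose responses are unrelated between species
human_indep = rng.normal(0, 1, n_indep)
mouse_indep = rng.normal(0, 1, n_indep)

human = np.concatenate([human_shared, human_indep])
mouse = np.concatenate([mouse_shared, mouse_indep])

r_all, _ = pearsonr(human, mouse)

# Keep only genes that change strongly in BOTH species
mask = (np.abs(human) > 2) & (np.abs(mouse) > 2)
r_filtered, _ = pearsonr(human[mask], mouse[mask])

print(f"all genes:      r = {r_all:.2f}")       # modest correlation
print(f"filtered genes: r = {r_filtered:.2f}")  # much higher
```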

  24. captaingraphene says:

    @clinicaltrialist brings up a good point.
    How in the world did nobody in the peer review process question these things?

  25. anonymous says:

    Any time you see someone use the notation “p equals x”, you know they lack some fundamental statistics background. “p is less than” is the proper notation. “p equals x” is so very wrong when you understand the basics of p-values.
    I just glanced at each article and saw that cringe-worthy mistake all over the second article. I’m no expert in the data so their conclusions could still be correct, but they are not boosting confidence there.
    (I’m not using the actual equals and less-than symbols because I think it messes up the html here.)

  26. @18 says:

    For some perspective, that number of experiments would exceed the number of protons and electrons in the observable universe!
    http://en.wikipedia.org/wiki/Eddington_number

  27. Dana says:

    Regarding the small p-values, keep in mind that a comparison like a t-test is a signal-to-noise measure, and you can get very low p-values when the variability of the data is very low. For instance, try doing a t-test in Excel on two populations with values (1, 1, 1) and (2, 2.0001, 2); you will get a p-value on the order of 7E-18. So I have commonly seen quite low p-values, often as an artifact of low variance, which may be spurious (chance). With the many comparisons you tend to do in genomics/omics in general, that sort of thing can happen, but it may not be important.
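Dana's Excel example is easy to reproduce; a minimal check with scipy rather than Excel:

```python
# Near-zero variance drives the p-value down, whatever the biological meaning.
from scipy.stats import ttest_ind

a = [1, 1, 1]
b = [2, 2.0001, 2]
t, p = ttest_ind(a, b)
print(f"t = {t:.0f}, p = {p:.1e}")   # p is on the order of 7e-18
```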
