
Biological News

The Microbiome and Human Obesity: Wait a Minute

For the last few years, it has been impossible to escape talk of the microbiome – the associated bacteria (and other organisms) that live in and on the human body. Overall, this attention has been a good thing, since it’s made people aware of just how bacteria-laden we are (not that everyone finds that a comfortable subject). And in some cases, particularly Irritable Bowel Syndrome, evidence is accumulating that microflora are a big part of the disease state, and might be a big part of treatment as well. It’s been known for a long time that antibiotic treatment can disrupt the gut biota, of course, with a particular case being C. difficile colitis. But it’s quite plausible that less dramatic changes in bacterial populations could also have noticeable effects.

Proving that, though, and unraveling those effects, is something else again. One of the most-referenced possibilities is a connection between gut flora and obesity, and there have been a number of studies suggesting one. Now, though, a meta-analysis of all this work is enough to make a person wonder. In mouse models, connections have been reported between obese states and the ratios of different bacterial populations, or between obesity and the alpha diversity of the gut microbiome in general. But here are the authors of this new review of the data:

. . .We performed an extensive literature review of the existing studies on the microbiome and obesity and performed a meta-analysis of the studies that remained on the basis of our inclusion and exclusion criteria. By statistically pooling the data from 10 studies, we observed significant, but small, relationships between richness, evenness, and diversity and obesity status, as well as the RR of being obese based on these metrics. We also generated random forest machine learning models trained on each data set and tested on the remaining data sets. This analysis demonstrated that the ability to reliably classify individuals as obese solely on the basis of the composition of their microbiome was limited. Finally, we assessed the ability of each study to detect defined differences in alpha diversity and observed that most studies lacked the power to detect modest effect sizes. Considering that these data sets are among the largest published, it appears that most human microbiome studies lack the power to detect differences in alpha diversity.

These are very good points, and they’re the sort of issues that come up in all areas of science. Does your study have the statistical power to safely draw the conclusions that you’re drawing? Too often, the answer is “No, not really”. In biopharma research, some of the worst offenders are too-small rodent studies, of which there have been a great many published over the years, but this issue goes all the way up to the design of human clinical trials (indeed, it’s perhaps the most crucial issue there). In the case of this microbiome work, the machine learning issue is also a complication, as the quote mentions.
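
To make the power question concrete, here is a back-of-the-envelope calculation (my own illustration, not the paper's analysis; the Cohen's d = 0.3 effect size and the 50-per-group figure are assumed round numbers) for a simple two-sample t-test using statsmodels:

```python
# Rough power calculation for detecting a modest effect size (Cohen's d = 0.3)
# between two groups with a two-sample t-test. Illustrative only; the paper's
# own power analysis concerns alpha-diversity metrics, not this exact test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Subjects per group needed for 80% power at alpha = 0.05
n_per_group = analysis.solve_power(effect_size=0.3, alpha=0.05, power=0.8)
print(f"subjects needed per group: {n_per_group:.0f}")  # roughly 175

# Conversely, the power you actually get with an assumed 50 subjects per group
power = analysis.solve_power(effect_size=0.3, alpha=0.05, nobs1=50)
print(f"power with 50 per group: {power:.2f}")  # well under the conventional 0.8
```

Detecting a "modest" effect at conventional thresholds takes on the order of 175 subjects per group; at typical study sizes the power falls to something closer to a coin flip, which is the authors' point.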

That illustration shows what happened when the authors went back to each study, took the stated machine-learning method from each, and applied it to the data from all the other studies. You will notice that some of the combinations flip between roughly 10% accuracy and 90% accuracy, and (from what I can see) every single study can show up as well below 50% or well above 50% accuracy, depending on whose model is used. So that statement above is certainly correct; the ability to classify individuals as obese by these methods of microbiome analysis is “limited”, and I think we can call that an example of academic understatement. I would have trouble distinguishing the overall effects shown by these re-analyses from random noise, honestly.
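
The pattern is easy to reproduce in a toy setting. The sketch below (synthetic data and all model choices are mine, not the authors') shows a random forest that classifies held-out samples from its own "study" well, but falls toward chance on a second study whose features carry a systematic shift:

```python
# Sketch of why cross-study prediction can fail: the "obesity" signal is the
# same in both synthetic studies, but study B's features carry a batch-effect
# shift that study A's model has never seen. Illustrative data, not the paper's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_study(n, shift):
    # Fake abundance-like features; `shift` stands in for study-specific
    # differences in sample handling, sequencing, and population.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 20))
    y = (X[:, 0] > shift).astype(int)  # the signal rides on feature 0
    return X, y

X_a, y_a = make_study(200, shift=0.0)
X_b, y_b = make_study(200, shift=1.5)

X_tr, X_te, y_tr, y_te = train_test_split(X_a, y_a, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

within = model.score(X_te, y_te)  # held-out samples from the same study
cross = model.score(X_b, y_b)     # the other study: accuracy drops sharply
print(f"within-study accuracy: {within:.2f}, cross-study accuracy: {cross:.2f}")
```

The point is not the particular numbers; it's that a model can validate cleanly within one dataset while learning nothing that transfers to another.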

This says something about machine-learning models as well as about the complexities of microbiome research. The authors tried, but “it was not possible to identify factors that predictably affected model performance”, and the conclusion that has to be drawn is that none of these models can be said to be any better or more useful than any of the others, and it would appear that none of them are very useful at all. This is something to keep in mind when you hear about the marvels that can be produced through machine learning. I actually have little doubt that marvels are possible through such routes, but “garbage in, garbage out” is a law of the universe that no one has ever been able to repeal. In this case, it’s more like “not enough data in, not enough conclusions out”, but that’s an unbreakable one, too.

It’s also true, though, that the data on microbiome/obesity connections in rodent models is a lot more robust than this, and it would appear that those studies are in conflict with what we’re seeing in humans. Mind you, we can’t do the sort of wholesale manipulations in people that are done in some of the mice, but you’d still expect to see more than this. The best hypotheses that the authors offer are that obesity signatures might vary a lot more from person to person than we realize, and that just looking at diversity and taxonomy through 16S rRNA sequences is not going to be enough to tease these things out. There might be a whole host of rather different microbiome populations that end up producing similar effects, and similar-looking ones that do completely different things (perhaps through metabolite formation), or it might all depend crucially on each person’s own immune fingerprint. At any rate, once again, It’s Not As Simple As You Would Have Hoped.


39 comments on “The Microbiome and Human Obesity: Wait a Minute”

  1. kjk says:

    We really need automation, as experiments are usually so labor-intensive that the sample size is driven too small. Transcriptic is automating wet-lab work, but animal work is harder. Maybe a system that can handle/manage the animals automatically and automatically take weights, etc, so the people only do a couple of experiment-specific steps manually?

    1. NJBiologist says:

      Animal handling and data production are not large costs. A reasonably competent scientist/tech/grad student should be able to weigh a rat, do a basic health check and weigh the food jar in a minute. That means two hours in a day gets them 120 rats’ worth of data at daily intervals. Fecal collections are another minute per rat at this experimental scale.

      Cage charges, on the other hand, are likely to be limiting. From the point of view of a young faculty member, having every tech and student running fully powered studies very quickly results in an uncomfortable amount of your grants handed over to the university (no, that’s not counted in overhead). An example: $1/rat/day (University of Kentucky, non-barrier) with five people in the lab maintaining 60 rats each gets you a bit over $100k/year in animal care costs.

      1. halbax says:

        That is why many labs work with mice. The cost is more like $0.30 each day for a cage of four mice.

        1. In vivo veritas says:

          Halbax, I’m not sure that you want coprophagic rodents group housed when you are studying the gut microbiome. Your 4 mice just went up to $1.20 a day. 🙂
          Plus, the reason we use mice over rats (when we do) is typically easier access to genetic manipulation.

    2. Zenboy99 says:

      “Maybe a system that can handle/manage the animals automatically and automatically take weights, etc, so the people only do a couple of experiment-specific steps manually?”

      There is a system like that out there; I’ve seen it.

      1. I’ve also seen such systems, but they’re hard to scale, as the costs increase exponentially

  2. SirWired says:

    I’m no professional scientist, but I took one look at that scatter plot (before even reading the whole article) and said to myself: “Okay, so that’s what it looks like when your data is saying ‘back to the drawing board.'”

    From what I’ve seen of the literature on the microbiome, the field does not appear to have advanced to any useful level except for the post-antibiotic C. diff use case; beyond that, it mostly serves as a selling point for dubious purveyors of various “natural” health products.

  3. I didn’t see any of the work of Justin Sonnenburg (Stanford, linked) in that meta-analysis. I find his anthropological approach to microbiology compelling: studying the microbiome of people eating indigenous diets versus Western diets, and identifying microbial species-specific enzymes and signals between the microbiome and innate immunity, seemingly mediated by propionate. That work also points to idiosyncratic factors in the co-evolution of the microbiome with the host, as some “indigenous bacteria” transplanted into non-indigenous hosts are not conserved. His book about diet is on my to-read shelf.

    I think the ability to absolutely classify GOOD/BAD as pertains to microbiome composition is not yet accessible, but should be soon. A friend of mine wrote a proposal for a microfluidic-based sorter for microbial sample characterization, and I know there are multiple companies interested in that sort of thing. But that is what needs to be addressed – the resolution. Bacteria (as well as Archaea and single-celled eukaryotes) vary widely in cell size and shape and that can introduce error in pooled samples of isolated DNA.

    Finally, and in keeping with your closing thoughts, there was an interesting article published last week (linked) about fecal transplantation of IBS-D microbiome samples into germ-free mice and tracking the systemic response, including innate-immune and behavioral consequences. We’re asking the right questions now; we just need better resolution on this fascinating area that, I believe, will provide a more nuanced view of many of the most difficult-to-tackle diseases in the purview of academia and pharma.

    1. PeptoidChemist says:

      Sorry, it was linked before I decided to switch to the IBS paper. Just google his lab and check the research page, lots of good research projects in this area.

  4. robz says:

    Bacteriophages present a very big complication.

  5. Me says:


    I’m liking the idea of the microbiome in epidemiology etc. very much, but it seems that every paper I read now contains a hand-wavy reference to ‘microbiota’ whenever it can’t explain something. Which seems to be, roughly paraphrased, the conclusion of the paper Derek is reviewing here?

  6. David Antonini says:

    Is it also possible that this is one of those cases where “mice are not humans” applies?

    1. John Wayne says:

      Still? Dammit.

    2. alphonse says:

      Or how about: “single genetic lines poorly represent wild populations”?

  7. Andy II says:

    “It’s also true, though, that the data on microbiome/obesity connections in rodent models is a lot more robust than this, and it would appear that these studies are in conflict with what we’re seeing in humans”

    The microbiome and its host are much more complicated than that. And microbiome analysis is still “associative evidence,” not the “true cause,” of the physiological status of the host. In a famous experiment, two groups of genetically identical germ-free mice developed lean and obese phenotypes after receiving gut microbiota from the lean and obese members of human twin pairs discordant for obesity (Science. 2013 Sep 6;341(6150):1241214. doi: 10.1126/science.1241214). The study demonstrated that the gut microbiomes of the two groups were clearly different. So what happened when the obese mice were treated with the gut microbiota from the lean twin?

    1. theSkeptic says:

      RE: “the data on microbiome/obesity connections in rodent models is a lot more robust than this”

      I don’t think that’s true. Most use very unusual approaches to normalizing weights between groups, and end up with differences of fractions of a percent based on normalized starting weight, percent fat, etc.

  8. DH says:

    I find the microbiome a fascinating area of study. But at our current state of understanding, it seems there are just too many damn variables to make sense of it all. These include:

    – all the possible strains of gut bacteria.
    – all the possible mixtures of these strains.
    – all the possible dietary effects on gut composition, including probiotics, prebiotics, macronutrient distribution, micronutrient distribution, and who knows what else.
    – the fact that gut composition can change from day to day such that a fecal sample at one point in time might or might not be typical (if “typical” can even be defined with so many variables).

    So I guess I’m not surprised that a machine learning model fails to be predictive.

  9. AC says:

    ““garbage in, garbage out” is a law of the universe that no one has ever been able to repeal”

    Pigs are the first exception that comes to mind. They have the amazing ability to convert garbage into pork.

    1. NJBiologist says:

      Also crabs and lobster, if you like seafood….

      1. oldnuke says:

        Pigs are mere amateurs compared to politicians, who can not only turn money into pork, but also turn pork into garbage.

  10. Luysii says:

    The fact that investigator A’s neural net works on his data but not on investigator B’s (and vice versa) points to a much larger problem with neural nets and machine learning — we don’t understand how they do what they do. We can’t point to a specific connection (or connections) and say ‘aha, that’s how it works’.

    Even when a neural net works spectacularly (AlphaGo), we still don’t know how it works — for details see

    It is very likely that even possessing a wiring diagram of the cerebral cortex would leave us in the same state — you can call this the neurological/information-theory uncertainty principle if you wish.

    1. Imaging guy says:

      Machine learning experts accept the fact that it is a black box in nature. But they claim that there is a difference between prediction and causation. According to them, machine learning is for prediction, which is in itself useful, and you don’t have to explain why or how it works. All you need is for the algorithm to predict something correctly. That is explained in an article called “Prediction Policy Problems” (PMID: 27199498), co-authored by Ziad Obermeyer. The 3 February 2017 issue of Science was a special issue dedicated to “Prediction,” and it contains articles (“Beyond prediction: Using big data for policy problems” and “Prediction and explanation in social systems”) explaining the difference between prediction and causation.

      1. luysii says:

        But how do investigators A and B figure out what went wrong when they tried to apply their neural net to the other’s data? Clearly something is wrong with both. This is an important problem with no obvious solution (to me at least). Could it be that the nets are overtrained on one data set?

        1. Imaging guy says:

          You are right. Machine learning is trying to solve what are called inverse problems, where you use observed data to build models/algorithms. Since there are many different ways to fit the data to the model, the problem is said to be “ill-posed.” They use separate training and validation data sets to address overfitting, but it seems that the problems persist. This is what Nobel laureate Sydney Brenner wrote: “the new science of Systems Biology claims to be able to solve the problem but I contend that this approach will fail because deducing models of function from the behaviour of a complex system is an inverse problem that is impossible to solve” (1). Of course, machine learning experts would disagree with that statement. This is what they believe: “ultimately, these researchers argue, the complex answers given by machine learning have to be part of science’s toolkit because the real world is complex: for phenomena such as the weather or the stock market, a reductionist, synthetic description might not even exist” (2). One physicist even said, “you would just throw data at this machine and it would come back with the laws of nature” (2). Let’s wait and see how this will turn out.
          1) “Sequences and consequences” PMID: 20008397
          2) “Can we open the black box of AI?” Nature, 2016 (PMID: 27708329)
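
The overtraining worry raised upthread is easy to demonstrate in miniature (a toy example of my own, unrelated to the actual studies): fit a random forest to pure noise and it will score almost perfectly on its own training set while doing no better than chance on held-out data.

```python
# Overfitting in miniature: a random forest trained on pure noise memorizes
# its training set but shows chance-level accuracy on a held-out set. This is
# the failure mode a separate validation set is meant to expose.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 20))     # 120 "samples", 20 pure-noise features
y = rng.integers(0, 2, size=120)   # labels with no relationship to X

X_train, X_test = X[:60], X[60:]
y_train, y_test = y[:60], y[60:]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}")  # near 1.0: the forest memorizes
print(f"test accuracy:  {test_acc:.2f}")   # near 0.5: nothing was learned
```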

  11. great article thanks

  12. imaging guy says:

    The microbiome is the new epigenetics, or the new metabolomics. BMS invested in a French microbiome company to develop a method that uses microbiome data to identify cancer patients who would respond to its immuno-oncology drugs, which made me remember this cartoon.

  13. Michael Nute says:

    The problem with these studies is primarily that they are approaching the analysis by first distilling all of the microbial complexity down to abundance levels among a taxonomy that is probably far too coarse to see the differences. Machine learning models can make the differences appear clearer than they are, which is what’s being pointed out in the Schloss meta-study (and this post). But the point about the data being more robust than this is also true, and the most likely problem is that we’re just looking at it with too weak of a lens. As the cost of shotgun sequencing comes down and better methods of analyzing it are developed, the picture will eventually come into focus. For now though it’s clear to me at least that the usefulness of 16S data to analyze this kind of sample is essentially maxed out. That’s disappointing to researchers used to getting a Nature paper out of a couple of PCA plots, but that low-hanging fruit has all been taken.

  14. Thomas Lumley says:

    One correction: they didn’t “[take] the stated machine-learning method from each”. They built their own random forest predictor for the data from each study separately and then applied it to the data from the other studies.

    1. eub says:

      Ah. Yeah that’s different.

      So if the basic problem is that studies’ datasets are statistically different and you can’t generalize between them, you’d expect each accuracy for “train on dataset X, predict on dataset X (validation set)” to be good, and accuracy for “train on X, predict on Y” to be bad. Eyeballing the plot, it looks… somewhat like that. Sometimes the X/X point is highest. Sometimes it’s not, though. This suggests that the way they’re doing their random-forest model just isn’t so hot.

      On each graph they ought to include a reference for what the original paper reported with their own model. Only if the random-forest learner roughly replicates that is it interesting whether it can cross-predict other datasets.
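
The grid described above, with same-study validation on the diagonal, can be sketched like this (synthetic "studies" and an assumed random-forest setup of my own, not the paper's pipeline):

```python
# Train a classifier on each synthetic "study" and score it on every study,
# holding out a validation split so the diagonal entries are honest
# train-on-X / test-on-X numbers. Toy data, illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def make_study(n, shift):
    X = rng.normal(loc=shift, scale=1.0, size=(n, 20))
    y = (X[:, 0] > shift).astype(int)  # same signal, study-shifted features
    return X, y

# Three synthetic studies with progressively larger batch effects
splits = [train_test_split(*make_study(200, s), random_state=0)
          for s in (0.0, 1.0, 2.0)]

acc = np.zeros((3, 3))
for i, (X_tr, _, y_tr, _) in enumerate(splits):
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    for j, (_, X_te, _, y_te) in enumerate(splits):
        acc[i, j] = model.score(X_te, y_te)  # diagonal = same-study validation

print(np.round(acc, 2))  # the diagonal should beat the off-diagonal cells
```

If the diagonal were also poor, the problem would be the learner itself rather than cross-study generalization, which is exactly the distinction the comment above is asking the authors to show.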

  15. 💩 says:

    I’m thin I’ll sell any of you a stool sample for 0.5 bitcoin

  16. Pennpenn says:

    “It’s Not As Simple As You Would Have Hoped.”

    I’ve always gotten the impression from science that if something is as simple as you would have hoped, that’s a reason to be automatically suspicious. And if someone is claiming that something is simple, then that’s a red flag, ESPECIALLY if they’re trying to sell you something…

  17. Histology guru says:

    I think part of the reason that the animal studies have more power is that you are dealing with much less genetic variability. All the rodent strains are homogeneous compared to any human sample population. If one could do this in an isolated population with less genetic variation, then you might see an effect. Plus, I agree with the comments above that more detail on the exact strains, rather than just a 16S signature, may also be needed.

  18. Anon says:

    Did you know that jelly beans cause acne? But only the green ones mind you.

    Post hoc data dredging on steroids.

    1. Anon says:

      And with more variables (degrees of freedom) than observations.

  19. Elliott says:

    I suggest that they switch to a rat model with a wee bit of Y. pestis.

  20. Christophe Verlinde says:

    The microbiome of the N-Korean population seems to be very well balanced – no obese people.

  21. Mach4 says:

    Don’t blame messing up the gut flora only on antibiotics: the civilized world contains them, plus 14M other xenobiotics that we absorb from all manner of consumer goods, and the compounds aren’t pretty. Flame retardants are more toxic than any low-level antibiotic McDonald’s may serve you, and along with changing the microbiome they may also masculinize females and feminize males, while increasing our body mass.

    What’s more striking are the low-MW compounds that come from HFCS and junk foods, which can impede neuronal function and development and feed bad microbiome players: a large-scale experiment being perpetrated on the billions on this planet, the likes of which has yet to be fully realized. Then there are the PFOAs still in use in food packaging, just so you don’t notice how greasy the burger wrappers are.

    So next time you’re at a fast-food restaurant, ask them about the perfluorooctanoic acids in their food packaging, or the BPA in the cash-register tape (except at Whole Foods; thanks!).
    Then wonder what your microbiome thinks when it absorbs such alien molecules, which can make you not only fat, but stupid as well.

  22. DTX says:

    For an overview that demystifies machine learning, UPenn has been running an excellent series of talks, all of which are recorded.

    The Sept 22 talk by Aaron Roth is particularly good in this regard. He notes that machine learning is really “just statistics” (with a focus on prediction). Hence, Derek’s admonishment “garbage in/garbage out” definitely applies.

    Not at UPenn: I heard a talk in which the speaker mentioned a study that found that the same microbiome sample, sent to three different labs, gave three different microbiome population results. If someone knows of this study, please post the reference.

  23. eub says:

    “This recipe was terrible! I doubled the baking soda since it called for a tiny amount, and I just used Sprite instead of half a lemon…”

    It’s interesting to run other choices of machine-learning techniques on the data, but what do we learn when it doesn’t work? It would be more damning if the original technique didn’t work (which happens all the time, for example after bad data hygiene lets validation data leak into the training set). I assume that wasn’t found, or it would have been the headline.

    Machine learning techniques are not interchangeable all-purpose black boxes, and we might be seeing some of that. Or we might be seeing that the original author spent some time fiddling to get their learner to learn (but it validly did, predicting held-back data), and the replicators didn’t put all that into each learner × data combination. It’s a statement that machine learning is not trivial to make work, for sure, but that’s not surprising.

    (Okay, let me go read the paper.)
