Chemical News

Not Even Wrong

This paper is not going to make a lot of computational chemists very happy at all. It’s from Dan Singleton and Erik Plata at Texas A&M, and it’s on the Morita-Baylis-Hillman reaction. More specifically, though, it’s on the many computational attempts to decide on the mechanism of the MBH reaction, and taken together, they’re not a pretty sight. The authors do some good old physical organic chemistry to help establish the real mechanism (which had already been proposed some years ago), and let’s just say that things don’t always match up very well.

Computational methods are simply scientific models. Any model makes some inaccurate predictions but models can retain utility despite significant propensities for inaccuracy. Inaccurate predictions aid the choice of models for future predictions. Because of this, the central scientific problem in the computational study of the MBH mechanism is not the inaccuracy of the predictions. Rather, it is the absence of any particular prediction at all. Fully-defined computational methods (including the choice of basis set, entropy calculation, and solvent model) of course make quite specific predictions. However, there is neither a consensus best-choice method nor a common view on the right way to choose a method. When evaluable, the most accurate choice varies with the system at hand. In the MBH reaction, defensible and expectantly publishable choices of computational approaches lead to predictions of the facility of the proton-shuttle process that vary by 35 orders of magnitude in the stability of 19, while also diverging in the geometry and preferred stereochemistry of transition state 13. This variance is in practical terms indistinguishable from making no prediction. In addition, studies of the MBH mechanism have not been considered falsified by extreme inaccuracies in predictions. In the terminology of Pauli, computational mechanistic chemistry is “not even wrong” about the MBH mechanism.

Here’s a C&E News article if you don’t have access to JACS. It’s true that predicting reaction mechanisms is a challenge for computational methods, because you are, out of necessity, looking at high-energy molecular states and trying to distinguish between them. It’s especially tough with a polar reaction mechanism, because solvation effects (which we still don’t have as good a handle on as we need) become very important in stabilizing transition states, assisting proton transfers, and so on. But at the same time, this sort of problem is just the sort of thing that many such groups work on: the MBH mechanism has been the subject of 11 separate computational papers.

The authors here try to figure out what has gone wrong. The errors mostly seem to be in the enthalpy term, which would suggest trouble with those polar interactions. A good number of the earlier studies predicted a proton-shuttle mechanism, which turns out not to be operating at all. The problem is that current programs have a much easier time handling proton-shuttle mechanisms, while full-scale proton transfer to and from a solvent molecule is much harder to model. So there’s a constant danger of arriving at a mechanism because it’s computationally tractable, not because it’s real. Digging into the individual equilibria, it appears that some approaches did very well on particular reaction steps, but blew up completely on others: 14, 20, or 35 orders of magnitude off for the equilibrium constants, I would say, is enough to warrant that description. And it’s very hard to see what factors led to the failures or successes – in fact, it’s quite possible that some of the best individual predictions were themselves fortuitous. Overall, though, no computational approach got things anywhere near correct.
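
To put those orders of magnitude in more familiar units, here is a minimal sketch that converts an error in an equilibrium constant into the equivalent free-energy error, using the standard relation ΔG = -RT ln K. The temperature of 298.15 K is an assumption chosen for illustration, not a value taken from the paper, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def free_energy_error(orders_of_magnitude, temperature=298.15):
    """Free-energy error (kcal/mol) equivalent to an error of the given
    number of orders of magnitude in an equilibrium constant.

    Follows from dG = -RT ln K: a factor of 10**n in K shifts dG by
    n * RT * ln(10). The default temperature is an assumed room temperature.
    """
    return R_KCAL * temperature * math.log(10) * orders_of_magnitude


if __name__ == "__main__":
    for n in (1, 14, 20, 35):
        print(f"{n:>2} orders of magnitude ~ {free_energy_error(n):5.1f} kcal/mol")
```

Under that assumption, each order of magnitude is worth roughly 1.4 kcal/mol, so a 35-order miss corresponds to a free-energy error in the neighborhood of 48 kcal/mol.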

The problems in the computational study of mechanisms encountered in the MBH reaction certainly cannot be used to paint all computational mechanistic studies. Many, either by simplicity or carefully designed use of the computations, would not be susceptible to the difficulties encountered here. At least, however, it would seem that studies of complex multimolecular polar reactions in solution should be undertaken and interpreted only with extreme care.

That’s for sure. And while this is a harder problem, in many ways, than docking a ligand into a protein, we should keep in mind that polar interactions and the treatment of solvation are very important parts of those calculations, too, and looking under this particular hood tells us that we have a long way to go on those.

26 comments on “Not Even Wrong”

  1. Janne says:

    I have to say, as a computational modeller, that any kind of prediction being off by tens of orders of magnitude feels deeply wrong. I mean, any model where an error term can blow up like that would mean that the model is only ever right pretty much by accident (and the understanding of the underlying process is severely deficient). Or, the system — or the means to measure it — is so very unstable that any attempt to model it is doomed to failure.
    Very happy not to be a computational chemist either way.

  2. Chris Ing says:

    Some Reddit users are criticizing this paper for using inappropriate computational methods for this problem:

  3. Pete says:

    I’m out of office and can’t currently see the JACS article. It’s worth thinking a bit about solvent models that might have been used for the quantum mechanical calculations. A number of continuum solvent models are based on an assumption that electrostatic interactions between solvent and solute can be modeled using the solvent dielectric constant. While this sort of works for water because its hydrogen bond donor and acceptor characteristics are more or less balanced (as Latimer and Rodebush were well aware in 1920), it works a lot less well for dipolar aprotic solvents like DMSO which are strong HB acceptors but lack HB donors. Suppose you take an electrostatic model that has been parameterized for water and use it to build a model for DMSO by using a different solvent radius and a different dielectric constant. This model will be symmetric with respect to charge type in that it will return identical solvation energies for a cation and anion of the same radius. Does this sound like the DMSO that you know so well? I believe that charge type symmetry in continuum solvent models was first articulated in an article by David Mobley and Ken Dill a few years ago although I may have got that wrong so apologies in advance if this is the case.
    There are ‘not even wrong’ situations much closer to the drug discovery than the reaction modeling. For example, if you use ligand efficiency to scale binding affinity by molecular size, you will find that your perception of the system will depend on the units in which Kd is expressed. I have discussed this in greater depth in the publication that is linked as the URL for this comment.
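
The unit dependence described in the comment above can be sketched numerically. This is an illustration only: it assumes the common definition of ligand efficiency as -ΔG divided by the heavy-atom count, with ΔG = RT ln(Kd/C°), and the two compounds, their Kd values, their heavy-atom counts, and the temperature are all hypothetical.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol at an assumed 298.15 K


def ligand_efficiency(kd_molar, heavy_atoms, standard_conc_molar=1.0):
    """Ligand efficiency in kcal/mol per heavy atom.

    dG = RT ln(Kd / C0), where C0 is the standard concentration, i.e. the
    unit in which Kd is expressed. C0 = 1.0 means Kd is expressed in M.
    """
    delta_g = RT * math.log(kd_molar / standard_conc_molar)
    return -delta_g / heavy_atoms


# Hypothetical compounds: (Kd in mol/L, number of heavy atoms)
compounds = {"fragment": (1e-3, 10), "lead": (1e-9, 40)}

if __name__ == "__main__":
    for label, c0 in (("Kd in M", 1.0), ("Kd in uM", 1e-6)):
        values = {name: ligand_efficiency(kd, n, c0)
                  for name, (kd, n) in compounds.items()}
        best = max(values, key=values.get)
        print(label, {k: round(v, 3) for k, v in values.items()}, "best:", best)
```

With these made-up numbers the fragment looks more efficient when Kd is expressed in molar units and the lead looks more efficient when it is expressed in micromolar units, which is the sense in which the ranking depends on an arbitrary choice of unit.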

  4. Jason says:

    As an electronic structure theorist coming out of the woodwork, I would like to chime in as well.
    Computational modeling of molecules in solution in my opinion is still in its infancy. As any one would suppose, to actually model reactions in solvents one needs to involve the solvent quantum mechanically with the solute calculation. The common (cheap) approach of using continuum models includes no such physics.
    This whole paper reads as
    We did an organic chemistry experiment. Oh look if we just grab the nearest stack of comp chem acronyms and throw them at the problem nothing works, time to publish a JACS complaining about it.
    Having just done the experiment the authors have to be intimately familiar with the involved physics, why would they not use that familiarity to sanity check the theory.
    Using the internet vernacular, this entire JACS article is flame bait.

  5. VTJ says:

    I have done very little computational chemistry but I have run a lot of MBH reactions. As one commenter above has already pointed out, a variety of polar solvents (or mixtures of solvents), catalysts and concentrations are used. I have, in fact, never been able to run two different substrates under a similar set of conditions, in contrast with many other C-C bond-forming reactions I have run, where one size fits all (more or less). Given these observations, it is not unlikely that the mechanism is actually changing subtly between different reaction conditions. The MBH reaction could be a superbly poor choice of reaction to select for validation/invalidation of computational methods. Just my two cents.

  6. VTJ says:

    “Measure seven times, cut once….” I just read the rest of the paper and my explanation above clearly does not explain all of the discrepancies in the computational results. It seems that, at least for this type of problem, new or more robust computational methods are needed.

  7. Matt says:

    From the comments here and at Reddit I still can’t tell: are the terrible computational predictions criticized in this paper taken from papers others already published about this reaction? Or did the authors of the present work run the deficient calculations themselves? It’s not a very fair indictment of the field if they did the latter.
    I wouldn’t be at all surprised if the bad calculations were previously published, mind you. I am a computational enthusiast but I see bad calculations in the literature all the time. Often the only thing you can conclude from a low quality computational attempt is that the authors are cargo-culting their methods from other papers, or that they possessed insufficient computer power to do meaningful calculations for their problem, but hey, you’ve got to publish something.
    And then there’s a cascading quality problem if some later research cites that earlier garbage calculation along the lines of “Calculations by Dunning and Kruger suggest that…” (erroneous conclusion justifying further errors follows). You have to work backwards using full texts, not just abstracts or citations, to see if there’s really a foundation worth building on.

  8. Brendan O'Boyle says:

    How can you prove a real mechanism if you cannot prove a real number???

  9. David says:

    My model suggests the mass of the earth is 6×10^-8 grams … who would like to publish this?

  10. Wavefunction says:

    These things are like electrons – look at them too closely and they break down. Higher-level pictures can still be accurate, however.

  11. Pete says:

    Looking at the Reddit thread reminded me of something else. My understanding is that continuum solvent models are typically parameterized using a particular level of QM theory. If memory serves me correctly, PCM is parameterized to reproduce aqueous solvation free energies at the HF/6-31G* level using molecular geometry energy-minimized at that level. This means that solvation energies calculated using a different QM theoretical model than the one used for parameterization are questionable.
    Some continuum solvent models work by treating solvent as partial charges on the molecular surface. Using QM solute models can lead to significant amounts of electron density outside the molecular surface (on the solvent side of the surface). This is the ‘outlying charge problem’ and it’s more likely to be a big deal when modelling anions (electrons held less tightly). The other thing to check is how the parameters for the solvent in question have been derived. For example, have the parameters been generated by substituting a solvent radius and dielectric constant for the corresponding values in the water model? Have solvation energies for the solvent in question been used in the parameterization?
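
The charge-type symmetry raised in these comments can be seen in the simplest continuum picture, the Born equation for a charged sphere in a dielectric. This is a sketch under stated assumptions, not PCM or any model used in the paper: the Born equation is swapped in because it is the textbook limiting case, the 2.0 Å radius is hypothetical, and the dielectric constants are approximate literature values.

```python
# Coulomb constant in kcal*Angstrom/(mol*e^2)
COULOMB_KCAL = 332.0637


def born_solvation_energy(charge, radius_angstrom, dielectric):
    """Born solvation free energy (kcal/mol) of a charged sphere.

    dG = -(k / 2) * q**2 / a * (1 - 1/eps)
    The charge enters only as q**2, so the sign of the charge is lost.
    """
    return (-0.5 * COULOMB_KCAL * charge ** 2 / radius_angstrom
            * (1.0 - 1.0 / dielectric))


# Approximate static dielectric constants (assumed values for illustration)
SOLVENTS = {"water": 78.4, "DMSO": 46.7}

if __name__ == "__main__":
    radius = 2.0  # hypothetical ion radius in Angstrom
    for solvent, eps in SOLVENTS.items():
        cation = born_solvation_energy(+1, radius, eps)
        anion = born_solvation_energy(-1, radius, eps)
        print(f"{solvent}: cation {cation:.1f}, anion {anion:.1f} kcal/mol")
```

Because the charge appears squared, swapping in a different dielectric constant and radius can never make this kind of model treat a cation and an anion of the same size differently, which is the limitation described for solvents like DMSO.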

  12. Dan Singleton says:

    I would be happy to answer any serious questions.
    About the criticism that the methods that we used are bad. Yes. That is the point! They are bad. They are also the methods that are used in countless papers. We tried quite a few methods that receive only short discussion in the paper – see the SI. The results are not always so amazingly horrid, once you exclude some stupid practices that are still in many papers published at high levels, but they are hardly impressive, and we are still waiting for one to actually predict the correct mechanism.
    Pity the researcher trying to disprove that houses are haunted. People claim that a house is haunted, so you go to the house, investigate, explain the weird noises and the strange light, argue for a long time, and finally the people who believe in ghosts say “Ok, you might be right about this house. You have just chosen the wrong one. What about this other house?”
    If one is going to argue that “Oh, if they had only used method X, as everyone knows, then the calculations would have done well and got the right answer.” Tell me today what method X is, before you run the calculations. If you don’t know the right method ahead of time, then I think the question of whether it is science or an elaborate exercise in confirmation bias remains open.

  13. Jason says:

    @Dan Singleton @12
    I appreciate your difficulty. I also will state my view right away that a large portion of the fervor around your recent JACS is the aggressive and extremely general attack you take in wording your manuscript. Beyond a very detailed and careful reading the immediate view I get is the overgeneralized attack on theory by an experimentalist that I posted about above.
    I will take your comment here to mean that your goal was to point out the failings of specifically the theory used in the manuscript (which is a fair attack but not one readily gleaned from your manuscript’s wording). Continuum models are very fast and cheap and have a great many shortcomings; I am not sure anyone now would really be surprised that they give even qualitatively wrong results.
    So my first question to you is why pick on continuum models specifically. There are other ways to attack the problem out there that have varying improvements to the physics. I appreciate the point that this is an extremely widely used method, however bad theory is bad theory so why continue to propagate the trend?

  14. Matt says:

    Having now read the paper I think it’s a solid one. If the Redditors should complain about any group it’s authors of computational studies who don’t actually have methods robust enough to support their published conclusions. This is the key paragraph from the current article IMO:
    A series of eleven papers from multiple groups has studied the MBH mechanism computationally. Every paper that examined the issue, a total of seven, supported the Aggarwal/Lloyd-Jones proton shuttle depicted in 4, and this prediction was a highlight of most of these papers. Large computational error is evident in some of these papers, but several of the groups undertook substantial and respected approaches to minimizing error. Sunoj chose his DFT method (MPW1K) based on comparisons with high-level CBS-4M calculations in computational models. Aggarwal and Harvey employed G3MP2 calculations on a model system to calibrate their B3LYP results. Harvey later studied in detail the ability of diverse computational methods to predict the barrier for an MBH reaction, and he recognized explicitly the difficulty of predicting rate constants quantitatively. Cantillo and Kappe chose M06-2X calculations based on detailed experimental thermodynamics.
    So these prior studies weren’t obvious garbage, “just throw Gaussian default B3LYP at it” papers. There were several studies employing high-level composite thermochemical methods or calibrated, apparently appropriate DFT calculations. The fact that they still failed so badly is a sharp reminder of the problems computational chemistry has modeling condensed phase reactions, that is, almost all of bench chemistry.

  15. Anonymous says:

    I’m a comp chemist in pharma. I started out in the 90s in the QM world. I recently took a look at a few contemporary reaction mechanism papers. I did not like what I saw. The papers were in top-drawer journals. I pulled out the structures from the SI. The system contained a Boc group. Disappointingly, the whole reaction scheme was littered with minima on either side of the TS for atom transfers in which this Boc group adopted all sorts of different rotamers, themselves separated by barriers. The authors even concluded the carbonyl of the Boc was having a crucial effect on the mechanism. In the end there was a problem with their IRC following.
    Anyhow, the IRC approach worked on the small model systems of the 80s and 90s, back when we couldn’t afford a good level of theory. Nowadays the field has moved to bigger systems with too many degrees of freedom (esp. dihedrals) to be studied with single IRCs. I don’t get why the field doesn’t move toward QM MD.

  16. Back from China says:

    Of course, global climate models have a much simpler target. Their results are to be given greater trust, even though they did not predict the ‘pause’ of the last decade or so. 95% of scientists are said to agree on this.

  17. Dan Singleton says:

    @Jason @13
    As for being aggressive in our wording, it seems like the most common comment on the work is that everybody already knows that these methods fail for such mechanisms. I know it, and a lot of people say they do, but then why do so many of these papers get published, and prominently? When I was an associate editor for J. Org. Chem., I had a tremendous amount of difficulty finding reviewers that would seriously question computational mechanistic papers. The experimentalists would not touch them, and the computational people would rubber-stamp them. I had to publish some papers that I knew were garbage because they had gotten four positive reviews and I had to choose my battles.
    To use what may be a dangerous analogy, why aren’t the good cops stopping the bad cops? Except for around the coffee pot, there just isn’t enough criticism. So I decided not to pull any punches in making the case. Because of a lack of criticism, the field is without a doubt unhealthy. It hurts the theoreticians who are trying to do things right. Don’t blame me for trashing them; they were trashed by sharing a field with some bad work.
    I will get back to the continuum model question.

  18. Pete says:

    Dan (@17) Contrary to my earlier comments, I have now been able to see your article since it’s ASAP (I’d not checked earlier). One question that comes to mind is whether anybody using continuum solvent models to study reaction mechanisms has actually discussed how effectively the chosen solvent model reproduces measured solvation free energy (in the solvent used for the experiments). This is the sort of uncouth question I tend to ask when reviewing manuscripts. Perhaps the criticism should be of extrapolation of continuum solvent models away from their applicability domain (or at least what they were parameterized for) to solvents for which there is a lack of relevant experimental data?
    Although I’ve not been a journal editor, I certainly appreciate your frustration with the reviewers and, after reading some articles in high profile journals, I ask myself what the reviewers had been smoking when they waved them through (while secretly wishing that I could have some in order to enliven my drab ordinary life). In some fields, there appears to be a ‘magic circle’ in which the members rubber-stamp each other’s offerings and then gush in print about how seminal these offerings are. There are parallels with the drug discovery field, which spawns a body of rules, guidelines and metrics whose growth appears to drive a decline in Pharma/Biotech productivity. In drug discovery we have a Cult Of Ligand Efficiency (who worship the molar concentration unit) and an Anti-Fat Anti-Flat Movement (whose leaders preach that aromatic ring count rather than a surfeit of atoms is the true Satan). I’ve linked ‘Correlation inflation in the pursuit of drug-likeness’ as the URL for this comment so you can get an idea of the problem.

  19. luysii says:

    #s 17 and 18 are very sad. After getting a masters in chemistry and then an MD, I thought the medical literature was terrible by comparison. I was wrong. For a few gory details on the medical literature please see —

  20. Anonymous says:

    @ luysii
    Twice a month my group has a literature group meeting where everyone gets up and talks about a paper. The game for every paper is “What is wrong with this paper?” This is the most important thing that one learns when getting a Ph.D.

  21. Dan Singleton says:

    @ luysii
    Twice a month my group has a literature group meeting where everyone gets up and talks about a paper. The game for every paper is “What is wrong with this paper?” This is the most important thing that one learns when getting a Ph.D.

  22. Pete says:

    Dan (@21) I’ve now linked Derek’s post for discussion in the LinkedIn computational chemistry group and I anticipate that the blog post and article will generate the lively and informed exchange of views that is characteristic for that group. One question that is likely to be raised there is whether the COSMO-RS model has been used to study this reaction. My understanding is that COSMO-RS is not charge type symmetric and therefore can handle solvents more flexibly than charge type symmetric models like PCM. Something I will be asking in that group discussion is how we should interpret vibrational frequencies and energy minimization in the context of a continuum model.
    I do like the format for your fortnightly literature group meetings and I hope that you won’t consider it impertinent to make a suggestion (PNAS 1999 96:9997–10002 DOI: 10.1073/pnas.96.18.9997)

  23. Dan Singleton says:

    I will make the prediction that COSMO-RS will do better than PCM at structure 12 + MeO-, but worse at 13 and 19, when used with the better modern functionals. And I will bet that it still gets the mechanism wrong. But hey, someone could be a hero and prove me wrong with a day or two’s calculations. MeO- is not the only problem here.
    Pointing out errors obtained using COSMO-RS would be a bit like doing so with wB97xD (where we had data but did not highlight it). It would be interesting to specialists but few overall would care.
    Word search for JACS since January 2014 for the number of papers containing the word:
    PCM: 103
    SMD: 52
    COSMO-RS: 8
    All of these searches contain irrelevant extras but you get the idea.

  24. Anonymous says:

    @15 and @ 17
    I totally agree. I’ve seen so many garbage structures in SI (or NO structures in SI) for calculated intermediates. Things that would get you failed in any organic or inorganic test at the UG level, much less the G level. There is a huge disconnect growing between theory and experimental…theorists should know the literature better than the experimentalists! They should be able to rattle off bond lengths, angles, typical frontier orbital diagram arguments, etc etc. I’ve met very, very few theorists that can do those things. It’s the equivalent of looking to google for the first 3 digits of pi or something.
    A good start would be to require something like a CIF file for all calculated intermediates in the SI. I want to view the surfaces, look at the angles, etc. And I’m an experimentalist.

  25. Curt F. says:

    As an outside observer of the computational chemistry field (and definitely an outside observer of the electronic structure theory-based calculations of high-order polar organic reaction intermediates and transition states field!), I can’t help but compare this paper to some examples of “successful” computational chemistry that have been famous enough to receive broader attention.
    1. Designed enzymes for Kemp elimination. Nature 2008, DOI: 10.1038/nature06879. This example was famous and seemed to “work” in the sense that the designed enzymes actually catalyzed the desired reaction. What are the key differences between the computational methods in this work and the current study? Is it that polar solvents are not involved in the enzyme’s transition state? Or that the goal of the design study was to make an enzyme, not to successfully predict the rate of an existing enzyme? Or something else?
    2. A “designed” material for electrocatalysis of the oxygen evolution reaction. Science 2011. DOI: 10.1126/science.1212858. If I understand this paper and the work that enabled it correctly, DFT was used to identify structural or electronic “descriptors” of functional catalysts, and then to optimize the catalysis, the authors made a series of materials that spanned a range of descriptor values, and found a catalyst that was way better. The implication was that the DFT-identified descriptor seemed to be the right one. Is the difference between that line of work and the present study the use of this “descriptor” abstraction that doesn’t seek to calculate desired properties directly, but simply to suggest materials properties to tune that might improve the desired properties? And how, at the mechanistic chemical level, does oxygen evolution electrocatalysis differ from the reaction studied here? To my layman eyes it seems like a highly “polar” reaction but maybe not one that is so high-order?

  26. Dan Singleton says:

    I think that if you ask around you might encounter some nuanced views on the first paper you mention. I do not know anything about the second. But no one would deny the value of the computational chemistry field. For the particular subfield of computational mechanistic chemistry, we describe it as the “most important advance ever.” It is a tool that is routinely used beautifully by savvy researchers aware of and allowing for the limitations, but it is also one that is too often abused.
