Here’s a new paper on ChemRxiv that is very much worth reading if you’re a computational chemist (or work with them). And it makes a larger point that’s applicable to everyone else – not an original point, I fear, but it keeps on coming up.

The computational part first: it has to do with Density Functional Theory (DFT) calculations. (Here’s a site on the topic that does not end up knee-deep in equations quite so swiftly). Note, for example, that “functional” is used here in its mathematical sense, as in “a function of other functions”, note the vernacular sense of “performs some task”. Nonetheless, the task that DFT is trying to perform is to estimate free energies, and the functional is electron density (which is in turn a function of space and time, via the Schrodinger equation, for starters). Walter Kohn (share of the chemistry Nobel in 1998) helped develop the Hohenburg-Kohn theorem, which showed that the total ground state energy of a many-electron system is a functional of that electron density – so if you know that (or can estimate it really well), you can get a value for the energy. You don’t have to deal with every electron individually (a task that ranges from “hideously difficult” to “flat out impossible”); you just have to know how many electrons, on average, are inside a given volume of space: the electron density. This is far more tractable, computationally – you can actually knock the whole thing down to a single Schrodinger-ish equation (the Kohn-Sham equation) and work with that.

We are rapidly approaching the limits of my ability to talk about DFT, sadly, as I’m sure is already apparent to people who really know the stuff. But as you can see from that hand-waving view, a key to doing the calculations is how you divide the space you’re working with as you calculate those electron density numbers: what kind of grid do you use, and how small is each piece of it? You’re going to integrate across all those, and you’d hope that the answer would come out the same as (for example) you rotate the molecule around inside that space. Note: we’re not talking about rotating around any particular bond or anything like that; this is a basic “What if I rotated the whole thing just like I was using the Rotate function in ChemDraw; I should still get the same answer then, right?”

Apparently not. The paper linked above (from Andrea Bootsma and Steven Wheeler at Georgia) shows that some of the common integration grids that people use in DFT calculations are *not* rotation-invariant, and thus can give very different answers indeed. (An analogy: this is sort of like having a device that can estimate how many potatoes there are in a sack, but then you find that it gives you a significantly different Potato Count if you just flip the sack over). This is a real problem for people who would like to calculate the different free energies of conformers, rotational isomers, transition states of bimolecular reactions, and so on. The paper shows that predicted enantioselectivities and regioselectivities of such reactions can produce almost *completely opposite results* if you just rotate the transition state structure and run the calculations again with the same integration grid. That ain’t good.

Here, we show that popular grids can lead to large errors infree energies, with harrowing implications for DFT studies of organic and organometallic systems. These errors arise from the sensitivity of the entropic component of the free energy to small variations in the values of low-frequency vibrational modes but are distinct from errors due to the treatment of these modes as harmonic oscillators. Moreover, they occur using different DFT functionals, basis sets, and quantum chemistry codes and require at least a (99,590) grid to be resolved in most cases.

That’s a much finer/denser grid than many people are using (which is apparently often more like 75,302). Those low-frequency vibrational modes contribute to the free energy values in what the authors describe as an “unphysically large” way in the standard calculations, and this doesn’t seem to be generally appreciated. Moving to the 99,590 grid (a recommended minimum, actually) helps a lot, but they recommend even then trying your molecules out in more than one orientation to get a handle on how much variation remains.

And that brings up the other point, the one I promised was unoriginal. *Looking under the hood of such systems is essential*. Actually, *being able to look under the hood* is perhaps a better way to put it, and frankly, not all of us are able to do that. I’m not, when it comes to DFT or its cousins, believe me. So I have to work with people who I feel have the competence to do that, and I strongly recommend that everyone else do the same. Every calculation or model of this sort has assumptions built into it, shortcuts that it takes out of necessity, approximations that are set to certain tolerances, and these things can most definitely affect what comes out the chute. If you are expecting Golden Tablets of Truth to come rattling down said chute automatically, you are taking a very substantial risk. Any decent computational chemist will tell you this, and you should make sure as you collaborate with them that they feel able to do that. Open that hood. Kick those tires. It’s the only way.

The main takeaway message (not stated clearly enough in the posted ChemRxiv version) is that the applications of all quantum chemistry methods requires choices regarding accuracy vs computational cost. However, such choices require that the user be cognizant of the underlying errors. Our main goal with this paper is to point out that the choice of integration grid is an often underappreciated/ignored source of significant errors in predicted free energies. As long as such errors are accounted for in analyses/predictions, DFT remains one of the most powerful tools we have for analyzing organic and organometallic systems.

When I read papers with computational data (mechanistic predictions, usually), particularly from synthetic groups, I am not sure how much to trust the conclusions from the data. While computational methods are useful, I don’t know how many people that use them are aware (or even concerned) about their limitations. I don’t know if the models proposed fit what they’ve seen and so are chosen (even though other models might also fit). As someone who doesn’t know much about computational methods and their underpinnings, I am not certain what to do with them when I read their predictions.

I’m guilty of that when I was a synthetic chemist. When my methodology papers got comments back from JACS, reviewers often wanted us to include a stereochemical model. At that point, I set up some DFT jobs based on experimental data (rate, selectivity, etc.) and basic chemical principles. Then I added pretty pictures of transition states in the last figure with a catalytic cycle that may or may not be right. After that, revisions were typically accepted within a week. I admit I was being lazy as a scientist, but I did not want to spend more time on those projects that needed to be wrapped up.

For synthetic chemists pictures are no more than pictures. Referees should not ask for them without being able to judge their quality.

I’ve reviewed a lot of papers that include unnecessary calculations. They’ll have spectroscopy that rather unambiguously confirms some structural feature then run a calculation to essentially reproduce it. For instance, they’ll calculate the structure of the molecule when actually they have the crystal structure.

I’m an organicker who has “played with” computational chem through many programs, starting with MacroModel and up through Gaussian. I do not trust myself or my answers. But I get a rough idea of what I want to see. I always ask for help from the pros.

When I read papers using Comp Chem, I have always liked the ones that DO compare different methods and functionals (and now, grid size). It just seems more honest, to me. E.g., “The ability of several density functionals to reproduce experimental 13C‐NMR shielding was examined, and it was found that the MPW1PW91 hybrid functional with the 6‐311+G(2d,p) basis set gave generally good agreement with the observed isotropic shielding values. The MP2/GIAO procedure was not as successful, leading to significant underestimation of the paramagnetic terms in most cases.” Or, “We report a test of 30 density functionals, including several recent ones, for their predictions of 69 singlet-to-singlet excitation energies of 11 molecules. …”

Most of the papers I read tend to report just one result, maybe from using B3LYP (seems to be the most popular, to me – not an official statistic!). Is that even the optimum functional for that molecule? How do they know? And now we’ve got this Integration Grid problem, too!

One more problem that I hope the Comp Chem folks can help with. Please correct any mistakes in the following “statements”: For small to medium organics, DFT is usually very reliable to provide the electron density with very good accuracy (99%). However, to compute molecular properties from the e density, you need to use other methods such as Kohn-Sham. I.e., if you want to compute the HOMO energy, you get that from K-S on the e density. From MANY papers I have read, those errors can be enormous, off by several kcal / mol. I’m talking accuracies, compared to experimental outcomes, of 60%, 70%, 75%, NOT 99%. Some papers have compared DFT + K-S with Hartree-Fock + Koopman’s Theorem and claimed the H-F+K was more accurate than DFT+K-S. Legitimate?

I tried to download the rXiv paper (“Download All 5.01 M”) and I got a corrupted 5.01 M file, not a zip file. I can’t even delete if from my hard drive. Have others downloaded successfully?

Dowloaded zip file worked for me. So did “Export as PDF”.

“But I get a rough idea of what I want to see”.

This, in my view, is the biggest pitfall of computational chemistry. There are so many methods and so many basis sets that there is a strong urge to keep running calculations until you get the one that matches your chemical intuition. As Derek said, you need a competent practitioner to make sure that what you are doing is reasonable, and this is an area where “competent practitioner” is a rather high bar to reach. Otherwise computational chemistry is the field that is ripest for any and all varieties of cherry picking.

From a very cynical perspective what we’ve shown is that for some systems you just need to try different orientations until you get ‘what you want to see.’ This a problem.

“But I get a rough idea of what I want to see”. – Just to clarify, I didn’t mean “the preconceived answer I want to see.” I meant, “a rough idea of the property I want to see.” More like using a plastic model: “How close is that methyl to that phenyl?” or “Is this a reasonable conformer or not?”

And I was able to get and open the zip file.

This reminds me of my early days in big pharma some time ago. We got a new program that could calculate PSAs and I gave it a spin. My local friendly computational chemist asked me if I used it, and I shook my head in the negative. When he asked why, I told him that I was looking at a series of molecules I was making, got suspicious about the results, and calculated the PSAs for 2-, 3-, and 4-tert-butylpyridine. They were the same.

The program was obviously adding up fragmented functional groups without considering 3 dimensional structure. On the other hand, it was very fast.

That’s exactly what most of the programs like ChemDraw do. Still, for the vast majority of compounds, that’s good enough. Check out the correlation in Figure 2 of J. Med. Chem.2000, 43,20, 3714-3717. I would imagine that the outliers may be structures with a lot of internal bonding that bury polar groups.

Yeah…. on the other hand, the PSA values were reported with 4 significant figures….

So, they were very precisely wrong! And the calculation was fast and highly reproducible…

So, by today’s standards, it looks like the right thing to do…

As Academic Med Chemist says, the correlation between fragment-addition tPSA and 3D PSA is higher than the correlation between PSA and any other property you’re likely to be interested in. The question the comp chemist should’ve asked you is “why do you want to calculate PSA?” as they would then have been able to come up with a more sensible suggestion for you to look at.

As a mathematician whose first specialty was numerical analysis, it seems to me that this is a failure from our end.

As a mathematician and computer scientist who was taught numerical analysis by a very goodness numerical analyst (Richard V. Andree), I am constrained by precept, example, and conscience to agree with you.

As a software designer who does numerical computation, I’m guessing this this failure is on my house — in the design of the software that presents the algorithms to the user. Could it maybe have defaulted to some general checks like comparing to a cheap coarser-mesh run? Rule of thumb: if 2x coarser goes kablooey so bad you can’t derive information from the comparison, odds are your mesh is giving distorted results in places. Sometimes adaptive meshing just gets beat.

Granted right up front I’m speculating because I don’t know computational chemistry, but take ODE initial-value problems. There’s nothing fundamentally wrong with plain old RK4 as an algorithm. Nor anything wrong with Adams-Bashforth-Moulton or with BDFs. But we and do can design some very bad software that asks the user: choose which family of methods! now what order do you want! great and what step size! And then spits out an answer for the user to trust.

The user just has an ODE problem and they want it solved. They’re not experts on things like stability domains, and like the rest of us they don’t know all of what they don’t know. The holy grail of the design would be if they never have to learn; for ODEs this is mostly possible. But the minimum bar for decent design is that at least they are faced with what they don’t know — “these various functionals and parameter choices give wildly varying results; do you know which is appropriate or would you take a default?” — and realize they need to consult somebody. Yeah, nobody wants to spend the compute, much less deal with undercutting their result that looked fine. 😛

Chris Williams from Chemical Computing Group (MOE) and Miklos Feher (now at D. E. Shaw) wrote a few articles about numerical stability. These papers discussed molecular mechanics, but DFT will face similar issues.

https://doi.org/10.1021/ci200598m (conformation searching)

https://doi.org/10.1007/s10822-007-9154-7 (docking)

https://doi.org/10.1021/ci300298d (predicting binding energies)

The problem is that tiny rounding errors in floating point calculations can build up with complicated calculations. If you use a different model of CPU, an operating system with a different maths library, or GPUs, then the rounding errors will be different.

Getting a consistent result is not the same as getting the right one. To get reproducable results, you can repeat calculations with small random tweaks, and the sampling statistics will tell say which are the most likely results.

The conformation searching & docking links are reversed.

Making sure a program uses IEEE 754-compliant floating point arithmetic might help.

Even SCF energies are very much grid dependent. We recently did a convergence study of geometries and energies vs grid size, for a bunch of functionals, and found that Turbomole’s m4 grid is the minimum.

Yes, particularly for some functionals:

http://dx.doi.org/10.1021/ct900639j

Not only metaGGA, but (at times) even GGA. We observed this for b97-3c.

I love you, Weitao Yang!

But do the molecules smell different when rotated?

I wish I had read Derek’s post before presenting this paper to my group this morning – I would have included the potato bag analogy :p

How bad is it actually? Could it be that DFT is too esoteric to properly understand for many of us mere mortal organic chemists (including me), but that intricacies of chemical reaction mechanisms are just as well too esoteric for many DFT computators? I see in the literature quite a lot of weird enzyme-catalyzed reaction mechanisms that are “evidenced” by means of B3LYP DFT-calculations (so-called “metadynamics”), but appear to be in flagrant contradiction to what is expected when considering stereoelectronic participations from properly oriented adjacent orbitals. There were already several warnings that some DFT programs have severe problems with stereoelectronics, i.a. by Grimme for even simple branched alkanes (!), and now this – the grid problem. What should I give more credence to: an enzyme-catalyzed chemical mechanism confer over 50 decades of knowledge about stereoelectronic control, or B3LYP-based metadynamics calculations yielding a strange mechanism?

That is 50 years / 5 decades of course.

Anyone who uses predictive models should have George Box’s quotation fixed to their screen: “All models are wrong, but some are useful” The key part of the statement is that the model is *wrong*. It just is. It might be pretty good, most of the time, but it is always wrong.

I have found this to be a great reminder to be skeptical of any model.

In this particular case, would it be possible to pre-screen using a coarse grid and submitting the molecule in a couple of rotated starting points to see if there is a significant difference in output for each one. If there is, then move to a finer grid…?

This is precisely the advice offered in the Conclusions of the preprint: “compute free energies for several molecular orientations using multiple integration grids in order to gauge the magnitude of grid errors and to guide the choice of the most appropriate integration grid for a given problem.”