Anyone who’s done fragment-based drug design (especially) or who has just looked at a lot of X-ray crystal structures of bound ligands will be able to back up this statement: if you sit down with a series of such structures, all bound to the same site, it is very, very difficult to rank-order them in terms of their actual binding constants against that site. The gross features can help you out – if there are like three obvious hydrogen bonds, sure, that’s a tip-off that your ligand has found a good home versus one that’s just sort of floating in there.
But rarely do we get anything so clear-cut to work with. For the most part, you have a set of hydrophobic interactions, pi-cloud interactions, some polarities pointing vaguely toward some opposite polarities, a bit more occupancy here and a bit less there, differently displaced water molecules, and so on. Evaluating these and adding them up is the very opposite of straightforward. It’s not at all easy to judge the strength of such binding events, and it’s even harder to judge what energetic price was paid to get to them. Changes in the solvation of the ligand and of the protein, changes in their conformations, the balance of entropy and enthalpy in all of these – no, that ain’t trivial, which is why we have such a hard time modeling all this, despite billion-dollar incentives to figure out how.
Here’s a new paper that shows that we really don’t even have the full list of such interactions sorted out yet. The authors (from Roche and the Cambridge Crystallographic Data Centre) have gone back through the Protein Data Bank (PDB) looking for lesser-known “nonclassical” interactions. They’re especially focusing on aryl halogen atoms, on nitriles, sulfonyl groups, and sulfur atoms inside aromatic rings. This sort of search has certainly been done before, but this time a new approach to the statistics has been used, to try to account for directionality and to normalize for frequency of occurrence. There are an awful lot of data points in the PDB, and a lot of ways to slice them up (and it should always be kept in mind that some percentage of the PDB structures are wrong, subtly or not-so-subtly, especially in the small-molecule ligands).
But what this paper finds is evidence for the “sigma-hole” halogen-carbonyl interaction (for chlorine and higher), aryl fluorine interactions with the carbon of both carbonyl and guanidine groups (Arg side chains), nitrile with the terminal nitrogens of those Arg guanidines and with the NH of indoles (Trp side chains), sulfonyl oxygens with backbone amide NH groups, and many more. These are presented, usefully, in terms of whether they occur at rates greater or less than expected by chance. On the other hand, there’s no particular evidence (at rates greater than chance) for a number of other interactions that are at least hand-wavingly plausible, such as NH hydrogen bond donors with fluorine as acceptor, and there are many that come in at significantly worse than chance and are thus clearly unfavorable (such as aryl Cl with donor sulfur atoms).
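To make the “greater or less than chance” idea concrete, here is a minimal sketch of an observed-over-expected ratio for a contact type. This is not the paper’s actual statistic (which also accounts for directionality); it simply assumes that, by chance, a ligand atom type and a protein atom type would pair up in proportion to how often each shows up in contacts overall. The function names, the threshold, and all of the counts are made up for illustration.

```python
def enrichment(n_pair, n_ligand_atom, n_protein_atom, n_total):
    """Observed/expected ratio for one ligand-atom/protein-atom contact type.

    Assumption: "chance" means the two atom types pair independently, so the
    expected count is (ligand-type contacts * protein-type contacts) / total.
    """
    expected = n_ligand_atom * n_protein_atom / n_total
    return n_pair / expected


def classify(ratio, threshold=1.5):
    """Label a ratio; the 1.5x threshold is an arbitrary illustrative cutoff."""
    if ratio >= threshold:
        return "enriched"
    if ratio <= 1 / threshold:
        return "depleted"
    return "near chance"


# Hypothetical counts, NOT taken from the paper or from the PDB.
n_total = 10_000      # all ligand-protein contacts in the toy data set
n_aryl_cl = 500       # contacts involving an aryl chlorine
n_carbonyl_o = 2_000  # contacts involving a carbonyl oxygen
n_sulfur = 1_000      # contacts involving a sulfur atom

cl_carbonyl = enrichment(180, n_aryl_cl, n_carbonyl_o, n_total)
cl_sulfur = enrichment(20, n_aryl_cl, n_sulfur, n_total)

print(f"aryl Cl / carbonyl O: {cl_carbonyl:.2f} ({classify(cl_carbonyl)})")
print(f"aryl Cl / sulfur:     {cl_sulfur:.2f} ({classify(cl_sulfur)})")
```

A ratio well above 1 suggests the contact shows up more often than its raw frequency would predict, and one well below 1 suggests that it is being avoided. A real analysis would also need a significance test and a geometric criterion for what counts as a contact in the first place.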
Both sets (significantly greater than chance and significantly worse) are quite useful, and should help to refine structure-based drug design ideas and virtual screening efforts. The paper, as you’d expect, has details of the preferred (and non-preferred) geometries for all of these interactions, as extracted from the data, and I’d definitely recommend it to computational ligand-fitters of all persuasions. The authors state that they’re continuing to dig for more interactions, and also mention that a focus on the unfavorable ones is warranted, and I’d agree. Our human bias is to pay attention to favorable “winning” cases, but you’d also very much want to know what to avoid – or more precisely, what your molecules and proteins are going to avoid whether you know it or not!