Every so often, we medicinal chemists need reminding that those beautiful X-ray crystallography structures of our ligands bound to target proteins are... not quite what we tend to think they are. Here’s a post I did on that a while back, and this new paper quantifies one of the issues. You see, what you get out of the X-ray data is electron density from both the protein (backbone and side chains) and from your small-molecule ligand. And the electron density for that ligand, likely as not, is going to come out as kind of a blob.
The resolution of a small-molecule ligand bound to a protein pocket is rarely as high as non-crystallographers imagine it to be, unfortunately. It might help to picture a single aromatic ring (the side chain of a phenylalanine, for example, or an aryl in your drug molecule). In good old-fashioned small-molecule crystallography (not with a protein, just a crystal like you might grow when you have a pure small molecule), an aromatic ring can easily come out looking like a flat hexagon of ball bearings with space between each atom. Now, that’s resolution. But that’s not what you get most of the time from a protein X-ray structure, even when things are going well. There are just too many atoms in there, and too many ways for them to be slightly out of position in one unit cell relative to another one, and you get a bit more slop in the data. A nicely resolved phenylalanine side chain in a protein structure will be more like a lumpy hexagon-shaped donut – you’ll have a hole in the middle, and bulges at each vertex for the carbons, but it’s not going to look like the instantiation of a ball-and-stick model, which is what a small-molecule crystal structure will remind you of.
And the same goes for the small-molecule ligand sitting in that protein, only more so. You’re looking at the average of all the places that your ligand can end up on said protein, and while it would be nice if there were only one spot for it to be and only one way for it to be in that spot, that’s not always the case. As the authors of the paper linked to above note, this is a rather under-reported feature of such structures in the PDB, and it’s not hard to see why. Traditionally, you’re looking at a presentation of the data that assumes one place for the molecule and gives you (if you bother to look!) some measures of how well the average atom displacements hold up when fitted to that model. But it’s a model, and it may or may not reflect reality to the degree you’d like. It’s much harder to say “This looks like a situation where 90% of the ligand is sitting as shown, 7% is over like this, and 3% of it is actually over here like this”. Unless you’re a skilled crystallographer, you’re not going to be able to say anything of the kind, but this sort of low-but-detectable occupancy can be valuable information.
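To make the "average of all the places" idea concrete, here is a toy one-dimensional sketch. It is not how real refinement software works: the Gaussian blob shape, the 0.4 Å width, and the three positions are all assumptions picked for illustration, and only the 90/7/3 split comes from the example above. The point is just that the density you see is an occupancy-weighted sum, so a 3% conformer contributes a peak about thirty times weaker than the main one.

```python
import math

def gaussian(x, center, width):
    """Unnormalized Gaussian blob standing in for one atom's electron density."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def averaged_density(x, conformers, width=0.4):
    """Occupancy-weighted density at position x.

    conformers is a list of (position, occupancy) pairs; the occupancies
    are fractions of the crystal's unit cells and should sum to 1.
    """
    return sum(occ * gaussian(x, pos, width) for pos, occ in conformers)

# Hypothetical positions (in angstroms) for one ligand atom in three poses,
# using the 90% / 7% / 3% split from the text.
conformers = [(0.0, 0.90), (2.5, 0.07), (5.0, 0.03)]

for pos, occ in conformers:
    density = averaged_density(pos, conformers)
    print(f"x = {pos:3.1f} A  occupancy = {occ:.2f}  density = {density:.3f}")
```

A single-conformer model fitted to that density would happily put the atom at the big peak and leave the two small ones looking like noise, which is more or less the situation the paper is describing.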
The authors introduce a new computational tool (qFit-ligand) to address this. As of summer 2017, there were about 130,000 entries in the PDB. Of those, 44,620 have ligands in the proteins, but fewer than 2% of these have two or more alternate conformations reported. And as anyone who’s rooted through the PDB will have suspected, many of the small molecules in those structures are crystallographic additives or metabolites, not drugs. Once you strip out all the ATPs, ethylene glycols, ascorbates, cholesterols and so on, you’re left with only 90 multiconformer drug structures (that number excludes things like totally flipped-around ligands). Those are divided between cases where one end of the ligand has flipped around, or where a side chain has moved or rotated, some 180-degree ring flips, and some where the whole ligand has moved over by some amount. The authors note that an implausible number of these multiconformer cases show up in the PDB as 50/50 splits, which doesn’t reflect physical reality.
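If you want a feel for how thin that slice is, the arithmetic is quick. This snippet uses only the figures quoted above; since "fewer than 2%" is an upper bound, the multiconformer count it gives is a ceiling and not an actual tally.

```python
total_entries = 130_000   # approximate PDB size, summer 2017
with_ligands = 44_620     # entries with a ligand in the protein
multiconf_ceiling = 0.02  # "fewer than 2%" report alternate conformations
curated_drug_like = 90    # left after stripping additives and metabolites

ligand_share = with_ligands / total_entries
upper_bound = with_ligands * multiconf_ceiling
curated_share = curated_drug_like / with_ligands

print(f"Entries with ligands: {ligand_share:.1%} of the PDB")
print(f"Multiconformer ligand entries: fewer than {upper_bound:.0f}")
print(f"Drug-like multiconformer structures: {curated_share:.2%} of ligand entries")
```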
Their new method looks for rotatable bonds, flippable rings and so on and does a translational and rotational search to generate a set of plausible conformers. Applied to a standard set of crystallographic data (from D3R), it was able to pick up 7 of the 10 known multiconformer states in the collection, doing better with rotations and flips than it did with whole-molecule translation. There’s another data set (Twilight) that’s intended to highlight poorly worked-out ligand models in the PDB, and turning qFit-ligand loose on that one actually seems to retrieve some of them. For example, the beta-secretase inhibitor structure 5EZX is now 6DMI, showing three separate conformations of the ligand from the same original data set (they were nearly equal occupancy, as it turns out). An even more striking example is the former 5CFW, now 6DMJ, a ligand for the BRD4 bromodomain. The minor form in that data is different enough to be actionable for drug design, making some completely different interactions that you’d never have known about.
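For the non-computational among us, the "rotate around a bond and see what you get" step can be sketched in a few lines. To be clear, this is not qFit-ligand's code or its API: the function names, the coordinates, and the 60-degree step are all made up for illustration, and the real tool also scores candidates against the electron density, which this sketch does not attempt. It simply spins one atom around a bond axis (using Rodrigues' rotation formula) to enumerate candidate positions.

```python
import math

def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate a 3D point about the line from axis_start to axis_end
    using Rodrigues' rotation formula."""
    axis = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                      # unit vector along the bond
    v = [p - s for p, s in zip(point, axis_start)]    # point relative to the axis
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return tuple(r + s for r, s in zip(rotated, axis_start))

def sample_torsion(moving_atoms, axis_start, axis_end, step_deg=60):
    """Return (angle, coordinates) pairs for each sampled torsion angle."""
    return [
        (angle, [rotate_about_axis(a, axis_start, axis_end, angle) for a in moving_atoms])
        for angle in range(0, 360, step_deg)
    ]

# Hypothetical rotatable bond along the x axis with one substituent atom.
bond_start = (0.0, 0.0, 0.0)
bond_end = (1.5, 0.0, 0.0)
substituent = [(2.0, 1.0, 0.0)]

candidates = sample_torsion(substituent, bond_start, bond_end, step_deg=60)
for angle, coords in candidates:
    x, y, z = coords[0]
    print(f"{angle:3d} deg -> ({x:5.2f}, {y:5.2f}, {z:5.2f})")
```

The 180-degree entry in that list is the simple version of a ring or side-chain flip. Whole-molecule translations would need a separate search over shifts, which fits with the report that those were the harder cases.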
Now, you’re not going to see this every time out, of course. But it’s not that rare an event, either, so it’s something to keep an eye out for. More than that, it’s something to add to your mental picture of what X-ray crystallographic data really are: not messages from the Gods delivered fresh from the Platonic realm, but models. Constructs. Best attempts at dealing with reality, which is often more heterogeneous than we think.