

Hosed-Up X-Ray Structures: A Big Problem

X-ray crystallography is great stuff, no doubt about it. But it’s not magic. It takes substantial human input to give a useful structure of a ligand bound to a protein – there are decisions to be made and differences to be split. It’s important to emphasize, for those of us who are not crystallographers, that unless you have resolution below 1 Å (and I’ll bet you don’t), your X-ray structures are not quite “structures”; they’re models. A paper several years ago emphasized these factors for chemists outside the field.
About ten years ago, I wrote about this paper, which suggested that many ligand-bound structures seemed to have strain energy in them that wouldn’t have been predicted. One interpretation is that there’s more to ligand (and binding site) reorganization than people tend to realize, and that ligands don’t always bind in their lowest-energy conformations. And while I still think that’s true, the situation is complicated by another problem that’s become more apparent over the years: many reported X-ray structures for ligand-bound proteins are just messed up.
Here’s an editorial in ACS Medicinal Chemistry Letters that shows how bad the problem may well be. Reviews of the crystallographic databases have suggested that there are plenty of poorly refined structures hiding in there. But I didn’t realize that they were as poorly refined as some of these. Take a look at the phosphate in 1xqd, and note how squashed-out those oxygens are around the first phosphorus. Or try the olefin in 4g93, which has been yanked 90 degrees out of plane. It’s bad that there are such ridiculous structures in the literature, but the larger number of semi-plausible (but still wrong) structures is even worse.
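The check behind a complaint like the 4g93 one is simple geometry: across a C=C double bond, the torsion angle between substituents on the two carbons should sit near 0° (cis) or 180° (trans). Here is a minimal sketch in standard-library Python, assuming plain (x, y, z) tuples in ångströms. The function names and the coordinates are hypothetical, made up for illustration, and are not taken from 4g93 or any other PDB entry. Real validation tools do far more than this.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b1 = scale(b1, 1.0 / math.sqrt(dot(b1, b1)))  # unit vector along the central bond
    # Project both outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, scale(b1, dot(b0, b1)))
    w = sub(b2, scale(b1, dot(b2, b1)))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def olefin_twist(a: Vec, c1: Vec, c2: Vec, b: Vec) -> float:
    """Degrees by which torsion a-C1=C2-b departs from planar (0 or 180)."""
    t = abs(dihedral(a, c1, c2, b))
    return min(t, 180.0 - t)


if __name__ == "__main__":
    # Hypothetical coordinates for a C=C with one substituent on each carbon.
    c1, c2 = (0.0, 0.0, 0.0), (1.34, 0.0, 0.0)
    a_atom = (-0.7, 1.2, 0.0)
    planar_b = (2.04, -1.2, 0.0)   # trans, in the plane of the olefin
    twisted_b = (2.04, 0.0, 1.2)   # rotated 90 degrees about the C=C axis
    print(olefin_twist(a_atom, c1, c2, planar_b))   # approximately 0
    print(olefin_twist(a_atom, c1, c2, twisted_b))  # approximately 90
```

A twist of tens of degrees on an ordinary, unstrained olefin is the sort of thing that should send someone back to the electron density.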
Those structures at the left illustrate what’s going on. The top one is an old PDB structure, 3qad, for an IKK inhibitor. It’s a mess. Note that there’s a tetrahedralish aromatic carbon (not happening), and a piperazine in a boat conformation (only slightly less unlikely). After this was pointed out, the structure was revised to the middle version (3rzf), but that one still has some odd features – those two aromatic groups lie flat in the same plane, and the amine between them and the next aryl is rather odd, too. Might be right, might be wrong – who’s to know?
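Both problems in 3qad can be screened for with a few lines of geometry. For a three-coordinate aromatic carbon, the three bond angles around it should add up to about 360° if the atom is planar. An ideal tetrahedral arrangement gives about 328°. For a six-membered saturated ring such as a piperazine, the six ring torsions separate a chair (all large, alternating in sign) from a boat (the two torsions about opposite bonds near zero). This is a minimal sketch assuming plain coordinate tuples. The 15° tolerance, the function names, and the idealized ring coordinates are illustrative assumptions, not values from the PDB entries discussed here.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def bond_angle(a: Vec, center: Vec, b: Vec) -> float:
    """Angle a-center-b in degrees."""
    u, v = sub(a, center), sub(b, center)
    c = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding noise


def valence_angle_sum(center: Vec, n1: Vec, n2: Vec, n3: Vec) -> float:
    """About 360 for a planar (sp2) center, about 328.4 for ideal tetrahedral."""
    return (bond_angle(n1, center, n2) + bond_angle(n2, center, n3)
            + bond_angle(n3, center, n1))


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    b1 = scale(b1, 1.0 / math.sqrt(dot(b1, b1)))  # unit central bond
    v = sub(b0, scale(b1, dot(b0, b1)))  # outer bonds projected perpendicular
    w = sub(b2, scale(b1, dot(b2, b1)))  # to the central bond
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


def ring_conformation(ring: Sequence[Vec], flat_tol: float = 15.0) -> str:
    """Crude label for a six-membered ring from its six endocyclic torsions.

    Chair: every torsion large, signs alternating around the ring.
    Boat: the two torsions about opposite bonds are near zero.
    flat_tol (degrees) is an arbitrary, illustrative cutoff.
    """
    if len(ring) != 6:
        raise ValueError("expected six ring atoms, in ring order")
    tors = [dihedral(ring[i], ring[(i + 1) % 6], ring[(i + 2) % 6], ring[(i + 3) % 6])
            for i in range(6)]
    flat = [abs(t) < flat_tol for t in tors]
    if all(flat):
        return "planar"
    if not any(flat) and all(tors[i] * tors[(i + 1) % 6] < 0 for i in range(6)):
        return "chair"
    if sum(flat) == 2 and any(flat[i] and flat[i + 3] for i in range(3)):
        return "boat"
    return "other (twist, envelope, or distorted)"


def hexagon(z_values: Sequence[float], radius: float = 1.25) -> List[Vec]:
    """Hypothetical helper: atoms on a regular hexagon in x, y at the given heights."""
    return [(radius * math.cos(math.radians(60 * k)),
             radius * math.sin(math.radians(60 * k)), z)
            for k, z in enumerate(z_values)]


if __name__ == "__main__":
    s3 = math.sqrt(3) / 2
    print(valence_angle_sum((0, 0, 0), (1, 0, 0), (-0.5, s3, 0), (-0.5, -s3, 0)))
    print(valence_angle_sum((0, 0, 0), (1, 1, 1), (-1, -1, 1), (-1, 1, -1)))
    print(ring_conformation(hexagon([0.25, -0.25] * 3)))
    print(ring_conformation(hexagon([0.5, 0.0, 0.0, 0.5, 0.0, 0.0])))
```

The first two calls compare a trigonal-planar center with a tetrahedral one, and the last two label an idealized chair and boat. A flagged ring isn't automatically wrong (boats do occur), but it is a reason to go look at the density.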
The most recent comprehensive look (from 2012) suggests that about 25% of the reported ligand-bound structures are mangled to the point of being misleading. This new editorial goes on to mention some computational tools that could help to keep this from happening, such as this one. If we’re all going to draw conclusions from these things (and that’s what they’re there for, right?), we’d be better off using the best ones we can.

20 comments on “Hosed-Up X-Ray Structures: A Big Problem”

  1. Barry says:

    Too often, the X-ray co-crystal structure of a ligand bound to its target protein is solved after the drug discovery phase is complete, suitable only for publication but useless as a guide to drug discovery. The right time to express/purify protein for structural studies is when you’re expressing/purifying it to set up the high-throughput assay.
    If crystallography is supporting med chem from the first lead, and if you’re paying attention, you’ll notice when “ligands don’t always bind in their lowest-energy conformations”, and that should trigger the synthesis of new drug candidates for which the bound conformation is the lowest-energy one.

  2. pete says:

    OT: Derek, if you want to talk about “hosed-up” (umm…see the real meaning of the term “hoser”), I’ve seen that your web host continues to be so. Hosed-up.
    One day it’s fine, the next day it ain’t.

  3. xtal_vision says:

    As a crystallographer you do spend a large amount of time trying to referee the battle between the electron density map and the small-molecule restraints.
    The key thing to remember is that the result of an X-ray diffraction experiment is the electron density map and NOT the model. The added complication is that the electron density map describes the average electron density for all atoms within the crystal, hence any flexibility results in ‘smeared blobs’.
    The biggest problem I tend to encounter is with saturated ring systems, especially if they are only functionalised at one position.
    These moieties will always be flexible, therefore the electron density map, even at high resolution, may describe the average orientation. The refinement program tries to fit an atomic position to the peaks in the map, and even with modest restraints applied, it will come out with some weird and wonderful results. The most common result is boat conformations or some co-planar sections on one side of the ring.
    So what can you do? The first port of call is generally applying strict restraints to the ligand, but that can often result in a conformation that deviates from your map.
    How do you play off prediction (restraints) vs experimental result (map)?
    This is by no means an excuse for some of the ‘out there’ ligand conformations that are blatantly wrong and/or have had restraints files created with little regard for chemistry – but with an estimated 25% of ligands being wrong, there must be a component that is down to reporting a single conformation for a system that is inherently flexible.
    Within the drug discovery context it is hugely important that a crystallographer relays the information that a certain moiety is poorly fitted or modeled, and whether there are caveats or consequences around this.
    The end point should be ligand ensembles, all with various occupancies and thermal parameters, that accurately describe the electron density, but these have not really taken off in the community as yet.

  4. David Borhani says:

    I’ve seen some crystallographers (even in Pharma) who don’t know, or remember, enough organic chemistry or conformational analysis, so their ligand structures end up screwy (at least at first).
    xtal_vision makes some good points, but I would not go so far out on a limb regarding saturated ring conformations. For flexible (parts of) molecules, I think it is best either to leave (part of) it out of the model, if you really cannot interpret the electron density, or to build one or more reasonable conformations, as xtal_vision suggests, that together explain the density.
    If the refinement program returns a boat conformation, then it is the crystallographer’s job to correct that (very likely) mistake. Unfortunately, many don’t (and seem not even to look at their results).
    Deviation from the density, brought about by trying to satisfy the very strong prior of known reasonable-energy small molecule conformations, is allowed and to be encouraged. After all, how many macromolecule-ligand structures have revealed a previously unknown, high-energy conformation of the ligand?

  5. CialisizeMe says:

    Great post. Check this one out from a high-powered group. It made the *cover* of Nature Chemical Biology. Never saw a correction. Compound IC87114 is missing a methylene group in the structure. Nat Chem Biol. Feb 2010; 6(2): 117–124. (free article)

  6. anon the II says:

    One example in the PDB that always bothered me was 1OSH. I always thought that the eminent organic chemist associated with that work should have pointed out the error of their ways. But after reading the post from #3 xtal_vision, maybe I should give him a little slack.
    I think the lesson for medicinal chemists is that if the structure seems a little off, then maybe it’s disordered a bit. And if it looks perfect, it still might be disordered a bit.

  7. David Borhani says:

    @7, re: 1OSH. Egregious. No slack whatsoever should be given. Too few crystallographers of late treat coordinates for what they are, IMO: unique glimpses into Nature’s inner workings. They should be handled with loving care.
    Imagine what a poor and sorry state structural and conformational analysis would now be in had crystallographers and chemists of the 1910s-1960s treated coordinates (and other, analogous data) in such a cavalier manner.

  8. Toad says:

    Having worked on a variety of projects using structures from the PDB, I’ve found that the error rate tends to be high on the protein side, typically higher than on the ligand side.
    Even though there are known methods, including free algorithms, to check for the correct orientation of the side-chain amide in asparagine and glutamine residues, we continue to see a large number of these blatantly incorrect in the original deposited files, in proteins from a wide variety of target classes and sources.

  9. CialisizeMe says:

    #5 and #6 Thanks Derek. Didn’t see the correction. Glad they updated it.

  10. myma says:

    Many years ago, I worked with a computational chemistry group. We had a geeky Russian whose job it was, essentially, to clean up PDB structures. He would spend weeks on each one fixing, correcting, merging (if there was more than one good one available), and minimizing, to make a protein structure that was good for further use. It does not surprise me at all that x% of structures in the PDB can be crap, given my experience working with this guy.

  11. kinaser says:

    I remember that IKK structure from when it came out. Download the electron density, and you’ll see the whole ATP site’s pretty optimistically fitted. As for the ligand, there are just a few blobs here and there. Certainly nothing that looks like what the structure’s supposed to be.

  12. exchemist says:

    How does this vary between inorganic (solid-state) and organic synthetic chemists? The former tend to do their own structure refinements and to have less fluffy organic lettuce waving around. They also have fewer low-Z atoms (low Z is harder to spot by X-ray).
    My question is: if someone looked at comparable samples, would the problem be the same or different across the two fields?

  13. a. nonymaus says:

    Re: 13
    Inorganic small molecules are basically x-ray diffraction on easy mode unless you have huge amounts of disorder or twinning. You can also much more readily exploit things like anomalous dispersion of all your high-Z elements by acquiring data at multiple wavelengths. Inorganic X-ray structures start getting interesting once you are dealing with things like zeolites or intermetallic alloys (some of the latter crystallize in aperiodic tilings).
    The protein-ligand complex crystals that are the subject of the present discussion are much harder, both to get crystals and to get good X-ray data, let alone to fit the data. Medicinal chemists could throw a bone to their X-ray colleagues by hanging more, and more varied, high-Z substituents off their screening libraries, or at least off the compounds they end up co-crystallizing. Use TMS instead of t-butyl, Se for S or I for Br (ideally, make both and get X-ray data from the two co-crystals; biochemists use a similar trick to get better protein structures by substituting in selenomethionine), and so on. Hell, try hanging a ferrocene off it.

  14. Michael Bower says:

    1GOL! Never fails to amaze

  15. anon the II says:

    @ Michael Bower
    Ouch! That reminds me of when Joe Theismann “injured” his leg on a hit from Lawrence Taylor.

  16. poor science sceptic says:

    Isn’t the result of an X-ray experiment NOT the density, but the diffraction pattern? After all, can’t a density be calculated quite incorrectly?

  17. Mark Murcko says:

    100% agree that there are many published X-ray structures that need refinement. It can make an **enormous** difference in the quality of the modeling that is done with the structure. Even very simple things like the way the amide groups in Asn and Gln side chains are positioned, or which nitrogen on His is protonated, can destroy the results. A lot of the swill that we all see in modeling stems from a lack of care being taken in structure preparation & analysis. Medicinal chemists should always check with their structural & modeling colleagues on how the structures were prepared.
    Back in the day @ Vertex we spent a lot of time having the modelers and the crystallographers looking at the structures together to make sure that everything made chemical sense. We were also careful to show where the ambiguous density was, and often would put ligands into the active site in two different ways if the density was not crystal clear (ho ho). It was indeed a ton of work. The cool thing is that we now have tools (like the ones in the editorial Derek referenced) to instantly “sanity check” a structure, so there is no excuse to get it wrong!

  18. Yong Wang says:

    Some data simply don’t justify placing a ligand there, e.g. 4g93 and 3rzf. Most of the structures containing erroneous ligands come from academia. Lack of chemistry knowledge is one reason; another is lack of tools for dealing with the geometry and conformation of small molecules. From my experience in industry I can say that the kinds of problems described here are very rare. The validation tools we have are very good nowadays. Finally, I completely disagree with the idea that you need to “have resolution down below 1Å” to have something as significant as a “structure”. There can be useful information about the bound organic compounds even at 3 Å. I also disagree that “many” ligands are strained. Most are at or very near their low-energy conformations if you handle the small molecules right.

  19. Catherine says:

    One of the editors at Nat Chem Biol here. What should we be doing differently to make sure bad structures don’t get through the review process? Obviously we use experts in crystallography as referees, and a lot of problems are caught, but clearly (i.e., based on kinaser’s comments) it’s not a perfect system. Are there questions we could prompt referees with to make sure there isn’t nonsense in the structure, or some kind of checklist we could send to remind them to check different aspects, or something else? Are there journals doing this really well that you trust completely? Thanks for any thoughts.
