How Many Binding Pockets Are There?

Just how many different small-molecule binding sites are there? That’s the subject of this new paper in PNAS, from Jeffrey Skolnick and Mu Gao at Georgia Tech, which several people have sent along to me in the last couple of days.
This question has a lot of bearing on questions of protein evolution. The paper’s intro brings up two competing hypotheses of how protein function evolved. One, the “inherent functionality model”, assumes that primitive binding pockets are a necessary consequence of protein folding, and that the effects of small molecules on these (probably quite nonspecific) motifs have been honed by evolutionary pressures since then. (The wellspring of this idea is this paper from 1976, by Jensen, and this paper will give you an overview of the field). The other way it might have worked, the “acquired functionality model”, would be the case if proteins tend, in their “unevolved” states, to be more spherical, in which case binding events must have been much rarer, but also much more significant. In that system, the very existence of binding pockets themselves is what’s under the most evolutionary pressure.
The Skolnick paper references this work from the Hecht group at Princeton, which already provides evidence for the first model. In that paper, a set of near-random 4-helical-bundle proteins was produced in E. coli – the only patterning was a rough polar/nonpolar alternation in amino acid residues. Nonetheless, many members of this unplanned family showed real levels of binding to things like heme, and many even showed above-background levels of several types of enzymatic activity.
In this new work, Skolnick and Gao produce a computational set of artificial proteins (called the ART library in the text), made up of nothing but poly-leucine. These were modeled to the secondary structure of known proteins in the PDB, to produce natural-ish proteins (from a broad structural point of view) that have no functional side chain residues themselves. Nonetheless, they found that the small-molecule-sized pockets of the ART set actually match up quite well with those found in real proteins. But here’s where my technical competence begins to run out, because I’m not sure that I understand what “match up quite well” really means here. (If you can read through this earlier paper of theirs at speed, you’re doing better than I can). The current work says that “Given two input pockets, a template and a target, (our algorithm) evaluates their PS-score, which measures the similarity in their backbone geometries, side-chain orientations, and the chemical similarities between the aligned pocket-lining residues.” And that’s fine, but what I don’t know is how well it does that. I can see poly-Leu giving you pretty standard backbone geometries and side-chain orientations (although isn’t leucine a little more likely than average to form alpha-helices?), but when we start talking chemical similarities between the pocket-lining residues, well, how can that be?
But I’m even willing to go along with the main point of the paper, which is that there are not-so-many types of small-molecule binding pockets, even if I’m not so sure about their estimate of how many there are. For the record, they’re guessing not many more than about 500. And while that seems low to me, it all depends on what we mean by “similar”. I’m a medicinal chemist, someone who’s used to seeing “magic methyl effects” where very small changes in ligand structure can make big differences in binding to a protein. And that makes me think that I could probably take a set of binding pockets that Skolnick’s people would call so similar as to be basically identical, and still find small molecules that would differentiate them. In fact, that’s a big part of my job.
In general, I see the point they’re making, but it’s one that I’ve already internalized. There are a finite number of proteins in the human body. Fifty thousand? A couple of hundred thousand? Probably not a million. Not all of these have small-molecule binding sites, for sure, so there’s a smaller set to deal with right there. Even if those binding sites were completely different from one another, we’d be looking at a set of binding pockets in the thousands/tens of thousands range, most likely. But they’re not completely different, as any medicinal chemist knows: try to make a selective muscarinic agonist, or a really targeted serine hydrolase inhibitor, and you’ll learn that lesson quickly. And anyone who’s run their drug lead through a big selectivity panel has seen the sorts of off-target activities that come up: you hit some of the other members of your target’s family to a greater or lesser degree. You hit the flippin’ sigma receptor, not that anyone knows what that means. You hit the hERG channel, and good luck to you then. Your compound is a substrate for one of the CYP enzymes, or it binds tightly to serum albumin. Who has even seen a compound that binds only to its putative target? And this is only with the counterscreens we have, which are a small subset of the things that are really out there in cells.
And that takes me to my main objection to this paper. As I say, I’m willing to stipulate, gladly, that there are only so many types of binding pockets in this world (although I think that it’s more than 500). But this sort of thing is what I have a problem with:

“. . .we conclude that ligand-binding promiscuity is likely an inherent feature resulting from the geometric and physical–chemical properties of proteins. This promiscuity implies that the notion of one molecule–one protein target that underlies many aspects of drug discovery is likely incorrect, a conclusion consistent with recent studies. Moreover, within a cell, a given endogenous ligand likely interacts at low levels with multiple proteins that may have different global structures.”

“Many aspects of drug discovery” assume that we’re only hitting one target? Come on down and try that line out in a drug company, and be prepared for rude comments. Believe me, we all know that our compounds hit other things, and we all know that we don’t even know the tenth of it. This is a straw man; I don’t know of anyone doing drug discovery that has ever believed anything else. Besides, there are whole fields (CNS) where polypharmacy is assumed, and even encouraged. But even when we’re targeting single proteins, believe me, no one is naive enough to think that we’re hitting those alone.
Other aspects of this paper, though, are fine by me. As the authors point out, this sort of thing has implications for drawing evolutionary family trees of proteins – we should not assume too much when we see similar binding pockets, since these may well have a better chance of being coincidence than we think. And there are also implications for origin-of-life studies: this work (and the other work in the field, cited above) implies that a random collection of proteins could still display a variety of functions. Whether these are good enough to start assembling a primitive living system is another question, but it may be that proteinaceous life has an easier time bootstrapping itself than we might imagine.

17 comments on “How Many Binding Pockets Are There?”

  1. sgcox says:

    This paper is worth mentioning here:

  2. anon says:

    “I don’t know of anyone doing drug discovery that has ever believed anything else.”
    Quite a few academic PIs fancy themselves doing drug discovery these days. Many (not all, mind you, but many) of them are not very knowledgeable of the actual pharmacology, let alone medicinal chemistry. Thus “notion of one molecule–one protein target that underlies many aspects of drug discovery” may be spot on, insofar as it concerns the predominant mentality in academic “drug discovery”.

  3. petros says:

    This review focusing on cancer targets considers the multiplicity of potential drug sites on target proteins.
    Nat Rev Drug Discov. 2013 Jan;12(1):35-50. doi: 10.1038/nrd3913

  4. “This promiscuity implies that the notion of one molecule–one protein target that underlies many aspects of drug discovery is likely incorrect”
    Maybe it simply means that even those compounds which we think are selective for one target are probably not? It certainly seems to be the case for drugs like Gleevec and maybe we will find it to be the case for others if we dig deeper.

  5. PF9 says:

    A couple of points to consider:
    1) A lot will depend on how you define the extent of a binding site. Think about serine proteases: is it S1, how much of S1, do you add in more of the central cavity, etc.?
    2) Just because a ligand shows binding in vitro, is there really a causal, PK/PD driven link to action in vivo? In many cases I doubt it (hERG aside)

  6. anchor says:

    #2-spot on! I moved into academia from big-pharma. I am exasperated at many levels with these PIs and I find most of them to be “one trick ponies.” They all in their infinite wisdom believe that if you fix an “issue” that is their specialty and staple diet for their existence in academia (logP, blood curve, mouse model etc.) then the drug molecule will happen! I am very frustrated at their ignorance. I try to reason with them that it is not that simple, but my suggestions and reasoning fall by the wayside. Call it their stupidity or naivety. More damaging these days, with the ready availability of Scifinder search engines, they are getting even bolder, and as a medicinal chemist with modest success in the industry, I am simply flabbergasted.

  7. Imaging guy says:

    When do you call a hit a hit? What is the cutoff Kd below which it is no longer considered a hit? Since there are different interaction assays, what about cross-platform reproducibility?

  8. littlegreenpills says:

    The estimate of the number of binding pockets seems confusing. Are they only considering catalytic/active sites? What about allosteric sites?
    If there are only about 500 sites then we are probably done finding “new” drugs and should just focus on tweaking the ones already out there to provide the desired effects.

  9. a. nonymaus says:

    #7 is onto something here. If something binds to two proteins that can be a problem unless the delta-Kd lets me dose so that one is 95% bound and the other is 5% bound.
    What I find surprising about receptors is that subtypes exist at all. What selection pressure is there to maintain so many nicotinic receptor subtypes when they all bind nicotine? Is it that they have different nicotine binding constants? If so, why doesn’t the cell just vary the receptor density? Is it that they have different effects on binding and the receptor binding-site differences are an incidental artifact of the structural changes required for the different effects?
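    The 95%/5% figure in #9 can be turned into a required Kd ratio with a little arithmetic. What follows is only a rough sketch, assuming simple 1:1 equilibrium binding, the same free ligand concentration seen by both proteins, and ligand in excess over protein; the function names and the 1 nM example value are made up for illustration and do not come from the paper or the comment.

```python
def fraction_bound(ligand_conc, kd):
    """Fractional occupancy for simple 1:1 equilibrium binding: L / (L + Kd)."""
    return ligand_conc / (ligand_conc + kd)


def required_kd_ratio(high_occupancy, low_occupancy):
    """Kd(off-target) / Kd(target) needed to reach both occupancies
    at one shared free ligand concentration.

    Rearranging f = L / (L + Kd) gives L / Kd = f / (1 - f), so the
    ratio of the two Kd values is the ratio of the two occupancy odds.
    """
    odds_high = high_occupancy / (1 - high_occupancy)
    odds_low = low_occupancy / (1 - low_occupancy)
    return odds_high / odds_low


ratio = required_kd_ratio(0.95, 0.05)

# Hypothetical numbers: a 1 nM target Kd and an off-target Kd 'ratio' times weaker.
target_kd_nm = 1.0
off_target_kd_nm = target_kd_nm * ratio
dose_nm = 19 * target_kd_nm  # L / Kd = 0.95 / 0.05 = 19 gives 95% occupancy

print(f"required Kd ratio: {ratio:.0f}")
print(f"target occupancy: {fraction_bound(dose_nm, target_kd_nm):.2f}")
print(f"off-target occupancy: {fraction_bound(dose_nm, off_target_kd_nm):.2f}")
```

    Under those assumptions the two Kd values have to differ by a factor of about 361 (19 squared), which is a fairly demanding selectivity window for two closely related pockets.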

  10. mausanony says:

    Evolution is a blind watchmaker. If some minor variation to the function conveys fitness to an emerging sub-species, it will be selected for. It matters not how that minor variation was arrived at: receptor gene duplication and slight sequence divergence? Perturbation to transcriptional regulation of that same gene within a different cell type? Perturbation of intracellular signalling cascade due to a mutation who knows where, which results in altered receptor density? Over the aeons, many, if not all, of the possible mechanisms that give rise to phenotypic variation will have a shot at contributing something to the organism, and the complexity (such as receptor subtypes that all bind the same thing) will accumulate.

  11. Johannes says:

    I’m somewhat skeptical of their conclusion as well. For example, SGX523, according to Stephen Burley in a video posted on Coursera, showed binding affinity to only a single protein target, something others thought impossible. Could be a freak, likely not.

  12. Yolo says:

    This article touches on a concept very similar to this and applies it to library design:

  13. Anonymous says:

    “Evolution is a blind watchmaker. If some minor variation to the function conveys fitness to an emerging sub-species, it will be selected for.”
    This is simply not true. While natural selection is dependent upon the difference in the number of offspring among variant phenotypes, the difference is the average difference in number of offspring among variant phenotypes and not the individual difference in number of offspring wit

  14. Dr. Manhattan says:

    “They all in their infinite wisdom believe that if you fix an “issue” that is their specialty and staple diet for their existence in academia (logP, blood curve, mouse model etc.) then the drug molecule will happen!”
    Anchor, I totally agree, based on my own experience! In fact, I suspect the real goal is to get continued funding for their academic “drug discovery” efforts. It is virtually impossible to perform real drug discovery in the absence of a large, multidisciplinary team.

  15. Cellbio says:

    Yes, CW, when you take drugs with known MoA and presumed selectivity and screen them broadly in biology, you see activities not appreciated. When large collections, say a thousand molecules from one med chem campaign, or in another instance, 18 steroids of similar structure, are screened, it is closer to the truth that no two are alike than that there is evidence of a single target associated with the compound’s pharmacology.
    We can only adhere to our idea of pursuing the biology of a single target, as the target-centric biology era has done, if we measure little else than the intended impact. And I believe this has been propagated in pharma and is not unique to academia. I don’t think the best or most experienced in pharma hold these beliefs, but neither do I think the best of pharma often rise to the top. It can do wonders for your career to populate the pipeline with paradigm-driven, metric-measured clinical candidates that fail spectacularly once in development. Helps with the bonus too, and lets the execs trot out wonderfully bloated pipeline charts, sometimes with dead molecules remaining as Ph1 or Ph2 zombies until other positive news allows for slipping in public notice of termination.
    Career building around socially endorsed endeavors that deviate from good science is, in my opinion, rampant in big companies. It is rare to find a company culture where the voice of a skeptical scientist who urges caution carries the same weight as that of a charismatic business leader, even when the salient issue is technical in nature. That leader, especially when not from the scientific ranks, gives us all the organizational problems spoken of often on this blog, and represented well in fables like The Emperor’s New Clothes.

  16. simpl says:

    After the finding reported in Nature on Nppb and receptors, make that 501? In fact, it reminded me of the old Beadle/Tatum idea – one gene complex = 1 protein – that would give you a maximum number of receptors of the order of 10000.
