Drug Assays

Why Does Screening Work At All? (Free Business Proposal Included!)

I’ve been meaning to get around to a very interesting paper from the Shoichet group that came out a month or so ago in Nature Chemical Biology. Today’s the day! It examines the content of screening libraries and compares them to what natural products generally look like, and they turn up some surprising things along the way. The main question they’re trying to answer is: given the huge numbers of possible compounds, and the relatively tiny fraction of those we can screen, why does high-throughput screening even work at all?
The first data set they consider is the Generated Database (GDB), a calculated set of all the reasonable structures with 11 or fewer nonhydrogen atoms, which grew out of this work. Neglecting stereochemistry, that gives you between 26 and 27 million compounds. Once you’re past the assumptions of the enumeration (which certainly seem defensible – no multiheteroatom single-bond chains, no gem-diols, no acid chlorides, etc.), then there is no human bias involved: that’s the list.
The second list is everything from the Dictionary of Natural Products and all the metabolites and natural products from the Kyoto Encyclopedia of Genes and Genomes. That gives you 140,000+ compounds. And the final list is the ZINC database of over 9 million commercially available compounds, which (as they point out) is a pretty good proxy for a lot of screening collections as well.
One rather disturbing statistic comes out early when you start looking at overlaps between these data sets. For example, how many of the possible GDB structures are commercially available? The answer: 25,810 of them – in other words, you can buy fewer than 0.1% of the possible compounds with 11 heavy atoms or fewer, making the “purchasable GDB” a paltry list indeed.
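The purchasable fraction is easy to check by hand. Here is a minimal sketch of the arithmetic; the function name is made up for illustration, and since the GDB size is only given above as "between 26 and 27 million", both ends of that range are tried rather than an exact count.

```python
def purchasable_percent(purchasable: int, library_size: int) -> float:
    """Percentage of an enumerated library that can actually be bought."""
    return 100.0 * purchasable / library_size


PURCHASABLE_GDB = 25_810  # figure quoted in the post

# The exact GDB size is not stated, only a range, so both bounds
# are used here (an assumption for illustration, not exact data).
for gdb_size in (26_000_000, 27_000_000):
    pct = purchasable_percent(PURCHASABLE_GDB, gdb_size)
    print(f"GDB size {gdb_size:,}: {pct:.3f}% purchasable")
```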
Now, what happens when you compare that list of natural products to these other data sets? Well, for one thing, the purchasable part of the GDB turns out to be much more similar to the natural product list than the full set. Everything in the GDB has at least 20% Tanimoto similarity to at least one compound in the natural products set, not that 20% means much of anything in that scoring system. But only 1% of the GDB has a 40% Tanimoto similarity, and less than 0.005% has an 80% Tanimoto similarity. That’s a pretty steep dropoff!
But the “purchasable GDB” holds up much better. 10% of that list has 100% Tanimoto similarity (that is, 10% of the purchasable compounds are natural products themselves). The authors also compare individual commercial screening collections. If you’re interested, ChemBridge and Asinex are the least natural-product-rich (about 5% of their collections), whereas IBS and Otava are the most (about 10%).
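For anyone who hasn't met the metric: the Tanimoto coefficient between two fingerprints is the number of bits they share divided by the number of bits set in either one. The sketch below shows how "X% of a library has at least Y% similarity to some natural product" could be computed. It assumes fingerprints are plain Python sets of bit indices. The paper's actual fingerprint type isn't given here, and the toy bit sets and function names are invented for illustration.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprint bit sets."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


def max_similarity(query: set, reference: list) -> float:
    """Best Tanimoto score of one compound against a reference collection."""
    return max(tanimoto(query, ref) for ref in reference)


def fraction_at_or_above(library: list, reference: list, threshold: float) -> float:
    """Fraction of the library whose nearest reference compound scores >= threshold."""
    hits = sum(1 for fp in library if max_similarity(fp, reference) >= threshold)
    return hits / len(library)


# Toy data only: these bit sets are made up, not real fingerprints.
natural_products = [{1, 2, 3, 4}, {5, 6, 7}]
library = [{1, 2, 3, 4}, {1, 2, 8, 9}, {10, 11}]

for threshold in (0.2, 0.4, 0.8, 1.0):
    frac = fraction_at_or_above(library, natural_products, threshold)
    print(f"similarity >= {threshold:.0%}: {frac:.1%} of library")
```

A score of 100% means the fingerprints are identical, which is how the "these compounds are natural products themselves" reading above comes about, at least to the resolution of the fingerprint.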
So one answer to “why does HTS ever work for anything” is that compound collections seem to be biased toward natural-product type structures, which we can reasonably assume have generally evolved to have some sort of biological activity. It would be most interesting to see the results of such an analysis run from inside several drug companies against their own compound collections. My guess is that the natural product similarities would be even higher than the “purchasable GDB” set’s, because drug company collections have been deliberately stocked with structural series that have shown activity in one project or another.
That’s certainly looking at things from a different perspective, because you can also hear a lot of talk about how our compound files are too ugly – too flat, too hydrophobic, not natural-product-like enough. These viewpoints aren’t contradictory, though – if Shoichet is right, then improving those similarities would indeed lead to higher hit rates. Compared to everything else, we’re already at the top of the similarity list, but in absolute terms there’s still a lot of room for improvement.
So how would one go about changing this, assuming that one buys into this set of assumptions? The authors have searched through the various databases for ring structures, taking those as a good proxy for structural scaffolds. As it turns out, 83% of the ring scaffolds among the natural products are unrepresented among the commercially available molecules – a result that I assume Asinex, ChemBridge, Life Chemicals, Otava, Bionet and their ilk are noting with great interest. In fact, the authors go even further in pointing out opportunities, with a table of rings from this group that closely resemble known drug-like ring systems.
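That "unrepresented" figure is, at heart, a set difference over ring scaffolds. Here is a rough sketch, assuming each scaffold has already been reduced to some canonical string. How the authors actually extracted and compared rings isn't described here, and the scaffold labels below are placeholders, not real data.

```python
def unrepresented_scaffolds(np_scaffolds, commercial_scaffolds):
    """Return (fraction, missing): the natural-product scaffolds absent
    from the commercial set, and the fraction of unique NP scaffolds they are."""
    np_set = set(np_scaffolds)  # duplicates collapse to unique scaffolds
    if not np_set:
        raise ValueError("no natural-product scaffolds supplied")
    missing = np_set - set(commercial_scaffolds)
    return len(missing) / len(np_set), missing


# Placeholder labels standing in for canonical ring-system strings.
np_rings = ["ring_A", "ring_B", "ring_C", "ring_D", "ring_A"]
commercial_rings = ["ring_A", "ring_X"]

fraction, missing = unrepresented_scaffolds(np_rings, commercial_rings)
print(f"{fraction:.0%} of natural-product scaffolds are not purchasable")
print("candidates for a new catalog:", sorted(missing))
```

The `missing` set is the interesting output for a vendor: it's the list of ring systems nobody else is selling.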
But wait a minute. . .when you look at those scaffolds, a number of them turn out to be rather, well, homely. I’d be worried about elimination to form a Michael acceptor in compound 19, for example. I’m not crazy about the N,S acetal in 21 or the overall stability of the acetals in 15, 17 and 31. The propiolactone in 23 is surely reactive, as is the quinone in 25, and I’d be very surprised if that’s not what they owe their biological activities to. And so on.
[Figure: Shoichet scaffolds]
All that said, there are still some structures in there that I’d be willing to check out, and there must be more of them in that 83%. No doubt a number of the rings that do sneak into the commercial list are not very well elaborated, either. I think that there is a real commercial opportunity here. A company could do quite well for itself by promoting its compound collection as being more natural-product similar than the competition, with tractable molecules, and a huge number of them unrepresented in any other catalog.
Now all you’d have to do is make these things. . .which would require hiring synthetic organic chemists, and plenty of them. These things aren’t easy to make, or to work with. And as it so happens, there are quite a few good ones available these days. Anyone want to take this business model to heart?

13 comments on “Why Does Screening Work At All? (Free Business Proposal Included!)”

  1. molecular architect says:

    “which we can reasonably assume have generally evolved to have some sort of biological activity”
    The real value of natural products as biologically active leads is due to a more fundamental property. There are a limited number of protein structural motifs (secondary structures). Natural products have evolved to BIND to these motifs, either in the proteins involved in their biosynthesis or in their biological targets. A NP which binds to one of these motifs represents a logical starting point for another protein which shares the motif, even if not part of the same enzyme class. For an excellent analysis of this property of natural products see the recent series of papers about “Biology Oriented Synthesis” by Herbert Waldmann. doi 10.1007/s00018-007-7492-1 and references therein.
    Based on your comments, Shoichet’s analysis looks very interesting. Will have to set time aside to read it in detail this afternoon.

  2. Retread says:

    #1 “There are a limited number of protein structural motifs (secondary structures). Natural products have evolved to BIND to these motifs, either in the proteins involved in their biosynthesis or in their biological targets.” True enough as far as it goes, but this is pretty protein-centric. Consider Thiamine, B12 etc. etc. Either they’ve evolved to bind to RNA (which they do in bacterial riboswitches) or RNA has evolved to bind them — more likely, since both are enzyme cofactors. It is possible that some natural products have evolved to bind RNA (of all sorts not just mRNAs), DNA or even the glycoproteins and mucopolysaccharides of the extracellular matrix.
    P.S. hope to have my own blog — probably called Chemiotics-II up soon

  3. EngelGW says:

    Well… As you mention “These things aren’t easy to make, or to work with…” That’s already two major drawbacks for a medicinal chemist in the pharmaceutical industry. If, in addition, the IP isn’t owned by the pharmaceutical company that employs him, it will definitely be difficult to find customers for such a business model.

  4. Rubiscoman says:

    I like the double meaning:
    “Now all you’d have to do is make these things. . .which would require hiring synthetic organic chemists, and plenty of them. These things aren’t easy to make, or to work with. And as it so happens, there are quite a few good ones available these days.”
    Are you saying synthetic organic chemists are hard to work with, and that quite a few good synthetic chemists are available these days? 😉

  5. Sili says:

    I don’t know that these ‘issues’ would have presented themselves to me so readily when I was fresh out of organics class, but now I even have to think about what you mean by your evaluations. Disturbing how much can seep out of a brain in a few years.

  6. molecular architect says:

    #2 Point taken. My comment is protein-centric but then all (to the best of my knowledge) NPs are the product of protein-catalyzed biosynthesis and thus are designed to bind the biosynthetic enzymes. They then are predisposed to bind other enzymes composed of similar 3D motifs. Likewise, enzymes bind to other macromolecules (DNA, RNA, polysaccharides, etc.). Thus, you could predict that NPs will likely bind to complementary motifs in these macromolecules too.
    While the ability of modern medicinal and synthetic chemists to design and make molecules is impressive, Mother Nature is still an outstanding, if not the best, source of inspiration.

  7. NP_chemist says:

    If in addition to Waldmann’s BIOS analyses one looks also at Quinn’s biosynthetic schemas, that effectively state that the mirror image of the last biosynthetic enzyme in the cascade is a proxy for the binding domain of the biosynthesis product (read target enzyme), then the circle is closed for NP structures and hence one potential reason for their activities.

  8. bootsy says:

    “Natural product likeness” is something that seems to keep popping up as a major topic every few years. I admit, I tend to get a bit turned off when someone says that just because it has lots of sp3 carbons, stereodefined alcohols, and complicated rings, that it is “natural product like”. Natural products, being products of incomprehensibly long periods of optimization, are much more refined than that. A few minor changes and all of the special properties that let them be large and still bioavailable are gone. Move a single methyl group on cyclosporin and now it’s NIM811 and doesn’t act on the immune system anymore. Change one more methyl group and it’s PSC833 and it doesn’t touch anything but PGP pumps.
    Also, it seems like a bad idea to make such specific molecules and ask them to be hits in an HTS. As this paper shows, any screening deck is a paltry amount of diversity, however you measure it. If you want hits, you need some molecules that are more general inhibitors but make good starting points for building in potency and specificity. In this regard, the rising tide of fragment based screening seems like a much better way to hedge one’s bets.
    Finally, it also seems that when a protein encounters a molecule, the core atom connectivity is not something that matters much. Rather it is the shape of the surface and the relative distribution of charges and such that the protein (or RNA, or DNA) sees. I’m not sure how much the line drawings we use to represent what a molecule is matter. That’s why scaffold hopping can work at all.
    Still, I like reading papers like this for the thoughts and discussions they bring out. The overall idea sounds a bit like Infinity Redux though.

  9. Morten G says:

    There’s a company that does natural product chemistry but I don’t think they employ that many synthetic chemists. I guess they use chemists for product purification and QC.
    The idea is that they mix up gene cassettes from various organisms that produce something along the lines of what they want and put them in yeast. Then they select for the yeast that produce the products that inhibit their target best.
    I think it’s a pretty small company but at least they are hiring.
    Whether natural-like or non-natural like compounds are best… Well, intuitively I’d say that the natural-like are more likely to bind proteins but on the other hand I’ve never seen any data to support that hypothesis.

  10. retread says:

    #1 && #6 — Interesting way to look at natural products and what their analogues might bind to. One example of this sort of mechanism would be anti-idiotypic antibodies (if you regard proteins as natural products).
    Forcing the idea to where it probably doesn’t belong — one might expect lectins (proteins which bind the sugar components of glycoproteins) to resemble the various glycosyl transferases, sulfotransferases etc. etc. which build and modify the sugar chains — I don’t know if they do.
    Similarly, do the active sites of enzymes making the huge variety of neurotoxins which bind to the transmembrane segments of ion channels resemble these segments — particularly if the enzymes are cytosolic? Again, I don’t know but I doubt that they do.
    Nonetheless, an interesting idea, and like all such, it makes you think and try to come up with ways to test it.

  11. kerri says:

    I just wanted to thank you for this post! I am a 4th year grad student and literally the day you posted this, my PI asked if I would take this compound our group works with and use it as a template for screening in silico and then take the results (well, as many of them as we can get our hands on) and do a cell study. I had nowhere to start learning about how to do such a task…. and you just gave me the best starting point ever!

  12. Jane Yao says:

    Screen our Newly Isolated compound library to generate new drug leads.
    Please take a look at our unique sample library containing low hanging fruits, and consider screening it in your next drug lead discovery.
    We provide over 12,000 non-commercially available compounds and fractions obtained by column separation of worldwide chemically untapped natural products.
    Health Resource Pharmaceuticals LLC
