
Drug Assays

Thoughts on Compound Collections

I’ve recently had similar questions from two different people (on two different coasts) about screening collections and compound libraries, so it seems like it could be a topic of interest. So far I have yet to come across a drug discovery organization that really thinks that its compound libraries are what they should be – there are too many of those similar compounds from those similar projects that we used to be into back when, there are too many of those compounds that we purchased from vendors X and Y and Z, there are too many insoluble ones or decomposing ones or just plain boring ones.

It’s natural to think that the way to better screening results is to have a better compound collection – I mean, what could be more obvious? And it’s also obvious that every compound collection could always improve – it could be larger, or less greasy, or less streaked with legacy compounds that have too much baggage, etc. I don’t think it ever does harm to address these issues. The question is, how much good does it do?

I ask that question just because of the numbers involved. You have the numbers in your own collection, and you have the numbers in chemical space in general. The first is probably pretty large, and the second is larger than any human mind can deal with. Any one company’s compound collection – any one planet’s compound collection – is a tiny mote in the sky of chemical space, even broadly druglike chemical space. So you’re not going to make much headway on the absolute scale, no matter how many diversity compounds you can get into vials.
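To put rough numbers on that, here is a minimal sketch. It assumes the often-quoted order-of-magnitude figure of 10^60 for druglike chemical space, which is brought in purely for illustration (estimates vary enormously), and the collection sizes are hypothetical as well.

```python
# Rough arithmetic only. DRUGLIKE_SPACE is an assumed, often-quoted
# order-of-magnitude estimate; published estimates vary widely.
DRUGLIKE_SPACE = 1e60


def coverage_fraction(n_compounds: float, space_size: float = DRUGLIKE_SPACE) -> float:
    """Fraction of the assumed chemical space covered by n_compounds."""
    return n_compounds / space_size


if __name__ == "__main__":
    # Hypothetical collection sizes, from a modest deck to a very large one.
    for n in (100_000, 1_000_000, 10_000_000):
        print(f"{n:>12,d} compounds -> {coverage_fraction(n):.1e} of the space")
```

A tenfold increase in the collection moves the exponent by exactly one, which is why the absolute scale is not the useful one to think on.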

Switching to the relative scale, then, you’re still going to have to make a pretty big effort to change your own collection’s character. This is why organizations have, over the years, tended to divide their collections into subfractions. Here are X thousand compounds to represent the deck as a whole – that’s something to use to try an assay out to troubleshoot it and get an idea of what the hit rate will be if you go on. Here’s the fraction of the deck that meets these particular biophysical criteria (and over here is the fraction of the deck that doesn’t). Here’s the fragment collection, here’s the more natural-product-like end of things, here’s the deliberately more three-dimensional set, and so on. In some (many?) companies, a classic screen-the-whole-deck situation has gotten more rare over the years, although there are always targets that need that and more.
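The "representative subset" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the subset is drawn at random (real representative sets are usually picked with more care, for example by clustering), the function names are made up for this example, and the deck size and pilot hit count are hypothetical. The Wilson score interval is used to show how loosely a small pilot pins down the hit rate of the full deck.

```python
import math
import random


def pick_pilot_set(deck_ids, size, seed=0):
    """Draw a random, reproducible subset of compound IDs from the deck."""
    rng = random.Random(seed)
    return rng.sample(list(deck_ids), size)


def wilson_interval(hits, tested, z=1.96):
    """Approximate 95% Wilson score interval for a hit rate."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    p = hits / tested
    denom = 1 + z**2 / tested
    centre = (p + z**2 / (2 * tested)) / denom
    half = z * math.sqrt(p * (1 - p) / tested + z**2 / (4 * tested**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    deck_size = 500_000   # hypothetical full deck
    pilot_size = 5_000    # hypothetical representative subset
    pilot = pick_pilot_set(range(deck_size), pilot_size)

    pilot_hits = 15       # hypothetical outcome of the pilot screen
    low, high = wilson_interval(pilot_hits, len(pilot))
    print(f"Pilot hit rate: {pilot_hits / len(pilot):.2%}")
    print(f"Plausible full-deck hits: {low * deck_size:.0f} to {high * deck_size:.0f}")
```

Even a few thousand compounds give only a wide range for the expected number of hits, but that is usually enough to decide whether the assay is ready for the rest of the deck.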

That brings up another consideration: sometimes you want more compounds because you’re not really getting any actionable hits in a screen (particularly against those tough targets). And sometimes you want more compounds because while you’re getting real hits, they’re not hits that you like. Maybe they’re from those legacy projects and the IP/selectivity considerations are too much of a hairball, or they’re coming out too nonpolar or with too few vectors for further optimization, what have you. You’d rather start somewhere else, and the deck doesn’t give you as many somewhere-elses as you’d want.

My guess is that the second category has a better chance of being fixed by adding swaths of new compounds – at least you know that the target is capable of binding small molecules. Now, it can always be the problem that the target just wants ugly chemical matter (CETP is a classic example), in which case there may not be much to do about it, but there will surely be other projects where something more workable presents itself. But the first category – just not getting hits – is a tougher situation. You can get stuck in an unprovable-hypothesis loop: if you didn’t get hits, you need more compounds. So you add more, and you didn’t get hits – well, you didn’t add enough, or the right ones. Who’s to say that the winner isn’t just over the horizon?

That’s a question that combichem was supposed to help answer back about twenty years ago or so, and it’s the question that its sequel, DNA-encoded library screening, is supposed to help answer now. From the beginning, the tricky part of the latter compound sets has been the number of reactions compatible with the attached DNA barcodes. Combichem had that problem, too, in its early days: what price another ten thousand aryl amides? But (partly because of that grim experience) the DEL folks have been, right from the start, trying to expand their synthetic universe, and these efforts continue. My read so far on DEL screening is that it’s a very useful technique, but (as is always the case) not a magic bullet. People outside of drug discovery always seem to be amazed that we can screen a million compounds and not find something that binds to Target X, and they’d be even more amazed to know that you can run a whole series of DELs, containing (well, most likely) as many compounds as there are people on earth, and still not get a hit.
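For a sense of how DEL numbers get that large, here is a back-of-the-envelope sketch. The building-block counts and the rough world-population figure are assumptions chosen for illustration, and the result is only the nominal library size: not every encoded reaction will have gone cleanly, so the number of compounds actually present is lower.

```python
from math import prod

WORLD_POPULATION = 8_000_000_000  # rough figure, assumed for comparison


def del_library_size(building_blocks_per_cycle):
    """Nominal encoded-library size: product of building-block counts per cycle."""
    return prod(building_blocks_per_cycle)


if __name__ == "__main__":
    # Hypothetical libraries: (name, building blocks used in each cycle).
    libraries = [
        ("2-cycle", (1_000, 1_000)),
        ("3-cycle", (2_000, 2_000, 2_000)),
    ]
    for name, blocks in libraries:
        size = del_library_size(blocks)
        ratio = size / WORLD_POPULATION
        print(f"{name}: {size:,d} nominal compounds ({ratio:.4f}x world population)")
```

Each added cycle multiplies the count, which is how a handful of libraries can nominally rival the world's population and also why the molecules get larger with every cycle.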

These are also questions that DOS, diversity-oriented synthesis, was designed to help answer. I’ve always had reservations about that concept, but there have been a number of interesting papers that have continued to come out of screening such collections. You can also see that synthetic work continues on reactions to make more diverse compound sets feasible. Those papers, and more like them, are uncovered through a PubMed search on “Schreiber SL”, although there are others. One problem, though, is that a search for “diversity oriented synthesis” itself turns up both these papers and a mound of other stuff besides. There are an awful lot of papers that appropriate the term to mean “We made a bunch of stuff because we could and here it is”. The uncharitable view is that that’s what the early DOS papers from the Broad added up to as well, but believe me, this other stuff is really the pits. When you see someone banging out a bunch of (say) chalcones and slapping a “diversity-oriented synthesis” label on the resulting work, well, it really makes you roll your eyes. Schreiber must love it, too.

But has the DOS concept caught on outside the Broad Institute? It’s pretty labor-intensive, from a synthetic chemistry standpoint, although it’s designed to minimize that as much as possible once it’s time to actually make the libraries. But that’s a real “measure twice, cut once” situation; the last thing you want to do is rip into a big library synthesis that generates a bunch of half-made and truncated stuff instead of your desired Land of Diversity. I would like to have a better feel for how this is all working out. It may be that only organizations with significant manpower and funding can make a go of it, and significant bravery/nerve may be a requirement as well.

Now DOS is, in the grand scheme of things, a series of shots into that gigantic chemical space mentioned above, and you could spend a lifetime at it and never make a dent in the number of untested areas – but (as above) is that the relevant question? The current version of it appears to be providing some interesting hits against interesting targets, and what more can you ask? It would be very worthwhile to study the hit rates (and hit quality) of DOS screening as practiced at the Broad versus what comes out of more conventional large compound collections, but if a large-scale comparison of this sort is publicly available, I’ve missed it.

So, thoughts? How much effort is appropriate to beef up the screening collection, and what kind of return can one expect? Are you better off letting the vendors clean out their closets and sell the stuff to you, or trying to make compounds yourself/pay someone to make them? What’s the comparative return on those? And the return versus an investment in DNA-encoded library technology (versus paying someone who does it already)? And does anyone do DOS-like stuff for themselves? Comments welcomed. . .

15 comments on “Thoughts on Compound Collections”

  1. petros says:

    I remember when the Wonder Drug Factory was upping the diversity (and size) of its in-house screening collection by inclusion of compounds from the Agrochem business. We were asked to suggest compounds for inclusion, some of which were reported as having purities of 90% (or less).

    This was the early days when quantity of compounds was the main concern

    1. Massive Analysis, hero of science says:

      Why don’t you start taking it apart and analyzing it a little more there bud? You want to talk about it?

  2. Cato says:

    As a practicing DEL chemist for the last several years, I can say that when the technology works, it really works. And it’s not as difficult as people think to set up. The problem is that people get caught up with thinking 1) more compound numbers is always better 2) forgetting that someone is going to have to do med chem on these hits at some point. The number of times I have seen massive three (or god forbid 4) cycle libraries that return bloated hits that must be triaged… when really most of our quality hits have been 2-cycle pieces. Yes the published chemistry is limited for now (stay tuned!) but in general the selection is pretty forgiving as long as the DNA was not destroyed. The real challenge imho is the purity of protein–if you want quality hits you need pristine protein, not “pure by Western”. For many targets it seems this is a challenge, but maybe this is true for any screening platform.

  3. 10 Fingers says:

    My view on these kinds of “library-enhancing” exercises is that they are all essentially building lampposts where we think we have a chance of finding some useful keys.

    The key metric in that statement is “useful” – which implies a great deal of context. The challenges of some new kinase versus CETP or renin or RAS or an intrinsically disordered protein… well, many details about “desirable” compounds will matter. Whether one is purposing for a phenotypic assay or binding assay, whether you want a tool compound or something that can be optimized into a drug, there are a lot of things that can push one to very different kinds of molecules that are useful as starting points.

    I have done a lot of library building over the years. My return on DOS-like compounds in an HTS/HTX focused realm has been poor, for whatever reason (though not a big sample size here). By contrast, my return on compound sets from a focused large-fragment collection and a “deliberately more three-dimensional set” has been pretty good. The latter is a good set for phenotypic screening as well.

    I’d expect DEL to be pretty useful over time, but (per Cato’s comment above) we are still learning how to use it efficiently. For pure efficiency of time in getting to certain kinds of POC molecules and assay reagents – particularly for challenging PPIs – I still think that boring old phage display is pretty powerful, but that seems to have fallen out of favor.

  4. Peter Kenny says:

    I think coverage of chemical space is the single most important consideration when selecting compounds for screening and design of generic libraries is typically more challenging than design of targeted libraries. Even when the library is generic, it is typically desirable to sample some regions of chemical space more thoroughly than others and one way to do this is to use progressively less restrictive cutoffs as more compounds are added to the library. It is generally believed that compounds of low molecular complexity enable chemical space to be sampled most efficiently but this becomes academic if binding is too weak to be detected. Molecular complexity is a great concept but it is difficult to define in practical terms (I have restricted extent of substitution when selecting fragments for screening). An article on design of compound libraries for fragment screening is linked as the URL for this comment. I have used the general approach described in this article in the design of a phenotypic screening library and for processing virtual screening output.

  5. Someone says:

    I was thinking about using DEL screening against IDPs. Came to the conclusion that using at least 3 cycle library would be beneficial in this particular case.

  6. Anon says:

    More and more I read posts and threads on this blog and wonder how folks substitute a quick-fix/easy answer for actual thinking to solve all their drug discovery problems (HTS, PAINS, rule-of-five). Everything scientists do is just a tool to test explanations that are hard to vary (Deutsch). Misunderstand the limitations of the tool (not to mention having a bad explanation) and, well, look where we are. CAR-T didn’t come out of HTS….

  7. Barry says:

    Its appetite for ugly chemical matter is not CETP’s worst aspect as a drug target.

  8. Truther says:

    Most compounds are defined by NMR traces and Mass Spec weight. Two things that, if you really think about it, don’t have any relationship outside of cryptic theory. These are fed into pharma pipelines and in short work are ingested by human beings under coercion by doctors employed by pharma (see the CNN article on Nuedexta) to make millions.

    1. Mous says:

      LOL (at you, not with you)

  9. Anonymous Researcher snaw says:

    I don’t think ANYBODY is happy with their HTS Deck no matter how big it is.

    In my experience a screening campaign can be frustrating in two main ways: you don’t get any promising hits (in which case you’re stuck) or you get too many hits (in which case you have fun meetings where everybody waves their arms around trying to figure out how we can triage the hits because we cannot work on all of them). In the latter case, at least 75% of the time someone will make a PowerPoint slide depicting their proposed filtering strategy as a series of funnels where a large number of molecules go into the first funnel and at the end a manageable number get significant resources invested. The higher up the food chain the creator of a funnel slide is, the more likely it will end up with one molecule going to NDA (New Drug Application). Real scientists don’t talk about NDA when the project is in Early Discovery stages.

    1. Diver Dude says:

      “Real scientists don’t talk about NDA when the project is in Early Discovery stages.”

      Maybe that’s part of your problem right there, cos that’s the whole point of the exercise.

      1. KazooChemist says:

        Yep, “begin with the end in mind”, otherwise it is too easy to lose your way.

        1. Hap says:

          But when what you can do and what you want are disconnected from what you can actually make happen, planning for what you want doesn’t help. You can generate a drug candidate, and make sure that it does what you want in vivo and vitro, but NDAs are way beyond your will to impose. Man proposes, nature disposes.

          From reading here, if you plan for an NDA, management will make one, whether or not one really exists (whether or not there’s a compound that has a chance of being an actual drug). That way lies madness or bankruptcy (except for the managers that shepherded the NDA to slaughter, for whom riches lie that way).

  10. exGlaxoid says:

    A previous employer had almost 4 million compounds across several sites. In most simple cases a screen would generate many hits, often many already known or redundant. But some screens found no hits, even with 2 million tries. The real question might be, can anything bind well to that target, does it have an active site, binding domain, or enough of a pore that something could go into?

    Chemical space is a great-sounding concept, but there are compounds that are clearly similar which have entirely different activities in vivo, so not sure if it really matters. Given that mother nature has simply employed great diversity, random screening, and genetic feedback loops to screen for biologics, it seems that all of those might help humans do the same experiments.

    If I was to create a general HTS collection today, I would try to avoid too many similar compounds, try to stick with compounds between 300 and 700 mw, aim for a variety of ring types, and try to cover general areas like amines, acids, aromatics, heterocycles, peptidics, and carbohydrates, but other than screening out obviously reactive compounds or false positives, I would aim for as many different compounds as possible, if you don’t know what the targets are ahead of time.

    Given our poor success at modelling compounds for medicinal activity, I think screening out compounds based on computational activity or rules of 5 or bad functional groups is just limiting yourself unnecessarily. There are drugs on the market containing nitro groups, alkenes, epoxides, and bromines, but I hear people say that they should all be eliminated from HTS collections, which seems daft to me.
