I’ve recently had similar questions from two different people (on two different coasts) about screening collections and compound libraries, so it seems like it could be a topic of interest. So far I have yet to come across a drug discovery organization that really thinks that its compound libraries are what they should be – there are too many of those similar compounds from those similar projects that we used to be into back when, there are too many of those compounds that we purchased from vendors X and Y and Z, there are too many insoluble ones or decomposing ones or just plain boring ones.
It’s natural to think that the way to better screening results is to have a better compound collection – I mean, what could be more obvious? And it’s also obvious that every compound collection could always improve – it could be larger, or less greasy, or less streaked with legacy compounds that have too much baggage, etc. I don’t think it ever does harm to address these issues. The question is, how much good does it do?
I ask that question just because of the numbers involved. You have the numbers in your own collection, and you have the numbers in chemical space in general. The first is probably pretty large, and the second is larger than any human mind can deal with. Any one company’s compound collection – any one planet’s compound collection – is a tiny mote in the sky of chemical space, even broadly druglike chemical space. So you’re not going to make much headway on the absolute scale, no matter how many diversity compounds you can get into vials.
Switching to the relative scale, then, you’re still going to have to make a pretty big effort to change your own collection’s character. This is why organizations have, over the years, tended to divide their collections into subfractions. Here are X thousand compounds to represent the deck as a whole – that’s something to use to try an assay out, to troubleshoot it and get an idea of what the hit rate will be if you go on. Here’s the fraction of the deck that meets these particular biophysical criteria (and over here is the fraction that doesn’t). Here’s the fragment collection, here’s the more natural-product-like end of things, here’s the deliberately more three-dimensional set, and so on. In some (many?) companies, the classic screen-the-whole-deck situation has become rarer over the years, although there are always targets that need that and more.
That brings up another consideration: sometimes you want more compounds because you’re not really getting any actionable hits in a screen (particularly against those tough targets). And sometimes you want more compounds because while you’re getting real hits, they’re not hits that you like. Maybe they’re from those legacy projects and the IP/selectivity considerations are too much of a hairball, or they’re coming out too nonpolar or with too few vectors for further optimization, what have you. You’d rather start somewhere else, and the deck doesn’t give you as many somewhere-elses as you’d want.
My guess is that the second category has a better chance of being fixed by adding swaths of new compounds – at least you know that the target is capable of binding small molecules. Now, it can always be the case that the target just wants ugly chemical matter (CETP is a classic example), in which case there may not be much to do about it, but there will surely be other projects where something more workable presents itself. But the first category – just not getting hits – is a tougher situation. You can get stuck in an unprovable-hypothesis loop: if you didn’t get hits, you need more compounds. So you add more, and if you still don’t get hits – well, you didn’t add enough, or the right ones. Who’s to say that the winner isn’t just over the horizon?
That’s a question that combichem was supposed to help answer back about twenty years ago or so, and it’s the question that its sequel, DNA-encoded library screening, is supposed to help answer now. From the beginning, the tricky part with the latter compound sets has been the limited number of reactions compatible with the attached DNA barcodes. Combichem had that problem, too, in its early days: what price another ten thousand aryl amides? But (partly because of that grim experience) the DEL folks have been, right from the start, trying to expand their synthetic universe, and these efforts continue. My read so far on DEL screening is that it’s a very useful technique, but (as is always the case) not a magic bullet. People outside of drug discovery always seem to be amazed that we can screen a million compounds and not find something that binds to Target X, and they’d be even more amazed to know that you can run a whole series of DELs, containing (well, most likely) as many compounds as there are people on earth, and still not get a hit, either.
These are also questions that DOS, diversity-oriented synthesis, was designed to help answer. I’ve always had reservations about that concept, but there have been a number of interesting papers that have continued to come out of screening such collections. You can also see that synthetic work continues on reactions to make more diverse compound sets feasible. Those papers, and more like them, can be uncovered through a PubMed search on “Schreiber SL”, although there are others. One problem, though, is that a search for “diversity oriented synthesis” itself turns up both these papers and a mound of other stuff besides. There are an awful lot of papers that appropriate the term to mean “We made a bunch of stuff because we could and here it is”. The uncharitable view is that that’s what the early DOS papers from the Broad added up to as well, but believe me, this other stuff is really the pits. When you see someone banging out a bunch of (say) chalcones and slapping a “diversity-oriented synthesis” label on the resulting work, well, it really makes you roll your eyes. Schreiber must love it, too.
But has the DOS concept caught on outside the Broad Institute? It’s pretty labor-intensive, from a synthetic chemistry standpoint, although it’s designed to minimize that as much as possible once it’s time to actually make the libraries. But that’s a real “measure twice, cut once” situation; the last thing you want to do is rip into a big library synthesis that generates a bunch of half-made and truncated stuff instead of your desired Land of Diversity. I would like to have a better feel for how this is all working out. It may be that only organizations with significant manpower and funding can make a go of it, and significant bravery/nerve may be a requirement as well.
Now DOS is, in the grand scheme of things, a series of shots into that gigantic chemical space mentioned above, and you could spend a lifetime at it and never make a dent in the number of untested areas – but (as above) is that the relevant question? The current version of it appears to be providing some interesting hits against interesting targets, and what more can you ask? It would be very worthwhile to study the hit rates (and hit quality) of DOS screening as practiced at the Broad versus what comes out of more conventional large compound collections, but if a large-scale comparison of this sort is publicly available, I’ve missed it.
So, thoughts? How much effort is appropriate to beef up the screening collection, and what kind of return can one expect? Are you better off letting the vendors clean out their closets and sell the stuff to you, or trying to make compounds yourself/paying someone to make them? What’s the comparative return on those? And what’s the return versus an investment in DNA-encoded library technology (versus paying someone who does it already)? And does anyone do DOS-like stuff for themselves? Comments welcomed. . .