Here’s an update from Alex Satz of Roche on DNA-encoded library (DEL) screening. I’ve been mentioning this technique on the blog since its early days, and I freely admit that when it was starting out I had trouble believing that it worked (or even could work). The idea, in short, is that you append a short bit of DNA to a small-molecule starting material, and then elaborate that into a combinatorial library of compounds while the DNA is still attached. The key part is that after each step, you use molecular biology techniques to add more DNA bases to the end, and these sequences are deliberately chosen to encode the synthetic history of the compound as it’s been branched out. You can have a huge number of these things, since you split out into hundreds or thousands of individual wells along the way, each one of which gets a discrete building block (and a corresponding discrete DNA oligo “bar code” ligated to it). That process is the first use of molecular biology, since the enzymes involved are very good at their jobs indeed.
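The split-and-pool bookkeeping can be sketched in a few lines of Python. To be clear, the building-block names, cycle counts, and four-base barcodes below are all invented for illustration, not any real library design:

```python
# Hypothetical sketch of split-and-pool DEL encoding; building-block
# names and four-base barcodes are made up for illustration.
building_blocks = {
    "cycle1": {"BB1": "ACGT", "BB2": "TGCA"},  # building block -> DNA barcode
    "cycle2": {"BB3": "GGAA", "BB4": "CCTT"},
}

# Each cycle splits the pool into wells, couples one building block per
# well, ligates that well's barcode onto the growing tag, then re-pools.
pool = [([], "")]  # (synthetic history, DNA tag so far)
for cycle in ("cycle1", "cycle2"):
    pool = [(history + [bb], tag + code)
            for history, tag in pool
            for bb, code in building_blocks[cycle].items()]

for history, tag in pool:
    print(history, tag)
# Two choices per cycle gives 2 x 2 = 4 tagged compounds; real libraries
# use hundreds or thousands of building blocks per cycle.
```

The point is that the tag grows in lockstep with the molecule, so reading the tag later recovers the full synthetic history.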
In the end, once you’ve recombined, you’ve got a small volume of solution in which there might be (should be, in fact) many millions of compounds, each with its own DNA tag. At this point, you run a screen for binding, washing off everything to leave only the tight binders, and you use PCR and modern sequencing to figure out the DNA sequences of what’s left. And that’s the back-end use of molecular biology, because this combination of techniques allows you to identify extremely small amounts of material, as long as it has a DNA sequence attached to it. When you get the sequences, you go back to your master key and figure out which compounds those were. . .OK, scaffold 1. . .with an aminopyrrolidine at the first position. . . and the (R)-methyl side chain in the next step. . .and capped off with heterocycle number forty-eight. That sort of thing.
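That “master key” lookup amounts to inverting the per-cycle barcode tables. Here’s a toy version of the idea, with invented barcodes and fixed-width codes (real designs vary):

```python
# Hypothetical decoding step: split a sequenced tag into per-cycle codes
# and look each one up. Barcodes and building-block names are invented.
barcode_tables = [
    {"ACGT": "aminopyrrolidine", "TGCA": "scaffold-variant-2"},
    {"GGAA": "(R)-methyl", "CCTT": "(S)-methyl"},
    {"AAAA": "heterocycle-48", "TTTT": "heterocycle-12"},
]

def decode(tag, tables, width=4):
    """Chop a fixed-width tag into per-cycle codes and translate each."""
    codes = [tag[i * width:(i + 1) * width] for i in range(len(tables))]
    return [table[code] for table, code in zip(tables, codes)]

print(decode("ACGTGGAAAAAA", barcode_tables))
# -> ['aminopyrrolidine', '(R)-methyl', 'heterocycle-48']
```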
At this point, the time-consuming step kicks in: synthesizing these things “off-DNA”. You most certainly do not have fifty million individual vials sitting downstairs, each with a fifty-mg sample of each member of the DNA-encoded library. That would be 2500 kilos of stuff, and that’s neglecting the weight of the vials! You’d need a serious building to hold the “real” samples of one DNA-encoded library, and that’s exactly what you’re trying to avoid ever having to do. So you head back to the hood and make the non-DNA forms of these hits (as many as you think appropriate or feasible) and see how they bind in that form.
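For the skeptical, the back-of-the-envelope arithmetic checks out:

```python
# Storage math for physically stocking one large DEL, per the figures above:
compounds = 50_000_000           # members of one large DEL
sample_mg = 50                   # one 50 mg vial per compound
total_kg = compounds * sample_mg / 1e6   # mg -> kg
print(total_kg)  # 2500.0 kg of compound, before counting the vials
```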
That’s the point where my mental picture really began to break down when I first heard about this. A small molecule with enough DNA attached to it to be a useful bar code, in my mind, looked like a canoe towing an aircraft carrier. I had trouble believing that these things would screen in a meaningful way, and trouble believing that you wouldn’t just get a hit set that was dominated by all sorts of DNA-driven interactions (since there’s so much more DNA in that vial than there is small-molecule chemical matter).
Well, those concerns are not idiotic, but they don’t keep the idea from working, either. (I try to keep this example in mind whenever I’m evaluating a new technology). As that new short review illustrates, DEL screening does indeed work, and by now it has delivered compound series that have advanced to the clinic. Once you’ve made a DEL, the screening technique for it doesn’t vary all that much (in most cases), and you can try variations without too much trouble. These screens burn up less protein than most other HTS efforts, and a single DEL, once produced, is generally good for re-use in a large number of screens. The tough part, as mentioned, is the workup of the results. The sequencing has gotten easier and cheaper over the years, as has the sheer number-crunching of figuring out what compounds hit and what the relationships are between them. Resynthesis, though, is still pretty much the same delirious happiness it’s always been.
How large should these things be? There have been some reports of ridiculously huge libraries (billions, up towards a trillion compounds), but as Satz notes, there’s very little information about these and no sign that they’re producing more or better hits. His own experience seems to be that collections with (theoretical) compound counts in the millions to tens of millions have much better signal-to-noise, and that, at least under current conditions, going higher may well be a mistake.
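To see how fast those counts climb, consider the combinatorics (the per-cycle numbers here are illustrative, not Satz’s actual figures): a three-cycle library’s theoretical size is the product of the building blocks used at each cycle.

```python
# Illustrative combinatorics for a hypothetical three-cycle library:
# theoretical size = (building blocks per cycle) ** (number of cycles).
sizes = {n: n ** 3 for n in (100, 500, 2000)}
for n, size in sizes.items():
    print(f"{n} building blocks per cycle -> {size:,} compounds")
# 100 per cycle already gives a million; 2000 per cycle gives 8 billion.
```

Modest per-cycle counts multiply out quickly, which is why the billions-scale libraries are within reach at all, and why deciding where to stop is a real design question.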
He also has some thoughts about where DEL technology might be going. As always, one possibility is the availability of new chemistries that can be performed in the presence of the DNA bar codes. The diversity of the libraries has been more dependent on the variety of the building blocks than on the reactions used to put them together. Another idea that would open things up further would be to keep the DNA bar codes present but allow the compounds themselves to be cut free under the assay conditions (in order to run cellular and functional assays). There are several schemes being worked on to realize this, and they deserve a detailed post of their own.