DNA Libraries Are Here to Stay

Here’s an update from Alex Satz of Roche on DNA-encoded library (DEL) screening. I’ve been mentioning this technique on the blog since its early days, and I freely admit that when it was starting out I had trouble believing that it worked (or even could work). The idea, in short, is that you append a short bit of DNA to a small-molecule starting material, and then elaborate that into a combinatorial library of compounds while the DNA is still attached. The key part is that after each step, you use molecular biology techniques to add more DNA bases to the end, and these sequences are deliberately chosen to encode the synthetic history of the compound as it’s been branched out. You can have a huge number of these things, since you split out into hundreds or thousands of individual wells along the way, each one of which gets a discrete building block (and a corresponding discrete DNA oligo “bar code” ligated to it). That process is the first use of molecular biology, since the enzymes involved are very good at their jobs indeed.
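
To get a feel for how quickly split-and-pool synthesis adds up, here is a minimal sketch of the arithmetic: the theoretical library size is just the product of the number of building blocks used in each cycle. The cycle counts below are made-up illustrative numbers, not figures from the review, and the function name is hypothetical.

```python
import math

def library_size(blocks_per_cycle):
    """Theoretical number of distinct compounds from split-and-pool synthesis.

    blocks_per_cycle: how many building blocks (and matching DNA tags)
    are used in each synthetic cycle.
    """
    return math.prod(blocks_per_cycle)

# Hypothetical three-cycle library: 200 scaffolds, then 500 and 500 building blocks.
example_cycles = [200, 500, 500]
print(library_size(example_cycles))  # 50000000
```

This is a theoretical count only. It assumes every reaction in every well actually worked, which is a separate question.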

In the end, once you’ve recombined, you’ve got a small volume of solution in which there might be (should be, in fact) many millions of compounds, each with its own DNA tag. At this point, you run a screen for binding, washing off everything to leave only the tight binders, and you use PCR and modern sequencing to figure out the DNA sequences of what’s left. And that’s the back-end use of molecular biology, because this combination of techniques allows you to identify extremely small amounts of material, as long as it has a DNA sequence attached to it. When you get the sequences, you go back to your master key and figure out which compounds those were. . .OK, scaffold 1. . .with an aminopyrrolidine at the first position. . . and the (R)-methyl side chain in the next step. . .and capped off with heterocycle number forty-eight. That sort of thing.
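
The “go back to your master key” step can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular group does it: the four-base tags, the codebook contents, and the function names are all hypothetical, and real reads also carry primers, headpiece sequence, and sequencing errors that are ignored here.

```python
from collections import Counter

# Hypothetical master key: one lookup table per synthesis cycle,
# mapping a short DNA tag to the building block added in that cycle.
CODEBOOK = [
    {"ACGT": "scaffold 1", "TGCA": "scaffold 2"},
    {"GATC": "aminopyrrolidine", "CTAG": "piperazine"},
    {"AATT": "(R)-methyl side chain", "GGCC": "(S)-methyl side chain"},
]
TAG_LEN = 4  # assumed fixed tag length

def decode(read):
    """Translate one sequencing read into its synthetic history.

    Returns a tuple of building blocks, or None if the read has the
    wrong length or contains a tag that is not in the codebook.
    """
    if len(read) != TAG_LEN * len(CODEBOOK):
        return None
    history = []
    for cycle, table in enumerate(CODEBOOK):
        tag = read[cycle * TAG_LEN:(cycle + 1) * TAG_LEN]
        if tag not in table:
            return None
        history.append(table[tag])
    return tuple(history)

def count_hits(reads):
    """Count how many reads decode to each compound, skipping unreadable ones."""
    counts = Counter()
    for read in reads:
        compound = decode(read)
        if compound is not None:
            counts[compound] += 1
    return counts

reads = ["ACGTGATCAATT", "ACGTGATCAATT", "TGCACTAGGGCC", "ACGTNNNNAATT"]
for compound, n in count_hits(reads).most_common():
    print(n, " + ".join(compound))
```

The compounds whose tags show up most often after the wash are the candidates for off-DNA resynthesis.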

At this point, the time-consuming step kicks in: synthesizing these things “off-DNA”. You most certainly do not have fifty million individual vials sitting downstairs, each with a fifty-mg sample of each member of the DNA-encoded library. That would be 2500 kilos of stuff, and that’s neglecting the weight of the vials! You’d need a serious building to hold the “real” samples of one DNA-encoded library, and that’s exactly what you’re trying to avoid ever having to do. So you head back to the hood and make the non-DNA forms of these hits (as many as you think appropriate or feasible) and see how they bind in that form.
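
For anyone who wants to check the 2500-kilo figure, here is the unit conversion written out as a short sketch (the function name is just for illustration):

```python
def total_mass_kg(n_compounds, mg_per_sample):
    """Total mass in kilograms of n_compounds samples of mg_per_sample each."""
    mg_per_kg = 1_000_000
    return n_compounds * mg_per_sample / mg_per_kg

# Fifty million compounds at fifty milligrams apiece.
print(total_mass_kg(50_000_000, 50))  # 2500.0
```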

That’s the point where my mental picture really began to break down when I first heard about this. A small molecule with enough DNA attached to it to be a useful bar code, in my mind, looked like a canoe towing an aircraft carrier. I had trouble believing that these things would screen in a meaningful way, and trouble believing that you wouldn’t just get a hit set that was dominated by all sorts of DNA-driven interactions (since there’s so much more DNA in that vial than there is small-molecule chemical matter).

Well, those concerns are not idiotic, but they don’t keep the idea from working, either. (I try to keep this example in mind whenever I’m evaluating a new technology.) As that new short review illustrates, DEL screening does indeed work, and by now it has delivered compound series that have advanced to the clinic. Once you’ve made a DEL, the screening technique for it doesn’t vary all that much (in most cases), and you can try variations without too much trouble. These screens burn up less protein than most other HTS efforts, and a single DEL, once produced, is generally good for re-use in a large number of screens. The tough part, as mentioned, is the workup of the results. The sequencing has gotten easier and cheaper over the years, as has the sheer number-crunching of figuring out what compounds hit and what the relationships are between them. Resynthesis, though, is still pretty much the same delirious happiness it’s always been.

How large should these things be? There have been some reports of ridiculously huge libraries (billions, up towards a trillion compounds), but as Satz notes, there’s very little information about these and no sign that they’re producing more or better hits. His own experience seems to be that collections with (theoretical) compound counts in the millions/tens of millions have much better signal-to-noise and that at least under current conditions, going higher may well be a mistake.

He also has some thoughts about where DEL technology might be going. As always, one possibility is the availability of new chemistries that can be performed in the presence of the DNA bar codes. The diversity of the libraries has been more dependent on the variety of the building blocks than the reactions used to put them together. Another idea that would open things up more would be to have a way to have the DNA barcodes present but allow the compounds themselves to be cut free under the assay conditions (in order to run cellular and functional assays). There are several schemes being worked on to realize this, and they’re deserving of another post in detail.

43 comments on “DNA Libraries Are Here to Stay”

  1. Innovorich says:

    So when chemical space is 10^56, does it matter if you screen 10^6 or 10^7 and don’t ask any question about chemical diversity??

    And yes the aircraft carrier has an effect in many (most?) cases but of course with any technique, the results will be target dependent and it’s always possible to tell the story of one that worked if that’s what your aim is.

    1. Bla says:

      What part of that chemical space is actually suitable in any way to be a drug? Most of it isn’t, even with the most liberal definition of “drug”.

      1. AR says:

        Approx 10^30 in drug like chemical space

  2. Sanjay says:

    So suppose you encode with a carbohydrate instead of DNA? The front-end labelling is harder but the enzymes are being characterized at breakneck pace right now; the back-end PCR isn’t a possibility but again identification of glycans is getting better real real fast. The gain is maybe you have something that you can use on different proteins where DNA isn’t an option, and you can get vastly more diversity from a much smaller molecule — more of a canoe tugging a trawler.

    1. truthortruth says:

      I don’t know enough about carbohydrate sequencing to comment on how feasible this is to achieve in a high-throughput setting. However, next-gen sequencing is so cheap and easy that the bar will be high to move away from this readout.

    2. HTSguy says:

      I think you are underestimating the vast amplification you get from PCR

      1. Dr. Manhattan says:

        Yes, that is exactly the problem with the carbohydrate approach. PCR and subsequent sequencing allows the compounds that do bind to rise above the overall “noise” of the screen. With Illumina sequencing technology, one can easily get 100’s of millions of reads. A decent binder will have an attached sequence that rises well above the background screen noise. Plus it is probably easier to encode the chemical synthesis information in the attached DNA than it would be for carbohydrates.

        1. GDS says:

          The incorporation of unique molecular identifiers allows us to distinguish bona-fide signal from PCR bias.

    3. MB says:

      Encoding information into carbohydrates and glycans has been an idea that has intrigued me for a LONG time. It’s true, the information capacity of carbohydrates is light years beyond what DNA can encode, simply due to the chemical nature of carbohydrates (branching, multiple hydroxyls in axial/equatorial configurations, alpha/beta linkages, etc.).

      Unfortunately the tech right now to synthesize and decode sugars simply doesn’t exist yet because the chemistry is ghastly and what we currently have is way too expensive. The information that could be stored in carbs, however, is orders of magnitude larger; just 6 simple hexoses alone can be arranged in over 1 trillion possible different ways on paper.

      1. barry says:

        The information that could be encoded into carbohydrates is neither larger nor smaller than the information that could be encoded into DNA. All that would change would be the mw of the tag.

  3. Wavefunction says:

    The visualization and computational technology that allows one to cluster and visualize islands of potential SAR for select needles in the haystack also deserves a nod. From my own experience, this post-processing is crucial if you want to avoid getting drowned in a sea of false positives.

  4. Mike says:

    Isn’t DEL capable of making only simple molecules that may not be very different from the combinatorial libraries that industry has already screened? If so, we should not look for any miracle here, right?

  5. Fluorinator says:

    That still wouldn’t solve the problem of your coded label not withstanding certain reaction conditions though, right? That is my biggest struggle with the concept overall (thinking purely from a synthetic chemists POV)…

    1. BioGuy says:

      Aqueous chemistry is absolutely a big limitation of DEL at the moment, but that should begin to change as more effort is placed into discovering novel reaction conditions that are compatible with DNA.

    2. CMCguy says:

      In practice oligonucleotides can be more robust in many reactions than imagined, so I am not sure condition constraints are a major factor in overall library construction, presuming you know the tags to avoid or suitably protect. I look at it more the other way around: if the conditions might damage the DNA, then I worry about what functional groups you can incorporate in the molecules that would not give the flat, minimal-diversity, non-drug-like molecules that seemed to predominate in the old combichem libraries. Although it seems unlikely that this can generate a compound that is an actual drug, such a project may be able to narrow the efforts for medicinal chemists to apply their trade toward viable candidates.

  6. bill fitch says:

    It was twenty years ago, today.
    Affymax taught the band to play.
    Isotopically labelled dialkylamine encoded bead based libraries. Pretty sensitive MS detection. Not so many screening successes. Was it the difficulties with the scaffold synthesis? Or the attempts to use these libraries to find hits against undruggable targets?

  7. Chrispy says:

    Like you, Derek, I always thought the DNA was too big to get meaningful hits from an attached small molecule, and the chemistry that DNA permitted was pretty limited. It was like combichem was reborn: “Yeah, all of our compounds are chickenwire with a triazole, but we have MILLIONS of them.” The ability to explore SAR very quickly was intriguing, though, and a presentation I saw years ago by Praecis using SpotFire to “fly” through chemical hit space was awe inspiring. Since this has been around since the nineties, though, it is fair to ask if the technology has delivered on its promise. I am not aware of any real drugs that came from this. Perhaps some of your readers at GSK (which bought Praecis for $55MM in 2006) could enlighten us?

    1. Derek Lowe says:

      My guess is that it didn’t necessarily pay off for GSK – I’m not sure if their first mover advantage was worth anywhere near as much as they paid for it. But the technique itself isn’t that expensive to implement – the opposite, once you get a bit of experience.

    2. real says:

      Praecis had 25 million in cash when purchased for 55. The cost to GSK isn’t as clear as the number you are stating.

    3. bozo says:

      The problem is that you can only get close to billion compound libraries if you use easy chemistry and multiple steps. The products end up looking hideous. I’ve lost count of the number of god-awful weak DEL hits I’ve been shown in program meetings. There have been a few examples that have led somewhere useful and of course these have been milked for PR reasons and hyped out of all proportion.

      You could choose to make libraries that look smaller and more leadlike, but then you don’t need DNA to screen them…

      1. anon44 says:

        If the GSK ELT platform is underproducing, then it’s a GSK management issue. Your second point makes little sense: ELT is very cheap to implement and run, and libraries in the millions produce a lot of hits.

        1. bozo says:

          My point was fairly clear I thought. It isn’t possible using scaleable chemistry to make libraries of millions of compounds that are both diverse and chemically appealing enough to care about when they come up as hits. You can make nicer looking more druglike libraries using chemistry that’s less scaleable, but then you have fewer compounds and can screen them using other cheaper technologies.

  8. Magrinhopalido says:

    Whoever promised in the mid-90s that combichem was going to make it rain Development Candidates should be properly identified. That claim has made being part of combichem’s history very difficult.

    I once had an interviewer look at my resume and make note of my first employer. He then said “that’s two strikes against you but do your best.” He was kidding – sort of. I got the job anyway.

    What combichem is good at is creating very dense clusters in chemical space that can provide a lot of information via HTS. What we could not do 20 years ago was make huge libraries and we could not properly data-mine the HTS results. That’s possible now and it is truly exciting.

    Development candidates will still not rain from the sky but the HTS results and analysis of a DNA-encoded library can now provide a lot of valuable information to start an LO program.

    1. real says:

      It’s going on now for AI. So pull out your notebook and start taking names!

    2. The College Bulldog says:

      Whoever promised in the mid 1990s… Plenty of promises made at Churchill College, Cambridge, 27-31 July 1997:

      1. Derek Lowe says:

        Now those conference notes are truly a blast of 100-proof combichem, just as I remember it at the time!

      2. Anonymous says:

        I think that Ellman rightly deserves credit for the first small molecule combichem library (SPOS of benzodiazepines), but right around the time of that first paper (JACS, 1992), I saw a poster from Panlabs (and spoke to the poster – presenter there). She had taken a 10×10 tray of scintillation vials, added 10 commercial aldehydes across and then added 10 commercial R-Metals down to get 100 different adducts. Added a little workup water, pipetted out the layers, evap’d and got 100 NMRs. Pretty simple. I think each vial was labeled with a Sharpie, not DNA.

  9. exGlaxoid says:

    The Affymax purchase provided no real hits, a few gadgets that went “bing” with flashing lights, and some shares in Affymetrix, which did provide some financial benefit, which compensated for the lack of any real hits from libraries, many of which could not be screened well, contained only a fraction of the theoretical compounds in them when carefully analyzed, and mostly consisted of triazines.

    Praecis was another mess, as was hiring Mario Geysen, who spent another many millions to find nothing as well. The challenge of drug discovery did not seem to be a lack of hits in screens, but the will to pay to optimize and develop them. We had dozens of projects that had good science but were judged not to be profitable by commercial, including Cialis (given to ICOS for some magic beans) and a few others that were never developed.

    But those pale in comparison to Sirtris, where GSK management paid $720 million for a company whose data had been questioned by GSK scientists for years. That was a much bigger boondoggle, and partly explains why GSK stock is worth half what it used to be worth when the S&P is up 300%. It takes real leadership skills to take the world’s biggest company and destroy most of its value.

    1. Chrispy says:

      Why did GSK purchase Sirtris? It is the go-to example for boneheaded purchases in this industry. It was so ill-conceived and overvalued that at the time it was widely hypothesized that someone was getting a kickback or that there were other drivers. Scientists at GSK told me that I did not understand the depth of contempt that management had for the scientists, and that it was done in part/subconsciously to put them in their place. But I find that hard to believe.

      1. anon1 says:

        My impression as someone around at the time. The rank and file scientists and middle management were fiercely territorial, and would do, say, and think whatever if they thought it in their best interests. Obviously, anything involving an external purchase was not in the best interests of internal research groups. So upper management decided to completely ignore their own internal scientists. Granted, there is a real conflict of interest in having internal scientists judge external purchases. GSK management just got it plain wrong that time though.

        1. Anonymous says:

          Years earlier, Randall Tobias of Eli Lilly decided to purchase PCS (a pharmacy benefit management company) worth ~$200 MM for $4.4 B. ~2 years later, Lilly took a write down of $2 B. ~2 years after that, Lilly sold PCS for $1.5 B. A $3 B loss is a lot stupider than a measly $720 MM. Tobias later joined the Bush Admin State Department. He got caught up in an escort service scandal and resigned due to his hypocritical stance on such issues.

    2. Old says:

      Ah, the requisite Sirtris reference from another disgruntled GSKer who apparently has never kept up on the literature and the Sirtris technology after the acquisition. Thanks Derek, I needed my Sirtris bashing fix.

  10. crashandburn says:

    Any updates on what happened to the main players at Sirtris? Sitting on a tropical isle with their loot? Or still contributing to the Journal of Irreproducible Results? Anyone care to chime in…?

  11. SP says:

    Oh god, the abstract illustration in that paper is the 4-cycle triazine library again. Seems like that’s the only structure that’s ever published.
    The problem with the Dr. Evil-ish claims of “100 BILLION compounds” is they have no idea which reactions actually worked, it’s certain that some did not and in combinatorial space that takes out large swathes of the SAR cube. But, they also claim all the inevitable truncation products as further diversity.

  12. anon forever? says:

    Who’s ‘they’?

  13. Hap says:

    Has anything come of Liu’s work? I have a hard time understanding how to use biological tools to not just extract and identify leads but to select for them and use evolution to make better ones, but I understand poorly how that works in practice.

    1. Cato says:

      My understanding of the other versions of this technology outside of the Praecis/GSK DNA recorded DELs, is that they are not practical. Although messy, the GSK style DEL allows you to reuse DNA oligos (the expensive part) and use chemical building blocks right off the shelf. Liu’s DNA templated synthesis requires preparation of individual DNA-chemical conjugates which in the end is rather impractical and inflexible and the libraries are likely quite small in scale. Similarly the DNA evolution technology (Harbury’s DNA sorting) requires lots of upfront construction and a long experiment time (I have heard the DNA sorting step requires a month!). I don’t know about the yocto version, but I imagine it is much of the same. I’ve always speculated that these companies say they do their “own” version of the DEL technology but actually use the DNA recorded libraries to get around patent issues.

      1. bozo says:

        Have you tried sketching out what molecules look like when you use 3 or 4 cycles of off-the-shelf building blocks in standard combichem reactions? They aren’t pretty.

    2. MoMo says:

      DEL smacks of a chemical pyramid Ponzi scheme. But give these scientists and their investors a chance. I asked several CEOs recently about investor composition before I hit them with binding affinity questions of DNA and intra and inter molecular interactions.

      They could not cogently answer.

      1. anon4 says:

        Large amounts of sheared salmon sperm DNA are added to each DEL selection. This removes any molecules that might bind to DNA (intermolecularly). Of course, any intramolecular interactions also lead to those molecules not binding the protein target (if they would have otherwise). It’s a very simple question, so I’m not sure why you need a CEO to answer it for you?

  14. WTF says:

    Why was this accepted in an ACS journal?

  15. Walter White says:

    There is a significant amount of debate currently in the DEL field about the total size as well as the “right way” to analyze the large data sets that come from selections.

    You also need to bring up the fact that when you are done with the on-DNA chemistry from cycle 1 to cycle “x” there can be issues with damage to the headpiece from certain conditions. Overall, this translates to a problem of “total useful” sequences that can be read. Of the total millions of chemical entities coded by ELT, there may only be a single digit percent that can be successfully read.

    Kuai et al. also recently published a paper that shows how there can be a significant amount of random noise in libraries of lower size. There is definitely a sweet spot of library size. Too large and you run the risk of not covering all useful sequences. Too small, and the noise can take over. Some individuals in the business try to impose a statistical function or “cutoff” to single out the compounds with high copies. Others will try to also impose a “no target control” to single out compounds that may be promiscuous despite how after 1 round of selection this has absolutely no comparative power.

    DEL technology certainly has potential for discovery, however there needs to be more research into the picking of “what” to make and how the large datasets are analyzed.
