Chemical Biology

The Boston DNA Encoded Library Conference

As mentioned, I’m attending the first Boston symposium on encoded library platforms today. I’m starting this post, which I’ll update during the day as interesting things come up.

I thought I’d do a quick introduction to the ideas behind this technology, for those who haven’t been following the field. (That link above also links to several blog posts here about it). The idea is simple, but a bit odd-sounding. We spend a lot of time in drug discovery trying to get starting points by screening large libraries of diverse compounds (high-throughput screening). The combichem boom in the 1990s was accompanied by an HTS boom – the two fed off each other, actually. Screening libraries got bigger and bigger, and hopes were high that this would lead to more and more hits. Things didn’t quite work out that way in reality, though, because a lot of the compounds that got made were not too exciting, not too pure, and had properties that were not too druglike. It turned out that the traditional screening libraries had far better hit rates and yielded far better starting points, and the hangover from the excesses of combichem has taken a long time to go away.

As Robert Goodnow, the first speaker at today’s meeting, is noting right now, the encoded library idea goes back to this era. In 1992, a thought-experiment paper was published by Sydney Brenner and Richard Lerner in PNAS on using DNA as a coding technology for a screening library, and that, looking back, is truly the start of the whole idea. As it’s developed, the plan is that you have a stretch of unique DNA attached to every single compound in your screening library, and you use the tools of molecular biology (PCR, sequencing, etc.) to both append these tags and to use them to identify the compounds themselves.

Those molecular biology tools are wildly impressive, by the standards of organic chemistry. Extremely small amounts of DNA can be amplified, and sequenced with great speed and accuracy. That allows you to think of screening libraries beyond the dreams of anything imagined previously. If you think about doing a combinatorial library off some core structure, with several sites to add new structural groups and a good selection of building blocks, things get pretty huge pretty fast. One hundred starting materials, taken through two more one-hundred-building-block expansions, gives you a million compounds in two steps. And not that many companies have legitimate million-compound screening libraries. Goodnow is pointing out that if you wanted to assemble a traditional library of one million separate compounds from scratch, you could easily be looking at hundreds of millions of dollars even with a lowball estimate.

A combinatorial library that size is a different beast, of course, different in chemical diversity first of all (more on this later). But it’s a worthwhile thing to think about, even after the 1990s combichem experience. Now, if you just do all this combinatorialing in one flask, you’re going to have quite a mess on your hands, and any given compound (even if all the chemistry works) is going to be present in such tiny quantities that it’s going to be hard to work with and even harder to identify once you’ve screened. But if you make your library using split-and-mix techniques, and at each step extend the DNA tag on each compound with a short sequence specific to that intermediate and that step, you will have better chemical success, and (even more importantly) a means to identify any given compound once it shows up as a hit. Each DNA sequence will tell you exactly what compound it’s attached to.
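Here is a toy sketch of that bookkeeping in Python, just to make the encoding idea concrete. Everything in it is an assumption for illustration: three cycles of 100 building blocks (the million-compound example above), one four-base codon per building block, and the same codon table reused at every cycle. Real platforms design their tags quite differently, so treat this as the logic only, not anyone's actual scheme.

```python
from itertools import product

BASES = "ACGT"
CODON_LENGTH = 4   # assumed: 4 bases give 4**4 = 256 codons, enough for 100 blocks
N_BLOCKS = 100     # building blocks available at each cycle (hypothetical)
CYCLES = 3         # number of split-and-mix cycles (hypothetical)


def make_codons(n_blocks, length=CODON_LENGTH):
    """Assign a distinct DNA codon to each building block index."""
    codons = ["".join(p) for p in product(BASES, repeat=length)]
    if n_blocks > len(codons):
        raise ValueError("codon length too short for this many building blocks")
    return codons[:n_blocks]


codons = make_codons(N_BLOCKS)
codon_to_block = {codon: index for index, codon in enumerate(codons)}


def encode(blocks):
    """Build the tag by appending one codon per synthesis cycle."""
    return "".join(codons[b] for b in blocks)


def decode(tag, length=CODON_LENGTH):
    """Read a sequenced tag back into the building block used at each cycle."""
    return tuple(
        codon_to_block[tag[i:i + length]] for i in range(0, len(tag), length)
    )


library_size = N_BLOCKS ** CYCLES      # 100 * 100 * 100
hit_tag = encode((17, 4, 92))          # a made-up "hit"
hit_blocks = decode(hit_tag)           # recovers (17, 4, 92)
```

The point is that the tag grows by one codon per chemical step, so sequencing a hit's DNA hands you back the synthetic route that made it.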

And the libraries that can be assembled in this fashion are rather awe-inspiring. Collections of hundreds of millions of compounds at a time have been reported, and it’s worth remembering that these are far larger than the full set of all compounds that have ever been reported in Chemical Abstracts. So that brings up the two sides of the discussion about DNA-encoded libraries in general. On the one hand, you have the possibility of screening a billion or two compounds, far more than any target may have ever been exposed to. If your million-compound HTS didn’t come up with much, how about running it a thousand times with a thousand different screening collections? The response, though, might be that if these compounds aren’t very interesting or diverse from the start, then running a huge heap more of them is not going to be much of an advance. The truth is probably in between these two poles, and I’ll be writing more on that, updating this post, as today’s conference goes on.

Update 1: I’m listening to David Liu of Harvard talk about this work, which I certainly should have blogged about last year. He has a platform for generating DNA-encoded libraries of macrocycles, and this has produced some very interesting inhibitors of insulin-degrading enzyme. And I can tell you, from personal experience some years ago, that if you screen for IDE inhibitors in a traditional library you don’t find very much.

Update 2: Chris Arico-Muendel of GSK/Praecis has been talking about their efforts. They have, he says, 745 million compounds with cLogP under 4 and molecular weight under 400. (Note: that’s a “design number”, not the real number that are actually in there, but the reality is still huge). They’ve found that they can rank tractability of targets by seeing how they hit against various libraries – the correlation is quite good, although not perfect. But the best hits don’t always come out of the libraries that hit and enrich strongly, interestingly.

Update 3: Stephen Hale of Ensemble makes the case for macrocycles. They’re combining that compound space with DNA encoding, so their compound collections aren’t as wildly gigantic as many others in the field, but they’re in a totally different part of chemical space. One thing they’ve noted is that these compounds tend to have very favorable (that is, slow) off rates, which is an interesting effect.

Update 4: Nils Jakob Vest Hansen of Vipergen has a presentation on finding protein-protein interaction inhibitors via these encoded libraries. And that brings up a good point: this sort of thing is going to find a lot of its examples in the “break glass in case of emergency” category, targets that haven’t yielded anything, but are important enough so that people are willing to try something weird. Every new hit generation technology has to deal with this – combichem did, fragment-based methods as well, and encoded libraries most certainly see mostly tough targets. His company has a different approach than most to encoding the libraries. The GSK/Praecis presentation emphasized the wide variety of reactions that they’ve been able to run with the DNA tags attached, but Vest Hansen says that they’re sticking to much simpler chemistry, and getting the diversity from huge varieties of building blocks instead.

Update 5: Alex Satz of Roche is showing that the company has put an impressive amount of time and effort into this field. He’s crossed the billions-of-compounds-per-tube threshold, and emphasizes that it’s important to be able to run the screen quickly to see if anything worked, so you don’t waste everyone’s time. There are some very nice examples (no structures!) of screens against tough targets as well as more normal kinase-type proteins. There were, for example, 38 hits from a phosphatase phosphodiesterase screen, and 30 of them still had activity when they were resynthesized with no DNA tag (15 below micromolar levels, 15 above, and 8 leftover binding artifacts). He had a quote I enjoyed: “DNA-encoded libraries don’t make tough targets suddenly have good binding sites”.

Update 6: Thomas Franch of Nuevolution says that they’re focused on libraries that are dimers of fragment-sized molecules – they’re worried that if they move on to trimers, etc., their numbers would go up hugely, but that the chemical structures themselves would be more open to the traditional criticisms of combichem-derived compounds. They try to spend the fewest number of atoms possible on the linkers as well. They have, though, made a “completely crazy” library to test their assumptions, no property restrictions on a tetramer-derived sequence that potentially has almost 40 trillion compounds in it (although who knows how many are really there).

Update 7: Matt Clark of X-Chem says (truthfully) that all the filters and property-selection criteria we use are attempts to deal with what he calls “the mind-numbing futility of trying to sample chemical space”. He notes that Aldrich Market Select sells about 9400 carboxylic acids with MW less than 250, and 7500 similarly small amines. The resulting amide library would have 70 million members (as opposed to 98 million compounds in all of Chemical Abstracts). Trying to do that, he points out, would surely get you fired, but DNA-encoded libraries are actually about doing just that kind of experiment. (They’ve actually made the library from the set of primary amines!) One thing I’m getting from his talk (and several of the others) is that when this technique works – and it often does – it really does generate solid, actionable chemical matter, if the library itself was assembled with med-chem eyes on it from the beginning.
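The arithmetic behind that amide example is easy to check. This little Python snippet uses only the figures quoted above, and it assumes the simplest possible enumeration: every acid paired once with every amine, ignoring reagents with more than one reactive group.

```python
n_acids = 9_400          # carboxylic acids with MW < 250, as quoted in the talk
n_amines = 7_500         # similarly small amines, as quoted in the talk
cas_total = 98_000_000   # Chemical Abstracts figure quoted in the talk

# Assumed: one amide per acid/amine pair
n_amides = n_acids * n_amines
fraction_of_cas = n_amides / cas_total

print(f"{n_amides:,} possible amides, "
      f"{fraction_of_cas:.0%} of the quoted Chemical Abstracts total")
# prints "70,500,000 possible amides, 72% of the quoted Chemical Abstracts total"
```

One amide coupling, in other words, gets you most of the way to the size of the entire recorded chemical literature.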

Update 8: Dario Neri of the ETH/Philochem is the keynote speaker of the event. He has an antibody company and a DNA-encoded library company, and he’s saying that when you do a phage-display antibody screen against a normal-looking protein, you will basically always get a good binder out of it. DNA-encoded chemical libraries (conceptually very similar) should be the same: you should be able to run such a screen in the confidence that there will be something found (a worthy goal).

He also points out that if you think of a DNA-encoded screen the way that you’d think of an SPR binding assay, then it actually shouldn’t work (and neither should phage-display assays, for that matter). Each individual compound is so far below its Kd that you really shouldn’t be able to get things to bind. No one would run a regular screening assay this way (orders of magnitude below Kd!). But yet they do. He thinks that something more complicated has to be going on, because the simple model just can’t be right – maybe something more akin to affinity chromatography? He’s not speculating, at least in public.
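To put rough numbers on that puzzle, here is a minimal Python sketch of simple 1:1 binding looked at from both directions. The concentrations and Kd are made-up illustrative values, and the second calculation (immobilized protein in large excess over each trace library member, which is the view one commenter below argues for) is one possible reading, not something Neri presented.

```python
def target_occupancy(ligand_conc, kd):
    """Fraction of target occupied when the ligand is the component in excess."""
    return ligand_conc / (ligand_conc + kd)


def ligand_fraction_bound(protein_conc, kd):
    """Fraction of a trace ligand captured when the protein is in excess."""
    return protein_conc / (protein_conc + kd)


# All values below are hypothetical, in molar units
kd = 1e-6              # a 1 micromolar binder
ligand_conc = 1e-15    # one library member at femtomolar concentration
protein_conc = 10e-6   # immobilized target at an effective 10 micromolar

occupancy = target_occupancy(ligand_conc, kd)
captured = ligand_fraction_bound(protein_conc, kd)

print(f"target occupied by this compound: {occupancy:.1e}")
print(f"fraction of this compound captured: {captured:.2f}")
```

Under these assumptions a single compound occupies a vanishingly small fraction of the target, yet most copies of that compound end up bound. Whether that simple picture really explains why the selections work is exactly the open question.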

Another interesting point: Neri mentioned that at one point he never could have dreamed of competing with a big drug company in doing HTS. But encoded library technology levels the field a lot – it’s possible, without too much time and money, to generate multimillion-compound libraries. Perhaps, he says, high-throughput screening will become less of a factor, and more emphasis will go to target selection and other steps.

14 comments on “The Boston DNA Encoded Library Conference”

  1. Hap says:

    How do you encode the synthesis and the DNA simultaneously? I didn’t seem to understand it from the reviews, and to pull this off, it seems like you have to be able to perform precise synthesis on very small scale (so that only DNA with specific sequences, or subsequences, presumably in a small region of a sampling plate, has specific chemistry performed on it) or that you have to have reagents attached to DNA (like Liu’s work) so that the DNA sequences encode the synthesis, but that seems likely to lead to pretty small libraries. If it’s possible to answer, what am I missing?

  2. annonie says:

    As alluded to in your comments, Praecis has been a terrible investment for GSK. Lots of fun and interesting technology. Very few actual leads that were worked to a Phase 1 candidate in real projects. And with GSK’s commitment in moving more into vaccines and OTC businesses, it’s not clear to me that this “add on” will give much back going forward either. So, will it be “spun off”, sold? Watch this space.

  3. Medchemist says:

    Very useful posts and coverage, Derek. Thanks!

    Two limitations in my opinion:
    1- DNA-encoding strongly limits the chemistries used to create small molecules diversity.
    2-DNA tag might create strong noise to signal ratio as the target might bind to the tag.

    Any insight/comments from the “panel”?

  4. Anon says:

    It’s hard to extrapolate the value of GSK’s DNA technology based on their triazine publications. Let’s keep in mind that what’s published is usually what’s least attractive from a drug standpoint. There’s very likely far more value in GSK’s unpublished non-triazine projects.

  5. Anonie says:

    The best part about this post is it shows how little the traditional med chemists understand when you move out of milligram-in-a-pot space. The DNA-templated synthesis work is 10 years old, it’s not hard to understand. We should keep this grain of salt when they bash other emerging technologies that don’t look like their typical heterocycles…

  6. MoMo says:

    Thank heavens that this has come along and will revolutionize HTS and chemical diversity! We’re saved!

    Now I can retire from Med Chem and go back to more important scientific endeavors, like Cold Fusion!

  7. SP123 says:

    That last bit is easy- Kd is usually applied assuming one component is in excess. Traditionally that’s always been the small molecule. Here it’s just flipped around, and whatever effective concentration the protein is at (which can be tricky to actually determine if you’re using a resin-immobilized target) is what drives the Kd. So charge some resin with 10uM protein solution and assuming it’s all captured you should find at least 10uM or higher binders.

  8. Adam Shapiro says:

    A limitation of DNA-encoded library screening is the requirement for a resin-immobilized target, generally a protein. This approach can’t be used with cell-based assays or crude extracts.

    1. David Edwards says:

      I’d have thought a more apt analogy was this.

      You have a haystack containing a million needles, all of which are unique. A traditional magnet will simply attract all of them, and not the single one out of that set of a million that you want to extract from the haystack.

      So, you construct a second haystack, and this time, put into it a million needles, each of which has a tag attached. You then fabricate lots of little magnets, each of which is specifically attracted to one or a small number of needles, and let them loose.

      You then use the tag to isolate the needle of interest, and find which of your new, highly specific magnets is attached to it. Then, you test that magnet in the first haystack, and see if it does indeed attach itself to the desired needle.

      I don’t pretend for a moment to be an expert in the field, but knowledge of the scientific method leads me to the above analogy as being probably better.

  9. DrSnowboard says:

    So you build a phenomenally bigger haystack but at least you have a magnet to find the needle, although the needle is also attached to a brick by a piece of string?

  10. Morten G says:

    Two Copenhagen-based companies. Cool.

    Dr Snowboard, I think your metaphor may have been overstretched. A little bit.

  11. annon 2 says:

    Anon: Spoken like someone from GSK communications trying to continue to make something out of nothing. Having spoken with people inside in the know, there has been exactly one (1) compound derived from a Praecis lead that progressed; in GSK management’s wisdom, that compound also has been shelved, even though it did show promise.

  12. Derek Freyberg says:

    It seems to me that, whether or not you can tag – and hence identify – the particular one of the million compounds you have synthesized (using the example of the OP’s fourth paragraph), you still face the fact that the compounds all possess one huge similarity dictated by the two reactions that made them. I’m not suggesting that the compounds are *just* the million compound version of “methyl, ethyl, propyl, butyl, futile”; but there is certainly an element of that in any multiplexed synthesis method. If the target is decided in advance, perhaps the lack of diversity is not so important; but I don’t see this as a way of finding an “outside the box/off the wall” hit such as another bortezomib, rather a way to find a better whatever.

  13. DrSnowboard says:

    @Morten G: Overstretched metaphor? Of course, a bit like the cool technology.
