As mentioned, I’m attending the first Boston symposium on encoded library platforms today. I’m starting this post, which I’ll update during the day as interesting things come up.
I thought I’d do a quick introduction to the ideas behind this technology, for those who haven’t been following the field. (That link above also links to several blog posts here about it). The idea is simple, but a bit odd-sounding. We spend a lot of time in drug discovery trying to get starting points by screening large libraries of diverse compounds (high-throughput screening). The combichem boom in the 1990s was accompanied by an HTS boom – the two fed off each other, actually. Screening libraries got bigger and bigger, and hopes were high that this would lead to more and more hits. Things didn’t quite work out that way in reality, though, because a lot of the compounds that got made were not too exciting, not too pure, and had properties that were not too druglike. It turned out that the traditional screening libraries had better hit rates and yielded far better starting points, and the hangover from the excesses of combichem has taken a long time to go away.
As Robert Goodnow, the first speaker at today’s meeting, is noting right now, the encoded library idea goes back to this era. In 1992, a thought-experiment paper was published by Sydney Brenner and Richard Lerner in PNAS on using DNA as a coding technology for a screening library, and that, looking back, is truly the start of the whole idea. As the field has developed, the plan is that you have a stretch of unique DNA attached to every single compound in your screening library, and you use the tools of molecular biology (PCR, sequencing, etc.) both to append these tags and to read them out to identify the compounds themselves.
Those molecular biology tools are wildly impressive, by the standards of organic chemistry. Extremely small amounts of DNA can be amplified, and sequenced with great speed and accuracy. That allows you to think of screening libraries beyond the dreams of anything imagined previously. If you think about doing a combinatorial library off some core structure, with several sites to add new structural groups and a good selection of building blocks, things get pretty huge pretty fast. One hundred starting materials, taken through two more one-hundred-building-block expansions, give you a million compounds in just two steps. And not that many companies have legitimate million-compound screening libraries. Goodnow is pointing out that if you wanted to assemble a traditional library of one million separate compounds from scratch, you could easily be looking at hundreds of millions of dollars even with a lowball estimate.
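To make that multiplication concrete, here’s a back-of-the-envelope sketch (the building-block counts are purely illustrative, not anyone’s actual numbers):

```python
# Rough library-size arithmetic for a multi-cycle combinatorial scheme.
# Building-block counts per cycle are illustrative only.

def library_size(blocks_per_cycle):
    """Number of distinct products if every combination of building blocks is made."""
    total = 1
    for n in blocks_per_cycle:
        total *= n
    return total

print(library_size([100, 100, 100]))     # 1,000,000 from three 100-block cycles
print(library_size([1000, 1000, 1000]))  # 1,000,000,000 from three 1000-block cycles
```

The point is how quickly the exponent takes over: one more cycle, or a few hundred more building blocks per cycle, and the numbers leave any conventional screening deck far behind.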
A combinatorial library that size is a different beast, of course, different in chemical diversity first of all (more on this later). But it’s a worthwhile thing to think about, even after the 1990s combichem experience. Now, if you just do all this combinatorialing in one flask, you’re going to have quite a mess on your hands, and any given compound (even if all the chemistry works) is going to be present in such tiny quantities that it’s going to be hard to work with and even harder to identify once you’ve screened. But if you make your library using split-and-mix techniques, and at each step add bits of DNA to the tags that are specific to that intermediate and that step, you will have better chemical success, and (even more importantly) a means to identify any given compound once it shows up as a hit. Each DNA sequence will tell you exactly what compound it’s attached to.
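As a cartoon of that decoding step (all sequences, lengths, and building-block names below are invented for illustration; real platforms use carefully designed, error-tolerant tag sets), the principle looks something like this:

```python
# Toy illustration: a concatenated DNA tag read back into a compound's synthesis history.
# Sequences and building-block names are made up for the example.

TAG_LENGTH = 6  # one short DNA "codon" appended per synthesis cycle (hypothetical length)

# One lookup table per cycle: codon -> building block added in that cycle
codon_tables = [
    {"ACGTAC": "amine_17", "TTGCAA": "amine_42"},
    {"GGATCC": "acid_03",  "CATCAT": "acid_88"},
    {"TGCAGT": "cap_A",    "AACCTG": "cap_B"},
]

def decode(tag):
    """Split a sequenced tag into per-cycle codons and look up each building block."""
    codons = [tag[i:i + TAG_LENGTH] for i in range(0, len(tag), TAG_LENGTH)]
    return [table[codon] for table, codon in zip(codon_tables, codons)]

# A hit comes back from sequencing as a single tag; decoding names the compound.
print(decode("ACGTACCATCATTGCAGT"))  # ['amine_17', 'acid_88', 'cap_A']
```

That’s all the “encoding” really is: each synthesis cycle appends a short, unique stretch of DNA, so the full tag is a record of which building block went in at which step.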
And the libraries that can be assembled in this fashion are rather awe-inspiring. Collections of hundreds of millions of compounds at a time have been reported, and it’s worth remembering that these are far larger than the full set of all compounds that have ever been reported in Chemical Abstracts. So that brings up the two sides of the discussion about DNA-encoded libraries in general. On the one hand, you have the possibility of screening a billion or two compounds, far more than any target may have ever been exposed to. If your million-compound HTS didn’t come up with much, how about running it a thousand times with a thousand different screening collections? The response, though, might be that if these compounds aren’t very interesting or diverse from the start, then running a huge heap more of them is not going to be much of an advance. The truth is probably in between these two poles, and I’ll be writing more on that, updating this post, as today’s conference goes on.
Update 1: I’m listening to David Liu of Harvard talk about this work, which I certainly should have blogged about last year. He has a platform for generating DNA-encoded libraries of macrocycles, and this has produced some very interesting inhibitors of insulin-degrading enzyme. And I can tell you, from personal experience some years ago, that if you screen for IDE inhibitors in a traditional library you don’t find very much.
Update 2: Chris Arico-Muendel of GSK/Praecis has been talking about their efforts. They have, he says, 745 million compounds with cLogP under 4 and molecular weight under 400. (Note: that’s a “design number”, not the number of compounds actually present, but the reality is still huge). They’ve found that they can rank the tractability of targets by seeing how they hit against various libraries – the correlation is quite good, although not perfect. But the best hits don’t always come out of the libraries that hit and enrich strongly, interestingly.
Update 3: Stephen Hale of Ensemble makes the case for macrocycles. They’re combining that compound space with DNA encoding, so their compound collections aren’t as wildly gigantic as many others in the field, but they’re in a totally different part of chemical space. One thing they’ve noted is that these compounds tend to have very favorable (that is, slow) off-rates, which is an interesting effect.
Update 4: Nils Jakob Vest Hansen of Vipergen has a presentation on finding protein-protein interaction inhibitors via these encoded libraries. And that brings up a good point: this sort of thing is going to find a lot of its examples in the “break glass in case of emergency” category, targets that haven’t yielded anything but are important enough that people are willing to try something weird. Every new hit-generation technology has to deal with this – combichem did, fragment-based methods as well, and encoded libraries most certainly see mostly tough targets. His company takes a different approach to encoding the libraries than most. The GSK/Praecis presentation emphasized the wide variety of reactions that they’ve been able to run with the DNA tags attached, but Vest Hansen says that they’re sticking to much simpler chemistry and getting the diversity from huge varieties of building blocks instead.
Update 5: Alex Satz of Roche is showing that the company has put an impressive amount of time and effort into this field. They’ve crossed the billions-of-compounds-per-tube threshold, and he emphasizes that it’s important to be able to run the screen quickly to see if anything worked, so you don’t waste everyone’s time. There are some very nice examples (no structures!) of screens against tough targets as well as more normal kinase-type proteins. There were, for example, 38 hits from a phosphatase screen, and 30 of them still had activity when they were resynthesized with no DNA tag (15 with sub-micromolar potency and 15 weaker); the other 8 turned out to be binding artifacts. He had a quote I enjoyed: “DNA-encoded libraries don’t make tough targets suddenly have good binding sites”.
Update 6: Thomas Franch of Nuevolution says that they’re focused on libraries that are dimers of fragment-sized molecules – they’re worried that if they moved on to trimers, etc., their numbers would go up hugely, but the chemical structures themselves would be more open to the traditional criticisms of combichem-derived compounds. They try to spend the fewest atoms possible on the linkers as well. They have, though, made a “completely crazy” library to test their assumptions: no property restrictions, on a tetramer-derived sequence that potentially has almost 40 trillion compounds in it (although who knows how many are really there).
Update 7: Matt Clark of X-Chem says (truthfully) that all the filters and property-selection criteria we use are attempts to deal with what he calls “the mind-numbing futility of trying to sample chemical space”. He notes that Aldrich Market Select sells about 9400 carboxylic acids with MW less than 250, and 7500 similarly small amines. The resulting amide library would have 70 million members (as opposed to 98 million compounds in all of Chemical Abstracts). Trying to do that, he points out, would surely get you fired, but DNA-encoded libraries are actually about doing just that kind of experiment. (They’ve actually made the library from the set of primary amines!) One thing I’m getting from his talk (and several of the others) is that when this technique works – and it often does – it really does generate solid, actionable chemical matter, if the library itself was assembled with med-chem eyes on it from the beginning.
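For scale, the multiplication he’s describing (using the figures quoted in the talk) works out like this:

```python
# Every small carboxylic acid coupled to every small amine, per the numbers quoted above.
acids = 9_400    # carboxylic acids, MW < 250, from Aldrich Market Select
amines = 7_500   # similarly small amines
print(f"{acids * amines:,} possible amides")  # 70,500,000 -- on the order of all of CAS
```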
Update 8: Dario Neri of the ETH/Philochem is the keynote speaker of the event. He has an antibody company and a DNA-encoded library company, and he’s saying that when you do a phage-display antibody screen against a normal-looking protein, you will basically always get a good binder out of it. DNA-encoded chemical libraries (conceptually very similar) should be the same: you should be able to run such a screen in the confidence that there will be something found (a worthy goal).
He also points out that if you think of a DNA-encoded screen the way that you’d think of an SPR binding assay, then it actually shouldn’t work (and neither should phage-display assays, for that matter). Each individual compound is present at a concentration so far below its Kd that you really shouldn’t be able to get anything to bind. No one would run a regular screening assay this way (orders of magnitude below Kd!). And yet these screens do work. He thinks that something more complicated has to be going on, because the simple model just can’t be right – maybe something more akin to affinity chromatography? He’s not speculating, at least in public.
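To spell out the naive calculation he’s pushing back on (the concentration and Kd below are illustrative assumptions, not numbers from the talk), a simple 1:1 binding model says occupancy should be vanishingly small:

```python
# Fraction of target occupied by a ligand at concentration [L] with dissociation constant Kd,
# from the simple 1:1 binding isotherm (no ligand depletion). Numbers are illustrative only.

def fraction_bound(ligand_conc_nM, kd_nM):
    return ligand_conc_nM / (ligand_conc_nM + kd_nM)

# A single member of a billion-compound pool might be present at roughly femtomolar
# concentration (1e-6 nM here), while even a good hit might have a Kd around 100 nM.
print(fraction_bound(1e-6, 100))  # ~1e-8: essentially zero occupancy at equilibrium
```

On that arithmetic, no individual library member should ever be retained in a selection, which is Neri’s point: whatever is happening during these screens, the simple equilibrium picture doesn’t capture it.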
Another interesting point: Neri mentioned that at one point he never could have dreamed of competing with a big drug company in doing HTS. But encoded library technology levels the field a lot – it’s possible, without too much time and money, to generate multimillion-compound libraries. Perhaps, he says, high-throughput screening will become less of a factor, and more emphasis will go to target selection and other steps.