The “Dark Matter” of a Compound Collection

An awful lot of drug discovery comes down (sooner or later) to screening compound collections. This has been true for a long time now, and it doesn’t look like it’s going away, either. So with that in mind, what’s in your collection? Did you buy a bunch of stuff from the vendors to fill it out? If so, your chemical matter is (1) biased towards compounds that are easier to synthesize, (2) probably branches out from a relatively limited number of scaffolds and (3) is duplicated in a number of other screening collections as well. Or did you assemble it from inside, on the other hand, building it up from years of your own med-chem efforts? In that case, then, you might have some interesting stuff in there, but it’s surely been shaped by the kinds of targets and projects you’ve worked on. Company A’s collection might be over-weighted towards kinase inhibitors, while Company B’s has relatively big heaps of PDE inhibitors and attempts at GABA ligands, from that big project that just seemed to go on forever. It’s a safe bet that no one thinks that their screening collection is all that it should be.

Or, as a colleague of mine put it recently when talking to a room full of biologists, “The compound library is not a treasure chest that we scoop jewels out of – it’s more like a huge basement, full of some things that could be useful and other stuff that we really should throw away”. The latter category, everyone would likely agree, includes the sorts of structures that seem to show up as hits in every other screen (rhodanines, polyphenols, etc.). But what about the opposite set of compounds? You’d imagine that some of that should-be-tossed material might be compounds that have been screened a number of times over the years and have never hit in anything at all. At what point do you give up on these things?

A new paper in Nature Chemical Biology takes a look at these, which the authors (from Novartis) are calling the “dark chemical matter” (DCM) of a screening collection. (Here’s a “News and Views” look at the paper as well). They’ve gone back over both the Novartis screening deck and the NIH’s Molecular Libraries collection, and picked out just those compounds that have been through a wide variety of screening campaigns without ever lighting up an assay (234 Novartis assays and 429 PubChem assays linked to the NIH compound deck). Only 19 targets overlapped in the two sets, and there were (as you’d imagine) a wide variety of protein- and cell-based screens represented. The cutoff was that molecules had to have been screened in at least 100 assays without ever showing biological activity.
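The selection rule is simple enough to sketch in a few lines of Python. This is only an illustration, not the authors' pipeline: the `ScreeningRecord` type, its field names, and the sample compound IDs are all made up, and the only number taken from the paper (as described here) is the 100-assay cutoff.

```python
from dataclasses import dataclass

MIN_ASSAYS = 100  # cutoff quoted above; everything else here is illustrative


@dataclass
class ScreeningRecord:
    compound_id: str    # hypothetical identifier
    assays_tested: int  # number of distinct assays the compound went through
    assays_active: int  # number of those assays in which it was called a hit


def is_dark(rec, min_assays=MIN_ASSAYS):
    """True if the compound was screened widely enough and never hit."""
    return rec.assays_tested >= min_assays and rec.assays_active == 0


def dark_fraction(records, min_assays=MIN_ASSAYS):
    """Fraction of sufficiently screened compounds that are dark.

    Returns None when no compound meets the assay-count cutoff.
    """
    eligible = [r for r in records if r.assays_tested >= min_assays]
    if not eligible:
        return None
    return sum(is_dark(r, min_assays) for r in eligible) / len(eligible)


# Hypothetical records, purely for illustration.
records = [
    ScreeningRecord("cpd-A", 150, 0),   # dark
    ScreeningRecord("cpd-B", 150, 12),  # frequent hitter
    ScreeningRecord("cpd-C", 40, 0),    # never hit, but too few assays to call
    ScreeningRecord("cpd-D", 100, 1),   # hit once, so not dark
]
```

Note that a compound like the hypothetical `cpd-C` is left out of the denominator entirely: never hitting in 40 assays says much less than never hitting in 100 or more.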

This is a really interesting idea. There do seem to be structural motifs that produce (or, in some cases, over-produce) biological activity, so what’s the other end of the scale like? And how many of these are around? The Novartis set had over 800,000 compounds that made the 100-assay cutoff, and about 14% of these were consistently inactive. The authors ran some simulations to see how many such compounds one would expect if activity were randomly distributed among the original set of compounds, and the numbers came out far lower. These aren’t just random losers, in other words – there’s something about them that makes them persistently less likely to show activity. Similarly, there were about 35,000 compounds in the 100-assay set for both databases (Novartis and NIH), and about 2700 of these had never hit, which is a much, much larger overlap than one could possibly expect by chance. (Statistically, a one-sided Fisher’s exact test gave a P value down around ten to the minus one-hundred-sixty-fifth, which no matter what you think of P values, is hard to argue with).
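To make the "random losers" comparison concrete, here is a toy version of that kind of null model. The post does not say exactly how the authors ran their simulations, so the column-shuffling scheme below is an assumption on my part, and the deck is synthetic (the compound count, hit rate, and dark fraction are invented, not Novartis numbers). The idea is to keep each assay's hit count fixed, deal those hits out to compounds at random, and count how many compounds still come up empty.

```python
import random


def count_dark(hit_matrix):
    """Number of compounds (rows) with no hit in any assay (column)."""
    return sum(1 for row in hit_matrix if not any(row))


def shuffled_null(hit_matrix, rng):
    """Shuffle each assay's hits across compounds.

    Per-assay hit counts are preserved; any tendency of particular
    compounds to be persistently inactive is destroyed.
    """
    n_compounds = len(hit_matrix)
    n_assays = len(hit_matrix[0])
    columns = []
    for j in range(n_assays):
        col = [hit_matrix[i][j] for i in range(n_compounds)]
        rng.shuffle(col)
        columns.append(col)
    return [[columns[j][i] for j in range(n_assays)] for i in range(n_compounds)]


def make_toy_deck(n_compounds, n_assays, dark_fraction, hit_rate, rng):
    """Synthetic deck: a 'dark' subset never hits, the rest hit at hit_rate."""
    n_dark = int(dark_fraction * n_compounds)
    deck = []
    for i in range(n_compounds):
        if i < n_dark:
            deck.append([False] * n_assays)
        else:
            deck.append([rng.random() < hit_rate for _ in range(n_assays)])
    return deck


rng = random.Random(0)  # fixed seed so the sketch is reproducible
deck = make_toy_deck(1000, 100, dark_fraction=0.10, hit_rate=0.03, rng=rng)

observed = count_dark(deck)
null_counts = [count_dark(shuffled_null(deck, rng)) for _ in range(10)]
print("observed dark compounds:", observed)
print("dark compounds under shuffled null:", null_counts)
```

In this toy deck the observed dark count sits well above every shuffled count, which is the same qualitative pattern the paper reports: more never-hit compounds than random assignment of activity would produce.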

I would have thought that a good number of these were compounds that just plain didn’t work out in cell assays because of membrane permeability issues, although there certainly are cell assays that depend on surface proteins as well. As it turned out, splitting out the compounds based on biochemical versus cell-based assays shows a large overlap, so my concerns (and theirs, probably) were unfounded. Another thing that occurred to the authors was that these compounds probably hadn’t been QCed in quite a while, since they never hit, but a check of both collections showed that there was no difference in compound purity between active and inactive selections from either one. (I might have expected a slight skew to the other direction, actually – in my experience, nasty decomposed compound wells tend to hit even more often than usual, since there are more compounds in there, and some of them are pretty reactive and colorful).

So what do these compounds look like? Greasy bricks of aromatic rings? Nah, those are the compounds that tend to hit, unfortunately. DCM compounds were, on average, more hydrophilic and had fewer aromatic rings in them. The set of DCM compounds that overlapped between the two compound collections (and thus two sets of assays) were even more hydrophilic and lower molecular-weight than ever. There were some substructures that seemed to be enriched in the dark matter of both collections – diketopiperazines, for example, and some aminoalkyl N-methylpyrazoles, among others. None of these structures, I have to say, look odd at all; I don’t think any medicinal chemist would look at them and say “You know what, you could screen that stuff through a hundred assays and never see a damn thing”. Quite the opposite – they look fine. Clustering them in chemical space didn’t show any obvious “dark nebulae” – all the clusters with DCM compounds in them also have active compounds in them (and sometimes these are very similar structures indeed).

Interestingly, the authors were able to go back and look at these DCM compounds after another 34 assays had been run on them. What they found was that while these compounds did indeed have a lower hit rate than average, these further assays did cut into the overall numbers. Dark-matter compounds, in other words, can hit; it’s just that they tend not to. Of the DCM compounds that did hit something in these further 34 assays, by the way, 88% hit in only one of them (and one of them was a 12 nM primary hit). The same lower-than-usual hit rates were found when subsets of active-as-usual compounds were tested and compared to physicochemically similar DCM compounds in phenotypic-style cell assays (reporter-gene arrays, yeast viability and chemogenomics, and so on).

In fact, what these experiments suggested was that if you screen at (say) ten micromolar in such open-ended cell assays, a rather substantial part of the compound collection is going to react, and probably nonspecifically. DCM compounds, on the other hand, give you better signal-to-noise. So a big take-home from this is that when you see a compound that’s been kicking around in the collection for a while suddenly hit for the first time, you should definitely pay attention to it – it might well be one of your better leads, especially if you’re running an assay that otherwise has a high false-positive rate:

From these experiments, we concluded that DCM is indeed less active than other compounds under normal high-throughput assay conditions but is not generally biologically inert. Indeed, our experiments supported the hypothesis that dark matter compounds have the potential to be potent hits with little or no target promiscuity and thus could present an opportunity for identifying new leads. Consequently, we recommend their identification and prioritization in screening libraries and in hit follow-up activities.
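One way to act on that recommendation during hit triage is to sort a new hit list by each compound's screening history. The sketch below is my own illustration, not anything from the paper: the tuple layout, the compound IDs, and the three-tier ordering are assumptions, and the 100-assay threshold is simply borrowed from the DCM definition.

```python
def prioritize(hits, min_assays=100):
    """Order new hits for follow-up by prior screening history.

    Each hit is (compound_id, prior_assays_tested, prior_assays_active).
    Tier 0: previously dark (enough assays, never hit before).
    Tier 1: enough history, ranked by rising prior hit fraction.
    Tier 2: too little history to say anything either way.
    """
    def sort_key(hit):
        compound_id, tested, active = hit
        if tested < min_assays:
            return (2, 0.0, compound_id)
        tier = 0 if active == 0 else 1
        return (tier, active / tested, compound_id)

    return sorted(hits, key=sort_key)


# Hypothetical hit list from a new screen.
new_hits = [
    ("cpd-B", 150, 12),  # hits in a lot of things
    ("cpd-C", 40, 0),    # clean, but barely screened
    ("cpd-A", 150, 0),   # dark until now
    ("cpd-D", 200, 2),   # mostly quiet
]

for compound_id, tested, active in prioritize(new_hits):
    print(compound_id, tested, active)
```

History alone is not a verdict on any compound, of course. It only decides which hits get looked at first.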

The authors also make the point that the assays used for this evaluation were almost entirely directed at mammalian targets – if you’re going for antifungals or antibacterials, you might see a different effect. I’d be interested in seeing someone do that sort of analysis, because there’s been speculation for some time that these targets are possibly selecting for types of chemical matter that screening collections don’t tend to have. I’d wonder the same thing about “undruggable” low-hit-rate mammalian targets (protein-protein interactions, transcription factors and so on). What does the DCM collection (as defined here) think of these?

This paper made me think about screening collections from an angle that I really never had before, and I really appreciate the substantial time and effort that went into it. As far as I know, it’s a unique look at these issues, but I hope to see some follow-up on its concepts from other organizations eventually. Good stuff!

22 comments on “The “Dark Matter” of a Compound Collection”

  1. exGlaxoid says:

    It sounds like the compounds that have not yet provided a hit would be excellent, as you know that they don’t hit many other targets. That is the real problem, if a compound was a perfect compound, that only hit one receptor and no others, you would only find one and only one hit for it in one assay only.

  2. John Wayne says:

    I love the ‘treasure chest’ vs ‘basement’ analogy for compound collections; I’m putting that in my toolbox.

  3. Me says:

    Entropic vs thermodynamic binding to targets. Sounds like the dark matter is enriched in the latter, whereas the promiscuous ‘all-green’ cpds are enriched in the former.

  4. PharmaJohn says:

    In my last lab, we definitely seemed to screen the same kinase scaffolds over and over. That and we borrowed scaffolds from abandoned projects. We tried the fragment approach, but it was an even slower process. Nowadays, we check with eMolecules, but where else can I go for relatively novel chemical matter?

  5. bhip says:

    Biologist here: are these kinds of structures (i.e., diketopiperazines, aminoalkyl N-methylpyrazoles) commonly used in post-hit SAR development? Do these DCM structures show up in marketed drugs?

  6. Rule (of 5) Breaker says:

    @ exGlaxoid – I agree. I would keep these compounds around as they may yield little off-target toxicity if they should ever hit a target. Full disclosure: I am contrarian, so I like to look where other people don’t.

    @PharmaJohn – You need to make your own. Design novel libraries and make them in-house or have a CRO do it. Seriously, if something is commercially available, then everybody has it. If you are screening against a target nobody else is, then maybe that isn’t a problem, but how often does that happen / how sure are you of that?

  7. Mike says:

    If the “dark chemical matter” is more hydrophilic on average, does that mean that they typically have more hydrogen bond donors/acceptors and therefore have more opportunities for mismatches with the target that prevent binding? That would give you a lower hit rate, but the ones that do hit might turn out to be better leads.

  8. neo says:

    I’d love to see if these self-aggregate. Makes sense that if greasy compounds aggregate, you get enzyme and membrane disruptors (PAINS), perhaps if hydrophilic compounds tightly self-aggregate, you get no biological activity. We could call them Aggregating CHemicals Enzymatically Silent (Aches)

  9. Anon says:

    So the key insight here is that hydrogen bonds are more specific than hydrophobic interactions? I think Linus Pauling told us that a hundred years ago!

  10. Mark Thorson says:

    If you only use mammalian cells to screen, that means nobody has tested them for antibiotic activity, right? The magic bullet for MRSA could be out there and you’d never know it. About all you know is that they probably won’t hurt human cells.

  11. a. nonymaus says:

    I could readily explain some of the cell-based inactivity as compounds that are good metabolic substrates. Overall, this is a very interesting result that makes me wonder how many compounds have been prematurely purged as inactive due to being screened against the wrong targets.

  12. z says:

    I’m filing a complaint on the name.

    Dark matter in physics is used to note the large difference between what our current models and equations say the amount of matter in the universe/galaxy/etc. should be from our observations, and what we currently estimate the amount of matter (that we know of) is in our universe.

    It is “dark” as it isn’t visible, but from what we know, SHOULD be there.

    Same thing with dark energy, but with completely different set of equations/observations/etc. Completely unrelated save the dark part.

    This… is not even related in meaning. This is dark as being nonreactive despite everyone hoping it will be.
    It is interesting, but given that they’ve already acronymed it, I’d rather they use a name that doesn’t lead to confusion regarding similarities to dark matter.

    1. Kaleberg says:

      Physicists chose the term “dark” because dark matter only reacts with the kinds of matter in the Standard Model gravitationally. Its lack of reaction makes it “dark”. In that sense, using the term “dark” for compounds that don’t seem to interact is analogous.

  13. CMCguy says:

    I guess although it seems to be an interesting intellectual exercise I am not clear where any significant practical value would be served in most cases by including a DCM classification. What is the average outcome of most HTS efforts typically now? Isn’t it fairly low (<5%?) except in cases where you can load with a related series of likely hits, hence anything that pops would garner further scrutiny and good candidates will find their way to the top of the pile. The purpose of screening libraries is to generate hits to turn into drug leads, therefore unless there is known or suspected promiscuity, or very limited ability to conduct the assay routinely, it would seem most teams coming across such a compound would perform appropriate follow-up that often includes other assays where one typically desires non-activity to minimize off-target effect issues. Because one rarely knows where to begin against a new target a kitchen sink approach is a good place to start. Uniqueness often can be strong reason for excitement in discovery, particularly from IP realm, however the question rapidly becomes how exploitable is such a lead through ready chemical analogs.

  14. Daniel T says:

    I have had the same idea in the past to screen for the “inactive” compounds as they are the compounds that are going to be more specific for your target if they actually hit.

    One related idea I had was to take your collection of compounds, screen for inactivity in a broad range of test assays, then screen for toxicity in cell lines, and finally in mice before doing any real assays. The idea is to get the number of compounds down to a small number so that you can do phenotype assays in whole animals. Also if the compound collection you have has been well screened for inactivity and lack of toxicity then you should be able to dose with multiple compounds at once or screen for potentiation of existing drugs.

  15. lynn says:

    While I’m sure that most of these compounds HAVE been screened for antibacterial activity [growth inhibition, which can be done in HTS], it may be that some have not. Such a set of small, more hydrophilic, polar, maybe charged, compounds would be good fodder for anti-Gram-negative agents – because they have properties more tuned to entry into Gram-negatives. Also, oral bioavailability is not required for antibacterials, as most tough infections are treated in the hospital with parenteral drugs [where solubility is paramount]. [Novartis DOES do antibacterial discovery].

  16. Bruce Koch says:

    @CMCguy: a 5% hit rate for a HTS screen is frighteningly high. A more typical confirmed hit rate (after removing compounds that react with the target, interfere with the assay, or otherwise cause false positives – and I mean by testing these problems experimentally) is more like 0.1-0.5%.

  17. medchemist says:

    @ PharmaJohn: where can you go for relatively novel chemical matters?
    If you look for NP-derived compounds, I would ask AnalytiCon in Germany.

    As for purely synthetic compounds, yet rather novel and 3D, you might find interesting chemotypes from Edelris.

  18. Frootloop says:

    @Me, “Entropic vs thermodynamic binding to targets. Sounds like the dark matter is enriched in the latter, whereas the promiscuous ‘all-green’ cpds are enriched in the former.”

    No, the dark matter just doesn’t bind period. I assume you meant entropic vs enthalpic, but that’s all total baloney anyway IMO… there’s nothing non-specific about entropy-driven binding thermodynamics.

  19. Dr. Manhattan says:

    Mark Thorson: “The magic bullet for MRSA could be out there and you’d never know it.”

    Actually, MRSA is a well served infection in terms of treatment (although you wouldn’t know it from breathless TV reports on it). There is vancomycin, telavancin (lipoglycopeptide with dual mechanism of action), Cubicin (lipopeptide), Zyvox (linezolid-oxazolidinone), ceftaroline (novel cephalosporin works against PBP2a) and Tygacil (new tetracycline). As Lynn points out (Hi, Lynn!) it is the multi drug resistant Gram negatives (Klebsiella, Pseudomonas, Acinetobacter) that are the organisms that are very challenging and have very few therapeutic options. And yes, these compounds do sound like they lie more in the chemical space for Gram negative compounds.

  20. J. says:

    We published something similar recently about NP and what could go bad: http://pubs.acs.org/doi/full/10.1021/acs.jmedchem.5b01009
