When you screen a collection of compounds against a drug target, you have to be ready to deal with the fact that some fraction – and maybe it’s a large fraction – of the hit compounds are not really hits. They’re false positives of one sort or another. All the major screening techniques generate them, and if the target you’re screening against has a low intrinsic hit rate, then almost everything on your hit list is probably in that category.
A very common mechanism underlying these is colloidal aggregation. This has only become appreciated relatively recently, thanks largely to the efforts of Brian Shoichet and his group at UCSF. Basically, the problem is that many compounds will start to pile up with each other in solution, not quite to the point of visibly crashing out in a sticky mess at the bottom of the vial, but enough to sequester your protein target. Taking the protein out of contention like that under the assay conditions reads out as lower activity, which looks like a hit, so aggregators are generally hard to distinguish, on the first pass, from real hits. There are ways around this, the main one being to add one detergent or another to the assay system, after first checking which detergents the assay can tolerate. A small amount can make aggregates much less likely to form, and it often makes a rather long list of supposed hits vanish into the mist. You can also analyze individual compounds by dynamic light scattering under the assay conditions, a more retail approach.
The really annoying things about aggregating compounds are that (1) there’s no good way to predict them up front, and (2) the whole phenomenon varies according to the assay conditions. Something that ferociously aggregates in Buffer A may be just fine in Buffer B. And as you’d imagine, this is all concentration-dependent as well. A great many compounds will start showing this behavior if you push them to high concentrations, which is one of the many reasons that you want to be really careful with data generated under those conditions, but the big question is whether a particular compound is aggregating at the concentrations where it appears to be active. If those ranges overlap, it doesn’t absolutely prove that the compound is a false positive, but you should at least put some sort of asterisk next to it to remind you that it certainly could be an artifact, and to look into it more closely if you really get interested.
There’s a new paper in J. Med. Chem. from Shoichet and colleagues with some interesting data on all this. They’re making an aggregator database freely available, with thousands of compounds in it, and it includes similarity searching. That needs to be used with some caution, because, for the reasons mentioned above, aggregation can be a slippery phenomenon; the paper reviews some of the past attempts at this sort of prediction, which have been of limited usefulness. Testing this tool prospectively against molecules from the literature, followed by experimental verification, showed that it is indeed pointing in the right direction – the higher the Tanimoto similarity to known aggregators, the more likely a compound is to be one itself, although there are always false positives and false negatives. Indeed, just looking at the data they’ve generated will show how difficult the problem is, since there are very similar compounds that display quite different behavior.
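For readers who want to try this sort of check themselves, here’s a minimal RDKit sketch of the general idea: take a query structure, compare it against a list of known-aggregator structures, and flag anything that comes back too similar. To be clear, the fingerprint type (Morgan/ECFP4-style), the 0.85 cutoff, and the example SMILES below are my own illustrative assumptions, not the exact settings or data behind the paper’s tool.

```python
# Hypothetical sketch: flag a compound by Tanimoto similarity to known aggregators.
# Fingerprint choice and cutoff are illustrative assumptions, not the published tool's settings.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def max_aggregator_similarity(query_smiles, aggregator_smiles):
    """Return the highest Tanimoto similarity between the query and any
    known aggregator, using Morgan (ECFP4-style) bit fingerprints."""
    query = Chem.MolFromSmiles(query_smiles)
    if query is None:
        raise ValueError("Could not parse query SMILES")
    query_fp = AllChem.GetMorganFingerprintAsBitVect(query, 2, nBits=2048)

    best = 0.0
    for smi in aggregator_smiles:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparseable database entries
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
        best = max(best, DataStructs.TanimotoSimilarity(query_fp, fp))
    return best

# Illustrative use with placeholder structures (not real database entries):
known_aggregators = ["O=C(O)c1ccc(Oc2ccccc2)cc1", "Oc1ccc(cc1)C(=O)c1ccccc1"]
score = max_aggregator_similarity("O=C(O)c1ccc(Oc2ccc(Cl)cc2)cc1", known_aggregators)
if score >= 0.85:  # assumed cutoff for "worryingly similar"
    print(f"Possible aggregator (max Tanimoto {score:.2f}); check with detergent or DLS")
else:
    print(f"No close aggregator match (max Tanimoto {score:.2f})")
```

Even a high score from something like this is only a warning flag, of course; as the paper makes clear, the call still has to be settled at the bench.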
So there’s no substitute for experiment. But the broad strokes of the data are hard to argue with:
Comparing ChEMBL Version 17, representing compounds tested in the medicinal chemistry literature, to ZINC, representing simply purchasable molecules, over 7% of compounds in the medicinal chemistry literature are predicted to aggregate and 1% are known to aggregate, while 0.73% of purchasable molecules are predicted to aggregate. This suggests that aggregators are actually being prioritized as ligands in medicinal chemistry.
. . .since 1995 the prevalence of plausible aggregators has grown by more than 8-fold in the medicinal chemistry literature (we cannot exclude the effects of the different types of chemotypes that have been explored, for new targets, since 1995, in addition to the advent of target-based screening).
I can believe that. It wouldn’t surprise me a bit if one of the factors pushing that number up was the rise in metal-catalyzed coupling chemistry during that period, which tends to generate molecules with less-than-ideal solubility. But this analysis raises some interesting questions. One possibility, and it seems to be a likely one, is that the false-positive nature of aggregating compounds causes them to be pursued as leads more often than they should be, enriching them in the med-chem literature. Another one, and it’s not mutually exclusive by any means, is that the sorts of structural features that cause compounds to aggregate may also have some overlap with things that protein binding sites actually select for. If that’s the case, then we really are never going to be able to get away from aggregating compounds, because they’re too much like the things we want to find.
It would be interesting to look at a large set of diverse compounds that don’t seem to aggregate under any common assay conditions, and see what the hit rate is for them over a good list of high-throughput screening campaigns. Comparing that figure directly to the rest of the collection wouldn’t be fair, since those hit rates are surely inflated by just these sorts of aggregating false positives. What I’d really like to know is the comparison between “real false-positive-adjusted hit rates across a number of assays for the whole compound collection” and “real hit rates across a number of assays for the set of compounds that are least likely to aggregate”. I’d be willing to bet that the latter would be noticeably lower, unfortunately, but I’m not volunteering to run this study, either.
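Just to make the comparison I have in mind concrete, here’s a toy sketch of the arithmetic. Every number in it is a made-up placeholder, not data from any real screen; the point is only the shape of the comparison, not the values.

```python
# Toy sketch of the hit-rate comparison described above; all counts are hypothetical.

def hit_rate(hits, tested):
    return hits / tested

# Whole collection: raw hit rate, then "false-positive adjusted" by subtracting hits
# later shown (by detergent sensitivity, DLS, etc.) to be aggregation artifacts.
total_tested = 100_000
raw_hits = 800
aggregation_artifacts = 500  # hypothetical: hits that vanish when detergent is added
adjusted_rate = hit_rate(raw_hits - aggregation_artifacts, total_tested)

# Low-aggregation-risk subset: compounds that stay clean under common assay conditions.
subset_tested = 20_000
subset_hits = 40  # hypothetical; the bet is that this rate comes out lower
subset_rate = hit_rate(subset_hits, subset_tested)

print(f"Adjusted hit rate, whole collection: {adjusted_rate:.2%}")  # 0.30%
print(f"Hit rate, low-aggregation subset:    {subset_rate:.2%}")    # 0.20%
```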