Drug Development

Predicting What Group to Put On Next

Here’s a new paper in J. Med. Chem. on software that tries to implement matched-molecular-pair type analysis. The goal is a recommendation – what R group should I put on next?
Now, any such approach is going to have to deal with this paper from Abbott in 2008. In that one, an analysis of 84,000 compounds across 30 targets strongly suggested that most R-group replacements had, on average, very little effect on potency. That’s not to say that they don’t or can’t affect binding, far from it – just that over a large series, those effects are pretty much a normal distribution centered on zero. There are also analyses that claim the same thing for adding methyl groups – to be sure, there are many dramatic “magic methyl” enhancement examples, but they’re balanced out, on the whole, by a similar number of dramatic drop-offs, along with a larger cohort of examples where not much happened at all.
To their credit, the authors of this new paper reference these others right up front. The answer to these earlier papers, most likely, is that when you average across all sorts of binding sites, you’re going to see all sorts of effects. For this to work, you’ve got a far better chance of getting something useful if you’re working inside the same target or assay. Here we get to the nuts and bolts:

The predictive method proposed, Matsy, relies on the hypothesis that a particular matched series tends to have a preferred activity order, for example, that not all six possible orders of [Br, Cl, F] are equally frequent. . .Although a rather straightforward idea, we have been unable to find any quantitative analysis of this question in the literature.

So they go on to provide one, with halogen substituents. There’s not much to be found comparing pairs of halogen compounds head to head, but when you go to the longer series, you find that the order Br > Cl > F > H is by far the most common (and that appears to be just a good old grease effect). The next most common order just swaps the bromine and chlorine, but the third most common is the original order, in reverse. The other end of the distribution is interesting, too – for example, the least common order is Br > H > F > Cl, which is believable, since it doesn’t make much sense along any property axis.
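
To make that counting exercise concrete, here is a minimal Python sketch of tallying how often each activity order shows up across a set of matched series. Everything in it is an assumption for illustration: the pIC50 values are invented, the function names (`activity_order`, `order_frequencies`) are hypothetical, and this is not the paper's code. Ties are not handled specially; a real analysis would need a rule for them.

```python
from collections import Counter

# Hypothetical data: each dict is one matched series (same scaffold, same
# assay), mapping R group -> pIC50. The numbers are invented.
series_list = [
    {"Br": 7.9, "Cl": 7.5, "F": 6.8, "H": 6.1},
    {"Br": 6.4, "Cl": 6.2, "F": 5.9, "H": 5.5},
    {"Br": 7.0, "Cl": 7.3, "F": 6.5, "H": 6.0},
    {"Br": 5.2, "Cl": 5.6, "F": 6.1, "H": 6.6},
    {"Br": 8.1, "Cl": 7.7, "F": 7.2, "H": 6.9},
]


def activity_order(series, groups):
    """Return the groups sorted from most to least active.

    Returns None if the series is missing any of the requested groups.
    """
    if not all(g in series for g in groups):
        return None
    return tuple(sorted(groups, key=lambda g: series[g], reverse=True))


def order_frequencies(series_list, groups):
    """Count how many series show each activity order of the given groups."""
    counts = Counter()
    for series in series_list:
        order = activity_order(series, groups)
        if order is not None:
            counts[order] += 1
    return counts


halogens = ("Br", "Cl", "F", "H")
freqs = order_frequencies(series_list, halogens)
for order, n in freqs.most_common():
    print(" > ".join(order), n)
```

With four substituents there are 24 possible orders, so a flat distribution would put each near 4% of the series; the claim in the paper is that the real distribution is far from flat.
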
They go on to do the same sorts of analyses for other matched series, and the question then becomes, if you have such a matched series in your own SAR, what does that order tell you about what to make next? The idea of “SAR transfer” has been explored, and older readers will remember the Topliss tree for picking aromatic substituents (do younger ones?)
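
One way to picture how an observed order could turn into a suggestion: look up reference series that contain the same R groups in the same activity order, then ask how often each additional R group in those series beat the best of the ones you already have. The sketch below is only a toy version of that idea under my own assumptions. The data are invented, `recommend` and `order_of` are hypothetical names, and the scoring rule (fraction of matching series in which the new group improved on the best known one) is a simplification, not the published Matsy implementation.

```python
from collections import defaultdict


def order_of(series, groups):
    """Groups sorted from most to least active within one series."""
    return tuple(sorted(groups, key=lambda g: series[g], reverse=True))


def recommend(query, reference):
    """Rank R groups absent from `query` by how often they improved activity.

    Only reference series that contain every query group, in the same
    activity order as the query, are used as evidence.
    Returns a list of (group, improvement_rate, n_observations).
    """
    groups = tuple(query)
    query_order = order_of(query, groups)
    improved = defaultdict(int)
    seen = defaultdict(int)
    for ref in reference:
        if not all(g in ref for g in groups):
            continue  # series lacks one of the query groups
        if order_of(ref, groups) != query_order:
            continue  # different SAR trend, so not used as evidence
        best_known = max(ref[g] for g in groups)
        for g, activity in ref.items():
            if g in query:
                continue
            seen[g] += 1
            if activity > best_known:
                improved[g] += 1
    ranked = [(g, improved[g] / seen[g], seen[g]) for g in seen]
    # Highest rate first, then more observations, then name for stability.
    ranked.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return ranked


# Invented pIC50 values for illustration only.
query = {"H": 5.8, "F": 6.3, "Cl": 6.9}
reference = [
    {"H": 6.0, "F": 6.4, "Cl": 7.0, "Br": 7.4, "OMe": 6.1},
    {"H": 5.1, "F": 5.5, "Cl": 6.2, "Br": 6.6, "CF3": 6.9},
    {"H": 7.0, "F": 6.5, "Cl": 6.1, "Br": 5.9},
    {"H": 6.2, "F": 6.6, "Cl": 7.1, "Br": 7.0, "OMe": 7.5},
    {"F": 6.0, "Br": 7.0},
]

ranked = recommend(query, reference)
for group, rate, n in ranked:
    print(f"{group}: improved in {rate:.0%} of {n} matching series")
```

Note that in this toy data CF3 comes out on top on the strength of a single observation, which is exactly the sort of thing a real implementation would have to guard against with minimum counts or some statistical weighting.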

The Matsy algorithm may be considered a formalism of aspects of how a medicinal chemist works in practice. Observing a particular trend, a chemist considers what to make next on the basis of chemical intuition, experience with related compounds or targets, and ease of synthesis. The structures suggested by Matsy preserve the core features of molecules while recommending small modifications, a process very much in line with the type of functional group replacement that is common in lead optimization projects. This is in contrast to recommendations from fingerprint-based similarity comparisons where the structural similarity is not always straightforward to rationalize and near-neighbors may look unnatural to a medicinal chemist.

And there’s a key point: prediction and recommendation programs walk a fine line, between “There’s no way I’m going out of my way to make that” and “I didn’t need this program to tell me this”. Sometimes there’s hardly any space between those two territories at all. Where do this program’s recommendations fall? As companies try this out in-house, some people will be finding out. . .

13 comments on “Predicting What Group to Put On Next”

  1. petros says:

    Long time since I’ve seen a Topliss tree mentioned. However, it was a very useful pragmatic approach to exploring substituents in the days when reactions were run in singlicate.
    Interesting to see that this is another AZ study on matched pairs.

  2. bhip says:

    Can’t help noticing that the chemist illustrated in the cartoon in the abstract appears to be of Asian ancestry…even cartoon chemists are being outsourced now….

  3. does not meet journal standards says:

    Pretty sure JMC has a policy of only accepting papers with experimental work. This paper has none. Sure they make a bunch of predictions and check them with chembl, but they used chembl to train. For shame, JMC editors.
    We all know the real story, those rules only apply to the unclean.

  4. Anonymous says:

    Nature has already answered this question: just look at the range of amino acid side chains and add a few of those.

  5. anonymous coward says:

    3: Maybe they have an unspoken policy for publishing only studies with experimental work, but that isn’t what they say:
    They specifically include computational studies as examples in their topics list:

    Substantially novel computational chemistry methods with demonstrated value for the identification, optimization, or target interaction analysis of bioactive molecules.

    Though they say the method has to be validated, they don’t say it has to be done experimentally – for example, it could more compactly account for previous results.

  6. LeeH says:

    People often miss the point of MMPs. They are not going to be indicative of average behavior. On the contrary. They are rare events, occurring way out on the edges of the curve. The question is – can those pairs (which really boil down to a particular molecular transformation) describe a change in structure that can be reapplied to some other compound, thereby changing some property in a favorable way? And is it obvious where on the starting structure it should be applied?

  7. Noel O'Boyle says:

    I’m one of the authors on the above paper. Thanks for the write-up Derek!
    @3: The journal guidelines (see section are fairly transparent on what categories of computational work are accepted. In particular, we made a case that our manuscript was within scope as it fit under “Substantially novel methods along with evidence for utility in medicinal chemistry with significant potential for advancing the field”.
    Regarding the use of ChEMBL, if you are referring to the retrospective test, both the training data and test data were indeed from ChEMBL, but from different time periods (we predicted newer data using older data).

  8. Ex Med Chem says:

    There’s a lot more to subtle group changes than trying to improve potency.
    My experience is that potency on your intended target is the easiest part (i.e., usually the hit-to-lead phase gets you in the ballpark of the potency you want). It’s the subtle or sometimes dramatic changes to balance everything else (during lead optimisation), from off-target activity, physical properties, PK, and metabolism, that usually end in the not-so-futile Me, Et, Pr, Br, Cl, OR, etc. etc.
    I’ve seen many examples where this kind of scan of a position has found something unpredictable, such as a Cl picking up a strange H-bond type interaction.
    As for “predictive computational models” resulting in “let’s just make 1 or 2 to test the hypothesis” by using some flawed computational random ranking generator, I’d always balance this approach with a hefty dose of empirical med chem.

  9. ScientistSailor says:

    @1 Petros,
    Funny I hadn’t heard about Topliss in years either, however it was mentioned in a talk this morning at the meeting I’m at. So that’s twice in one day. Maybe time to resurrect it?

  10. Piero says:

    An attempt of getting rid of even more “thinking head” chemists in favor of cheap hand labourers from far east?

  11. Anonymous says:

    Another useless paper.

  12. Ex Med Chem says:

    @11 nailed it!!

  13. Noel O'Boyle says:

    @6: But MMP does work quite well for physicochemical properties such as solubility/logP. This follows from the fact that group contribution approaches are widely used for such properties. But as you say, with activities, it doesn’t work so well for the reason you and Derek describe.
    @8: Sure, improving potency isn’t everything and may indeed be the easiest part. That’s no reason not to make it easier though. Also, the method is general; we’ve focused on potency as it’s known that the matched pair approach doesn’t work well for that.
    I’m all against flawed computational random ranking generators too – we always use the Mersenne Twister.
