Chemical News

Predicting New Reactions

While working on my talk on robotics and artificial intelligence, I was sent a link to this paper (PDF) which I thought was worth a look. It’s from a team at the University of Münster, and what they’re trying to do is look for patterns in the entire corpus of synthetic reactions. They’ve used data from Reaxys to create a database of over eight million binary chemical reactions, representing 14.4 million different species, and they’re trying to use this to predict new reactions.

This high-throughput reaction prediction (HTRP) idea has been tried several times before, and the paper provides a useful review of these. Broadly, these attempts have used some sort of rule-based expert system framework, tried to work out a logic or grammar of chemical reactivity to extrapolate from, or used outright machine learning techniques. (These don’t exclude each other, and there have been approaches that mix them together.) Success has been mixed – well, very mixed, to be honest. I think it’s fair to say that the times it’s seemed to work, it’s been in areas of limited applicability, and when these ideas have been applied in a more general fashion, they haven’t necessarily worked very well.

This new paper does seem to go further, but it also shows the limits of the current state of the art. As a number of other theoretical approaches to organic chemistry have done, they’re shifting the world of organic chemistry over into a graph-theory problem. From this perspective, discovering new reactions becomes a search for new nodes and edges in the graph. Studies of other sorts of networks give some techniques for this, but I’m going to long-jump over the math involved (the paper has more than you want). But the underlying reasoning is easy to understand – let’s say you have two molecules, A and B, and both of them are known to react with a partner C to give some new product in each case. The program notes all these similarities, and searches for cases where compound A reacts with yet another molecule type D. Since A and B have been classed as having a similar reactivity pattern (they both reacted usefully with C), the program hypothesizes that B will do something with D as well, even though no examples like this may be in the database (there are similarity cutoffs in the computational steps to keep too many things from coming out the other end of the process). The system can also predict what it thinks the new product of this reaction would be, based on the A-C reaction products.
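To make the A/B/C/D reasoning concrete, here is a minimal sketch in Python. It is not the paper's method, which uses far more elaborate network mathematics. It is a toy version that assumes reactions can be reduced to unordered pairs of reactants and that "similar reactivity" just means sharing at least `min_shared` known partners. The function name `predict_reactions` and the `min_shared` cutoff are my own inventions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def predict_reactions(known, min_shared=1):
    """Toy link prediction on a reaction graph.

    known: iterable of (reactant, partner) pairs already in the database.
    min_shared: hypothetical similarity cutoff, the number of partners two
        molecules must share before they count as "similar".
    Returns a set of frozensets, each an unreported pair predicted to react.
    """
    known = list(known)

    # Build the graph: each molecule maps to the set of its known partners.
    partners = defaultdict(set)
    for a, b in known:
        partners[a].add(b)
        partners[b].add(a)

    known_pairs = {frozenset(pair) for pair in known}
    predicted = set()

    for a, b in combinations(sorted(partners), 2):
        shared = partners[a] & partners[b]
        if len(shared) < min_shared:
            continue  # not similar enough, skip this pair of molecules
        # a and b look alike, so hand each one's extra partners to the other.
        for src, dst in ((a, b), (b, a)):
            for d in partners[src] - partners[dst] - {dst}:
                pair = frozenset((dst, d))
                if pair not in known_pairs:
                    predicted.add(pair)
    return predicted


# A and B both react with C, and A also reacts with D.
known = [("A", "C"), ("B", "C"), ("A", "D")]
print(predict_reactions(known))
```

With this input the only prediction is the B + D pair. Raising `min_shared` to 2 removes it, which is the same sort of job the similarity cutoffs do in the real thing: keeping too many predictions from coming out the other end.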

They tested this approach by using everything up to 2013 to predict reactions, and the set of reactions published since then to check their results. Looking at 180,000 randomly selected reactions, the predictions were correct about 67% of the time. For the most part, the failed predictions are still reasonable – the examples given include structures where there are two sites of reactivity, and the predicted product was the result of just one of the two steps, while the actual reaction goes all the way to both. As you might expect, the accuracy improves linearly with the size of the graph/data set. Known expert systems approaches, when applied to the same data set, give lower accuracy.
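The validation scheme described above (train on the older literature, score against what was published later) can be sketched in a few lines. This is only an illustration under my own assumptions: each reaction is a `(year, reactants, product)` tuple, the cutoff year is treated as inclusive, and `toy_predict` is a deliberately crude placeholder model, not anything from the paper.

```python
def time_split_accuracy(reactions, cutoff_year, predict):
    """Score a predictor on reactions published after cutoff_year.

    reactions: list of (year, reactants, product) tuples (assumed layout).
    predict: callable(training, reactants) returning a product or None.
    Returns the fraction of held-out reactions predicted exactly,
    or None if nothing was published after the cutoff.
    """
    training = [r for r in reactions if r[0] <= cutoff_year]
    held_out = [r for r in reactions if r[0] > cutoff_year]
    if not held_out:
        return None
    correct = sum(
        1
        for _, reactants, product in held_out
        if predict(training, reactants) == product
    )
    return correct / len(held_out)


def toy_predict(training, reactants):
    # Placeholder model: reuse the product of the first training reaction
    # whose first reactant matches. Real models are far more involved.
    for _, known_reactants, product in training:
        if known_reactants[0] == reactants[0]:
            return product
    return None


reactions = [
    (2010, ("A", "C"), "P1"),
    (2012, ("B", "C"), "P2"),
    (2014, ("A", "D"), "P1"),
    (2015, ("B", "D"), "P3"),
    (2016, ("E", "D"), "P4"),
]
accuracy = time_split_accuracy(reactions, 2013, toy_predict)
print(f"held-out accuracy: {accuracy:.2f}")
```

The point of splitting by date rather than at random is that the model never sees anything from the period it is being scored on, so a correct call is a real prediction and not a lookup.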

Of course, predicting some of these reactions is not much of a feat. To isolate the more interesting predictions, they removed all the ones that could have been predicted with a rule-based method from the pre-2013 literature, and looked at what was left over. Looking at 13,000 randomly selected reactions from that set, the accuracy was about 35%. You can look at that figure two ways – organic chemists looking to find new reactions may see this as “two-thirds wrong”, and therefore more likely than not to waste their time if they were to use it as a guide for discovery. Others might be more impressed, because even that 35% figure is one that (by definition) can’t be reached by existing rule-based methods.

I can think of several ways to use these predictions, though. One (as above) would be to look through them for what could be useful transformations that haven’t been described, with an eye to bringing them into practice. You could also keep in mind this line from the paper: “Our model does fail to predict reactions of a molecule in positions where it has not been activated before, or where a mechanistic consideration is necessary to predict the outcome”, and use this as a guide in the other direction, to see what sorts of reactions would break into completely new space were they to be discovered. I wonder if the starting graph could be compared with the one containing the predicted nodes and edges, to see if there are areas of organic synthesis that are unusually rich in predictable (but unknown) transformations, or unusually poor in them. Depending on your point of view, either area might have something to recommend it.

13 comments on “Predicting New Reactions”

  1. Anon says:

    I wonder… Given that the paper probably wouldn’t have been published if the predictions turned out to be false, couldn’t this just be a result of publication bias, reporting a random success? Let’s see if it is *truly* predictive for other reactions not yet reported…

  2. Morten G says:

    There’s a ton of data missing in their training set though. All the reactions that have been shown to not work are absent.

    I skimmed through to the conclusion. They haven’t released their model or set up a website so people could ask the model for suggestions, have they? Wait, they used the Reaxys data so they probably aren’t allowed to do that – Elsevier and all that.

  3. Anon says:

    One big mind-wank

    1. Andy says:

      OBMW…..Band name!!!

  4. Me says:

    Maybe I need to look into it a bit more to get a feel, but I have visions of:-

    dienophiles A + B react with cyclopentadiene.


    Dienophile A reacts with butadiene


    Dienophile B should react with butadiene

    Or is that what is meant by a rule-based prediction that is filtered out? I’m open to these sorts of methods in general, but I haven’t seen anything that comes close to a decent chemist’s eyeballs for these sorts of ‘cursory glance, will-it-work’ type decisions. But then again comp chem gets better all the time.

  5. There Will Be Pubs says:

    Can their method predict reactions with blue LEDs? If so, we might have uncovered a wellspring of unlimited Science/Nature papers

    1. Anon says:

      No. You need a lot of grad students for that.

  6. Spartacus says:

    Derek, what do you think about today’s Nature paper on Aducanumab? I just saw that you were quite pessimistic a year ago.

    1. Anon says:

      Biogen’s stock is completely flat, which probably reflects the fact that a) everyone expected it to show efficacy at this stage; and b) everyone expects it to fail later on. Just like every other antibody for AD.

      1. Me says:

        *lol* there’s an antidote to the upbeat reviews the press are handing out around this!

  7. tangent says:

    From the paper, their idea of “reaction prediction” is “predicting the products, catalysts, and reagents — given only a set of reactants.” Is this standard? It strikes me (not in the field) as a weird goal.

    Given only a set of reactants, you could apply a million different reagents, to make different products. Which one of those million is the right answer to predict? Likewise the other condition dimensions. The question they’re solving is “what reaction do you think would be published on these reactants?” I.e. they’re not actually predicting chemical behavior, they’re predicting publication behavior.

    I’d hope to answer a different question: given reactants and conditions, what, if anything, is the product? (And the yield if you don’t mind!) That has a well-defined single answer. And if you could do the forward prediction problem, you can in principle search it to answer “are there any catalysts that would make the product I want?”, or back-solve for other unknowns.

    1. Scott says:

      Like the last time I commented on a big data in chemistry discussion, you gotta ask the right questions.

  8. Gerben van Straaten says:

    How does their method compare to the age-old method of “Throw a grad student at it”? Humans aren’t exactly experts at predicting reactions either and often end up making the wrong predictions.
