Automated Reaction Discovery Gets Smarter

Here’s an interesting paper from the Cronin lab at Glasgow. It’s titled “Controlling an organic synthesis robot with machine learning to search for new reactivity”, and that title alone will make some of the readership here eager to hear more, while sending others fleeing in dismay. It seems difficult to be neutral about such topics, but here we go:

The idea is to find new reactions. There are, broadly speaking, three ways to do that. One is by sheer brainpower (human or software): predicting a reaction where none is known by calling your shot based on chemical intuition, quantum mechanical simulations, etc. The second, at the other end of the scale, is by sheer serendipity: mix things together, over and over, and see what happens. And the third is a hybrid, a sort of directed serendipity that allows for random chance but tries to narrow the search down to promising areas instead of sampling more randomly.

Now, there have been a number of reports of new reaction searching by automated microscale synthesis, which sounds like the second method. But these are really more like the third, hybrid one, since they tend to start in areas (metal-catalyzed reactions, for example) where a good deal of interesting reactivity is already taken for granted, and with substrates that are likely to participate. This latest paper is another hybrid approach, but this time, the plan is to evaluate a smaller set of test reactions and see if the system can predict what might happen across the rest of the reaction landscape.

It’s not a particularly high-throughput system – six reactions at a time, 36 a day, although that’s still more than any human could put up with. The robotic system is also hooked up to a flow NMR system and an IR spectrometer. The data from these instruments is used to tell the software “Was this reactive or not?”, that is, “Did something happen?” The model was trained on 72 reactive and non-reactive mixtures (as chosen by an actual human chemist), and the authors say that it was about 80% accurate in predicting reactivity in general after that. Note that the model itself is agnostic about chemical structure or known reactions – all it knows are representations that this is reactant one, that’s reactant two, etc., and that these are binned into broad classes (which we humans would refer to as “aldehydes”, “aromatic amines” and so on).
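For the programmers in the audience, here is a minimal sketch of what a structure-blind representation like that might look like. It assumes a one-hot style encoding, where each reagent is just a slot in a vector; the reagent labels, the pool size, and the `encode` helper are all made up for illustration and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical reagent labels; the model only ever sees their positions.
REAGENTS = ["R1", "R2", "R3", "R4", "R5", "R6"]
INDEX = {name: i for i, name in enumerate(REAGENTS)}


def encode(mixture):
    """Return a 0/1 vector with a 1 in the slot of each reagent present."""
    vec = [0] * len(REAGENTS)
    for name in mixture:
        vec[INDEX[name]] = 1
    return vec


# Every three-component mixture that can be drawn from the pool.
all_mixtures = list(combinations(REAGENTS, 3))
example = encode(("R1", "R4", "R6"))
```

Nothing in that vector says "aldehyde" or "amine". Any chemistry the model picks up has to come from which slots tend to show up together in mixtures that were flagged as reactive.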

So now if you hand the system a new set of reactants, it starts out by running some random combinations of those and seeing if anything happens. It goes through a set of these and builds a model of what it thinks the “hot spots” for reactivity in the whole set might be, then runs the new combinations that seem most likely to do something based on that model. The results of these experiments are, naturally, fed back into that model for further refinement.
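That explore-then-exploit loop can be sketched in a few lines. This is only an illustration under stated assumptions: the scoring model here is a simple smoothed per-reagent hit rate that I have swapped in for whatever the authors actually trained, and `explore`, `score`, `run_experiment`, and the batch sizes are hypothetical names and numbers.

```python
import random


def score(mixture, stats):
    """Estimated reactivity: mean smoothed hit rate of the mixture's reagents."""
    rates = []
    for reagent in mixture:
        hits, tries = stats.get(reagent, (0, 0))
        rates.append((hits + 1) / (tries + 2))  # Laplace smoothing
    return sum(rates) / len(rates)


def explore(candidates, run_experiment, n_random=5, batch=3, rounds=4, seed=0):
    """Run random mixtures first, then keep picking the top-scoring ones."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    stats, results = {}, []

    def record(mixture):
        # Run one experiment and fold the outcome back into the model.
        reactive = run_experiment(mixture)
        results.append((mixture, reactive))
        for reagent in mixture:
            hits, tries = stats.get(reagent, (0, 0))
            stats[reagent] = (hits + int(reactive), tries + 1)

    # Step 1: random combinations to seed the model.
    for _ in range(min(n_random, len(pool))):
        record(pool.pop())

    # Step 2: rank what is left, run the most promising batch, update, repeat.
    for _ in range(rounds):
        pool.sort(key=lambda m: score(m, stats), reverse=True)
        chosen, pool = pool[:batch], pool[batch:]
        for mixture in chosen:
            record(mixture)
    return results
```

In use, `run_experiment` would stand in for the robot plus the spectrometers answering "Did something happen?" for one mixture. For a dry run you can pass any function that maps a mixture to `True` or `False`.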

A test of the system was based on a large literature set of Suzuki couplings. After building a model of reactivity based on those (several thousand reactions), the robot was turned loose on a set of 576 potential reaction combinations (also from the literature, but not in the training set). And it did indeed pick up more reactive combinations first, as evaluated by looking over the known results for the first 100 reactions selected, the second 100, and so on – the fraction of these giving actual products started out high and went down progressively, with the last batch being mostly things that were predicted not to work (and in fact had not).
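That batch-by-batch check is easy to express in code. The sketch below assumes you have a list of known outcomes in the order the system selected the reactions; the function name and the toy data are mine, and none of the numbers are from the paper.

```python
def hit_rate_by_batch(outcomes, batch_size=100):
    """Fraction of reactive outcomes in each consecutive batch.

    outcomes: booleans in the order the reactions were selected.
    """
    rates = []
    for start in range(0, len(outcomes), batch_size):
        chunk = outcomes[start:start + batch_size]
        rates.append(sum(chunk) / len(chunk))
    return rates


# Made-up outcomes for illustration only, not data from the paper.
toy = [True, True, True, False, True, False, False, False]
print(hit_rate_by_batch(toy, batch_size=4))  # [0.75, 0.25]
```

With 576 combinations and batches of 100, this gives six numbers, the last one covering the final 76 reactions. A selection order that is doing its job should produce a list that falls from left to right.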

Of course, we’ve only been talking about “Does this reaction make a new product”, without going into the details of what these products might be. Suzuki couplings are a lot easier to predict, but what about the set of randomish ordinary reagents? The team went back to the combinations that were predicted to do things and looked, then, at what actually happened. Four new reactions were actually found this way, and one of these is shown at right (two of the others include reaction with DBU’s heterocyclic ring system, interestingly). They have an X-ray of that one, actually, in case you’re wondering.

My take on this is that the software model developed here could be usefully combined with some of the higher-throughput approaches described by other groups. That way, you could potentially set up entire plates of microscale reactions with (presumably) enhanced hit rates for new reactions already built into them. There are bound to be other ways to set up the neural-net prediction models, too – this attempt would suggest that the approach is feasible, at any rate. And of course, you could also imagine a follow-up system that takes some of these new reactions and then does an automated optimization of the conditions (via design-of-experiments or some such approach) if you pick out a reaction of particular interest. And you would also want to take the most robust of these new reactions and make sure that your retrosynthesis software package knows about them, too, right? As one automated system hands its results on to another. . .

20 comments on “Automated Reaction Discovery Gets Smarter”

  1. AR says:

    How on earth does the DMAP get on at the ortho position?

    1. tlp says:

      great example of a reaction that would be unlikely to discover rationally

      1. Anon says:

        If you read inorganic or organometallic journals, they are full of this type of unexpected reactivity.

        1. tlp says:

          yeah but this one doesn’t seem to involve any metals

    2. DanielQ says:

      Could be oxidation of the pyridyl to an N-oxide, followed by reaction with the Michael acceptor and rearrangement, similar to ortho chlorination/acetylation?

  2. Led says:

    The pyridine N from DMAP is adding to the alkyne, followed by some funky rearrangement?

    Perfect reaction mechanism question for grad students on the next cume exam.

    1. Scot says:

      Or Tuesday night group meeting!

    2. Sigmatropic says:

      I was able to draw up a mechanism, but a flawed one. I got close but could not manage to get the DMAP substituted in the ortho position.
      I started out thinking the same: DMAP intercepts the alkyne, which forms an enolate/allene of sorts. This ‘enolate’ then reacts with nitrosobenzene, which in a subsequent step cyclizes by a 4-endo-trig mechanism, which according to Baldwin’s rules is not allowed. The immediate product eliminates DMAP and the oxazete product does a 2+2 cycloaddition with another alkyne, which shouldn’t occur under thermal conditions. That product undergoes a retro-4pi-electrocyclic reaction to give a 1,2-oxazine. This oxazine does a retro-6pi-electrocyclic reaction, and the ring-opened product reacts with DMAP again and cyclizes to form the final 5-membered heterocycle. Protonation then gives almost the product.
      Don’t you just love paper chemistry.

  3. Anonymous says:

    I would like to suggest another way to possibly find new reactions: correct mistakes in old (pre-NMR) articles. I have a handful of papers from the old-old (not just the plain old) literature that cannot possibly be correct. The experimentals are simple and clear but the structural conclusions cannot possibly be correct (based on modern chemical knowledge of reactivity and stability). I have to admit that I’d have a hard time disproving an old structural assignment based on melting point alone. 🙂 Some of these might be new reactions.

    Paywall on the article: Was the shown reaction of 12 + 1 + 14 (25 C in MeCN) to give 20 proven by isolation and x-ray?

    1. JP says:

      “Paywall on the article: Was the shown reaction of 12 + 1 + 14 (25 C in MeCN) to give 20 proven by isolation and x-ray?”

      Yes and yes (x-ray of the cis isomer). And they have a pretty thorough NMR data set of both isomers too – 1H, 13C, HMBC, HSQC.

    2. CMCguy says:

      I am not sure you would find as many mistakes as you are speculating. I have been amazed at how often the correct compounds turned out from following old pre-WWII publications, mainly English and German sources. Although the characterization cited was minimal, indeed often just a MP, confirmation with modern tools was obtained, and yields were typically reasonable or better than reported, thanks to better-quality starting materials and improved equipment.

      1. Anonymous says:

        I did say “handful” and not boatload full. Here’s one. I was tasked to make a particular compound type that many here would speculate to be relatively unstable, but not impossible to isolate carefully. (It was proposed as the crucial intermediate in a total synthesis.) There are many examples of how unstable the functional groups are that would lead to the unraveling of the target. Yet, in 192x, a simple set of procedures — reflux; acid/base; reflux some more; blah, blah, blah — was claimed to provide crystals of this one-of-a-kind structural type. No one has reported any authentic analogous structure since. My proposal was to (a) repeat the prep and determine what they really made (undergrad project) and (b) add group Z to prevent the unraveling of the desired compound. (You all know examples: replace a labile proton with a CH3 or heteroatom; replace a side chain donor with an acceptor; etc. … without changing the core of the not-yet-proven to be authentic structural type.)

        Aha! One real historical example of a genuine NEW REACTION discovered via a mistaken old-old paper! This was described by Jerry Berson in “Discoveries Missed, Discoveries Made: Creativity, Influence, and Fame in Chemistry” (1992). In the 1890s, Johannes Thiele instructed his student, Walther Albrecht, to react quinone with 2 eq of cyclopentadiene. Thiele was expecting the double-aldol condensation double-fulvene (C16H12, zero O) (wrong!). Albrecht published the double Michael (C16H16O2) (wrong!). In 1928, Diels and Alder reacted quinone with 2 eq and got what is now known as the double Diels-Alder adduct (C16H16O2). (They did not correct Albrecht’s paper either because they did not know about it or they chose to ignore it.) It was not a new reaction discovered by questioning a prior result but, post-hoc, it fits my analysis.

        And I agree with CM about the accuracy of so many pre-NMR structure proofs based on degradations, melting points, derivatizations, etc. Some were absolutely brilliant. Then again, some took decades, not one minute in the NMR.

  4. anon says:

    Confused on why these are still floating into Nature and being turned into breathless soundbites instead of being published in excellent journals with less flourish like Ange/JACS.

    Surely Cronin can do anything he wants by now, and the lengthy review process for Science/Nature is exhausting.

    1. John Wayne says:

      I’ll quote my PhD adviser: “Graduate students assume that professors are rewarded for publishing complete and high quality research in reputable journals. Unfortunately, the average dean can add (impact factor) and count (papers) better than they can read.”

  5. Hype says:

    Are they reporting new reactions or new side reactions?

    1. Gardner says:

      What is the difference between a weed and a flower?

  6. fherow says:

    Their machine learning algorithms do not need chemical information extracted as data, just digitized data unrelated to chemistry, like a one-hot encoding. It is amazing.

  7. Istvan Ujvary says:

    Don’t ask me why, but the biomimetic synthesis of Daphniphyllum alkaloids by the Heathcock group has come to mind. As you recall, this involves a serendipitously discovered tetracyclization reaction (a wrongly labeled lecture bottle was the ‘culprit’).
    For details:
    and (from 1996)
    The latter link (“Nature knows best” — really?) contains an interesting statement:
    “Organic chemists who like to design and execute multistep syntheses of complex molecules have the goal of eventually putting themselves out of business. We hope to do this by becoming so proficient at what we do that synthesis becomes a routine task that can be relegated to a well-trained technician, or even a machine. We can fantasize that some 25th century physician, encountering a new disease that requires a certain specific organic molecule, may simply draw the structure of that molecule, complete with stereochemical information, and receive in return a detailed recipe for its synthesis. Better still, the computer might program a robot to actually perform the synthesis and deliver an actual sample of the desired molecule.”
    Was he pessimistic? Do we have to wait centuries?

  8. /a/non says:

    I think I’ve said this on a different post of yours, Derek, but the more the loop can be closed on this stuff, the more powerful it is. Computers can crush us puny humans at Go because they can learn from a million games against themselves a day.

    We might never get there for chemistry, but I can imagine a world where your favorite hypothetical retrosynthesis machine is continuously trying out batches of reactions and learning what works while simultaneously trying to get your end product.

    Or maybe company A has a fleet of specialized machines for categories of reactions, and company B pays them to get an AI lead survey of the chemical space around it.

    Exciting times ahead!

    1. Derek Lowe says:

      Yes indeed. The jury is still out on how we’ll best approach these things computationally, since our games are harder than Go. But in the end, I really have no doubt that they’re amenable, at least in many ways, to this approach. . .
