Chemical News

Retrosynthesis: Here It Comes

Behold the rise of the machines. It’s been going on for a while, but there are landmarks along the way, and we may have just passed another one with the publication of this paper. It’s open-access, from an interestingly mixed team: the Polish Academy of Sciences, Northwestern University, the University of Warsaw, the Ulsan Institute in South Korea, and. . .MilliporeSigma. Those who are into scientific computing may have already guessed that the Polish connection is to the Chematica retrosynthesis software. It’s the MilliporeSigma one that makes things of particular interest here (a company that to certain generations of chemists will always be Aldrich or Sigma-Aldrich in their hearts).

I’ll let the summary of the paper lay out the case:

Here, we describe an experiment where the software program Chematica designed syntheses leading to eight commercially valuable and/or medicinally relevant targets; in each case tested, Chematica significantly improved on previous approaches or identified efficient routes to targets for which previous synthetic attempts had failed. These results indicate that now and in the future, chemists can finally benefit from having an “in silico colleague” that constantly learns, never forgets, and will never retire.

All right, then. As advertised, what this paper has done is to pick out six molecules of interest to the MilliporeSigma folks, all chosen because they are of strong commercial interest but had troublesome syntheses (low or inconsistent yields, or failed routes altogether). In addition, the cardiovascular drug dronedarone is on the list because there are numerous process patents detailing routes to its preparation, making this a good reality check for the software, and there is also a natural product (engelheptanoxide C) that has been recently described in the literature but not yet synthesized. The structures of these are shown at right, and the chemists in the crowd will note that this is a perfectly reasonable test: these are real compounds, all the way. Medicinal chemists will note that several of these are hydroxylated metabolites of known drugs, which are valuable reference compounds from a commercial standpoint.

The software was turned loose on all these structures to come up with what it regarded as plausible retrosyntheses, with the starting materials defined as things easily available in the Sigma-Aldrich catalog (naturally). And these routes were put to an interesting real-world test (as suggested by the DARPA funding that went into the project): the routes were put into practice in the lab in the top four cases by chemists at MilliporeSigma (an experienced bunch), and in the bottom four cases by students with little or no practice in multistep organic synthesis, just to see if the routes were practicable by less-experienced hands.

The software generated routes in about 20 minutes for each of these. If the top-rated route was sufficiently different from what had been tried before, and if the starting materials were readily available, it was chosen as is. Otherwise, the second-ranked route was used (this happened in three cases). The reactions had to be taken as given, in their general form, although modifying conditions (temperature, solvent, etc.) was permitted. The MilliporeSigma targets were to deliver at least several hundred milligrams within 8 weeks, at 98% purity, while the student syntheses were more like three months but with similar purity.
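For the programmers in the crowd, that selection rule is simple enough to write down. What follows is only a sketch of the rule as I've described it, not anything from the paper or from Chematica itself: the `Route` class, its two boolean fields, and the `choose_route` function are hypothetical names made up for illustration, and in real life "sufficiently different" and "readily available" were judgment calls by chemists, not flags.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """Hypothetical stand-in for one ranked retrosynthetic proposal."""
    rank: int                           # 1 = top-rated by the software
    differs_from_prior_art: bool        # assumed: judged by a chemist
    starting_materials_available: bool  # assumed: checked against a catalog


def choose_route(ranked_routes):
    """Take the top-ranked route if it passes both checks.

    Otherwise fall back to the second-ranked route, as described above.
    Expects a list sorted by rank with at least two entries.
    """
    top = ranked_routes[0]
    if top.differs_from_prior_art and top.starting_materials_available:
        return top
    return ranked_routes[1]


if __name__ == "__main__":
    # Top route is too close to earlier attempts, so the runner-up is used.
    proposals = [Route(1, False, True), Route(2, True, True)]
    print(choose_route(proposals).rank)
```

Note that the fallback here is unconditional, which matches the description: the second-ranked route was simply used when the first did not qualify.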

I won’t go into all the details of the syntheses, since the paper is open-access and you can read them there. But what I will say is that for the four MilliporeSigma targets, the existing routes were substantially improved in all cases. The improvements were of several kinds (shorter routes, fewer chromatography steps, higher yields, more reproducible) and came from several directions (completely different synthetic approaches, different starting materials, etc.). The improvements in the latter four compounds were similar, and in the case of the third one down on the right (a metabolite of lurasidone) the software not only improved the synthesis, but in doing so broke the patented route to the compound. One interesting feature that shows up several times is that the software predicted (based on its knowledge of the literature) some “don’t bother protecting that OH” reactions that some chemists might have worried about trying, but which can be gotten away with.

This is very impressive work. Even discounting for having to do more work on the suggested reactions, which I’m sure was the case, it’s still impressive. One thing to note is that the software may (in those three cases mentioned above) have suggested routes that were very close to the existing (problematic) ones, which highlights the well-known fact that what looks good on the board doesn’t always go so well in the fume hood. This isn’t explicitly addressed in the paper. But overall, this paper is a pretty strong argument for the whole approach.

And from a theoretical standpoint, it seems clear that this is how things are going to go. I recently read Garry Kasparov’s Deep Thinking, about his experience with IBM and its Deep Blue program, and one of the points he makes is that even when he beat an earlier version of the program, he knew that it was going to surpass human chess play. That’s because it was constantly improving, faster than humans do (or can). And working out a retrosynthesis, versus playing chess, is similar enough that the same considerations apply. Chematica (and its competition in the software field) is getting better all the time. More reactions are entered, existing ones are extended and curated more precisely, adjustments are made to the algorithms, the hardware gets more capable.

So the fact that the program – or any such program – does as well as it does here means, folks, that the handwriting is on the wall. Not this afternoon, and not next week, but in the easily foreseeable future retrosynthesis and synthetic organic chemistry planning are going to be taken out of the hands of chemists. At least, that’s how it’s going to seem to us, the chemists of the present. But to future chemists, the ones who will enter the science once this transformation is complete, it won’t seem like that at all. To them, synthesis planning will always have been something that you have machine help with – why would you do it any other way? Who can carry a zillion reaction examples around in their head?

Kasparov mentions the idea of “centaur” chess players, humans aided by software in their analysis of games and positions. We organic chemists have been centaurs for a long time now, considering how much help we get from our machines and instruments, and this is going to be another example. It is certainly different in degree, and may well feel a bit different in kind, but it’s coming no matter what we feel about it. Prepare yourselves.

60 comments on “Retrosynthesis: Here It Comes”

  1. AR says:

    At least we can’t say you didn’t warn us. Dang, really impressive stuff.

    Will this be the new equivalent of ‘why do I have to learn math if I carry a calculator on my phone everywhere’?

    1. Frank Adrian says:

      “Will this be the new equivalent of ‘why do I have to learn math if I carry a calculator on my phone everywhere[?]’”

      Sadly no – basic math calculations up to 10×10 and the four basic extended algorithms can easily be carried in one’s head. One cannot say that of all of the various chemical reactions and their associated reaction conditions. In this case, we already use the computer as a mnemonic crutch. Using a computer to piece these reactions together is not a stretch – it’s been tried since (at least) the late sixties and it’s about time we’ve started to see the results.

  2. DCE says:

    Yes but can it come up with excuses for why the reaction can’t work before you even try it? What if they don’t visualize well by GC?

    1. ZakuraTech says:

      That’s where actual organic chemists come into play. Knowledge of fragments (or lack thereof) – whether your stuff is going to fly on MS or have good NMR handles is a thing of experience that is poorly articulated in literature.

      However, I think as the synthetic front becomes more computer assisted, more of the ‘state of the art’ will trickle into databases simply to ease operation.

  3. Barry says:

    E.J. Corey first described his work on LHASA in 1969. For some, that was the handwriting on the wall. But computers didn’t yet have enough memory to use the chemical literature as well as a good organiker can. But–even more than advances in computer science–the growth in cheap, fast memory has inexorably tipped that balance.

  4. tt says:

    I was highly skeptical of this idea given just how poorly documented the chemistry literature actually is (for example…it rarely records things that don’t work, which is of equal value in ML). While I’m not really sold that this software offers much immediate value, other than as a way to maybe suggest new ideas which is definitely a step change relative to SciFinder, it’s clear that the writing is on the wall and this will only get better, especially if reaction data and the literature improve in their documentation and reporting (even better with better solubility models in org solvents). In a sense, this approach is what is needed given the sheer explosion in available methodologies for building bonds. No person can really be expected to draw upon that vast literature and apply it easily. What I do find doubtful is that there is any near term chance that the software will suggest reaction conditions with any predictability, i.e. there’s still going to be a lot of tinkering and screening of conditions in the lab, given my aforementioned critique of the published lit. being highly unreliable. Just consider how difficult it is to replicate a reported result for a complex catalytic reaction and obtain the “reported yield”…

    1. tlp says:

      Actually Grzybowski et al. studied that, too, and found some theoretical limitation on the yield predictability: 65 +/- 5% for reaction yield and 75 +/- 5% for reaction time (doi: 10.1038/s41598-017-02303-0; open access). They link it to the descriptors for chemical structure and properties that we currently have on hand.
      If you look into that paper though (see fig. 6) you’ll see that outcomes of some reactions they mark ‘challenging’ would be nearly impossible for humans to predict, too (maybe except top 0.01 – 0.1% of chemists).
      Published literature might be corrupted but that’s why collaboration with MilliporeSigma is so valuable – having access to their in-house lab notebooks should be a huge boon for ML algorithms.

      1. tt says:

        Thanks. Makes sense that they did that analysis as well. Where one could really ramp up the power of this algorithm is marrying this to automated, parallel data sets for common reactions with replicates for more stat power as human run experiments have way too much noise. Plus the machines capture lots of process data automatically as well as kinetics. In other words…bring on the robots and some training sets.

  5. Curious Wavefunction says:

    As Bill Gates says, we always overestimate progress during the next five years and underestimate it during the next fifty. When Corey’s program came along everyone thought that computer-aided synthesis was a done deal in a few years (the same thing happened with computer-aided drug design in the early 80s). But you needed to have enough data input, computing power and feedbacks to keep on tweaking the algorithms incrementally until you finally got past a tipping point. I agree that the tipping point for computer-aided synthesis seems near now (I would give it about half a decade to a decade to become part of the routine toolkit of synthetic chemistry); at least the kind of tipping point which will signal a significant contribution of computers to synthesis planning. One thing to appreciate is that it took the evolution of many fields – hardware, software, databases at the minimum – to reach that point. Technological development always depends on multiple fields and inventions converging together. All this being said, I still don’t imagine humans being eliminated from the equation anytime soon. Too many idiosyncrasies and unpredictabilities abound in complex molecule synthesis for the process to be one hundred percent or even ninety percent predictive.

  6. Bell4 says:

    An impressive article, and an important technology.

    One obvious question: are the routes discussed in this paper the *only* ones considered, or were others tried and discarded when the results from the program either didn’t work or simply didn’t make sense?

  7. tlp says:

    What’s interesting is that Grzybowski’s work is still ‘old school’ rule-based retrosynthesis. He has worked on Chematica for a while now.
    At the same time rule-agnostic algorithms (deep learning) seem to catch up with Chematica pretty quickly. See Mark Waller’s recent arXiv paper.
    They claim to outperform Grzybowski’s algorithms but based on ‘likability’ of retrosynthetic route, not on practical outcome. Yet.

  8. QuantumChemist says:

    The revolution is indeed on its way. Remember the idea of automated synthesis machines? Well this is one of the other big pieces of the jigsaw. Also, what this paper does not consider is the slowly but surely rising tide of highly efficient quantum chemistry methods (DLPNO-CCSD, F12 methods, density fitting, state-of-the-art DFT including double hybrids), more efficient software (including implementations on GPUs) and the explosive growth of machine learning based methods trained on high quality quantum chemistry data (either to upgrade a cheap quantum chemical method or as a standalone model: as good as DFT, as cheap as force fields, runs blazing fast on GPUs).

    When these eventually collide with all the hard work the molecular dynamics community has done over the past 30 years regarding global sampling, using crappy MM models, the issue of solubility calculations will be history.
    It will not be perfect, but good enough to feed into the retrosynthesis engine where reliable experimental data is lacking. Somebody has to write the code though. So it might take 10-20 years.

  9. Steve says:

    I need this program! Where can I buy one?

  10. MoMo says:

    So Sigma is going to allow its customers to retrosynthesize online, PROVIDED you buy their reagents.

    Sounds like a Monopoly to me that should be investigated by the Feds.

    1. Uncle Al says:

      What gives you the right to drive somebody else’s fancy car? If you want a ride, you pay a fee or beg voluntary charity. Demos are advertising. Seduction is not full brothel access.

      Good stuff! If I had to pay, I would not have recreationally shattered their NMR modeler. I reported it, they admitted I skunked them, and perhaps we’ll have better software therefrom. Give a little back when you take.

      1. Some idiot says:

        Hey Uncle, I think you need to up your dose… Almost all of that post made sense!!! ;-$

      2. Precedential Citations says:

        I used to be a ChemDraw beta tester and reported many bugs and problems, often not fixed. I bumped into Rubenstein at the ChemDraw booth at an ACS meeting and reported a major unfixed bug and his immediate reaction was, “No, that function works just fine!” With plenty of demo computers at hand, I showed him the problem (having to do with the centering command) and he apologized and it was fixed in the next release. (Since I couldn’t afford Photoshop, I also requested that CD add some “bio” features very early on (’89, ’90). I was tired of drawing lipid bilayers with squiggly bonds and s-orbital head groups and trying to draw other bio-blobs with chemistry tools. I met with CD people a couple of times. That went on for several years until they added some bio picture tools.)

        An even bigger problem I uncovered was in the CAS database and search algorithms. I was a big (and somewhat expert) user of CASOnLine (the command line version that predated SciFinder. Command line searching can be much more powerful than the GUI.). My “go to” contact at the CAS Help Desk was Amira B. Sometime around 1996, I was doing a lot of protein – peptide searching and not getting the right answers. I had papers in hand that were IN the database. I had published sequences in hand. But the searches were not pulling up a lot of known sequences. My first several phone calls to the CAS Help Desk got staff who dismissed my questions and told me that I was doing something wrong, etc., but they couldn’t find the sequences, either. Finally, I got thru to someone who confirmed that the sequences were cataloged but not being found by the search algorithm. They elevated the problem and it turns out that there was a major programming error. It took CAS a while to fix it. I am of the opinion that there are probably a lot of patents submitted prior to 1997 that claim novel peptides based on the flawed CAS searches. Of course, it’s all moot now (20 year patent expiration).

  11. Mol Biologist says:

    I do agree with you that software like the Chematica retrosynthesis must be advertised. However, you correctly noted it can’t resolve (problematic) routes. Artificial Intelligence systems don’t think exactly like humans, but the algorithms can—and do—play favorites. Yes, we do play favorites and it’s funny to doubt it.
    IMO it is a priority to overcome these routes and it would not be possible without a centaur like Solomon Hiller. He holds multiple patents and is an author of works in the field of the chemistry of pyrimidines and aziridines. He also has developed multiple industrial methods for the production of a number of physiologically active preparations.

    1. Truth Teller says:

      I suspect this paper was rejected from every reputable Journal and finally accepted by a pseudo-chemistry Journal looking for click-bait.

      Almost everything about this paper is deceptive and deliberately misleading: based on the reference list and SI, the authors possess the information necessary to correctly inform the reader, but it is instead withheld. This is not a dispassionate scientific study, it is an advertisement. And it would not stand up to scrutiny by a consumer protection agency.

      Example 1: bromodomain inhibitor. Authors use nearly identical route as Ref. 20; this is not a new disconnection, and it lacks the enantioselectivity of Ref. 20. Instead the authors separate enantiomers at end using chiral SFC. Dixon et al. deliberately don’t bring in aniline first in order to enable the enantioselective reaction. Yet in the SI, the authors compare to Dixon’s non-enantioselective route. This makes their route appear superior, even though it’s not, and it obscures the similarity of disconnection to Ref. 20.

      Example 2: “Chematica’s proposal is unique” but “this method has been used to prepare similar hydroxylated thiophene intermediates, this particular side chain has not been reported” ????
      They use the exact same route.

      Example 3: Reordering of steps of one published route, but the authors don’t acknowledge that this route is identical to Scheme 2 in Ref. 25.
      And SM is crazy expensive and only available in mg quantities from MilliporeSigma!
      And yet this is what they highlight! That’s bold. The AZ route starts from a building block available in kg quantities (1 kg @ $7000)

      Almost laughable metric: “30% time savings (45 versus 62 hr for the published protocol)” It’s a MedChem paper, not a GMP synthesis!

      Same thing for Example 4: the “Chematica Route” is actually a Ref. 26 route. I searched ALL uses of the number “26” in the manuscript and could find NO acknowledgement of this fact. This is just a SciFinder hit.

      Example 5: No demonstration of ee in target 34! The authors actually don’t demonstrate that their route even works. This should not have been published if the referees had actually done their job.

      Example 6: I couldn’t find the patent through Espacenet, and I couldn’t find any information on 43, 43′ on SciFinder (only dihydroxyl analog).
      They depict the Patent route as providing a mixture, but it wouldn’t produce a mixture based on the route shown! There is no easy way for readers to verify these claims.

      I fail to see how this program is functionally different than SciFinder.

      1. Barry says:

        That the program identifies an expensive SM seems a trivial failing. I would routinely discard that option, and it would be easy to tell the program to. That the program identifies known routes should be trumpeted as a success. But to identify those routes as novel is a disservice to the argument and an insult to the reader.

      2. tt says:

        Good critiques…Wonder why the reviewers missed these references and errors.

      3. ndn says:

        Truth Teller suspects earlier rejection of this paper by refereed journals. Maybe. But the grapevine reports at least one of the example studies (#7 Dronedarone) was submitted for separate publication by the same group in a much more detailed format. The computer-aided synthesis of “a blockbuster drug” had been advertised in advance in a news article in Chemistry World (August 2016), evidently prematurely. It never appeared, presumably rejected by the (very well-respected) journal as the result of at least one highly critical referee’s report.

      4. trump of chemistry says:

        @Truth Teller: Regarding example 6, the patent does not show or say that a mixture was obtained, instead only a single product is shown. I can easily imagine that in the patented route the first step gives two/multiple isomers, which are separated through crystallization – hence the low yield. I agree with you, the paper deliberately tries to make the Chematica route look superior and novel.

  12. luysii says:

    From something written 8 years ago

    Chapter 30 of Clayden, Greeves et al. concerns retrosynthetic analysis, but what in the world does this have to do with Moliere? Well, he wrote a play called Le Bourgeois Gentilhomme back in 1670 and played the central character, Monsieur Jourdain, himself in its first performance (before King Louis XIV). Jean Baptiste Lully, one of the best composers of the time (Bach hadn’t been born yet) wrote the score for it and also played a role. M. Jourdain was a wealthy bourgeois gentilhomme who wanted to act like those thought better (e.g. the nobility) at the time. So he hired various teachers to teach him fencing, dancing and philosophy. The assembled notables watching the play thought it was a riot (did not the French invent the term, nouveau riche?). He was taught the difference between poetry and prose, and was astounded to find that he’d been speaking prose all his life.

    So it is with retrosynthetic analysis and yours truly. Back in ’60 – ’62 we studied the great syntheses that had been done to learn from the masters (notably Woodward). Watching him correctly place 5 asymmetric centers in a 6 membered ring of reserpine was truly inspiring. Even though Corey had just joined the department, the terms retrosynthetic analysis and synthon were nowhere to be found. The term is almost a tautology, no-one would think of synthesizing something by making an even more complicated molecule and then breaking it down to the target. So synthetic chemists have been speaking retrosynthetic analysis from day 1 without knowing it.

    For more —

  13. Antony Grishin says:

    The problem with these kinds of reports is that they are picked up by the media and Derek and extrapolated to represent some sort of “Beginning of the end”. Just like several years ago a big deal was made about a Suzuki machine and years later still nobody has or wants these machines.

    These things are 99% hype and 1% real science. In the present case it looks to be even less than 1% based on the above referee that actually took the time to read and review the work.

    1. Design Monkey says:

      To have at least semi-decent and sensibly priced synthesis software would be useful. Because there is a certain class of senior chemists/project leaders, who propose synthetic schemes of way worse quality.

      Chematica currently might be semi-decent in its brain department, but their licensing style and pricing is highly fcked.

  14. biotechtoreador says:

    I, for one, welcome our new automated overlords.

    1. John Connor says:

      Chematica is skynet

  15. anon says:

    I wonder how this will – in the long term – affect how we assess publications. (Assuming, of course, something resembling the current research paper and peer review process persists!)

    If organic chemists stop filing transformations away in their brains, and outsource this to algorithms, will journal editors and peer reviewers have to do likewise – because they simply won’t have the body of accumulated knowledge to assess the likely utility of a newly reported reaction?

    I don’t know if that would be such a bad thing – I just wonder what knock on effects this will have on the culture.

    1. AR says:

      Same effect SciFinder, Reaxys et al. have had: Chemists, particularly those not developing new methodology (med chem, chemical biology etc) will memorize less and lean on the literature searches more. As a result they will be free to think about other things, probably end up with more cross-disciplinary chemists moving into biology and physics. Not necessarily a bad thing.

      1. Nick K says:

        I disagree: young chemists knowing less than their elders is not a good development.

        1. Hap says:

          Didn’t people have some of the same complaints, though, when people could write information down instead of having to remember everything they needed to know? Knowing more reactions may not be useful, but knowing where to find them and being able to evaluate them critically (because people and computers make mistakes) are likely to be useful. (If I end up on a desert island, most of my chemistry knowledge isn’t likely to be helpful anyway.) This may require having a significant body of experience and knowledge, anyway, but probably less than people before us. It would also likely mean that people would get more knowledge – if they don’t have to remember every reaction, some other knowledge will be valuable (because reaction knowledge will be cheaper), and they’ll remember that.

          How much of the knowledge people have in chemistry can be divorced from its performance? Some presumably can (knowing reaction choices, maybe), but some can’t – that tacit knowledge thing (knowledge hard and expensive to codify) comes into play. How much effect computers will have on our jobs and the performance in chemistry depends on how much tacit knowledge can be stored and made accessible and useful.

          As an irrelevancy, the title of the post makes me think of Bigboy in Pale Horse Coming.

  16. Eniac2020 says:

    Will an idea generated by AI meet the legal requirement for a patentable invention to be non-obvious to one with ordinary skill in the art? A kind of variation on the Turing test.

    1. Dionysius Rex says:

      No, people make inventions, not machines.

      1. Design Monkey says:

        So Dionysius, by your thesis, any synthetic route automatically churned out by Chematica or similar software is not an invention, and that also disqualifies any human who produced the same route with their old-style brain? Also, suppose a novel and “not obvious to those skilled in the art” synthetic route, printed on paper, were shown to you without disclosing whether it was produced by a human or by software. How would you decide whether it is an invention or not? And besides inventions, your thesis calls for clarification on the chess stuff too. If “only people make inventions”, then maybe also “only people play chess”? What’s the difference?

        1. Dionysius Rex says:

          In principle, any proposed synthesis churned out by the machine would be non-patentable. That does not exclude patentable refinements being made by further experimentation (e.g. solvent mixtures to ease purification).

          In the same way that most biologists are excluded from patent inventorship as their role is in the “reduction to practice”, the same could soon also be true for many medicinal chemists in that the choice of the next compound to make is indeed obvious based on the state of the art, and all the chemist is doing is routine analog-ing based on suggestions from algorithms (lack of human contribution to the conception).

      2. Some idiot says:

        Yes, people make inventions. But if a patent examiner (or a later court examining that patent) sees that a program readily available at that time (including, for example, Scifinder or Reaxys) predicted that route (or, probably more suitably, put it high up on the list of interesting routes), then you can kiss your patent goodbye… Unless you have some _really_ good arguments up your sleeve as to why the predicted pathways don’t mean anything.

        And this is the point in this regard: the more accurate predictions from software become, the harder it will be to show unobviousness…

  17. TotalSyn4thewin says:

    Real Test of the software: Give top tier professors the same novel targets and compare with the software generated routes.

    The legal field is also facing AI entering the workplace.

    Just look at the recent result from a study showing Software (LawGeex) can evaluate legal briefs faster and more accurately than human counterparts.

    Better get into those Tier 1 Grad Schools

  18. mw says:

    After playing with the Chematica software for a couple of weeks for evaluation purposes (think large company looking at new tools) I was left unimpressed. For trivial synthetic disconnections like the ones presented in this paper it works pretty well and comes up with useful retrosyntheses. That’s nice but doesn’t really add value as any semi-decent chemist should find them at least as quickly. Unfortunately, the program fails completely to crack difficult retrosynthesis problems like challenging heterocycles, regio-selectivity or enantioselectivity issues, things where even experienced chemists could benefit from a little machine help. That said, it’s a good start and will only become more powerful in the future but at the moment is not yet worth investing your money.

  19. Anon says:

    The Skynet Funding Bill is passed. The system goes online August 4th, 1997. Human decisions are removed from [chemical synthesis]. Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.

  20. Piero says:

    And let’s not forget that we are judging the software by comparison with known routes or routes planned by humans, and still it’s a bit better.
    But what about the cases where no synthesis is yet known and you don’t have any chemist around who can plan it? 20 minutes and bang you have one!
    Frightening, isn’t it?

  21. steve says:

    Such software will only get better. AlphaGo defeated the world’s Go champion, Lee Sedol, after being programmed with millions of moves of past masters. It came up with moves that Sedol said no human would ever think of. AlphaGo Zero was only programmed with the rules of Go and taught itself mastery by playing itself and learning. In just three days it had defeated all versions of AlphaGo, and within 40 days it had independently found game principles that had taken humans thousands of years to discover. There is no a priori reason that chemistry will be any different, it is just a matter of time.

    1. Anon says:

      Of course AI works wonders as long as it can get reliable feedback in real time. Not sure there is any way to get such feedback in science without doing an actual experiment, though.

    2. mathguy says:

      Games are very different.

      The question you need to ask is “how far have we gotten in automated theorem proving?”.

  22. Me says:

    Interesting how responses evolve to these. We’ve gone from:

    ‘WOW! Technology!’


    ‘Interesting but useless’


    ‘Management have been trying to use this to get rid of us for years hahaha’


    ‘I disagree with the synthetic routes it comes up with because XYZ’

    Fact is, this concept is finally delivering something worth trashing.

  23. John says:

    “Chematica will never retire, but you will. Sooner than later, apparently. Chematica™️”

  24. John says:

    “The syntheses were planned completely autonomously by Chematica (running on a 64-core machine) within 15–20 min for all targets with the exception of dronedarone for which the search used an older and slower version of the software and was allowed to continue for several hours.”

    Also wait a minute – if it’s considerably quicker on the new version, why use the old version? Why would results from the new version NOT be shown?

    Which one of these authors is going to drop a slick synthesis of dronedarone sometime soon?…

  25. some guy says:

    There is a hell of a lot more to organic synthesis than coming up with a reaction scheme on a piece of paper. Some people think they are “thought leaders” because they can come up with a paper route. In my experience, coming up with a viable route on paper was not the rate limiting step in organic synthesis. You can give the exact same paper route to 12 different chemists and you’re going to get wildly different yields and purity in that synthesis. And depending on the complexity, half of them will throw in the towel half way through.

    You can input all the notes you want from Bach’s compositions into a computer but it’s never going to sound like Glenn Gould.

    All this program is going to do is to lead to more outsourcing and fewer compounds actually being made and tested. Just what the world of medicinal chemistry needs, right?

    1. Wavefunction says:

      I think you are missing the point. It does not need to sound like Glenn Gould. As long as it sounds like *some* version of Bach it’s good enough to be useful. We don’t need automated synthesis to perfectly mimic a human medicinal chemist. We need it to be good enough to have a measurable impact which it most likely will.

  26. itrade says:

    So orthogonal to the chemistry is the interesting issue of DARPA funding and relevance to DARPA priorities.

    Mrksich appears to have a current advisory role in DARPA, perhaps analogous to the former role of Whitesides, with whom Grzybowski did his PhD and postdoc. Mrksich also did a postdoc with Whitesides. All that probably didn’t hurt the funding opportunity.

  27. Retro sinner says:

    I was underwhelmed when I finally got to see Chematica after many false starts. It is not a chemical Harry Potter, there was no magic wand. Improving a med chem synthesis isn’t always challenging, it’s what us process folk do every day to put bread on the table. But it is not designed to be the best synthetic approach so where is the kudos in using it as your benchmark?

    After all the hype, why not go for targets already published using other tools and demonstrate the crushing superiority? Or maybe that is a silly question.

    The rise of the machines will happen but probably not before Friday and maybe not next week either.

  28. Nico says:

    I didn’t know you could cite blogs in scientific papers.

    Provocations aside, as someone mentioned above, I think that the combination of AI with the principles underlying quantum chemistry will provide the final answer to the question “does this molecule, in this solvent, under these conditions of pressure and temperature, react with this other molecule?” Wouldn’t the product of a reaction be trivial once we find the right “force field”? Kinetics and thermodynamics are governed by laws that can be computed given the right initial conditions and the appropriate algorithms.

    How long before a robot, the size of a lab (or a warehouse), will have the ability to mix reagents and solvents guided by AI-developed synthesis routes? And how long before this robot will be able to assess the reaction’s outcome by NMR, comparing it with the calculated NMR spectra of the expected product?

    Exciting times

    1. Some idiot says:

      Good point. To me, the main hurdle is a practical one. As a process chemist, it is the physical things that usually give the most unexpected headaches (and typically give the most back once you sort them out). Is this stuff (product/intermediate) going to precipitate out during the course of the reaction? If so, how? Is it a gum/gel/porridge? Can I get it out of the reactor and filter it in less than a few weeks? I think trying to predict this sort of stuff in a practical manner is well beyond anything that can be done now (although I would love to be proved wrong).

      In short, I would hate to be the one in that lab/warehouse with the job of unblocking transfer lines. I would never get the chance to get a cup of coffee…


  29. Larry Fertel says:

    A more valuable AI program would be one that predicts the best route to make kg+ amounts of products, including reasonable workup and/or purification methods. This is, of course, a more difficult mountain to climb. Perhaps in another 50 years??

  30. algorithm98743 says:

    Great post! This work is reminiscent of John Koza’s work in genetic programming. One of his success criteria is that the algorithm recreates or improves an existing patent, or produces a new and patentable result.

    From the primer at:

    In total [219] lists 36 human-competitive results. These include 23 cases where GP has duplicated the functionality of a previously patented invention, infringed a previously patented invention, or created a patentable new invention. Specifically, there are 15 examples where GP has created an entity that either infringes or duplicates the functionality of a previously patented 20th-century invention, six instances where GP has done the same with respect to an invention patented after January 1, 2000, and two cases where GP has created a patentable new invention. The two new inventions are general-purpose controllers that outperform controllers employing tuning rules that have been in widespread use in industry for most of the 20th century.

  31. Precedential Citations says:

    Footnote 11 claims “The literature on computational predictions validated in the synthetic practice is very scarce. In our own work, we showed that Chematica was able to suggest several synthetic pathways in which each individual step had been previously confirmed in experimental papers by other groups. …” Grzybowski and Hendrickson (the SynGen program) were in occasional personal contact and G cites only one early paper by H in footnote 3. However, Hendrickson specifically did incorporate an automated link to literature databases in order to rank and prune the computer-generated pathways, based on those published and catalogued literature precedents. There was a setting in the GUI to select a lower limit for the yield for a given recommended step. You could watch the suggested pathways disappear interactively as you increased your acceptable threshold.

    Another recent paper (out of MIT; “Prediction of Organic Reaction Outcomes Using Machine Learning.” Connor W. Coley, Regina Barzilay, Tommi S. Jaakkola, William H. Green, and Klavs F. Jensen. ACS Cent. Sci., 2017, 3 (5), pp 434–443. DOI: 10.1021/acscentsci.7b00064) claims that there is not much work on synthesis design in the forward direction. They cite LHASA and CAMEO (CAMEO is partially based on SynGen). They do not cite any work by Hendrickson at all and seem to have overlooked his paper on the FORWARD program for synthesis generation in the forward direction using links to literature databases to validate each step: “A Program for the Forward Generation of Synthetic Routes.” Hendrickson and Parks. J Chem Inf Comput Sci, 1992, 32(3), 209-215.

  32. Ted says:

    I spent quite a few years as a process chemist, on both an in-house and contract basis. A tool like Chematica would probably have been quite valuable in speeding up the search for orthogonal approaches and ‘gut checking’ more novel approaches.

    We often had to put together bids on fairly short notice that would be something like, “we’ll bid $XXX to spend 5 weeks verifying your 7 step synthesis, generating reference standards of key intermediates and preparing documentation to support a YYkg plant production run using the best process coming out the tail end.” Implicit in this are our attempts to see around the corner, and figure out where we might be able to omit protecting groups, telescope separate reactions, move problematic reagents to the front end of the synthesis (e.g. residual Pd at the last step can be a pain…). All of that gets built into the bid, but little of it is nailed down up front. Knowing that Chematica has generated 4 other approaches to intermediate 3 and given me two alternative 5 step syntheses helps me bid more aggressively. Seeing 3 alternative syntheses, all of which are 6 or 7 steps, all the same as the ones I already thought of, and all of which require ‘that’ protecting group means I bid more conservatively.

    I think the centaur analogy is an apt one. I wouldn’t expect to type in a synthesis with the one-click shopping option enabled, but I’d consider it a valuable way to probe multivariate solution space in something closer to real time.

    I try to keep an open mind when presented with new tools. There are many I’ve passed on, but none that I regret the time spent evaluating. Except combi-chem…


  33. steve says:

    Some Idiot says, “As a process chemist, it is the physical things that usually give the most unexpected headaches (and typically give the most back once you sort them out). Is this stuff (product/intermediate) going to precipitate out during the course of the reaction? If so, how? Is it a gum/gel/porridge? Can I get it out of the reactor and filter it in less than a few weeks? I think trying to predict this sort of stuff in a practical manner is well beyond anything that can be done now (although I would love to be proved wrong).”

    I think that’s the crux of the matter. I could easily see an AI that has learned the basic rules of chemistry reviewing hundreds (thousands?) of syntheses, learning from what led to ppt/gum/gel/porridge, and coming up with new rules and insights that have eluded synthetic chemists, just as it did in Go. To those who say that Go is “just a game”, the number of legal moves was recently calculated to be 10^(10^48). I doubt that most synthetic schemes have that many potential variants.

  34. Big lez says:

    This work reminds me of an earlier paper: It seems like a similar approach but maybe better in that the methods have actually been tested by real chemists, and shown to work. However, from the paper’s conclusion there is one line which stands out:

    “Another important challenge to be solved is stereochemistry. Convincing global approaches for the quantitative prediction of enantiomeric or diastereomeric ratios without recourse to time-consuming quantum-mechanical calculations remain to be reported.”

    From the quick read of the paper Derek links to, it does not seem like they have overcome this. I don’t see how these ML algorithms can get much further without fixing this.

Comments are closed.