
Synthetic Chemistry: The Rise of the Algorithms

Here are two papers in Angewandte Chemie on “rewiring” synthetic chemistry. Bartosz Grzybowski and co-workers at Northwestern have been modeling the landscape of synthetic organic chemistry for some time now, looking at how various reactions and families of reactions are connected. Now they’re trying to use that information to design (and redesign) synthetic sequences.
This is a graph theory problem, and a rather large one, if you assign chemical structures to nodes and transformations to the edges connecting them. It quickly becomes rather computationally demanding, as all these “find the shortest path” problems do on a network this size, but that doesn’t mean that you can’t run through a lot of possibilities and find things that you couldn’t by eyeballing them. That’s especially true when you add in the price and availability of the starting materials, as the second paper linked above does. If you’re a total synthetic chemist and you didn’t feel at least a tiny chill running down your back, you probably need to think about the implications of all this again. People have been trying to automate synthetic chemistry planning since the days of E. J. Corey’s LHASA program, but we’re getting closer to the real deal here:

We first consider the optimization of syntheses leading to one specified target molecule. In this case, possible syntheses are examined using a recursive algorithm that back-propagates on the network starting from the target. At the first backward step, the algorithm examines all reactions leading to the target and calculates the minimum cost (given by the cost function discussed above) associated with each of them. This calculation, in turn, depends on the minimum costs of the associated reactants that may be purchased or synthesized. In this way, the cost calculation continues recursively, moving backward from the target until a critical search depth is reached (for algorithm details, see the Supporting Information, Section 2.3). Provided each branch of the synthesis is independent of the others (good approximation for individual targets, not for multiple targets), this algorithm rapidly identifies the synthetic plan which minimizes the cost criterion.
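The recursion described in that passage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ code: the compounds, catalog prices and reaction costs are hypothetical, a reaction’s cost is modeled as a fixed number added to the costs of its reactants (the paper’s actual cost function is not reproduced here), and `MAX_DEPTH` is an arbitrary stand-in for the “critical search depth.”

```python
import math
from functools import lru_cache

# Hypothetical toy data, not taken from the papers.
# Catalog prices for compounds that can be bought outright.
CATALOG = {"P1": 10.0, "P2": 4.0, "P3": 25.0}
# For each compound, the known reactions that make it: (reaction cost, reactants).
REACTIONS = {
    "T": [(3.0, ("I1", "P3")), (5.0, ("I2",))],
    "I1": [(2.0, ("P1", "P2"))],
    "I2": [(1.0, ("P1",)), (4.0, ("P2",))],
}
MAX_DEPTH = 3  # assumed stand-in for the "critical search depth"


@lru_cache(maxsize=None)
def min_cost(compound, depth=0):
    """Return (cost, plan) for the cheapest way to obtain `compound`.

    A compound is either bought from CATALOG or made by one of its REACTIONS,
    whose reactants are costed recursively and independently of one another
    (the single-target approximation in the quoted passage). The depth cap
    ends the backward search and also guards against cycles in the network.
    """
    if compound in CATALOG:
        best = (CATALOG[compound], f"buy {compound}")
    else:
        best = (math.inf, None)  # not purchasable; may still be synthesizable
    if depth >= MAX_DEPTH:
        return best
    for reaction_cost, reactants in REACTIONS.get(compound, ()):
        branches = [min_cost(r, depth + 1) for r in reactants]
        total = reaction_cost + sum(cost for cost, _ in branches)
        if total < best[0]:
            best = (total, {compound: [plan for _, plan in branches]})
    return best


print(min_cost("T"))  # (13.0, {'T': [{'I2': ['buy P2']}]})
```

Here the route through I2 made from P2 (13.0) beats the route through I1 plus purchased P3 (44.0). Because each branch is costed on its own, two branches that could share an intermediate are each charged for it separately, which is exactly why the quoted passage says the approximation holds for single targets but not for multiple targets.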

That said, how well does all this work so far? Grzybowski owns a chemical company (ProChimia), so this work examined 51 of its products to see if they could be made easily and/or more cheaply. And it looks like this optimization worked, partly by identifying new routes and partly by sending more of the syntheses through shared starting materials and intermediates. The company seems to have implemented many of the suggestions.
The other paper linked in the first paragraph is a similar exercise, but this time looking for one-pot reaction sequences. They’ve added filters for chemical compatibility of functional groups, reagents, and solvents (miscibility, oxidizing versus reducing conditions, sensitivity to water, acid/base reactions, hydride reagents versus protic conditions, and so on). The program tries to get around these problems, when possible, by changing the order of addition, and can also evaluate its suggestions versus the cost and commercial availability of the reagents involved.
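To show how compatibility filters and a change in the order of addition interact, here is a toy sketch. Everything in it is an assumption for illustration: the condition tags, the clash rules and the two steps are hypothetical, each step is described only by the tags active while it runs and the tags it leaves behind in the flask, and no workup happens between steps. The papers’ actual filters are far more detailed.

```python
from itertools import permutations

# Hypothetical condition tags and clash rules (a tiny stand-in for real filters).
CONFLICTS = {
    frozenset(("oxidant", "reductant")),
    frozenset(("hydride", "protic")),
    frozenset(("water_sensitive", "aqueous")),
    frozenset(("acid", "base")),
}


def clashes(pot, active):
    """True if any tag left in the pot conflicts with a tag the new step needs."""
    return any(frozenset((a, b)) in CONFLICTS for a in pot for b in active)


def find_one_pot_order(steps, requires=None):
    """Return the first order of addition with no clashes, or None.

    steps: name -> (tags active during the step, tags left in the pot afterwards).
    requires: name -> names of steps that must come earlier (e.g. because
    they make its substrate).
    """
    requires = requires or {}
    for order in permutations(steps):
        position = {name: i for i, name in enumerate(order)}
        if any(position[pre] > position[name]
               for name in order for pre in requires.get(name, ())):
            continue  # this order breaks a dependency
        pot = set()  # tags accumulated from earlier steps
        for name in order:
            active, leaves = steps[name]
            if clashes(pot, active):
                break
            pot |= leaves
        else:  # no break: every step was compatible with what was already there
            return list(order)
    return None


# Hypothetical example: the aqueous step leaves water behind,
# so the water-sensitive step has to go first.
steps = {
    "aqueous_coupling": ({"aqueous", "base"}, {"aqueous", "base"}),
    "hydride_reduction": ({"hydride", "water_sensitive"}, set()),
}
print(find_one_pot_order(steps))  # ['hydride_reduction', 'aqueous_coupling']
```

If a dependency forces the aqueous step first (`requires={"hydride_reduction": ["aqueous_coupling"]}`), no order survives and the function returns `None`. Trying every permutation is only reasonable for a handful of steps, since the number of orders grows factorially.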

Of course, the true value of any theoretical–chemical algorithm is in experimental validation. In principle, the method can be tested to identify one-pot reactions from among any of the possible 1.8 billion two-step sequences present within the NOC (Network of Organic Chemistry). While our algorithm has already identified over a million (and counting!) possible sequences, such randomly chosen reactions might be of no real-world interest, and so herein we chose to illustrate the performance of the method by “wiring” reaction sequences within classes of compounds that are of popular interest and/or practical importance.
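One plausible reading of the “1.8 billion two-step sequences” figure is every pair of reactions in which a product of the first is a reactant of the second; that interpretation is an assumption, not something the quoted text spells out. Under it, candidate pairs can be enumerated with an index from each compound to the reactions that consume it. The reaction records below are hypothetical.

```python
from collections import defaultdict

# Hypothetical reaction records: name -> (reactants, products).
REACTIONS = {
    "r1": ({"A", "B"}, {"C"}),
    "r2": ({"C"}, {"D"}),
    "r3": ({"C", "E"}, {"F"}),
    "r4": ({"D"}, {"G"}),
}


def two_step_sequences(reactions):
    """List (first, second, shared compound) for every chainable pair of reactions."""
    consumers = defaultdict(list)  # compound -> reactions that use it as a reactant
    for name, (reactants, _) in reactions.items():
        for compound in reactants:
            consumers[compound].append(name)
    pairs = []
    for first, (_, products) in reactions.items():
        for product in products:
            for second in consumers.get(product, ()):
                if second != first:
                    pairs.append((first, second, product))
    return pairs


print(two_step_sequences(REACTIONS))
```

For this toy network the result is `[('r1', 'r2', 'C'), ('r1', 'r3', 'C'), ('r2', 'r4', 'D')]`. Each such pair would then have to pass compatibility filters of the kind described above before counting as a one-pot candidate.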

They show a range of reaction sequences involving substituted quinolines and thiophenes, with many combinations of halogenation/amine displacement/Suzuki/Sonogashira reactions. None of these are particularly surprising, but it would have been quite tedious to work out all the possibilities by hand. Looking over the yields (given in the Supporting Information), it appears that in almost every case the one-pot sequences identified by the program are equal to or better than the stepwise yields (sometimes by substantial margins). It doesn’t always work, though:

Having discussed the success cases, it is important to outline the pitfalls of the method. While our algorithm has so far generated over a million structurally diverse one-pot sequences, it is clearly impossible to validate all of them experimentally. Instead, we estimated the likelihood of false-positive predictions by closely inspecting about 500 predicted sequences and cross-checking them against the original research describing the constituent/individual reactions. In few percent of cases, the predicted sequences turned out to be unfeasible because the underlying chemical databases did not report, or reported incorrectly, the key reagents or reaction conditions present in the original reports. This result underscores the need for faithful translation of the literature data into chemical database content. A much less frequent source of errors (only few cases we encountered so far) is the algorithm’s incomplete “knowledge” of the mechanistic details of the reactions to be wired. One illustrative example is included in the Supporting Information, Section 5, where a predicted sequence failed experimentally because of an unforeseen transformation of Lawesson’s reagent into species reactive toward one of the intermediates. We recognize that there is an ongoing need to improve the filters/rules that our algorithm uses; the goal is that such improvements will ultimately render the algorithm on a par with the detailed synthetic knowledge of experienced organic chemists. . .

And you know, I don’t see any reason at all why that can’t happen, or why it won’t. It might be this program, or one of its later versions, or someone else’s software entirely, but I truly don’t see how this technology can fail. Depending on the speed with which that happens, it could transform the way that synthetic chemistry is done. The software is only going to get better – every failed sequence adds to its ability to avoid that sort of thing next time; every successful one gets a star next to it in the lookup table. Crappy reactions from the literature that don’t actually work will get weeded out. The more it gets used, the more useful it becomes. Even if these papers are presenting the rosiest picture possible, I still think that we’re looking at the future here.
Put all this together with the automated random-reaction-discovery work that I’ve blogged about, and you can picture a very different world, where reactions get discovered, validated, and entered into the synthetic armamentarium with less and less human input. You may not like that world very much – I’m not sure what I think about it myself – but it’s looking more and more likely to be the world we find ourselves in.

29 comments on “Synthetic Chemistry: The Rise of the Algorithms”

  1. Puff the Mutant Dragon says:

    Never send a human to do a machine’s job…

  2. processchemist says:

    I have relied heavily on (reasoned) computational methods to optimize reactions/processes (DOE) in the last few years, and I can tell you that people with little knowledge of the DOE approach, aided by black-box software, have published questionable papers… In many kinds of computational approaches, oversight by a skilled operator is everything if you want to avoid nonsense or obvious results.
    I find the second paper interesting for a couple of reasons: in process chemistry, one-pot reactions and telescoping are solutions often used (and investigated all the time). The examples reported are a bit obvious (you can find tons of one-pot or telescoped reactions in OPRD), but here a DOE approach would require many experiments (with discrete parameters for every reactant), and the predictive capability of the algorithm seems good. It would be nice to see it crunch less simple targets.

  3. HAL 9000 says:

    I’m sorry, Dave, I can’t let you do that reaction.

  4. NoDrugsNoJobs says:

    If one analogized planning a complex synthetic scheme to a chess match where several moves are planned out in advance, then a computer clearly can do quite well. The difference here is the rather huge additional challenge of the underlying assumptions/information within each transformation. With chess, the particular piece, its movement, and the other pieces and their movements are the only variables, and they can be accounted for with 100% accuracy. However, where so much more uncertainty enters in, that is where art, intuition and personal experience begin to play a role. It seems an interesting idea but, unlike chess, it will be limited by the quality of the information going into it. This means the best reaction program will not beat the best synthetic chemist, but it will certainly be a powerful tool in his/her arsenal.

  5. Josh says:

    Just plain hysterical!
    Made my day

  6. NCharles says:

    The word ‘repertoire’ comes more to mind for me, but I have to admit that it’s the first time I have ever seen the word ‘armamentarium’ used.

  7. ech says:

    These kinds of problems have been of interest to the AI community for a long time, and there are a number of techniques to attack them. Unless you use heuristics to narrow the scope, the problems are generally NP-complete, meaning that they explode computationally as the number of nodes and edges gets large. Fortunately, the computing power now available makes it possible to attack larger and larger versions of these problems. Quantum computers might help some in the long term.
    Even if you have to dedicate a $1000 node in a server farm for a month to optimize a reaction, if it saves quite a bit over the life of a compound, that’s still a win.
    ObGetOffMyLawn Comment: I was reading a paper on performance of an adsorption reaction that talked about how it took a PC a few hours to simulate a 24 hour reaction run, and how this was a really long time. Oh Yeah? I was doing research in the 80s that took five workstations in parallel all night to do one simulation run. (Uphill both ways, in the snow, @ 100 degrees.)
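The trade-off in ech’s first paragraph (prune with heuristics or face a combinatorial explosion) can be illustrated with beam search, one common pruning technique. It is swapped in here purely for illustration and is not the method used in the papers. In this toy sketch the compounds, step costs and the purchasable set are hypothetical; a search state is a running cost plus the set of compounds still to be made, and only the `beam_width` cheapest states survive each round.

```python
import heapq

# Hypothetical retrosynthetic moves: compound -> [(step cost, precursors)].
MOVES = {
    "T": [(1.0, ("X",)), (3.0, ("Y",))],
    "X": [(10.0, ("P",))],
    "Y": [(1.0, ("P",))],
}
BUYABLE = {"P"}  # hypothetical purchasable starting materials


def beam_search(target, beam_width, max_depth=10):
    """Return the cheapest complete route cost found, or None.

    Keeping only `beam_width` states per round keeps the search small,
    but it can throw away the branch that leads to the true optimum.
    """
    if target in BUYABLE:
        return 0.0
    beam = [(0.0, frozenset([target]))]  # every state in the beam is incomplete
    best = None
    for _ in range(max_depth):
        candidates = []
        for cost, unresolved in beam:
            compound = min(c for c in unresolved if c not in BUYABLE)
            for step_cost, precursors in MOVES.get(compound, ()):
                remaining = (unresolved - {compound}) | frozenset(precursors)
                new_cost = cost + step_cost
                if all(c in BUYABLE for c in remaining):  # complete route
                    if best is None or new_cost < best:
                        best = new_cost
                else:
                    candidates.append((new_cost, remaining))
        if not candidates:
            break
        beam = heapq.nsmallest(beam_width, candidates, key=lambda state: state[0])
    return best


print(beam_search("T", beam_width=1), beam_search("T", beam_width=2))  # 11.0 4.0
```

With a beam of width 1 the search commits to the cheap first step and ends up with the 11.0 route; width 2 keeps the alternative alive and finds the 4.0 route. The pruning that keeps the search tractable is the same thing that can cost you the optimum.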

  8. Anonymous says:

    Big yawn; computers can create paint-by-numbers schlock art, but they will never evolve into a Dali or Picasso. The same is true with organic synthesis. If you need paint-by-numbers organic synthesis, turn the chore over to the machines. If you want art, leave it to the humans.

  9. JC says:

    I, for one, look forward to our Synthesis Robot Overlords.

  10. AndrewD says:

    @9, JC
    I thought that was Big Pharma management.

  11. Phil says:

    Following your analogy, industrial syntheses don’t need to be Picassos. In fact, they are usually Thomas Kinkade prints. They would be happy hanging in a dentist’s office. If it gets the job done, perfect.

  12. DCRogers says:

    “This result underscores the need for faithful translation of the literature data into chemical database content.”
    Early retrosynthetic programs suffered mightily from this — the results were only as good as the quality of the retrosynthetic transforms the program knew.
    I recall a quote by someone (Al Long?), something to the effect that only E. J. Corey himself could truly write a good transform. Given his many responsibilities, I doubt he spent much actual time on this!
    (As an aside, the other groundbreaking early retrosynthetic program was Todd Wipke’s SECS at UCSC… not sure what the state of that effort is now.)

  13. Tokamak says:

    When everything is automated, even repair of the machines themselves, and nobody has to do anything, how will we, as a society, distribute wealth?

  14. David Formerly Known as a Chemist says:

    This will undoubtedly lead to the Chinese version of the “In The Pipeline” blog, wherein Chinese chemists complain about how all their jobs are being taken by low-cost software.

  15. John Wayne says:

    @3 and 14: I laughed out loud twice while reading the comments for this one topic; new record 🙂

  16. Am I Lloyd peptide says:

    Corey blew it by making his program prohibitively expensive and virtually inaccessible to everyone. He ignored the now well-established fact that the most successful computational techniques are cheap or free. Hopefully the Northwestern group will be cognizant of this fact and will make their program open-source, available for everyone to test and refine.

  17. oldstang says:

    As any process chemist can tell you, the hard part of executing a synthesis is the isolation of products. I don’t foresee a time when software can predict that. You can only go so far when the last step of your procedure is “load the reaction mixture onto the CombiFlash and elute with EtOAc.”

  18. ech says:

    @13: For two differing fictional perspectives on wealth distribution in automated societies, see:
    – Kurt Vonnegut’s “Player Piano”
    – Iain Banks’ “Culture” novel series. The novels are mostly standalone; I recommend “Consider Phlebas” as a good place to start

  19. Luddite says:

    One major problem with this is that reactions that don’t work are generally not published, hence the algorithm cannot know about many of the conflicts that will exist. This is a problem for humans too, of course, but to a far lesser extent, I would guess.

  20. MoMo says:

    This is leading to a waste of time, like combinatorial chemistry did back in the 90’s.
    Why not subject every product of every reaction to every reaction? Same result.
    Now get back to work, all of you.

  21. I, Robot says:

    @12: Hendrickson at Brandeis was another early player in the computer aided organic synthesis (CAOS) game. See
    His program was Syngen (Synthesis generator; 1978ish; same time as Wipke’s SECS). It was retrosynthetic; it linked to a catalog of starting materials to rank availability and estimate and rank cost; it linked to a reaction database to estimate yields for proposed steps and rank them.
    There are or were a bunch of other CAOS programs out there.
    There is also a link at to Webreactions, a nice little reaction database search engine. It’s not SciFinder, but it’s fast, simple and FREE.

  22. Sundowner says:

    Having played with computers for more than 20 years, I love the idea. Being a synthetic chemist, I hate the idea, though it would make my work much easier. Or it would put me out of work, at least on the design side.
    The main problem I see here is the reliability of the reactions database. Because honestly, everybody knows that a lot of the info contained there is not exactly reliable…

  23. MolecularGeek says:

    Obviously, the wealth will go to the people who know how to make the machines do what we want them to and repair them when something goes wrong. In other words, the geek shall inherit the earth. *duck*

  24. Paul says:

    Shouldn’t it be possible to automate validation of the transformations in the database?

  25. Design Monkey says:

    @17. oldstang
    Isolation in the typical research lab already goes that way.
    Sheesh, the young kids nowadays, they don’t have any crystallization skills, and they blink nervously when you suggest they do a fractional vacuum distillation.

  26. Kaleberg says:

    Over the past three or four decades an awful lot of fields have been rebuilt around a piece of software. Civil engineers got COGO back in the 60s. Circuit designers got SPICE in the 80s. Mathematicians got Mathematica or MATLAB. Mechanical engineers got one of several finite element analysis packages. It’s high time chemists moved into the 20th century.
    So, expect a bunch of really awful software that seems to have a good idea or two but is basically unusable. Then look for a few packages that are sort of useful, but ridiculously expensive. Then come the first actually useful commercial products that aren’t insanely priced and the first almost useful open software versions with crappy databases. If chemistry is like other fields, it’s going to be awful, but in ten or twenty years, this kind of software will be pervasive.

  27. Anonymous says:

    the only software a chemist needs is ChemDraw

  28. benzyme says:

    I can’t believe I am reading such negative posts. Computers are going to make chemistry into information technology. You people think our minds can comprehend literally thousands of distinct rules? Computers do that every day, without a single mistake. They can also predict yields, reaction speeds and equilibria, required temperatures, best catalysts and other useful things. Synthesis is a branching problem, with many routes to the desired product. If you want optimization, that is, to pick the best possible route to the product, you have to go through all routes and compare them with each other. Good luck, human brain. And yes, you people who studied organic chemistry and reaction mechanisms, you wasted A LOT of time. To design new synthesis routes, you chemists used some memorized algorithms. Well, it turns out computers are better at algorithms than chemists. But sure, use drawing boards instead LOL
