Here’s a new paper in Nature on computer-generated synthesis of natural products. More formally, you’d call it retrosynthesis, since the thought process in organic chemistry tends to work backwards when you have a particular target that you’re trying to make: “OK, this part could be made from something like this. . .and that, you could make by condensing two pieces sort of like these. . .”
You work back to more accessible starting materials, based on the transformations that you know about or can picture being feasible. For simpler molecules, it’s the kind of thing you ask sophomore students to do on one of the higher-point-value questions at the end of the test. But for larger and more complex ones, it can be a great deal of work. The “decision tree” of possible pathways for building up a tricky structure can be huge, and the relative advantages of each are not always obvious. Some of the things that we chemists do value, though, are brevity (fewer steps are almost always better), high yields in each step (because even 90% yield per step will whittle your material away surprisingly quickly), use of readily available/inexpensive reagents and materials (especially important to industrial chemists, obviously), reproducibility (no one goes in and tries to reproduce a 35-step total synthesis for the heck of it, but if you ran step 26 fourteen times and it only gave you a decent conversion once, that’s bad form), and what we all call “elegance”.
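That per-step-yield point deserves the arithmetic: overall yield in a linear sequence is just the product of the step yields. A quick sketch (the numbers are purely illustrative, not from any real route):

```python
# Per-step yields compound multiplicatively in a linear synthesis,
# which is why "pretty good" step yields still destroy most of your
# material over a long route. Numbers here are for illustration only.

def overall_yield(per_step_yield: float, steps: int) -> float:
    """Overall yield of a linear route where every step gives the same yield."""
    return per_step_yield ** steps

# A respectable 90% yield at every one of 35 steps:
print(f"{overall_yield(0.90, 35):.1%}")  # prints "2.5%"

# Even an excellent 95% per step over the same route:
print(f"{overall_yield(0.95, 35):.1%}")  # prints "16.6%"
```

Which is exactly why brevity and yield sit at the top of the list: cutting ten steps out of a route often matters more than polishing any single one.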
That last one is hard to define, but one aspect of it is the “I didn’t see that coming” factor, where parts of a complex molecule are assembled more quickly and surely than you would have pictured, through some nonobvious path. Compression is another aspect, and that can mean something as simple as doing more than one step in the same flask without having to do the whole work-up-the-reaction-isolate-and-purify-the-product thing every single time. Past that, it’s getting the most out of every chemical step, something like “This hydroxy group is what makes the nucleophile come in from this direction in this step, and in the next one it’s also going to be what sets off the rearrangement that fixes two more chiral centers at the same time.” Getting things like that to work ain’t easy. Actually, just seeing them in the first place isn’t, either. If you know organic chemistry well enough, the reaction to a great synthesis really does have an aesthetic component (as overplayed as that part is in the writing of some practitioners over the years). It’s like watching a talented writer land the last lines of a poem, managing to finish its point while also suggesting new meanings that apply to the earlier stanzas, and simultaneously making a subtle but deliberate unexpected reference to some other work of art that illuminates the whole poem from a different direction still.
So from all that, you can tell that coming up with such synthetic proposals is (for many organic chemists) something that they see as a unique and central part of their discipline. And that’s why attempts to automate it grate on some people’s nerves. Imagine how Petrarch might have reacted to a brochure for a Sonnet-o-Matic. If your horse is high enough, you can regard such software as an offense against Nature and against your honor, and even if you’re a bit closer to the ground it might occur to you that part of your recognized job is now under some form of assault.
I’ve written about such programs a few times over the years. There are several commercial software packages out there already, and a number of competing approaches. I think it’s fair to say that none of them have taken over the world, but it’s also fair to say that they’re being taken seriously. There are particular advantages to a computational approach to retrosynthesis that are harder to realize with one’s brain: avoiding a thicket of process patents, for example, or refusing to even consider reagents and starting materials that are not on some particular list. And that’s not even mentioning the difficulties of keeping up with the literature itself – as I’m fond of saying, a retrosynthesis program can learn new chemistry every evening, while most of us can’t keep up that pace.
This latest work is from the people who came up with the Chematica program, and it has some interesting insights into what happened when the authors tried to push the software into more challenging natural product chemical space. There were, they report, many instances where the program knew all the individual steps that could go into such a synthetic route, but still failed to find one. They had to make a number of modifications to make it work more strategically – for example, being willing to admit a step that made things temporarily more complex for a bigger synthetic payoff a step or two later, or looking for opportunities to accomplish more than one chemical step at a time. Not all of these are at that strategic level, I should add – one extension was simply teaching the software about a hundred useful and well-precedented functional-group interconversions and sequences that had shown up in human-driven total synthesis over the years.
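To see why that first modification matters, consider a toy version of the search problem (this is my own illustrative sketch, not Chematica’s actual algorithm – the molecules, scores, and graph are all invented). If every retrosynthetic step is required to strictly reduce some “complexity” score, any route that passes through a temporarily more complex intermediate is invisible to the search:

```python
# Toy retrosynthesis search over an invented graph. Each "molecule" is a
# node labeled with a made-up complexity score; edges are known retro-steps.
# A purely greedy search (every step must lower complexity) misses the only
# working route, which goes uphill for one step before paying off.

graph = {
    "target(10)": ["intermA(8)", "intermB(12)"],  # B looks worse at first glance
    "intermA(8)": ["deadend(7)"],
    "deadend(7)": [],                             # no path to purchasable stock
    "intermB(12)": ["stock(2)"],                  # but B reaches stock next step
    "stock(2)": [],
}
complexity = {n: int(n.split("(")[1].rstrip(")")) for n in graph}
STOCK = {"stock(2)"}  # commercially available material

def greedy(node):
    """Accept only steps that strictly reduce complexity."""
    if node in STOCK:
        return [node]
    for nxt in graph[node]:
        if complexity[nxt] < complexity[node]:
            route = greedy(nxt)
            if route:
                return [node] + route
    return None

def with_uphill(node, uphill_budget=1):
    """Same search, but tolerate a limited number of uphill steps."""
    if node in STOCK:
        return [node]
    for nxt in graph[node]:
        budget = uphill_budget - (complexity[nxt] >= complexity[node])
        if budget >= 0:
            route = with_uphill(nxt, budget)
            if route:
                return [node] + route
    return None

print(greedy("target(10)"))       # prints None: greedy search dead-ends
print(with_uphill("target(10)"))  # prints ['target(10)', 'intermB(12)', 'stock(2)']
```

The real system is of course doing something far more sophisticated over actual reaction rules, but the shape of the problem is the same: a pure hill-climber can know every individual transformation and still never assemble them into a route.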
The analogies to chess playing come to mind whether you want them to or not: you’re getting the software to handle the idea of sacrificing a piece to gain better position or better prospects in the end game, or loading it with particular lines of play that have proven useful and forcing it to take those into account. And these analogies work so well because organic synthesis is itself a game, played on a very large board with very complex rules, and with the added complexity of new pieces and moves being discovered from time to time. That’s why we like it so much.
In the end, the authors assembled a set of natural product syntheses from the literature, all of which we can presume to have been worked out by various sorts of humans. And they mixed these with a set (done on broadly similar sorts of molecules) generated by the souped-up version of Chematica. They sent these around to a number of experienced chemists and did a sort of Turing test, asking people if they could tell which routes were from the humans and which were from the machines. You can try the same experiment – start from the beginning of the Supplementary Information file and make your calls. The answer key comes after the syntheses are laid out.
What I can tell you is that no, it appears that the experts couldn’t really tell the difference. And that says something about Chematica, but I fear that it also says something about organic synthesis. None of these syntheses, neither the known human ones nor the machine-generated ones, are going to trigger a major aesthetic experience for anyone. The natural product structures are fair ones, but they’re generally not complicated enough for something really elegant or surprising to occur. That makes their synthesis, even when performed by humans, a bit more of a mechanical exercise than it would have been at one time. We know a lot more chemistry than we did in R. B. Woodward’s day, and what he often had to invent, we now use as a matter of course. Whole classes of ring systems and functional group combinations have been worked on to the point that we have pretty reasonable ideas of how you might produce them. And while those aren’t always going to work in practice, enough of them will (and there are enough alternatives for the steps that don’t) that the resulting synthesis falls into the “Yeah, sure, why not?” category, rather than “Whoa, look at that”.
No software is yet producing “Whoa, look at that” syntheses. But let’s be honest: most humans aren’t, either. The upper reaches of organic synthesis can still produce such things – and the upper stratum of organic chemists can still produce new and startling routes even to less complex molecules. But seeing machine-generated synthesis coming along in its present form serves to point out not so much that the machines are encroaching on human territory, but that some of the human work has gradually become more mechanical.