Here’s a really interesting overview in J. Med. Chem., looking back on what kinds of reactions get run during drug discovery projects. (Try the extensive Supporting Information, too, since it’s open-access.) There have been several efforts like this in the past (and these are detailed in this paper itself), but this one is (ambitiously) mining a massive pile of data from the drug industry that until now hasn’t been approached in this manner: all the patent filings from 1976 to 2015. (The previous summaries had looked either at in-house data or at reactions extracted from J. Med. Chem. itself and other journals, as well as public databases like ChEMBL.) Overall, they’ve gone through 200,000 patents and patent applications, and extracted 1.3 million unique reactions, which means that many readers of this site must have contributed (as have I). For comparison, the raw data set was over nine million patents and 3.3 million reactions, but that was trimmed down for med-chem focus and redundancy.
Update: here’s an interactive web page with many of the plots from the paper. Enjoy!
This data set is not only larger than previous ones (and potentially a bit closer to the raw material, since far more med-chem is patented than is ever formally published), it’s also broken down into a larger number of reaction types. This allows for some unusual “popularity charts” showing how different transformations have fared over time (see below). But first, some general trends: US patent application data only became available in 2001, and from then until 2010 there was a steep rise in the number of applications in the medicinal chemistry/drug discovery area. That, in turn, fueled a corresponding rise in granted patents. But since 2010, applications have headed back down, and in the last few years, issued patents have been flat. The reason for this is not obvious, at least to me – an echo of the 2007/2008 financial crisis? That seems possible, up to a point, but early-stage medicinal chemistry isn’t exactly tied closely to the financial cycle, and in the last few years of this data set there’s actually been a boom in the startup biopharma business. Worth thinking about.
Turning to the actual chemistry, there seems to have been an increase in types of reactions during the 2000s as well, which has also leveled off a bit in recent years. The paper breaks the reactions down into broad categories, then finer ones. In the big picture, for example, heteroatom alkylation and arylation reactions of all sorts are the biggest category, and have been so across the entire time studied. Their share seems to be declining slightly, but they still have a good lead. Looking at that category in more detail, though, you can tell that there have been quite a few changes. Back in the mid-1970s, N-alkylation via alkyl chlorides and Williamson ether synthesis reactions were the top two in this area, but they’ve both declined pretty steadily since then, with chloro N-arylation and SNAr ether formation rising up.
The number two category is acylation, also holding steady over the years, but that one also shows a lot of turnover in the details. In the mid-70s, the amide Schotten-Baumann reaction was well in the lead, but it’s done nothing but decline since then, with acid/amide couplings (via all the well-known reagents) completely taking over in this area. Interestingly, 1996 seems to have been the real changeover – the two reaction categories were close during the 1980s and early 1990s, but amide couplings have been on an unstoppable tear ever since and now are the majority of all acylations.
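For anyone who wants to try this sort of "popularity chart" on their own reaction list, here is a minimal sketch of the bookkeeping: each reaction type's share of its category per year, plus the first year from which one type stays ahead of another. This is not the authors' code; the function names, the record layout, and the toy labels "A" and "B" are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def yearly_shares(records):
    """Map year -> {reaction_type: fraction of that year's reactions}.

    `records` is an iterable of (year, reaction_type) pairs. This layout is
    a hypothetical one, not the format used in the paper.
    """
    counts = defaultdict(Counter)
    for year, rtype in records:
        counts[year][rtype] += 1
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {t: n / total for t, n in counter.items()}
    return shares


def first_lasting_lead(shares, a, b):
    """First year from which type `a` has a larger share than `b` in every
    later year of the data, or None if `a` is not ahead at the end."""
    lead_year = None
    for year in sorted(shares):
        if shares[year].get(a, 0.0) > shares[year].get(b, 0.0):
            if lead_year is None:
                lead_year = year
        else:
            lead_year = None  # lead was lost (or tied), so start over
    return lead_year


# Made-up toy data: "A" fades while "B" takes over.
toy = (
    [(2000, "A")] * 3 + [(2000, "B")] * 1
    + [(2001, "A")] * 2 + [(2001, "B")] * 2
    + [(2002, "A")] * 1 + [(2002, "B")] * 3
)
shares = yearly_shares(toy)
changeover = first_lasting_lead(shares, "B", "A")
```

Shares rather than raw counts are the right thing to plot here, since the total number of reactions per year changes a great deal over the period.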
The number three reaction category these days is deprotection, which over the years has been totally taken over by N-Boc removals (it used to be more about hydrolyzing esters). The flip side, protecting group formation, can be found in the SI file, and (as you’d expect) Boc formation has also taken over there. Interestingly, that whole category used to be ruled by O-acetyl formation, and the roles have just completely reversed over the years.
But deprotection as a category looks to be overtaken by C-C bond formation, which the chemists in the audience – not that there’s anyone else still reading by this point – all knew had to be coming on. Yes, the Rise of the Suzukis is very much apparent in this data set (it would be invalid if it didn’t show that one, to be honest). Aryl bromide couplings are still almost down in the noise in 1990, but completely take over the category as the years go on (you can see Stille-type couplings rising along with them, but then gradually declining). And just a few years behind these, the aryl chlorides come following along. The Sonogashira reaction experiences a huge rise during the 1980s, but it’s been falling since its peak in 1989/1990. It still held on as the number one reaction of this type for several years in the 1990s, but the boronates were heading up the whole time. And what was number one before that? The good old Wittig reaction, actually – number one through most of the 1980s for carbon-carbon bond formation and ruling the category, but facing a relentless decline in market share since then. Interestingly, the 1976 data point shows what is probably the last of a former era of chemistry – the number one C-C bond forming reaction that year was the Knoevenagel.
There’s also an interesting heat map of oxidation reactions, which I won’t try to summarize, except to note that one of its themes is Farewell to Chromium, and another is Hello to Dess-Martin. The SI file has some more breakdowns of this sort, and I found the heterocycle formation one to be pretty lively. That area is much more of a fair fight than going up against the Mighty Suzuki or Boc deprotection – hardly anything ever gets over a 25% share of the whole, and those change over quite a bit themselves, although thiazole formation did rise from 1970s obscurity to become number one in the area for the last ten or fifteen years. Tetrazole formation ruled briefly in the mid-1990s, which I would assume are all the “sartan” angiotensin receptor antagonists, while benzimidazoles peaked in the mid-1980s and are now challenging again. Sharpless-style “click” chemistry triazoles, the catalyzed Huisgen reaction, are also climbing, as you’d imagine, coming from almost nowhere in the early 2000s. The biggest fall from popularity in this group would seem to be pyrroles – they were top 3 back in the 1970s, but crashed and haven’t really recovered (good riddance, as far as I’m concerned – I’m not a fan).
Well, I could go on like this for eight or ten more paragraphs – it’s a really interesting paper for an organic chemist. Why did ketone reductive aminations spike in the early 1980s but level off? And why did thioether formation start declining around 2000? And why did someone blow a whistle and declare that no more Barbier reactions would be used to make drug candidates after about 1980? (In case you’re wondering, the Grignard reaction has really hung in there – it’s never dominated, but it’s never gone away, either, and although we don’t have the data, I’d have to expect that it’s been a solid performer for a century now).
There’s another aspect of the paper’s analysis that’s worth noting, though: reaction yields. Those are stated for about 460,000 reactions in the data set, and what shows up is a not-huge but not-imaginary decrease in median reaction yields over that time. You can imagine several reasons for this – modern assay and analytical techniques being able to work with smaller amounts of product would have to be up there. Running parallel arrays of reactions would have an effect, too – they’re not all going to work great. And those same modern separation techniques can both cut down the high-yielding reactions a bit, due to Inevitable Losses in Chromatography, and at the same time allow for lower-yielding reactions to give forth some pure compounds for testing. The authors even break down yields by reaction type – you’ll find that the consistently highest-yielding reaction of the last 40 years has been acid chloride formation from the free acid, which I can well believe, whereas the relative stinkers of that period have been acylsulfonamide formation, fluorination in general, and the Chan-Lam coupling, and I have no problems whatsoever believing that one, either.
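The yield trend is easy to reproduce in outline if you have (year, yield) pairs in hand. Below is a minimal sketch, assuming a hypothetical record layout in which reactions with no stated yield carry `None`; those are simply skipped, in the same spirit as the paper only having stated yields for part of the data set. The function name and the toy numbers are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict
from statistics import median


def median_yield_by_year(records):
    """Map year -> median stated yield (percent).

    `records` is an iterable of (year, yield_percent_or_None) pairs; entries
    with no stated yield are ignored. The layout is an assumption.
    """
    by_year = defaultdict(list)
    for year, yield_pct in records:
        if yield_pct is not None:
            by_year[year].append(yield_pct)
    return {year: median(vals) for year, vals in sorted(by_year.items())}


# Made-up toy data, not values from the paper.
toy = [
    (1980, 90.0), (1980, 70.0), (1980, None),
    (2010, 40.0), (2010, 60.0), (2010, 80.0),
]
medians = median_yield_by_year(toy)
```

The median is the sensible statistic here, since a handful of near-quantitative or near-zero reactions would drag a mean around.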
Consistent with the other analyses of this sort, looking at the product structures formed over the years shows an increase in aromatic rings, which since 2005 or so has mostly been driven by heterocycles, and a fairly steady climb in the number of molecules with an obvious “solubilizing” group stapled onto them. Molecular weight climbed steadily over the years, leveling off around 2010, while cLogP peaked around 2005 and has been coming down a bit since then. Interestingly, the number of rotatable bonds was rolling along pretty steadily until the early 1990s and was then completely shot out of the sky, which I would guess reflects the Suzuki Invasion.
There’s a lot more in the paper, and I certainly won’t try to summarize it all. Any synthetic organic chemist should find it interesting reading. I wonder, given yesterday’s post, if an analysis like this a couple of decades from now will find the statistics being moved around by flow-machine-friendly reactions? We shall see. . .