Here’s a really interesting paper from consultants Jack Scannell and Jim Bosley in PLoS ONE on the productivity crisis in drug discovery. Several things distinguish it. For one, it’s not just another “whither the drug industry” think piece, of which we have plenty already. This one gets quantitative, attempting to figure out what the real problems are and how much each one contributes.
After reviewing the improvements in medicinal chemistry productivity, screening, secondary assays and other drug discovery technologies over the last couple of decades, the authors come to this:
These kinds of improvements should have allowed larger biological and chemical spaces to be searched for therapeutic conjunctions with ever higher reliability and reproducibility, and at lower unit cost. That is, after all, why many of the improvements were funded in the first place. However, in contrast, many results derived with today’s powerful tools appear irreproducible; today’s drug candidates are more likely to fail in clinical trials than those in the 1970s; R&D costs per drug approved roughly doubled every ~9 years between 1950 and 2010, with costs dominated by the cost of failures; and some now even doubt the economic viability of R&D in much of the drug industry.
The contrasts between huge gains in input efficiency and quality, on one hand, and a reproducibility crisis and a trend towards uneconomic industrial R&D on the other, are only explicable if powerful headwinds have outweighed the gains, or if many of the “gains” have been illusory.
Note the copious referencing; this paper is also a great source for what others have had to say about these issues (and since it’s in PLoS, it’s open-access). But the heart of the paper is a series of attempts to apply techniques from decision theory/decision analysis to these problems. That doesn’t make for easy reading, but I’m very glad to see the effort made, because it surely wasn’t easy writing, either. (The authors themselves advise readers who aren’t decision theory aficionados to skip to the discussion section and then work back to the methods, but I wonder how many people will follow through on the second part of that advice.) Scannell and Bosley point out that concepts from decision theory are widely used at the front end of drug discovery programs (screening) and at the back end (clinical trials), but not much in between, and this paper could be described as an attempt to change that. They believe, though, that their results apply not only to drug discovery but to other situations like it: rare positives in a very large landscape of widely varied potential negatives, with a wide range of tools available to (potentially) narrow things down. Figure 1 in the paper is a key overview of their model; definitely take a look at it as you go through.
So then, feeling as if you’ve been given permission to do what one normally does anyway (flip to the end section!), what do you find there? There are some key concepts to take in first. One is “predictive validity” (PV), which is what it sounds like: how well does a given assay or filter (screening data, med-chem intuition, tox assay, etc.) correlate with what you really want to get out of it? As they mention, though, the latter (the answer against the “reference variable”) generally only comes much later in the process. You don’t know until you get deep into clinical trials, if even then, whether your toxicology studies really did steer you right when they pointed to your clinical candidate as likely to be clean. They also use the term “predictive model” (PM) for the screening or disease models that serve as decision points along the way. With these terms in mind, here’s a clear takeaway:
Changes in the PV of decision variables that many people working in drug discovery would regard as small and/or unknowable (i.e., a 0.1 absolute change in correlation coefficient versus clinical outcome) can offset large (e.g., 10 fold or greater) changes in brute-force efficiency. Furthermore, the benefits of brute-force efficiency decline as the PV of decision variables declines (left hand side of both panels in Fig 4). It is our hypothesis, therefore, that much of the decline in R&D efficiency has been caused by the progressive exhaustion of PMs that are highly predictive of clinical utility in man. These models are abandoned because they yield successful treatments. Research shifts to diseases for which there are poor PMs with low PV. Since these diseases remain uncured, people continue to use bad models for want of anything better. A decline in the average PV of the stock of unexploited screening and disease models (PMs) can offset huge gains in their brute-force power (Fig 4).
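To get a feel for the trade-off in that passage, here is a minimal sketch. It is emphatically not Scannell and Bosley’s own model, just a toy under stated assumptions: each candidate’s decision variable (the assay or model readout) and reference variable (the eventual clinical outcome) are standard bivariate normal, with their correlation standing in for PV; a candidate counts as a true positive if its reference variable lands in the top 0.1%; and “brute force” means screening ten times as many candidates in order to advance the same number. The function names (joint_tail, hit_rate) and every parameter value are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: mean 0, sd 1


def joint_tail(a: float, b: float, rho: float,
               steps: int = 4000, upper: float = 8.0) -> float:
    """P(X > a and Y > b) for standard bivariate normal X, Y with correlation rho.

    Uses Y | X = x ~ N(rho * x, 1 - rho**2) and integrates
    pdf(x) * P(Y > b | X = x) over x from a to `upper` with the midpoint rule.
    Probability mass beyond `upper` (8 standard deviations) is negligible here.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    cond_sd = (1.0 - rho * rho) ** 0.5
    h = (upper - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += STD.pdf(x) * (1.0 - STD.cdf((b - rho * x) / cond_sd))
    return total * h


def hit_rate(pv: float, selected_fraction: float, positive_fraction: float) -> float:
    """Share of advanced candidates that are true positives (hypothetical toy model).

    pv: correlation between decision variable X and reference variable Y,
        used here as a stand-in for predictive validity.
    selected_fraction: share of screened candidates advanced (top slice of X).
    positive_fraction: share of all candidates that are truly good (top slice of Y).
    """
    a = STD.inv_cdf(1.0 - selected_fraction)  # cut-off on the decision variable
    b = STD.inv_cdf(1.0 - positive_fraction)  # cut-off on the reference variable
    # P(Y > b | X > a) = P(X > a, Y > b) / P(X > a), and P(X > a) = selected_fraction
    return joint_tail(a, b, pv) / selected_fraction


if __name__ == "__main__":
    base_rate = 0.001  # hypothetical: 1 in 1,000 candidates would work in the clinic
    for pv in (0.0, 0.3, 0.5, 0.7, 0.9):
        top_1pct = hit_rate(pv, 0.01, base_rate)    # advance the top 1% of those screened
        top_01pct = hit_rate(pv, 0.001, base_rate)  # screen 10x more, advance the same number
        print(f"PV={pv:.1f}  top 1%: {top_1pct:.4f}  top 0.1%: {top_01pct:.4f}  "
              f"gain from 10x screening: {top_01pct / top_1pct:.2f}x")
```

Run as a script, it prints the hit rate among advanced candidates for a few PV values. At PV = 0 both columns sit at the base rate and screening ten times as many compounds buys essentially nothing, which is the left-hand-side behavior the quote describes. Comparing rows (say, PV 0.5 with the top 1% advanced against PV 0.4 with the top 0.1% advanced) shows how the two levers trade off under these toy assumptions; the paper’s own analysis is considerably richer.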
Let’s all say “Alzheimer’s!” together, because I can’t think of a better example of a disease where people use crappy models because that’s all they have. This brings to mind Bernard Munos’ advice that (given the state of the field) drug companies would be better off not going after Alzheimer’s at all until we know more about what we’re doing, because the probability of failure is just too high. (He was clearly thinking, qualitatively, along the same lines as Scannell and Bosley here.) Munos was referring specifically to his former employer, Eli Lilly, which has been placing a series of huge bets on Alzheimer’s. If this analysis is correct, that may well have been completely the wrong move. I’ve worried myself that even if Lilly manages to “succeed”, it may well end up with something that doesn’t justify the costs that will surely follow to the health care system, but which will be clamored for by patients and families simply because there’s so little else. (There’s that “well, it’s bad but it’s all we’ve got” phenomenon yet again.)
I’m very sympathetic indeed to this argument, because I’ve long thought that a bad animal model (for example) is much worse than no animal model, and I’m glad to see some quantitative backup for that view. The same principle applies all the way down the process, but the temptation to generate numbers is sometimes just too strong, especially if management really wants lots of numbers. So how does that permeability assay do at predicting which of your compounds will have decent oral absorption? Not so great? Well, at least you got it run on all your compounds. In fact, the paper makes this exact point:
We also suspect that there has been too much enthusiasm for highly reductionist PMs with low PV. The first wave of industrialized target-based drug discovery has been, in many respects, the embodiment of such reductionism. The problem is not necessarily reductionism itself. Rather, it may be that good reductionist models have been difficult to produce, identify, and implement, so there has been a tendency to use bad ones instead; particularly for common diseases, which tend to have weak and/or complex genetic risk factors. After all, brute-force efficiency metrics are relatively easy to generate, to report up the chain of command, and to manage. The PV of a new screening technology or animal PM, on the other hand, is an educated guess at best. In the practical management of large organisations, what is measureable and concrete can often trump that which is opaque and qualitative, even if that which is opaque and qualitative is much more important in quantitative terms.
Exactly so. It sounds to some managerial ears as if you’re making excuses when you bring up such things, but this (to my mind) is just the way the world is. Caligula used to cry out “There is no cure for the emperor!”, and there’s no cure for the physical world, either, at least until we get better informed about it, which is not a fast process and does not fit well on most Gantt charts. Interestingly, the paper notes that the post-2012 uptick in drug approvals might be due to a concentration on rare diseases and cancers that have a strong genetic signature, which provides models with much better PV. In general, though, the authors hypothesize that coming up with such models may well be the major rate-limiting step in drug discovery now, part of a steady decline in PV from the heroic era decades ago (which more closely resembled phenotypic screening in humans). This shift, they think, could have repercussions in academic research as well, and might be one of the main causes of the reproducibility problems that have been so much in the news in recent years.
As they say in closing, we have to realize what the “domains of validity” are for our models. Newtonian physics is a tremendously accurate model until you start looking at very small particles, or around very strong gravitational fields, or at things moving at speeds approaching that of light. Similarly, in drug discovery, we have areas where our models (in vitro and in vivo) are fairly predictive and areas where they really aren’t. We all know this, qualitatively, but it’s time for everyone to understand just what a big deal it really is, and how hard it is to overcome. Thinking in these terms could make us place more value on data that bear directly on predictive validity and model validity (read the paper for more on this).
In fact, read the paper no matter what. I think that everyone involved in drug discovery should – it’s one of the best things of its kind I’ve seen in a long time.