Here’s a paper on “Avoiding Missed Opportunities” that’s come out in J. Med. Chem. By that, the authors are referring to the common practice, in a drug development project, of setting up criteria that the eventual candidate needs to meet (at least this much potency, PK at least to this level, selectivity against counterscreens X and Y of such-and-such-fold, etc.) They’re emphasizing that this, while reasonable-sounding, has its pitfalls:
However, having chosen a profile of property criteria, we should consider the impact that this choice will have on the decisions made, i.e., the compounds or chemical series chosen for progression. In some cases, the choice of compounds is very sensitive to a specific property criterion or the importance given to it. In these cases, that criterion may artificially distort the direction of the project; if we are not very confident that the “right” criterion has been used, this may lead to valuable opportunities being missed.
They have a point. Especially as a project goes on, it can be difficult to reconstruct some of the thinking that went into some of the cutoffs (or to recall how arbitrary some of them might have been). A totally different group of people might be working on things by that point as well, exacerbating the problem:
In the context of drug discovery, consider a progression criterion for potency that specifies that the IC50 must be less than 10 nM. If the most active member of a chemical series with good ADME and safety characteristics has a potency of 50 nM, would it make sense to reject this series? In a simple case such as this, it may be possible to spot this exception, but with the increasing complexity and diversity of the data used in early drug discovery, these sensitivities may not always be apparent. In addition, as time progresses it can be difficult to remember the details of chemical series explored earlier in a project, and consideration of these sensitivities can reveal alternative directions or backup series, should a project reach an insurmountable issue with its primary series.
When you’re addressing several parameters at once, it can be hard to keep all this in mind. And in the earlier stages of the project, it’s highly unlikely that any compounds will clear all the hurdles, so you have to progress things that have one or more defects, in the hopes that these can be fixed later on. That, in turn, can lead you to under- or over-value the importance of some of the criteria. The ones that were met early on can get less attention, since they were fixed so early, even though they might be more important to the eventual success of the project than the ones that took longer to fix, for example.
As the authors note, there’s also a problem of false quantitation. If you decide that cLogP has to be (say) under 3, is a 2.95 really different from a 3.05? For a calculated property? Almost certainly not. You have to allow these cutoffs some fuzziness, but to some people’s eyes, that’s just a way of making excuses. (And even if you’re OK with it, you still have to draw a line somewhere, eventually). But you also should have some clear idea of how important it is to make that number, which can be a hard question to answer.
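To make the false-quantitation point concrete, here's a toy sketch (all numbers and function names are my own invention, not from the paper) contrasting a hard cLogP cutoff at 3 with a soft, logistic version. The hard version treats 2.95 and 3.05 as categorically different; the soft one scores them nearly the same:

```python
import math

def hard_cutoff(clogp, limit=3.0):
    # Pass/fail: a compound at 2.95 passes, one at 3.05 fails outright
    return 1.0 if clogp < limit else 0.0

def soft_cutoff(clogp, limit=3.0, steepness=2.0):
    # Logistic falloff centered on the limit; the steepness is arbitrary
    return 1.0 / (1.0 + math.exp(steepness * (clogp - limit)))

print(hard_cutoff(2.95), hard_cutoff(3.05))  # 1.0 vs 0.0 - a cliff
print(soft_cutoff(2.95), soft_cutoff(3.05))  # roughly 0.52 vs 0.48
```

Of course, the steepness parameter is itself another number somebody has to pick, which is exactly the "you still have to draw a line somewhere" problem.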
The paper suggests “desirability functions” rather than simple cutoffs. That is sort of a weighted variation, where you assign greater and lesser importance to the various criteria and calculate a score accordingly. And this is fine, but you’re still faced with the problem of how you’re going to combine the weighted scores – add them up, or multiply them? If you add them, you run the risk of having other properties be good enough to outweigh a killingly low score on one of them, making a doomed compound look better than it is (especially for compounds that have progressed through more assays!). Multiplying them gets rid of that problem, but allows for too much emphasis to be placed on a relatively poor score in one category. These sorts of problems are one reason that some drug-candidate scoring systems proposed in the past (such as “chemical beauty”, blogged here) have turned out, after more study, to be rather ineffective.
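The additive-versus-multiplicative trade-off is easy to see in miniature. Here's a minimal sketch (the desirability ramps, weights, and example compounds are all hypothetical, not taken from the paper): each property is mapped to a desirability in [0, 1], and the per-property scores are combined either as a weighted mean or a weighted geometric mean:

```python
def potency_desirability(pic50):
    # Ramp: 0 below pIC50 6, 1 above 8, linear in between (arbitrary choices)
    return min(max((pic50 - 6.0) / 2.0, 0.0), 1.0)

def logp_desirability(clogp):
    # Prefer cLogP <= 3, fading to 0 by cLogP 5 (again arbitrary)
    return min(max((5.0 - clogp) / 2.0, 0.0), 1.0)

def additive_score(desirabilities, weights):
    # Weighted mean: a great score on one property can mask a zero on another
    total = sum(weights)
    return sum(d * w for d, w in zip(desirabilities, weights)) / total

def multiplicative_score(desirabilities, weights):
    # Weighted geometric mean: any zero desirability zeroes the whole score
    total = sum(weights)
    prod = 1.0
    for d, w in zip(desirabilities, weights):
        prod *= d ** (w / total)
    return prod

# Compound A: superb potency, hopeless cLogP. Compound B: middling on both.
a = [potency_desirability(8.5), logp_desirability(6.5)]  # [1.0, 0.0]
b = [potency_desirability(7.0), logp_desirability(3.5)]  # [0.5, 0.75]
w = [1.0, 1.0]

print(additive_score(a, w), additive_score(b, w))              # 0.5 vs 0.625
print(multiplicative_score(a, w), multiplicative_score(b, w))  # 0.0 vs ~0.61
```

Under additive scoring, the doomed compound A still collects half the maximum score on the strength of its potency alone; under multiplicative scoring, its zero on cLogP kills it outright - which is the behavior you want for a true showstopper, but too harsh for a property that's merely poor.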
The main part of the paper proposes a method (sensitivity analysis) to figure out just how sensitive the eventual prioritization is to the various components, which is worth knowing. Clearly, the ones that have more leverage on the final results should get more attention and have more thought put into them, especially as regards uncertainties in the numbers that are being fed into the score. To my mind, this actually gets back to that paper by Scannell and Bosley about the problem of nonpredictive assays. One way for an assay to end up in the nonpredictive category is to have a lot of variability – there’s so much noise in the numbers that you can’t tell what’s going on. (And unfortunately, there are many other ways for an assay to be nonpredictive!) This paper arrives at the same conclusions, that these things can kill even the most well-thought-out scheme for working through a drug development process.
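The flavor of that sensitivity check can be sketched very roughly: perturb one criterion's weight and watch whether the compound ranking flips. Everything here (the desirability numbers, the series, the weights) is invented for illustration and is not the paper's actual procedure:

```python
def score(desirabilities, weights):
    # Simple weighted-mean desirability score
    total = sum(weights)
    return sum(d * w for d, w in zip(desirabilities, weights)) / total

# Per-criterion desirabilities for two hypothetical series:
# [potency, ADME, selectivity]
series_x = [0.9, 0.4, 0.8]  # potent but poor ADME
series_y = [0.6, 0.8, 0.6]  # balanced

# Sweep the weight given to potency and see which series leads
for potency_weight in [0.5, 1.0, 2.0, 4.0]:
    w = [potency_weight, 1.0, 1.0]
    leader = "X" if score(series_x, w) > score(series_y, w) else "Y"
    print(f"potency weight {potency_weight}: leader is series {leader}")
```

With these numbers, the leading series flips between a potency weight of 0.5 and 1.0 - which is exactly the kind of fragility the authors argue you want surfaced, rather than discovered years later when the "losing" series turns out to have been the better bet.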
Part of the danger is that such a well-thought-out scheme can be very attractive, and the higher up the managerial ladder you go, the more attractive it probably is. At that level, you aren’t involved in the day-to-day details of any one project, but you are going to be held responsible for how many of them succeed. Under those not-all-that-appealing conditions, you’ll be looking for some way to keep track of everything that will give you an accurate picture but not eat up all of your time. So a scoring/dashboard/overview scheme will sound like just the thing. Moreover, these often break things down into one (or just a few) numbers, for even easier digestion, but the smaller the bite-size pieces that a given metric scheme delivers, the more you should mistrust it. Reality can be quite resistant to being distilled down this way.
There’s a very relevant analogy from the 2007-2008 financial crisis. The people running the firms that managed the big portfolios of bonds and derivatives were quite likely using inappropriately simplified measures for risk and for the degree to which the various components were correlated. The further you went up the organization, the more likely it was that people just looked at one or two numbers (value-at-risk, the correlation number from the Gaussian copula, what have you) and decided that they had things figured out well enough. They didn’t. But it’s human nature to look for something that’s easier to get a handle on, and to overvalue it once you think you’ve found it, especially in a messy, multifactorial situation like investing or drug discovery.