
Drug Assays

We Can’t Calculate Our Way Out of This One

Clinical trial failure rates are killing us in this industry. I don’t think there’s much disagreement on that – between the drugs that just didn’t work (wrong target, wrong idea) and the ones that turn out to have unexpected safety problems, we incinerate a lot of money. An earlier, cheaper read on either of those would transform drug research, and people are willing to try all sorts of things to those ends.
One theory on drug safety is that particular molecular properties are more likely to lead to trouble. Several correlations have been proposed – between high logP (greasiness) and tox liabilities, between multiple aromatic rings and tox, and so on. One rule, proposed in 2008 by a group at Pfizer, is the so-called 3/75 cutoff: compounds with clogP > 3 and total polar surface area below 75 square angstroms are about 2.5 times more likely to run into trouble. But here’s a paper in MedChemComm that asks whether any of this has any validity:

What is the likelihood of real success in avoiding attrition due to toxicity/safety from using such simple metrics? As mentioned in the beginning, toxicity can arise from a wide variety of reasons and through a plethora of complex mechanisms similar to some of the DMPK endpoints that we are still struggling to avoid. In addition to the issue of understanding and predicting actual toxicity, there are other hurdles to overcome when doing this type of historical analysis that are seldom discussed.

The first of these is making sure that you’re looking at the right set of failed projects – that is, ones that really did fail because of unexpected compound-associated tox, and not some other reason (such as unexpected mechanism-based toxicity, which is another issue). Or perhaps a compound could have been good enough to make it on its own under other circumstances, but the competitive situation made it untenable (something else came up with a cleaner profile at about the same time). Then there’s the problem of different safety cutoffs for different therapeutic areas – acceptable tox for a pancreatic cancer drug will not cut it for type II diabetes, for example.
The authors did a thorough study of 130 AstraZeneca development compounds, with enough data to work out all these complications. (This is the sort of thing that can only be done from inside a company’s research effort – you’re never going to have enough information working from outside.) What they found, right off, was that for this set of compounds the Pfizer rule was completely inverted: the compounds on the too-greasy side had actually shown fewer problems (!) The authors looked at the data sets from several different angles, and concluded that the most likely explanation is that the rule is just not universally valid, and depends on the dataset you start with.
The same thing happens when you look at the fraction of sp3 carbons, a characteristic (from the “Escape From Flatland” paper) that’s also been proposed to correlate with tox liabilities. The AZ set shows no such correlation at all. The authors’ best hypothesis is that a real correlation with pharmacokinetics has gotten mixed in with a spurious correlation with toxicity (and indeed, the first paper on this trend was only talking about PK). And finally, they go back to an earlier properties-based model published by other workers at AstraZeneca, and find that it, too, doesn’t seem to hold up on the larger, more curated data set. Their take-home message: “. . .it is unlikely that a model of simple physico-chemical descriptors would be predictive in a practical setting.”
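For reference, the descriptor in question is just Fsp3, the fraction of a molecule’s carbons that are sp3-hybridized. A toy calculation (the carbon counts below are supplied by hand purely for illustration; a real workflow would derive hybridization from structures with a cheminformatics toolkit such as RDKit):

```python
def fsp3(n_sp3_carbons: int, n_carbons: int) -> float:
    """Fraction of sp3-hybridized carbons -- the 'Escape From Flatland' descriptor."""
    if n_carbons <= 0:
        raise ValueError("molecule must contain at least one carbon")
    return n_sp3_carbons / n_carbons

# Hand-counted examples: benzene is all sp2, cyclohexane all sp3
print(fsp3(0, 6))  # benzene      -> 0.0
print(fsp3(6, 6))  # cyclohexane  -> 1.0
```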
Even more worrisome is what happens when you take a look at the last few years of approved drugs and apply such filters to them (emphasis added):

To investigate the potential impact of following simple metric guidelines, a set of recently approved drugs was classified using the 3/75 rule (Table 3). The set included all small molecule drugs approved during 2009–2012 as listed on the ChEMBL website. No significant biases in the distribution of these compounds can be seen from the data presented in Table 3. This pattern was unaffected if we considered only oral drugs (45) or all of the drugs (63). The highest number of drugs ends up in the high ClogP/high TPSA class and the class with the lowest number of drugs is the low ClogP/low TPSA. One could draw the conclusion that using these simplistic approaches as rules will discard the development of many interesting and relevant drugs.

One could indeed. I hadn’t seen this paper myself until the other day – a colleague down the hall brought it to my attention – and I think it deserves wider notice. A lot of drug discovery organizations, particularly the larger ones, use (or are tempted to use) such criteria to rank compounds and candidates, and many of us are personally carrying such things around in our heads. But if these rules aren’t valid – and this work certainly makes it look as if they aren’t – then we should stop pretending that they are. That throws us back into a world where we have trouble distinguishing troublesome compounds from good ones, but that, it seems, is the world we’ve been living in all along. We’d be better off if we just admitted it.
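As a footnote, the 3/75 classification the paper tests is simple enough to sketch in a few lines. The function and the descriptor values below are purely illustrative; computing real clogP and TPSA values would require a cheminformatics toolkit:

```python
def in_375_risk_class(clogp: float, tpsa: float) -> bool:
    """Pfizer 3/75 rule: flag compounds with clogP > 3 and TPSA < 75 square
    angstroms as the (purportedly) ~2.5x-higher-tox-risk class."""
    return clogp > 3.0 and tpsa < 75.0

# Invented descriptor values, for illustration only
print(in_375_risk_class(4.2, 60.0))   # greasy, low-polarity -> flagged
print(in_375_risk_class(1.5, 110.0))  # polar, low clogP     -> not flagged
```

As the MedChemComm authors found, on their curated AZ set the compounds this predicate flags actually fared better, which is the whole problem.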

25 comments on “We Can’t Calculate Our Way Out of This One”

  1. Anonymous says:

    “One could draw the conclusion that using these simplistic approaches as rules will discard the development of many interesting and relevant drugs.”
    …except that the biggest cost issue in drug development is not the false negatives but the false positives – that is, working on bad drugs rather than discarding good ones. So it’s better to have a crude filter that throws out a lot of good stuff than no filter at all.

  2. Anonymous says:

    Clarification from 1 above – provided of course that the filter preferentially filters out bad compounds, even if that preference is only very small.
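The cost asymmetry in the two comments above can be made concrete with a back-of-the-envelope expected-value calculation (every number here is invented for illustration):

```python
def expected_net_cost(p_good: float, cost_of_failure: float,
                      payoff_of_success: float) -> float:
    """Expected net cost of advancing one compound, given the probability
    it is 'good' and asymmetric payoffs. Illustrative numbers only."""
    return (1 - p_good) * cost_of_failure - p_good * payoff_of_success

base_rate = 0.10      # assume 10% of unfiltered candidates are good
filtered_rate = 0.12  # a crude filter that enriches only slightly
cost, payoff = 100.0, 500.0  # arbitrary units

print(expected_net_cost(base_rate, cost, payoff))      # no filter
print(expected_net_cost(filtered_rate, cost, payoff))  # weak filter: lower cost
```

Even a barely-enriching filter lowers the expected cost per advanced compound, which is comment 2’s point – provided the enrichment really is in the right direction.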

  3. Anonymous says:


  4. Derek Lowe says:

    Sorry about that – link should be up now. As usual, I get so involved writing the rest of the post that I forget to put the original reference in!

  5. Pete says:

    I like this article, and we really need to be questioning the assumptions we make in drug discovery. Unfortunately, much of the data analysis in the druglikeness business (including Flatland) gets distracted by statistical significance and doesn’t focus on what really matters for prediction: the strengths of the relevant trends and the sizes of the effects. Data analysis that has been conducted competently can still be irrelevant. These days when I present on dodgy data analysis (I’ve linked a slide deck as the URL for this comment), I note that if we want people to believe us when we tell them how difficult drug discovery is, then we can’t afford to do poor data analysis. Otherwise those whom we’re trying to convince may conclude that we’re simply in a mess of our own making. Given the subject of yesterday’s post, it’s also worth pointing out that many (most?) analyses of drug-likeness use proprietary data and are not reproducible.

  6. Can’t believe this one had slipped through your grasp – I would have sworn you already wrote about it!
    I hope we see more of this sort of introspection in the future, it is the only way to draw more complete conclusions. Otherwise you are stuck with junk like the papers that try to look at the usefulness of animal toxicology studies, but the only data they have to go on is from the drugs that actually got approved…

  7. MoreMedChem says:

    All of the computational rules are for “risk reduction” based on properties of compounds versus historically good compounds.
    Just as someone shouldn’t invest in the stock market based on historical performance alone, these rules are intended to reduce risk, not to substitute for safety testing. The prediction methods in no way account for outliers.
    The question at hand is whether the clinical testing requirements have outpaced our ability to make predictions and perform adequate safety testing in pre-clinical models, or whether we are just advancing compounds that haven’t had the proper scrutiny (due to a loss of experience or a lack of proper testing).
    I am hoping it is the former, and that better tox models and more realistic preclinical testing will help. If it is the latter, the industry is truly in trouble.

  8. Anon2 says:

    1. Can someone explain the difference between compound-associated tox and mechanism-based tox?
    2. What is the incentive for a pre-clinical scientist to ask the questions you are asking here? We obviously need more of that, but we all know that failing a candidate earlier isn’t going to get you a bonus or guarantee you a job, and isn’t something that can easily be put on a resume. It is much easier for a medical director to claim intuition about what will pass or fail [and be rewarded for it] than for the people reading this forum. Even if one could calculate this, company hierarchy and politics will make many people hesitate.

  9. older chemist says:

    In my view, it’s extremely unsurprising that the rules don’t hold when it comes to real drugs, since exposure must be a key driver in tox-related attrition. Drawing conclusions, and being surprised that a 1–2 mg drug looks like hell, comes across as odd. Even if we had very exact tox models, they would be pretty useless if we can’t predict human doses and exposure.

  10. Pete says:

    I would argue that for intracellular targets, we can’t even measure exposure. The free (unbound) intracellular concentration is not generally measurable for an arbitrary compound. This is discussed in more detail in the blog post that I have used as the URL for this comment.

  11. simpl says:

    Isn’t there something like a lifetime molecular fingerprint tracker? You know, like musicians can recognise an unknown piece from a composer from its style.
    I’m sure that a group in Basel loved seven-membered rings, and another enjoyed indoles.
    And if molecular design is a question of style, how much is substantial, and how much is personal?

  12. Lulu says:

    During my many years in the industry, I’ve observed that:
    1) it’s much easier to create a model than to create a drug, but
    2) the career rewards to the creators are about the same.
    With those incentives, it’s not surprising we have an abundance of models rather than drugs…

  13. anonao says:

    During my many years in the industry, I’ve observed that:
    1) it’s much easier to synthesis plenty of compounds than to create a drug, but
    2) the career rewards to the creators are about the same.
    With those incentives, it’s not surprising we have an abundance of random compounds rather than drugs…
    (the models that can make you “rich” are more in finance than pharma, at least in research; models at the clinical stage are valuable)

  14. Chrispy says:

    Anon2 — I believe compound-based vs. mechanism-based toxicity in this context is the same as off-target vs. on-target tox. A compound may be toxic because it hits something that is not your target, in which case an alternate compound that hits the target cleanly may do the trick. On-target tox, though, really spoils the party, since it implies that hitting the target with anything will result in toxicity. I think the p38 inhibitors fall into this category, as do the MMP inhibitors — anyone have any other examples?
    People who think they know what a drug looks like should spend some time looking at the structures of metformin, ciclosporin, vancomycin, Tecfidera, etc…

  15. Anonymous says:

    These filters may increase the chances that the compounds you keep are successful, but they definitely increase the chances of missing good compounds as well. Reminds me of my favorite drug discovery quote, from Kenny & Montanari, J. Comput.-Aided Mol. Des. 2013, 27(1), pp. 1–13:
    “Given that drug discovery would appear to be anything but simple, the simplicity of a drug-likeness model could actually be taken as evidence for its irrelevance to drug discovery.”

  16. anon says:

    Will be interesting to see what, if anything, comes from the NCATS Tox21 challenge:

  17. anon3 says:

    So many people have produced so many rules and trends that I wonder if it’s possible to make a compound apart from H2O that won’t have somebody piously wagging their finger at you. The best way to avoid all risk of failure is not to progress anything, and I think we’ve got a lot more risk-averse than we were when we used to discover lots more drugs.

  18. LeeH says:

    This is a classic case of how not to use predictive models.
    First, most of the Lipinski-like models are designed to weed out compounds that are likely to have bad physicochemical properties that might impact absorption and perhaps distribution properties. Their usefulness in the tox realm is weak. Also, they’re meant to be back-of-the-envelope warnings, something you can more or less see with your eyes, not complex data-mining-based rigorous models.
    Second, the real use of these alerts is to warn the project team of systematic issues (say, their chemical series has logPs in the 6–7 range), suggesting that some of the possible liabilities should be looked into. That is, ignore them at your peril – but after sufficient experimental measurement of the property in question, their usefulness is gone. It’s Bayes. Your prior probability that a compound property (absorption, for instance) will be desirable may be low, but after new evidence your posterior probability of success may be fine. At that point the model is no longer appropriate for those particular compounds – you’re done with it. Proceed to development.
    (And don’t even get me started on up-front domain applicability. This is a no-brainer.)
    But dismissing the utility of predictive models outright is a mistake. Perhaps this sort of thing is why AZ is where they are.
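LeeH’s Bayesian point can be written out explicitly: with prior P(S) that the property is acceptable and an assay result E, the posterior is P(S|E) = P(E|S)P(S) / [P(E|S)P(S) + P(E|¬S)(1 − P(S))]. A minimal sketch, with all probabilities invented for illustration:

```python
def posterior(prior: float, p_e_given_s: float, p_e_given_not_s: float) -> float:
    """Bayes' rule: updated probability of 'success' S after evidence E."""
    numerator = p_e_given_s * prior
    return numerator / (numerator + p_e_given_not_s * (1 - prior))

# A property-based rule gives only a 20% prior that absorption is OK;
# a clean assay result (90% true-positive rate, 20% false-positive rate)
# pushes the posterior past 50% -- the rule has been superseded by data.
print(posterior(0.20, 0.90, 0.20))
```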

  19. Warren Buffett says:

    Only when the tide goes out do you discover who’s been swimming naked.

  20. TX raven says:

    @ LeeH
    So, you are trying to discover a drug and you need to be warned that you are working with LogP 7 compounds? Does not sound like a lot of thinking is happening…
    It’s been more than a decade since the Ro5 dialog started…
    @ 17 – love that quote!

  21. anonao says:

    @TX Raven
    Well, PAINS, for example, have been around for a while, but that doesn’t stop people publishing new hits with reactive groups.
    Things take time to get into people’s minds. A logP of 7 may sound very high to you now, but without Ro5, would that still be the case?
    Rules are guidelines and warnings; people then have to be clever enough to interpret them and to know their limitations.

  22. LeeH says:

    You’d be shocked.
    Perhaps I didn’t make the point, but rules such as RO5 are not just for egregious property violations, but for subtle ones. If your logP is 4.99, you may not be violating the rule, but you are in danger of having issues with your compounds. Also, people focus on the extremes, but rules such as these are also useful for knowing where to drive towards (i.e., the middle of the “allowed” distribution).

  23. DrSnowboard says:

    It will take 10 years before Pfizer get those tablets of stone edited. And another 10 years before the dinosaur sheep of AZ copy their handwriting

Comments are closed.