The End of Compound Property Optimization Is At Hand

Here’s another Big Retrospective Review of drug pipeline attrition. This sort of effort goes back to the now-famous Rule-of-Five work, and readers will recall the Pfizer roundup of a few years back, followed by an AstraZeneca one (which didn’t always recapitulate the Pfizer pfindings, either). This latest is a joint effort to look at the 2000-2010 pipeline performance of Pfizer, AstraZeneca, Lilly, and GSK all at the same time (using common physical descriptors provided to a third party, Thomson Reuters, to deal with the proprietary nature of the compounds involved). The authors explicitly state they’ve taken on board the criticisms of these papers that have been advanced in the past, so this one is meant to be the current state of the art in the area.
What does the state of the art have to teach us? 812 compounds are in the data set, with their properties, current status, and reasons for failure (if they have indeed failed, and believe me, those four companies did not put eight hundred compounds on the market in that ten-year period). The authors note that there still aren’t enough Phase III compounds to draw as many conclusions as they’d like: 808 had a highest phase described, 422 of those were still preclinical, 231 were in Phase I, 145 in Phase II, 8 were in Phase III, and 2 in Phase IV/postmarketing studies. These are, as the authors note, not quite representative figures compared to industry-wide statistics, and reflect some compounds (including several that went to market) that the participants clearly have left out of their data sets. Considering the importance of the (relatively few) compounds in the late stages, this is enough to make a person wonder about how well conclusions from the remaining data set hold up, but at least something can be said about earlier attrition rates (where that effect is diluted).
605 of the compounds in the set were listed as terminated projects, and 40% of those were chalked up to preclinical tox problems. Second highest, at 20%, was (and I quote) “rationalization of company portfolios”. I divide that category, myself, into two subcategories: “We had to save money, and threw this overboard” and “We realized that we never should have been doing this at all”. The two are not mutually exclusive. As the paper puts it:

. . .these results imply that substantial resources are invested in research and development across the industry into compounds that are ultimately simply not desired or cannot be progressed for other reasons (for example, agreed divestiture as part of a merger or acquisition). In addition, these results suggest that frequent strategy changes are a significant contributor to lack of research and development success.

You think? Maybe putting some numbers on this will hammer the point home to some of the remaining people who need to understand it. One can always hope. At any rate, when you analyze the compounds by their physicochemical properties, you find that pretty much all of them are within the accepted ranges. In other words, the lessons of all those earlier papers have been taken on board (and in many cases, were part of med-chem practice even before all the publications). It’s very hard to draw any conclusions about progression versus physical properties from this data set, because the physical properties just don’t vary all that much. The authors make a try at it, but admit that the error bars overlap, which means that I’m not even going to bother.
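For concreteness, here’s what those descriptors actually look like in practice. This is a minimal sketch using the open-source RDKit toolkit – my own illustration, not anything from the paper, with aspirin as a stand-in structure:

```python
# Minimal illustrative sketch (not from the paper): computing the standard
# physicochemical descriptors discussed here with the open-source RDKit.
from rdkit import Chem
from rdkit.Chem import Descriptors

def property_profile(smiles: str) -> dict:
    """Return rule-of-five-style descriptors for one compound."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return {
        "MW":    Descriptors.MolWt(mol),          # molecular weight
        "cLogP": Descriptors.MolLogP(mol),        # Crippen calculated logP
        "TPSA":  Descriptors.TPSA(mol),           # topological polar surface area
        "HBD":   Descriptors.NumHDonors(mol),     # hydrogen-bond donors
        "HBA":   Descriptors.NumHAcceptors(mol),  # hydrogen-bond acceptors
    }

print(property_profile("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin as a stand-in
```

Run something like that over a whole pipeline’s worth of structures and you have exactly the kind of property table this paper is built on.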
What if you take the set of compounds that were explicitly marked down as failing due to tox, and compare those to the others? No differences in molecular weight, no differences in cLogP, no differences in cLogD, and no differences in polar surface area. I mean no differences, really – it’s just solid overlap across the board. The authors are clearly uncomfortable with that conclusion, saying that “. . .these results appear inconsistent with previous publications linking these parameters with promiscuity and with in vivo toxicological outcomes. . .”, but I wonder if that’s because those previous publications were wrong. (And I note that one such previous publication has already come to conclusions like these). Looking at compounds that failed in Phase I due to explicit PK reasons showed no differences at all in these parameters. Comparing compounds that made it only to Phase I (and failed for any reason) versus the ones that made it to Phase II or beyond showed, just barely, a significant effect for cLogP, but no significant effect for cLogD, molecular weight, or PSA. And even that needs to be interpreted with caution:

. . .it is not sufficiently discriminatory to suggest that further control of lipophilicity would have a significant impact on success. Examination of how the probabilities of observing clinical safety failures change with calculated logP and calculated logD7.4 by logistic regression showed that there is no useful difference over the relevant ranges. . .
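To make that concrete: the check described there amounts to fitting a logistic regression of failure probability against a single property and asking whether the predicted probability actually moves across the relevant range. Here’s a rough sketch of that kind of analysis, using simulated stand-in numbers rather than the authors’ (proprietary) data:

```python
# Rough sketch with simulated data (not the authors' code or data set):
# does P(clinical safety failure) change usefully with cLogP?
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
clogp  = rng.normal(loc=3.0, scale=1.0, size=500)  # in-envelope property values
failed = rng.binomial(1, 0.25, size=500)           # 1 = failure; independent of property here

X = sm.add_constant(clogp)              # intercept + single predictor
fit = sm.Logit(failed, X).fit(disp=0)   # logistic regression
print(fit.params, fit.pvalues)          # slope near zero on data like these

grid = sm.add_constant(np.linspace(1.0, 5.0, 9))
print(fit.predict(grid))                # flat probability curve across the range
```

A flat curve over the range your compounds actually occupy – which is what the authors report – means that further tuning of that property buys you nothing.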

So, folks, if your compounds mostly fit within the envelope to start with (as these 812 did), you’re not doing yourself any good by tweaking physicochemical parameters any more. To me, it looks like the gains from that approach were realized early on, by trimming the fringe compounds in each category, and there’s not much left to be done. Those PowerPoint slides you have for the ongoing project, showing that you’ve moved a bit closer to the accepted middle ground of parameter space, and are therefore making progress? Waste of time. I mean that literally – a waste of time and effort, because the evidence is now in that things just don’t work that way. I’ll let the authors sum that up in their own words:

It was hoped that this substantially larger and more diverse data set (compared with previous studies of this type) could be used to identify meaningful correlations between physicochemical properties and compound attrition, particularly toxicity-based attrition. . .However, beyond reinforcing the already established general trends concerning factors such as lipophilicity (and that none too strongly – DBL), this did not prove generally to be the case.

Nope, as the data set gets larger and better curated, these conclusions start to disappear. That, to be sure, is (as mentioned above) partly because the more recent data sets tend to be made up of compounds that are already mostly within accepted ranges for these things, but we didn’t need umpteen years of upheaval to tell us that compounds that weigh 910 with logP values of 8 are less likely to be successful. Did we? Too many organizations made the understandable human mistake of thinking that changing drug candidate properties was some sort of sliding scale, that the more you moved toward the good parts, the better things got. Not so.
What comes out of this paper, then, is a realization that watching cLogP and PSA values can only take you so far, and that we’ve already squeezed everything out of such simple approaches that can be squeezed. Toxicology and pharmacokinetics are complex fields, and aren’t going to roll over so easily. It’s time for something new.

41 comments on “The End of Compound Property Optimization Is At Hand”

  1. MoBio says:

    I’m most interested in the causes of the ‘toxicology failures’. They make a note regarding ‘kinase non-selectivity’ but leave it at that. It would be helpful to have more data regarding the types of toxicity that were uncovered (presumably in the animal tox studies and not in vitro tox) – especially as this is apparently a large driver of the failures.

  2. Wavefunction says:

    Sounds like some necessary medicine.
    “So, folks, if your compounds mostly fit within the envelope to start with (as these 812 did), you’re not doing yourself any good by tweaking physicochemical parameters any more.”
    True; in fact, I would recommend occasionally ‘shocking’ your compounds out of that envelope into novel and uncharted areas, so that there’s a higher possibility of hitting more challenging targets like PPIs. Macrocycles and natural products, for instance, do exactly that.

  3. PharmaHeretic says:

    So who is really shocked that optimizing compounds for some mythic (and clever-sounding) parameters was basically ineffective? I mean, look at the structures of the drugs which were actually approved in the past and had the biggest therapeutic impacts. Antibiotics, anyone? Therapeutically important and innovative drugs with a MW of less than 200? I could go on, but you get the point.
    We now have to ask why these clever-sounding ideas began to exert so much influence over the process of compound development. Why did people never try to question them, even though using them did pretty much nothing to improve the success rate of drug discovery as measured by innovative and approvable drugs? And perhaps more tellingly, why did expensive and ultimately useless fashionable ideas such as HTS, heavy reliance on transgenic mouse disease models, proxy markers for disease progression etc. become the defining feature of pharma after the late 1980s?
    I chalk it up to the rise of “professional” managerialism: specifically, the almost total dominance of well-dressed and clever-sounding managerial types (Ivy League MBAs, lawyer types and other assorted CONmen) over the process of drug discovery and development. It comes down to greedy and clever morons trying to run what they don’t know or understand by using scientific-sounding “metrics”.
    Sadly, those conmen made a lot of money by running ever more audacious scams. However, the industry as a whole has been decimated by their imagism-, careerism- and financialism-driven scams. I cannot help but compare them to viruses that infect healthy host cells, multiply within them until they destroy them, and then move on to other nearby healthy cells to repeat the cycle – till they run out of new hosts.

  4. Am I Lloyd says:

    #3: The problem is that the incentive structure is all wrong, rewarding short-termism over true long-term innovation. If the system rewards those greedy conmen and morons, can you blame them for taking advantage of it and lording it over the rest of us? Human nature is selfish and greedy – what we can do is change the incentive structure so that the right kind of greed prevails.

  5. Ano says:

    @3: did you read Derek’s post and the article???

  6. Pete says:

    It’s worth asking ourselves how predictive we should expect a small number of crude molecular descriptors to be of in vivo outcomes. You’ll find trends, and these can be massaged to make them look more interesting, but none of these descriptors is actually descriptive of specific molecular recognition. Given enough data, the most anemic of trends can acquire eye-wateringly good significance levels. Even the descriptors themselves are flawed: octanol/water logP doesn’t see hydrogen bond donors, and whether we count hydrogen bond acceptors or kid ourselves that PSA is fundamentally different from counting hydrogen bond acceptors, we’re not actually capturing the hydrogen bond acceptor potential of molecules.
    It was good to see the authors acknowledge flaws in earlier studies although I would still challenge what is said about one of those studies in its citation:
    “Leeson, P. D. & Springthorpe, B. The influence of drug-like concepts on decision-making in medicinal chemistry. Nat. Rev. Drug Discov. 6, 881–890 (2007). This paper shows, perhaps for the first time, a link between physicochemical properties, notably lipophilicity and molecular mass, and in vitro promiscuity.”
    given the acknowledged flaws in the data analysis and the fact that a similar study (Hopkins, Mason & Overington (2006) Can we rationally design promiscuous drugs? Curr Opin Struct Biol 16:127-136) published a year earlier came to the opposite view of the influence of molecular mass.

  7. PharmaHeretic says:

    @5
    Yes, I read it and also other previous ones on this blog about the whole concept of compound property optimization.
    A lot of the resource-wasting and otherwise disruptive fads in pharma (and other R&D-heavy sectors) are merely symptoms of an underlying disease. Let me put it this way: the current setup pays far more to those who pretend to do things than to those who actually achieve them. Consequently, those sectors are rapidly dominated by people who can peddle, or pretend to master, the most clever-sounding BS – regardless of whether they can deliver actual products.
    As you might have guessed, those who don’t run these scams will often follow them or pretend to agree with them for fear of losing their jobs.
    Here is an example from another R&D-heavy sector: today, both the American government and corporations buy 1970s-era Russian RD-180 or NK-33-derived rocket engines to launch satellites and spacecraft. They do so despite having spent far more money and resources on developing rocket engines themselves. Ever wonder why all those expensive engineering simulations and “cutting edge” prototypes have led nowhere, or never produced anything better than 1970s-era Russian rocket engines?
    Also, the main upper-stage cryogenic engine (RL10) still used on almost every single “American” rocket launcher was developed in the late 1950s-early 1960s. Does that sound vaguely familiar?

  8. Cassie says:

    My observation after 20 years in the biz is that models and metrics are
    1) much easier to develop than drugs, and
    2) just as rewarding to the developers’ careers and finances.
    Is it any wonder we’re run by the Moneyball guys now?

  9. Anonymous says:

    @9, @10,
    I’m totally with you. What a lost decade…
    Not sure if today is a good day or a bad day. I guess it depends on what happens next.
    Problem is, those who tried to oppose the nonsense are probably out of a job.

  10. Ano says:

    Not sure why so many messages got posted – the German network isn’t usually this slow.
    @PharmaHeretic: My earlier message was too short, but the point made by Derek and the authors is that once you are in this chemical space, there is no difference between good and bad compounds (good = drugs, bad = failures). It doesn’t say, however, that being outside it is better/equal/worse. Or can you share with us the recent blockbusters you have on the market that break some of the rules? (Antibodies are similar to natural products; our bodies make them.)
    The people who climb by telling BS, as you say, are also the ones like Pete who go around the world getting paid to give ranting talks (while claiming to do research). It takes all kinds in this world. With all the money in the US for biotech, if you are so great, I’m pretty sure you can start your own biotech and show the world how it should be done.
    What’s next, a car analogy?

  11. MarkySparky says:

    @7 PharmaHeretic
    SpaceX Merlin engines are a clean-sheet design compared with the Russian ones. Marginal improvement of upper-stage engines is extremely high-cost/low-reward, and the trend has been to use proven tech rather than reinvent the wheel – not that there aren’t more efficient/capable engines in development…
    Not disagreeing with your overall point re: groupthink and short-termism. The state of US spacelaunch prior to ~2009 was a pretty ugly nest of government complacency/corruption and corporate myopia. It seems to be slowly changing for the better.

  12. PharmaHeretic says:

    @MarkySparky,
    The SpaceX Merlin series is not a clean-sheet design. It kinda went like this:
    Rocketdyne H-1 (1963) -> RS-27 (1974) -> newer materials + simplification -> Merlin 1A-D (2000s).

  13. signal to noise says:

    The findings of this study were to be expected. As much as some proponents of Ro5 and its spinoffs would like to think that simple metrics like these could predict outcomes like preclinical or clinical toxicity, they fail to acknowledge the many uncertainties of biology, especially in vivo. For example, Figure 6 is a complete wash, and a clear definition of “promiscuity” is also lacking. Ro5 et al. may be helpful in addressing (oral) bioavailability, but that seems about it.
    Predicting compound attrition (pre-clinical or clinical failure) is a whole other ball game.
    Since this coalition of competitors had to mask their data to work with it collectively, it would be a nice gesture to the community at large if they could make their (masked) data set available. Further, independent data analysis would be healthy.

  14. Pete says:

    Signal to noise (#13): promiscuity of a compound in NRDD 6:881–890 (2007) is defined as the number of assays in which that compound shows >=30% inhibition at 10 micromolar. It is unlikely that this will have any general physiological relevance. It should be remembered that the free (unbound) drug concentration (which provides the driving force for drug action) varies with (a) time, (b) location, and (c) from drug to drug. Also, it is not clear exactly how the lipophilicity values in NRDD 6:881–890 (2007) are actually calculated.

  15. MarkySparky says:

    @12 PharmaHeretic
    I think you are under-weighting the importance of design/materials/process/control changes in the Merlin versus older engines. They do not share much beyond their combustion cycle and propellants.

  16. MedChem says:

    I hope this paper will be the last straw that ends the uncreative and repressive regimes of so many professional yes-man-plus-con-man middle managers.

  17. 4merchem says:

    @14: I believe that the % of assays where a compound shows >=30% inhibition is related to physchem properties. Most hits are just noise and nuisance, which might be related to solubility, to lipophilicity leading to stickiness on plates, and so on. I doubt very much that any such relationship exists for confirmed stoichiometric binders alone. I’m shocked that for once I agree with you that this is hardly physiologically relevant (it’s okay, though, because it’s not for the reason you gave…) 🙂

  18. Anonymous says:

    Yet more studies confirming the need to go back to phenotypic screening.

  19. Pete says:

    4merchem (@17): I am well aware that >=30% inhibition at 10 micromolar may not be due to ‘real’ activity. I believe that my point is valid whether or not the activity is real.

  20. steve says:

    I’m just a lowly biologist but maybe the final compound optimization should be done using animal models. In vivo veritas and all that…

  21. denton says:

    If we assume that a good proportion (but not all) of toxic endpoints are specific target- or mechanism-based, then physical properties would not help predict them. It is a rare project that seeks to identify the molecular target(s) behind a tox endpoint. Until this happens, we will not have the data to design safer compounds.

  22. Wavefunction says:

    Speaking of ‘rules’ for compound optimization, here’s a paper describing another ‘rule of 3’ (link in my handle), this time for phenotypic screening. A lot of the material in it seems rather obvious and well-known; in addition it’s always dangerous to distill rules for anything without considering the exceptions. And while the content should speak for itself it’s also worth considering where this paper comes from. Fool me twice…

  23. Mike Waring says:

    Great to see this has sparked some interest. I think a lot of the comments take things too far. The message we tried to give, and I honestly believe, is that controlling logD / MWt is very helpful to get to candidate drug quality compounds, optimising simple things like ADME properties. The fact that all the compounds in this set lie within the acceptable range supports this clearly – lots of more lipophilic ones will have failed long before getting to this point, as we all know. It definitely doesn’t say this is useless, and I think that in some cases during optimisation, further tweaking of logD can be beneficial. We are just showing that, having done so, you shouldn’t expect that to make a difference to in vivo tox outcomes etc. Many of us didn’t think it would and so are not surprised; others of us thought it might and had reasonable grounds to think so. We tried to collect the best data set we could and tried to present it in as open a way as possible so this could be assessed in a data-driven manner.

  24. Pete says:

    Funny that you should bring up Ro3, Ash, because the FBDD Ro3 says a lot about drug discovery culture. As introduced, the FBDD rule didn’t actually say how the hydrogen bond acceptors should be defined. That didn’t stop library vendors touting the Ro3-ness of their offerings, and people continued to base their library design on Ro3. Using the Ro5 definitions of HB acceptors is actually an excellent way to eliminate carboxylate mimics (e.g. tetrazole, acylsulfonamide) from the selection process. I’ve linked a blog post on the topic from 4 years ago as the URL for this comment.

  25. Anonymous says:

    Hmm, I wonder where these 812 compounds came from? HTS?

  26. Andre says:

    Most drug candidate optimization relies on in vitro testing using purified proteins. In other words, optimization happens in an artificial situation that does not reflect the in vivo situation, and toxicity is therefore not picked up early enough. In my opinion, phenotypic drug screening and candidate optimization using organoid cultures or whole-animal models (zebrafish, Xenopus) should be considered as alternatives. Drug candidate optimization should not be guided solely by artificial predictions of what makes up the ideal drug; it should be limited only by the chemical reactions that are possible around a given drug candidate or scaffold. If I remember correctly, Paclitaxel would not have made the cut, as it is too large to fit the rule-of-five criteria… It was, however, identified as being cytotoxic for cultured cells on the basis of a phenotypic drug screen performed in 1964…

  27. Zebras and fish says:

    @26 ironic that your solution to testing in artificial situations is to use fish.

  28. Mike says:

    This also doesn’t have anything to do with phenotypic screening / in vivo optimisation or whatever. All of these compounds will have shown activity in phenotypic assays and in vivo – who would take something to the clinic that didn’t? If we had phenotypic tox assays that were predictive then that might be different. @21 denton makes a good point re tox being down to specific pharmacology and elucidating tox mechanisms – a point we did make in the paper.

  29. Andre says:

    @27: Unfortunately, mice and men do not fit into 96-well plates for screening chemical libraries or for testing dozens of drug candidate derivatives. You’ll have to compromise somewhere….. The suggested phenotypic screening approach is not intrinsically more flawed than the currently favored paradigm of target-based drug discovery.

  30. Pavel says:

    @23
    Totally agree… some parameters are good for certain purposes.
    However, people like things black or white (simply boxed, in this case into some parameter), so I don’t wonder that you didn’t satisfy the readers.
    The realization, if it happens, that between black and white lies an infinite field of grey takes some time.

  31. Rule Schmule says:

    The problem with calculated properties is right there in the name: they are calculated, and generally an overly simplified representation of the molecule under study. The trick of using them in optimisation is to work out, for your series, what measured property they are “predictive” of, in order to maximise the chances of success with any given design. Calculated properties for large data sets are never going to track with “success”, because of structural diversity that is not well represented by clogP, TPSA etc. Surely no rational person would use calculated properties over measured values of solubility, permeability, clearance, selectivity and, yes, lipophilicity to gauge the quality of their compounds or to show progress. And of course a compound with perfect properties can fail for endless reasons.

  33. Fred says:

    Ok, so:
    Rule of 5: fairly useless.
    Combo chem: fairly useless
    genomics: maybe in another 10 years
    Outsourcing overseas: worse than useless
    Mergers and acquisitions: WAY worse than useless.

  34. JC says:

    Fred, I wouldn’t say combi chem was all that useless. It brought the investor money in when we had the Gilsons running for an hour to dispense the building blocks.

  35. Anonymous says:

    @23 M Waring: It is sound practice, when you report summary statistics, to include the value of N, the total number of data points for each analysis. This ensures all observations are included. (Ref: Avoiding Careless Errors: Know Your Data, Kristin L. Sainani, PhD)

  36. Nick K says:

    #33, 34: Cynical but completely true, unfortunately.

  37. Pete says:

    Hi Mike (#23), I believe that the data analysis in your article is honest, that you’ve asked some reasonable questions and reported your findings clearly. My initial comment (#6) was to highlight the need to distance oneself from previously published studies in which data analysis is flawed. In the data analysis field, failure to maintain an appropriate distance from flawed analysis can taint one’s own study (although I don’t think this is a real issue here). The trends that you’ve observed are not strong but that’s perfectly OK because you’ve asked reasonable questions using a dataset that most (all?) of us following this discussion would consider to be extremely relevant to attrition.
    I would still challenge your assertion that, “The message we tried to give, and I honestly believe, is that controlling logD / MWt is very helpful to get to candidate drug quality compounds, optimising simple things like ADME properties. The fact that all the compounds in this set lie within the acceptable range supports this clearly – lots of more lipophilic ones will have failed long before getting to this point, as we all know”. This is not to say that I believe logP/D and MW are unimportant, just that I don’t think your analysis supports the assertion. I also believe that trends within structural series are often stronger than in the large, structurally diverse datasets that feature commonly in analyses of drug-likeness, attrition etc. Hydrogen bond donors/acceptors and ionizable groups are often relatively conserved within series, which minimizes problems caused by differences in hydrogen bond donor/acceptor strength and pKa. This means that logD (measured) or logP (predicted) can often perform well as a predictor of solubility, permeability etc.

  38. Mike says:

    @35 I agree and we have done so.

  40. Wavefunction says:

    #24: Good point, Pete. I would say that another problem with many calculated properties like HBAs and HBDs is that they are conformation-dependent and therefore not easily captured by static structures.

  41. ana says:

    The program I’m on right now shows a pretty clear trend between calculated logP and permeability, and I’ve definitely seen similar trends in microsomal stability vs. logP. Property optimization can definitely become overly prescriptive, but I believe there’s some value to looking at these properties. At least now med chemists have in hand a lot of tools for predicting them easily that weren’t around 10 years ago, when these questions weren’t even being asked.

  42. Dan Severance says:

    There was a time (and still is in some places) when potent = drug, and adding more and more grease yielded higher and higher potency. Unfortunately, that potency came mostly from desolvation effects – gaining free energy by removing the compound from solvent. Unfortunately, that 10x increase in potency also led to a 10x increase in activity at many deleterious targets (ion channels, CYPs, etc.), which are tested at much higher concentrations, and thus limiting solubility can mask those issues.
    Thus, getting into some REASONABLE range was not a fad. Though admittedly some targets will require forays outside of those semi-arbitrary limits, the idea that one’s entire portfolio should be populated with such compounds is NOT a recipe for success.
    At one company I worked at, the chemists knew that target X required large molecular weight, etc., with multiple patents and one compound showing some efficacy in the clinic before being killed (the partner bought another compound farther along in the clinic). In chemistry meetings (I am a modeler) I noticed that the SAR was flat even with very large changes. I suggested that the synthetic intermediate be tested in the assay, and it was equipotent. Suddenly the series went from an average MW of 550-600 to 320-400, leading to a final clinical candidate with MW = 330. Incredibly potent, soluble, and it made it to Phase III, where I don’t know the final reason for stopping it.
    Moral of the story – as long as you are going outside of the range, try peeking BELOW it as well as above once in a while!

  43. simpl says:

    Ano’s comment reminded me that one argument used to justify biologicals in the 80s was that a biological might be expensive and unstable, but if successful could lead to cheaper, smaller molecules. The financial side of this has panned out very differently so far, justifying biotech prices for small molecules. But are there any examples of a natural product being the lead for a chemical successor?

Comments are closed.