
Drug Assays

Too Many Metrics

Here’s a new paper from Michael Shultz of Novartis, who is trying to cut through the mass of metrics for new compounds. I cannot resist quoting his opening paragraph, but I do not have a spare two hours to add all the links:

Approximately 15 years ago Lipinski et al. published their seminal work linking molecular properties with oral absorption.1 Since this ‘Big Bang’ of physical property analysis, the universe of parameters, rules and optimization metrics has been expanding at an ever increasing rate (Figure 1).2 Relationships with molecular weight (MW), lipophilicity,3 and 4 ionization state,5 pKa, molecular volume and total polar surface area have been examined.6 Aromatic rings,7 and 8 oxygen atoms, nitrogen atoms, sp3 carbon atoms,9 chiral atoms,9 non-hydrogen atoms, aromatic versus non-hydrogen atoms,10 aromatic atoms minus sp3 carbon atoms,6 and 11 hydrogen bond donors, hydrogen bond acceptors and rotatable bonds12 have been counted and correlated.13 In addition to the rules of five came the rules of 4/40014 and 3/75.15 Medicinal chemists can choose from composite parameters (or efficiency indices) such as ligand efficiency (LE),16 group efficiency (GE), lipophilic efficiency/lipophilic ligand efficiency (LipE17/LLE),18 ligand lipophilicity index (LLEAT),19 ligand efficiency dependent lipophilicity (LELP), fit quality scaled ligand efficiency (LE_scale),20 percentage efficiency index (PEI),21 size independent ligand efficiency (SILE), binding efficiency index (BEI) or surface binding efficiency index (SEI)22 and composite parameters are even now being used in combination.23 Efficiency of binding kinetics has recently been introduced.24 A new trend of anthropomorphizing molecular optimization has occurred as molecular ‘addictions’ and ‘obesity’ have been identified.25 To help medicinal chemists there are guideposts,21 rules of thumb,14 and 26 a property forecast index,27 graphical representations of properties28 such as efficiency maps, atlases,29 ChemGPS,30 traffic lights,31 radar plots,32 Craig plots,33 flower plots,34 egg plots,35 time series plots,36 oral bioavailability graphs,37 face diagrams,28 spider diagrams,38 the golden triangle39 and the golden ratio.40

He must have enjoyed writing that one, if not tracking down all the references. This paper is valuable right from the start just for having gathered all this into one place! But as you read on, you find that he’s not too happy with many of these metrics – and since there’s no way that they can all be equally correct, or equally useful, he sets himself the task of figuring out which ones we can discard. The last reference in the quoted section below is to the famous “Can a biologist fix a radio?” paper:

While individual composite parameters have been developed to address specific relationships between properties and structural features (e.g. solubility and aromatic ring count) the benefit may be outweighed by the contradictions that arise from utilizing several indices at once or the complexity of adopting and abandoning various metrics depending on the stage of molecular optimization. The average medicinal chemist can be overwhelmed by the ‘analysis fatigue’ that this plethora of new and contradictory tools, rules and visualizations now provide, especially when combined with the increasing number of safety, off-target, physicochemical property and ADME data acquired during optimization efforts. Decision making is impeded when evaluating information that is wrong or excessive and thus should be limited to the absolute minimum and most relevant available.
As Lazebnik described, sometimes the more facts we learn, the less we understand.

And he discards quite a few. All the equations that involve taking the log of potency and dividing by the heavy atom count (HAC), etc., are playing rather loose with the math:

To be valid, LE must remain constant for each heavy atom that changes potency 10-fold. This is not the case as a 15 HAC compound with a pIC50 of 3 does not have the same LE as a 16 HAC compound with a pIC50 of 4 (ΔpIC50 = 1, ΔHAC = 1, ΔLE = 0.07). A 10-fold change in potency per heavy atom does not result in constant LE as defined by Hopkins, nor will it result in constant SILE, FQ or LLEAT values. These metrics do not mathematically normalize size or potency because they violate the quotient rule of logarithms. To obey this rule and be a valid mathematical function, HAC would be subtracted from pIC50 and rendered independent of size and reference potency.

Note that he’s not recommending that last operation as a guideline, either. Another conceptual problem with plain heavy atom counting is that it treats all atoms the same, but that’s clearly an oversimplification. But dividing by some form of molecular weight is an oversimplification, too: a nitrogen differs from an oxygen by a lot more than that 1 mass unit. (This topic came up here a little while back). But oversimplified or not – heck, mathematically valid or not – the question is whether these things help out enough when used as metrics in the real world. And Shultz would argue that they don’t. Keeping LE the same (or even raising it) is supposed to be the sign of a successful optimization, but in practice, LE usually degrades. His take on this is that “Since lower ligand efficiency is indicative of both higher and lower probabilities of success (two mutually exclusive states) LE can be invalidated by not correlating with successful optimization.”
I think that’s too much of a leap – because successful drug programs have had their LE go down during the process, that doesn’t mean that this was a necessary condition, or that they should have been aiming for that. Perhaps things would have been even better if they hadn’t gone down (although I realize that arguing from things that didn’t happen doesn’t have much logical force). Try looking at it this way: a large number of successful drug programs have had someone high up in management trying to kill them along the way, as have (obviously) most of the unsuccessful ones. That would mean that upper management decisions to kill a program are also indicative of both higher and lower probabilities of success, and can thus be invalidated, too. Actually, he might be on to something there.
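Shultz's arithmetic on LE, by the way, is easy to check for yourself. Here's a minimal sketch, assuming the standard Hopkins definition (LE ≈ 1.37·pIC50/HAC, the 1.37 being the usual kcal/mol-per-log-unit conversion at room temperature):

```python
# Ligand efficiency as commonly defined (Hopkins): LE = 1.37 * pIC50 / HAC,
# where HAC is the heavy (non-hydrogen) atom count and 1.37 kcal/mol is
# roughly the free energy of one log unit of potency at room temperature.
def ligand_efficiency(pic50, hac):
    return 1.37 * pic50 / hac

# Shultz's example: one extra heavy atom that buys a 10-fold potency gain
# does NOT leave LE constant, because dividing a log by atom count
# violates the quotient rule of logarithms.
le_a = ligand_efficiency(3.0, 15)  # 15 heavy atoms, pIC50 = 3
le_b = ligand_efficiency(4.0, 16)  # 16 heavy atoms, pIC50 = 4

print(round(le_a, 2), round(le_b, 2), round(le_b - le_a, 2))
```

Run it and ΔLE comes out to about 0.07, just as in the quoted passage: a one-atom, ten-fold improvement still moves the metric.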
Shultz, though, finds that he’s not able to invalidate LipE (or LLE), variously known as ligand-lipophilicity efficiency or lipophilic ligand efficiency. That’s p(IC50) – logP, which at least follows the way that logarithms of quotients are supposed to work. And it has also been shown to improve during known drug optimization campaigns. The paper has a thought experiment on some hypothetical compounds, as well as some data from a tankyrase inhibitor series, that seem to show LipE behaving more rationally than other metrics (which sometimes start pointing in opposite directions).
I found the chart below to be quite interesting. It uses the cLogP data from Paul Leeson and Brian Springthorpe’s original LLE paper (linked in the above paragraph) to show what change in potency you would expect when you change a hydrogen in your molecule to one of the groups shown if you’re going to maintain a constant LipE value. So while hydrophobic groups tend to make things more potent, this puts a number on it. A t-butyl, for example, should make things about 50-fold more potent if it’s going to pull its weight as a ball of grease. (Note that we’re not talking about effects on PK and tox here, just sheer potency – if you play this game, though, you’d better be prepared to keep an eye on things downstream).
LipE chart
On the other end of the scale, a methoxy should, in theory, cut your potency roughly in half. If it doesn’t, that’s a good sign. A morpholine should be three or four times worse, and if it isn’t, then it’s found something at least marginally useful to do in your compound’s binding site. What we’re measuring here is the partitioning between your compound wanting to be in solution, and wanting to be in the binding site. More specifically, since logP is in the equation, we’re looking at the difference in the partitioning of your compound between octanol and water, versus its partitioning between the target protein and water. I think we can all agree that we’d rather have compounds that bind because they like something about the active site, rather than just fleeing the solution phase.
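The chart's logic is just the LipE definition run in reverse: at constant LipE = pIC50 – logP, a substituent's expected potency change is ten to the power of its clogP increment. A minimal sketch (the increments below are illustrative assumptions chosen to match the fold-changes mentioned above, not numbers taken from the paper):

```python
# At constant LipE (pIC50 - logP), any change in clogP must be matched
# by an equal change in pIC50, i.e. a 10**delta fold change in potency.
def expected_fold_change(delta_clogp):
    return 10 ** delta_clogp

# Rough, hypothetical clogP increments for H -> group swaps; these are
# assumptions for this sketch, picked to reproduce the fold-changes
# described in the post (t-Bu ~50x, OMe ~0.5x, morpholine ~0.3x).
substituents = {"t-Bu": 1.7, "OMe": -0.3, "morpholine": -0.55}

for name, delta in substituents.items():
    fold = expected_fold_change(delta)
    print(f"{name}: ~{fold:.1f}x potency needed at constant LipE")
```

A group that beats its expected fold-change has earned a better-than-greasy interaction with the binding site; one that falls short is just along for the lipophilic ride.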
So in light of this paper, I’m rethinking my ligand-efficiency metrics. I’m still grappling with how LipE performs down at the fragment end of the molecular weight scale, and would be glad to hear thoughts on that. But Shultz’s paper, if it can get us to toss out a lot of the proposed metrics already in the literature, will have done us all a service.

39 comments on “Too Many Metrics”

  1. Anonymous says:

    Somebody forgot to close an italic tag?
    Very interesting article nonetheless.

  2. Pete says:

    The problem with ligand efficiency is that it is thermodynamically invalid. The choice of 1M as the standard concentration is a convention, and thermodynamics tells us that we can use any concentration that we want as the standard concentration. This means that any insights into compound quality (compound A is better than compound B) must be invariant with respect to the choice of standard concentration. There is a paragraph in our recent article on prediction of alkane/water partition coefficients (JCAMD 27:389-402) in which this point is discussed in more detail (the doi is 10(dot)1007(slash)10822-013-9655-5).
    I am not a great fan of metrics and my challenge to those who promote them is that measurability is not enough. The metrics also have to be relevant. A big problem is that drug discovery management (who would much prefer to be called Leaders) tend to reward the metricators who merely need to claim a correlation in order to be seen to be predicting.

  3. Anonymous says:

    Great, with all these metrics we can invest all our time to “predict” that the drugs we already know to be active, will be active, instead of doing real experiments to learn anything new. Welcome to Big Data.

  4. Industry Guy says:

    @3 Nail on the head there, my friend. I really like PK predictions killing compounds before they ever actually get PK. One red box is a no go….

  5. Anonymous says:

    An observed correlation is no more than a hypothesis, and may even be an illusion based on spurious patterns in the noise. It still needs to be scientifically tested by further experiment.

  6. Andrés says:

    Dear Derek,
    It is very easy to remove the citations. Here we go:
    Approximately 15 years ago Lipinski et al. published their seminal work linking molecular properties with oral absorption. Since this ‘Big Bang’ of physical property analysis, the universe of parameters, rules and optimization metrics has been expanding at an ever increasing rate (Figure 1). Relationships with molecular weight (MW), lipophilicity, and ionization state, pKa, molecular volume and total polar surface area have been examined. Aromatic rings, and oxygen atoms, nitrogen atoms, sp3 carbon atoms, chiral atoms, non-hydrogen atoms, aromatic versus non-hydrogen atoms, aromatic atoms minus sp3 carbon atoms, and hydrogen bond donors, hydrogen bond acceptors and rotatable bonds have been counted and correlated. In addition to the rules of five came the rules of 4/400 and 3/75. Medicinal chemists can choose from composite parameters (or efficiency indices) such as ligand efficiency (LE),group efficiency (GE), lipophilic efficiency/lipophilic ligand efficiency (LipE/LLE), ligand lipophilicity index (LLEAT), ligand efficiency dependent lipophilicity (LELP), fit quality scaled ligand efficiency (LE_scale), percentage efficiency index (PEI), size independent ligand efficiency (SILE), binding efficiency index (BEI) or surface binding efficiency index (SEI) and composite parameters are even now being used in combination. Efficiency of binding kinetics has recently been introduced. A new trend of anthropomorphizing molecular optimization has occurred as molecular ‘addictions’ and ‘obesity’ have been identified. To help medicinal chemists there are guideposts,rules of thumb, and a property forecast index, graphical representations of properties such as efficiency maps, atlases,ChemGPS, traffic lights, radar plots, Craig plots, flower plots, egg plots, time series plots, oral bioavailability graphs, face diagrams, spider diagrams, the golden triangle and the golden ratio.

  7. Derek Lowe says:

    #6 – what I meant was that I didn’t have time to put links in to every one of the 40 references. I actually did want to keep the footnotes, though, just to show how many things he cited (!)

  8. schinderhannes says:

    Once again a great post based on a great paper, but….
    In the second half you start to wander off, praising LLE. That shocked me!
    I hate this concept!
    Just because logs have no units doesn´t mean one can play havoc with them. That ain´t physics; that is playing stupid games to me. By simply subtracting them from each other you have defined a one-to-one relationship. If you had any real rationale behind it you would want to calculate a slope.
    True, there is a correlation here, no one should be surprised, but is one log unit of p(IC50) worth exactly one log unit of logP? That would be a darn coincidence.
    The formula should be p(IC50) – a logP,
    with a being the slope in the non-log world. Mathematically it would be just as sound to come up with the new SchinderLigandEfficiency, which I define as p(IC50) – 2 logP; prolly works just fine when you rank your compounds by it.
    The fact that there has been no effort to find the optimal coefficient, out of pure laziness (or maybe because it is different for every target?), is a real bummer to me.
    One last general comment. All these numbers and the formulas fed with ´em have never been proven to be any better than the gut feeling of a med-chem veteran. I´d say these models only start being useful if they out-perform his intuition.
    Rant off…..

  9. Anonymous says:

    True story based on mis-use of metrics: I previously started a biotech based on peptidomimetic drugs for AD. The compounds were shown to be completely resistant to proteolysis, completely soluble in both water and chloroform, with a good LogP profile, and whizzed through cell membranes in Caco-2 assays. But the experienced (ex-Pfizer) CEO killed the project on the basis that the compounds did not obey Lipinski’s rules. He even published a report saying that the compounds were not bioavailable, without ever testing them in vivo, and the company eventually went into liquidation. Years later I met Chris Lipinski, and explained what happened. He said the CEO was an idiot. I tend to agree.

  10. watcher says:

    Even Lipinski spoke of limits, estimations, and inappropriateness of his rule of 5. But many across the industry have used them as gospel, if not a given law of medicinal science. Yet many Med Chemists and upper managers want & need reference points to guide them in approaches toward making new drugs, not wanting to follow the biochemical or biological feedback. Know which I’d prefer to believe….mol wt above 500 or oral activity in an animal model……

  11. Rant-time says:

    I totally agree with @8 about LLE being nonsensical, for the same reason: pKd and logP aren’t on the same scale. It makes no more sense than log(apples) – log(oranges).
    Note I said pKd, NOT pIC50, though, which loads of people seem to forget, leading them to compare things that should never be compared, such as totally different assays run against totally different targets. That’s what really gets on my goat.

  12. Anonymous says:

    People inherently are really bad at risk assessment–including many trained in science. Medicinal chemists in particular tend to rely on various correlations and so-called gut feelings in making predictions on how “drug like” a molecule will be. We tend to emphasize the rare occasions such predictions coincide with actual outcome, and discount the majority occasions that the predictions fell short.
    A great baseball slugger can only make a hit approx. 3 out of 10 times of at-bats. What do you think a great drug hunter’s batting average is like? 1 out of 100, 500…1000 perhaps?
    What are these drug hunter super stars doing differently from the rest of us mere mortals?

  13. weirdo says:

    My all-time favorite is the “golden ratio” paper. It’s sheer brilliance. I was hoping it had been published on April 1, but alas . . .
    Sons of Fibonacci, unite!

  14. The Aqueous Layer says:

    Thankfully we have SpotFire to clear everything up…

  15. w says:

    Someone tell me, since when is an LE measure a substitute for thinking?

  16. Derek Lowe says:

    #13, I remember when that Golden Ratio paper came out. I read it, with increasing puzzlement, and thought that since it had been reviewed and accepted by Drug Discovery Today that there must be something more to it than that. So I looked it over again after a couple of days, and no, I didn’t extract anything more that time. My best guess is that whoever vetted it thought it must be something deep. Mud puddles can look deep, too, if you haven’t seen one before.

  17. weirdo says:

    ” thought that since it had been reviewed and accepted by Drug Discovery Today that there must be something more to it than that”
    That was your mistake!

  18. A very thoughtful post!

  19. Anonymous says:

    Remember the clip in Disney’s flick “Dumbo” where the crow plucks a tail feather and gives it to Dumbo saying it is the magic feather that he needs to hold in his trunk so he can fly? Well, the Ro5 was our first magic feather of drug invention (or perhaps it was molecular modeling or combichem or…). Unfortunately the drug world does not work that way. Lithium carbonate, Botox, halothane, hydroxyurea, Velcade, Symmetrel, metformin, Trisenox, nitroglycerin, cisplatin, glucose, Fosamax, cyclosporine, valproic acid, methyldopa, levodopa, Byetta, auranofin, xenon, Tecfidera, most natural product drugs… are all useful agents for medicating people. The discovery of these drugs did not require such magical feathers, and in fact today’s magic feathers might have been an impediment to their discovery.
    I grew up in the era of in vivo screening. My first successful lead candidate was identified by screening in a modified Irwin safety screen and a series of psychiatric animal model screens. I knew my first compound was active because it hit in the conditioned avoidance screen. I knew it was safe enough because it passed the Irwin screening benchmarks, sort of. I knew my lead was orally available and CNS available from day one. Because my boss was an alcoholic and not paying attention, I was able to spend one man-year on my own optimizing this lead using in vivo screening results as a guide. When my drunk boss was moved into a lateral abyss, I had to stop working on my analog series because my new boss thought I was his PhD technician. He subsequently directed me to work with the nifty synthetic intermediate, MPTP, and make his genius-originated inactive compounds.
    Three years later a biologist identified the novel MoA of my original compounds, and my compounds became hot with one of my in vivo optimized analogs being named a development candidate. Fortunately I got my modest bonus for making that candidate four years earlier, but before the animal toxicology killed my compound. I learned an extremely valuable lesson when my highly mouse-safe drug candidate dramatically croaked the toxicology rats at a very low dose. It seems rats are not just big mice. That truism makes me way too cynical about the application of simple magical feathers to drug discovery.

  20. sgcox says:

    “Because my boss was an alcoholic and not paying attention, I was able to spend one man-year on my own optimizing this lead using in vivo screening results as a guide.”
    Finally we have a guide how to solve managerial crisis killing innovation in big pharma raised many times on this blog !
    Yes, I am kidding but not by much.

  21. Pete says:

    At least LLE does have a simple thermodynamic interpretation (at least for predominantly neutral compounds that bind in their neutral form), in that it can be thought of as a binding constant where the unbound ligand is in octanol rather than water. It can be helpful to think of efficiency metrics as affinity or potency that is either scaled (e.g. by molecular size) or offset (e.g. by logP). When you scale, you’re assuming that the relevant line goes through the origin, which corresponds (when using affinity) to Kd equal to the standard concentration. Offsetting, in contrast (as correctly noted by schinderhannes #8), means that you’re assuming unit slope for the response of potency or affinity to lipophilicity. At the risk of opening a huge can of worms, there is also the question of whether to use logP or logD. I’ve linked a blog post on some of the voodoo thermodynamics of efficiency metrics to my URL for this comment.
    I have a couple of general gripes about efficiency metrics. Firstly, it is often (usually?) impossible to tell exactly what has been used to calculate a particular efficiency metric (logP, ClogP or logD). If you’re not convinced, perhaps take a look at how LLE was originally ‘defined’. Also there is the question of units, and I am always amused when people (incorrectly) convert IC50 to deltaG (perhaps to make biological activity more ‘physical’) before discarding the units when the results are presented.

  22. Anonymous says:

    The more variables (molecular properties) you explore, the more correlations between those variables you will find (the number of possible pairings grows roughly with the square of the variable count), but the proportion of those correlations which are meaningful and reproducible will diminish.
    Nassim Taleb explains the problem with this in Big Data, and it is very relevant here.
    For example, if you plot 50 different ligand properties against 50 different target protein properties, you will probably find a good correlation in at least one of those 2,500 relationships. But does it mean anything? Of course not.

  23. Robur says:

    Got to admire Michael Shultz for pointing out how very light and delicate is the cloth of the Emperor’s new clothes.

  24. anon says:

    #9: …the experienced (ex-Pfizer) CEO…
    (in my “Mythbusters” voice)
    “Now THERE’S yer problum”

  25. Babylon says:

    Reflecting the tone of other comments, my worry also lies in how these properties are being used within Pharma.
    Too often it seems that the only choice with them is between true-believer or heretic.
    In team presentations, I see an ever-growing list of tabulated values (color-coded to highlight progress) accompanying every compound – perhaps to demonstrate adherence, certainly to mitigate personal risk should the compound disappoint.
    Yet someone still asks, “haven’t you calculated the…. (insert name of most recently published metric)?”
    Too often they end up being used without full understanding or context – reduced to an opportunity for point-scoring rather than insight.
    etc etc etc etc
    Ultimately they may be revealing more about the Industry than the molecules it aims to design?

  26. jgault says:

    Thank god someone said it,
    “I said pKd, NOT pIC50, though, which loads of people seem to forget, leading them to compare things that should never be compared, such as totally different assays run against totally different targets.”
    If I have to sit in another meeting and explain to chemists that no metric that uses an IC50 can be used as a criterion, I think my head is going to explode (BTW, I just gave away who jgault is to my entire department). The lack of any fundamental understanding of the relationship between binding energies and biological outcome measures is appalling. We need to get back to a deeper understanding of the complexities of the systems we are trying to study and stop trying to simplify our work so the dim and powerful can understand it. It’s going to be a hard road and there will be many casualties, but more than just our industry depends on it.

  27. annon2 says:

    Everyone was wanting short-cut ways of filtering the millions, no, billions of new compounds that were certain to emerge from combinatorial chemistry. But even as the fad of one began to fade, the motivation for the other did not. Everyone who could do a plot or make a calculation wanted their own set of rules that could add to or supersede the rule of 5. I’m sure Lipinski would find it all rather humorous if asked.

  28. TX raven says:

    Wait, there is more!!!
    PFE colleagues just introduced the Lipophilic Metabolism Efficiency, which they call LipMetE. You may see it with your own eyes at
    Now we really got something here! 🙂

  29. No Paralysis by Analysis says:

    As an ex-medicinal chemist now working in chemical development, but who “dabbled” in the combinatorial field at the time when Lipinski’s “Rule of Five” paper came out, this discussion brings back some interesting memories. Instead of initially impacting our medicinal chemistry efforts, the Lipinski paper was used to revamp our program for combinatorial library design and synthesis. In agreement with the many comments in this blog, these efforts were certainly needed at that time.
    In any event, I believe that the real issue here is not what is the most appropriate metric or metrics to apply, but the potential for “paralysis by analysis” (or worse yet, throwing the baby out with the bath water) by blindly applying any or all of these metrics to any drug discovery program. The example of the “experienced” Pfizer CEO story in killing the AD peptidomimetics (Anon #9) is a case in point.
    What makes drug discovery and development difficult is not applying the “perfect” metrics to a class of compounds; it’s making the transition from in vitro data to in vivo data. Ultimately the key challenge is to get appropriate in vivo efficacy from a compound or series of compounds whose in vitro data suggest that it might be successful.
    Of course things don’t get easier once you get to this point, but at least you will get data where you can assess more relevant drug development issues such as PK, tox and efficacy. Of course, animal models have their limits; and going from one promising species to another (such as the mice to rats example Anon #19 cited) can lead to failure.
    I once heard that Zetia would not have been developed at Schering-Plough except for the effort of two champions in medicinal chemistry and biology who were focused on getting in vivo data to understand the MOA of their initial set of compounds. In some respect, this example supports Bernard Munos’ point about how innovation in Pharma has been destroyed.
    To foster innovation on identifying a development candidate for a therapeutic target, a “skunk works” approach must be considered. Here, restrictions due to metrics and budgets (and, being cynical, micromanagement of the bench scientists) need to be shelved temporarily to allow a focused scientific team to obtain data based on limited available scientific rationale or a hypothesis. The word “temporarily” is the critical factor here, as these programs can’t go on forever without having clear milestones for success. Hence clearly defined Go/No Go criteria are required for this approach, as for any drug discovery and development program.
    However, being completely risk-averse in drug discovery is not going to lead to success. The hard part is to define how much risk is actually worth taking. Right now no metric can tell us that.

  30. DrSnowboard says:

    I really enjoyed this post and the comments. It made me think of two handed regression as a route to meaning….

  31. agogmagog says:

    @ 19. ‘before the animal toxicology killed my compound’.
    – loved this. 🙂

  32. Anonymous says:

    I love how spurious correlations have replaced the need to actually test anything. I wonder if we can convince the FDA to approve a drug based only on its molecular weight…

  33. Grim Reaper says:

    Where is the knowledge we have lost in information?
    T.S. Eliot, The Rock

  34. srp says:

    Thanks to Derek and the commenters for a very useful thread. It gives civilians at least a dim picture of what goes on inside drug development organizations and what may be dysfunctional about standard practices. It also puts more specificity on some of the complaints around here about process management stamping out true innovation, etc.
    My first analogy to explain this to those even less-informed would be painting by numbers and expecting to create a masterpiece.

  35. NHR_Guy says:

    I used to work at a company that many here are pfamiliar with. I was always amazed at the blind adherence to these magic formulas… it was almost cult-like. They had legions of PhDs sitting in offices combing over reams of data in Spotfire looking for the straight line with a slope of 1. In the end, they had a hard time discovering drugs internally, so they just bought a bunch of other companies, stripped the assets, dumped the scientists, and then took the drugs and claimed them as their own discoveries.

  36. Morten G says:

    I almost always laugh when people quote Lipinski’s RoF. They’ve never read the paper, and usually it’s not even relevant to what they are trying to do.

  37. Mr T says:

    I have now worked as a medicinal chemist in several companies, some fanatical about metrics and others not really interested. Guess which ones had the best design ideas and shortest optimization cycles? The ones using metrics. In my experience, using metrics helps people select the best series faster, think outside the box, try to challenge the model, tweak the molecules and make meaningful changes. Again and again I have seen project teams make the wrong decisions because they could not be bothered to rank compounds by LipE…
    The attitude that you should not try to quantify quality “because it’s impossible” is ludicrous; not everything is predictable in complex systems, but as scientists we should at least try.

  38. Mr. SONG says:

    Very interesting topic and very interesting comments from both Derek and audience.
    I personally think we could use metrics such as LipE as a rough guide if we do not have other data available. But we have to be very careful…
