Drug Assays

Compound Properties: Starting a Renunciation

I’ve been thinking a lot recently about compound properties, and what we use them for. My own opinions on this subject have been changing over the years, and I’m interested to see if I have any company on this.
First off, why do we measure things like cLogP, polar surface area, aromatic ring count, and all the others? A quick (and not totally inaccurate) answer is “because we can”, but what are we trying to accomplish? Well, we’re trying to read the future a bit and decrease the horrendous failure rates for drug candidates, of course. And the two aspects that compound properties are supposed to help with are PK and tox.
Of the two, pharmacokinetics is the one with the better shot at relevance. But how fine-grained can we be with our measurements? I don’t think it’s controversial to say that compounds with really high cLogP values are going to have, on average, more difficult PK, for various reasons. Compounds with lots of aromatic rings in them are, on average, going to have more difficult PK, too. But how much is “lots” or “really high”? That’s the problem, because I don’t think that you can draw a useful line and say that things on one side of it are mostly fine, and things on the other are mostly not. There’s too much overlap, and too many exceptions. The best you can hope for, if you’re into line-drawing, is to draw one up pretty far into the possible range and say that things below it may or may not be OK, but things above it have a greater chance of being bad. (This, to my mind, is all that we mean by all the “Rule of 5” stuff). But what good does that do? Everyone doing drug discovery already knows that much, or should. Where we get into trouble is when we treat these lines as if they were made of electrified barbed wire.
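To make the one-sided nature of these lines concrete, here is a minimal sketch of the kind of filter I mean. The thresholds are the familiar Rule-of-5 cutoffs plus two illustrative extras (aromatic ring count and polar surface area), and the example compound is invented. The point to notice is that an empty list of flags means "may or may not be OK", never "good PK".

```python
# A minimal sketch of the one-sided screen described above: property lines
# drawn "pretty far up the range" flag compounds that are, on average, more
# likely to be trouble, but say nothing definitive about compounds below the
# line. The thresholds are the classic Rule-of-5 cutoffs plus two purely
# illustrative extensions; treat all of them as examples, not dogma.

def property_flags(props):
    """Return the list of 'over the line' warnings for a compound.

    `props` is a dict of precomputed descriptors (from whatever toolkit you
    like). An empty list means 'may or may not be OK' -- it is NOT a
    prediction of good PK.
    """
    limits = {
        "mw": 500,          # molecular weight (Ro5)
        "clogp": 5,         # calculated logP (Ro5)
        "hbd": 5,           # H-bond donors (Ro5)
        "hba": 10,          # H-bond acceptors (Ro5)
        "arom_rings": 4,    # illustrative, not a Ro5 criterion
        "tpsa": 140,        # illustrative polar surface area cutoff
    }
    return [f"{k} = {props[k]} > {v}" for k, v in limits.items()
            if k in props and props[k] > v]

# A made-up compound with three properties over the line: a higher *chance*
# of difficult PK, not a verdict. The only way to know is to dose it.
compound = {"mw": 610, "clogp": 6.2, "hbd": 3, "hba": 9, "arom_rings": 5}
flags = property_flags(compound)
```

Nothing in this sketch predicts anything; it only sorts compounds into "riskier, on average" and "unknown", which is all the line-drawing can honestly deliver.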
That’s because of a larger problem with metrics aimed at PK: PK is relatively easy data to get. When in doubt, you should just dose the compound and find out. This makes predicting PK problems a lower-value proposition – the real killer application would be predicting toxicology problems. I fear that over the years many rule-of-five zealots have confused these two fields, out of a natural hope that something can be done about the latter (or perhaps out of thinking that the two are more related than they really are). That’s unfortunate, because to my mind, this is where compound property metrics get even less useful. That recent AstraZeneca paper has had me thinking, the one where they state that they can’t reproduce the trends reported by Pfizer’s group on the influences of compound properties. If you really can take two reasonably-sized sets of drug discovery data and come to opposite conclusions about this issue, what hope does this approach have?
Toxicology is just too complicated, I think, for us to expect that any simple property metrics can tell us enough to be useful. That’s really annoying, because we could all really use something like that. But increasingly, I think we’re still on our own, where we’ve always been, and that we’re just trying to make ourselves feel better when we think otherwise. That problem is particularly acute as you go up the management ladder. Avoiding painful tox-driven failures is such a desirable goal that people are tempted to reach for just about anything reasonable-sounding that holds out hope for it. And this one (compound property space policing) has many other tempting advantages – it’s cheap to implement, easy to measure, and produces piles of numbers that make for data-rich presentations. Even the managers who don’t really know much chemistry can grasp the ideas behind it. How can it not be a good thing?
Especially when the alternative is so, so… empirical. So case-by-case. So disappointingly back-to-where-we-started. I mean, getting up in front of the higher-ups and telling them that no, we're not doing ourselves much good by whacking people about aromatic ring counts and nitrogen atom counts and PSA cutoffs, etc., that we're just going to have to take the compounds forward and wait and see like we always have… that doesn't sound like much fun, does it? This isn't what anyone wants to hear. You're going to do a lot better if you can tell people that you've Identified The Problem, and How to Address It, and that this strategy is being implemented right now, and here are the numbers to prove it. Saying, in effect, that we can't do anything about it runs the risk of finding yourself replaced by someone who will say that we can.
But all that said, I really am losing faith in property-space metrics as a way to address toxicology. The only thing I’m holding on to are some of the structure-based criteria. I really do, for example, think that quinones are bad news. I think if you advance a hundred quinones into the clinic, that a far higher percentage of them will fail due to tox and side effects than a hundred broadly similar non-quinones. Same goes for rhodanines, and a few other classes, those “aces in the PAINS deck” I referred to the other day. I’m still less doctrinaire about functional groups than I used to be, but I still have a few that I balk at.
And yes, I know that there are drugs with all these groups in them. But if you look at the quinones, for example, you find mostly cytotoxics and anti-infectives which are cytotoxins with some selectivity for non-mammalian cells. If you’re aiming at a particularly nasty target (resistant malaria, pancreatic cancer), go ahead and pull out all the stops. But I don’t think anyone should cheerfully plow ahead with such structures unless there are such mitigating circumstances, or at least not without realizing the risks that they’re taking on.
But this doesn’t do us much good, either – most medicinal chemists don’t want to advance such compounds anyway. In fact, rather than being too permissive about things like quinones, most of us are probably too conservative about the sorts of structures we’re willing to deal with. There are a lot of funny-looking drugs out there, as it never hurts to remind oneself. Peeling off the outer fringe of these (and quinones are indeed the outer fringe) isn’t going to increase anyone’s success rate much. So what to do?
I don’t have a good answer for that one. I wish I did. It’s a rare case when we can say, just by looking at its structure, that a particular compound just won’t work. I’ve been hoping that the percentages would allow us to say more than that about more compounds. But I’m really not sure that they do, at least not to the extent that we need them to, and I worry that we’re kidding ourselves when we pretend otherwise.

32 comments on “Compound Properties: Starting a Renunciation”

  1. academic says:

    I know there is tremendous brouhaha in the academic community over organs-on-a-chip, with the hope of getting access to human toxicology info on a compound cheaply and easily. I am sure the large pharma folks have lots of experience and sophistication with tox models, so I will ask them: do these organs-on-a-chip add anything new to the mix? Is a liver-on-a-chip really just hepatocytes in culture, which is not new and, I assume, not a perfect model of liver toxicity in a whole human? And so on for all the other organs-on-a-chip?

  2. anon says:

    First of all, Derek, we do not measure things like cLogP, polar surface area, aromatic ring count, and all the others. We compute them. And we compute them in the sometimes vain hope that they are acceptable surrogates for a much more expensive experimental result. And this assumption of surrogacy needs to be tested every time. If an SAR can be established between cLogD and an observation such as toxicity, it can be used to help prioritise, as long as you retest the assumption every now and then. And if you have a structural alert that predicts a particular toxicity, then that is the first thing you should test for. When these alerts were first created in the mid-1990s, they were intended to stop us buying blemished compounds, or building small screening sets with an increased probability of false positives. Now they are built so risk-averse that they rival the TSA for false positive rates.

  3. OldLabRat says:

    “And yes, I know that there are drugs with all these groups in them.”
    Rules applied with thought can bias the odds in favor of a discovery project, but they can't replace doing the experiment. The issue isn't that an objectionable moiety exists in known drugs, but that the rule-makers like to ignore the caveats or conditions that make an epoxide or quinone acceptable in one case, but not in many others. I think the real path forward isn't finding better rules to de-risk drug discovery, but doing a better job of designing experiments to assess the risk involved in having a quinone present in one's molecule. Maybe discovery projects need less adherence to screening funnels and more to figuring out the right experiment at the right time.

  4. MoMo says:

    If you are dealing with enzymatic or protein-based in vitro assays, you may find the parameters revealing: hydrophobic pockets emerge, parameters related to electronics and receptor binding jump out, and other juicy insights come to light. But that's for the academics that like fancy pictures, and slides for the executives so they can tell their buddies at the country club what acronymal targets they are working on.
    But the problems, as I have experienced them, are with the PK/PD and efficacy issues that bring those measurements and compounds into the land of the doomed. They are not operable in whole-animal systems, and you are just wasting your time trying to correlate molecular mechanisms with systemic therapeutic systems.
    Let's just stop all the screening BS, as everyone hates their own chemical libraries anyway, and take compounds straight into animal models, put the legions of screening and hypothetical chemists into straitjackets so they don't get in the way, and observe the results in vivo.
    Now whether the animal models reflect therapeutic reality is a different story, and we can deal with these shortcomings later by banishing bad models and forcing their scientists into early retirement.
    Think of the money that will be saved. Mice are cheap compared to computational chemists.

  5. Boghog says:

    @MoMo: An interesting provocative proposal 😉
    But really, animal models as a primary screen? This is the way things used to be done, and arguably productivity then was way higher than it is now. On the other hand, animal models are not cheap either. Furthermore, just think of the reaction from animal rights activists.

  6. anonao says:

    @MoMo And how is this supposed to decrease clinical failure? Now: assay / compounds (medchem-compchem) / animal model > clinical trial, and it fails for tox or efficacy.
    You suggest going back to the old way: do medchem first (I assume you still need to make compounds, but with a medchem who has no knowledge of potential bad properties or toxic groups: a free mind), then animal models (the same as above) > human. Why would it not fail again?
    The problem here is not the methods you use but the animal model; if the model can't tell you whether a compound is right, using it as the primary assay is not going to help. Unless you go straight to human…

  7. George Kaplan says:

    Derek: are you suggesting that we should, as chemists, reassert ourselves against the (weak and post hoc) arguments made by the computational crowd? Were they not really trying to maintain some relevance in discovery as their main enterprise largely collapsed around them? (That would be the enterprise of taking data from structural biologists and “explaining” it to real chemists and then developing “computational hypotheses” for the real chemists.) I never understood what the legion of computational folk were really supposed to be doing around here …

  8. Pete says:

    Greetings from S’frica (I am in Cape Town for the week) and I’ll put my cards on the table. Some of the ways in which properties such as lipophilicity, molecular size and ‘3D-ness’ are used to ‘predict’ toxicology (and other in vivo outcomes) suggest that those self-appointed ‘experts’ who seek to lead the opinions of the Great Unwashed (I enthusiastically include myself in this latter category) have lost their way (and quite possibly also the plot).
    If we are going to use individual properties to predict outcomes then the relevant trends need to be strong if we want people to take us seriously. Although we can think in terms of correlations, I find it more helpful to think about how much variance in the outcome is explained by the property in question. If our favorite property explains 80% of the variation then we can probably define useful guidelines but if the figure is only 10% then, at best, we’d only be looking at something we could use in a multivariate model. One of the problems is that the ‘experts’ get sidetracked by significance and ignore size of effect and, with big datasets, it is possible to get eye-wateringly good P values even when the underlying trend is feeble. We introduced ‘Ro5 envy’ in our correlation inflation article and, although the term will have undoubtedly got up a few noses, we still think that the term succinctly makes the point that the Ro5 article has spawned what one might call a ‘drug discovery rules-n-guidelines business’.
    The inflation of correlation is not the only problem when trying to predict toxicology. There is also the issue of relevance, and I would certainly question the relevance of >30% inhibition at 10 micromolar as a threshold for activity when attempting to rationalize off-target activity.
    It's also worth remembering that the justification for defining ligand efficiency metrics (LEMs) comes from the assumption that factors such as lipophilicity and molecular size are predictive of risk. Even if the scientific basis of LEMs were secure (see our recent critique of LEMs to find out more), weak links between risk factors and in vivo outcomes would still weaken (maybe fatally) the case for using these metrics.
    I have linked ‘Data analytic sins in property-based molecular design’ as the URL for this comment. This presentation links both the correlation inflation and LEM critiques.
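    Pete's significance-versus-effect-size point is easy to put in numbers. Below is a stdlib-Python sketch with purely synthetic data (no real assay results here): a "property" that explains only about 9% of the variance in an "outcome" still yields a huge t-statistic, and hence an eye-wateringly small P value, once the dataset is big.

```python
# Illustrative simulation: significance vs. size of effect. With a big
# enough dataset, a feeble trend still gets a tiny P value. All numbers
# here are synthetic; the "property" and "outcome" are invented.
import math
import random

random.seed(0)
n = 10_000
prop = [random.gauss(0, 1) for _ in range(n)]
# Outcome = weak trend plus lots of noise (true r ~ 0.3, so R^2 ~ 0.09).
outcome = [0.3 * x + math.sqrt(1 - 0.3 ** 2) * random.gauss(0, 1)
           for x in prop]

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(prop, outcome)
r2 = r * r                              # fraction of variance explained
t = r * math.sqrt((n - 2) / (1 - r2))   # t-statistic for H0: r = 0
# r2 comes out near 0.09 (the property explains roughly 9% of the outcome),
# yet t is enormous, i.e. P far below any conventional threshold.
```

    Rerun with larger or smaller n and the t-statistic scales roughly with the square root of n while r2 stays put: significance grows without bound, but the size of the effect does not.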

  9. CMCguy says:

    The toxicology unknowns are indeed complicated, particularly as one can often rapidly get short-term data to reject a compound based on bad acute responses, but doing meaningful long-term studies is non-trivial, and the results are harder to interpret.
    #4 MoMo: I challenge your assertion that "mice are cheap compared to computational chemists." When one takes into account all the costs of facilities, daily operations, and staffing, it's an expensive proposition, and that doesn't include all the data generation and analysis; nor can one do a one-mouse, one-chemist comparison. In fact, I think your statement has actually been turned around to support the value of computational chemistry, with execs told it is much cheaper to apply computer design and predictions than to bear the high cost of animal testing. I would surmise that the underemphasis of early animal models and testing has probably negatively influenced the already poor productivity of drug discovery over the last few decades.

  10. LeeH says:

    You bring up a lot of good points.
    1) It's very important to match computationally derived descriptors to the problem. As you mention, structural descriptors are very likely much more useful for tox issues than simple physicochemical ones. So counting aromatics is likely not useful in this instance, but it IS useful for predicting solubility, which falls off pretty precipitously when you go above 3 aromatic rings. PSA alone is pretty good at predicting BBB penetration, but it's better if you include information regarding functional groups (where an acid can kill BBB penetration even if the PSA is relatively low). Etcetera, etcetera.
    2) Models may be useless, even if they are highly predictive, depending on the therapeutic indication. As you mention, even a great tox model may be irrelevant if the indication is pancreatic cancer, where the situation is grave. On the other hand, if you get a flag for a compound in an obesity program, you'd better do your homework.
    3) The mechanism of the property you are modeling may make modeling close to impossible. Again tox is the poster child, since it is an aggregate of many effects, including off-target effects, ADME, and worst of all, reactivity.
    I’m glad you’re shining a light on this issue, since the tendency of the medchemists is to either dismiss modeling wholesale or to accept it as a religion. Both paths ignore the nuances.

  11. Hap says:

    Do drug chemists/biologists know enough about the failure or toxicity rates of specific compound structures to impute significant conclusions to compound substructures? We know lots about what quinones can do chemically, for example, and probably some about what they can do in cells and with proteins, but does anyone know how many fail and how many succeed (and in particular how many fail because of toxicity), let alone the corresponding figures for other compound classes?
    Because chemists aren't likely to see very many candidates (and probably not so many structures, at least as a fraction of those tried), it seems like people have to make decisions based on anecdote rather than data. It seems (highly) nontrivial to get some of that data and put it together, but companies should have it somewhere. (It would probably be a good target for general collaboration, if you could scrub enough proprietary data out.) Without data, people are making guesses, educated guesses, but still guesses. If you don't have enough money or time to try everything, then you probably need to get as much as you can out of the ones you do try, even when (particularly when) they fail.
    It seems like I’m asking for the moon, but I’m not sure what kind of data you’d need to be more certain about what structures do bad things and what ones don’t. If this kind of data is practically unobtainable, then how do people proceed?

  12. Wavefunction says:

    #7: Computational chemistry has gone beyond the structure-based approach; cheminformatics is now a very widespread part of the enterprise. A lot of people don’t appreciate that the purpose of the field is not just to design drugs at the molecular level but also to contribute in “smaller” but still important ways like assigning targets to phenotypic screening hits, designing libraries around known scaffolds etc. The definition of the field has changed quite a bit in the last two decades.
    In any case, all these attempts to filter and predict tox are basically the result of our ignorance of the complexities of biological systems. Neither computational chemists nor any other kind has a good handle on it yet.

  13. Cellbio says:

    I would offer that rather than try to use animal models as the empirical screen that we drive better and more thorough cell screening. When I started in small molecule drug discovery, I saw filters applied at every step, with few compounds coming to valuable assays, such as primary cells relevant for the pharmacology. With a small sample size of similarly filtered compounds measured in only one facet, the intended biology, the pharmacology looked simple and coherent.
    However, take a large set of compounds, say around 3K, that have similar potency in a biochemical screen and run them through a panel of diverse cell biology screens, and you see that pharmacological diversity is inherent in otherwise similarly defined compounds, some with quite similar structures. That diversity is not necessarily shared with the small filtered set.
    I learned to never think that I was advancing an inhibitor of an enzyme, but a compound that had that feature as one of its properties. That feature provides a connection to an indication and provides justification for further investment. However, as said above, probing our ignorance of biological complexities with a compound has little to do with our intentions and is almost wholly empirical. Hard concept to sell, as people love to feel in control, and data help to that end, even if the correlations drawn are meaningless.

  14. John Wayne says:

    “I learned to never think that I was advancing an inhibitor of an enzyme, but a compound that had that feature as one of its properties.” – Cellbio
    Bravo. Thank you for articulating something I’ve been moving towards in my mind for years.

  15. Jonas says:

    I am a computational (medicinal) chemist, and I agree with you, MoMo. Simple rules and (single-)target-based drug discovery easily create an illusion of control. Having said that, in vivo testing requires that the math and stats be done properly.
    I have written a blog post about it. Far from Derek's calibre, needless to say, but it might entertain some of you; check my handle in case you'd like to read it.

  16. John Profumo says:

    Has anyone ever read the business management book "The Goal"? It's one of the few I've ever read, and it has lessons for us in the drug discovery world. We don't make widgets, but we certainly do have our share of 'bottleneck' issues (more like 'eye of the needle' issues) like the protagonist of that book has. One of the lessons from that book was a caution against "over-reliance on surrogate measures of success". One of the simplest ways for me to think about our challenges is always to remember what the goal is. The goal is to develop a medicine that can be tested in humans. So instead of using surrogates to kill projects (we are way too infatuated with "fail fast", at the expense of actually doing our jobs of delivering therapeutics that can be tested in humans), we should look at what the specific problems are in our specific projects and address them with killer experiments or tests. Otherwise, people don't make compounds like Ledipasvir (above link) and drive them to success. Don't be like me… don't mess with models (and surrogates, etc.).

  17. Christine Keeler says:

    Dearest John, I couldn’t agree more. Irrelevant metrics represent a soothing alternative to unachievable goals and surely compound quality is the Miss Congeniality in the molecular beauty contest.

  18. Cellbio says:

    Nice to get a Bravo from John Wayne! Next up, an "all right, all right, all right" from Matthew!
    #17 and #18: I agree, and it is a real problem. As discussed much in the comments on this blog, "fail fast" and other faddish trends reset the ultimate goal to intermediate metrics that create incentives to predict failure, or to 'improve' endlessly before definitive measures, typically clinical trials. This is, however, much more of a problem in big companies than in start-ups, where the drive to develop the compound in hand often leads to less being measured rather than more, sometimes much less than would be helpful. Still, I like this environment much better. It does come with moments when one's personal comfort zone gets challenged. I get why highly compensated execs at big companies grow risk-averse. It is because they can! The decisions are often not now-or-never, but now or maybe later, when the proposition is clearer. The bill comes due one day, with research FTEs paying the debt, but at least until that day, everyone in charge has played by the rules and met the metrics that pay out their bonuses.

  19. Scott Boyer says:

    Derek: Thanks for bringing this study up for discussion. As one of the authors of ‘that recent AstraZeneca paper’, I agree with the title of your previous blogpost ‘You can’t calculate your way out of this one’.
    In fact, I may walk around chanting this mantra.
    As a toxicologist and the head of computational toxicology at AZ for well over a decade, I appreciated weapons-grade machine learning as much as the next guy, but mostly I tried to structure and present actual, relevant data to med chemists and expected that they think about them and act on them within the limitations of their project. The message of the MedChemComm paper was more-or-less the same: you as med chemist have more freedom than you might think, but with that freedom comes the responsibility to listen when data are presented and to think hard about what the data mean.
    There are good toxicologists willing to talk to you, but no one should be fooled into thinking that this is simple. Equally foolish would be to assume that it is impossibly complicated.
    And yes, you probably are too conservative. However, even I might balk at a quinone – but let's talk about your dose and the substitutions you were thinking of… nothing is impossible.

  20. Anonymous says:

    Is it possible we have partially lost the ability to objectively assess the merits of a project, as ‘positive’ results have become entwined with career success?
    We shouldn't be relentlessly searching for data to undermine a project, but we should happily play devil's advocate to our own data and cordially accept and encourage re-examination of the validity of, for example, assays and other model tools. It should be integral to our approach to search for robustness in data, but what if that search is likely to cost me my bonus due to project cancellation?
    Let’s wait until next quarter eh?

  21. Anonymous says:

    Too many people waste time and money running these tests just to build up a nice sales pitch, when they already know they will give a positive impression of their compound, instead of adding value by running killer experiments to remove real uncertainty. Value is created by addressing the biggest risks early and head on, not by avoiding risk or by pushing risk into the future where it becomes more expensive. So stop faffing around, and focus only on the killer experiments rather than delaying the inevitable.

  22. I think that as things progress, a more structure-oriented approach will be implemented in developing toxicology drugs. See the case of Ebola: researchers are now saying it would take nearly two years to come up with a drug that can cure it.

  23. Chris says:

    Think of the computational results as flags: if the model predicts significant IKr activity, then test it.
    If the calculated physicochemical properties suggest poor solubility, CYP inhibition, or rapid metabolism, then there are simple in vitro tests to check.

  24. @12 Dr Snowboard says:

    Let us not forget this beauty!

  25. model-tastic says:

    Having spent a good bit (namely, all) of my life building models, I breathe a sigh when the focus is on one or two properties. OK, people, let's move on to something more pressing: #23, Ebola. See my handle for the PubMed link to a paper from 2013 (this is one of a couple by different groups screening FDA-approved drugs, and not mine); you might be surprised. I do not think these HTS hits are quinones!

  26. Douglas Kell says:

    The issue with PK is subtler and more insidious. Gross PK can hide HUGE differences in distribution between individual cells in an organ/tissue, and this can underpin both lack of efficacy and tox – the two main causes of attrition.

  27. DrSnowboard says:

    @25 Meh, I discount that one, in that it started from a natural product. The Gilead compound traces itself back through a lot of work to the original BMS observations. Medchem.

  28. Hood Monkey says:

    It’s not the absolute value of a calculated property that’s important but the trend. Here are strategies that have worked many times in med chem:
    high clearance: lower clogP
    low permeability: lower TPSA, remove HBD
    low solubility: lower clogP, remove AromRing
    high hERG: lower clogP, reduce basicity
    The art of med chem is to be able to do these things to fix PK/safety issues while maintaining potency and not breaking something else.

  29. simpl says:

    Surely, chemistry is all about models? Making, refining, and simplifying them, and putting them in their place. What about electronegativity, functional groups, and mechanisms?
    My model for simplifying parameters is L. Thurstone's work on intelligence tests. Once they worked out that each question added a new parameter, they ran statistics to define the main thrusts and reduced an n-dimensional set of vectors to a handful of main constructs.

  30. samadamsthedog says:

    I agree that it is more reasonable to attempt to approach PK than toxicity using Lipinski-type properties; but when you say "you should just dose the compound and find out", you miss the point that what easily computed properties are good for, if anything, is helping you decide what to make in the first place, when you have thousands of possibilities to choose from. As a project progresses, real assays take over, so easily computed properties, and computation in general, have to be judged on whether they help you get to that point.
    Rules of 5 and the like are rules of thumb, and I would be surprised if the overwhelming majority of practitioners didn't treat them that way.

  31. Recovering Mathematician says:

    I know it’s pedantry but for future blog posts it’s ClogP. You don’t capitalise the l in log.

Comments are closed.