Why You Don’t Want to Make Death-Star-Sized Drugs

I was just talking about greasy compounds the other day, and reasons to avoid them. Right on cue, there’s a review article in Expert Opinion on Drug Discovery on lipophilicity. It has some nice data in it, and I wanted to share a bit of it here. It’s worth noting that you can make your compounds too polar, as well as too greasy. Check these out – the med-chem readers will find them interesting, and who knows, others might, too:
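[Figure: Caco-2 permeability versus logD, molecular weight 350-400]
[Figure: Caco-2 permeability versus logD, molecular weight above 500]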
So, what are these graphs? They show how well compounds cross the membranes of Caco-2 cells, a standard assay for permeability. These cells (derived from human colon tissue) have various active-transport pumps going (in both directions), and you can grow them in a monolayer, expose one side to a solution of drug substance, and see how much compound appears on the other side and how quickly. (Of course, good old passive diffusion is operating, too – a lot of compounds cross membranes by just soaking on through them.)
Now, I have problems with extrapolating Caco-2 data too vigorously to the real world – if you have five drug candidates from the same series and want to rank order them, I’d suggest getting real animal data rather than relying on the cell assay. The array of active transport systems (and their intrinsic activity) may well not match up closely enough to help you – as usual, cultured cell lines don’t necessarily match reality. But as a broad measure of whether a large set of compounds has a reasonable chance of getting through cell membranes, the assay’s not so bad.
First, we have a bunch of compounds with molecular weights between 350 and 400 (a very desirable space to occupy). The Y axis is the partitioning between the two sides of the cells, and the X axis is LogD, a standard measure of compound greasiness. That thin blue line is the cutoff for 100 nanometers/sec of compound transport, so the green compounds above it travel across the membrane well, and the red ones below it don’t cross so readily. You’ll note that as you go to the left (more and more polar, as measured by logD), the proportion of green compounds gets smaller and smaller. They’d rather hang out in the water than dive through any cell membranes, thanks.
So if you want a 50% chance of hitting that 100 nm/sec transport level, then you don’t want to go much more polar than a LogD of 2. But that’s for compounds in the 350-400 weight range – how about the big heavyweights? Those are shown in the second graph, for compounds with molecular weights greater than 500. Note that the distribution has scrunched disturbingly. Now almost everything is lousy, and if you want that 50% chance of good penetration, you’re going to have to get up to a logD of at least 4.5.
That’s not too good, because you’re always fighting a two-front war here. If you make your compounds that greasy (or more) to try to improve their membrane-crossing behavior, you’re opening yourself up (as I said the other day) to more metabolic clearance and more nonspecific tox, as your sticky compounds glop onto all sorts of things in vivo. (They’ll be fun to formulate, too). Meanwhile, if you dip down too far into that really-polar left-hand side, crossing your fingers for membrane crossing, you can slide into the land of renal clearance, as the kidneys vacuum out your water-soluble wonder drug and give your customers very expensive urine.
But in general, you have more room to maneuver in the lower molecular weight range. The humungous compounds tend to not get through membranes at reasonable LogD values. And if you try to fix that by moving to higher LogD, they tend to get chewed up or do unexpectedly nasty things in tox. Stay low and stay happy.

24 comments on “Why You Don’t Want to Make Death-Star-Sized Drugs”

  1. Blogging says:

    What is new about this? Old concepts, just repackaged by someone who wants to write another paper on the subject. In general, keep Mol. Wt low, not too greasy, not too polar: “just right”. Anyone in this field knows that data from Caco cells should only be used as a rough guide, and that the data need to be followed up with in vivo absorption results if one wants an oral, systemically available drug candidate, since these cells do not tell the whole story.
    One can get too caught up in these types of rules if they are implemented without any thought about the target, medical treatment & need, supporting biology etc. It’s important to also say that there are drugs on the market that help a lot of people’s medical conditions and make a profit for their companies which do not conform to these rules; cyclosporin and rapamycin are two that immediately come to mind. Does anyone propose to throw these out when doing SAR around biology?

  2. goldilocks says:

    Waring published this exact same paper in BMCL about a year ago.
    Defining optimum lipophilicity and molecular weight ranges for drug candidates—Molecular weight dependent lower log D limits based on permeability
    Bioorganic & Medicinal Chemistry Letters, Volume 19, Issue 10, 15 May 2009, Pages 2844-2851

  3. Hap says:

    1) People believe lots of things that aren’t true – going on belief without data isn’t helpful. Having data, even if it says what people thought before, is helpful because it gives an idea whether what you believe is true in general or just in your experience.
    2) “The race goes not always to the swift, nor the fight to the strong, but that’s the way to bet.”

  4. I’m trying to look at this with a 3-D perspective – stack the plots on top of each other so that MW is on the z-axis coming out of the paper (or monitor or iPhone or…). At higher molecular weights, the green region will get pinched out of existence, but the same is also going to happen at zero molecular weight. So what is the MW that gives the largest green region? It’s obviously somewhere below 500 and above 0 (unless a previously unknown homeopathic-type principle exists! Why stop at diluting the therapeutics to infinitely low concentration? Cut down their MW to that threshold too!)

  5. SPRITY says:

    The biggest problem in drug discovery at this moment is that every company is trying to cut costs. As a result, someone will come up with a simple experiment or a mathematical formula trying to predict a complex biological process for any molecule in the universe. If one compares the experimental logP values with calculated logP values for molecules of reasonable complexity (MW >400) across a diverse set of compounds, the error (frequently >2 units) would convince any decent chemist not to use clogP for any prediction across different structure classes. One has to be very careful when interpreting Caco-2 data. When the assay pH is different from the molecule’s pKa by >2 units, the Caco-2 data always indicate poor permeability because >99% of molecules are ionized at the assay pH. ClogP and Caco-2 data are useful in developing SAR within a single structure series. However, just like any QSAR parameters, they should not be used to make predictions across different structure classes.
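The “>99% ionized” figure in the comment above follows from the Henderson-Hasselbalch relationship. Here is a minimal Python sketch, assuming a compound with a single ionizable group. The function name `fraction_ionized` and the example pH of 7.4 are illustrative choices, not values taken from the paper or the comment.

```python
def fraction_ionized(pH: float, pKa: float, acid: bool = True) -> float:
    """Fraction of a monoprotic compound that is ionized at a given pH.

    Assumption: one ionizable group, ideal Henderson-Hasselbalch behavior.
    """
    # Acids ionize when the pH is above the pKa; bases when it is below.
    delta = (pH - pKa) if acid else (pKa - pH)
    return 1.0 / (1.0 + 10.0 ** (-delta))


# Hypothetical acid with a pKa two units below an assumed assay pH of 7.4.
print(f"acid, pKa 5.4: {fraction_ionized(7.4, 5.4):.4f} ionized")

# The same two-unit gap in the other direction leaves the acid mostly neutral.
print(f"acid, pKa 9.4: {fraction_ionized(7.4, 9.4):.4f} ionized")
```

The two-unit rule only cuts one way: an acid whose pKa sits two units above the assay pH is about 99% neutral, so the direction of the pH/pKa gap matters as much as its size.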

  6. anon the II says:

    Why is it that almost all of these “insights into drug discovery” articles are written by Brits? Not all, but a disproportionate preponderance.
    Just curious.

  7. CMCguy says:

    As already mentioned, Caco-2 (& logP) data needs to be interpreted in the proper context, which is true of most of the information medchem/discovery people (and science in general) use to make decisions. As echoed many times, it usually takes those with substantial experience, who remember the basic principles and can connect up bits and pieces, add in other disparate data, and then balance out the factors, to come up with a good potential drug. Of course purges often eliminate that pool of knowledge, as too expensive, which bodes poorly for future prospects. I think part of the “lack of productivity”, particularly in Big Pharma, is that things like Caco-2 become entrenched gating functions, established by R&D managers who have forgotten how science works, so that people after a time give up fighting such positions and then potentially good compounds get killed for illegitimate reasons. Makes me think even stuff like the Calorimetry is in danger of being a fad or over-interpreted, and should be a useful tool and not the sought-after panacea to new drugs.

  8. Excuse me, what was the Y axis again, in smaller words please?
    Those graphs look rather fishy to a non-chemist such as me. Why are there broad rivers of no plotted points surrounding the blue lines? Do the compounds somehow know that hitting exactly 100 nm/s is not allowed, but a bit above or below is fine? Or were results that would fall too close to the line arbitrarily excluded from the experiment? Why?
    And how come the division between red and green is so consistent? Shouldn’t there be a few green dots in the predominantly red area (compounds that cross the membrane quicker than expected for their greasiness; perhaps some transport protein happened to fancy them), or a few red dots in the green area? At the very least, with the indicated large difference between “350-400” and “>500”, the true dividing line between red and green for 350 ought to be visibly offset from the 400 one, and between those two extremes we should see a mixture of red and green on the plot.
    Or were the plots created from scratch starting from axiomatic blue lines, with randomly generated “data points” then colored red or green according to the already-given answer provided by the blue lines?
    The answer to all of this is probably behind the paywall…

  9. Morten G says:

    This reminds me of a wonderful article on why arbitrary divisions, like over/under 100 nm/s, are terrible science. I think it was using MW and rotatable bonds as examples. Can’t remember where I saw it.

  10. Dr Fizzchem says:

    Agree with @1. Hardly startling – little more than the hard-earned wisdom of yesteryear, repackaged to bolster CVs and placate demands that ‘something be done about attrition’.
    Most of the guys who knew this stuff inside-out have been let go, leaving a generation of cut-price replacements to learn it all again from scratch. They’ll find it useful (but so will THEIR cheaper counterparts). Tragic.

  11. Yggdrasil says:

    I agree with #8, I don’t really understand what’s being plotted on the Y-axis, how the data is being displayed, and what the blue line is supposed to represent. If the Y-axis is representing partitioning (related to an equilibrium constant), how are the authors getting information about the kinetics of transport? If this graph is supposed to be some type of scatter plot, it clearly shows no correlation between the Y-axis (whatever that is) and polarity. Unfortunately, my institution does not have a subscription to the journal, so I can’t see the article.
    It’s also important to note that the size of the compound will affect its diffusion constant independent of its ability to interact with transport proteins in cells. What would be interesting would be to compare some measure of a drug’s transport through these cells versus molecular weight and how this curve differs from the expected change in the drug’s diffusion constant. Analyzing the data this way would be a better way of addressing Derek’s points.

  12. milkshake says:

    After several jobs at pharma I do not trust Caco data at all – the assay is way too artificial to have any predictive power. I would sooner take cassette-dosing PK results from three rats than Caco plus microsomal stability. These assays simply flag as “problematic” too many good compounds. I understand that medchem projects have to narrow down the numbers of compounds that go into animals, but Caco and microsomal stability data are in my opinion less than worthless for the purpose. I would still use microsomes to study oxidative metabolism with LC/MS but nothing more.

  13. TFox says:

    Here’s the caption: “The change in probability of achieving high permeability (Caco-2 Papp > 100 nm/s) with logD for molecular weight bands between 350 and 500. Permeable compounds are coloured green, non-permeable are red, the data points are drawn at their logD value in the x direction, and in y position they are jittered randomly within the range corresponding to their permeability category. The blue line indicates the probability of high permeability as a function of logD.” If I’m reading that right, the color and the Y position both mean the exact same thing: did the compound score as high permeability or not, on a Caco-2 assay, and the exact Y position is meaningless. Me, I’d just plot the fraction with high Papp, as opposed to plotting zillions of dots in random positions, with maybe another graph for number of compounds, but then the graph is dull to look at.
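The alternative plot suggested in the comment above (fraction of compounds with high Papp per logD bin, plus the number of compounds in each bin) can be sketched in a few lines of Python. This is only an illustration: the function name `fraction_permeable_by_bin`, the bin width of one logD unit, and the sample values are assumptions, and the numbers are invented, not data from the paper.

```python
import math
from collections import defaultdict


def fraction_permeable_by_bin(compounds, bin_width=1.0, cutoff_nm_s=100.0):
    """Summarize (logD, Papp in nm/s) pairs per logD bin.

    Returns {bin_lower_edge: (fraction_above_cutoff, n_compounds)}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin edge -> [n_high, n_total]
    for logd, papp in compounds:
        edge = math.floor(logd / bin_width) * bin_width
        counts[edge][1] += 1
        if papp > cutoff_nm_s:
            counts[edge][0] += 1
    return {
        edge: (n_high / n_total, n_total)
        for edge, (n_high, n_total) in sorted(counts.items())
    }


# Invented example values, for illustration only.
sample = [(0.2, 30.0), (0.8, 150.0), (1.5, 120.0),
          (1.9, 20.0), (2.3, 200.0), (2.7, 300.0)]
for edge, (fraction, n) in fraction_permeable_by_bin(sample).items():
    print(f"logD {edge:+.1f} to {edge + 1.0:+.1f}: {fraction:.0%} high Papp (n={n})")
```

Reporting the count alongside the fraction keeps the density information that the jittered dots were meant to convey, without implying that the Y position carries a measurement.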

  14. RM says:

    Just posting that I, too, hold the same confusions as @8 and @11.
    I’m not so confused by the Y-axis, *except* that I would have expected the rate of transport (purportedly the blue line) to be measured by the partitioning – the fact that it’s not means that my appraisal of the situation is flawed. And I certainly can’t see why the cutoff for the rate of transport (blue line) should vary as LogD does.
    I’d bank on the coloring to be an above-the-line/below-the-line style choice (akin to Nicolaou’s rings), but that still doesn’t explain the no-man’s-land near it (unless those points were omitted “for clarity”, which is silly, as the whole red/green issue should make things clear enough.)

  15. Thanks for the caption, TFox.
    The “y position(s) … are jittered randomly within the range corresponding to their permeability category” does not impress me as a shining beacon of intellectual honesty: it makes the graph look like it is more information-dense than it actually is.
    It also raises the question of where the blue delimiter lines come from in the first place. Certainly not directly from the data being plotted: In the “>500” graph there is not a single green point to the left of logD=1, but so many red ones that it seems statistically unlikely that the true probability of redness for logD=0 could be as low as the plotted 95% (I enlarged the graph and counted pixels) — unless the plotted sample is not representative along the x axis either.
    It may look even worse at the right-hand edge of the same graph, where the 4

  16. Hmm, my second-to-last paragraph got chopped off; apparently we cannot write less-than signs here. It should have been:
    “It may look even worse at the right-hand edge of the same graph, where the “4 less-than logD less-than 5″ interval has about 7 green points and 50 red ones, yet the blue line claims that the probability of redness here should be at most 60%,”
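The mismatch described in the comment above can be put into rough numbers with a binomial tail probability. A minimal Python sketch follows, using the commenter’s eyeballed counts (about 7 green and 50 red points) and reading “at most 60% red” as “at least 40% green”. It assumes the plotted points are independent and representative, which the commenter already questions, and `binomial_cdf` is just a helper written for this example.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Counts eyeballed from the plot in the comment: ~7 green out of ~57 points.
n_points = 57
n_green = 7
# Assumption: the blue line implies P(green) of at least 0.40 in this interval.
p_green = 0.40

tail = binomial_cdf(n_green, n_points, p_green)
print(f"P(at most {n_green} green of {n_points} | p = {p_green}) = {tail:.2e}")
```

A very small tail probability would mean that either the blue line was not fitted to the plotted points alone, or the plotted points are a subsample, which is the question the comment raises.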

  17. schinderhannes says:

    no time to read the full paper, but doesn’t the perfect separation of red and green dots and the space around the blue line look fishy to anybody else?
    Or is there a simple explanation for this artifact?
    Just curious…

  18. Lillywhite Chemist says:

    Yes! The data was binned in Spotfire to make nice-looking graphs that highlight the point you’re trying to make, not drown people in the boring details which (as Milkshake points out adroitly as usual) might make you misread the message by worrying about the data.

  19. I like LogD and am wondering how many people are measuring it in a routine setup? And saying that more “polar” compounds have a smaller logD is quite an oversimplification, isn’t it?

  20. Hap says:

    #16: Less-than and greater-than signs are brackets for HTML coding – thus if you use them, the site expects the contents to be a command, and ignores everything afterwards. &lt (the ampersand and the letters lt, with no spaces) will give you a less-than sign, and &gt (the ampersand and the letters gt, with no spaces) will give a greater-than sign that won’t cut your comment off.
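Escaping by hand is easy to get wrong, so here is a small Python sketch of the same idea using the standard library’s `html.escape`. The helper name `escape_comment` is hypothetical, and nothing here reflects how this site actually processes comments.

```python
import html


def escape_comment(text: str) -> str:
    """Replace &, < and > with HTML entities so they display literally."""
    # quote=False leaves quotation marks alone; only &, < and > are escaped.
    return html.escape(text, quote=False)


original = "the 4 < logD < 5 interval has about 7 green points"
escaped = escape_comment(original)
print(escaped)

# html.unescape reverses the substitution.
assert html.unescape(escaped) == original
```

The standard entity forms end with a semicolon (`&lt;` and `&gt;`), which is what `html.escape` emits.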

  21. MedChem says:

    Do we need a study to know this??!!! No wonder our productivity is low.

  22. Log D metrics will save the industry says:

    – I get to see both sides of the fence (development and discovery) as I sit on both teams. It’s often shocking that the development folks, and I’d assume senior management in a lot of areas, seem to think med chemists don’t understand the importance of Log D, MW, Sol etc., and that all they care about is potency… but that’s just plain wrong/crazy. The med chemists I work with understand these principles very well, and this is old news. You can spend all day trying to shove metrics/rules down their throats, but it’s just not helpful.
    – Seems to me the problem is that it is very hard to make a compound that is potent (sticks to the target) and selective (doesn’t stick to any of the other thousands of proteins in the body and cause tox). If your compound’s greasy and potent but not selective, this can lead to a vicious cycle of making bigger and bigger molecules to try to avoid hitting off-target.
    – However if you’re aggressive and try to get some polarity/ionization into your core scaffold early it will pay off.
    – Building in good in vitro assays which account for free fraction/serum shift, and forcing yourself to conduct some sort of early in vivo screen at a reasonable dose should help you avoid getting boxed into a greasy scaffold.
    – Conducting a couple of early off-target in vitro assays comparing the polar vs. greasy molecules should help show the management folks that you’re making progress on ‘selectivity’ early (greasy molecules tend to hit a lot of targets), and this should help you justify your approach even if it takes a little longer to design in the potency you need (assuming the management folks are reasonable).
    – Lastly all that said, the ‘winner’ companies will be the ones who can both find and deliver acceptable molecules that require large sizes and high log Ds… ‘600 is the new 500’

  23. RenegadeSci says:

    “Meanwhile, if you dip down too far into that really-polar left-hand side, crossing your fingers for membrane crossing, you can slide into the land of renal clearance, as the kidneys vacuum out your water-soluble wonder drug and give your customers very expensive urine.”
    lol, I’m learning a lot from these mini-talks. keep it up.

  24. LeeH says:

    The binning is done so that you actually get a sense of the density of compounds in a given bin. Otherwise, you’d just see an array of evenly spaced points, with no information whatsoever.
    These plots are doing a bit of lying with statistics. They exaggerate the differences between 2 MW classes. But they omit the biggest class, namely MW 400-500, where things are probably somewhat more boring.
    These plots illustrate why drugs tend to have a distribution centered around a logP of about 2.5. Much below it, and cellular permeability goes to crap. At higher logPs (data not shown), solubility tends to get bad, along with the often mentioned tendencies to be chewed up by things looking for lipophilic molecules. Not to mention serum binding, although people don’t get bent out of shape about this as much as they used to.
