
Software Eats the World, But Biology Eats It

I can strongly recommend this Bruce Booth post at LifeSciVC on computational models in drug discovery. His title, “Four Decades of Hacking Biotech and Yet Biology Still Consumes Everything”, is a nod to Marc Andreessen’s famous “Why Software Is Eating the World” essay. To show where Bruce is coming from, I can do no better than to quote an article, just as he does. Here we go:

Drug companies know they simply cannot be without these computer techniques. They make drug design more rational. How? By helping scientists learn what is necessary, on the molecular level, to cure the body, then enabling them to tailor-make a drug to do the job… This whole approach is helping us avoid the blind alleys before we even step into the lab…  Pharmaceutical firms are familiar with those alleys. Out of every 8,000 compounds the companies screen for medicinal use, only one reaches the market. The computer should help lower those odds … This means that chemists will not be tied up for weeks, sometimes months, painstakingly assembling test drugs that a computer could show to have little chance of working. The potential saving to the pharmaceutical industry: millions of dollars and thousands of man-hours.

This quote is from a Discover magazine article from 1981. And there you have it: over thirty-five years later, that promise still hasn’t really been fulfilled. As he points out in his post, Bruce is no stick-in-the-mud about new technology, and I’m sure that he would absolutely love to realize the savings in time and money that robust computational modeling would provide. But it isn’t there yet. We have useful techniques, we have techniques that can help, but we’re never sure which projects are going to benefit and which techniques might be best to use. I have had the same experience that he mentions – virtually every project I’ve been on in my career has had a computational contribution. But it’s the biology that comes along and overrules everything.

It’s worth highlighting a few examples about biology defeating or obstructing CADD-inspired discovery, though the list of programs could be very very long. T-cell kinase ZAP70 has been attacked by CADD since mid 1990s (here), and yet there are no approved drugs against it. MAPK/p38 is another well-trodden CADD target: dozens of publications out there about CADD success stories against p38 with new and improved binders and the like; yet, clinical development is a veritable graveyard for these programs, as figuring out the safe and effective biology of these projects remains a challenge. Or take renin inhibition – after years of great CADD-enabled discovery, the first program got approved but only to find out in subsequent Phase III that drug development wasn’t kind (see #16 in the FDA’s recent roster of failures).

I also agree very heartily with the recommendations he makes at the end of the post. It’s crucial that the computational folks be integrated as much as possible with the chemists and biologists. This is a terrible place for the “throw it over the wall” procedure; the modelers need to be speaking with the drug discovery team at all times. As tempting as it might be, they also need to be very careful about ruling things out, as opposed to recommending ideas that might work better. We don’t have the horsepower we need, in most cases, to step in and say with confidence that “You shouldn’t work on these compounds at all – our model says that they won’t work”. (And when there is that level of confidence, it’s often something that you didn’t need computation to know, frankly). Always check, if it’s at all possible. Make some compounds – if they really don’t work, it’ll give you more confidence in the model, if nothing else, and if they do, you’ve learned something valuable and interesting.

Read the whole thing; there’s a lot more than I’ve mentioned here. As Bruce takes pains to do, I want to emphasize that I’m not bashing the modelers, either. I’ve worked with a lot of good ones, and the ones who have been best at their job have also, without fail, been the best at not overpromising and at realizing what they have the best chance of delivering for the project. And it’s important to remember that computational hardware, software, and techniques are getting better all the time. They’re not getting better as quickly as people thought they would in the 1980s, and they’re not yet where we want them to be in an ideal world, but the field is advancing every year. Don’t turn your back on it, but don’t fall for the hype that some folks will want to sell you, either. The Silicon Valley types are particularly vulnerable, because they don’t know what biology is like (or how little we understand), and they can be particularly eager to sell you things, too, so be especially wary from that direction.

26 comments on “Software Eats the World, But Biology Eats It”

  1. anoano says:

    To be fair, biology eats traditional med-chem as well, with and without CADD.
    But if a project is heavy on CADD and biology turns it down, the blame falls on the modeler; if a traditional med-chem-driven project is killed by biology, then the blame is on whoever decided to go for that project in the first place!
    (It’s easier to blame the guy who predicted activity than to blame our own med-chem expert brainpower.)

    But the article is also good for pointing out the overhyping by new software companies. That is nothing new, and it applies to any pharma-related vendor: one may remember how vendors were selling HTS libraries, combi-chem, … when they first came out.

  2. Dominic Ryan says:

    I recall a Gordon Conference where the ‘rule of 5’ was discussed. We are all familiar with that, but for modelers and comp chem generally, Mick Laginess pointed out that there is also the less well-known ‘rule of 1’.

    That is the number of times they get to be wrong. Mick might not be the original source of that but he presented it well. Of course the problem with that statement is that most project teams ‘get it wrong’ most of the time.

    That sentiment is a byproduct of the heyday of overpromising and hype. I think it is much less true today, but it is worth remembering that the chemist at the bench is the one putting time and sweat into making it. This underscores the importance of being a true team, jointly engaged in exploration and problem solving.

  3. Polynices says:

    How often do modelers go back and look at new drugs found through other methods to see if their tools would have recommended or predicted those drugs? I’m guessing not often enough, and that it wouldn’t make the models look too good, but maybe I’m too cynical?

    1. M says:

      Well, the standard evaluation for any in silico method (either in-house or in the literature) is basically to go back and show you could find a known drug, so the answer is modelers evaluate all the time and it shows the methods work really well. Of course retrospective analyses are of limited value.

      Since modeling is an area of expertise and not just “an algorithm”, actually figuring out how well a modeler would have contributed to a completed successful program is pretty hard. Obviously the potential for fooling yourself into thinking you’d have applied the right technique and interpreted the results in the right way is strong, however self-aware and honest you try to be. And of course predicting “would I have been able to convince the chemists to make the compounds I was suggesting?” is another issue as well. Not that it’s not a good exercise.

      Typing this up, I did start wondering how many medicinal chemists go back after a project, look at the computational suggestions they did not act on, and try to assess whether they could have gotten somewhere faster. Though again, there are probably far too many parameters to really draw a solid conclusion in the real world.

    2. MM says:

      I am a molecular modeler, and I do that systematically on the projects I am working on.
      It is absolutely essential to gain experience this way. In molecular modeling, the devil really is in the details, and there are always a lot of details to take care of.

      But guess what? Most of the time I do not tell anyone.
      There is absolutely NO incentive for a molecular modeler to do that openly.

      This basic work has to be done silently when you are dealing with people who believe any scientific result should be written in stone (or else the researcher did a bad job), or worse, people prone to confirmation bias, which can be particularly severe when a computer screen is seen as a super-microscope for biochemical processes (biologists are much more afflicted by that than the modelers themselves, I swear)… And unfortunately, this happens most of the time.

      Med chem project schedules should be organized so that there is room for back-and-forth experiments between the lab people and the computer people. Having both working closely together at the early stages is the only efficient way to avoid a lot of pitfalls. This also assumes that you never get optimal predictions in a single pass, and that you need at least a year of this back-and-forth to figure out how to tune the computational experiments to the target (and that is already quite optimistic). Unfortunately, I have found this kind of strategy to be very rare in academic research. It requires time, sufficient funding, and people without strong egos.

      Academics involved in theoretical/computational chemistry know well that they will get grants more easily if they go into “Silicon Valley” mode (after beating pro Go players, the next step for AI is to find a universal cure for cancer, for sure! Modern data mining will at last enable the genomics revolution! and so on…) rather than telling the truth (which starts something like “you know, if I have to work with crap biological hypotheses and crappier experimental data, there is not much to expect from me… I will never turn lead into gold”).

      Industry or academia, same thing. If you must deal with an investor who has no experience of the hard realities of biology… how is that different from academia, where you encounter the same kind of person at the funding agencies? Most of your potential piggy banks do not want to know how difficult and uncertain things really are. Sometimes the project leader is like this too. Tell the truth, and the blame is on you. Quoting “If we knew what we were doing, this would not be called research” could get you fired, or sink all your chances when you are up for tenure. And I am not being too cynical.

  4. Magrinho says:

    I believe that “right or wrong” is rarely the most useful way to evaluate the value of computational modeling.

    The best modelers and the best interactions I’ve had with modelers resulted in the project team thinking differently about the problem at hand and generating hypotheses that we would never have generated without modeling.

    It is very difficult to translate that interaction into a tangible “cost/benefit” analysis but we need to try.

  5. Curious Wavefunction says:

    Everyone in an organization – including the modelers themselves – needs to have and communicate a realistic view of what modeling can and cannot do. It’s almost never about suggesting *the* compounds that would have better PK or affinity. A lot of modeling is about narrowing down the set of ideas to be tested (thus eliminating undesirable ideas: often an under-appreciated aspect), to steer the team into areas of chemical space they might not have gone into, to make non-intuitive suggestions like cyclization or scaffold hops, and to zero in on particular chemotypes or properties through something as simple as correlation analysis or basic QSAR. At the very minimum, modeling should be able to generate easily testable and falsifiable ideas and hypotheses. The problem with all this is that modelers’ ideas can get diluted or forgotten in the general flow of ideas, and modelers often have to struggle to get their names on patents because while you can put an easily made compound in a bottle and assign it to a particular synthetic chemist, you cannot put even an all-pervasive idea in a bottle. Again, the issue is one of realistic integration of modeling and expectations from it into an organization, a goal that cannot be achieved without everyone’s active participation.

  6. Earl Boebert says:

    Useful quotes:

    “…a mathematical model of the growing embryo will be described. This model will be a simplification and an idealisation, and consequently a falsification. It is to be hoped that the features retained for discussion are those of greatest importance in the present state of knowledge.”

    —Alan Turing, 1952.

    “All models are wrong, but some are useful.”

    —George E. P. Box, 1979

    “The purpose of computing is insight, not numbers.”

    —Richard Hamming, 1961

    Turing’s “It is to be hoped” says it all.

  7. mallam says:

    Very simply stated: biology is messy. Algorithms don’t like messiness.

  8. Isidore says:

    I remember a quote from a talk by Charles Weissmann, originally of Geneva, Switzerland and now at Scripps Florida, who cloned alpha-interferon and was a founder of Biogen: “Model systems are like model students, they do exactly what you want them to do.”

  9. DCRogers says:

    Or, “If you torture the data long enough, it will confess to anything” – Ronald Coase

    The temptation to p-hack is strong, as the frequent attempts to rescue clinical trials by identifying patient subsets that “would have worked” shows. And it can be invisibly done: if I darken your door with a dataset of 4 variables with great predictive statistics, you have no way of knowing how many variables I discarded to get to this illusory Nirvana.

    Anyhow, I find a lot of this moot, as “computational methods” more often means something more like IT these days, as projects need to manage voluminous data that would be impossible to process for individual med chemists.

    At best, computational methods are like steam shovels, which can move mountains, compared to an individual, who can only scoop by the shovelful. But you still gotta be digging in the right place – something neither computers nor humans have in hand, yet.

  10. John Wayne says:

    Great quotes guys!

  11. LeeH says:

    The recurring theme here is “things we keep doing over and over”.

    The money guys keep hoping that the computational methods will greatly reduce the need for work at the bench. They won’t, but they’re likely to give you incremental savings. The guys at the bench keep hoping that the computational methods will grant them new insights. It happens, but it’s rare. Good models are often too complex to be understood by mere mortals (just like the property you’re trying to predict). And finally, the modelers keep hoping that the suggestions from their models will be tried long enough for their modest predictive power to become measurable (before they lose the attention of the medchemists, as Dominic reminds us with Mick’s “one strike and you’re out” rule).

    And of course everyone keeps forgetting that we have this conversation over and over.

  12. Anonymous says:

    In reply to the April 13 2017 “A Few Days Off” topic, dearieme posted about “Nonexistent Compounds” and Anonymous replied to that with, “Anyone in med chem has probably been asked to make plenty of non-existent and impossible to make compounds by their biologist, computational and other colleagues (and bosses). “We need you to put a hydrogen bond acceptor over here.” “Can you keep the aromaticity but put a methyl group over here?” And you would all recognize the sometimes paradoxical, self-immolative, oxymoronic aspects of many such requests.”

    Some of the computational suggestions were very challenging and could be a lot of fun to think about. When you don’t have 2 years to devote to such a suggestion, it goes from being challenging to being stupid and gets dismissed.

    On balance, I’ve made suggestions to computational colleagues such as, “Can’t you just calculate it?” and they tell me why that problem (or those problems) is not amenable to solution with current tools.

    And I have asked about (suggested) ideas to bio counterparts, who often point out nasty facts that undermine them. You have to know the whole textbook and not just Chapter 7 to assess the quality of this stuff.

    With a good set of colleagues, “mutual stupidity” is not a problem and we have fun and learn a lot from each other that way.

    Except for one time when I asked the computer guys to put a “joke of the day” program on the computer. A few days later, I got another crazy-funny chemistry suggestion. I thanked them for installing the joke program and they got angry at me. They hadn’t installed it yet. 🙂

  13. Anonymous Computational Biologist says:

    Exercise for the reader:

    Reread this entire comment thread and ask, “How is climate modeling different?”

    (I’m not a “denier,” but I am a computational biologist, and I wonder why it feels so transgressive to ask that question, even though I am confident that the average biopharma computational model is more robust and certainly more verifiable than the average climate model.)

    1. Earl Boebert says:

      I agree. I think the problem of climate change is more usefully characterized as an exercise in risk management than as a statement of settled, reproducible science. And I must disagree with the assertion that any model is “verifiable.” I’m with Popper on this: models, like mathematical proofs, can never be verified, they can only be refuted. Absence of a refutation does not imply the model is valid; there is always a (perhaps vanishingly small) probability that a refutation exists.

      The potential for a lurking refutation increases if the model undergoes modification. In the systems verification game there is a concept called “operational assurance”, which is the confidence gained because your system hasn’t killed anyone yet. 25 years of operational assurance goes to zero if you make one change.

      As other comments (and Turing) have noted, the value of a model is as a tool to place structure on an inquiry. Despite what various marketers would like you to believe, they are not like a Batcomputer that chews on an input, rings a bell, and spits out a card that reads “The Penguin’s hideout is at First and Main.”

      1. Anonymous Computational Biologist says:

        Thanks for the follow-up comment. I agree with the Popperian view, and I thank you for raising it.

        1. Morten G says:

          The model is from 1896 and the data collected since then mainly fits with it: https://en.wikipedia.org/wiki/Svante_Arrhenius#Greenhouse_effect

          In addition to a warming effect there is also the question of ocean acidification. Are coral reefs dying because of increased ocean temperature or lowered ocean pH? When forests were dying due to acid rain humanity was keen to change. Why not for underwater forests?

  14. Nitrosonium says:

    1 in 8000 screened makes it to market?

    Don’t we wish!

    1. Bagger Vance says:

      Geez, that was pre-combichem days too IIRC. All those compounds made by hand…

  15. loupgarous says:

    Derek: “I also agree very heartily with the recommendations he makes at the end of the post. It’s crucial that the computational folks be integrated as much as possible with the chemists and biologists. This is a terrible place for the “throw it over the wall” procedure; the modelers need to be speaking with the drug discovery team at all times. As tempting as it might be, they also need to be very careful about ruling things out, as opposed to recommending ideas that might work better. We don’t have the horsepower we need, in most cases, to step in and say with confidence that “You shouldn’t work on these compounds at all – our model says that they won’t work”. (And when there is that level of confidence, it’s often something that you didn’t need computation to know, frankly). Always check, if it’s at all possible. Make some compounds – if they really don’t work, it’ll give you more confidence in the model, if nothing else, and if they do, you’ve learned something valuable and interesting.”

    It’s also important to capture all of these real-world interactions, then unleash “deep learning” software on all the recorded interactions. (Whenever I read the words “deep learning”, I keep hearing the stentorian voice of “Deep Thought”, the computer that worked out that the answer to “Life, the universe and everything” is “42”, and then designed the Earth as a huge biocomputer to find the question.)

    Don’t impose forms with “fields” to be filled or any other such nonsense on the way people engage in these interactions, either. That’ll distract the only people who actually understand the issues at play from doing so thoughtfully. Also, the presence of a form someone else designed limits you to thinking in the way this person anticipates – and so the form captures your thoughts through someone else’s filter. Not good.

    Deep learning that lives up to the name will either derive useful and novel insights from the flow of information to and from CADD experts and actual drug design experts – or it won’t and the deep learning guys need to go back to the drawing board.

    But if enough conversations between the priests of CADD and drug design professionals are captured and analyzed by humans (still better at intuitive leaps than computers) and deep learning machines, we might get CADD that works out of it, because the people who design it would have better clues to what they’re getting wrong and what’s working.

    And I really wonder why we’re not doing it. The approach has worked in other areas.

  16. Jose says:

    An insanely relevant XKCD comic for computer-aided drug design:

    https://xkcd.com/1831/

  17. Shaughn Robinson says:

    Perhaps we should use “computational potency design” or “computational solubility design”, rather than drug design. That would remove the ridiculous expectation of potency modeling being applied to the wonderfully unknown land of biology.

  18. Jim Bosley says:

    Derek, you say ‘We don’t have the horsepower we need, in most cases, to step in and say with confidence that “You shouldn’t work on these compounds at all – our model says that they won’t work”’. But that is exactly what my colleagues at Pfizer and my former firm, Rosa, did in about 2014, with GPR119. We created a math model of diabetes for a compound which Pfizer had no clinical data on (there was only sparse clinical data, selectively reported, for GPR119), and predicted that the target would fall short. Pfizer had the information they needed, earlier than other companies, and killed the program. Later, JnJ did a trial on their GPR119 compound, and the clinical results matched our model predictions to an astounding degree and validated Pfizer’s decision.
    You’ve had some kind words for a paper Jack Scannell and I wrote about how program outcomes depend upon decision support models with high predictive validity. Math models that integrate data in a scientifically valid way (cell, mass, species balances, known receptor binding and reaction rate kinetics, validated pathway maps) are not perfect. But adding such models to the mix is ALWAYS better than basing a decision upon a single datum alone, or on a limited subset of data. There’s a reason why most of the leading R&D-based firms have Quantitative Systems Pharmacology programs now: they add value and provide better predictive validity.
    But your article is correct: these models aren’t perfect. There has been overclaiming. Sometimes, in retrospect, by me. 🙁 Now, there are “Big Data” hucksters mixed in with serious advocates. The hucksters seem to imply that we can dump petabytes of data into the computer, and their algorithms and code will pop out validated targets and optimal compound structures for specific patients, with validated markers to ID those patients. Big Data will help drug development, but I am skeptical of claims that we will have data-in/cures-out programs anytime soon, if ever.
    We are working on improving the modeling methodologies such as those used in the example above and (stay tuned) will be announcing a new company to do this soon. For now, though, I think that there are existing, proven techniques that could improve drug development. The problem isn’t that they don’t work – they do. The problem is that they are underutilized.
