AI and Machine Learning

AI and Drug Discovery: Attacking the Right Problems

I’ve been meaning to write some more about artificial intelligence, machine learning, and drug discovery, and this paper (open access) by Andreas Bender is an excellent starting point. I’m going to be talking in fairly general terms here, but for practitioners in the field, I can recommend this review of the 2020 literature by Pat Walters, which will take you through a number of important topics and where they seem to be heading.

Even if you’re not a computational drug discovery type, a look at Pat’s roundup might be instructive, because seeing the actual problems that the field is wrestling with will very quickly take the shine off a lot of hyped-up headlines and press releases. These include things like “How do we even estimate the uncertainty in our model, and how do we compare it to others?”, “How do we deal with molecules as three-dimensional objects with changing conformations, as opposed to two-dimensional graph-theory objects or one-dimensional text strings?”, “Since no one can actually dock a billion virtual molecules into a protein target, how can we reduce the problem to something theoretically manageable without throwing away the answers we want? And how will we know if we have?” and “What do we do when our model will only start to work if we feed it more data than we’re ever going to have?” The next time you see a proclamation that everything’s been made obsolete by AI-driven modeling, keep those in mind.

The Bender paper is a good place to start if you’re not knee-deep in such questions, though, and I especially appreciate a point it makes in its Figure 2. That’s the result of simulating improvements in the drug discovery process, with various estimates on the cost of capital, expected return from a new drug, patent lifetimes, and so on. It’s useful because very, very often you’ll hear the pitch for a new computational approach in terms of how it’ll speed everything up. No more stumbling around screening piles of molecules! No more tedious property optimization! But while those would be nice (and remember, we aren’t there yet), the real problem is having drug candidates fail in the clinic. All that other stuff is a roundoff error compared to the clinical failure rate.

That’s what the paper’s simulation found. Lowering the cost of the preclinical stages by 20% or making them 20% faster (which to a certain degree are the same thing) does indeed save you money. . .but those savings are overwhelmed by the savings you could realize if you could just reduce the clinical failure rate by 20%. The absolute best ways to do that would be through picking better targets and through picking compounds and targets that don’t throw up unexpected toxicity in humans. Those, sadly, are exactly the areas where AI/ML approaches are currently gaining the least traction, because it’s so hard to think up a useful way to attack them. Speeding up screening or estimating physical properties, for all their difficulties, are so much more tractable. Which accounts for the press releases talking these up as if they’re removing gigantic stumbling blocks to fast and easy drug discovery.
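To see why, a back-of-the-envelope sketch helps. All dollar figures and success rates below are invented round numbers for illustration; the paper’s actual simulation also models cost of capital, patent lifetimes, and timing, which this ignores:

```python
# Toy model: expected R&D spend per approved drug, in $M.
# Every candidate pays preclinical + clinical costs; only a fraction succeed.
# All numbers are illustrative placeholders, not figures from the paper.

def cost_per_approval(preclin_cost, clinical_cost, success_rate):
    """Expected total spend per approved drug."""
    return (preclin_cost + clinical_cost) / success_rate

baseline = cost_per_approval(100, 300, 0.10)        # ~$4,000M per approval
cheaper_preclin = cost_per_approval(80, 300, 0.10)  # preclinical 20% cheaper
better_clinic = cost_per_approval(100, 300, 0.12)   # success rate up 20%

print(f"cheaper preclinical saves {1 - cheaper_preclin / baseline:.0%}")  # 5%
print(f"better clinical odds save {1 - better_clinic / baseline:.0%}")    # 17%
```

Even in this crude version, a 20% bump in clinical success is worth several times a 20% cut in preclinical spend, which is the shape of the paper’s Figure 2 result.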

This is not a new insight. But it’s a hard one to swallow, for several reasons. We have an awful lot of proxy measurements in this business (the Bender paper is very good on this topic). We have to have them, because measuring the most important things (does this drug work against a human disease, and to what degree, and without causing more problems than it solves) can only really be done in the clinic. We come up with mechanistic biochemical rationales, cell assays, animal assays, evaluation schemes for compound structures and physical properties, all sorts of things to try to increase our chances for success when the curtain goes up and the real show starts. Which is human dosing.

These proxies generate heaps of numerical data, so it’s understandable that computational approaches use them to try to make better predictions. But in the end, they’re all still just proxies. The paper’s Table 2 goes into details, with the strengths and weaknesses of the various assays and systems. The bottom line is that they’re all useful, and they’re still not enough. We all go into the clinic having done a lot of stuff that’s Necessary But Not Sufficient, and if you don’t hold your breath when the first human doses start, then you haven’t been doing this stuff long enough. What everyone wants are AI systems, computational techniques, and models that will reduce all that finger-crossing and tachycardia, but that’s unfortunately some ways off.

It’s hard to even think about the best ways to (for example) improve target prediction or human toxicity computationally, other than just assembling more and more knowledge (which has been the program for the last few hundred years, and therefore does not make for a sexy stock prospectus). You’d need much better simulations of living biology than we have, and getting that to come into focus is going to take a lot of work and a lot of time. As it is, no one even bothers (for example) trying to predict side effects when a compound goes into a two-week tox assay in rodents. You’re about to find out what they are, and pretty much anything that’s a real concern is going to come as a surprise to you anyway. And it’s not like side effects are constant through a population, either – variations in human physiology and immune systems make sure of that, and that’s a whole different level of difficulty. Here’s a summary:

The need to make decisions with sufficient quality is only compatible in some cases with the data we have at hand to reach this goal. If we want to advance drug discovery, then acknowledging the suitability of a given end point to answer a given question is at least as important as modelling a particular end point. . .

The problem is, modeling is easier to start doing than dealing with that suitability question. It can also be harder to explain this point to investors, to granting agencies, and to upper management, because improvements in things like assay quality and target selection are harder to quantify and come on slowly. This, to me, is the big question looming over a lot of AI/ML approaches to drug discovery, and I’m really glad to see a paper addressing it head-on.


43 comments on “AI and Drug Discovery: Attacking the Right Problems”

  1. Peter Kenny says:

    What is touted as AI in drug design actually seems to be a rehash of multivariate data analysis. I would argue that, in order to make an impact in drug design, computational chemists and cheminformaticians need to think more in terms of Design of Experiments and hypothesis-driven design (how to generate the information for decision-making as efficiently as possible) and less in terms of prediction (which for pharmaceutically-relevant quantities will probably continue to be challenging for some years to come). While better multivariate analysis (whether you call it AI, Machine Learning or multivariate analysis) may well prove beneficial, it remains far from clear that AI is a useful framework for drug design. There does seem to be a lot of Kool-Aid being drunk.

    The case for CompChem/cheminformatics in drug design is often presented in terms of predicting quantities that could be measured so that they don’t have to be measured. The bigger problems in drug discovery are the quantities, like unbound intracellular drug concentration in vivo, that cannot (currently) be measured. Biology, as Derek continually reminds us, is tough.

    1. This is exactly right. I think it’s a straw-man argument to ask whether AI/ML can predict clinical trial outcomes, then dismiss it because it struggles there. It’s not going to “solve” drug discovery any more than a new chemical reaction or robot or trial design will “solve” drug discovery. It’s a very powerful tool to improve the process (hopefully dramatically in some of the early stages). And it’s never going to operate in a vacuum; it depends on careful, large-scale experimental data generation.

      If you can go after a target with AI/ML that’s genetically validated but “undruggable” maybe your trial success rate improves. If you can explore far more compound options for a given target, maybe potency and properties make it so you can dose less and have lower chances of toxicity. If you can do 1000x more weak experiments quickly and get higher quality data by aggregating and processing it with ML, maybe you can solve problems you couldn’t before. If AI/ML comes up with a starter compound and helps guide a traditional med chemist to a better one, is that AI/ML solving drug discovery? Is that a useful tool?

      Derek is right to say that ML should be tested against hard problems, but essentially the paper and this article are asking, “if we solved the easy early-stage problems faster/better, would that move the needle?” OK, maybe not, but we’re interested in solving the hard early-stage problems. The paper and this article aren’t framing things in the right way.

      Disclosure: I founded an AI/ML company with its own wet-lab operations to generate large datasets, and I am a big fan of Derek’s despite my mild disagreement here.

      1. Peter Kenny says:

        Hi Nicolas,

        The incremental nature of drug discovery is not always appreciated by ML evangelists, and it can be helpful to think of screening, hit-to-lead and lead optimization phases of the process (drug design takes place in the second and third phases). One reason for this state of affairs is that our ability to calculate what we need to know in order to do what an engineer would recognize as design is limited (in the case of intracellular unbound concentration in vivo we can’t even measure what we need to know, and this makes drug design ‘indirect’ to some extent). One area in which I think ML/AI scientists might have significant impact is achieving optimal coverage of relevant regions of chemical space in an efficient and cost-effective manner.

        I see many of the ML methods that are currently touted for drug discovery as falling into one of three categories (analysis of structure in data; classification; regression). While I believe that one is most likely to learn something new from analyzing structure of data, these methods are unlikely to be directly relevant to drug design (I see them fitting more naturally into the screening phase, where one might want to map molecular structures in chemical space or perform image analysis on output from high content screens). Genuinely categorical data are actually rare in drug design, and a common motivation for categorization of continuous data is to hide the weakness of the underlying trend (or the crappiness of the model).

        In lead optimization, medicinal chemists tend to work in local regions of chemical space (e.g. structural series) and they tend to think in terms of relationships between molecular structures. The question is not so much ‘how potent will this compound be?’ but more ‘How much potency will I gain by substituting with chloro and how much solubility will I lose?’. The medicinal chemist considering the application of a hERG model is likely to be more concerned about how the model will perform in the local chemical space defined by the project compounds than the regions of chemical space sampled by the training set compounds. My view is that many (most) predictive models in drug design that are touted as ‘global’ are actually ensembles of local models.
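        That way of thinking, relationships between structures rather than absolute predictions, is essentially matched molecular pair analysis. A minimal sketch, with entirely invented pIC50 values:

```python
# The 'local model' framing in practice: a matched-pair estimate of how
# much potency an H -> Cl substitution buys, from paired measurements.
# All pIC50 values below are invented for illustration only.

pairs = [
    # (pIC50 of H analog, pIC50 of Cl analog), one tuple per matched pair
    (6.1, 6.7),
    (5.4, 5.9),
    (7.0, 7.8),
]

deltas = [cl - h for h, cl in pairs]
mean_delta = sum(deltas) / len(deltas)
print(f"mean delta-pIC50 for H->Cl: {mean_delta:+.2f}")  # about +0.63
```

        The estimate only means anything within the local series the pairs came from, which is the point about ‘global’ models really being ensembles of local ones.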

        I’ve linked ‘The nature of ligand efficiency’ as the URL for this comment and some of the material in the ‘Introduction’ and ‘Molecular size and design risk’ may be relevant to this discussion.

        1. Hi Peter,

          I actually think we mostly agree; I was trained as a bench biochemist and am mostly skeptical of the pure AI/ML approaches. I also agree that lots of models are much more locally optimized, and fail to generalize more than is commonly assumed by AI/ML evangelists. Biochemistry is hard.

          My thesis is that the only way to get generality out of a model is to train that model on as large a dataset as possible. Few such datasets exist, and most are pretty noisy (as you highlight). My hope is to build those datasets by casting as wide a net as possible (we use DEL, itself a noisy compromise, but a decent one if done well), then repeatedly casting as wide a follow-up net as possible with increasingly high-quality/content experiments. This is akin to what you’re suggesting about combing chemical space as efficiently as possible, but I think you have to do it with a hybrid approach. Experiment and ML are joint; ML can’t work to its full ability without strong experiments.

          The idea is to give that medicinal chemist who will unavoidably have to make and test a couple dozen compounds as much information as possible when they reason about potency vs. hERG. When tackling very difficult targets, I think this extra data can be especially useful.

          I know that this is not unlike the traditional design-make-test cycle of med-chem. It’s a variation where the AI/ML is a partner throughout, and where we try to generate large datasets for as long as possible, under the assumption that the AI/ML will be better at guiding downstream experiments if we do.

          Thanks for sending the link to the ligand efficiency article, I’ll have a close read!

          (once again, full disclosure, founded an AI/ML/DEL drug discovery company)

      2. Dominic Ryan says:

        One can look at this from a larger scale of economics. ML has a long history of improving the efficiency of the workflow in selected projects. That was often through more of a design-of-experiments approach, as others (Peter K) have pointed out. I suppose this latest attempt at quantitation gives better context to the benefit of speeding up *a successful* project. Does the data also show the benefit of speeding up a fail decision, though?

        I think a better question, also very hard to answer, is the relative benefit of running three projects and selecting the one with the best clinical prospects, vs. making one go a bit faster. Derek touches on this in his original post, and I agree that a better project is best. Complexity bites, though. In my career I have seen portfolio management run from one end of the spectrum to the other. “Only validated targets with clear efficacy and safety windows”: sounds great, lots of data in the literature, but can you actually improve on what is known in a *commercially* useful way? “Novel targets with a new mechanism of action (usually built around a new platform)”: sounds great, but often there are sparse relevant chemical structure types to refer to, and with a new platform much of what is known may no longer apply because a new type of biological exposure is needed. You trade later failure for early enthusiasm.

        The middle ground is where much effort is spent, but that raises the question: is there enough data to build highly directive models?

        Any ML method has to contend with both applicability domain and noise. The applicability domain question is whether the small pool of compounds you have data for is the same pool as the compounds you ultimately want to get to but don’t know much about yet. That brings up the 10**60 reference again (is it sulphonamide or sulfonamide? :-) ) – it is *very* easy to be in the wrong pool. There is no simple way to have very high confidence in that when you need it most, at the early stage.

        Noise is the other big factor, and a pet peeve of mine. How often have you seen IC50 data cited to 3 significant figures? What about a cell-based assay similarly cited? A truly excellent biochemical assay might be good to within two-fold. A great cell-based assay could be 3-5 fold. That is relevant because the noisier the data, the more of it you need to have confidence in the ML conclusions. This is the data curation trap when trying to compile clinical data, or when linking clinical data with literature biology. IBM Watson was pointing at that; I don’t think you could call it a success despite what seemed to be impressive resources behind it. For every Watson I suspect there are a dozen other approaches that reached their limit. As with so much of life, we hear about a few successes and the failures mostly go quietly off into the sunset.

        As others have said, ML is a useful tool, like so many other tools. There will be occasions of important contributions but I have a feeling we are still climbing the new technology hype curve on this one.


        1. I think I agree with you for the most part. The thing I would highlight a bit more is that people keep casting AI/ML approaches as the starring, even the only, player. I (and my company) don’t think that way. Generating excellent wet-lab data on a large scale is the key here. I really don’t think AI/ML will have big successes without the data generation piece, and moreover one where the data is generated with downstream ML in mind (which changes how you think about controls and the like). Bolting ML onto whatever you already have might work sometimes, but to really go after new targets or MOAs, or to make better early-stage decisions, you must have a strong data generation approach to feed the models. It also softens the impact of modeling error: if you have a reliable way to screen lots of predictions, it’s fine if you fail a lot, because you’ll win enough to make progress.

          The theme here is rethinking the early stage drug discovery processes in light of these new tools and tightly integrating both wet and computational science. But, I’m very much biased here.

    2. Todd says:

      I agree, but it’s low on the sex appeal. Knowing what to measure and what not to measure is key, but it’s also hard to sell. In biotech, it gets really easy to baffle people with more numbers as opposed to the right ones.

    3. Yt1 says:

      Multivariable analysis, especially with techniques like ‘lasso’ (where you let the program decide which variables are relevant), has always been labeled as ‘ML’. Or so I thought?

      Anyways, we all know that ML is a very cool tool, but it is not going to make drug discovery easy.
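      For what it’s worth, the “deciding which variables are relevant” part of the lasso comes from its L1 penalty, which zeroes out weak coefficients. In the idealized orthonormal-feature case this reduces to soft-thresholding the least-squares estimates; a sketch with made-up coefficients (a real fit would use something like scikit-learn’s Lasso, which runs coordinate descent on this same update):

```python
# How the lasso "decides which variables are relevant": its L1 penalty
# soft-thresholds coefficients, shrinking them and zeroing out weak ones.
# Pure-Python sketch for the orthonormal-feature special case.

def soft_threshold(beta_ols, lam):
    """Lasso coefficient for an orthonormal feature: shrink the
    least-squares estimate toward zero, clipping at exactly zero."""
    if beta_ols > lam:
        return beta_ols - lam
    if beta_ols < -lam:
        return beta_ols + lam
    return 0.0

ols_coefficients = [2.5, -0.3, 0.1, -1.8]   # hypothetical OLS estimates
lasso_coefficients = [soft_threshold(b, lam=0.5) for b in ols_coefficients]
print(lasso_coefficients)   # weak predictors dropped to exactly 0.0
```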

  2. Mythbuster says:

    Well, if people expect “AI” (or rather machine learning) to predict well in areas for which they don’t have training data (or any prior knowledge), they might be better served by taking an undergraduate ML class first.

    You also shouldn’t expect to unscrew a screw with a hammer. Use the right tools for the job.

    1. fajensen says:

      Maybe 80% of all machine learning, once one opens the magick box, is a linear regression applied after an algorithm has separated the data points into enough “high dimensional spaces” to fit a straight line either through the points in the space (regression) or between the points (classification).

      This means ML will work well for whatever one can do with statistics, it is not going to work well for problems that require insight.
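      A toy version of that “separate into higher-dimensional spaces, then fit a line” claim: XOR can’t be split by any straight line in its raw 2-D space, but after lifting into 3-D with one product feature, a plain linear threshold handles it. (Hand-built sketch; the weights below were chosen by inspection, not learned.)

```python
# XOR is not linearly separable in 2-D, but one extra product feature
# (a lift into a "higher-dimensional space") makes a linear threshold work.

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def lifted(x1, x2):
    """Map a 2-D point into 3-D by appending the product feature."""
    return (x1, x2, x1 * x2)

def linear_classify(features, weights, bias):
    """Plain linear threshold: 1 if w.x + b > 0, else 0."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0

# Hand-picked weights: x1 + x2 - 2*x1*x2 reproduces XOR on {0,1} inputs.
weights, bias = (1.0, 1.0, -2.0), -0.5
predictions = [linear_classify(lifted(*x), weights, bias) for x, _ in xor_data]
print(predictions)   # matches the XOR labels: [0, 1, 1, 0]
```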

      1. Yt1 says:

        I don’t think a random forest is linear regression at all.

      2. Kent Kemmish says:

        What’s the other 20%?

        1. fajensen says:

          I am just a hack, but, I would think that the “Reservoir Computing” techniques are not based on linear regression models, the classic Kalman Filter (the one ML-application that does “drive” with real people at scale, like f.ex. aircraft) works with differential equations, and “Stochastic Computing” is working with probabilities.

          In My Opinion – Intelligence via the current crop of Machine Learning will not be achieved even with still better algorithms, running in football-field sized server halls full of GPUs. These are just crude simulations.

          The real thing will, IMO, eventually emerge from using “better computronium”, like surveyed here:

          This is not to say that current Machine Learning is not useful, but it simply won’t figure out what a chair *is* and what objects can be used as a “chair”, a task that any 2-year-old can do!

    2. tommysdad says:

      Well, in this case the “people” are investors who do not know any better, listening to scientists pitching AI/ML as a “solution” for drug discovery. Who is at fault here? The liar or the believer of the lie?

  3. Mostafa says:

    You could have commented on VC approaches to AI/ML, which are no longer related to scientific impact. Scientists can’t ignore that trend.

    That’s what I did recently: AI for drug discovery as a Venture Capital meme – the case of Insitro:

  4. Comey says:

    This looks like a good review summary of vaccine immune parameters and the like, so it could be good blog substrate for you to report back on next week with your usual insightful and useful commentary.

  5. Druid says:

    One reason the overblown promises of AI & ML are hard to swallow is the implicit assumption, made by entrepreneurs completely ignorant of the skills and processes of drug invention and development, that those of us who practice them (and for once practice seems the right word) are so stupid and ignorant that it takes us decades to learn them. From my point of view, every lesson learned, including self-teaching, in biochemistry, cell biology, anatomy & physiology, synthetic, analytical and scale-up chemistry, patent law, toxicology, clinical trial design, regulatory affairs, ethics and health economics has been and is useful in helping to achieve this. Even if a laptop could do it in half the time, which it can’t, it would still take thirty years. A few old-timers around the place can be a good thing. The danger is when young senior managers get frustrated with the cost and their own lack of experience and are tempted to buy into unsubstantiated AI promises to shave ten years off the R&D process.
    I try to avoid saying a project will not work, because that is an easy bet, but I don’t mind saying it about artificial ignorance.

  6. En Passant says:

    … I can recommend this review of the 2020 literature by Pat Walters, which will take you through a number of important topics and where they seem to be heading.

    In my naive read of that review I found one thing conspicuous by its absence.

    I saw no indication of anyone using Generative Adversarial Networks (GANs) or variants on that method to generate lead molecules (or to do anything else).

    GANs are a relatively new AI technique or method. GANs are very computationally intensive. They can run a long time to generate any results at all. So maybe they are not a big candidate for funding in drug discovery.

    But with a simple search for such papers, I did find one popular-press example, a 2018 article from one company, Neurosearch, which is using a variant of GANs for lead-molecule discovery:

    Creating Molecules from Scratch I: Drug Discovery with Generative Adversarial Networks

    The link is:

    Given the relatively cheap (and still falling) cost of computational horsepower today, very high speed GANs should become even less expensive.

    Can anyone here with experience and expertise in the drug discovery field comment on the possibilities of using GANs to find candidate molecules?

    1. Mark says:

      GANs are fine as far as they go, but the problem is how to train them. If you’re building a GAN to generate faces, you can do that pretty well because you have an enormous corpus of data for the discriminator to work off, so the discriminator can get pretty good at distinguishing “face” images from “not face” images. The problem with GANs for generating molecules is basically what is the objective function? If the objective is just to build chemically sensible molecules, then there’s loads of training data and you can get a pretty good discriminator. However, that’s deeply unexciting. If the objective is, for example, to build molecules that bind against protein X, then as part of your GAN you need to build a discriminator that can distinguish between binders and non-binders, and there’s generally nowhere near enough data to do that. If your objective is to build molecules that are decent preclinical candidates, then your discriminator also needs to learn about pharmacokinetics, metabolism, cellular distribution, off-target effects, common tox problems and more, and there *really* isn’t anywhere near enough data to do that and probably never will be.

      So, the current state of the art is that GANs have been shown to be able to produce molecules with some binding ability to a protein of interest, with those molecules generally being actually rather similar to the known actives. That’s not going to set the world on fire.

      There’s also a related problem coming from the fact that we can’t in general build good algorithms for distinguishing binders from non-binders against a particular protein: virtual screening algorithms might get 10% hit rates on a good day. If you’re running that algorithm over sets of commercially-available compounds that you can get for $10 a pop, then having to screen 100 compounds to get 10 binders and maybe one or two that look exciting is perfectly feasible. If you have a GAN set up suggesting molecules for you, then all of a sudden you’re spending $10K per molecule or more for custom one-off syntheses, but still with the same hit rates. That gets really expensive really quickly.
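      The arithmetic in that last paragraph, spelled out (using the comment’s round numbers, not real prices):

```python
# Same 10% virtual-screening hit rate, very different cost per binder once
# every candidate needs a custom synthesis instead of a catalog order.
# Prices are the round numbers from the comment above, not real quotes.

def cost_per_hit(n_screened, price_per_compound, hit_rate):
    """Total spend divided by the expected number of binders found."""
    return (n_screened * price_per_compound) / (n_screened * hit_rate)

catalog = cost_per_hit(100, 10, 0.10)          # commercial compounds, $10 each
gan_designs = cost_per_hit(100, 10_000, 0.10)  # one-off syntheses, ~$10K each

print(f"catalog screening: ${catalog:,.0f} per binder")
print(f"GAN suggestions:   ${gan_designs:,.0f} per binder")
```

      A thousand-fold price difference per compound carries straight through to a thousand-fold difference in cost per binder, since the hit rate is unchanged.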

      1. En Passant says:

        Thanks for that overview for a non-chemist.

        As you noted, and as the article I linked also noted, the threshold issue of screening molecules that can actually exist has been fairly well solved.

        To belabor the obvious, it’s now clear to me that for the remaining (huge) candidate-selection problems, processor speed and I/O bandwidth are not the bottleneck; data is.

        Given the rapid success of Operation Warp Speed to develop vaccines, I would like to be optimistic about the development of the databases you noted are necessary for GANs to rapidly predict useful drug candidates.

        But in practice, data requires money, in copious quantity. About that, it is more difficult to be optimistic.

  7. Ken says:

    “Since no one can actually dock a billion virtual molecules into a protein target, how can we reduce the problem to something theoretically manageable?”

    This brings back fond memories of my theoretical computer science classes, where the question was just “can this be computed”. Why yes, yes it can; though not with the space, time, matter, and energy resources available within this universe…

    1. Christopher Ing says:

      I like the thought experiment, but we have to set the bar a bit higher in this case. There have been multiple teams who have docked libraries of greater than 1 billion molecules in the past year.

  8. Docker One says:

    Docking a billion molecules against a protein target in a couple days is straightforward and fairly inexpensive now. The Shoichet group published a 700M compound screen a couple years ago and OpenEye published on screens of billions last year. To the point though, quality of prediction and availability of high quality training data is a major challenge.

    1. MedChemist says:

      The problem with docking is not the volume, it is the quality. What conformations are used for the protein and the molecules, and are they relevant to in-cell or in-media conditions? How are energies estimated… I’ve seen so much heat produced in university computer clusters to compute completely useless dockings that just spat out the usual suspects that stick to everything…

    2. I may be wrong, but... says:

      Can’t you *screen* many million different compounds using DEL?
      For a large number of targets, this should be more than enough to deliver more hits than you can handle.
      So, why do virtual screening when I can generate actual data?
      Why look for a solution to a problem that doesn’t exist?

      1. tommysdad says:

        Oy, yet another example of a little knowledge being a dangerous thing.
        To do a DEL correctly is actually quite an endeavor. And if you want to do a DEL screen against a truly novel target looking at a specific binding site, good luck sifting through your hits.

        1. I may be wrong, but... says:

          I see… and AI has a great track record screening “against a truly novel target looking at a specific binding site”.
          Of course, testing the top x compounds from a virtual screen under these conditions is a piece of cake.
          Indeed, AI can do better than experimental screens for every type of target.
          Good luck!

        2. True! Doing DEL correctly is hard. However, it can be done against unknown targets and has yielded good results. I’d say that hard, undruggable targets are exactly where you want to use something like a DEL. They’re not magical, though, and it’s important to know their limitations. Now the real next step is doing a DEL and using that data to feed ML. We’re doing that at my company, and it’s yielding some interesting results.

      2. AIeee says:

        “For a large number of targets, this should be more than enough to deliver more hits than you can handle”

        Not in our hands it isn’t, or at least it hasn’t been against any of the targets we’ve run it on. Although the folks in our company who do this do not appear to realize it.

  9. Yuri Zavorotny says:

    AI Character/Personality

    I find that it helps to understand AI/ML/Neural Networks better by explaining its personality and operational principle in plain language. Below is my take on it.

    First, let’s clarify what a neural network is not: it’s not your classic (von Neumann) computer architecture, with its clearly defined components designed for step-by-step execution of algorithms.

    A neural net does not run algorithms or execute operations. It cannot be programmed. Rather, it can be taught to recognize a class of pictures (e.g. pictures with a bicycle in them). And yes, it is that specific — whatever data it works with, it treats as 2-D pictures.

    It cannot learn on its own — it has to be taught. Because it has zero capacity for knowledge and understanding, it needs to be told what to believe, what to accept as true.

    Otherwise, the process of training is simple: the teacher shows the NN a picture, asking it to guess whether it depicts a bicycle. Once the NN has made a guess, the teacher tells it the right answer.

    Note that the neural network doesn’t try to remember individual pictures. Instead, it maintains an idea/concept of what a bicycle should look like (and what it shouldn’t). Specifically, a concept is a collection of patterns and anti-patterns. Detection of a pattern suggests that a match is found. If an anti-pattern is detected, then it is likely “not a match”. Overall, it is blazing fast and very memory-efficient, especially for a design this simple.

    So what can we expect from a neural net in terms of its behavior?

    * Intuition. That is, essentially, what neural networks do. They never know anything for sure, so it’s always a guesswork based on experience. And since it is not a result of an algorithm, it is unexplainable… like that gut feeling.

    * Emotional responses in humans and animals, including empathy.

    * Qualia, a.k.a. Locke’s Simple Ideas, a.k.a. Kant’s intuitions; I prefer “concepts” — collections of patterns/anti-patterns matching a particular class. That’s why, for example, an attempt to describe what a “chair” is mostly results in frustration, even though you know what a chair is, don’t you?.. Well, actually you don’t — but you might have a pretty good idea of the “chair”.

    * Creativity — because intuition is guesswork, if the NN has no clear idea, it picks a probable outcome, or simply picks at random. Call it “a leap of faith” — a drunkard searching for the keys under the lamppost is one example. Assuming the existence of an objective reality we all share is another.

    * A sense of beauty, aesthetics. Its rational equivalent is the concept of “efficiency”; i.e., “beautiful” is what the NN perceives as “efficient” (in some cases, literally).

    * Subjectivity. Relying on its personal experience alone makes the NN 100% subjective. To the NN there is no such thing as “outside”, which makes objective reality just as incomprehensible.

    * Superficiality. Being an image-recognition system, the NN makes choices based on superficial appearances.

    * Irrationality. Always guessing and hedging its bets, the NN knows nothing and understands nothing.

    * Selfishness. NN = unexplainable AI. It cannot explain its own beliefs, much less understand any perspective other than its own. Being unable to reconcile its perspective with those of others leaves power struggle as the only conflict-resolution strategy.

    Call it the “beautiful” mind — and yes, you might know the type 🙂

  10. HAL Skynet coming for our jobs says:

    Derek, your analysis that AI would be more useful if it could predict clinical trial outcomes is correct but uninteresting. Medicinal chemists also “come up with mechanistic biochemical rationales, cell assays, animal assays, evaluation schemes for compound structures and physical properties, all sorts of things to try to increase our chances for success when the curtain goes up and the real show starts”. If I’m to accept your dismissal of AI, then I’d also need to agree that we should fire all of the experienced medicinal chemists. After all, there’s NO person or technology today that can predict clinical trial outcomes, which is why we run them. The question isn’t “What would be the most useful technology?” (since the answer would be something like time travel or teleportation), it’s “Where can we improve the current processes?”.

    If AI can’t predict clinical trials but it can out-med-chem the medicinal chemists, then it’s still useful, interesting, and scary!

    1. Derek Lowe says:

      Good points. But (1) at least the medicinal chemists aren’t always issuing press releases about how they’re suddenly going to leap past everything that they’ve done before, and (2) I’m not yet convinced that AI can even, as you put it, “out-med-chem” them either. Although I see no reason why it won’t be able to at some point, to be sure.

      1. metaphysician says:

        Amusing postulate: mankind succeeds at building an artificial superintelligence, with both general capacity and ramp-up. The AI is tasked with “solving” drug discovery. Its best workable plan:

        1. Use its strategic advantage to take over the world.
        2. Use its economic advantage to generate vast amounts of resources.
        3. Use those resources to support a vast increase in the number of human med chems working on the intractable problems of the field.

      2. TallDave says:

        yep eventually we’ll have full-scale molecular-level modelling tailored at least to an individual’s genome, if not to a model built against a detailed snapshot of their actual cells to some fantastic resolution

        then after a decade or so of clinical trials that exactly match model predictions we might be able to stop relying on clinical trials quite so much

        then treatment development could become virtual and happen in weeks instead of years

        of course “eventually” will take quite a while

        today it seems we can’t even simulate a cell in toto, except in the general sense of understanding the inputs and outputs of organelles and the basic chemistry involved in their operation

        for now still locked firmly in the prison of physical experimentation

  11. Marko says:

    FDA Authorizes First Machine Learning-Based Screening Device to Identify Certain Biomarkers That May Indicate COVID-19 Infection

  12. Actual PI says:

    Ya, no one cares about AI and drug discovery. Come up with a new idea, thanks.

  13. Sulphonamide says:

    Any thoughts on how many years/decades we are from having that fabled in silico library of all 10 to the power of 60 (or whatever) drug-like molecules and being able to fire them through an accurate model of how our target looks, in the conformation we want to hit, in a reasonable time frame (over a good lunch), and gain sufficiently accurate docking scores that we know which 10 molecules to tell our robot to make? No sign of increases in computational power stagnating any time soon…

    Or is this so fundamentally out of reach (…like time travel) that it is in fact much more likely that a completely new / nascent approach (e.g. targeting mRNA rather than proteins) will sweep all our current technologies aside before it can ever be realised (and the image of medicine in 2121 I teach – if we can understand it, we can cure it; all we need is a target – is indeed nothing more than the light relief it is intended to provide).

  14. Bad family says:

    For some reason there are a lot of people (specifically parents) who are saying that they will not get the vaccine because they read something about how ivermectin treats COVID more safely than a vaccine. I know this is nonsense, and I remember reading some scholarly articles earlier on how it was one of the “hopefuls” but fell through. I do not know where they are getting their info from, but more and more of my family members are asking for a prescription for it and saying it’s safer than the vaccine. I was wondering if you would speak to this when you get a chance. I know it’s absurd. Sign of the times, I guess.

    1. fajensen says:

      Machine Learning works very well on the problem of finding suckers and matching them up with exactly the kind of disinformation they will buy into.

      And I am not even sure about the “suckers”: Intelligent people will unreason themselves into believing the most ridiculous things!

  15. milkshake says:

    I have a biologist friend who went to a Big Data company associated with a major cancer clinic. The idea was to build a database of every piece of data on individual patients in the clinical trials at the center, complete with preserved biopsy samples, and then sell subscriptions. This would have had a real chance of improving the outcome of future trials, by letting third parties learn from past trials in fine detail, even the unpublished failed ones.

    Then it turned out that their fancy datasets were only as good as the last unmotivated drone doing the data entry and sample preservation, and those people were hired through a contractor who won the contract by making the lowest bid. There were cases of individuals working from home who faked data entries wholesale (many data entries are repetitive) because they were paid by the entry and there was no quality control. Tissue samples were poorly preserved in many cases, and so on.

    1. Thoryke says:

      The devil is in the details for data _and_ metadata! If you aren’t careful about both, the applicability of anything that gets “found” will be suspect….

  16. TallDave says:

    drugs are the tip of the iceberg

    coding space available in protein combinations is mind-boggling

    sadly still quite a ways away from being able to do much with it

    but oh the potential optimizations that must be lurking in there

    waiting to be mined from the computational ether like bitcoin

    conceivably grandkids might enjoy a world far beyond senescence

  17. Chris Phoenix says:

    Sounds like a new approach to human toxicity, one that made it faster and cheaper to assess failure, could be useful. Here’s a wild idea for a more sensitive poison detector:

    Do a bunch of whole-transcriptome observations of animals being healthy vs. very mildly poisoned with poisons where we know the mechanism. Plug that into an ML model and try to train it to recognize the very mild signs of that mechanism. For initial validation, do this for just one mechanism.

    Then test this on more chemicals (known to us, but not the model) at various doses and see how accurate and sensitive the model is at predicting toxicity. Probably throw out this idea. But if it works, go back and try other mechanisms, and then poisons where we know the symptoms but not the mechanism.

    If you find that transcriptome-plus-trained-model gives a useful indication of toxicity at sub-damaging doses in general, then, where possible, gather similar data in humans and adjust the model. (We poison ourselves with alcohol all the time – surely there are lots of poisons we could ethically do this with.)

    If this worked, how much time/money would it save?
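    For what it’s worth, the core of the idea above — learn a mild-poisoning signature from expression profiles — can be sketched with a plain logistic classifier. Everything here is synthetic and invented for illustration (the 6-“transcript” profiles, the nudged transcript, the dose effect):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.5, epochs=200):
    """Fit a logistic model: expression vector -> P("mildly poisoned")."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Gradient of the log-loss for this one animal's profile.
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

random.seed(1)

def transcriptome(poisoned):
    """Synthetic 6-transcript profile; poisoning nudges transcript 2 up slightly."""
    x = [random.gauss(1.0, 0.05) for _ in range(6)]
    if poisoned:
        x[2] += 0.3   # the mild, mechanism-specific signature
    return x

# "Healthy vs. very mildly poisoned" training animals, mechanism known.
samples = [transcriptome(i % 2 == 1) for i in range(40)]
labels = [i % 2 for i in range(40)]
w, b = train(samples, labels)

def p_poisoned(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

print(p_poisoned(transcriptome(False)), p_poisoned(transcriptome(True)))
```

    In practice the classifier is the easy part; the hard part is the data — clean, consistently collected transcriptomes at genuinely sub-damaging doses, for enough mechanisms to matter.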
