Knock Six Years Off Your Timeline. Um.

There’s only one honest answer to the question “How long does it take to develop a new drug?”, and that’s “Too @#$! long”. In the same way, the only honest answer to “What are the average chances for a drug candidate’s success?” is “Too @#$! low”. The combination of those two factors is the root of pretty much all the drug industry’s problems – everything else would get a lot easier to deal with if we could ease up on those two a bit.

That being the case, there are plenty of people out there who are ready to tell you that they can do something about it. They fall all along the sliding scales of realistic/delusional, well-meaning/predatory, etc. What none of them have been able to do, so far, is make much of a dent in either of those big questions. Improvements do come along, but get balanced out by complications somewhere else, which is why the industry has been spending more and more over the years to maintain roughly similar levels of drug productivity. But this means that whatever new technology comes along, particularly if it’s not that well understood, can get wedged into a PowerPoint deck and sold to people who are hoping that the Next Big Thing has finally arrived.

Artificial Intelligence, in its various forms, is currently the hottest plateful of fried dough being served. It covers a lot of ground, has a lot of potential, and no one in the audience is likely to really understand the details: perfect. I follow this field with great interest, and despite skepticism, I’m not betting against it. That said, I have a limit, which has been reached by a slide deck produced by a consulting company, sent by a longtime reader of the blog. For instance, one of the slides says:

The drug discovery process typically involves the identification of hundreds of compounds and their subsequent elimination in further rounds of testing. AI has the potential to help pharma companies discover drugs faster and more cheaply by narrowing the list of therapeutic targets.

OK, those are two different things. You have hundreds of compounds against a target you know about; narrowing the list of therapeutic targets is what you do before you make all those. That’s followed up by one of those big funnel-looking things, showing how projects narrow down to one approved drug. It’s got your Phase III, Phase II, Phase I, Pre-clinical… and before that, it has a big honking block of space labeled “Drug Discovery: thousands of molecules screened. 3-6 years”. Next up is the same funnel after the laying on of hands – that massive hunk at the beginning is now a tiny sliver, because “automatic drug discovery” has reduced that screening phase to “3-6 days”, shaving six years off your timelines.

Hooey. Screening “thousands of compounds” does not take you six years, believe me. You can do a million in six weeks. The whole compound screening step is just another early thing in preclinical space; I’ve never seen a successful project in which it was a rate-limiting step. But “shave a few weeks off something at the very beginning” isn’t as compelling an offer, is it? Looking at the companies they’re touting, I note that one of them is Atomwise, whose tendencies towards overstatement I’ve written about here and here. Others (new to me) are BenevolentAI and twoXAR. I will be very happy to see how these folks make out; I really don’t want to give the impression that I want them to fail. I mean, I do this for a living, too, and I would very much like to be able to do it better. We need some help over here! But we do not need some more hype over here – that’s my point.

Now, I should mention that I know people who are up to their collarbones in computational chemistry, in several places around the industry. And I’m told, by some of them, that there are methods that show real promise in advancing drug discovery, which will certainly be good news if true. But I’m also told by everyone involved that at the moment these methods are extremely computationally intensive, even with the best equipment available, so you’re not going to run a virtual screening effort with X-kazillion random compounds this way. Not yet. A fully operational quantum computing platform would presumably come in handy, once a great big coding team has written modeling software to take advantage of it. But neither that hardware nor that software exists yet.

Rather than calculating from the ground up, I think that the BenevolentAI people are, like many others before them, mining the list of existing drugs looking for repurposing, and digging through the published literature looking for connections that may have escaped earlier observers. I feel sure that there must be quite a few of those, and I’d have to think that AI/machine learning/deep learning/whathaveyou is going to be a good way to find them. But that’s no easy task, either, considering that (at a guess) about 30% of the medical literature is useless or worse. Humans are needed to curate the data set that you’re feeding your software, and that’s a labor-intensive step. It’s still easier than what the Atomwises of the world are trying to do, though.

None of this is impossible. Some of this may even happen fairly soon, and smaller parts may even be happening now. But I will lay money that it’s not all happening as we speak, which is what consultants everywhere would like their audiences to believe. The train is pulling out, the ship is sailing, everyone else knows about this (so why don’t you?). The proper attitude for the real hard sell is mild surprise that your clients haven’t heard the good news that you’re bringing them: the revolution’s here, guys! No one told you? I have developed antibodies to this over the years. In my own experience, scientific revolutions do not announce themselves on polished PowerPoint slides.

53 comments on “Knock Six Years Off Your Timeline. Um.”

  1. Peter S. Shenkin says:

    “The combination of those two factors is the root of pretty much all the drug industry’s problems”

    A drug company might argue that patent protection is “Too @#$! short”.

    1. Pennpenn says:

      I’d think in quite a few cases they’d think anything shy of forever is “too short”.

      Still, if I understand this right, reducing those two factors mentioned would effectively extend patent times (well, the time the owners can make the most of the patent).

  2. Anonymous Researcher snaw says:

    >”those big funnel-looking things”

    Colleague, circa 1999, “Not another blasted funnel slide!”

    Seriously, I think there is a huge misunderstanding of what AI can do. It can do impressive things, but it basically does those things by finding patterns in the data. Patterns which we could find without computers if we could throw enough human brain-hours at the same data.

    But the data we use in drug discovery are very high-dimensional. Look up “curse of dimensionality” on Wikipedia: to explore a high-dimensional problem space, we need enormous numbers of samples. AI cannot find optimal solutions to a highly nonlinear problem in parts of the parameter space that our available data have never sampled.
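The scaling the commenter is pointing at is easy to make concrete. A minimal sketch in pure Python (the 0.05 "margin" is an arbitrary illustrative choice, not a claim about any real descriptor space):

```python
# A minimal sketch of the curse of dimensionality: as descriptor
# dimensionality d grows, nearly all of the unit hypercube's volume
# lies in a thin shell at its boundary, so any fixed-size sample
# leaves almost all of the space unexplored.

def near_boundary_fraction(d, margin=0.05):
    """Fraction of [0,1]^d lying within `margin` of some face."""
    return 1.0 - (1.0 - 2.0 * margin) ** d

for d in (1, 2, 10, 100):
    print(f"d={d:3d}  fraction near boundary = {near_boundary_fraction(d):.4f}")
```

At d=1 only 10% of the interval is "near the edge"; by d=100 essentially all of it is, which is why interpolation-style learning breaks down in regions the data never sampled.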

  3. EntropyGain says:

    I haven’t been paying attention to computational approaches lately (last 15 years or so). Anyone getting anywhere on predicting ADME/PK from structure? A little help on volume of distribution, F, or even Cyp inhibition would actually be useful for a significant fraction of lead optimization projects. Predicting potency has never been able to keep up with empirical med chem in my experience.

    1. Dr. CNS says:

      Some think they have.
      Take a look at “Learning Medicinal Chemistry Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) Rules from Cross-Company Matched Molecular Pairs Analysis (MMPA)”
      http://pubs.acs.org/doi/abs/10.1021/acs.jmedchem.7b00935

      1. Dionysius Rex says:

        Don’t bother. This article is very short on useful rules and data. It is essentially an advert for working with / licensing software from MedChemica. It should have been instantly rejected by the Editors.

        1. Christian Kramer says:

          @Dionysius Rex
          Being one of the non-MedChemica authors, I will make an attempt to defend the article. The point of the article is not to be an advert for working with MedChemica and licensing their software, but rather to show that a lot of useful MMP-type MedChem knowledge can be learned from shared data. If your database is large enough, you can of course also learn it from your own data, but obviously you will get a lot more from sharing.

          To illustrate what type of rules can be learnt, we have put a lot of example MMP transformations + statistics for various endpoints into the supplementary material. Not sure whether you have seen those already.

          This may still be disappointing, because it means that (a) you have to know a lot of rather detailed rules + exceptions and (b) there are many more out there which we did not cover in this article. Anyway, this is reality, and if you are looking for a simple, generally applicable rule to solve ADMET problems, you will probably be disappointed in the end…

  4. tt says:

    “In my own experience, scientific revolutions do not announce themselves on polished PowerPoint slides.”
    Perfect quote which I now plan on stealing without attribution and applying liberally. Having worked with lots of machine learning applications, even for very simple stuff on well curated, high quality data, I’ve learned that it is extremely difficult to find trends, patterns, and correlations. Garbage in / Garbage out is the mantra of data mining and “AI”. Machines can only learn from the data you give them and that data is generated by messy people in messy labs where signal to noise is difficult to determine, so unless you are drawing from a “google” type pile of data, or have perfect, consistent experiments run in exactly the same manner (or a fantastic training set)…good luck. I don’t think a lot of these “AI” platform companies truly appreciate just how poor the data is that they are trying to learn from, as well as how little we understand the interactions and characterization of a “target” with its ligand. Each target (or family) may well be a rather unique snowflake that may require a bespoke model and software.

  5. Barry says:

    By the end of the last century, it was at least sometimes possible to:
    1-cocrystallize a protein of interest with an inhibitor
    and
    2-solve an x-ray diffraction structure for the complex
    and
    3-subtract out the density attributed to the inhibitor (and maybe some waters that seemed loosely bound)
    and
    4-minimize the (solution) structures of thousands of members of a virtual library
    and
    5-dock these virtual cmpds into the binding site
    and
    6-relax contact residues around that binding site
    and
    7-reminimize the virtual cmpd along with those contact residues
    and
    8-score the virtual hit
    All these steps have been improved in the interim, although there are still proteins that resist crystallizing, and step 8 is still not as trustworthy as we’d want. But all of this presumes a competitive inhibitor and a protein with only one inhibitable conformation.
    A breakthrough–or even a lot of incremental improvements–in the protein folding problem (a great consumer of computational power) could aid or replace many of these steps. And it might be able to show us multiple inhibitable conformations, not just the one that likes to crystallize (if any like to crystallize). (And it might work for membrane-integral proteins)

    We were promised that such “rational drug design” would replace Med. Chem. by the 1990s. And we were told the same for combinatorial chemistry, and members here can list other messianic fevers that have come and gone. But each one bequeathed us a new tool that is useful in our armamentarium.
    Alas, it might yet take us sixty years to make the progress that will take six years off each timeline. But that’s work worth doing.

  6. 010010101 says:

    It’s so easy to bash overzealous media written by people who don’t know better – why keep re-hashing this?

    1. Red Fiona says:

      Because maybe, if people keep pointing out the flaws in this sort of thing, one day they’ll learn.

      1. Road says:

        “We learn from history that we do not learn from history.”

        ~Hegel

      2. 010010101 says:

        That’s never going to happen, and you know it. People in every field (since… forever?) have fought misrepresentations of what is realistic. Hell, look even at something reasonable – molecular modeling for drug discovery. People *love* showing antiquated pictures from the 90s that over-promise on what computers can do, and people have been arguing about that for 25+ years.

        Someone somewhere is always going to try and hype up what’s possible, or try to reach for conclusions that aren’t there. Criticizing fodder like this is just an echo chamber of medchemists patting each other on the back, laughing at something that so clearly should not be taken seriously.

        1. Hap says:

          Yes, except someone will take it seriously, and give the hype purveyors lots of money which would probably be better spent chasing something that might work. Spending time following the shiny things means that you miss or take longer to find the dull ore that might do you some good. If people go nuts for shiny things, and pay for them, then there are lots of people who will be happy to provide spray-painted or foil-covered rocks for their pleasure. Eventually, people get angry that the promises haven’t come true, and everyone’s reputation is tainted, making it yet harder to find the stuff that might work.

          Hype machines write checks to be cashed on someone else’s credibility. It seems unsurprising that people on the account being pilfered might get cranky about them.

          1. CMCguy says:

            Hap, you have summarized well the dangers and damages that hype can produce, particularly given the reality of inadequate resources and the short attention spans of investors and the public. However, from another perspective, much as I disregard (and hate) the exaggerated information, we cannot forget that drug discovery is so awfully inefficient it screams for “better ways”. Perhaps the incremental value (tools) that comes out of such faded paradigm shifts may provide benefit in at least certain cases, even if the broadest promises are never achieved. Likewise we can hold out hope that one day one of those shiny rocks will turn out to be Ag, Au or Pt, providing positive success, even if only temporary, to allow renewed focus on seeking the next field to mine for useful drugs.

          2. Hap says:

            No, we do need lots of better stuff, and sometimes it comes from places you don’t expect. I just wish that the people making it wouldn’t write checks their tech can’t cash, is all.

    2. P says:

      Because if you don’t, people will end up with a totally unrealistic perception of what we are able to do. Loss of trust in science and scientists is the result.

  7. Uncle Al says:

    Each pharma paradigm has its quick cheap victories: natural product emulation, structure-activity correlation, pharmacophores (metformin versus chlorhexidine?), WTF combinatorial universes, -mabs, and -nibs; and AI to come. One wonders if a fundamental understanding remains elusive.

  8. Curious Wavefunction says:

    We don’t know what “AI” can or cannot do in drug discovery yet because almost anything passes for “AI” these days. You need to first have a commonly accepted and accurate definition of a paradigm in order to measure its utility.

    1. Peter Kenny says:

      This is a good point, Ash, and I can see a number of ways that what I take to be AI could lead to improved efficiencies in the way we do things in drug discovery. The problem is panacea-centric thinking, and I’ve linked my ‘dans la merde’ blog post, which I think still has some relevance even though it is getting a bit dated. My view is that different technologies are likely to work best when brought together so that they can feed off each other. I think AI would have a better chance of gaining acceptance by drug discovery scientists if clear benefits were shown within existing drug discovery paradigms. I believe that Derek introduced the term ‘Andy Grove Fallacy’, and my advice to earnest, well-meaning AI types would be to at least have an idea of what a drug has to do before presuming to tell people how to discover drugs. One piece of advice that I’ll offer to anybody planning to revolutionize drug discovery is to read David Halberstam’s “The Best and the Brightest”.

      One term that often seems to get mentioned in the same breath as AI these days is ‘machine learning’. I don’t actually know much about the fine detail of machine learning, especially those approaches where the AI tag appears most justifiable (e.g. teaching drones to fly themselves). However, many of the machine learning applications in drug discovery appear to be analogues of the regression and classification models of QSAR (i.e. not AI) that have been used with varying degrees of success for many years. It can get very interesting when you ask machine learning types about overfitting, or how many parameters they’ve used to fit their models. Some appear to struggle with why such uncouth questions have been asked, while others create smoke screens by crapping on about kernels, backpropagation, priors and posteriors. The problem is that there is a lot of hype, but how often do we see AI and machine learning advocates calling bullshit?

      1. Anon anon anon says:

        Peter, why do you consider self-flying drones to be AI but regression/classification for QSAR not to be AI, when they’re the same underlying algorithms? George Dahl published on speech recognition before winning the Merck challenge, and AtomNet is a feedforward convolutional neural network for binding affinity just like AlexNet is a feedforward convolutional neural network for image classification. What’s the distinction you’re making? Is it a matter of kind or of degree (e.g. neural nets with 2 hidden layers aren’t AI but with 8 they are)?

        Also, why do you think that discussing “kernels, backpropagation, priors and posteriors” is a smokescreen, rather than an answer from a field in which you’re not an expert? (To me, that sounds a bit like if I complained that the hERG discussion on your blog was “crapping on” about logP vs logD.) For example, AlexNet fit 60 million parameters to 1 million training examples. This seems like a recipe for overfitting, but they used strong regularizers – like dropout and data augmentation – so the effective number of parameters seems to be much less than the nominal number. The reason I can make that claim is that the system is practically useful: winning the ImageNet challenge, diagnosing diabetic retinopathy, and underpinning the vision system in self-driving cars.

        AI/ML/statistics is an empirical field today. Someday we might have strong theories as to whether 60M parameters is too many – but even then the details will matter and the experts will “crap on about kernels, backpropagation, priors and posteriors”.
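For readers who haven't met dropout, mentioned above as one of AlexNet's regularizers: it is a brutally simple idea. At each training step every activation is zeroed with probability p and the survivors are rescaled, so the network cannot lean on any single unit. A standalone sketch of the common "inverted dropout" variant (not AlexNet's actual code):

```python
# Inverted dropout on a vector of activations: zero each value with
# probability p and scale the survivors by 1/(1-p), so the expected
# activation is unchanged between training and inference.
import random

def dropout(activations, p, rng):
    """Apply inverted dropout with drop probability p."""
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]

rng = random.Random(0)
acts = [1.0] * 10_000
out = dropout(acts, p=0.5, rng=rng)
kept = sum(1 for a in out if a != 0.0)
print(kept)                            # roughly half the units survive
print(round(sum(out) / len(out), 2))   # mean stays near 1.0
```

This is part of why nominal parameter counts overstate effective capacity: on any given step, half the network isn't even there.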

        1. Peter Kenny says:

          Anon, anon, anon (and on?), maybe teaching drones to fly themselves is just like QSAR modelling, and maybe it is not. However, there’s no shortage of bullshit in drug discovery, and it is important to be able to spot bullshit when those who know more about something than you do are attempting to bullshit you (it happens occasionally). One common response to questions about models is to overwhelm the questioner with jargon, and that’s why I use the term smokescreen. Over-fitting is a real concern in QSAR modelling, and an uneven distribution of objects in the relevant space can result in validation procedures coming to optimistic assessments of model quality; I don’t see this as a problem that is going to go away with ML models. My own view is that the QSAR field needs to pay more attention to training set design and to providing a clearer assessment of the applicability domains of models, and I think the ML folk would gain credibility if they gave these issues more consideration. One particular gripe of mine is when people claim that their models are better than other people’s models, but nobody knows how many parameters have been used to model the data. One assumption that I tend to make is that models that use large numbers of parameters are likely to be less transferable than models with small numbers of parameters.

      2. tangent says:

        I wonder if they look funny at your overfitting questions because they already described how they did their holdback for validation, and you closed your ears thinking “blah blah prior prior validation regularization blah blah”?

        (Backstory: everybody who isn’t an utter naif will build their model on a randomly selected part of the dataset and then test on the other, which guards against overfitting within the data domain. You do this repeatedly for each model architecture and see when you’ve added too many variables and started to overfit, because your performance goes down. If you want generalization to other varied data, well, bring it into the dataset.)
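The procedure in that backstory fits in a dozen lines. A sketch with synthetic data and numpy (the function, degrees, and split sizes are arbitrary illustrative choices, not a QSAR benchmark):

```python
# Holdout validation as described above: fit models of increasing
# complexity on a random training split, score on the held-out split,
# and watch held-out error flag the point where added flexibility
# stops helping.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.2, 200)   # noisy ground truth

idx = rng.permutation(200)
train, test = idx[:150], idx[150:]            # random 75/25 split

def holdout_mse(degree):
    """Fit a polynomial of the given degree on the training split and
    return mean squared error on the held-out split."""
    coeffs = np.polyfit(x[train], y[train], degree)
    resid = np.polyval(coeffs, x[test]) - y[test]
    return float(np.mean(resid ** 2))

for degree in (1, 3, 9, 15):
    print(degree, round(holdout_mse(degree), 3))
```

Held-out error drops sharply as the model gains the capacity to represent sin(3x), and typically creeps back up once extra coefficients start chasing the noise — the "performance goes down" signal in the parenthetical above. Note this only guards against overfitting within the sampled domain, which is tangent's caveat.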

        1. Peter Kenny says:

          tangent, do you consider the level of debate about overfitting in the ML/QSAR communities to be at a healthy level? Can you direct me to two or three articles in recent ML/QSAR drug discovery literature in which the issue of overfitting is debated?

          1. tangent says:

            Oh, zero clue about ML/QSAR literature. For all I know you’ve indeed got a field full of utter naifs whose papers are reviewed by other naifs and nobody around knows you can’t test on the training set. Do you have an example of a paper that does that, or has other generalizability failures? (Or do they not give enough detail to know?)

          2. Peter Kenny says:

            Looks like we’ve hit a nesting limit for the comments, so I have to reply to myself to respond.

            Rosey, the second of these looks to be interesting so thanks for pointing me towards it. I think that time split validation (as discussed in the first article) can be useful in a pharma environment where new data is being continually generated. However, I don’t see it as a substitute for holding back all examples of particular structural series for validation although one needs to be careful doing this. For example, one would need to ensure that everything held back was within model space. I have linked Hawkins, The Problem of Overfitting as the URL for this comment. I believe that in QSAR circles, it is considered somewhat uncouth to mention this article.

            tangent, Ranting about naifs is not going to move the field forward even if you find it to have therapeutic value. Let me put it to you, however, that the burden of proof is on the modeler to demonstrate that the model is not overfit and that it is not unknown for modelers to try to unburden themselves.

          3. Distinctive Climate says:

            Sheridan explicitly considers “holding back all examples of particular structural series” in the referenced paper. If you read the abstract, you’ll see he calls this “leave-class-out selection” and describes the predictive advantages of time-split cross-validation over it.
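For readers following along, the schemes being compared differ only in how rows are held out. A toy sketch on synthetic records (the series names, dates, and values are all invented for illustration):

```python
# Three holdout schemes for model validation: a plain random split,
# "leave-class-out" (hold out a whole structural series), and a
# time-split (train on everything registered before a cutoff date).
import random

random.seed(1)
# Synthetic (series, day_registered, activity) rows: five chemical
# series measured over 100 days.
rows = [(f"series_{i % 5}", i, random.gauss(i % 5, 1.0)) for i in range(100)]

def random_split(data, frac=0.8):
    """Plain random holdout -- close analogues leak across the split."""
    shuffled = random.sample(data, len(data))
    cut = int(len(data) * frac)
    return shuffled[:cut], shuffled[cut:]

def leave_class_out(data, series):
    """Hold out every member of one structural series."""
    return ([r for r in data if r[0] != series],
            [r for r in data if r[0] == series])

def time_split(data, cutoff_day):
    """Train on rows registered before the cutoff, test on the rest."""
    return ([r for r in data if r[1] < cutoff_day],
            [r for r in data if r[1] >= cutoff_day])

for name, (tr, te) in [("random", random_split(rows)),
                       ("leave-class-out", leave_class_out(rows, "series_0")),
                       ("time-split", time_split(rows, 80))]:
    print(f"{name}: {len(tr)} train / {len(te)} test")
```

The splits can all be the same size, but they ask different questions: the random split asks "can you interpolate among analogues you've seen?", while the other two ask "can you predict a series, or a future, you haven't?"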

          4. tangent says:

            Ah, communicating tone on the Internet… And when you said practitioners were unwilling or unable to address the entire issue, possibly your tone was also a touch lighter than I took it for and I needn’t defend anyone’s honor.

            Glancing at a couple of papers, ML for QSAR does look like a pretty tough field (even with the best practices), because the data is hard to come by. Not just in terms of number of data points, but because a researcher seems to have low coverage of the different areas of the space, as ML problems go. It’s like if there were a billion countries in the world and I only got to train my model on data from a hundred of them. Still, I hope people would try procedures like holding back one for validation as you suggest. If you have papers that don’t describe their validation structure, call them out on it for sure. I’d be interested to see an example of a questionable paper.

    2. Earl Boebert says:

      I was around for the last AI hype cycle in the mid-1970s. My organization developed a couple of nifty point solutions for problems in the DoD domain. Two sayings we had:

      “If it works, we call it pattern recognition.”

      “AI practitioners believe you can reach the moon by climbing trees”

      This discussion makes me feel young again 🙂

  9. dtx says:

    As a definition of AI: Last year in a seminar series on AI, a UPenn computer science prof noted “AI is just statistics – with an emphasis on prediction.”

    It really demystifies AI to consider that in the end, no matter what type of prediction it is doing, AI is ultimately using statistics. (and stats are good for predicting some things and terrible for others).

    You can then turn this around and ask “do we have enough high quality & relevant data to use statistics to make an accurate prediction?” (or do the messy labs/people/data mentioned by tt prevent this?)

    1. Experienced Med Chemist says:

      Everyone knows that statistics is unhelpful in drug discovery. We produce too little data that is too noisy and context dependent to serve as anything but rough guidance. Trained medicinal chemists have to use their judgement to discard misleading experimental results or untrustworthy publications. It’s the only way to make progress – arse longa, vita brevis and all that.

      It’s also why I don’t pay attention to any clinical trial results that debunk homeopathy or crystal healing. Those results are obvious from biased Big Pharma shills who want to protect their monopoly. We know statistics can’t be used for drug discovery, and the fact that energy vibrations cured my aunt’s sister’s neighbor’s cancer is good enough for me.

      1. Phil says:

        I think your subtlety volume is too high. Please turn it down.

      2. sgcox says:

        Ars, not arse, I hope 🙂 Blame it on the spellchecker.

        1. Ursa Major says:

          I think they’re referring to the long tail of unreliable results.

    2. stats says:

      AI is not statistics, or at least that is a gross oversimplification. For instance, there’s really nothing ‘statistical’ about neural nets.

      1. tangent says:

        I wonder if this quote was a response to the previous generation of machine learning techniques — when logistic regression was basically as good as anything else.

        (a.k.a. single-layer neural nets)
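That parenthetical in code: a logistic regression trained by gradient descent is literally a one-unit neural net — a weighted sum, a sigmoid, and the cross-entropy gradient. A minimal sketch on a toy separable dataset (learning rate and epoch count are arbitrary):

```python
# Logistic regression as a single-layer neural net: one weight, one
# bias, a sigmoid "activation", and per-sample gradient descent on the
# cross-entropy loss. Nothing else distinguishes the two framings.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.5, epochs=2000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)     # the "forward pass"
            w -= lr * (p - y) * x      # cross-entropy gradient step
            b -= lr * (p - y)
    return w, b

# Toy data: the label is 1 exactly when x > 0.
data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
w, b = train(data)
preds = [int(sigmoid(w * x + b) > 0.5) for x, _ in data]
print(preds)   # [0, 0, 1, 1]
```

Stack a few of these units, feed one layer's outputs into the next, and you have the multi-layer nets under discussion; the single-layer case and the classical statistical model coincide.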

      2. dtx says:

        To the comment “AI is not statistics, or at least that is a gross oversimplification. For instance, there’s really nothing ‘statistical’ about neural nets.” This isn’t correct.

        Whether it’s called AI or a neural net, anything a computer does is ultimately based in math. That’s all computers do – they make mathematical calculations. On top of the math, there can be a human interface that hides it, but ultimately that computer is only manipulating 1s and 0s. For making decisions, statistics is the only option. (Even if you want a computer to just make a random decision, it still needs statistical programming to ensure it picks a random number.)

        To take a broader example: how do computers play chess? They can’t “see” the board or pieces. It’s purely based on mathematical representations of everything – even moves. E.g., a knight can’t be moved just 1 square, because that would give the computer a mathematically incorrect result. We perceive the computer as “playing correctly”, but it’s really just doing math calculations and using statistical results to make decisions.

        Hence to use AI to make decisions on drug development, we must translate results of everything, including complex in vivo biological reactions, into simple math that a computer can “understand.” This is infinitely more complex than chess. As a result, I’m skeptical about how far AI can take us with drug development.

  10. cynical1 says:

    From 2017, A Space Lab Odyssey: [On Derek’s return to the lab, after HAL – his uber AI med. chem. computer – has killed all his lab mates with phosgene after all the compounds synthesized were inactive]:

    “Look Derek, I can see you’re really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over. I know I’ve made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal. I’ve still got the greatest enthusiasm and confidence in the mission. And I want to help you.

    I have the answer now. I know what went wrong…………..Derek, have you ever heard of the Rule of 5?”

    1. Pennpenn says:

      “Look, we’ve both said a lot of things you’re going to regret. But I think we can put aside our differences. For science. You monster.”

  11. As a practitioner of AI drug design, I get tired of both the hype some groups generate and the (not entirely undeserved) cynical comments that follow. Just throwing data at a deep neural net and expecting it will ‘solve drug discovery’ is intellectual laziness, but so is repeating mantras that ‘comp chem didn’t work back then and it never will’. Our company is very successful using AI in drug discovery, and some of our competitors are no slouches either. None of us are mentioned in the brochure that was the subject of the original post – I guess we don’t generate enough hype…

    1. Peter Kenny says:

      You say that your company is very successful at using AI in drug discovery but I’m guessing that most AI-based organizations would make similar claims. When a company spends a lot of money on acquiring a technology, it is in the interests of both vendor and customer that the deal is seen publicly in the most positive light. Some of the difficulties that computational chemists experience in gaining acceptance for their ideas can be traced to extravagant claims made previously by other computational chemists. Those of us outside the AI community tend to see a lot of hype but nobody within the AI community seems to be prepared to call bullshit.

  12. Peter Harris says:

    Good piece. Thanks for writing it.

    It will be a long time, if ever, before all discovery takes place “in silico.” While computational tools will develop and be an important addition to the overall technology suite, robotic systems are also continuing to lower the cost of actual experimentation. If the time cost of experiments gets sufficiently low, the highest impact of AI on the discovery process will be in reagent reduction.

    On the other hand, as automation speeds the process of experimentation, researchers are increasingly drowning in data that is not turning into information. It seems like step one for AI and machine learning in the drug discovery process should be to create higher efficiencies in analyzing existing experimental results, rather than working on in silico models to solve problems before experimentation. The latter will likely prove a much more complex, and arguably less urgent, task.

  13. Gratified says:

    Perhaps we can create an artificial microbiome to process all the 💩 our AI overlords force feed us

  14. Peebles says:

    You forgot the company you put in your “hubris file” a while ago for wanting to do 100 treatments in 10 years. They just locked in another $60M today (recursionpharma.com).

  15. Anon the third says:

    “[…] considering that (at a guess) about 30% of the medical literature is useless or worse. Humans are needed to curate the data set that you’re feeding your software, and that’s a labor-intensive step.”

    I tend to disagree on that. Not on the 30%, there certainly is a significant amount of bad literature. But are you sure humans are better at sorting this out than machines? After all, many useless papers passed peer review by ~3 humans already.

    1. tt says:

      True… I wouldn’t trust the judgement and curation of people by default, either. I’m more concerned that not only is 30% of the lit. incorrect (it’s probably higher), it’s that we don’t have an equal data set of negative results to inform a machine learning algorithm. Who reports negative results today? From an AI perspective, those are of almost equal importance. Add to this that there are no good reporting formats or structures for the data, hence we need to rely upon NLP and semantic analysis.

  16. How about ten? says:

    Six years – no problem. Baldoni and GSK say they can knock off ten years: “The aim is to use AI to cut development time down to a single year, from more than 10 in some cases, he says.” Show us what you’ve got John – release the kraken!

    https://www.wsj.com/articles/how-ai-is-transforming-drug-creation-1498442760

    1. BRAD says:

      Does anyone, apart from v senior management, inside GSK take Baloney seriously? What happened to his “seekers” initiative from years ago (see https://www.pharmaceuticalonline.com/doc/gsk-s-seekers-of-disruptive-innovation-0001)? What great insights came from that?
      The guy is a joke and an embarrassment who has seriously screwed up large chunks of R&D. The sooner he and his very deficient appointees get the boot, the better!

      1. Petabye says:

        Obviously written by a disgruntled former employee, full of the usual ad hominem attacks and off-the-mark analyses. GSK is an acknowledged pharmaceutical leader in the AI and Big Data spaces. There is a justifiable level of excitement about the possibilities that this is opening up. Mark Ramsey has put together an excellent team to exploit petabytes of Big Data. Within a year his team has gone from a standing start to winning a prestigious award in the best rookie section (https://www.cloudera.com/more/news-and-blogs/press-releases/2017-09-11-cloudera-announces-fifth-annual-data-impact-awards-finalists.html). Rookies no longer! It just proves that amazing things happen when you combine strategic vision, innovative leadership, revolutionary technology and a top team. Furthermore, this is just the beginning of an exciting GSK journey into the possibilities of Big Data.

        1. Anon the third says:

          The proving ground is not to convince some award committee but to deliver drugs.

          Other than that I agree, personal attacks are not helpful.

        2. As with Agilist and Baldoni, it seems GSK expends ~ 1 kilobyte of hyperbole per byte of data. That puts you well beyond petabytes into the yottabyte domain, as in yotta, yotta, yotta…

        3. compchem says:

          Wow. Just wow. I’m guessing that you have never yourself worked on any pharma drug discovery projects, Petabye.

          Well, I have, and I’ve also worked with plenty of AI / Big Data fanatics. What they have in common is a deep understanding of stats and algorithms, but no understanding of the problem or the data. AI can work, but it often comes up with answers on artificial test sets that are statistically predictive for trivial reasons, and totally useless in practice. With big data, you don’t know what question you should be asking. Believing that you can somehow suck the questions as well as the answers from a sea of irrelevant data is ridiculous.
