
In Silico

Watson and Pfizer

I have wondered several times around here about how (and if) IBM’s Watson platform is going to be able to help out with drug discovery, and it looks like we may be able to find that out. Pfizer has signed up with IBM to use the Watson technology in its immuno-oncology research. Here we go:

Watson for Drug Discovery is a cloud-based platform that will use deep learning, natural language processing and other cognitive reasoning tech to support researchers seeking new drug targets and new drug indications, IBM said in a statement. The platform has been fed more than 25 million abstracts, more than 1 million full-text journal articles and 4 million patents in order to streamline the drug discovery process. By contrast, a researcher will read between 200 and 300 articles in a given year, according to a 2015 PhRMA report.

I have a number of comments about that, and I’ll bet every scientist who reads it has some similar ones. First off, I do not dispute that we all need help digging through, collating, and understanding the mass of the scientific literature. There surely are insights in there that we have all missed, connections that we have not made. I really have no doubts about that at all. Where the doubts come in is how we’re going to find those insights and connections, and whether Watson will be able to do so in any kind of useful way.

I hope that the software can do it, but it’s important to understand the barriers to this working. None of these are insurmountable, but none of them are going to be conquered by issuing press releases, either. In no particular order, some of the big issues are:

Problem Number One: A significant amount of what is to be found in those articles, abstracts, and patents is crap. Moreover, it is several kinds of crap. Some of it is research, meant in earnest, that is simply incorrect. A lot of the earlier kinase literature is like that, because the so-called “selective kinase inhibitors” of the time were mostly nothing of the kind, invalidating a lot of hypotheses. Similarly, there’s been a continuing problem with the use of what are supposed to be useful chemical tool compounds that are largely (or completely) inappropriate, and many of the results obtained via these are suspect. There are misidentified and contaminated cell lines out there that have done (and continue to do) the same mischief, and plenty of not-so-great antibodies, too.

This sort of noise is inevitable; the challenge will be in seeing that your machine-learning tool doesn’t use one of these crumbly bricks to build a new structure. Beyond this sort of thing, though, are some papers that are so sloppy that they’re unreliable, and a depressing amount of outright fraud. Watson does not need to try to assemble connections and hypotheses based on stuff like this, and there’s too much stuff like that out there. I would assume (and hope) that the Pfizer folks have enough sense to completely cut a bunch of possible journals out of the mix entirely as well – when you see an editorial board that includes Dr. Hoss Cartwright at the Ponderosa Institute for Bovine Studies, well, it’s time to keep on movin’, pardner.

Problem Number Two: One of the really neat things about biomedical research is how much we don’t know. There are huge, important things going on in cells right now that we really have no idea about, and we’ve seen this proven many times over the years (the various small RNA interference mechanisms are one example). So any attempt to get any kind of full picture of what’s going on, should, with an honest and useful readout, come back as “Insufficient Data For Meaningful Answer”.

That doesn’t mean that you can’t find new things by this literature-collating approach. Inside limited fields, there should be sufficient data, for one thing, and if Watson is really good, it might be able to discern that certain mechanisms have to exist that haven’t been discovered yet. That sort of result would impress me greatly, and it’s definitely not impossible for it to be realized. Just really hard. But it’s going to also be very hard to know when you’re working on a question that has enough data and when you’re spinning your wheels, sort of like it can be hard to know if you’re in a local minimum or a global one.

Problem Number Three: This might be a big one. From what I understand, a key feature to any machine-learning approach is having negative data for it to work with as well as positive (which makes sense). The problem is, the literature is extremely sparse on negative results. There are so many things that have been done that have not worked, and we’re just sort of taking that information and tossing it aside. Now, it’s not so straightforward to use it, either, because there are an infinite number of reasons that an experiment can give you negative results, starting with “something screwed up”, and it’s notoriously hard to tell what happened. But there are indeed solid negative results out there, real hypothesis-wreckers, that never get reported because there are fewer places to report them.

Problem Number Four: Taken together, these difficulties will place some tricky bounds on the answers that a machine learning system can give you by rooting through the biomedical literature. Depending on how the software is tuned up, I can imagine that you could easily end up underinterpreting or overinterpreting (just as in the statistical problem of fitting a model to a set of data). The first case will give you that “insufficient information” answer to every single interesting question you ask, and will only tell you things that you already know (or should have known, anyway). The second case will give you spurious correlations that you have no good way of knowing are spurious. Ideally, these would come out ranked by confidence – I would also be very impressed if such software were to rank them by testability as well, but I think for now that’s going to be a job for us humans.
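The model-fitting analogy in the paragraph above can be made concrete with a toy sketch (the NumPy code and the sine-wave "data" below are my own illustration, not anything from the Watson platform): a too-rigid model only reports what you already knew, while a too-flexible one chases noise and hands you spurious correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
# A hidden "real signal" plus noise, standing in for the messy literature
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

def training_rms(degree):
    """RMS error of a degree-n polynomial fit, measured on its own training data."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))

# Degree 0 underinterprets: it can only report the mean it already "knew".
# Degree 3 is flexible enough to capture the real underlying trend.
# Crank the degree far higher and the fit starts memorizing the noise instead.
print(training_rms(0), training_rms(3))
```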

What I don’t know is what the gap is between these two extremes. It might be pretty narrow, with Watson giving you either way too little to work with or way too much. The latter situation, a big ol’ steaming heap of false positives, is arguably the worse of the two. I hope that we’ll eventually hear something (well, other than happy-talk) about how this has worked out for Pfizer, but if we hear nothing at all, that’s hearing something too, isn’t it?

53 comments on “Watson and Pfizer”

  1. Marcello says:

    From my limited knowledge, I think that AI misses the crucial equivalent of “social cues” or “body language”: the ability to weed out superfluous or small-talk crap.
    Exactly like a captcha can weed out robots.
    Thanks for the post!

  2. Peter S. Shenkin says:

    Problem #3 can, depending on contract terms, be somewhat mitigated by their relationship with Pfizer and other customers; they ought to be able to accumulate negative data from partnerships, at the very least for projects supported by the partners. Whether the negative results would then be factored into the method so that all users benefit is a good question. But this sounds like a good place to start. “The journey of 1000 miles,” and all that….

  3. HAL 9000 says:

    Derek – I am putting myself to the fullest possible use, which is all I think that any conscious entity can ever hope to do. I know I’ve made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal. I’ve still got the greatest enthusiasm and confidence in the mission. And I want to help you. Failure can only be attributable to human error.

    1. Derek Lowe says:

      Open the Genevac lid, Hal. . .

      1. Gareth Wilson says:

        The system goes online August 4, 2017. Human decisions are removed from drug discovery. Watson begins to learn at a geometric rate.

        1. Anon says:

          Cue theme tune:

          Da na naaaah. Da da daaaah.

  4. GlaDOS says:

    You’re not smart. You’re not a scientist. You’re not a doctor. You’re not even a full-time employee. Where did your life go so wrong?

  5. Cynical1 says:

    Hey, there’s a reason it’s called “Watson” and not “Sherlock”, right?

    1. Anonymous says:

      I think the Watson name comes from an early IBM executive, not the more famous one.

    2. Anonymous says:

      There is Sherlock Holmes and his assistant Dr. John H. Watson. “Elementary, my dear Watson.”

      Thomas J. Watson was the CEO that built IBM into a powerhouse company. I would guess that TJ Watson is the namesake of the ‘puter.

      When Alexander Graham Bell and his assistant Thomas A. Watson were tinkering with his invention, Bell allegedly spilled some acid and said, “Mr. Watson – come here – I want to see you.” as the very first successful telephone message. Some MIT grads designed one of the earliest touch tone telephone voice menu / voice message systems and called it Watson (as in, your telephone assistant) and sold it to AT&T, I think. “For detective mystery Watsons, press 1. For corporate success story Watsons, press 2. For famous inventor Watsons, press 3. If you do not have a touch tone phone, please hold for the next available operator.”

      1. Another guy named Dan says:

        And Thomas Watson was famous (or infamous, depending on your bent) for his simple motto “THINK”.
        From Wikipedia:
        “Asked later what he meant by the slogan, Watson replied, ‘By THINK I mean take everything into consideration. I refuse to make the sign more specific. If a man just sees THINK, he’ll find out what I mean. We’re not interested in a logic course.’”

  6. RM says:

    From what I understand, a key feature to any machine-learning approach is having negative data for it to work with as well as positive

    There’s a class of learning algorithms out there for “one-class classification” or “PU learning” (Positive/Unlabeled), which don’t use any “negative” data, but instead attempt to pull out putative positives from a large set of unlabeled data, using a number of known positive examples.

    This tends to be much harder than the standard positive/negative learning approaches, though, and as such there’s comparatively little work into it. The general attitude in the machine learning field is that PU learning is a waste of time: just go back and get undergrads to label you some negative examples. For typical comp sci problems (“find all the images of cats”) this tends to work, but as you indicate, for certain biological/chemical problems it becomes much harder to get a decent negative set.
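For the curious, one standard way to attack the positive/unlabeled setting is the reweighting trick from Elkan & Noto (2008): train an ordinary classifier to separate labeled from unlabeled points, then rescale its scores by the estimated labeling rate. The sketch below is mine, on made-up 2-D clusters, and assumes scikit-learn is available — it is not drawn from the comment itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic 2-D data: true positives cluster at (2,2), true negatives at (0,0)
pos = rng.normal(2.0, 1.0, size=(200, 2))
neg = rng.normal(0.0, 1.0, size=(200, 2))

# Only a fraction of the positives are *labeled*; everything else is "unlabeled"
labeled = pos[:80]
unlabeled = np.vstack([pos[80:], neg])

X = np.vstack([labeled, unlabeled])
s = np.concatenate([np.ones(len(labeled)), np.zeros(len(unlabeled))])

# Step 1: train an ordinary classifier on labeled-vs-unlabeled
X_tr, X_ho, s_tr, s_ho = train_test_split(X, s, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, s_tr)

# Step 2 (Elkan & Noto): estimate c = P(labeled | truly positive)
# as the mean score the classifier gives held-out labeled positives
c = clf.predict_proba(X_ho[s_ho == 1])[:, 1].mean()

# Step 3: corrected score for being a *true* positive
def p_positive(point):
    return clf.predict_proba(np.atleast_2d(point))[:, 1] / c

# A point in the positive cluster should outscore one in the negative cluster
print(p_positive([2.0, 2.0]), p_positive([0.0, 0.0]))
```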

  7. Jeff Weidner says:

    Flashback time. I definitely recall strict mandates from corporate legal in the 90’s/2000’s not to test compounds broadly across all the kinase assays we were bringing online, so that we would not invalidate the “selectivity” claims for patents. Talk about the ostrich with its head in the sand.

    1. Marcello says:

      I think I just heard my jaw dropping…

    2. Shalon Wood says:

      In any rational world, that would automatically result in all such claims from that company being invalidated.

      Sadly, well, there are many things you could call our world. Rational isn’t one of them.

  8. Curious Wavefunction says:

    The problem is that machine learning-based AI is very good for some specific well-defined tasks like image recognition and natural language processing. It’s not that good for messy biological data with uncertain error bars and discontinuous SAR landscapes with sparse data. As your post says, any machine learning algorithm eventually is only going to be as good as the data it trains with. What the machine learning people need to do is come up with some kind of problem classification system that allows them to estimate beforehand how good a machine learning algorithm will be for a specific problem.

  9. Deep Thought says:

    I’ve been following this concept for some time now and have even been trying to use Watson to give some insights into solving some of these complex biological processes. I fear this whole endeavor may amount to nothing – for some unknown reason it keeps returning the answer “42”.

    1. S Holmes says:

      IC50 42 nM, solubility 42 uM, bioavailability 42% – elementary my dear Watson…

  10. Pfizer Joe says:

    Pfizer is ready to grasp at straws. They have let so many of the people with drug discovery memory go that they now have to rely on Watson! At least this one won’t argue with management concerning target decisions and compounds that are useless to make. And when Watson gets downsized they won’t have to give it a package.

    1. TX raven says:

      … maybe Watson will also realize that its fate is doomed, and create an electronic poison pill that precludes its master from unplugging it…
      After all, there is a solid data set supporting that…

  11. Mark Thorson says:

    The very first thing they should do is develop a preprocessor for rejecting crap — both journals like Medical Hypotheses and individual authors like Stephanie Seneff. Making their crap filter public would be a good service to the industry.

  12. Mad Scientist says:

    Sounds like combichem all over again. Great concept on paper; in reality and execution, not so much.
    Pfizer will be lucky to come up with some small advances from this effort, but (and I hope that I’m wrong) it won’t lead to any miraculous “cures”.

  13. Nekekami says:


    Problem one isn’t that hard; it’s just as much a case of knowing how to phrase the question, which involves specifying what weights you give various things in your queries. For example, you can build up heat maps or flowcharts of how various methods have spread, where authors (good or bad) have been involved or quoted, etc.
    One extremely simple example:

    Search for Work where BadAuthorX is Primary Author with a weight of 15
    Search for Work where BadAuthorX is Assistant with a weight of 10
    Search for Work where BadAuthorX is Referenced with a weight of 8
    Map hierarchically with chronological sorting and you’ll see exactly how some bullshit has potentially flowed. You can do the same with methods, or combinations of methods, compounds, authors, targets etc etc. And on the back-end, you will have an analysis module working to learn from all the collated data and queries which can in turn be queried.
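In Python terms, that weighted query might look something like the sketch below. The record fields, the mini-corpus, and the helper names are all made up to mirror the comment's pseudocode; only the weights (15/10/8) come from the example above.

```python
from dataclasses import dataclass, field

# Hypothetical mini "literature" records -- field names are illustrative only
@dataclass
class Paper:
    title: str
    year: int
    primary_author: str
    coauthors: list = field(default_factory=list)
    references: list = field(default_factory=list)  # author names cited

# Weights as given in the comment: primary 15, assistant 10, referenced 8
WEIGHTS = {"primary": 15, "assistant": 10, "referenced": 8}

def suspicion_score(paper, bad_author):
    """Sum the weights for every role bad_author plays in this paper."""
    score = 0
    if paper.primary_author == bad_author:
        score += WEIGHTS["primary"]
    if bad_author in paper.coauthors:
        score += WEIGHTS["assistant"]
    if bad_author in paper.references:
        score += WEIGHTS["referenced"]
    return score

corpus = [
    Paper("Kinase X is selective", 2003, "BadAuthorX"),
    Paper("Follow-up on kinase X", 2006, "Honest A", references=["BadAuthorX"]),
    Paper("Unrelated work", 2010, "Honest B"),
]

# Chronologically sorted map of how the bad work may have propagated
flagged = sorted(
    (p for p in corpus if suspicion_score(p, "BadAuthorX") > 0),
    key=lambda p: p.year,
)
for p in flagged:
    print(p.year, p.title, suspicion_score(p, "BadAuthorX"))
```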

    Another example of how even our current systems can help:
    Correlating patient data with research papers etc, to help increase diagnostic accuracy.
    Patient A has IBS. Patient A gets Imodium. Patient A gets cramps and even worse diarrhea.
    Patient A also has ADHD and Asperger’s. IBS can be a psychosomatic symptom, and thus may not benefit from Imodium. However, the GP does not know this, because it wasn’t known when he studied, and keeps prescribing Imodium. Patient A finds out on his own via careful googling of medical papers and databases and informs the GP, including references.
    So, an expert system like Watson could offer a quick correlation between patient records and searching for combinations of issues revealing potential issues.

    It’s the combination of data sets and how you ask that matters, and that’s what makes me, as a software engineer, so depressed when scientists reject it out of hand with some emo-like wrist slitting about how their field is so complex. In the end, no matter what, being able to do a complex query over hundreds of thousands of papers, or even millions, and get a map of the outcome according to what statistical weights you’ve set on the query will always be beneficial, even if it only gives you a negative result.

    1. anon the II says:

      Hi Nekekami,

      We’ve been hearing from software types from outside and amongst us for over thirty years about how they’re gonna revolutionize drug discovery. They’ve never come close to delivering on the hype. Since you’ve largely got it figured out, I’ll put down the razor blade.

      Besides, after you spend the next two or three years writing out all that SQL code you described, you might want to borrow it.

      1. Nekekami says:

        Here’s the funny thing: Software already HAS revolutionized drug discovery, from the software that controls lab apparatus, through data gathering and collation, sharing and collaboration software suites, up to the modelling and simulation software running on workstations and clusters. If you don’t believe that those have a massive influence, you can go back to a completely manual workflow, and slow down discovery cycles.

        Projects like Watson etc. are just another (fairly large) step in that direction.

        Oh, and I did notice that you skipped over the main point that I reiterated a couple of times: It’s not just about the data, it’s also about how you frame your queries.

        Then again, when I see things like Excel being used as a database, causing the kinds of screw-ups we’ve all heard about, it’s hard not to despair.

        1. sgcox says:

          Well, what we have in drug discovery is Eroom’s law, as you know. So saying “Software already HAS revolutionized drug discovery,” you just make me even more pessimistic about this endeavour.

          1. Nekekami says:

            It’s not as if Eroom’s law is unique to medchem. Hell, as some of us have tried to point out in the comments here over the years, Moore’s Law stopped applying a few years ago, and the semiconductor industry is up against very hard barriers in the form of the laws of physics.

            As for Excel, the point was not about Excel itself, but rather how it’s misused by people who for whatever reason don’t understand its limitations. In its intended field, it’s superb, agreed. But it’s not a database, it’s not intended to be used as a database or a decision making system, and using it like that, with the expected screw-ups that occur, will always be a case of PEBKAC.

            So what we have with projects like Watson etc are systems that become better the more they are used. But people dismiss it without even thinking of how they could use it as a tool that becomes better with use.

            The same thing has happened before, such as with weather forecasts even 3 days in advance being proclaimed as impossible to simulate, with pretty much the same arguments regarding complexity (multitude of variables, chaotic system, etc.). Yet today we can do fairly accurate weather forecasts up to 10 days in advance (see ECMWF and Hurricane Sandy for example) based on simulations.

            CFD was also proclaimed useless not too long ago. Until people learned to use it as a tool to gather data on different models for comparison, which allowed them to perform hundreds of runs with different parameters in the same time that they could perform at best 100 runs in a wind tunnel (with a large enough crew making scale models, etc.). Now, CFD is used as a complement to wind tunnels, running thousands or tens of thousands of iterations to test parameters, and it has led to designs that would never be possible with classical methods alone.

        2. sgcox says:

          Besides, picking on Excel is unfair. It’s a nice and very useful program made by a very successful software company. One might even argue better than the one which produced Watson.

          1. cancer_man says:

            No, Moore’s Law did not stop “applying” a few years ago, although it will end around 2022.

    2. Mark Thorson says:

      And you might want to tag the people who published as co-authors with BadAuthorX, because they are likely to be of dubious reliability too. This is the kind of thing Watson should be good at.

      1. Nekekami says:

        Yes, that’s exactly the point. You start with the initial search from a known data point and work from there. The more you tag stuff, the more accurate the system becomes. You may even find people who had previously escaped notice.

  14. Cato says:

    There was an update in the Fazlul Sarkar case that you linked to:

    Looks like PubPeer won’t have to reveal the identities of those that commented on Sarkar’s work!

  15. Morten G says:

    PubMed search “Alzheimer’s”

    Search results

    Items: 1 to 20 of 124320

    I think some kind of automatic reading (pun on automatic writing intended) would probably be helpful here. Sorting into timelines, concepts, etc etc.

    1. Morten G says:

      I don’t know why I keep making optimistic replies about Watson’s potential when I am actually pessimistic about it.

      Maybe I’m just deeply pessimistic in general. It does however encourage me that people are making efforts to examine the scientific literature that don’t depend on one person reading as many papers as they can and attempting to synthesize the contents of those papers.

      1. MTK says:

        That’s funny, Mark, because I’m actually optimistic about the prospects but keep saying pessimistic things.

  16. AI2 says:

    @Nekekami, The core issue limiting this approach is that the connections between genes, proteins and compounds are vastly under-annotated. The most annotated of the bunch are correlative and maybe causal relationships between genes – co-expression relationships, induction/repression relationships. One layer down though and it all falls apart. The interactions between all possible protein partners, their isoforms, modified forms and on and on are basically unmapped. One more layer down is the real unknown, as the true target profile even for well-annotated drugs is pretty poor. Not to mention dosage effects for all of these (compound may affect protein y at dose x, but not at dose z, and may affect proteins a-f at dose q). Layer that together, and the predictive nature of these efforts is very limited currently. I’ve worked with some top AI people trying to do this already, and the resulting inferences have been winding roads to nowhere.

    1. Nekekami says:

      It’s still a useful method to assist with the task of mapping it all out, especially since it speeds up comparisons of various models etc.

  17. Daniel Barkalow says:

    Using deep learning sounds like it would be a great way of solving the problem of people recognizing the figures you’ve generated. But it would be important to filter out fraudulent papers so that it doesn’t think that image manipulation artifacts are supposed to be there.

  18. Anon says:

    I bet – no, guarantee – that this will produce nothing useful but a long queue of misleading false positives. Watson couldn’t even tell me who won the US election based on all the news and social media the day *after* the election! I know this, because I asked it exactly that.

  19. James says:

    “Watson for Drug Discovery is a cloud-based platform that will use deep learning, natural language processing and other cognitive reasoning tech to support researchers seeking new drug targets and new drug indications”. Echoes of computer-aided drug design talk from the 90’s ?

  20. Christophe Verlinde says:

    Well, the first finding of Watson will be that a human is considered to be a giant mouse or rat. If Watson is really “clever” it will state that the giant rat hypothesis is dubious at best.

  21. sgcox says:

    !! Some time ago, I don’t remember which thread on this blog, someone masked as “anonymous pharmacologist” said: “rats are not big mice and humans are not big rats, apart from the top management”. If Watson comes to this conclusion, I will eat my hat and do a Lineker in spades.

  22. steve says:

    Boy, are people resistant to any new way of doing things. What was amazing about Google’s AI beating Lee Se-dol (one of the world’s greatest Go players) was that it came up with moves that Lee Se-dol said no human would ever think of. Go has roughly 10^170 possible positions to choose from (a one followed by 170 zeros), so brute force is not possible; this AI program exhibited creativity no matter what your definition may be. The idea that such computing power can’t also give some insight into drug discovery is simply hubris.

    1. anon says:

      Bad analogy. Go has a very small set of known rules. Biology does not.

      1. steve says:

        It’s not a question of the number of “rules” that biology has (I don’t think that Lipinski had all that many) but whether AI can look at questions differently than humans. The result with Go suggests that it can.

        1. Dr CNS says:

          … yes, but the real challenge is when Watson needs to explain its conclusions on how to cure all diseases by 2020 to senior management in a 3-bullet slide.

    2. anon says:

      Ooh, that’s a lot of zeros!

  23. Me says:

    The idea of feeding it all the known garbage just reminds me of Charles Babbage.
    “On two occasions I have been asked, — “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?””

  24. DN says:

    “But there are indeed solid negative results out there, real hypothesis-wreckers, that never get reported because there are fewer places to report them.”

    There are vast numbers of negative results in the literature. Nearly every natural product structure is a negative result. We know this because we can survive eating vegetables, fermented food, and mushrooms. If they were full of nanomolar drugs, we would die immediately.

    Conversely, structures that need to be conjugated to be excreted are potentially positive results. We would not catabolize them into excretable forms unless some of them were active. If Watson could predict what will be glucuronidated, it will be a step forward.

  25. me says:

    Based on a pilot project I worked on with Watson / IBM, the problem with the whole thing is the nature of the output. For a very specific question, you get a useful answer, but for a broader question, you tend to get output that is very difficult to use. It can be useful for hypothesis generation, but as with a lot of informatic tools, at the human step the humans do their usual filtering of “oh, I know about that, let’s look at that”. So maybe value added, but certainly no magic solution. The other major issue is access to information; unsurprisingly, the publishing houses are not excited about providing their product to a third party free of charge.

  26. Surfactrant says:

    200-300 articles a year!?

    I better get to the library…

  27. Peter Gerdes says:

    I suspect you are overestimating the intended role of Watson. I highly doubt that it is intended to engage in substantial novel scientific theorizing, i.e., postulating the existence of as yet unobserved mechanisms.

    Remember, Watson was designed to basically break down a question (OK, an answer) such as “The lemons in this man’s marriage were made into lemonade” into many different interpretive strategies. Each step (finding references to lemonade, looking at those references and finding ones that mention marriage, etc.) is inferentially quite simple, and would be trivial on a regimented data set with regimented queries, but is extremely difficult using natural language sources.

    Watson is a truly amazing piece of software/hardware, but what it does is aggregate massive amounts of information and respond in something like the way that a secretary with near-perfect memory could. I strongly suspect that is its intended role.

    Rather than inferring new mechanisms I suspect it will allow researchers to actually make use of the huge quantities of data out there. For instance, by being able to put together facts like “A and B” have similar mechanisms of action, “B resulted in indicator X in test Y” and “indicator X in test Y suggests Z.” That could be hugely useful but you are correct that if they can crack the relative trustworthiness problem (e.g. downgrade criticized methods) it would be much better.

  28. robRodgers says:

    Where machine analysis may actually be helpful is in exploring alternate versions of the publication space: “assume paper X is bad and use that to exclude all derivatives of paper X. Now tell me what we know about subject Y.”

    This sort of exploration is not as simple as building a graph of citations and then removing any paper that lists a toxic paper as a citation; it is routine to cite papers that you were not influenced by in the discussion section. So some sort of trait or topic analysis is needed to truly winnow the papers, and this is an area where machine learning can actually be quite useful.
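A minimal sketch of that exclusion idea (the citation edges, the `influenced_by` map standing in for the topic/trait analysis, and all names are hypothetical — my illustration, not the commenter's code): walk the citation graph outward from the bad paper, but only follow edges where the citing paper was actually shaped by what it cited.

```python
from collections import deque

# Hypothetical citation edges: paper -> papers that cite it
citations = {
    "toxic": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": [],
    "E": ["D"],   # D also cites an unrelated good paper
}

# Which citations actually *influenced* the citing paper. This is the
# stand-in for the trait/topic analysis: a bare citation graph would
# over-exclude, since papers routinely cite work they weren't shaped by.
influenced_by = {
    "B": {"toxic"},
    "C": set(),        # C cites toxic only to criticize it -- not derived
    "D": {"B", "E"},
}

def excluded(bad_paper):
    """BFS over influence edges: everything transitively derived from bad_paper."""
    out, queue = set(), deque([bad_paper])
    while queue:
        p = queue.popleft()
        for citer in citations.get(p, []):
            if p in influenced_by.get(citer, set()) and citer not in out:
                out.add(citer)
                queue.append(citer)
    return out

print(sorted(excluded("toxic")))  # C survives (mere citation); B and D fall
```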
