
Drug Assays

No Easy Road to Getting Rid of PAINS

Here’s a warning about trying to write compounds off too quickly as PAINS (pan-assay interference compounds). The authors, from UNC-Chapel Hill, have gone through the PubChem database using software structural alerts for such motifs. Despite the original PAINS list being derived from AlphaScreen-interfering compounds, they found that most of the compounds that get flagged are actually infrequent AlphaScreen hitters, and that PAINS alerts did not seem to reflect greater activity in the other assay techniques listed, either. Update: took the bold step of actually adding a link to the flippin’ paper. That’s what happens when I post and then spend the rest of the morning in meetings!

They go on to point out (as have others) that there are indeed marketed drugs that set off the same alerts, and that similar percentages of “dark chemical matter” compounds, the ones that don’t seem to hit in any assays, also set off such software warnings. That’s food for thought, I have to say. Here’s the conclusion:

(The) PAINS concept has been widely accepted by many experienced medicinal chemists both in academia and the pharmaceutical industry. Indeed, the original study from which the PAINS alerts were derived and the impetus behind it are an important step towards reproducibility and the appropriate use of resources in drug discovery. However, our findings based on the analysis of public data suggest that many compounds containing PAINS alerts do not actually show high assay promiscuity, leading to the conclusion that these alerts should not be blindly used, in the absence of orthogonal experimental assays, to deprioritize a compound.

At the same time, it is undeniable that pan-assay interference compounds exist and care must be taken to avoid these compounds. Moreover, we recognize that true “PAINS” may be present in the data analyzed herein but have not been classified as such because the current alerts do not cover these compounds. The issue of what constitutes a pan-assay interference compound thus remains unclear.

In my own experience, there are indeed such compounds, and there are structural motifs that are more likely (compared to random chance) to lead to such behavior. Quinones are number one on that list (and this paper notes that they seem to be the worst offenders as well), but there are a few others. I take their point, though, that you’re not going to be able to strip out the interfering compounds with a long list of substructure alerts, because the signal/noise of that approach decreases pretty quickly once you get past a few stinkers. Trying to do it that way is tempting but lazy.
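If you want to see what these alerts actually catch, they’re easy to run yourself. Here’s a minimal sketch using the PAINS substructure catalog that ships with RDKit (assuming you have RDKit installed; the SMILES is just an illustrative benzylidene rhodanine, the sort of thing near the top of the list):

```python
# Flag (don't auto-discard) compounds that match the published PAINS alerts,
# using RDKit's built-in FilterCatalog.
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)  # A, B, and C lists
catalog = FilterCatalog(params)

mol = Chem.MolFromSmiles("O=C1NC(=S)SC1=Cc1ccccc1")  # a benzylidene rhodanine
entry = catalog.GetFirstMatch(mol)
if entry is not None:
    print("PAINS alert:", entry.GetDescription())  # treat as a flag, not a verdict
else:
    print("no PAINS alert")
```

The point of the sketch is in the last comment: a match is a reason to run follow-up assays, not a reason to delete the compound.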

The only way to be sure that a compound is a false positive is to run orthogonal assays to check for aggregation, redox cycling, covalent reactivity, fluorescence interference, and all the other ways that things can go wrong. Those of us beating the drums for the PAINS concept are, for the most part, reacting against the papers we see that just seem to pretend that such problems don’t exist, and blithely publish on “active” compounds that excite great suspicion if you’ve seen a lot of screening hits before. At the same time, there is certainly an error in the opposite direction, where you blast through a list of compounds, crossing things off automatically with no further evidence. That’s wrong, too.

In my view, the presence of the (relatively few) structural motifs at the top of the PAINS lists should put a compound under suspicion. If you want to work on it because it’s the best and most promising thing out of your screen, go ahead – but run some other assays beforehand to make sure that you’re not wasting your time. These are the sorts of assays you should really be running on the best compounds from the other structural classes, too – these behaviors can crop up at any time. But while a rhodanine quinone should not be allowed to proceed without showing its papers, lesser offenders should certainly be given a chance to prove themselves. Assay data, real assay data, is the only way to be sure.
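That triage logic is simple enough to write down. Here’s a toy sketch in Python (all the assay names and categories are invented for illustration, not a real screening protocol): an alert routes a hit into orthogonal assays, and only real data kills or clears it.

```python
# Toy triage: a structural alert makes a compound suspect, not dead.
# Names and wording are illustrative only.
from typing import Optional

def triage(has_alert: bool, orthogonal: Optional[dict] = None) -> str:
    """'orthogonal' maps assay name -> True if the compound misbehaved there."""
    if orthogonal is None:
        if has_alert:
            return "suspect: run aggregation/redox/reactivity/fluorescence checks first"
        return "advance (orthogonal checks still worthwhile)"
    if any(orthogonal.values()):
        return "deprioritize: interference confirmed by assay data"
    return "advance: flagged, but the compound showed its papers"

print(triage(True))
print(triage(True, {"aggregation": False, "redox_cycling": False,
                    "covalent_reactivity": False, "fluorescence": False}))
```

Note that the only path to “deprioritize” runs through actual assay results, which is the whole argument here.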

The paper finishes up with this call:

It would be of great value if a community-wide effort to screen and analyze a large set of commercially available compounds representing all current PAINS alerts against multiple targets in various assays was performed by several independent groups.

I agree. I hope that something like this can be realized, but until then, the lesson is (and always has been) to get real numbers on your compounds, from several directions, before you trust them.

40 comments on “No Easy Road to Getting Rid of PAINS”

  1. question says:

    Is there a link to the PAINS paper, or maybe I missed it?

    1. FJ says:

      I didn’t see the link in Derek’s post either, but I found this paper, which has the quotes Derek posted above:

      Phantom PAINS: Problems with the Utility of Alerts for Pan-Assay INterference CompoundS. Stephen Joseph Capuzzi, Eugene N. Muratov, and Alexander Tropsha. J. Chem. Inf. Model., Just Accepted Manuscript. DOI: 10.1021/acs.jcim.6b00465

  2. John Wayne says:

    A great way to convince yourself (or a colleague) of the utility of any PAINS-like list is to compare it to structures of compounds that are the most important to medicine. The way these papers can help is by suggesting a specific mode in which a given compound may be tricking you.

    1. Peter Kenny says:

      There is no suggestion that PAINS are benign. The criticism of the PAINS filters is of the data analysis and cheminformatics.

  3. Neo says:

    The real problem here is that experienced medicinal chemists have neither a clue about how to build a binary classifier to detect PAINS nor the humility to ask a chemoinformatician to do it for them. Otherwise, they would not still be using error-prone filters. Unfortunately, nothing has been learnt from rule-of-5 filters…

    1. tlp says:

      The same is true of substructure-based molecular fingerprints and the ‘similarity’ metrics derived from them.
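      For the curious, the canonical example is the Tanimoto coefficient computed over fingerprint bit sets. A minimal sketch (the bit sets here are made up; which substructure keys get hashed to bits is exactly where the trouble hides):

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprint bit sets (toy illustration)."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(a & b) / len(a | b)

fp1 = {1, 4, 9, 23, 57}  # invented "on" bits for compound 1
fp2 = {1, 4, 23, 40}     # invented "on" bits for compound 2
print(round(tanimoto(fp1, fp2), 2))  # 3 shared bits / 6 total bits = 0.5
```

      The arithmetic is trivial; the judgment call is entirely in how the fingerprint was defined, which is the limitation being pointed out.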

      1. Peter Kenny says:

        The problem is a bit more than an inability to use cheminformatic software. Would you consider a panel of 6 AlphaScreen assays to represent a good experimental design for detecting and characterizing pan-assay interference?

        1. tlp says:

          Of course not, but what I (and Neo) am pointing out is that med-chemists readily jump into a heuristic ‘press-the-button’ approach (e.g. rule of 5, LE metrics, etc.) rather than trying to figure out the underlying limitations.

        2. Neo says:

          Completely agree with tlp. Sure, one has to have a meaningful definition of PAINS beforehand; that’s the medicinal chemist’s job. However, once this is done, predicting PAINS using a set of structural filters is obviously a very poor choice for a classifier. I am frankly amazed that the pharmaceutical industry has used and abused Ro5 filters for so long…

          1. Peter Kenny says:

            Neo/tlp, Ro5 is of no value whatsoever when optimizing Ro5-compliant compounds, and the voracious appetite of the MedChem community for rules, guidelines and metrics could be seen as diagnostic of a herding instinct. That said, it is a lot easier to denounce compounds as PAINS than it is to actually demonstrate that those compounds really are PAINS (it has been this way since HTS began in the early 1990s, when PAINS were called ‘false positives’, ‘crap’ or ‘swill’). It can be instructive to ask how many of the hits in the 6 AlphaScreen assay panel of the original PAINS study were actually shown to be interfering with the assay or affecting target function by an undesirable mode of action.

            I certainly agree that substructural filters have limitations although I do have concerns about the machine learning approaches that are popular with cheminformaticians and data scientists. In particular, it appears to be difficult to find out how many parameters are used in a particular model (although this doesn’t seem to inhibit comparison of different models that may have been trained and validated using different data sets). If offered the choice between access to a machine learning model or the data used to train it, I would generally opt for the latter. My first step in using such a database would be to look for close analogs of the molecular structure that I wished to assess.

          2. Neo says:

            Your comment “it appears to be difficult to find out how many parameters are used in a particular model” shows that you know very little about predictive modeling. Regardless of whether you mean model hyperparameters or features/variables by “parameters”, these are clearly stated in any half-decent published study. Further, despite not being an expert, you still want the data set so you can model it yourself and be the one calling the shots. And that has been, and sadly continues to be, an important part of the problem.

  4. anoano says:

    The problem is with the extremes: rejecting everything, taking the PAINS paper as a gold standard, or going with all the weird compounds possible, taking this paper (or blog post) criticising PAINS as a gold standard (“this drug has this group, so it is OK, fact”).
    More reasoning would be great.

  5. DCRogers says:

    As a follower of Bayes, this would appear to be a straight statistical issue, and not a chemical or mechanistic one.

    Enough data should allow the sorting of PAINS wheat from chaff – I’m surprised ChEMBL doesn’t have enough already?
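    A quick back-of-the-envelope Bayes calculation shows why the statistics matter (all numbers invented for illustration): even a reasonably sensitive alert will mislabel most of the compounds it flags if true promiscuity is rare in the library.

```python
# Illustrative numbers only, not estimates from any real screen.
prior = 0.05        # assumed fraction of the library that is truly promiscuous
sensitivity = 0.80  # assumed P(alert fires | promiscuous)
false_pos = 0.15    # assumed P(alert fires | well-behaved)

p_alert = sensitivity * prior + false_pos * (1 - prior)
posterior = sensitivity * prior / p_alert
print(f"P(promiscuous | alert) = {posterior:.2f}")  # ~0.22 with these numbers
```

    Under these assumptions, roughly four out of five flagged compounds would be false alarms, which is at least consistent with the paper’s finding that most alert-matching compounds are infrequent hitters.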

  6. Curious Wavefunction says:

    My perception of PAINS is simply that the pendulum now seems to be swinging to the other extreme, something that happens with almost every solution to a tricky problem. The initial recognition of PAINS came rightly as a response to the proliferation of papers claiming indiscriminate, ugly compounds as legitimate hits. This was a very valid concern. But the answer cannot be to automatically flag compounds as PAINS if they look similar to those from a limited dataset culled from just a few assays.

    The problem is that human beings are bad at probabilistic thinking and good at binary classification. Even the much-hailed pattern-recognition capability of human minds fails when confronted with fuzzy problem-solving, where the goal is to assign probabilities rather than bin things into neat categories. In addition, medicinal chemists can suffer from many biases, such as the availability bias, where signal, even signal derived from a very limited and biased sample, can dictate decision-making on a much larger sample.

    1. Anon says:

      Yes, I’ve noticed I have a knee-jerk reaction to ugly compounds now which I’m trying to temper. I was looking at antibacterial literature recently and noticed a lot of ‘PAINful’ structures, but am curious whether these compounds really do act through useful mechanisms in bacterial systems – at least in the articles I looked at there wasn’t much follow up on either SAR or MOA. Maybe someone more well-versed in this area would be willing to pipe in, or share some recommended reading?

    2. Chris says:

      One of the attractions of PAINS was that many people had seen similar compounds cause problems in screens in the past. The easiest response is to have a pass/fail filter; what is better is to regard these as flags highlighting compounds that need further validation before they are prosecuted further.

    3. Peter Kenny says:

      Establishing that compounds that we believe to be indiscriminate and ugly really are indiscriminate and ugly is not as easy as it sounds. It’s also worth remembering that assay interference (where a compound hits the assay without any effect on the target) and an undesirable mode of action (the compound has an effect on the protein) are different behaviors. Cheminformatic data analysis tends to get talked up in two ways. First, the strength of trends is exaggerated, and this was the focus of our correlation inflation article from four years ago. Second, analysis is extrapolated out of its applicability domain, and I have argued that this is the case for the PAINS filters. I have linked the first of a six-post series (there are ‘previous’ and/or ‘next’ navigation links at the top of each) on PAINS filters as the URL for this comment.

    4. Chemystery says:

      As suggested in this article, descriptors for the original PAINS dataset could be calculated and statistical models could be built – without having to reveal the actual structures – to distinguish PAINS from Phantom PAINS.

      If the pro-PAINS crowd really wants to make a good TOOL and not a RULE, they should be advocating for the release of these data!

  7. Cialisized says:

    As alluded to above, PAINS filter details are important. Here is an anecdote from within the past three years. Our biotech company submitted compounds to another organization’s screening initiative. But first the compounds were put through their PAINS computer filter based on structure (SMILES strings). The computer rejected hydantoins (perfectly fine groups) since they fit the pattern of acylureas, which are bad (R-NHCONHCO-R’). They later corrected it, I assume.

    1. Dionysius Rex says:

      Hydantoins might not be PAINS, but I think the Shoichet aggregator database might have quite a few in it.

  8. Chrispy says:

    One would not have to run very many screens on a compound library before the “frequent fliers” were identified.

    This seems like it would be more of a problem in academia than industry, where the lab may have only a few assays and a pressing urgency to publish.

    The big risk in industry would be to have Management set up some kind of brainless filter (a la “Rule of 5”) that throws out the baby with the bathwater.

    1. AcademiaMedChemist says:

      “This seems like it would be more of a problem in academia than industry, where the lab may have only a few assays and a pressing urgency to publish.”

      This is a poignant statement. It’s so visible in our world. Traditional models of drug discovery and the “publish or perish” paradigm are difficult to reconcile, especially for the biologist in the collaboration who may only screen for a little while. One group doing robust work is the VCNDD, but their center leaders hail from pharma.

  9. jbosch says:

    So by definition, it will be difficult to convince somebody that your PAINS compound is specific. Since PAINS = pan-assay interference compounds, how many orthogonal assays do you run to convince yourself (or others) that the activity is actually valid?
    Where do you stop?
    Enzyme inhibition with site-directed mutagenesis data supporting your binding mode of the compound, plus SPR kinetics on wild-type and mutants, plus a co-crystal structure with the ligand?
    Next, of course, is the on-target effect, so you go to the trouble of making either a photoactive derivative of your PAIN or a radiolabeled variant, first show that it works with purified protein via mass spec, and then throw it at a cell lysate and repeat the mass spec. Would that be sufficient to convince people that that particular PAINS molecule is indeed on-target and acts the way advertised?

    Just curious what people in the pharma industry think about this.

    1. John Wayne says:

      At the end of the day, the only things that are important are efficacy, safety and profit; everything else is a placeholder.

    2. chris says:

    I would go the other way: prove that it doesn’t hit a bunch of other random enzymes. The general rule of thumb is that failing to prove you’re wrong is faster than succeeding at proving you’re right. Unfortunately, there’s no hard and fast rule for when you reach either of those conclusions, and I think that’s the heart of this article: be wary of yes/no heuristics.

  10. Christophe Verlinde says:

    The authors conclude:
    “It is of great importance that reviewers and journal editors request experimental proofs of selectivity, such as orthogonal experimental assays, for hit and lead compounds reported in scientific manuscripts. However, the results of this study strongly suggest that such requests should not be based solely on the results of PAINS filters.”
    The very last sentence bothers me, because if reported inhibitors are based on only ONE assay, then there are no orthogonal assay data available; hence, as a referee, I should ask for such data in the case of a PAINS fragment.
    One assay is ALWAYS insufficient, as illustrated by an HTS campaign we carried out: 50% of the hits were deemed unreliable after an orthogonal screen. See: Pedró-Rosa et al. (2015). Identification of Potent Inhibitors of the Trypanosoma brucei Methionyl-tRNA Synthetase via High-Throughput Orthogonal Screening. J. Biomol. Screen. 20: 122-130. [PMID: 25163684]

    1. Peter Kenny says:

      The journal policy appears to relieve authors of the responsibility to perform orthogonal assays for compounds that do not match PAINS filters. You have a situation where frequent-hitter behavior for structurally related compounds in a six-assay AlphaScreen panel can be used to devalue a concentration response obtained using an assay that is orthogonal to AlphaScreen.

    2. Morten G says:

      But with only *one* orthogonal assay how do you know if 50% of the hits are junk or if the orthogonal assay is junk?
      Obviously, there’s also the possibility that the original HTS assay was junk (with this combination of target and library) but I think that’s unlikely.

      1. Peter Kenny says:

        An orthogonal assay is likely to flag up interference (compound is a hit in the assay but has no effect on the target) but is unlikely, without a concentration response, to diagnose an unacceptable mode of action. It’s worth thinking about whether inhibition results at fixed concentration can be used to declare a concentration response to be invalid. I have linked a blog post on the Nature PAINS article as the URL for this comment.

  11. Peter Kenny says:

    Even in the early days of HTS, people were keenly aware that not all screening output smelled of roses and the problem then was (as it still is now) how to demonstrate that what we thought was crap really was crap. A common response to criticism of PAINS filters is typically a straw man argument that the critic is asserting that compounds matching the PAINS filters are benign. This misses the point that the data analysis in the original PAINS study simply does not support the rhetoric. The PAINS filters provide reassurance that perceptions of a compound’s unwholesomeness are based on fact. I have challenged the use of PAINS filters to define J Med Chem editorial policy and I link that blog post as the URL for this comment.

  12. Paul Workman says:

    This is an important discussion. The PAINS filters are clearly useful in identifying potential problem compounds but they need to be used judiciously and a key point is to actually test compounds for the potential problems they may have. So I agree with Derek that “the lesson is (and always has been) to get real numbers on your compounds, from several directions, before you trust them.”

    I’d like to respond to AcademiaMedChemist who refers to the following:

    “This seems like it would be more of a problem in academia than industry, where the lab may have only a few assays and a pressing urgency to publish.”

    And then comments:

    “This is poignant statement. It’s so visible in our world. Traditional models of drug discovery and the “publish or perish” paradigm are difficult to reconcile, especially to the biologist in the colab. who may only screen for a little while…”

    I think it’s a real problem in the use of compounds, and especially claimed chemical tools, that individual academic biologists will often not have easy access to the broad range of assays needed to test for PAINS properties and also to test for selectivity versus promiscuity. This calls for sharing assays and data across the research community. In that regard I think the developing Chemical Probes Portal (COI: I am on the Board of Directors) can play a valuable role.

    This discussion is very relevant to efforts to help the biology research community be more critical of the chemical tools they use (see also my blog on this, and the Arrowsmith et al. paper in Nature Chemical Biology, PMID: 26196764).

  13. Anon says:

    So now we have to deal with “PAIN-PAINS” = false false-positives!

  14. loupgarous says:

    Eventually you still hit Paracelsus’ maxim: the dose determines the poison. In fact, it seems to me you can hit it in at least two ways (disclaimer – I’m not a med chemist, just an interested bystander):
    (1) screen something out because software analyses indicate toxicity, despite the possibility that said toxicity may show up in animal models at much higher doses than the desired therapeutic effect (the drug could have a good therapeutic index), and
    (2) screen something out despite the fact that its actual mode of action is to be a prodrug for a PAINS compound, one that is only converted in target cells, where the toxicity is either much less important than its on-target effect or is actually what you want it to do in the target cell.
    An example of a drug class that lately crashed with a loud sound because of Phase III-detected toxicity is the cathepsin K inhibitors: tested for osteoporosis, they induced cardiac injury. Yet there are papers asking if these might still be useful in glioblastoma, where a narrower therapeutic index might well be acceptable (and local delivery of the candidate to the tumor could minimize off-target toxicities). Compounds that show good selectivity for cathepsin S are being looked at for treatment of tumors, so one definitely has to consider whether hits in screening software that identify potential toxicity, or even unfavorable clinical experience with a given family of drugs in conditions for which less toxic alternatives exist, should rule out their use where effective treatments for deadly diseases just don’t exist or are more toxic than the candidate.

  15. David says:

    Hey Derek – came across this post:

    Whatever happened to that project?

  16. Rhodium says:

    Off topic, but your readership should appreciate this.
    R.I.P. Professor Irwin Corey. An EJ Corey postdoc once convinced his parents that he was working for Irwin Corey.

  17. Kip Guy says:

    My experience with this has been that, outside of aggregators and highly reactive electrophiles, what makes a PAIN is as undefinable as what makes a good drug — that is to say, you can’t look at it and predict reliably how selective it will be. I think the key thing is to use the computational methods to identify a risk in a hit and then to include orthogonal assays early in your hit validation. One thing we have seen repeatedly is supposedly OK structures degrading to PAINS in assay buffers. But there are also plenty of examples of “reactive” molecules that are perfectly well behaved — even in vivo.

  18. Dominic Ryan says:

    Why do we run an HTS? To find chemical starting points, not drugs!
    If you run any reasonably sized HTS you are probably going to spend in the neighborhood of $1MM. Deciding what to exclude without considering the rest of the issues is foolish.
    There are a lot of factors that affect how you think about the results of a screen. Some of them are the cost of acquisition and maintenance. There is a place for some reactive species but don’t treat them like the others. Are follow up assays quick and inexpensive or is your second triage an animal model? If the secondary assay is fast and cheap then I would worry a lot less about that initial problem. If instead you only have an animal model then you really want to be confident that the next results mean something.
    Are you looking for hits on a new type of target? Is novelty at a high premium? Maybe you do something more with that iffy hit, as long as you are not committing 6 medicinal chemists and a whole team to 6 months of work based on an iffy start.
    In my view you have to weigh the cost of getting good assays online vs the cost of a team flying just blind enough to convince themselves to spend another year at it and all that in the context of how important that project is.
    Keep the rules coming because that should make people think about the issues. Paraphrasing what others have said, if they let you stop thinking then you should not be in this business.

  19. Willem says:

    If you read the original PAINS paper ( ) you’ll soon realise that the data set is limited (so that is not news), and that the authors used a three-tiered designation (A,B,C). These categories have diminishing evidence going from A to B to C – so perhaps the conclusions from the Tropsha et al. paper are not that surprising.

    Such motifs occur in drugs indeed – e.g. aminothiazoles (reported here as dodgy buggers: ) can be found in kinase inhibitors like dasatinib. No news there either.

    Now what does that mean? It means just what it says on the tin – if you screen such compounds and it’s a hit, there is a risk that your hit is a false positive (FP), so you should be careful. That risk of being an FP is heightened for a lot of these substructural alerts, but substructures can make a rather sweeping statement and be wrong depending on context (effect of substitutions, physicochemical properties, reactivity etc).

    That’s the reason we’ve gone with a complementary, data-driven approach that is compound-based. It’s still not ideal, but at least you know that a particular compound may be trouble if you have the data to support that conclusion ( ).

    It’s the game of acting on risk very early in lead generation – what will you be likely wasting your time on when working it up. No more, no less.

  20. Saayak says:

    Is there software or a website for filtering PAINS?

Comments are closed.