Chemical Biology

Choose Your Controls Wisely

I’ve been meaning to write about this paper (open access) on some problems with chemical probes, and now’s a good time. There was a well-known article a few years ago about “The Promise and Peril of Chemical Probes”, and this is a deliberate follow-up, starting with its title. Even if you don’t give much of a hoot about chemical probes, the point it makes is worth keeping in mind.

All chemical probes have some off-target effects. Some of the things sold as such in the catalogs (and referenced in the literature, to this day) have so many of these that they’re worthless. In fact, they’re worse than worthless; they cause active confusion and harm. One way to validate your experiments with the more useful ones is to use a very closely related compound that doesn’t have the desired probe activity, in the hopes that it will still maintain the off-target binding activities. It would be nice if every chemical probe had a selection of these available, but that is not the world we find ourselves in.

And what this new paper says is that even some of the ones that people have been trusting are. . .not so great. For example, A-196 is a pretty well-validated probe for the SUV420H1 and SUV420H2 enzymes, which are specific lysine methyltransferases. A similar compound, A-197, is sold as a negative control for experimental validation. But if you profile the two compounds against a wide panel of receptors, enzymes, transport proteins and so on, you find that A-196 has six off-target activities on that list, and A-197 not only loses activity against SUV420H1, but it loses activity against five of those other targets as well.
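To put a rough number on that, here is a minimal sketch of how much of a probe's off-target profile its negative control still covers. It uses only the counts quoted above; the function name `control_coverage` is a hypothetical helper for illustration, not anything from the paper.

```python
def control_coverage(probe_off_targets, lost_by_control):
    """Fraction of the probe's off-target activities that the negative control retains."""
    if probe_off_targets <= 0:
        raise ValueError("probe must have at least one off-target activity")
    if not 0 <= lost_by_control <= probe_off_targets:
        raise ValueError("lost_by_control must be between 0 and probe_off_targets")
    return (probe_off_targets - lost_by_control) / probe_off_targets


# Counts quoted in the text: A-196 has six off-target hits, A-197 loses five of them.
a197_coverage = control_coverage(probe_off_targets=6, lost_by_control=5)
print(f"A-197 still covers {a197_coverage:.0%} of A-196's off-target hits")
```

A control that retains only one of six off-target activities can rule out only that one as the source of a phenotype; the other five remain confounded with the intended target.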

The same goes for another epigenetic probe, NVS-MLLT-1. It has six off-target activities in the profile screen, but the compound that’s suggested as its negative control, NVS-MLLT-C, loses activity at five of them, too. To be sure, all of these six targets are being hit at low micromolar range or a bit below (depending on your assay technique), whereas the interaction with the probe target (interaction of histones with the YEATS/MLLT proteins) is about tenfold more potent. But the recommendation is to use this one at 5 to 10 micromolar concentrations in cell assays, so depending on the tone of the off-target systems, you could still get misled.
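A back-of-the-envelope way to see the risk is simple one-site equilibrium binding, where fractional occupancy is C / (C + Kd). The sketch below assumes an off-target Kd of 1 micromolar and an on-target Kd of 0.1 micromolar. Those are illustrative values chosen to match "low micromolar" and "about tenfold more potent", not measured numbers, and the model ignores cell permeability, protein binding, and everything else that makes real cell assays messy.

```python
def fractional_occupancy(conc_uM, kd_uM):
    """One-site equilibrium binding: fraction of target bound at a given free ligand concentration."""
    if conc_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return conc_uM / (conc_uM + kd_uM)


# ASSUMED values for illustration only (not from the paper):
OFF_TARGET_KD_UM = 1.0   # "low micromolar" off-target affinity
ON_TARGET_KD_UM = 0.1    # "about tenfold more potent" at the intended target

for conc in (5.0, 10.0):  # recommended cell-assay concentrations from the text
    on = fractional_occupancy(conc, ON_TARGET_KD_UM)
    off = fractional_occupancy(conc, OFF_TARGET_KD_UM)
    print(f"{conc:>4.0f} uM: on-target {on:.2f}, off-target {off:.2f}")
```

Under these assumptions, a tenfold potency window mostly disappears at the recommended dose: both the intended target and the off-targets end up largely occupied.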

Many of these negative control compounds change out a charged atom (such as a nitrogen) for something uncharged, like a straight carbon or a basic-nitrogen-to-amide switch. And the problem, which is going to be very hard to completely escape, is that these sorts of atoms are likely going to be involved in binding to those other targets as well, so you can’t get as clean a slice in the activity profile as you’d want. This paper only looks at four compounds (and their suggested controls), but tries to extend the reach computationally. The authors pulled data from the PDB on identical ligands bound to different protein sites, and then looked at the predicted effect of adding a methyl group to such a compound by systematically replacing every plausible hydrogen with a methyl.

An example of what happens is with the JAK2 inhibitor TG101209, which also binds BRD4. There are 25 possible methylation sites, and five of these would be predicted to knock out the JAK2 binding. But four of those five would be predicted to knock out BRD4 binding as well. So there’s a significant risk that plausible negative control compounds around TG101209 would be less useful than you would want, whether you’re approaching this from the JAK2-binding side of the story or the BRD4-binding side. The authors extended this to 41 other such pairs, using protein binding domains that were as dissimilar as possible, and found an average 50% chance of trouble with this methyl-substitution idea. That leads one to think that there’s generally a substantial chance of a given negative control compound modification wiping out activity on other unrelated targets as well.
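The arithmetic behind the TG101209 example can be sketched as a set overlap. The site indices below are placeholders, since the text gives only the counts (25 sites, five that knock out JAK2 binding, four of those also knocking out BRD4), and `shared_knockout_fraction` is a hypothetical helper name, not the authors' code.

```python
def shared_knockout_fraction(kills_target, kills_off_target):
    """Of the modifications that abolish binding to the intended target,
    return the fraction that also abolish binding to the off-target."""
    if not kills_target:
        raise ValueError("need at least one modification that knocks out the target")
    return len(kills_target & kills_off_target) / len(kills_target)


# Placeholder site labels: only the counts come from the text.
all_sites = set(range(1, 26))   # 25 possible methylation sites
jak2_knockouts = {1, 2, 3, 4, 5}  # five sites predicted to abolish JAK2 binding
# Four of those five also abolish BRD4 binding. Any BRD4-only knockout sites
# are not stated in the text, so none are included here.
brd4_knockouts = {2, 3, 4, 5}

fraction = shared_knockout_fraction(jak2_knockouts, brd4_knockouts)
print(f"{fraction:.0%} of JAK2-dead methyl analogs are predicted to lose BRD4 too")
```

By this count, only one of the five candidate negative controls for JAK2 would keep the BRD4 activity, which is what you need the control to do.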

What the paper ends up recommending is developing at least two chemically distinct probes for every target, and while that is going to involve a lot of work, I think that they’re right: that’s the only way that we’re going to be (reasonably) sure that we’re looking at what we think we are. They note the example of LLY507, a SMYD2 inhibitor probe that kills off glioblastoma cells. But before you run out and publish a paper on the importance of SMYD2 in glioblastoma (it’s another lysine methyltransferase for histones), or worse yet try to use such an inhibitor as a drug for the disease, you might want to know that a structurally different SMYD2 inhibitor, BAY-598, doesn’t show that cytotoxicity at all. So a selection of probes would be rather handy, and comparison of the negative controls for both such compounds would be, too. We have enough noise in the literature already, right?

18 comments on “Choose Your Controls Wisely”

  1. Rock says:

    Another issue, which is not as widespread as it was 20 years ago, is species differences. Researchers often used rat or mouse protein because it was easier to generate (as I said, not an issue these days). People then used those probes to study human homologs, assuming the same high level of specificity, which was often not the case. Conversely, it was also common to test human-selective probes in vivo in rats, mice, rabbits, etc. without checking selectivity in that species. What a mess it made of the literature.

    1. Roll says:

      What is not an issue? Generating the protein or having species selectivity?

      I think you are right: generating receptors for non-human species is easier and the corresponding in vitro assays are generally developed and compounds tested.

      Yet, oftentimes compounds have different SAR in different species, including functionality (e.g., agonist in mouse, antagonist in human).

      Our reader, Prof Bryan Roth, published a commentary on a 5-HT6 receptor study decades ago, entitled “Why mice are neither miniature humans nor small rats: a cautionary tale involving 5-hydroxytryptamine-6 serotonin receptor species variants”.

      1. DREADDing IT says:

        Speaking of issues with chemical probes, it appears that the DREADD system and its actuator clozapine-N-oxide have had their own issues. Is CNO actually inert? What is the ‘true’ ligand of DREADDs?

  2. sgcox says:

    Negative controls are underrated. They are very useful and can even be approved by FDA, like aducanumab. Now we only need to find for what drug.

  3. PastTense says:

    Data for the latest Covid vaccine are out: Curevac only has an effectiveness of 47% in a late stage trial and blamed the poor results on 13 virus variants in Latin America and Europe where the vaccine was tested. Final analysis will be in two to three weeks.

  4. hse says:

    To further complicate things, using as an example the structurally unrelated SMYD2 inhibitors LLY507 and BAY-598 that give different phenotypic readouts, one can imagine scenarios where structurally distinct probe pairs are indeed exquisitely selective and potent for the same target X, but give different cell readouts because they modulate target X in different ways upon exquisitely selective binding.

  5. Kevin says:

    Perhaps the lesson should be, if it’s *really* important, verify your result with an orthogonal assay.

    If it looks like inhibition of Protein X is important, don’t just “confirm” with a second small-molecule X inhibitor. Do the harder work of knocking it out, or otherwise modulating its expression. Or do an experiment that affects something up- or down-stream of X instead.

    Do those options have potential pitfalls too? Definitely! Will you learn something? Almost certainly.

  6. Mukesh Prasad says:

    Maybe you can address something that bugs me about the DNA to phenotype mapping. How the heck do scientists so confidently isolate phenotypes? I mean, ok, the pigment in eye color is pretty easy to confidently isolate. But, things like intelligence, being prone to a disease, historical ancestry, there is so much there it seems that could be environmentally related, as in, the day to day living environment and habits, that it seems bizarre to me that so many, many, phenotypes have been connected to this or that genes. It’s one thing to connect a gene to the presence of an enzyme, it’s another to connect it to a poorly understood thing like intelligence.

    How can one say definitively that there are 22 genes linked to intelligence, when the measures of intelligence themselves are open to reasonable intelligent debate?

    1. Derek Lowe says:

      Oh, that’s a good question, and the answer is that you absolutely can’t say such things. That doesn’t stop papers from being published and (especially) press releases being issued and headlines being written. But we absolutely cannot draw such detailed links between genes and higher-order phenotypes such as intelligence, not at a useful level of detail.

      “Phenotypic”, for drug discovery folks, often means something you can see the cells doing – their lysosomes swell up, or they halt their progression through the cell cycle at some particular stage, or they lyse and die. In animal models, we go so far as to say when we knock out Gene X or silence expression of Protein Y that there are some things that we can see and catch – blood sugar goes off, the adrenals look funny on necropsy, the animals become more sensitive to cold, they live longer/shorter, they become more/less active in their cages – all that sort of thing. There are a lot of cases where we knock out something and say “No obvious phenotypic changes”, but there are almost certainly changes that were too subtle for us to see. Or that wouldn’t make themselves known until specific conditions came along.

      1. MagickChicken says:

        The right conditions? Surely someone tries making all these knockout mice angry.

        Suddenly “Mus musculus” makes a whole lot more sense.

      2. I am guessing here says:

        “That doesn’t stop papers from being published”… sad but true. And amplified by preprints.
        So, even after passing the filter of “reproducibility”, a significant amount of research in the open literature is just advertising.
        Between these two groups, you probably get ~50% of published work essentially useless. And you never know which 50% it is…

  7. Another Guy says:

    An additional point: approved drugs are sometimes used as probe molecules for in-vitro studies because they may be good-enough inhibitors of a given target unrelated to the disease the drug is approved to treat (in-vitro conditions), and convenient since they are already on the lab shelf. Caution should be used to avoid jumping the gun and proclaiming the drug is a “proven” treatment for whatever the probe assay is trying to accomplish. Long road from a probe molecule to cure, and the molecules that reach clinical trials eventually may bear only a passing resemblance to the probe molecule.

  8. Good points about issues with ‘negative’ controls, and the need for several different chemical classes for target/phenotypic validation. The kinase world has taught us that drug-resistant versions of potential targets are also extremely useful for target validation, either generated after biochemical tinkering, or ‘evolved’ in a complex environment, such as a cell (or patient). This is discussed here. I don’t believe that in most cases the community has taken up the reasonable advice to follow this ‘set of criteria described that should be met before any investigation using such compounds should be accepted for publication’.

  9. Dominic Ryan says:

    This is starting to feel a lot like earlier days of HTS. A lot went into the design of screening decks and how to minimize false positives and negatives through careful selection of compounds and execution of the screen.
    One of the very important concepts is to avoid singletons. An HTS deck should have around 4-5 closely related compounds for each one in the deck. The second is to screen in small pools where every compound is in two different pools.
    Taken together, these give you some real statistical advantages and built-in controls. A medchem team looking over hits is usually going to ignore singletons unless there is nothing else available, there is a high priority on trying something, and the synthesis is not going to be too hard.
    Chemical probes are becoming a lesson in the perils of singletons and the notion to always include near neighbors feels an awful lot like basic screening concepts.
    The phrase “no free lunch” comes to mind.

  10. T says:

    I’m always shocked by how many researchers ignore this issue. When I was a journal editor, I got at least one article a month proposing a crappy promiscuous binder (or something with untested specificity or a PAIN suspect) as a probe, with the argument that specificity doesn’t matter so much because it’s only a probe, not a drug. In the end I made a template text to copy-paste into the rejection letter explaining that specificity is more, not less, important for probes. With drugs it doesn’t really matter what it binds as long as the patient gets better without dangerous side effects. But use a promiscuous probe and your results are just nonsense.

  11. David Edwards says:


    The flip side of molecular promiscuity is that a molecule in this category may be of pharmacological value, once we understand how it behaves. Nitrous oxide is a molecule that is well-known for being metabolically promiscuous in humans, but understanding what it interacts with and where has led to several important medical discoveries.

    This, of course, does not invalidate the many caveats associated with promiscuous probes. But what is useless as a probe might turn out to be useful elsewhere …
