I really enjoyed this new paper on ChemRxiv, a Munich/Michigan/Berkeley collaboration on reactive covalent groups and their profile across different proteins. There have been a number of papers addressing this subject before, but this one is the most comprehensive one I’ve ever seen, and it’s a valuable resource.
Most of the covalent probes (and nearly all of the covalent drugs) that you see are targeting cysteine residues on proteins, and that’s no accident. Cys is definitely the standout nucleophile among the amino acid side chains, and (as the Cravatt group and others have shown) there’s a population of cysteines that are hypernucleophilic and are ready to party with electrophiles. These tend to be in active sites of enzymes and other specialized spots, and are surrounded by residues that make them more like a full thiolate anion (which, as everyone who’s been through sophomore organic chemistry should recall, is one of God’s Own Nucleophiles when it comes to reactivity).
Over the years, though, there have been many searches for reactive groups that will pick up other amino acid residues, and this paper features a pretty comprehensive evaluation of those (54 of them in total!). The technique used is “isoDTB-ABPP”, which stands for “isotopically labelled desthiobiotin azide activity-based protein profiling”, and here’s how it works: you take a broad proteomic sample and treat it with a reactive probe compound, and at the same time set aside a control proteome sample that just gets solvent treatment, no ligand. You then treat both of those with some sort of broadly active alkyne-containing reagent, which means that in the control sample all the residues that can be tagged with it will get labeled, but in the one that’s been treated beforehand with a reactive compound, the residues it labeled will be blocked from reacting with the alkyne. You then come in with your isotopically labeled desthiobiotin azide reagents, one for the experimental sample and a different isotopically patterned one for the control sample (light and heavy, basically), and you do a copper-catalyzed click reaction on each. Now you mix the two together (since you’ve differentiated them with those light and heavy additions), do a proteolytic digestion to break all the proteins up into small chunks, and do an LC/MS analysis on the whole mixture.
You’ll get peaks on the LC – lots and lots of peaks – and each of them will have a mass spec profile. There will be a lot of those protein fragments that had nothing to do with the whole process – they didn’t get labeled with the covalent probe at first, and they didn’t get labeled with the reactive alkyne reagent, either. Those will show up with no isotopic differences in their mass spectra at all, and can be ignored. There will be others that got labeled to the same extent in both samples, and you’ll see the light and heavy pairs, but the ratio between them will be right around 1:1 and you ignore them as well. But there will also be protein fragments whose mass spec ratios will be way off the baseline, because they had side chains that were blocked by the original reactive probe in the experimental sample, but were wide open for the reactive alkyne in the control sample. Those will be heavily skewed towards the isotopic pattern that you used in the control, and those are then the parts of the proteins that reacted with your original probe.
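To make the ratio logic concrete, here is a minimal Python sketch of that last sorting step. Everything in it is an assumption for illustration: the `PeptidePair` record, the made-up intensities, and the cutoff of 4 are not taken from the paper, and the real analysis is done by dedicated proteomics software working on the raw spectra.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PeptidePair:
    """One modified peptide seen in both isotopic channels (hypothetical record)."""
    peptide: str
    treated: float  # intensity of the tag used on the probe-treated sample
    control: float  # intensity of the tag used on the solvent-only control


def competition_ratio(pair: PeptidePair) -> Optional[float]:
    """Return control/treated, or None if the control channel is empty."""
    if pair.control <= 0:
        return None  # nothing was labeled in the control, so no ratio to report
    if pair.treated <= 0:
        return float("inf")  # site fully blocked in the treated sample
    return pair.control / pair.treated


def flag_engaged(pairs: List[PeptidePair], cutoff: float = 4.0) -> List[Tuple[str, float]]:
    """List peptides whose ratio is skewed toward the control by at least `cutoff`."""
    hits = []
    for pair in pairs:
        ratio = competition_ratio(pair)
        if ratio is not None and ratio >= cutoff:
            hits.append((pair.peptide, ratio))
    return hits


# Made-up intensities, purely for illustration.
example = [
    PeptidePair("peptide_A", treated=100.0, control=105.0),  # about 1:1, ignored
    PeptidePair("peptide_B", treated=10.0, control=80.0),    # skewed to control
    PeptidePair("peptide_C", treated=0.0, control=50.0),     # fully blocked
]
print(flag_engaged(example))
```

The cutoff is the knob to argue about: set it too low and ordinary noise in the 1:1 population starts to look like probe engagement, set it too high and you lose the partially blocked sites.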
Now, you can see from this that you need a good selection of those reactive alkynes that will pick up various amino acid residues, and there have been many of these reported (for Cys, Arg, His, Glu, Lys, Met, Asp and others). But these have shown up from different groups, using different protocols and different LC/MS conditions and data analysis, so it was this paper’s intent to get everything under one roof: same experimental conditions, same mass spec analysis, same software workup. That last one is key, as you would imagine. Sorting seventy-eight Godzillion protein fragments (a rough estimate on my part) looking for isotopic mass ratio differences is definitely a job for automated analysis, and the paper presents an optimized version of the FragPipe computational suite for the chemical biology community. The technique is sensitive enough to pick up events like formylation of side chains from the formic-acid containing elution solvents, S-oxidation of the thioether covalent products, and so on, which is a good sign.
Ripping through the proteome of Staphylococcus aureus as an example (plenty of that available!), the team was able to sort out the various probes under controlled conditions. For example, STP-alkyne is a widely used reagent to label lysine residues, and this work confirmed that it’s selective. But it’s not perfect, because nothing is. 9% of the residues it labels are serines, 2% of them are threonines, and 5% of them are the N-terminal amines of the proteins. Looking closer, it turns out that the threonines and serines that were labeled were strongly biased towards having a histidine two residues down, and the serines also showed a preference for a cysteine two residues upstream or an arginine one residue down. So there are local effects on the reactivity of those serine and threonine OH groups that will cause them to poke their heads up for this reagent (and in fact, for all the lysine-directed probes). As it turns out, there aren’t any probes (yet) that are directed towards Ser and Thr residues per se, which means that you could make a start with these reagents if you like.
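That kind of neighbor bias is simple to tabulate once you have the labeled positions. Here is a rough Python sketch, with the caveat that the `neighbor_counts` helper and the toy sequences are invented for illustration, and that I am assuming "down" means toward the C-terminus. This is not the paper's own motif analysis.

```python
from collections import Counter
from typing import Iterable, Tuple


def neighbor_counts(sites: Iterable[Tuple[str, int]], offset: int) -> Counter:
    """Count the residue found `offset` positions from each labeled site.

    Each site is (protein_sequence, zero_based_index_of_labeled_residue).
    Positive offsets point toward the C-terminus (an assumption about
    what "down" means); sites too close to either end are skipped.
    """
    counts = Counter()
    for sequence, index in sites:
        neighbor = index + offset
        if 0 <= neighbor < len(sequence):
            counts[sequence[neighbor]] += 1
    return counts


# Invented toy sequences with a labeled serine or threonine at the given index.
labeled_sites = [
    ("MKSAHLG", 2),  # S at index 2, H at index 4
    ("GGTLHKK", 2),  # T at index 2, H at index 4
    ("AASQWEE", 2),  # S at index 2, W at index 4
    ("LLS", 2),      # too close to the C-terminus for a +2 neighbor
]
plus_two = neighbor_counts(labeled_sites, offset=2)
print(plus_two.most_common())
```

Raw counts are only half the story. To call something a real preference you would compare them against how often each residue shows up at that offset from all serines and threonines in the proteome, labeled or not.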
This sort of thing is seen for all of the alkyne probes to one degree or another – they are indeed selective for their advertised amino acids, but with some different stuff around the edges. Some of them pick up greater or fewer numbers of their target residues compared to the others, and they all have off-target reactivity to some degree as with the lysine probes above. These are valuable things to know, to calibrate analyses going forwards and to allow everyone to work from a common baseline. It’s important to keep in mind, though, that the residues that you will pick up (even using the whole suite of labeling reagents) are still a select bunch. S. aureus has over 62,000 lysines in its proteome, and all the lysine probes together will only label about 15% of those (the most accessible and the most reactive). Similarly, all the carboxylate-directed probes, put together, label about 7.8% of the Asp and Glu residues, the Tyr probes cover about 12% of the total tyrosines, and the Trp probes about 12% of the available tryptophans.
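For a sense of scale, the lysine numbers above work out as follows. This is back-of-the-envelope arithmetic on the rounded figures quoted here, treating "over 62,000" as a floor, so the results are approximate.

```python
# Rounded figures quoted in the text; "over 62,000" is treated as a floor.
total_lysines = 62_000
labeled_percent = 15

labeled = total_lysines * labeled_percent // 100
untouched = total_lysines - labeled
print(f"labeled: about {labeled:,}; untouched: about {untouched:,}")
```

So even with every lysine probe thrown at the problem, something over fifty thousand lysines never show up in the data at all.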
Those of you who are into this sort of thing might be saying “Hold on, tryptophans?” The paper also contributes some new probes and validates other recently described ones, especially for Trp, His, and Arg residues as well as protein N-terminals, and shows that a new photochemical lysine probe is very selective indeed. All in all, the paper identifies a set of 17 probes (out of the 54 studied) that the authors can recommend for proteomic residue coverage. All of these together label about 54% of the S. aureus proteome (and a much higher fraction of the annotated or essential proteins). A look at a human cell line showed similar results, fortunately, so this seems like it could be a useful standard set going forward. The biggest gaps are still probes for Ser, Thr, and the carboxylates at protein C-terminals. But as reactive groups are developed for these (and new ones for the other potentially reactive side chains), we now have a common platform to evaluate them.
The hope is, of course, that we can use such information about reactivity and selectivity to come up with chemical probes for specific proteins (and protein classes), and with selective drugs towards the ones that are targets in disease. The latter may well need some new chemistries, to dial down the reactivity of the “warhead” groups from what you’d use in these sorts of broad protein-labeling experiments, but there are a lot of ways that you can think of to do that. This could be applied not only to active sites in enzymes, but to allosteric sites, protein-protein interaction surfaces, and more. These techniques can also be used to selectively label proteins for imaging studies in live cells, to conjugate other small molecules to specific proteins for therapeutic use, to covalently link entirely different proteins together for new purposes, and whatever else we might be able to dream up. And that’s a lot.