Analytical Chemistry

Proteins in a Living Cell

It’s messy inside a cell. The closer we look, the more seems to be going on. And now there’s a closer look than ever at the state of proteins inside a common human cell line, and it does nothing but increase your appreciation for the whole process.
The authors have run one of these experiments that (in the days before automated mass spec techniques and huge computational power) would have been written off as a proposal from an unbalanced mind. They took cultured human U2OS cells, lysed them to release their contents, and digested those with trypsin. This gave, naturally, an extremely complex mass of smaller peptides, but these, the lot of them, were fractionated out and run through the mass spec machines, with use of ion-trapping techniques and mass-label spiking to get quantification. The whole process is reminiscent of solving a huge jigsaw puzzle by first running it through a food processor. The techniques for dealing with such massive piles of mass spec/protein sequence data, though, have improved to the point where this sort of experiment can now be carried out, although that’s not to say that it isn’t still a ferocious amount of work.
What did they find? These cells are expressing on the order of at least ten thousand different proteins (well above the numbers found in previous attempts at such quantification). Even with that, the authors have surely undercounted membrane-bound proteins, which weren’t as available to their experimental technique, but they believe that they’ve gotten a pretty good read of the soluble parts. And these proteins turn out to be expressed over a huge dynamic range, from a few dozen copies (or fewer) per cell up to tens of millions of copies.
As you’d figure, those copy numbers represent very different sorts of proteins. It appears, broadly, that signaling and regulatory functions are carried out by a host of low-expression proteins, while the basic machinery of the cell is made of hugely well-populated classes. Transcription, translation, metabolism, and transport are where most of the effort seems to be going – in fact, the most abundant proteins are there to deal with the synthesis and processing of proteins. There’s a lot of overhead, in other words – it’s like a rocket, in which a good part of the fuel has to be there in order to lift the fuel.
So that means that most of our favored drug targets are actually of quite low abundance – kinases, proteases, hydrolases of all sorts, receptors (most likely), and so on. We like to aim for regulatory choke points and bottlenecks, and these are just not common proteins – they don’t need to be. In general (and this also makes sense) the proteins that have a large number of homologs and family members tend to show low copy numbers per variant. Ribosomal machinery, on the other hand – boy, is there a lot of ribosomal stuff. But unless it’s bacterial ribosomes, that’s not exactly a productive drug target, is it?
It’s hard to picture what it’s like inside a cell, and these numbers just make it look even stranger. What’s strangest of all, perhaps, is that we can get small-molecule drugs to work under these conditions. . .

22 comments on “Proteins in a Living Cell”

  1. luysii says:

    Not only are there a lot of different proteins, but their aggregate amounts are quite high (the cytosol of E. coli contains 300 milliGrams of protein/Liter). The concentration of each protein is relatively tiny: 300 milligrams of a protein of mass as low as 10 kiloDaltons (a small protein) gives a concentration of 30 microMolar, and that assumes just one protein is present in the soup. The crowding helps (or forces) the proteins to fold into compact structures.
    Whether concentration, as typically defined by chemists, for ions and drugs has any meaning in such a concentrated soup is unclear. You can forget the Debye-Hückel theory of electrolytes in the cell. As we used to say back in the day, it applies to slightly contaminated distilled water.

  2. Sleepless in SSF says:

    @Luysii: 300 mg/L (or 30uM) doesn’t strike me as especially concentrated. From the way people talk about the bizarre conditions inside cells, I had always assumed that cytosolic proteins must be tens of mg/ml or more. Is 300 mg/L really correct?

  3. Lester Freamon says:

    It’s 300mg protein per mL, not L.
    And that doesn’t include all of the lipids, carbohydrates, nucleic acids, metabolites.
    Also, for molarity calculations, the volume of an E coli cell is 10^-15 L. In E coli, this means that a protein at 1 nanomolar is present at 1 molecule per cell. But given the size scaling, in a mammalian cell, 1 nanomolar is 1000 copies/cell.
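The copies-per-cell arithmetic in this comment can be checked with a short sketch. The E. coli volume is the 10^-15 L figure quoted above; the mammalian volume of 10^-12 L is an assumption inferred from the "size scaling" remark (1000 times larger), and the function name is made up for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(molar_conc, cell_volume_l):
    """Number of molecules in one cell at a given molar concentration."""
    return molar_conc * cell_volume_l * AVOGADRO


E_COLI_VOLUME_L = 1e-15     # figure quoted in the comment
MAMMALIAN_VOLUME_L = 1e-12  # assumed: roughly 1000x the E. coli volume

ONE_NANOMOLAR = 1e-9  # mol/L

ecoli = copies_per_cell(ONE_NANOMOLAR, E_COLI_VOLUME_L)
mammal = copies_per_cell(ONE_NANOMOLAR, MAMMALIAN_VOLUME_L)

print(f"E. coli:   {ecoli:.2f} copies per cell at 1 nM")
print(f"Mammalian: {mammal:.0f} copies per cell at 1 nM")
```

Under these round-number volumes the results are about 0.6 and about 600 copies, so "1 molecule" and "1000 copies" hold as order-of-magnitude statements.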

  4. barry says:

    There are a few outlier drug targets that are actually abundant. Tubulin (targeted by colchicine and taxol) and Hsp90 (targeted by a bunch of hopeful anti-cancer agents) are among the most abundant proteins in many mammalian cells.

  5. Todd says:

    @barry: It makes sense that a lot of cancer targets are abundant. The zeitgeist aims for cell regulatory proteins, and those proteins are needed in huge quantities to keep the cell running. Depending on your perspective, this may be a good thing or bad thing. The good news is that you’re trying to throw a rock in the ocean with cancer drugs. The bad news is throwing enough rocks in the ocean creates sandbars and all sorts of nasty issues.
    Also, 300mg/mL is super concentrated. No wonder small molecule drugs have become the dominant paradigm. I’m a biology guy by trade, but I remember enough chemistry to know that fitting any sort of drug in that space without hitting anything else is like needing a royal flush every time you play poker. Yikes!
    Can I go get my MBA now? LOL

  6. pete says:

    The notion that regulatory proteins in eukaryotes generally show:
    – increased “evolvability”
    – low cellular abundance
    maybe suggests a slick way of increasing the “gain” on sensitivity to environmental change. That is, a little change imposed on a regulatory gene might thereby have a big effect at various levels: gene family, cellular & (ultimately) organismal. Interesting stuff.

  7. Luysii: It’s still early, but people have started realistically looking at the effects of macromolecular crowding on protein folding and function using simulations and molecular dynamics. C & EN had a cover story on this a while back.

  8. Bill says:

    Did anyone else notice all of the typos in the paper? If you spent all that money and time running fancy mass spec experiments, you could at least read the paper before you submit it.
    “yeast expands a significant fraction of its protein mass (~30%) on translation and protein sorting”
    “this validation method relies only onto a single measurable value”
    There are more examples, but you get the idea.

  9. bacillus says:

    Remember too, if you shake the flask a bit, you’ll get a different proteome altogether.

  10. Anonymous says:

    The trick is not to think we 1) know everything, 2) understand everything, and 3) can work everything out at the molecular level from the ground up. Unfortunately this has been the main approach for the last ~15 years and is why (I believe) the pharma industry has failed to find many drugs. If you start out with ‘we don’t know s**t’ then you end up going down the phenotypic (‘black box’) route, which has historically been more successful. Plenty more drugs left in that locker. Come back and revisit the reductionist approach in 500 years when we know a little bit more.

  11. daen says:

    The analysis was done on an osteosarcoma cell line. So how about a comparative analysis of normal bone tissue, under the same conditions? By comparing the two, you could identify under- or over-expressed proteins in the U2OS line. That comparison could obviously help in the identification of some of those regulatory choke points and bottlenecks.

  12. luysii says:

    Sleepless in SSF — My face is red ! That’s 300 milliGrams/milliLiter (not per Liter). For a reference see J. Bacteriol. vol. 181 pp. 197 – 203 ’99. Sorry.
    The molarity concept for proteins automatically means concentrations must be low. Molar means Moles per Liter which is Molecular weight in grams (10,000 in the case I mentioned above) per Liter (1,000 grams), so a 1 M concentration is physically impossible for even a protein of this relatively small size.
    Sorry for the mistake.
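As a rough check on the unit correction in this thread, the sketch below converts a mass concentration to molarity for the 10 kDa example protein. It treats the whole protein mass as a single species, as the original comment does; the function and variable names are illustrative only.

```python
def molarity(mass_conc_g_per_l, molar_mass_g_per_mol):
    """Convert a mass concentration (g/L) to a molar concentration (mol/L)."""
    return mass_conc_g_per_l / molar_mass_g_per_mol


MOLAR_MASS = 10_000.0  # g/mol, the 10 kDa example protein

# The figure as first posted: 300 mg per litre = 0.3 g/L
per_litre = molarity(0.3, MOLAR_MASS)

# The corrected figure: 300 mg per millilitre = 300 g/L
per_millilitre = molarity(300.0, MOLAR_MASS)

# Mass of protein per litre that a 1 M solution would require
grams_for_one_molar = 1.0 * MOLAR_MASS

print(f"300 mg/L  -> {per_litre * 1e6:.0f} micromolar")
print(f"300 mg/mL -> {per_millilitre * 1e3:.0f} millimolar")
print(f"1 M needs {grams_for_one_molar:.0f} g of protein per litre")
```

The corrected figure is a thousand times the one first posted, and a 1 M solution would need 10 kg of protein per litre, which is the point of the "physically impossible" remark.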

  13. Sleepless in SSF says:

    @luysii: Thanks for the correction. However, I think you may have another error on your hands. It seems to me you are assuming that you can’t have solubilities approaching 10000 g/l. That’s clearly not true for some solutes: taking a quick spin through an online solubility table produced SbCl3 at 9100 g/l. SbCl3 clearly isn’t a protein 🙂 but I wonder if your assertion about physical impossibility is based on an assumption that solubilities of 10000 g/l are impossible in general and not just in the case of proteins, where it may be true (though I don’t know).

  14. Sleepless in SSF says:

    @Bill: I did notice lots of typos, but the authors are all either Swiss or German and I believe that is likely to be the origin of many of the errors. It seems to me that it’s more reasonable to expect journals to employ copy editors rather than expecting grammatical perfection from scientists writing in a second (or third?) language.
    And as to the expense, excluding the AQUA peptides the incremental cost of this experiment was probably less than a couple of hundred dollars (cell culture medium, IPG strips and reagents, TCP/IAA/trypsin). I don’t see that they’ve clearly specified the total amount of each AQUA peptide used, but from what I do see I might guess that they used something like $2000 worth. In total, not a very expensive experiment given the amount of data produced.

  15. Sleepless in SSF says:

    @daen: The type of differential proteomics experiment you describe is standard methodology. My lab does them every day as do many, many others. This paper was a demonstration of the benefits of combining two somewhat lesser used techniques: directed MS (in contrast to dynamic data acquisition MS/MS) and AQUA absolute quantitation (as opposed to labeled or label-free relative quantitation).

  16. daen says:

    @Sleepless in SSF: Thanks! BTW, where in SSF are you? That’s where I’m working!

  17. luysii says:

    #13 Sleepless in SSF: That was my assumption. My example seemed to me like putting 10 quarts of water in a 1 quart milk carton. Probably still not possible for proteins. Here’s why:
    Figuring an average molecular mass of 100 Daltons/amino acid, a protein of mass 10,000 would have about 100 amino acids. Now put Avogadro’s number of this protein into 1 liter of water, which holds about 55 times Avogadro’s number of water molecules. That leaves less than one water molecule to solubilize each amino acid. Not going to happen.
    I must confess that my original approach was based simply on mass. Thanks for making me think it through.
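The water-per-residue argument above can be sketched numerically. This assumes 1 litre of water weighs 1000 g and takes the molar mass of water as 18.015 g/mol; the 100 Da per residue and the 10 kDa protein come from the comment itself. Variable names are illustrative.

```python
WATER_MOLAR_MASS = 18.015      # g/mol
WATER_MASS_PER_LITRE = 1000.0  # g, assuming a density of 1 g/mL

PROTEIN_MASS_DA = 10_000.0     # the 10 kDa example protein
RESIDUE_MASS_DA = 100.0        # rough average per amino acid, per the comment

# Moles of water in one litre
water_mol = WATER_MASS_PER_LITRE / WATER_MOLAR_MASS

# Residues per protein molecule, and moles of residues in 1 mol of protein
residues_per_protein = PROTEIN_MASS_DA / RESIDUE_MASS_DA
residue_mol = 1.0 * residues_per_protein

# Water molecules available per amino acid residue
waters_per_residue = water_mol / residue_mol

print(f"{water_mol:.1f} mol water per litre")
print(f"{waters_per_residue:.2f} water molecules per residue")
```

This follows the comment in adding one mole of protein to one litre of water, rather than making the solution up to one litre.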

  18. Sleepless in SSF says:

    @daen: That nick is old and dates from the days when I was at Exelixis, pre-implosion; I keep using it here for the sake of continuity. I’m actually in Florida nowadays.

  19. Sleepless in SSF says:

    @luysii: I suspect you are probably correct about 1M 10kDa protein solutions, but the situation still isn’t quite as simple as your last rationale. A 10 kDa protein will almost certainly have tertiary structure, and may well have a core that isn’t well solubilized. Thus the number of AAs that require solvent contact might well be much less than 100.
    Not claiming it would ever really happen, just saying that the effective H2O:AA ratio could be much higher than 55:100.
    It sort of raises the question you implied in your first post: What is a solution? The answer seems clear when thinking about the sort of “slightly contaminated water” solution you rightly say that we chemists are used to thinking about. But what the heck would you have if you did the reverse: added one liter of water to 10 kg of dry protein (hypothetically assuming that the protein would fold correctly under those conditions)? Would it be a solution? How much water would you need to make it a solution, and how would you know when you had enough?

  20. luysii says:

    #19 Quite true: the large class of globular proteins DO have a hydrophobic core in which the amino acid side chains essentially dissolve themselves. Huge medical problems arise when the hydrophobic amino acids of one protein ‘dissolve’ those of another protein, leading to insoluble protein aggregates. The aggregates are associated with (and are probably in some sense causative of) a variety of neurologic diseases I used to manage (treat is too strong a word): Huntingtin in Huntington’s chorea, Abeta peptide in Alzheimer’s, alpha-synuclein in Parkinsonism, superoxide dismutase type 1 (SOD1) in familial amyotrophic lateral sclerosis.
    It would be an interesting calculation (which I’ve not done, but should have) to take a 100 amino acid protein of average composition, fold it into a ball, measure the surface area, and see how many water molecules it would take to cover (i.e., solubilize) it. I doubt that 50 would be enough.
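A very rough version of the calculation proposed here can be sketched by treating the folded protein as a smooth sphere. Both physical inputs are assumptions, not figures from the thread: a protein density of about 1.35 g/cm^3, and a footprint of about 0.1 nm^2 per water molecule (a round number based on a molecular diameter near 0.3 nm). A smooth sphere has the smallest possible surface for its volume, so this should be read as a lower bound at best.

```python
import math

AVOGADRO = 6.02214076e23       # molecules per mole
PROTEIN_MASS_G_PER_MOL = 1e4   # 100 residues at ~100 Da each, per the thread
PROTEIN_DENSITY_G_CM3 = 1.35   # assumed typical protein density
WATER_FOOTPRINT_NM2 = 0.1      # assumed area covered by one water molecule
NM3_PER_CM3 = 1e21             # (1e7 nm per cm) cubed

# Volume occupied by a single protein molecule, in nm^3
volume_nm3 = (PROTEIN_MASS_G_PER_MOL
              / (PROTEIN_DENSITY_G_CM3 * AVOGADRO)
              * NM3_PER_CM3)

# Radius and surface area of a sphere with that volume
radius_nm = (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
area_nm2 = 4.0 * math.pi * radius_nm ** 2

# Water molecules needed for a single covering layer
waters_to_cover = area_nm2 / WATER_FOOTPRINT_NM2

print(f"volume  ~ {volume_nm3:.1f} nm^3")
print(f"radius  ~ {radius_nm:.2f} nm")
print(f"surface ~ {area_nm2:.1f} nm^2")
print(f"waters for one layer ~ {waters_to_cover:.0f}")
```

With these assumed inputs the single-layer count comes out in the low hundreds, which is consistent with the doubt that 50 would be enough. Changing the density or the footprint shifts the number, but not by the factor of five needed to bring it down to 50.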

  21. Nile says:

    I like this research: admittedly, the results are messy compared with the neatly-labelled reagents in a pharma lab, but they’re getting better. And that’s the point: the protein repertoire of a living cell is a finite amount of information and we can come fairly close to catalogueing it down to the last peptide.
    At which point, or close to it, the question “What do you mean, nobody knows what this one does?” will have gone through three phases of answers:
    “Nobody knows what hardly any of ’em do, and it’s no surprise we never saw or heard of half of the sequences in your bucket of gloop”.
    “Oh, so there’s a distinct family of kinases that look like my pet drug target! I never would’ve heard of them… I wonder what they do?”
    “What – an unknown enzyme? Young man, that’s either a contaminant or you’re the guy who found the first new genera of mammals discovered in three decades, out there in New Guinea, together with the Yeti and a hitherto-unnoticed species of rhinoceros unique to Brooklyn”.

  22. Mohammed Nader says:

    Biotechnology is, in general, humans’ use of living organisms. What mankind does with these organisms has branched in various directions. The existence of these huge amounts of proteins in human cells, and of course other living organisms, serves not only in drug delivery and disease treatment, but in numerous other applications as well. For example, bacteria have been modified to produce insulin, which is the key factor in treating diabetes. On the other hand, the same bacteria have been modified to produce biofuel as a replacement for regular chemical fuel. As for these proteins, they can be invested in medicine and drug delivery, but they might also be more valuable and more accessible for other applications like crop engineering, for example. One polypeptide of these thousands might, for instance, end up being a key factor in doubling the efficiency of soil bacteria which help plants grow. Drug delivery is definitely very important and a significant phase of biotechnology’s challenges, but it is as significant to understand that it isn’t the only one.
