Pharma 101

Proteomics 101

Over at the entertaining culture-blog 2Blowhards, the comments to this post (on people who feel deficient in math ability) include a mention of proteomics, which prompted Michael Blowhard to say:
“‘Proteomics’ – even the word is scary. I wonder how people in the field are going to communicate the substance and importance of what they’re up to to civilians … A challenge, I guess.”
A challenge that I’m willing to take up! It’s not my exact field, of course, but close enough. I’m starting a new category for posts like this, where I (and the readership here, in the comments) try to explain some technical, buzzword-laden area in language that intelligent non-scientists can profit from. So... proteomics.
The place to start, most likely, is where the word came from. It’s a direct steal from “genomics”, the study of genomes, which are the total DNA sequences of a species (or individuals of a species). Back a few years ago when the human genome was being sequenced for the first time (all the individual A T C G letters being read off), it became clear that the number of genes that humans carry around was very much on the low side of what most people expected. (The human genome, as we have it today, is a composite – the number of people in the world who have their complete genome read can be counted on one hand. That’s going to change drastically in the years to come as the process gets cheaper, faster, and more useful).
The reason why people expected more genes relates to what a gene is: a stretch of DNA that’s read off (transcribed into RNA) and then translated into a specific protein. That’s DNA’s job; it’s a set of coded instructions to make proteins. But, as it happens, we have a lot more different proteins than we have genes. Clearly, something more happens downstream of the DNA part of the process.
A lot of things happen, actually. Those first-made proteins get altered in all sorts of ways. The same protein can be folded into different shapes, for starters (we’re just now recognizing how important a process this is in some diseases). Proteins can also be clipped into smaller ones by many different routes, and at any stage they’ll be decorated with molecular tinsel like sugars and lipids and phosphates. All of those can totally change a protein’s function. This gives you some idea of where all that diversity is coming from – and why sequencing the human genome, huge and necessary accomplishment though it was, was nowhere near the end of the story.
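For readers who like to see the arithmetic, here is a minimal sketch of how quickly those decorations multiply. Everything in it is made up for illustration: the site names, the choice of decorations, and above all the assumption that each site is modified independently of the others, which real proteins do not strictly obey.

```python
from itertools import product

# Hypothetical modification sites on one imaginary protein (not real data).
# Each site is either left alone or carries one kind of "molecular tinsel".
SITES = {
    "site_1": ["unmodified", "phosphate"],
    "site_2": ["unmodified", "phosphate"],
    "site_3": ["unmodified", "sugar"],
    "site_4": ["unmodified", "lipid"],
}


def count_forms(sites):
    """Count distinct decorated forms, assuming the sites are independent."""
    total = 1
    for options in sites.values():
        total *= len(options)
    return total


def list_forms(sites):
    """Enumerate every combination of site states as a dict per form."""
    names = list(sites)
    return [dict(zip(names, combo)) for combo in product(*sites.values())]


if __name__ == "__main__":
    print(count_forms(SITES))    # 16 forms from a single gene product
    print(list_forms(SITES)[0])  # the fully unmodified form
```

Four two-state sites already give 16 distinct forms of one protein, and each added site doubles the count, before folding and clipping are even considered.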
Proteins spend their time interacting with other proteins. If you think of a cell in your body as a large irregularly shaped bag, full of intricate (and somewhat squishy) 3-D jigsaw pieces which are constantly sluicing around assembling or sliding past each other, you’ll have a pretty reasonable idea of what it’s like in there. Any given cell will contain thousands upon thousands of different proteins, many of which are doing multiple jobs depending on the time and place. Proteomics is the attempt to understand which proteins are doing what, when, with whom, and why.
It hardly needs saying, but we’re just at the very beginning of that study. We have some tools to track these interactions, and they’re far better than anything people had twenty or thirty years ago, but they’re still rather crude compared to what we need. Huge signaling networks get uncovered and extended, and are found to touch upon others for reasons that are unclear. All sorts of feedback loops and backup systems are sketched in, and many pathways have been missed (or, alternatively, assigned too much importance) because they only operate under certain special conditions that our assays may overemphasize or skip entirely.
This project is much harder than the deciphering of the genome, and will take much longer. But that’s because it’s much closer to the real-time workings of a living organism, which means that comprehension, when it comes, will be still more valuable. Really substantial sums are being spent on this stuff, along with serious brainpower and computing resources. Progress will be jerky, irregular, infuriating, and of very great interest indeed.

20 comments on “Proteomics 101”

  1. Anonymous BMS Researcher says:

    > it became clear that the number of genes that humans carry around was very much on the low side of what most people expected.
    Count me among those who were utterly stunned by this discovery — in the late 1990s I fully expected the number of genes (by any reasonable definition, defining what is meant by “number of genes” is itself a major can of worms) would be at least 75 thousand and possibly over 100 thousand.
    > the number of people in the world who have their complete genome read can be counted on one hand.
    I have met one of those people and observed first hand the amazing size of his ego. When will we get the first complete genome of a human who does not have a Y chromosome?

  2. Anonymous BMS Researcher says:

    Followup to my previous comment:
    when attempting to explain genomics to lay folks, I often ask them to imagine printing out in hexadecimal format on a monstrous stack of paper the complete binary for any current computer operating system, then sending that back to 1957 by time machine for the brightest minds of that era to reverse engineer. Puzzling out the information-processing systems of life is at least as hard a problem and possibly harder.

  3. Submarine says:

    > But, as it happens, we have a lot more different proteins than we have genes.
    > Clearly, something more happens downstream of the DNA part of the process.
    Agreed, but don’t forget that we also have a lot more DNA than genes, and much of that DNA is not junk. It is playing a critical role in determining when and where proteins encoded by the relatively modest number of genes are expressed.

  4. Anonymous BMS Researcher says:

    > much of that DNA is not junk
    You are absolutely right, the non-coding part of the genome is clearly of tremendous importance.
    And we are barely beginning to comprehend what sorts of things it might be doing. There are huge stretches of non-coding DNA that have been conserved for enormous amounts of evolutionary time, so clearly they must do SOMETHING important but as yet we have little clue WHAT most of these sequences do.

  5. RKN says:

    > Proteomics is the attempt to understand which proteins are doing what, when, with whom, and why.
    Though I agree with this, at the risk of making the matter even murkier, this is really the area of Interactomics.
    One advantage of Proteomics over Genomics, at least in terms of the study of expression changes, is that proteins (“decorated” and non-) are the immediate effectors of phenotype. Changes in a message (treated vs. control) can be telling, but unimportant to phenotype unless that message is proportionally translated. I think one of the biggest challenges right now in Proteomics is the effort to measure the change in specific protein isoforms. But first we have to identify those specific isoforms, and that is anything but easy.

  6. GATC says:

    I put the various “polyomics” in the same category as “systems biology”, or as Josh Lederberg once said “what we used to call physiology”.
    So Derek, now that you are firmly in place up there in Cambridge, perhaps you could ask around and get a good definition from the Harvard crowd as to what is “chemical biology” and how that relates to what we used to call “biochemistry”.

  7. Caleb says:

    We have three graduate programs at the school I currently attend that are very similar: Biological Chemistry, Chemical Biology, and Medicinal Chemistry (plus a Chemical Biology “track” within the Chem Dept)! Funny thing is, most of the faculty are cross-listed so it doesn’t really matter what program you’re in. Generally speaking the biological chem folks don’t do any synthesis and are more likely to use model organisms such as Drosophila or knockout mice, while the med chem and chem bio labs have a synthetic and biological component and primarily use organisms such as E. coli and yeast.

  8. Interested Layperson says:

    > Huge signaling networks get uncovered and extended…
    You might want to edit the description to define “signaling networks” and “pathways”. As a non-biologist who works in the industry, those leap out as jargon to me. I now have a sense of what they mean, but I remember when I had to learn it.
    Nice work!

  9. Anonymous BMS Researcher says:

    Here’s a stab at fairly brief — and therefore oversimplified — definitions of “signaling networks” and “pathways.”
    First off, let me introduce an engineering analogy. In an engineering system there will be two main sets of wires connecting various components. One set, known as the “control circuits,” mainly conveys INFORMATION about the current and desired states of the system. These typically are small wires, carrying relatively low voltages and currents. A second set of wires, known as the “power circuit,” is much larger because it carries POWER at higher voltages and currents to the motors and actuators that do whatever physical work the system is designed to perform.
    When you press the button for your desired floor in an elevator (a “lift” in the UK), you close a connection in the control circuit which sends a small amount of electrical energy to the controller telling it where you want the elevator to go. In response, the controller will actuate a contactor that sends a much larger amount of power to the motor and up the elevator goes. Sensors detect when it has reached your desired floor and send small amounts of electrical energy to the controller, which then de-energizes the contactor, and the contactor stops sending power to the motor, sends power to smaller motors that open the appropriate doors, and so forth.
    Before I became a biologist I was an engineer and once I actually worked on some control devices that my company sold to Otis Elevator (even though this was over 20 years ago, elevators have long service lives so probably even now some elevators are being controlled by gadgets I helped design). But I am digressing here; the point is that the control circuit does not directly move the elevator: it does so through its effects on the state of the power circuit.
    Of course, this distinction between “control” and “power” circuits is oversimplified because even the “control” circuits carry some energy and even the “power” circuits carry some information.
    In somewhat the same fashion, biologists describe the molecular equivalents of control circuits as “signaling networks” by which various molecules (hormones and a zillion others) convey information from one part of the organism to another and/or integrate information from various sources. A “pathway” is a series of biochemical reactions that make or destroy substances needed by the organism, and they tend to consume much larger amounts of energy in their operations; these are analogous to the power circuits of electrical engineering. In general, signaling networks have their effects on the organism and its physical world through their effects on the operation of pathways. For example, when I wave a toy at my cat, her eyes send messages through various signaling networks to her brain, where other signaling networks trigger energy conversion pathways in various muscles and she chases after the toy that I am waving.
    However, the distinction between a signaling network and a pathway is even less clear-cut than is the electrical engineering distinction between the control and power circuits.

  10. MTK says:

    Beyond just the pure size of the proteome and the complexity of it due to the isoforms, posttranslational modifications (sorry, jargon), and multiple protein-protein interactions, there’s also the fact that many of the most interesting, i.e. relevant to various disease states, proteins are probably low abundance proteins. Tough to study when there is no protein equivalent to PCR. Sensitivity is a huge issue.
    Hey, at least you can semi-describe and characterize a protein by its linear amino acid sequence. There are also only 20 naturally occurring amino acids, all with the same stereochemistry. Compare that to the glycome.

  11. roadnottaken says:

    exactly, MTK. I’ve heard it said that in a typical cell protein abundances vary over six orders of magnitude. that is an enormous challenge and i can tell you from experience that detecting your low-abundance proteins in a background of actin and tubulin etc can be very challenging.
    regarding your second point, metabolomics is a much nastier beast. as you mentioned, at least one always has the ability to sequence proteins, but compared to proteins the subunits of metabolites seem almost infinite. whereas with proteomics, detection and quantitation is the major challenge, simply determining the identity of a particular metabolite can take months.

  12. RKN says:

    > Tough to study when there is no protein equivalent to PCR. Sensitivity is a huge issue.
    The big challenge in proteomics is separation (chromatography), not amplification. If you can satisfactorily separate the proteins in a sample, and digest them properly, mass spectrometry will find the peptides. Modern mass spectrometers have attomole sensitivity, and quantitation (control vs. treated/disease) of even low abundance proteins is readily achievable.

  13. TNC says:

    There’s no protein equivalent to PCR? You may have just written my next proposal! 😉

  14. SRC says:

    I have a modest proposal: the next person who defines yet another risible “ome” (metabolome qualifies here) is put in a Waring blender to have his proteome/genome/gnomeome extracted – all of it. It’s the biological equivalent of including “gate” as the suffix of any remotely questionable political transaction.

  15. roadnottaken says:

    SRC: then what would you call the untargeted analysis of biological small-molecules? i agree that some -omics words are silly (my favorite is snake venomics) but oftentimes it’s the most concise way to express a concept. if a lexical construction is useful then use it, i say. i think the defining feature of an -omic is untargeted/global measurement which is qualitatively different from the way biology used to be done (i.e. isolate then study).

  16. Anonymous BMS Researcher says:

    Can any reader point us to the paper I saw a few years ago and cannot find now in which the authors say of a term they have just defined something like “this name was carefully chosen for its resistance to being given an omics suffix.”?
    I’ve tried Googling variations on such a sentence, but all I get is various -omics hits…

  17. MTK says:

    If you come up with a protein chemistry PCR equivalent, it’s time to buy your tickets to Stockholm, brush off the tux, and get ready to pose with Randolph and Mortimer Duke.

  18. RNAbiologist says:

    For the sake of completeness in Derek’s synopsis of ‘how one gene becomes multiple proteins’ I think one should mention the enormous diversity resulting from alternate mRNA splicing. His original post glossed over this aspect, and it’s important for the less-familiar-with-biology crowd to know that it exists.
    In summary: DNA is transcribed into RNA, which is spliced, i.e., one or more pieces are removed from the sequence and the remainder is shipped out of the nucleus. This RNA is then translated into a protein. The sequences removed from the transcript vary: sometimes a given section is removed, sometimes it’s not. Many genes have multiple sequences removed that are thousands of bases apart. No one understands how this works. What’s more, due to the triplet nature of the codons used in translation, a frameshift can result in largely different protein sequences from a single gene. (In English: RNA is translated into protein sequence based on three-base combinations of RNA. Three bases of RNA result in a single amino acid being produced. If a transcript has, say, 8 bases removed from the middle of the RNA, everything downstream of the ‘splice site’ will be translated into a totally different amino acid sequence versus the unspliced transcript.)
    Anyway, this is another area where the ‘genomics revolution’ has fallen a bit short.

  19. SRC: It could be worse. Instead of inventing words like “genomics”, “proteomics”, “metabolomics”, and (worst, IMO) “interactomics”, they could be calling the whole lot of it “Biology 2.0”
    P.S.: how about “chain reactionomics”?

  20. Ian Musgrave says:

    I’ve just come back from the International Brain Research Organisation Meeting in Melbourne, where I was exposed to the delightful concept of “pocketomics” (An J, et al., Mol Cell Proteomics. 2005 Jun;4(6):752-61), the universe of ligand binding pockets.
    Some nit picks:

    > Back a few years ago when the human genome was being sequenced for the first time …, it became clear that the number of genes that humans carry around was very much on the low side of what most people expected.

    Actually, this isn’t quite correct. There were a number of estimates, many of which were around the final figure, see Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome.
    Submarine wrote:

    > Agreed, but don’t forget that we also have a lot more DNA than genes, and much of that DNA is not junk.

    It depends on what you mean by “much”. About 1.2-2% codes for proteins, and a similar figure codes for structural and regulatory RNA. Regulatory sequences may account for between 5% and 20% (at the most optimistic, if you take the over-hyped ENCODE project at face value). So, even at the most optimistic, 70% of the genome is doing nothing (at least 8% is broken retroviruses, around 2-5% are broken genes). You can delete great swaths of mouse non-coding DNA without effect, and you can delete huge chunks of the most highly conserved non-coding DNA in the mouse without effect either.
    It doesn’t necessarily mean these sequences are not functional in some way, but that any claim as to the importance of conserved non-coding sequences should be taken with a big grain of salt (and again, these sequences constitute a minor part of the overall DNA of vertebrates) until some actual function is found.
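
The frameshift point that RNAbiologist makes in comment 18 can be sketched in a few lines of Python. This is a toy under stated assumptions, not a model of real splicing: the transcript is an invented 36-base sequence, `remove_segment` simply cuts bases out at an arbitrary position (real splicing happens at specific splice sites), and `translate` reads from the first base using the standard genetic code.

```python
from itertools import product

# Standard genetic code in the usual U, C, A, G ordering; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invented transcript for illustration only (12 codons, no stop codon).
TRANSCRIPT = "AUG" "GCU" "GAA" "ACU" "GGU" "CCA" "UUA" "AGC" "GAU" "CGU" "UGG" "AAA"


def translate(rna):
    """Translate RNA three bases at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def remove_segment(rna, start, length):
    """Return the RNA with `length` bases cut out beginning at index `start`."""
    return rna[:start] + rna[start + length:]


if __name__ == "__main__":
    print(translate(TRANSCRIPT))                        # MAETGPLSDRWK
    print(translate(remove_segment(TRANSCRIPT, 6, 9)))  # MAPLSDRWK
    print(translate(remove_segment(TRANSCRIPT, 6, 8)))  # MASIKRSLE
```

Cutting out 9 bases (a multiple of three) drops three amino acids but leaves everything downstream intact, while cutting out 8 shifts the reading frame so that every downstream amino acid changes.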
