Analytical Chemistry

Tiny Proteins

Here’s another for the “things we just didn’t realize” file. This article is a nice look at “miniproteins” (also known as micropeptides), small but extremely important species that we’ve mostly missed out on due to both our equipment and our own biases in looking at the data. Other recent overviews are here, here, and here. I should note that the literature on this topic is rather shaggy – it’s been developing for years under a number of different names, some of which are also claimed by other fields of research, and those reviews represent people trying to get a view of the whole landscape.

We’re talking proteins with fewer than 100 amino acids (and all the way down to single digits?), and these were excluded, as genomes began to be sequenced and annotated, from the standard definition of what an open reading frame was. That brings up another distinction for these species: they’re not carved off of some larger known protein, but rather are truly coded for at these short lengths. It seems likely that these things have shown up evolutionarily through mutations that formed stop or start codons somewhere in the genome, with the resulting proteins finding a use and then being conserved. They’re all over the place, genomically, with some of them showing up in regions that didn’t look as though they coded for anything at all. We’re going to have to rethink some of our ideas about what genes look like and just how many of them there are.

But there’s a lot to sort through before we get to that point. Those ORF cutoffs were put in because it was thought that there must be mostly just noise down in those small lengths, and there still is plenty of that. For example, the article mentions that the yeast genome has about 6000 ORFs for proteins of at least 100 residues, but if you open up the criteria to everything below 100, you have 260,000 more (!). It doesn’t seem likely (or even possible) that most of that list is real or functional, but the great majority of those can be illusory and still leave you with a lot of new proteins to look into. Finding and validating these things is not always straightforward: you can do RNA-seq experiments and find a lot of short mRNAs, but not all of those are being turned into proteins. Ribo-seq, where you gum up translation in the cell and look for RNA sequences that are in the act of feeding into the ribosome, is probably stronger evidence, but you can’t count on seeing a particular sequence when you do that, either. Combining such data with LC/MS validation that the proteins really exist, along with looking across species to see if similar things are found around other genomes, gives you more confidence.
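To make the cutoff effect concrete, here is a minimal sketch of a naive ORF scan that tallies candidates above and below a length threshold. It is not the method used in the article or by any annotation pipeline: the function names (`find_orfs`, `tally_by_cutoff`) are made up for illustration, and it assumes a plain ATG start, the three standard stop codons, all six reading frames, and only the first ATG after each stop in a given frame.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 1) -> Iterator[Tuple[str, int, int]]:
    """Yield (strand, frame, n_codons) for each ATG...stop stretch.

    n_codons includes the start codon and excludes the stop codon.
    Hypothetical simplification: only the first ATG after the previous
    stop in a frame is used, so nested starts are not reported separately.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        yield strand, frame, n_codons
                    start = None


def tally_by_cutoff(seq: str, cutoff: int = 100) -> Tuple[int, int]:
    """Count ORFs shorter than `cutoff` codons and those at or above it."""
    short = long_ = 0
    for _, _, n_codons in find_orfs(seq):
        if n_codons < cutoff:
            short += 1
        else:
            long_ += 1
    return short, long_
```

Run over a real genome, the "short" count from something like this swamps the "long" one, which is exactly why the 100-residue cutoff was adopted, and why a hit from such a scan is only a candidate until Ribo-seq, mass spec, or conservation data back it up.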

The article goes into a number of examples over the last few years where such proteins have been found as regulatory species, inhibitors of other protein activity, venom components, and more. Their size can make them peculiarly suitable for such functions – just large enough to bind to some structural cleft or surface in larger proteins. Below about 50 amino acids or so you start to lose the ability to form more complex protein structures, so many of these will probably fall into the “disordered protein” category, which is already pretty large and important.

For a while, it looked like these small proteins were mostly a prokaryotic thing, but by now it seems clear that they’re all over the place and that we just haven’t been paying proper attention to them. You’d figure that there must be miniproteins that are important for binding to RNA species, with implications for cotranslational expression mechanisms, noncoding RNA function, etc. And it also wouldn’t surprise me if miniproteins turn out to be involved in intracellular condensate formation and behavior, either, both by such RNA-binding mechanisms and others. (In fact, I think we’re already seeing that, since the P-bodies discussed here as affected by the “NoBody” microprotein are now recognized to be such condensates). These seem, in fact, pretty likely to blend into the whole RNA world for these reasons (and in fact, a number of miniproteins were at first identified as long noncoding RNAs until they turned out to be not so noncoding).

The whole story illustrates how we don’t find what we’re not looking for. (The recent glyco-RNA discovery is another example of this). And that means that we have to constantly check our assumptions, particularly the assumption that we know what’s important enough to look for!

15 comments on “Tiny Proteins”

  1. Barry says:

    how many “genes” you find in a genome has to depend on how you define “gene”. You’d need a very good reason to exclude anything that gets translated into peptide from the list. And I for one wouldn’t count a simplification of the task as a “very good reason”

    1. Athaic says:

      But that’s the issue. Until you find this translated peptide, you cannot pick a length of DNA or even of RNA and say “oh, that’s an active, functional gene, all right”.
      The issue is not about looking at some length of DNA with telltale signs of being capable of being expressed into a protein. It’s about confirming that it is indeed churning out a functional protein/peptide. Even in cases when the usual telltale signs are missing or somewhat skewed.

      (OK, an RNA which happens to have a function on its own – tRNA, siRNA, whatever – would count, too, as proof this DNA region has a biological role, so you can call this region a gene if your definition of a gene is not just “is the blueprint of a protein”)

      And this is assuming this peptide is specific, sequence-wise, to this length of DNA and cannot also come from another region of the genome. If its sequence could be matched to multiple gene-like regions, then even finding this peptide will not be enough to confirm that it has been translated from a specific DNA region.

    2. Bryan says:

      I’d argue that being translated is not sufficient to call something a gene. Ultimately, a gene has to have some function, so determining the function is key to determining whether these mini ORFs should be considered genes. Does knockout or overexpression of the peptides produce specific effects on the cell (seems like the prime target for some sort of CRISPR screen)? Are these peptides evolutionarily conserved?

      The issue reminds me of the debate over functional RNAs after the ENCODE consortium found that something like 80% of the genome is transcribed (versus only ~ 10-20% that is evolutionarily conserved). Of course, we understand the language of protein-coding genes fairly well, and our methods of detecting evolutionary conservation are good at detecting conservation between protein-coding regions of genomes. It is possible that some of these potential junk RNAs and junk proteins are evolutionarily conserved, but because we don’t understand the sequence-function relationships well (e.g. the RNAs could be conserved at a structural level but not at a sequence level), we are unable to detect signs that they are conserved. Still, determining the functions of these mini-proteins will be an important area of research going forward.

  2. Jake says:

    This got me wondering about the ENCODE project and all the heat they got for their claims.

  3. Christophe Verlinde says:

    Researchers have known for at least 40 years that proteins with under 100 amino acids exist.
    For example, in 1982 conotoxin from Conus magus, with 68 aa, was sequenced.

    1. Klagenfurt says:

      Um… Let’s add a few years. Insulin is composed of 51 amino acids – am I missing something?

  4. gippgig says:

    Insulin is synthesized as a prepropeptide that is over 100 amino acids long.
    I’ve been wondering if some of these might be “molecular bolts” that hold a bilayer membrane together.

  5. anon says:

    What about conotoxins? ~ 10 – 30 a.a. in length, disulfide bonding, known and unknown targets.

  6. If there are fewer than 10 amino acids the molecule is called a peptide; if 10 to 100, a polypeptide; and more than 100, a protein. There are twenty different amino acids, and the body can synthesize eleven of them. The other nine must be obtained in the diet, and are known as the essential amino acids.

  7. chiz says:

    Nature has a piece on de novo genes that is related to this. For conjectured micropeptides that aren’t de novo you can presumably look at synonymous vs non-synonymous mutation ratios to get some clue if it’s really a functional gene.

    As for the large number of short ORFs I sometimes wonder if it might be worth looking at bounded ORFs – ORFs that start with a partial or full Kozak sequence.

    1. chiz says:

      Although that Nature piece has an infographic that appears to have been written by someone who doesn’t know the difference between a codon and a translation initiation sequence.

  8. Anon says:

    We have always known that big and small co-exist (domestic cat vs lion), (chihuahua vs St. Bernard), (rat vs rabbit), and that includes small and big proteins! Not news for people working in the area of small, natural, or artificial cell-penetrating peptides or proteins! It is just that the affluent few who are well funded give us new names like mini proteins or peptides, mini bodies etc. Call it marketing!

  9. gippgig says:

    It is interesting to note that the first genome sequenced, bacteriophage MS2, encodes a small (75 amino acids) lysis protein that was initially missed.

  10. Fernando says:

    These small polypeptides or peptides have a really short half life due to quick metabolism and excretion (insulin around 4–6 minutes).
    So the function of these (with the notable exception of insulin), if any, should be of intracellular function at the site of production or paracrine signaling (surrounding cells).
    Or, if enough concentration, as a venom.

  11. steve says:

    Do we really need a class called “miniproteins”? Isn’t it enough to call them peptides (or polypeptides if needed)? And aren’t “single digits” amino acids? So the genome codes for peptides. Interesting. There are ways to isolate these things, make antibodies against them, etc. and localize them within the cell. It’s not like they’re some new form of matter for which we have no tools. Oh the power of advertising.
