Well, biology is marching on, even outside the virology that’s on all of our minds. Have a look at this paper, which examines the very small proteins I last wrote about here (and here’s a commentary on the new work as well). What we’re seeing is yet more strong evidence that such species are numerous, important, and (until recently) missed by many of our molecular biology techniques. We’ve had to rework our thinking about cell biology over the years as the variety of RNA subtypes became clear, and it looks like we’re going to have to do that again around these tiny proteins.
The rules we’ve largely been following for deciding which proteins in the cell are worth studying include length (because sequences of fewer than 100 amino acids were thought less likely to fold into anything useful), the presence of a canonical open reading frame (ORF) with a standard AUG start codon, and demonstrated evolutionary conservation and homology to other proteins. Hah! We appear to be in the process of tossing those out the window. This new work shows (in line with other studies over the last few years) that many so-called “noncoding” RNA sequences are being translated into proteins, and that these proteins are in fact functional. In many cases this happens through noncanonical ORFs, of several varieties. Careful large-scale loss-of-function screens turn these things up, and working backwards shows that we’ve been missing many thousands of functional proteins, along with many active RNA regions that appear to matter for RNA function more generally, for example by modulating the expression of proteins we had already annotated by conventional means.
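To make those old rules concrete, here is a minimal, hypothetical sketch of a "classic" ORF filter: it only accepts ORFs that begin with ATG (AUG in the mRNA), end at an in-frame stop codon, and encode at least 100 amino acids. The function name, the forward-strand-only scan, and the default cutoff are illustrative assumptions, not the method used in the paper, which relied on loss-of-function screens rather than a filter like this.

```python
# Hypothetical sketch of conventional ORF calling: ATG start, in-frame stop,
# minimum protein length. Forward strand only; not the paper's method.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_canonical_orfs(seq: str, min_aa: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) nucleotide spans of ATG-initiated ORFs whose
    encoded protein is at least min_aa residues long."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk forward codon by codon until an in-frame stop.
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop codon before the end in this frame
                aa_len = (j - i) // 3  # residues encoded, excluding the stop
                if aa_len >= min_aa:
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after the stop codon
                continue
            i += 3
    return orfs


# A 121-residue ATG ORF passes; a 30-residue one is discarded, and an ORF
# starting at a non-AUG codon (CTG here) is never seen at all.
print(find_canonical_orfs("ATG" + "GCT" * 120 + "TAA"))
print(find_canonical_orfs("ATG" + "GCT" * 29 + "TAA"))
print(find_canonical_orfs("CTG" + "GCT" * 149 + "TAA"))
```

The second and third cases are the kinds of sequences that filters like this throw away: a short ORF below the length cutoff, and one that starts from something other than AUG.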
There are some patterns, such as these new proteins colocalizing with other proteins produced from the same larger mRNAs, but we’re going to be figuring these things out for quite some time to come. The loss-of-function screens show these small species having noticeable effects on cellular growth and on overall gene expression profiles, and you’d have to think that we’ll start making connections to various disease states as we understand more.
I can tell you, I’m having to change my own thinking. Back when the ENCODE consortium came out with their (rather high at the time!) estimates of how much of the genome is actually transcribed, I was more in the skeptics’ camp. But time seems to have been proving those estimates more correct than not. One of the particularly puzzling things (as noted in this new paper) is that we’re seeing proteins that have real functions but do not seem to be evolutionarily conserved. (That was one of the objections back then: if a sequence can tolerate all those mutations, then the so-called protein must not be real.) Well, here we are. I wonder if some of this can be explained by the weirdness of disordered proteins? Those sometimes don’t seem to care much about sequence either, so long as their overall properties are maintained.
As the authors of this new paper say, their work reveals “a previously unappreciated complexity of the functional mammalian proteome”, and how. We are all going to have to get used to thinking differently about what functional proteins look like, how they’re produced, what they can do, how protein regulation relates to RNA regulation, and more besides. Want an example of how much we don’t know about cell biology? You’ve got another one right here. I suppose the folks talking about modeling the inner workings of the cell in silico will have to tweak their code just a tiny bit. . .