I’ll bet that this is the only hit for a Google search for that word! I typed it out as I was thinking about how some major classes of biomolecules – protein, carbohydrate, lipid, and nucleic acid – are perceived. If you look at the number of papers published, and the number of details worked out in each field, you’d think that proteins are the single most important constituent of a living cell, followed by DNA / RNA. Are they?
I think this is partly an artifact of how easy things are to work with. There are only five purine / pyrimidine bases used in all DNA and RNA, and only two sugars (ribose or deoxyribose.) That level of simplicity is what’s allowed sequencing techniques to become so automated so quickly. I’m not saying that there isn’t plenty of complexity in the area – you get all sorts of hard-to-sequence hairpins and the like – but having only a few building blocks has helped enormously.
Proteins are the next step up. There are twenty-odd amino acids that you have to worry about, which gets pretty combinatorially complicated. (If you wanted to make, say, 100 milligrams for your compound files of every possible 20-amino-acid sequence there is, you’d run into a severe problem having to do with the amount of available carbon on earth.) Direct protein sequencing can be done, but it’s nowhere near as easy as it is for DNA. Proteins have the advantage of being much easier to handle than nucleic acids, though, and many of them are robust enough to stand all kinds of mistreatment. That helped biochemists get a good start on enzymes before any other aspect of molecular biology got on its feet at all.
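For the curious, here's a back-of-the-envelope sketch of that arithmetic. It rests on assumptions that I'm supplying just for illustration: exactly 20 standard amino acids, chains exactly 20 residues long, 100 milligrams of each, and the mass of the whole planet (roughly 5.97 × 10^24 kg) as the yardstick rather than any particular estimate of its carbon. The function and constant names are made up for this example.

```python
# Rough estimate only. Assumptions (not from the post itself):
# 20 standard amino acids, chains exactly 20 residues long,
# 100 mg of every possible sequence.
N_AMINO_ACIDS = 20
CHAIN_LENGTH = 20
SAMPLE_MASS_G = 0.1        # 100 milligrams per sequence
EARTH_MASS_KG = 5.97e24    # approximate mass of the entire planet


def count_sequences(alphabet_size: int, length: int) -> int:
    """Number of distinct linear sequences of the given length."""
    return alphabet_size ** length


def total_mass_kg(n_sequences: int, sample_mass_g: float) -> float:
    """Total mass, in kilograms, of one sample of every sequence."""
    return n_sequences * sample_mass_g / 1000.0


n_sequences = count_sequences(N_AMINO_ACIDS, CHAIN_LENGTH)
mass_kg = total_mass_kg(n_sequences, SAMPLE_MASS_G)

print(f"{n_sequences:.3e} possible sequences")
print(f"{mass_kg:.3e} kg of material in total")
print(f"{mass_kg / EARTH_MASS_KG:.2%} of the mass of the Earth")
```

That works out to about 10^26 sequences and on the order of 10^22 kilograms of peptide. Since carbon is only a small part of the planet's total mass, the compound files would be in trouble long before they were finished.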
So, how about lipids? Here’s where things start to get ugly. There are a *lot* more than 20 or 30 kinds of lipid molecules in a living system – all sorts of chain lengths, unsaturations, cis/trans isomers, mono-, di-, and triglycerides and so on. (I won’t even get into phosphorylation, since that’s a big variable in the protein world, too.) And what about steroids, prostaglandins, and all the other lipid-derived stuff? All of these things are a real pain to work with, too, since they’re often found transiently or at very low concentrations, and their solubility is almost always awful by definition. It takes some really good techniques to separate the various lipid constituents out of the greasy mess.
And carbohydrates? I worked a lot with smaller ones in my graduate school days, and people still look at me funny for it. Sugars are as bad as they come for complexity – there are plenty of them, and they can be connected any number of ways to make macromolecules. By contrast, proteins are basically linear front-to-back chains (curled up, twisted, fractal-dimension space-filling chains, but chains nonetheless). Complex carbohydrates branch out all over the place, and they’ll really make your life miserable. Despite years of work, there’s not a general way (yet) to automatically sequence one, although the situation is getting better. But if we had to depend on carbohydrate sequencing to read the genetic code, we’d be up the creek for sure. Their physical properties can be quite squirrelly, too, making them very little fun to purify.
So there are at least two important classes of biomolecules that probably don’t get their due, because they’re a lot more hostile to work with. And that should tell you how well we can handle the mixes between them – glycosylated proteins, nucleic acid–protein complexes, lipid conjugates. Pretty poorly, is how. It’s a mess out there.