January 15, 2009

Sizing Genomes

Irene Chen is a Bauer research fellow at Harvard interested in the origins of life and in particular, in the emergence of complexity. In this entry, she explores why there are limits to the size of a genome, using an example from real life to get across the point that the longer the genome, the harder it is to pass information faithfully from one generation to the next:

telephone game image

Have you ever played the game Telephone? In this game, several people sit in a circle, and the first person whispers an arbitrary message to his or her neighbor, as quietly as possible. The message is passed by successive whispers around the circle, and the last person attempts to repeat the message out loud. Typically, the message has become completely garbled by the time it reaches the last player. Why is this? The problem (or the game) is that whispering is a rather inaccurate mode of information transfer, so mistakes accumulate around the circle until the message is lost. In fact, errors can plague transfer of information in any form, including biological genomes.

The information for making a living organism is encoded in the genome as a series of chemical letters (adenine, guanine, cytosine, and thymine). Longer genomes have the potential to encode more information and thus more complex organisms. In 1971, Nobel laureate Manfred Eigen realized that the probability of making a mistake limits the length of the genome. If errors are relatively rare, most daughter genomes are faithful copies of the original. However, if errors are too common, daughter genomes usually contain mistakes, and their daughters would contain even more mistakes, and so on. In that case, the information of the genome would degrade over time.

Eigen proved that the maximum length of the genome is inversely proportional to the rate of mistakes per letter, simply because longer genomes contain more opportunities to make mistakes (Eigen, M., Naturwissenschaften 58, 465). If the genome is longer than this critical threshold, then an "error catastrophe" ensues: Error-ridden replication essentially randomizes the genome after several generations. This principle also suggests that mechanisms to reduce mistakes can set the stage for the evolution of more complex organisms. Indeed, modern organisms generally have multiple proofreading mechanisms, such that mistakes are usually very rare. (Some viruses, like HIV, are notable exceptions and appear to exist near the edge of the error catastrophe.)
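To make the inverse relationship concrete, here is a rough sketch in Python. It assumes the textbook approximation of the error threshold, L_max ≈ ln(s) / u, where u is the per-letter error rate and s is how much faster the correct sequence replicates than its mutants. The symbol s, its default value of 2, and the function names are illustrative assumptions and do not come from the post.

```python
import math

def error_free_copy_probability(length, error_rate):
    """Chance that all `length` letters are copied correctly,
    assuming each letter is miscopied independently at `error_rate`."""
    return (1.0 - error_rate) ** length

def max_genome_length(error_rate, superiority=2.0):
    """Approximate error threshold: L_max ~ ln(superiority) / error_rate.
    `superiority` is an assumed replication advantage of the correct sequence."""
    return math.log(superiority) / error_rate

# Each tenfold drop in the error rate allows a genome about ten times longer.
for u in (1e-2, 1e-4, 1e-6):
    print(f"error rate {u:g}: max length about {max_genome_length(u):,.0f} letters")
```

The exact numbers depend on the assumed value of s, but the scaling does not: the maximum length always moves in inverse proportion to the error rate.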

Although the concept of the error threshold was originally developed to understand the limits of biological information, we can see surprising illustrations of this idea in everyday life. Take a look at the name of this lake in central Massachusetts:

sign with lake name

Photo: Irene Chen

According to locals, it means "You fish on your side, I fish on my side, and nobody fishes in the middle" (see the Wikipedia article for more information). Now, check out this sign from the Massachusetts Turnpike Authority:

road sign

Photo: Bree Bailey

Looks the same, right? But look again:

lake name spelled out

sign spelling mistakes noted

There are two errors in this sign! Notice that the short, four-letter word "Lake" was copied correctly, but that the long, 45-letter word "Chargoggagoggmanchauggagoggchaubunagungamaugg" was copied with two errors, for an error rate of approximately 4%. If the MTA were to use its sign as a template for more signs, you can imagine how the information in the lake name would quickly disappear. Fortunately for humans, whose genome is more than 3 billion letters long, our DNA replication is substantially more accurate!
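The arithmetic behind the sign example can be sketched in a few lines of Python. This treats the two observed mistakes as an independent per-letter error rate, which is a simplifying assumption (a single sign is a very small sample), and the function name is made up for illustration.

```python
LAKE_NAME = "Chargoggagoggmanchauggagoggchaubunagungamaugg"
ERRORS_ON_SIGN = 2  # mistakes counted on the turnpike sign

# Per-letter error rate estimated from this one sign (an assumption).
per_letter_error = ERRORS_ON_SIGN / len(LAKE_NAME)

def chance_copied_exactly(word, error_rate):
    """Chance that every letter of `word` survives one round of copying."""
    return (1.0 - error_rate) ** len(word)

print(f"letters in lake name: {len(LAKE_NAME)}")
print(f"estimated error rate: {per_letter_error:.1%}")
print(f"chance 'Lake' is copied exactly: {chance_copied_exactly('Lake', per_letter_error):.0%}")
print(f"chance the lake name is copied exactly: {chance_copied_exactly(LAKE_NAME, per_letter_error):.0%}")
```

At the same per-letter error rate, the short word usually survives a round of copying while the long one usually does not, which is the error threshold in miniature.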

—Irene Chen


Chen said: "This principle also suggests that mechanisms to reduce mistakes can set the stage for the evolution of more complex organisms."

This is indeed a self-evident principle, but it is entirely missing from the existing theories of macroevolution. For a new hypothesis of macroevolution that takes this principle into account, see Huang, S., submitted, "Inverse relationship between genetic diversity and epigenetic complexity."

The existing theories of macroevolution (neo-Darwinism plus the neutral theory) are incomplete because they fail to take into account the intuitively obvious axiom of an inverse relationship between genetic diversity/mutation and epigenetic/organismal complexity. Macroevolution from simple to complex organisms involves a suppression of mutations. It is not at all merely an accumulation of mutations, as is assumed by the existing theories.

Until we have a theory that can explain all the relevant facts, we don't have a complete theory. Unlike the existing theories, the new MGD hypothesis is a self-evident axiom, explains all relevant facts of evolution, and has yet to meet a factual contradiction.

Why do you assume salamanders or amoeba are less complex than a human? Also, I think we're starting to find that the "junk" DNA is less and less junk and more and more non-protein coding regulatory regions.

I was under the impression that the relationship between genome size and complexity was extremely weak to nonexistent. Salamanders have genome sizes at least an order of magnitude larger than humans, while the noble amoeba's genome is simply gigantic. Is there any empirical evidence to suggest that error catastrophe acts as a check on genome size? It seems like genome size is determined more by phylogenetic inertia and chance. Replication fidelity is only necessary for coding or regulatory regions of the genome, no? Can't all the "junk DNA" be ignored when working out how error catastrophe limits genome size?
