Skip to main content

Sizing genomes

An illustration of people playing the game Telephone

Have you ever played the game Telephone? In this game, several people sit in a circle, and the first person whispers an arbitrary message to his or her neighbor, as quietly as possible. The message is passed by successive whispers around the circle, and the last person attempts to repeat the message out loud. Typically, the message has become completely garbled by the time it reaches the last player. Why is this? The problem (or the game) is that whispering is a rather inaccurate mode of information transfer, so mistakes accumulate around the circle until the message is lost. In fact, errors can plague transfer of information in any form, including biological genomes.

The information for making a living organism is encoded in the genome as a series of chemical letters (adenine, guanine, cytosine, and thymine). Longer genomes have the potential to encode more information and thus more complex organisms. In 1971, Nobel laureate Manfred Eigen realized that the probability of making a mistake limits the length of the genome. If errors are relatively rare, most daughter genomes are faithful copies of the original. However, if errors are too common, daughter genomes usually contain mistakes, and their daughters would contain even more mistakes, and so on. In that case, the information of the genome would degrade over time.

Eigen proved that the maximum length of the genome is inversely proportional to the rate of mistakes per letter, simply because longer genomes contain more opportunities to make mistakes (Eigen, M., Naturwissenschaften 58, 465). If the genome is longer than this critical threshold, then an “error catastrophe” ensues: Error-ridden replication essentially randomizes the genome after several generations. This principle also suggests that mechanisms to reduce mistakes can set the stage for the evolution of more complex organisms. Indeed, modern organisms generally have multiple proofreading mechanisms, such that mistakes are usually very rare. (Some viruses, like HIV, are notable exceptions and appear to exist near the edge of the error catastrophe.)

Although the concept of the error threshold was originally developed to understand the limits of biological information, we can see surprising illustrations of this idea in everyday life. Take a look at the name of this lake in central Massachusetts:

Lake Chaubunagungamaug signIrene Chen

According to locals, it means “You fish on your side, I fish on my side, and nobody fishes in the middle” (see the Wikipedia article for more information). Now, check out this sign from the Massachusetts Turnpike Authority:

Looks the same, right? But look again:

Lake Chaubunagungamaug sign

There are two errors in this sign! Notice that the short, four-letter word “Lake” was copied correctly, but that the long, 45-letter word “Chargoggagoggmanchauggagoggchaubunagungamaugg” was copied with two errors, for an error rate of approximately 4%. If the MTA were to use its sign as a template for more signs, you can imagine how the information in the lake name would quickly disappear. Fortunately for humans, whose genome is more than 3 billion letters long, our DNA replication is substantially more accurate!

About the author

Irene Chen is a Bauer research fellow at Harvard interested in the origins of life and in particular, in the emergence of complexity. In this entry, she explores why there are limits to the size of a genome, using an example from real life to get across the point that the longer the genome, the harder it is to pass information faithfully from one generation to the next: