Here’s a rather startling paper from a large team led by Craig Venter (Venter Institute, UCSD, Synthetic Genomics, and NIST). You may recall that a few years ago Venter and co-workers reported an engineered Mycoplasma (M. mycoides JCVI-syn1.0) whose genome they had synthesized and transplanted into another cell (original post edited, because we’re still in prokaryotes here!), which then showed viability. (This genome was a slightly altered version of the wild-type sequence, but it was completely synthetic, which made some people uncomfortable, because vitalism dies hard.) They proposed this as a platform to understand what a minimal genome for a living cell might look like.
They meant it. In the years since, they’ve been mutating and cycling this organism, whittling it down until now they report the “syn3.0” version, which is down to 531 kb (473 genes). Now that is a stripped-down cell, for sure, and getting to it was not straightforward:
The minimal cell concept appears simple at first glance but becomes more complex upon close inspection. In addition to essential and nonessential genes, there are many quasi-essential genes, which are not absolutely critical for viability but are nevertheless required for robust growth. Consequently, during the process of genome minimization, there is a trade-off between genome size and growth rate. JCVI-syn3.0 is a working approximation of a minimal cellular genome, a compromise between small genome size and a workable growth rate for an experimental organism. It retains almost all the genes that are involved in the synthesis and processing of macromolecules.
That’s smaller than any autonomously replicating organism found in the wild (for comparison, it’s about one-tenth the genome of a common bacterium). You’d think that by now we’d have a pretty solid idea of the genes/proteins that are necessary for life, but apparently you (and I) would be wrong. The initial attempts at a minimal genome were actually done de novo, the team being under that same impression. But “An initial design, based on collective knowledge of molecular biology combined with limited transposon mutagenesis data, failed to produce a viable cell.” What’s rather alarming is the gene content of this current minimal cell: of those 473 genes, 149 are of unknown function.
. . .biological functions could not be assigned for the ~31% of the genes that were placed in the generic and unknown classes. Nevertheless, potential homologs for a number of these were found in diverse organisms. Many of these genes probably encode universal proteins whose functions are yet to be characterized. Each of the five sectors has homologs in species ranging from mycoplasma to humans. . .
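A quick check of that fraction, using only the counts quoted above (the per-gene figure is a crude average of my own that ignores noncoding sequence, not a number from the paper):

```python
# Counts as quoted in this post for the syn3.0 genome.
total_genes = 473
unknown_function = 149
genome_size_kb = 531

# Share of genes with no assigned biological function.
fraction_unknown = unknown_function / total_genes
print(f"{fraction_unknown:.1%}")  # 31.5%

# Crude average: total genome length divided by gene count.
# This lumps in any intergenic sequence, so treat it as an upper bound
# on mean gene length rather than a measured value.
kb_per_gene = genome_size_kb / total_genes
print(f"{kb_per_gene:.2f} kb per gene")
```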
Think about that for a minute. One third of this stripped-down minimalist genome is still made up of genes that we don’t understand. There are some kinases, hydrolases, etc., whose substrates, products, and functions are unknown, but there are others that are just big question marks. Clearly we have some work to do if there are so many fundamental processes that aren’t even annotated. Even the annotated ones can be mysterious, though. To pick one example, the organism still has six different efflux pathways, and the paper notes that “It is somewhat disconcerting to imagine that all of these exclude or remove toxic substances.” And some of the genes even in this organism may still be functionally redundant, because the team noted a number of “synthetic lethal pairs”: genes that each look dispensable when deleted alone, but whose combined deletion is fatal. The biggest design challenge for this organism, the paper says, was working through these and figuring out what had gone wrong in each case. They’re still not sure how many redundant genes are present.
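To make the synthetic-lethal idea concrete, here is a toy sketch. Everything in it is hypothetical: the gene names, the function names, and the assumption that viability reduces to "every essential function still has at least one gene covering it." The real team found these pairs experimentally, not with a lookup table like this one.

```python
from itertools import combinations

# Hypothetical toy model: each essential function lists the genes able to cover it.
ESSENTIAL_FUNCTIONS = {
    "function_A": {"gene1", "gene2"},  # two redundant genes
    "function_B": {"gene3"},           # a single, strictly essential gene
    "function_C": {"gene4", "gene5"},  # another redundant pair
}
# gene6 covers no essential function, so it is truly dispensable.
ALL_GENES = {"gene1", "gene2", "gene3", "gene4", "gene5", "gene6"}


def is_viable(deleted):
    """Viable if every essential function keeps at least one remaining gene."""
    remaining = ALL_GENES - set(deleted)
    return all(providers & remaining for providers in ESSENTIAL_FUNCTIONS.values())


def synthetic_lethal_pairs():
    """Pairs of genes that are each deletable alone but lethal when deleted together."""
    individually_dispensable = sorted(g for g in ALL_GENES if is_viable({g}))
    return [
        (a, b)
        for a, b in combinations(individually_dispensable, 2)
        if not is_viable({a, b})
    ]


print(synthetic_lethal_pairs())  # [('gene1', 'gene2'), ('gene4', 'gene5')]
```

The trap for a genome designer is visible even here: single-gene knockout data alone would label gene1, gene2, gene4, and gene5 all as "nonessential," and a design that removed all four at once would be dead on arrival.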
Interestingly, they also tried “defragmenting” a large stretch of this genome, placing similar functions next to each other in an orderly arrangement. This “rationalized” organism was viable, and grew at basically the same rate as the other one, showing that at least at this level, a good deal of genetic rearrangement can take place without having that much effect. (This makes me wonder, a little bit, about the “huge-stretches-of-noncoding-DNA-are-vital-because-they’re-scaffolding” argument. Edit: although note that this argument is mostly being applied to eukaryotes, which have a lot more DNA; bacterial genomes are already pretty compact by comparison.)
Clearly, synthetic genomics is just beginning to take off. This sort of work is going to make us rethink a lot of what we believe that we know about living cells, and will surely set off arguments in several directions at once. It’s going to be fun.