The Smallest Viable Genome Is Very Weird

Here’s a rather startling paper from a large team led by Craig Venter (Venter Institute, UCSD, Synthetic Genomics, and NIST). You may recall a few years ago when Venter and co-workers reported an engineered Mycoplasma genome (M. mycoides JCVI-syn1.0) that they had synthesized and transplanted into another <s>cell’s nucleus</s> cell (edited, because we’re still in prokaryotes here!), which then showed viability. (This genome is a slightly altered version of the wild-type sequence, but it was completely synthetic, which made some people uncomfortable, because vitalism dies hard.) They proposed this as a platform to understand what a minimal genome for a living cell might look like.

They meant it. In the years since, they’ve been mutating and cycling this organism, whittling it down until now they report the “syn3.0” version, which is down to 531kb (473 genes). Now that is a stripped-down cell, for sure, and getting to it was not straightforward:

The minimal cell concept appears simple at first glance but becomes more complex upon close inspection. In addition to essential and nonessential genes, there are many quasi-essential genes, which are not absolutely critical for viability but are nevertheless required for robust growth. Consequently, during the process of genome minimization, there is a trade-off between genome size and growth rate. JCVI-syn3.0 is a working approximation of a minimal cellular genome, a compromise between small genome size and a workable growth rate for an experimental organism. It retains almost all the genes that are involved in the synthesis and processing of macromolecules.

That’s smaller than any autonomously replicating organism found in the wild (for comparison, it’s about one-tenth the genome of a common bacterium). You’d think that by now we’d have a pretty solid idea of the genes/proteins that are necessary for life, but apparently you (and I) would be wrong. The initial attempts at a minimal genome were actually done de novo, the team being under that same impression. But an initial design, based on collective knowledge of molecular biology combined with limited transposon mutagenesis data, failed to produce a viable cell. What’s rather alarming is the sequence of this current minimal cell: of those 473 genes, 149 of them are of unknown function.

. . .biological functions could not be assigned for the ~31% of the genes that were placed in the generic and unknown classes. Nevertheless, potential homologs for a number of these were found in diverse organisms. Many of these genes probably encode universal proteins whose functions are yet to be characterized. Each of the five sectors has homologs in species ranging from mycoplasma to humans. . .

Think about that for a minute. One third of this stripped-down minimalist genome is still made of genes that we don’t understand. There are some kinases and hydrolases, etc., whose substrates, products, and functions are unknown, but there are some that are just big question marks. Clearly we have some work to do if there are so many fundamental processes that aren’t even annotated. Even the annotated ones can be mysterious, though – to pick one example, the organism still has six different efflux pathways, and the paper notes that “It is somewhat disconcerting to imagine that all of these exclude or remove toxic substances.” And some of the genes even in this organism may still be redundant, because they’ve noted a number of “synthetic lethal pairs” – pairs of genes where either one looks dispensable when it’s knocked out alone, but removing both is fatal. The biggest design challenges for this organism, the paper says, were working through these and figuring out what had gone wrong in each case. They’re still not sure how many redundant genes are present.
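To make the “synthetic lethal pair” idea concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the gene names, the function names, and the rule that a cell is viable as long as each essential function still has at least one gene covering it. None of this comes from the paper; it only shows why single-gene knockout data can mislabel a redundant pair as dispensable.

```python
from itertools import combinations

# Hypothetical toy model (not from the paper): each essential function
# is covered by one or more genes. All names here are invented.
FUNCTIONS = {
    "transport": {"geneA", "geneB"},  # two redundant genes share this job
    "replication": {"geneC"},         # only one gene does this job
}
GENOME = {"geneA", "geneB", "geneC", "geneD"}  # geneD covers nothing essential


def viable(genome):
    """Viable if every essential function still has at least one gene left."""
    return all(providers & genome for providers in FUNCTIONS.values())


def single_knockout_dispensable(genome):
    """Genes that can be removed one at a time without killing the cell."""
    return {gene for gene in genome if viable(genome - {gene})}


def synthetic_lethal_pairs(genome):
    """Pairs that are individually dispensable but lethal when both are removed."""
    dispensable = single_knockout_dispensable(genome)
    return {
        frozenset(pair)
        for pair in combinations(sorted(dispensable), 2)
        if not viable(genome - set(pair))
    }


if __name__ == "__main__":
    print(sorted(single_knockout_dispensable(GENOME)))
    print([sorted(pair) for pair in synthetic_lethal_pairs(GENOME)])
```

In this toy genome, single knockouts flag geneA, geneB, and geneD as dispensable, but only geneD can really go: deleting geneA and geneB together wipes out the transport function. A real minimization effort has to find such pairs by experiment, since nobody hands you the table of which gene covers which function.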

Interestingly, they also tried “defragmenting” a large stretch of this genome, placing similar functions next to each other in an orderly arrangement. This “rationalized” organism was viable, and grew at basically the same rate as the other one, showing that at least at this level, a good deal of genetic rearrangement can take place without having that much effect. (This makes me wonder, a little bit, about the “huge-stretches-of-noncoding-DNA-are-vital-because-they’re-scaffolding” argument). (Edit: although note that this argument is mostly being applied to eukaryotes, which have a lot more DNA; bacterial genomes are already pretty compact by comparison.)

Clearly, synthetic genomics is just beginning to take off. This sort of work is going to make us rethink a lot of what we believe that we know about living cells, and will surely set off arguments in several directions at once. It’s going to be fun.

34 comments on “The Smallest Viable Genome Is Very Weird”

  1. Al

    I recall serving on a DoE think tank on “what next after genomics” and many of us prioritized the “genes of unknown function”, of which there were thousands in each genome (including human). Almost two decades later…..

    The truth is that it’s tough to convince a science peer-review panel to fund a study of a gene/protein when no one has a clue what it does. “Don’t work on that unknown protein. Work on something important!”

    And it’s also risky for the investigator because it takes time to develop the research tools and to figure it out… meanwhile you’ve lost your funding.

    So on the one hand, decades after the issue came to light, there still remain thousands of microbial and human genes/proteins with no known function.

    And on the other, “our poor understanding of biology” is the main issue in our not being able to predict the right targets for drug discovery.

    There’s a massive disconnect in the system.

    1. Kyle Serikawa in reply to Al

      I like this point. It comes back to an argument/question I’ve debated with people in the past with respect to novelty in drug development candidates. As I see it, there are a couple of (not necessarily mutually exclusive) ways to think about why we work on the genes we do and why we know what we know about them.

      The first is a biological importance argument in which one might say that the genes we’ve discovered and know so much about were initially identified because they were among the most important genes for the most important functions in the cell/organism, and therefore the fact that some genes have thousands of pubs while others are basically unknown is a simple consequence of value. So, under this way of thinking, we have been directed to that which is most important in the cell.

      The second is a more random argument which says in the early days of molecular biology a combination of techniques did identify a number of interesting and important genes but that set us on a kind of canalized path in which the only way to continue working on any gene would be to provide evidence for why it’s important, which would produce a kind of directed, and much more limited scope of inquiry. So, well-characterized genes are well-characterized because they’re the ones that due to a founder effect were able to attract more funding and interest.

      I realize there are all kinds of random screening and identification methods that should counteract that second argument (mutational screens, transcriptome profiling, GWAS, etc.) but at the same time my experience has been that when such techniques are used, researchers gravitate toward the genes about which most is known anyway.

      There’s also the point alluded to elsewhere in these comments that unknown genes may be unknown because we haven’t yet figured out the conditions under which to study them.

      This is one of the valuable things that come out of work like this: questioning our fundamental assumptions about life and how it works. Or how we think it works, anyway.

    2. Oliver H in reply to Al

      There’s a whole host of barriers, and grants are only one of them. “Blue ocean research” is all good and fine when you already have a solid standing and don’t have to run after every nickel and dime and have enough projects going on that you’re not dependent on this one yielding publishable results. But this pretty much precludes it being done by a graduate student – unless the PI is such a beacon in the field that their connections and their name on the CV will overcome the risk of a lack of publications three, four years down the line. Projects where you can’t foresee what the outcome will be, but can be confident that there will be one, are far more attractive.

  2. Jose

    There are close parallels in self-evolving circuits, truly ghost-in-the-machine territory:

    http://www.damninteresting.com/on-the-origin-of-circuits/

    “The plucky chip was utilizing only thirty-seven of its one hundred logic gates, and most of them were arranged in a curious collection of feedback loops. Five individual logic cells were functionally disconnected from the rest— with no pathways that would allow them to influence the output— yet when the researcher disabled any one of them the chip lost its ability to [function].”

    1. zero in reply to Jose

      Some of those structures exploit subtle behavior of the underlying chip. Running the same configuration on a different chip or in a different location would almost certainly fail. One has to train the algorithm in a variety of environmental conditions and on multiple different chips in order to develop a program that is robust enough to be demonstrated and shared.

      In a similar way, the minimum viable codebase for a cell is dependent on the environments in which that cell will be expected to survive. It sounds like this is part of the allure of a fully synthetic cell: the code can be tweaked and run through many different environmental conditions to see what exactly the various genes are doing under varying pressures.

  3. What are the issues that come up with very big genomes? Do they throw light on the issue from the other end? Or do they just involve “junk” repeats? The human genome used to be described as about 97% junk until it became clear that “not understood” and “junk” were not necessarily the same. A very boring plant called Paris japonica is the contested holder of the biggest-genome award, with about 150 billion base pairs, or roughly 50x the human one. It has a lot of dull foliage and tiny dull white flowers from an anthropomorphic perspective. What is all that DNA doing?

    http://www.sciencemag.org/news/2010/10/scienceshot-biggest-genome-ever

  4. Vaudaux

    A small point, re “(This makes me wonder, a little bit, about the “huge-stretches-of-noncoding-DNA-are-vital-because-they’re-scaffolding” argument)”:

    Note that Mycoplasma mycoides is a bacterium.

    Those huge stretches of non-coding DNA are in eukaryotic cells. The genomes of bacteria mostly consist of one gene after another with very little space in between.

  5. Wile E. Coyote, Genius

    “Mycoplasma (M. mycoides JCV syn1.0) that they had synthesized and exchanged into another cell’s nucleus, which then showed viability.” I haven’t read the paper or news items, but I find it really interesting that they put the mycoplasma genome into the nucleus of another cell. Bacteria don’t have nuclei. Did they put it in a eukaryotic cell?

    1. Vaudaux in reply to Wile E. Coyote, Genius

      The recipient cell was also bacterial. From a paragraph in the 2010 paper, describing the challenges of constructing the artificial cell: “We also needed to learn how to transplant these genomes into a recipient bacterial cell to establish a cell controlled only by a synthetic genome. Because M. genitalium has an extremely slow growth rate, we turned to two faster-growing mycoplasma species, M. mycoides subspecies capri (GM12) as donor, and M. capricolum subspecies capricolum (CK) as recipient.”

  6. Anon

    RE:“huge-stretches-of-noncoding-DNA-are-vital-because-they’re-scaffolding”. That argument is usually made for eukaryotes. Prokaryotes, which you are discussing here, usually have substantially more compact genomes with a lot less “junk” (i.e. stuff we don’t understand).

  7. CBPS

    Hey Derek,

    On an unrelated note, I’m curious to hear what you think of the ongoing squabble at Google’s Verily:

    http://www.statnews.com/2016/03/28/google-life-sciences-exodus/

  8. Da Vinci

    Bigger news is that Venter rather disgustingly patented the whole thing, effectively shutting down research for anyone but him. Unsurprising coming from him really.

    1. Steve in reply to Da Vinci

      Not disgusting at all. He spent millions of dollars on the research, he has every right to reap the benefits. If you want to work on it you can contact him. By the way, interesting math problem. It said ” 6 5= “. An operation symbol would be helpful!

      1. Dr. Manhattan in reply to Steve

        Yeah, I tried to post earlier on some details around Venter’s work but also got an operator-less math problem. Gave up after four tries. Too bad, as I know a fair amount about the actual Venter work.

      2. Da Vinci in reply to Steve

        Only an American would think science benefits from being monetised. Tell that to Salk.

  9. gippgig

    Off topic but may be of interest:
    TV show about a clinical trial: What Love Is – The Duke Pathfinders 50
    “Fifty women with incurable breast cancer endure an experimental medical protocol.”
    In Washington, D.C. March 31 9PM Channel 32 (WHUT). No idea if or when it airs anywhere else – check your local listings.

  10. MikeB

    Then again, it really depends on what kind of ‘life’ you’re looking at. There are many well known examples where if you delete a gene and look at cells under the microscope your cells will look fine and appear normal, but if you mutate or delete that same gene in a multicellular organism, it is now embryonically lethal or causes severe defects. The same logic probably doesn’t apply for more complex life.

  11. Derek Freyberg

    I wish I’d heard the whole discussion about this on “Science Friday” last week – just caught a few moments in the car.
    On the “synthetic lethal pairs”, the speaker (Venter(?)) drew the analogy of a 777. “There are these two big things, one under each wing. Let’s take one off. Hey, it still flies; maybe they’re not needed, let’s take the other off. Oops.”
    I foresee lots more on this topic in the years to come.

  12. Anonymous Researcher snaw

    A friend who used to work at Venter’s outfit said lots of people there called this the “Frankencell Project,” so I guess this paper should be called “Frankencell 2.0”. Cool!

    In addition to this project demonstrating that even a minimal genome has lots of unknown genes, I’m certain some of the “known” genes in that genome are not as well understood as we think. There are lots of questionable annotations in the genomic databases.

    As for the tendency of people to focus on what they know, that sure does happen. When I’m showing colleagues a list of the top hits from some Omics study or whatever, they invariably pay attention mainly to the ones about which they know something. A long standing personal fantasy is, before running such a study demand the people who will follow up on the hits firmly promise to work on the top N hits NO MATTER WHAT so that when the list comes back they won’t only work on the known hits. Never quite dared try this…

  13. Andre

    Taken from Al’s comment above:
    “Don’t work on that unknown protein. Work on something important!”

    Now here is a good example illustrating our lack of biological knowledge: the amyloid beta A4 precursor protein (APP). Despite 30 years of research and more than 12,000 publications in PubMed, the biological functions of the APP gene still remain unknown. Mutations in APP are responsible for Early-Onset Alzheimer Disease. By contrast, mice lacking the App gene live on happily and reproduce. Any suggestions? Genetic redundancy?

    1. DanielT in reply to Andre

      Andre, I thought knockouts of APP did have a phenotype in mice – reduced learning [1].

      1. http://www.ncbi.nlm.nih.gov/pubmed/10338291

      1. Andre in reply to DanielT

        As far as I understand, studies claiming behavioural differences between App-deficient mice and wild-type mice are highly controversial. There are lots of factors (e.g. genetic background, age, sex, previous conditioning, changes in animal care takers) influencing how mice respond in (artificial) behavioural tests. Have the App-deficient animals been backcrossed to be isogenic with the control mice? I had discussed the very same issue recently with a leading AD specialist in Germany. He feels that the biological function of App is completely unresolved to date. The App-deficient mice in his animal facility are apparently normal and fertile. I therefore believe that if there is a learning deficit in App-deficient mice, it does not appear to impair their reproductive activities. The males find the females without any problems and remember what needs to be done….. Maybe other AD specialists can comment on this point in greater detail.

        1. DanielT in reply to Andre

          Well, if humans are anything to go by then reduced learning does not seem to affect fertility. Actually what would be really interesting is to see if we can find a null mutant for APP in the human population, like was found for PCSK9. If this has no phenotype then this will be very interesting.

          1. Andre in reply to DanielT

            Good point! I am not aware of a report identifying individuals lacking the APP gene. They may exist, however.

          2. Lane Simonian in reply to DanielT

            The amyloid precursor protein appears to be an intermediary between g protein-coupled receptors and the activation of Akt. This may explain its possible role in learning and memory and in cell survival.

            http://www.ncbi.nlm.nih.gov/pubmed/25165877

            Certain amyloid precursor protein mutations appear to over-activate g protein-coupled receptors which leads to oxidative stress and to the inhibition of Akt via nitration. The result then is the opposite: memory loss and neuronal cell death.

            http://www.ncbi.nlm.nih.gov/pmc/articles/PMC452057/

            http://www.ncbi.nlm.nih.gov/pubmed/16410804

  14. Insilicoconsulting

    Doesn’t Mycoplasma genitalium have around the same number of genes as this syn3.0 genome? M. genitalium knockout essentiality experiments were among the earliest performed by Craig’s group at TIGR. It’s an important consideration, since bacteria with cell walls have genomes on the order of 4000+ genes, but smaller genomes are known. Even H. influenzae has around 1800.

  15. gippgig

    Then there’s searching for suppressor mutations that allow normally essential genes to be deleted…
    Another interesting question: What is the minimum number of chemical elements for Earth life? C, H, O, N, P, & S are obvious. Some sort of ions are undoubtedly needed in the cytoplasm but could ammonium & bicarbonate, for example, do the job? Could all necessary functions be achieved without metalloenzymes?

    1. tangent in reply to gippgig

      I’d be stunned if any free-living lifeform can operate without metalloenzymes. Doing the electron-pushing to have any kind of metabolism…

      Interesting question how much you could theoretically pare down the set of ions.

  16. Morten G

    Buchnera apparently have smaller genomes than Mycoplasma. There’s also Wigglesworthia which are about the size of Mycoplasma. And then obviously there’s Nanoarchaeum equitans.

    Source: http://www.genomesize.com/prokaryotes/table1/

  17. gippgig

    As I recall Buchnera isn’t autonomous; it can only survive inside insects. Nanoarchaeum may also be a nonautonomous parasite. (Anyone have more up-to-date info?)

    1. Morten G in reply to gippgig

      Nanoarchaeum is an extracellular parasite of another archaeon IIRC (like Mycoplasma to eukaryotic cells, right?). But aren’t all of these prokaryotes very dependent on recovering complex compounds from their surroundings? The question is really whether the needed compounds are available in yeast extract – I am assuming that they aren’t growing these synthetic organisms in minimal media. I think it should be possible to run these experiments on any non-intracellular prokaryote as long as you supplement the media correctly.

      I actually started by looking up the genome sizes of Chlamydia and Wolbachia. Both obligate intracellular parasites. Genomes of respectively ~900 and ~1200 which is larger than I expected so I went hunting for more information.

  18. Kaleberg

    This reminds me of a game I used to play when I was a kid. A friend of mine had one of those electronics projects kits with a punch board for connecting components. We’d build one of the projects from the book, usually a radio or music generator, then we’d take turns removing wires until it stopped working. The last one to remove a wire won.

    We had no idea of what we were doing.
