Looking back on the Human Genome Project, it’s becoming increasingly odd to read the articles at the time about how we were about to sequence “the human genome”. As if there were only one! Even at the time, of course, it was clear that we weren’t capturing human genetic diversity, but that was seen as a much longer-term goal. The big thing was getting the first sequence, for the first time, which is fair enough.
But the popular press coverage of the event didn’t always emphasize this, which has put some semi-contradictory ideas into the heads of the lay public. Pretty much everyone out there has taken on board the idea that everyone’s DNA is unique, thanks to the ubiquity of forensic genotyping in the news and on TV shows. At the same time, pretty much everyone knows that we’ve Sequenced The Human Genome. I’d hope that nonscientists ended up with a mental picture of mostly-the-same-but-subtly-different, but I’m not sure. But if they haven’t, I think that they’re going to get plenty of chances to hear about it – more each year – because the steadily increasing power and scope of sequencing technology is allowing us to really get to work in the zone between those two concepts. That, as it turns out, is where a lot of the promise that attached to the original human sequencing really resides.
Here’s a paper in Nature Biotechnology (from a large multicenter team) that illustrates the point (commentary in that issue here). We can learn a great deal from “human knockout” phenotypes, which can illustrate the effect of key genes in several ways. You can see a sudden loss of function (usually a bad thing, as in Mendelian diseases, but sometimes a reasonably good one, like loss of PCSK9). And by looking even more carefully, you can find a few people who have such loss-of-function mutations but for some reason are still fine. Both of these strongly suggest avenues of potential treatment, and perhaps not only for the (often rare) Mendelian diseases themselves.
This new paper is the sort of thing that would have been completely impossible not so long ago (which is the sort of line that you can insert into most current stories about gene sequencing). It starts with a look at over 589,000 individuals, with varying degrees of genetic detail available, and narrows down, in a series of more rigorous stages, to try to find those participants who have known disease-bearing mutations but who still show no signs of the corresponding disease. It isn’t easy, because there aren’t many.
There’s good news and bad news. The good news is that out of that large starting set, the team managed to find 13 individuals who have apparently escaped their genetic destinies for 8 severe diseases. The bad news is that they can’t study any of these people further! Why might that be, you ask? Paperwork!
We were unable to recontact any of the 13 candidate resilient individuals identified in this study, often due to the absence of a recontact clause in the original informed consent forms used for the studies from which these individuals were identified. Although recontact was possible for some cohorts in this study (e.g., Mount Sinai School of Medicine Biobank), no candidates were identified from those cohorts. Given this, we were unable to perform additional critical preprocessing steps to further confirm the resilient status of these individuals. Such steps would include confirming that the analyzed DNA matched the correct medical records for each individual, that they had not been diagnosed with the indicated Mendelian disorder, and that they were not mosaics. We consider these preprocessing steps as critical in order to formally characterize candidates as truly resilient.
Thus does science march on, waist-deep in consent forms. Even as it stands, though, this study is a valuable look into both the riddle of genetic resistance and into just how to deal with finding such people in large data sets. It’s a series of tradeoffs: time versus thoroughness, cost versus speed. Even with today’s sequencing techniques, you’re not going to start off by completely sequencing 500,000 people. The techniques used here probably missed some interesting cases, but that was in the service of getting to the ones that they did find. If you don’t really lean on the data along the way, you’re going to go off into the swamp:
The utility of a high-impact screening panel depends directly on rigorous informatics processes and clinical review. Less than 1% of the candidates we initially identified from the screening panel survived our filtering criteria. More than 75% of the initial candidates identified were filtered out due to errors in variant calls resulting from low coverage that made it difficult to reliably call homozygous genotypes, high GC or AT content known to lead to higher sequencing-error rates, or from repetitive sequences known to lead to alignment errors that in turn lead to false small insertion or deletion calls. The remaining false positives represented candidates that failed to pass our established clinical presentation criteria, harbored mutations that were inaccurately represented in the mutation databases, or for which there was insufficient scientific evidence to support the predicted phenotypic impact of the mutation.
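To make the technical half of that filtering a bit more concrete, here’s a minimal Python sketch. It is not the study’s pipeline: the `Candidate` fields, the depth cutoff, and the GC bounds are all hypothetical values chosen for illustration, and it only models the three sequencing-artifact filters the authors list (low coverage, extreme GC/AT content, repetitive sequence), not the clinical-review stage that followed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical thresholds, for illustration only; the paper's real cutoffs are not given here.
MIN_DEPTH = 20            # minimum read depth to trust a homozygous call
GC_RANGE = (0.25, 0.75)   # GC fractions outside this window are treated as error-prone


@dataclass
class Candidate:
    sample_id: str
    gene: str
    depth: int            # read coverage at the variant site
    gc_fraction: float    # GC content of the surrounding sequence window
    in_repeat: bool       # True if the site lies in a repetitive region


def failure_reason(c: Candidate) -> Optional[str]:
    """Return why a candidate fails the technical filters, or None if it passes."""
    if c.depth < MIN_DEPTH:
        return "low coverage"
    if not (GC_RANGE[0] <= c.gc_fraction <= GC_RANGE[1]):
        return "extreme GC/AT content"
    if c.in_repeat:
        return "repetitive sequence"
    return None


def filter_candidates(cands: List[Candidate]) -> Tuple[List[Candidate], Dict[str, int]]:
    """Split candidates into survivors and a tally of the reasons the rest were dropped."""
    survivors: List[Candidate] = []
    reasons: Dict[str, int] = {}
    for c in cands:
        reason = failure_reason(c)
        if reason is None:
            survivors.append(c)
        else:
            reasons[reason] = reasons.get(reason, 0) + 1
    return survivors, reasons
```

Each candidate is charged to the first filter it fails, so the tally shows where the attrition happens; anything that survives would still have to go on to clinical review before anyone called it “resilient”.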
But what you’re left with is potentially incredibly worthwhile – if you can contact the people involved, of course. We’re left with what sounds like the pitch for a (probably not very good) movie: walking among us are 13 mutant humans, able to fight off what should be crippling genetic defects. And we don’t know how. But we don’t know who they are, or if we can ever find them again. . .