Ever since gene sequencing became feasible (for several values of “feasible”!) it’s been of great interest to look at the genetic material of cancerous cells. It’s been clear from very early on that there are many changes, mutations, rearrangements, shifts, etc. in a cancer cell’s DNA, and it’s been equally clear that these do not tell a simple or clear-cut story. Genomic instability itself is a key feature of many tumor lines, and genomic instability comes in a lot of varieties.
Here’s a new paper that illustrates that in even more detail than we’ve had before. The authors are looking at a cell line that’s considered a model of HER2+ breast cancer, SK-BR-3. And they’re examining it via a (relatively) new method, long-read sequencing. That’s been coming on for some years now, and this is just the sort of paper that shows how it can be valuable. As many readers will know, many current DNA sequencing methods work by breaking genomic DNA down into shorter fragments and reading those, then computationally assembling the original sequence from all the various breaks and overlaps. (This, indeed, was one of the key innovations during the Human Genome Project, and its development and eventual acceptance make for quite a tale.) Current short-read sequencers work with roughly a few hundred base pairs at a time, but long-read sequencers up that to several thousand. That lets you get a better look at variants and rearrangements that get missed or averaged out when you chop things up too finely.
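For readers who'd like to see the "assemble it from the overlaps" idea in miniature, here is a toy sketch in Python. It is emphatically not how a real assembler works (those have to cope with sequencing errors, both strands, and millions of reads); it's a greedy merge of error-free reads, and the function names `overlap` and `greedy_assemble` and the `min_len` cutoff are my own inventions for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Toy assumptions: reads are error-free, all from the same strand,
    and the underlying sequence has no long repeats.
    Returns the list of contigs left when nothing more can be merged.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(fragments))
```

Even this toy shows where the trouble comes from: if the same stretch of sequence turns up in more than one place, and the reads are shorter than the repeat, there's no way to tell from the overlaps alone which copy a read came from. Longer reads can span the repeat and settle the question.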
Now, short-read “shotgun sequencing” was a real breakthrough – the human genome would not have been unveiled at that famous press conference at anywhere near that date without it. But even as everyone was taking bows, waving to the press, or perhaps exchanging poisonous glances with their collaborators, the question was hanging in the air: what does it mean to have “sequenced the human genome”? Everyone’s genome is different. To some extent, that announcement was, I’m pretty sure, that we’d at least partially sequenced Craig Venter. But Craig Venter ≠ humanity at large (a statement which will come as a relief to no small number of people).
Looking at genomic DNA is like looking at the night sky. Those stars over your head are a mosaic in time – the light is all hitting your retina at the same time, but (in the summer sky) you’re seeing how Altair looked in 2001, how Vega looked in 1983, how Antares looked in the year 1398, and how Deneb looked in about 600 BC. Let’s not even get into the deep-sky objects – if you stay up a bit later and can see the naked-eye fuzzball of the Andromeda galaxy, that light is from around the time that the australopithecines were learning how to spend more of their time walking on two legs.
Similarly, when you look at a DNA sequence, you’re looking at a mosaic of conservation. Some parts of that string of letters are nearly unchanged all the way from humans to sea cucumbers, while others are different between every human on the planet (OK, short of identical-twin human clones, I suppose). So to be technical about it, there is no such thing as “the human genome”: everyone has their own list of variant sequences, most of which make little or no difference because they’re in genomic regions that can handle them. That’s why we talk about a “consensus sequence”.
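In its simplest form, a consensus sequence is just a majority vote at each position. Here's a minimal Python sketch of that idea, assuming the sequences have already been aligned to the same length (which is the hard part in real life, and is skipped entirely here). The function names and the three made-up sequences are mine, purely for illustration.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Most common letter at each position of equal-length, pre-aligned sequences.

    Ties go to whichever letter appears first among the inputs,
    which is how Counter.most_common orders equal counts.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must all be the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_seqs)
    )


def variant_positions(aligned_seqs):
    """0-based positions where the sequences do not all agree."""
    return [i for i, column in enumerate(zip(*aligned_seqs)) if len(set(column)) > 1]


if __name__ == "__main__":
    # Three hypothetical individuals, differing at two positions
    people = ["ACGTAC", "ACGTTC", "ACCTAC"]
    print(consensus(people))
    print(variant_positions(people))
```

Note that nobody in the input has to match the consensus exactly. It's a summary of the population, not any one person's sequence.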
Cancer cells are at the far end of that scale, though. The more unstable ones are throwing off mutations constantly, by all sorts of mechanisms – and, of course, some of those are incompatible with life, but the ones that survive just keep on cranking. What you see when you look at a tumor sample are the lineages that have made it, that have (unfortunately) managed to randomly stumble into being the best tumor cells that they could manage to be, and the disconnect between their priorities and the priorities of the rest of the body is the disease that we call “cancer”.
Here’s what long-read sequencing revealed in this case, in the authors’ own words (ERBB2 is also known as HER2):
Now we have applied long-read sequencing to explore the hidden variation in a cancer genome and have discovered nearly 20,000 structural variations present, most of which cannot be found using short read sequencing and many are intersecting known cancer genes. More than twice as many of the copy number amplifications could be explained through long-range variants identified by long-read sequencing compared to short-read sequencing. We further found the ERBB2 oncogene to be amplified through a complex series of events initiated by a large translocation into the highly rearranged hotspots of Chr 8, where the sequence was then copied dozens of times more with further translocations and inverted duplications resolved only by the long reads. Furthermore, we find 20 additional inverted duplications throughout the genome, highlighting the importance of this underreported structural variation type. Overall, using long-read sequencing we see that far more bases in the genome are affected by structural variation compared to SNPs.
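If the "inverted duplications" in that passage are unfamiliar: the end product is a stretch of DNA that has been copied, with the copy sitting next to the original but flipped around (so on the same strand it reads as the reverse complement). A small Python sketch of what that looks like at the sequence level follows. It only shows the result, not any of the biological mechanisms that produce it, and the function names and the example sequence are made up for illustration.

```python
# Translation table for complementing unambiguous DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_duplication(seq, start, end):
    """Return `seq` with a flipped copy of seq[start:end] inserted right after it.

    `start` and `end` are 0-based, end-exclusive (ordinary Python slicing).
    """
    segment = seq[start:end]
    return seq[:end] + reverse_complement(segment) + seq[end:]


if __name__ == "__main__":
    original = "AAACCGTTT"
    print(inverted_duplication(original, 3, 6))
```

A read a few hundred bases long that lands entirely inside either copy looks perfectly ordinary. You need a read that crosses the junction between the two copies to see that anything has happened.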
There are several ways for those inverted duplications to happen (just to pick one of the problems), but in general, given the hosed-up nature of DNA handling, quality control, and cellular death checkpoints in a cancer cell, pretty much anything that you can imagine happening is probably happening at some point. It’s especially interesting to see the complexity of what’s happened to HER2 – if there’s a popular perception at all of DNA problems with cancer, it’s likely a picture of one little thing going wrong. But the reality is that stuff goes into a combination duplicator/industrial fan, and gets flung all over the genome. Note that in these cells HER2 has been translocated into chromosome 8 from its original home on chromosome 17 and then copied “dozens of times” with more translocations. We’re seeing the result of a long series of bad events piled up on each other. To paraphrase Adam Smith, there is a lot of ruin in a cell.