Let’s file this one under “We’ve seen this before, and I’ll bet we’ll see it again”. Anyone who’s worked for some years in cell culture (or with people who have) should appreciate the dangers of cell line contamination. You can get mycoplasma, you can get other cell lines entirely (particularly others that are more vigorous and hardy than the ones you’re trying to grow), and you can get viral thingies that cause you so much trouble that your entire company gets bought by someone else.
Here’s a new paper in PLoS ONE that tries to get a handle on the problem. The real kicker is that some of these cell lines became contaminated along the way, so that earlier papers and later ones in the field are actually referring to different cells. And others became contaminated (or mis-identified) so early that basically all of the literature on them is mistaken. Warnings about this stuff have been issued again and again, and the current literature is surely cleaner than the older papers. But how bad is it in the published record?
By correlating the literature with a list of known contaminated cell lines (many of them invaded by HeLa cells), the authors estimate a lower bound of over 32,000 papers that worked on cells other than the ones they report. In turn, these papers are cited by at least 500,000 more articles, and that total excludes self-citations. And as the authors note, they were quite conservative with their name strings in the searches, so although there are surely still a few false positives in those numbers, they are tiny compared to the false negatives – the mistaken papers that haven’t been flagged yet. A representative example:
In a 1994 report, the establishment of a group of novel thymic cell lines (F2-4E5, F2-5B6, P1-1A3 and P1-4D6) was announced. In a report by MacLeod et al., the cell lines were found to be misidentified, having been derived in fact from a liver carcinoma. In total, 69 articles were found that refer to these cell lines, in turn cited by 2092 articles. Of the primary articles, 43 were published after the report by MacLeod et al. and the most recent one was published only in late 2016. Of the fifteen most recent articles referring to the 1994 report, thirteen actually refer to it because they use the cell lines, all thirteen reporting research on thymic cells, without mentioning any knowledge of the misidentification of these cell lines. The other two articles refer to the establishing article for the sake of the method used in it to establish novel cell lines.
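For the text-mining crowd, the conservative name-string matching described above might look roughly like the following. This is a minimal sketch of my own, not the authors' actual pipeline: the `MISIDENTIFIED` table holds only names mentioned in this post (a real run would load a curated register), and `build_pattern`, `flag_papers`, and the toy paper records are all hypothetical.

```python
import re

# Hypothetical lookup: cell line name -> what it actually turned out to be.
# Only names mentioned in this post; a real register would be far longer.
MISIDENTIFIED = {
    "Chang liver": "HeLa",
    "F2-4E5": "liver carcinoma",
    "F2-5B6": "liver carcinoma",
    "P1-1A3": "liver carcinoma",
    "P1-4D6": "liver carcinoma",
}


def build_pattern(name):
    """Compile a conservative, case-insensitive pattern for one cell line name.

    The name must stand alone: it may not be glued to other letters, digits,
    or hyphens, so "F2-4E5" does not match inside "XF2-4E5".
    """
    # Allow any run of whitespace between the words of a multi-word name.
    body = r"\s+".join(re.escape(part) for part in name.split())
    return re.compile(
        rf"(?<![A-Za-z0-9-]){body}(?![A-Za-z0-9-])", re.IGNORECASE
    )


PATTERNS = {name: build_pattern(name) for name in MISIDENTIFIED}


def flag_papers(papers):
    """Return {paper id: [matched cell line names]} for papers with a hit."""
    flagged = {}
    for paper in papers:
        hits = [n for n, pat in PATTERNS.items() if pat.search(paper["text"])]
        if hits:
            flagged[paper["id"]] = hits
    return flagged


# Toy records, invented purely for illustration.
toy_papers = [
    {"id": "toy-1", "text": "Chang Liver cells were treated for 24 h."},
    {"id": "toy-2", "text": "We cultured primary hepatocytes."},
    {"id": "toy-3", "text": "The thymic line P1-4D6 was used; XF2-4E5 is unrelated."},
]

flagged = flag_papers(toy_papers)
```

Strict boundaries like these trade false positives for false negatives, which is the same direction the authors leaned: a paper that spells the name differently or mentions it only in the methods section slips right past a search like this.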
So yeah, there’s a lot of crap out there. All you folks who are trying to machine-learn your way through the medical literature, you now have a half million more papers to flag: and remember, there are plenty more where those came from. And as the authors note ruefully, just flagging a cell line as misidentified is not enough to stop people from using it. The “Chang Liver” cell line was established in the early 1950s, but as early as 1967 it was suspected to actually be yet another culture of HeLa cells. Ten years later, more evidence was presented, but the cell line originator argued at the time that these were real liver cells that had changed in cell culture, not HeLa. The question was resolved beyond doubt in 2001, so that should have been that. Right?
Wrong. A search through the literature will show “Chang liver” cells are still being used as if they were liver cells, with no mention of the misidentification. Hundreds of papers in the last fifteen years have done so, and their appearance in the literature shows no signs of going down. If anything, it may be slightly increasing. This new paper finds the same trend. The number of papers citing cell lines that are known to be wrong is increasing – perhaps not as a percentage of all scientific papers, but it sure isn’t going down, despite numerous warnings and exhortations.
You’d think that this would largely be a problem for areas where research has not been as well established, or where appropriate safeguards haven’t yet been put in place. For example, a 2015 report – from China – suggested that 85% of the cell lines established there were wrong, and were almost entirely HeLa cells. This latest work finds that China’s share of the contaminated cell line literature is indeed rising rapidly (like, congratulations, guys), but that the majority of such papers are still from the US, Japan, Germany, and the like. No one has any room to feel superior, for all have HeLa-ed.
What to do? At the very least, the authors suggest, we could flag these papers in the databases with a note that they used a cell line that is known to be misidentified. Later readers can then deal with them as they will. But for papers going forward, I’m stumped. Nothing seems to have worked. As with bad chemical probes, people just plow right ahead no matter how many warning flares you send up. Any ideas?