Here are two papers that have been going around on Twitter for a few days now. The first one is from a Hindawi title, “The Scientific World Journal”, from a group at the University of Malaya. And the second is from the same team (several overlapping co-authors), published a year or so later in Scientific Reports. Neither paper is, frankly, of very much interest as far as I’m concerned – you could probably publish these “Hey, this random compound does something to cells” papers every week if you wanted to. Every time I see these things, I can hear what Samuel Johnson said about “Ossian”, that is, “A man might write such stuff forever, if he would abandon his mind to it.” But subject matter aside, the immediate problem is that Figure 4 from the first paper is the same batch of pictures as Figure 2 from the second one (well, slightly dimmer), and they’re supposed to be looking at completely different compounds. This isn’t possible, of course.
The University of Malaya took pretty swift action the last time this happened, and I expect that they’ll want to have a look at this situation, too. In a larger sense, though, what this makes me wonder is whether anyone has written image-comparison software to catch things like this automatically. You could start with an algorithm that calls up the papers from all the co-authors for the last few years and pulls the figures and images from each one, then starts sorting through them for similarities. I certainly have never programmed something like this, but it seems like you could pick some distinctive contrasty feature from a given frame and use that as a fingerprint to look through the others. If you wanted to get fancy (and people do get fancy like this), then you’d also want to have it search through some of the rotations as well. For all I know, there is such software already, but (not being a journal editor) I’ve had no occasion to seek it out.
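The fingerprinting idea sketched above does exist in the image-processing world under the name “perceptual hashing”. Here’s a minimal sketch of one such scheme, a difference hash, in plain NumPy – the function names and parameters are my own illustration, not any particular journal’s tool. The point is that a dimmed copy of an image (like the Figure 4 / Figure 2 case) hashes identically, because scaling the brightness doesn’t change which of two adjacent blocks is brighter:

```python
import numpy as np

def dhash(image, hash_size=8):
    """Difference hash of a 2-D grayscale array: block-average the image
    down to a (hash_size x hash_size+1) grid, then record whether each
    block is brighter than its left-hand neighbor. Yields a 64-bit
    fingerprint (for hash_size=8) that ignores overall brightness."""
    h, w = image.shape
    rows = np.array_split(np.arange(h), hash_size)
    cols = np.array_split(np.arange(w), hash_size + 1)
    small = np.array([[image[np.ix_(r, c)].mean() for c in cols]
                      for r in rows])
    return (small[:, 1:] > small[:, :-1]).flatten()

def hamming(h1, h2):
    """Number of differing bits; small distance = likely duplicate."""
    return int(np.count_nonzero(h1 != h2))

# A dimmed copy of a figure fingerprints the same; an unrelated one doesn't.
rng = np.random.default_rng(0)
figure = rng.random((64, 64))
dimmed = figure * 0.6           # same picture, "slightly dimmer"
unrelated = rng.random((64, 64))

print(hamming(dhash(figure), dhash(dimmed)))     # 0 - flagged as duplicate
print(hamming(dhash(figure), dhash(unrelated)))  # large - different image
```

To catch the “fancy” cases, you’d run the same hash over rotated and mirrored versions of each figure and keep the minimum distance; real tools along these lines do exactly that.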
If it does exist, though, it doesn’t appear as if many journal editors themselves have had occasion to seek it out, either. This sort of thing happens way too often. You have your duplicated gel bands and lanes, your duplicated cell pictures, and (a favorite) your cut-and-paste jobs that copy out individual features inside what’s supposed to be a single frame. Shady cell and molecular biologists do plenty of this, but you can find it in chemistry, too, of course: try this one and this one, both spotted by F. X. Coudert on Twitter. I mean, we already have enough problems with results that are hard to reproduce – does it help anyone to go on and fill the literature with actual bullshit? Just made-up stuff? This is what I think of every time I read about machine-learning programs that will whiz through the scientific literature and distill out all that knowledge and all those connections – they’re going to be abstracting out this kind of stuff, too. Just today, Retraction Watch has word of nearly 60 papers being pulled from a bunch of Iranian “researchers” who were manipulating the review processes at Springer and BioMed Central to publish piles of plagiarized “results”. So before we start gathering all human scientific effort together, maybe we should make a couple of passes to remove all the crap.