Recycle, Reuse, Republish

There’s an analysis in the latest Nature that puts some numbers on a problem that scientists the world over have suspected for some time: the number of duplicate papers that show up in the literature. The authors used an online text-similarity tool to go through papers in Pubmed, and found a small (but not as small as it should be) percentage of papers that seem to be the same damn things, recycled.
As it turns out, the “most similar papers” function over on the right-hand side of the Pubmed results was a good starting point for tracking these down, and this shortcut allowed them to search the entire Pubmed database. The authors have set up a web site where they’ve deposited their data and their lists of duplicate papers. Out of about 7 million abstracts, some 70,000 were flagged as being highly similar to their corresponding “most related article” on Medline. Manual checking suggests that about 50,000 of these are going to be true duplicates – they’ve gone through about 2700 by hand so far (statistics here).
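For readers curious how a text-similarity screen of this kind can work in principle, here is a minimal sketch in Python. It is not the algorithm the authors' tool actually uses, which isn't described here. It assumes a plain bag-of-words cosine similarity between an abstract and its "most related" article, and the 0.8 cutoff and the function names are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lower-case the text and count alphabetic word tokens (digits and punctuation dropped)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two bag-of-words vectors, from 0.0 (no shared words) to 1.0."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Hypothetical cutoff: an assumption for this sketch, not the study's actual threshold.
THRESHOLD = 0.8


def flag_pair(abstract: str, most_related: str, threshold: float = THRESHOLD) -> bool:
    """Flag an abstract whose top related article looks like a near copy."""
    return cosine_similarity(abstract, most_related) >= threshold


original = "We report the total synthesis of a natural product in ten steps."
recycled = "We report the total synthesis of a natural product in ten steps!"
unrelated = "Galactic centre positron annihilation produces 511 keV emission."

print(flag_pair(original, recycled))   # True: identical word counts
print(flag_pair(original, unrelated))  # False: no words in common
```

A screen like this only narrows the field; as the post notes, flagged pairs still had to be checked by hand, since legitimately similar papers (guidelines, shared introductions) can score highly too.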
They have drawn some preliminary conclusions from their data set. For one thing, duplication seems to have been steady or trending down in the database during the 1990s, but has been increasing since 2000 (and is currently at the highest level). Their explanation – the rising number of print and online journals, making copying easier to perform and harder to detect – seems right to me. Another interesting graph is the frequency of duplicates by country of origin, versus that country’s relative contribution to the Medline database as a whole. Looked at that way, the US is under-represented in the duplicates (which is good to know), and Japan and China are quite over-represented. Several explanations for this are considered – original publication in a language less used for scientific publication, followed by a chance to expose the same work to a wider audience, for one. But the authors don’t hesitate to cite “differences in ethics training and cultural norms” as a factor, too.
A further fascinating detail is that papers which seem to have been duplicated in different journals by the same author (or authors) very often appear too soon after the first publication to have gone through sequential review. In other words, they were most likely submitted to both journals simultaneously, which isn’t a nice thing to do. By contrast, when the same stuff appears under someone else’s name, there’s generally an appropriate time lag.
The authors note that their manual inspections have, so far, turned up over seventy cases of what looks like outright plagiarism, and that they’re starting to contact journal editors and universities for more details. They also seem to have found a number of what they term “serial offenders”, and are investigating those cases as well. They don’t go into details, but my guess is that some of those people could be found here.
Their hope is that once authors realize such tools exist, plagiarism and duplication will be seen as riskier. Hence all the publicity. Want to try it out yourself? The list of potential duplicates can be found here. Here’s the list of journals, and you can plug those into this search page and see what you come up with. Here are some of the manually checked papers – click on the left-hand side ID number to see a side-by-side comparison.

16 comments on “Recycle, Reuse, Republish”

  1. Greg Hlatky says:

    Perhaps there’s room for a new series of journals: Journal of Plagiarized [fill in discipline] Research.
    Some years ago, I gave a paper at an ACS meeting, for which a preprint was required. I wanted the results published in a journal where it would get more visibility. When I later submitted substantially the same paper to that journal, I explained the problem to the editor and included a copy of the preprint. The editor cleared it for review and publication.

  2. A-non-y-mous says:

    Wow, some of these serial offenders are brave . . . and lazy; they don’t even reword the abstract. I know I shouldn’t be shocked, but I am. I can’t not compare the articles. Thanks for the links.

  3. Gerald Bothe says:

    My apologies for an off-topic question – I need a source for 4-Epidoxycycline and as a mere biologist I don’t know how to find one. Could anybody help me? Can answer directly to

  4. reevej says:

  5. azmanam says:

    Haven’t had the time to go through it very rigorously.
    Do you know if their ‘similarity tool’ includes total synthesis communications later published as full papers?
    The intros might look similar, as well as some of the language for navigating through reaction schemes.

  6. Rhenium says:

    Wow… JACS and a slew of other high-profile journals. I’m surprised no one has commented on this yet. Now it’s out in public forever.
    Still, Etblast will be a handy tool for when I review journal articles in the future.

  7. macabre says:

    Mulzer’s recent Pasteurestin A tot. syn.
    Basically the same synthesis as Vollhardt published 15 years ago.
    Not sure which is worse: doubling up your own work if papers are a bit slow that year, or blatantly copying already published work.
    All in all, very depressing.

  8. Anonymous BMS Researcher says:

    Somebody close to me was once a journal editor; on more than one occasion referees called her attention to likely plagiarism — needless to say these manuscripts did NOT get published! I wonder how many cases of plagiarism get detected before publication versus the number that slip through without detection and get published?

  9. Anne says:

    I hope I don’t sound like a hopeless Neanderthal here, but what *are* the ethics of republishing a paper? Let’s take, for example, a paper that goes in a conference proceedings in an abbreviated form and is then fleshed out and submitted to a normal journal. Okay or not? What about vice versa? Or a monster paper packed with technical details submitted to one journal, accepted and published, and then trimmed down to reveal the central facts and submitted for publication in a flashier journal? Or how about a thesis that generates a series of papers based on its chapters (tidied up to suit the audience)? Cribbing wholesale text used for one proposal (requesting telescope time) to go into another (both by the same author of course)? Cribbing “motivation” text from one paper to go into another paper studying the same phenomenon?
    I’m genuinely unsure what the ethical rules are for, well, all those cases above; there are plenty of others that are plainly unethical (cribbing text from someone else’s paper) or where they’re clearly borderline (rewriting a paper so that it counters arguments made by another paper in press without citing the other paper). But it seems like a certain amount of textual similarity is inevitable. If your field of research is the mysterious 511 keV emission from the galactic centre, all your papers should have an explanation of what the observed excess is and why it’s mysterious; is there any reason those introductions shouldn’t be quite similar?

  10. RKN says:

    Pretty interesting. I wonder if reviewers now blast the abstract/intro of submitted papers to check for this sort of thing before accepting them for publication.

  11. Ken Knott says:

    I’m surprised by the lack of comments on this. To me this is fascinating… And some of the serial offenders are truly ridiculous with the sheer amount of plagiarism. I’d love to be a grad student in their group and call those professors asking for explanations… I would be very interested to hear the responses of the offenders and their universities, not to mention the journals….

  12. RKN says:

  13. Bunsen Honeydew says:

    As troubling as a lot of this is, there is one part that I don’t really have a problem with and that’s duplicate publication in different languages. If someone wants to publish a paper in Chinese, Japanese, or Korean, and then publish it in English, I’m not sure I have a problem with that. Am I ever going to see that non-English paper? Aside from SciFinder, no. Could I ever read the non-English paper? No. Do I want to go through the hassle of getting that non-English paper and getting it translated? No. Am I happy that that paper appeared in English? Yes.
    Now, all that being said (typed? written?), I can read papers in French and German without too much trouble. But if it’s in a non-Latin script, forget it.
    I also don’t want to see national journals disappear. I believe that there is still a place for journals like Helvetica Chimica Acta, Australian Journal of Chemistry, and others, especially if the majority of their articles are not in English. The authors are trying to reach two groups of people: the broader chemical community and their national community. Sometimes, in order to reach both groups you need to publish in two places, and the groups are largely mutually exclusive.

  14. Jose says:

    “Re-issue ! Re-package ! Re-package !
    Re-evaluate the songs
    Double-pack with a photograph
    Extra Track (and a tacky badge)…”

  15. Charlotte says:

    Derek, thanks for linking this – I’m a journal editor, and I’ve spent the last day or so going through the 60 hits in my journal – most of which I’m pretty comfortable with, as they’re clinical guidelines and similar, but there are certainly a couple I’m distinctly unhappy with.
    I adore reviewers who are on the lookout for, and call attention to, plagiarism and other examples of publishing dishonesty. They are treasured individuals. We’ve caught poor ethics at every stage, but a sharp-eyed reviewer is far more effective than I am when I triage a paper, or when my copyeditor’s ploughing through it. We’re looking into doing more (and all suggestions are welcome!), but it’ll be a happy day in the editorial office when ManuscriptCentral and EES have an inbuilt text similarity scanner.

  16. anon says:

    Fascinating article and it is fun to poke around in the results. The one search I did for my old advisor yielded a pair of almost identically worded abstracts but two articles with substantially different results (to my mind, at least).

