The Scientific Literature

Not So Many Uncited Papers, Actually

How many scientific papers drop into the void, never to be cited by anyone, ever again? There are all sorts of estimates floating around, many of them rather worryingly high, but this look at the situation by Nature suggests that things aren’t so bad.

The idea that the literature is awash with uncited research goes back to a pair of articles in Science, one in 1990 and another in 1991. The 1990 report noted that 55% of articles published between 1981 and 1985 hadn’t been cited in the 5 years after their publication. But those analyses are misleading, mainly because the publications they counted included documents such as letters, corrections, meeting abstracts and other editorial material, which wouldn’t usually get cited. If these are removed, leaving only research papers and review articles, rates of uncitedness plummet. Extending the cut-off past five years reduces the rates even more.

With a ten-year cutoff there are still uncited papers, of course, but the proportion varies interestingly with the field. Biomedical papers have a 4% uncited residue, chemistry has 8% refractory material, and physics has 11%. Note, though, that if you remove self-citation by the same authors, these rates go up, sometimes quite noticeably. The ten-year uncited rate across all disciplines, minus self-citation, is about 18%. But another thing that such studies have uncovered is that this rate has been dropping for many years. That’s presumably a function of better access and searching across journals and a related tendency towards longer reference lists in general. (In the sciences, that rise starts around 1980 and has gotten steeper in recent years.)
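
For the curious, here is a rough sketch of how a number like this gets computed, and why the three choices mentioned above (which document types count, how long the window is, and whether self-citations count) move it around so much. Everything in it is made up for illustration: the `Paper` and `Citation` records, the `uncited_rate` function and the toy data are my own assumptions, not the method used by Nature or the Web of Science.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    paper_id: str
    year: int
    doc_type: str          # e.g. "article", "review", "letter", "correction"
    authors: frozenset     # author names, used to detect self-citation


@dataclass(frozen=True)
class Citation:
    citing_id: str
    cited_id: str


def uncited_rate(papers, citations, window_years=10, exclude_self=False,
                 citable_types=("article", "review")):
    """Fraction of citable papers with no qualifying citation in the window."""
    by_id = {p.paper_id: p for p in papers}
    # Only research papers and reviews count toward the denominator by default.
    candidates = {p.paper_id for p in papers if p.doc_type in citable_types}
    if not candidates:
        return 0.0

    cited = set()
    for c in citations:
        source = by_id.get(c.citing_id)
        target = by_id.get(c.cited_id)
        if source is None or target is None or c.cited_id not in candidates:
            continue
        # Ignore citations that arrive after the cut-off.
        if not (0 <= source.year - target.year <= window_years):
            continue
        # Optionally ignore citations that share an author with the cited paper.
        if exclude_self and (source.authors & target.authors):
            continue
        cited.add(c.cited_id)

    return (len(candidates) - len(cited)) / len(candidates)


# Toy data (hypothetical): four research articles and one letter.
papers = [
    Paper("A", 2000, "article", frozenset({"x"})),
    Paper("B", 2001, "article", frozenset({"x"})),
    Paper("C", 2002, "article", frozenset({"y"})),
    Paper("D", 2003, "letter", frozenset({"z"})),
    Paper("E", 2015, "article", frozenset({"w"})),
]
citations = [
    Citation("B", "A"),    # self-citation (shared author "x")
    Citation("C", "B"),    # independent citation
    Citation("E", "C"),    # independent, but 13 years later
]

base = uncited_rate(papers, citations)
no_self = uncited_rate(papers, citations, exclude_self=True)
long_window = uncited_rate(papers, citations, window_years=15)
with_letters = uncited_rate(papers, citations,
                            citable_types=("article", "review", "letter"))
print(base, no_self, long_window, with_letters)
```

On this toy set, dropping self-citations raises the uncited fraction, lengthening the window lowers it, and counting letters in the denominator raises it, which is the same pattern the real studies report.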

Getting these citation numbers is not easy, though – the harder you look, the more you find:

It’s hard enough to check a handful of papers. In 2012, for instance, Petr Heneberg, a biologist at Charles University in Prague, decided to examine the Web of Science records of 13 Nobel prizewinners, to scrutinize a preposterous-sounding paper that claimed that around 10% of Nobel laureates’ research was uncited. His first glance at the Web of Science suggested a number closer to 1.6%. Then, checking on Google Scholar, Heneberg saw that many of the remaining papers actually had been referenced by other works indexed in the Web of Science, but had been missed because of data-entry errors or typos in the papers. And there were additional citations in journals and books that the Web of Science never indexed. By the time Heneberg gave up searching, after about 20 hours of work, he had reduced the proportion another fivefold, to a mere 0.3%.

So the figures above are most certainly upper bounds. Note also that newer technology has shown that papers get viewed or downloaded at a much greater rate than they’re ever cited, so the papers that no one ever reads or sees must be a much smaller fraction in turn. As the Nature article notes, things also get put into reference databases without being formally cited. Figuring out how useful all these papers are is another question entirely – and one that’s impossible to answer, for a lot of reasons – but it’s at least possible to disprove the idea that a substantial number of papers are never seen, read, or cited.

But still. . .the papers in the uncited zone do tend (as you’d expect) to be in much less prominent journals (apparently almost all papers published in a journal that you’ve heard of do get cited by someone). And that brings up the “dark matter” problem – these figures are all from Web of Science-indexed journals, a large cohort, but one that (justifiably) ignores the hordes of shady journals that will publish anything at all. You’d have to imagine that citation rates are abysmal among the paper-mill-publishing “journals”, and the great majority of that is surely self-citation. If we count these things as “papers”, then the number of never-seen no-impact publications rises again.

But personally, I’m not willing to count the shady publishers’ product at all, because I just don’t know if it can be trusted. I have no problem with obscure but honest journals, and of course I have no problem with low-impact reference data. There should be piles of it; let’s keep it all now that storage isn’t really a problem. There really are cases where odd little papers from years ago suddenly become relevant – science should never throw things away. But going with some outfit that wants cashier’s checks sent to an island in the Caribbean, accepts your papers in ten minutes every time, and publishes every single word exactly as written (while at the same time claiming to review things) is throwing your work away right from the beginning.


18 comments on “Not So Many Uncited Papers, Actually”

  1. A Nonny Mouse says:

    I have just had my bacon saved by a paper in the Arabian Journal of Chemistry (though the work was initially started in Florida, presumably with Katritzky as it was submitted in late 2015).

    1. Derek Lowe says:

      There you go – I think many of us have stories like that, where some obscure-but-real thing turned out to be valuable.

      1. A Nonny Mouse says:

        Unfortunately, I will never be citing this as I used it to make something for a client. As a consequence, people will continue to use the process described in the original method for doing this reaction without realising that there is a superior method out there (time, solvents, yield 10% going to 90%).

  2. Bagger Vance says:

    In your closing you seem to be pushing against Chem Archive servers as well. Or are they a solution to the predatory publisher game?

    1. Derek Lowe says:

      No, I have no problem with the archive servers, and I think that they may well be a weapon against the predatory folks. With a preprint server, you know right up front that you’re getting an unedited, un-refereed manuscript, and it’s up to you to decide what to do with it. Whereas the predatory journals charge to do the same thing and then pretend that it’s now a “real” paper in a real journal.

  3. b says:

    I have heard Carolyn Bertozzi tell the story of their discovery of the strain-induced azide-alkyne click reaction in the literature: a note about an explosion in an old Wittig paper from the 1960’s. Don’t know what the citation rate for that paper was for 40 years, but one can guarantee it has exploded in the past 10.

  4. tt says:

    To me, the future of journals is all electronic with complete, traceable access to all raw data and calculations in something like a Jupyter notebook, such that the “paper” is a living document that can be continually improved upon (with a history tree). Much like Kanye’s “The Life of Pablo” album, I imagine future publications to be always updating and evolving (also would enable easy collaborations). A good example of this is “The Living Journal of Computational Molecular Science”. We are still very much stuck and fixated upon this idea of electronic journals being simply paper on glass. This is silly and outdated, missing out on what is most powerful about being online and electronic…collaborative, live links to other citations, social, living. Additionally, having the raw data for the paper in this format makes it immediately available for data mining, ML applications, re-analysis, etc…

    1. Bagger Vance says:

      I think anyone who’s looked at Wikipedia can see the downsides of everything being constantly “update-able” by anyone

      1. tt says:

        Just needs gatekeepers on authorized authors and editors. Not a free for all, anonymous system.

  5. Barry says:

    ACS indexing has made the chemical literature searchable for decades, far ahead of the rest of the intellectual world. Chemistry is the field in which a (valuable) paper is LEAST likely to fall through the cracks into obscurity.

  6. bacillus says:

    When the biodefense field exploded after 2001, I found myself citing works as far back as 1928 as well as some classics from the 1940s-1960s. None of these authors gave a flying fig about citations, and most likely none of them lived to see their long forgotten work suddenly getting cited out of the wazoo.

  7. Peter S. Shenkin says:

    Self-citations: A citation of an author’s publication that cites his earlier work arguably should count as a citation of the earlier work as well as the current work. Very often the early articles in the thread are preliminary publications that are subsumed by later publications that fill out additional detail, applications, etc. This form of counting could result in an overcounting of citations, since some authors might cite irrelevant early work just to get the citations, but based on this, I feel the early work is currently undercounted. (If counting citations, maybe count such citations as a half a citation; but if just asking whether the work has been cited, I do think it should count.)

  8. California South University says:

    Speaking of self-citation in paper-mill garbage

    1. Derek Lowe says:

      Holy cow. That guy’s worth a blog post on his own.

  9. Eugene says:

    “Biomedical papers have a 4% uncited residue, chemistry has 8% refractory material” thought you could slip a couple past us!

  10. jbosch says:

    I self-cite to avoid rewriting published methods we developed, or to refer to a previous strain/construct etc. so people can look up the details if they wish. As long as it is related to your current manuscript, I don’t see why those citations should not be counted as true. For assessment of the value of the work I would remove them though and only judge by others citing the work.

  11. Li says:

    “The ten-year uncited rate across all disciplines, minus self-citation, is about 18%.”
    “Not So Many Uncited Papers, Actually”
    Is it just me, or does 20% seem like a LOT to anyone else? OTOH, there is the problem that some self-citation is because the paper doesn’t completely (adequately) define its methods. IMHO, self-citation to justify a procedure makes lots of sense, but self-citation to describe it should not be allowed…electronic storage of supplementary material costs almost nothing and it is an unusual researcher who uses EXACTLY the same method over several papers…things usually change – by design or just because Entropy increases. Of course, I’m (naively) presupposing that the methods ARE adequately described/defined somewhere (and many of the papers on the replication crisis would contradict that).
