The Dark Side

Thoughts on Reproducibility

Not too long ago, I was talking to someone outside the field about the “reproducibility crisis”. They’d heard that there were many published papers whose results weren’t solid, and wanted to know if I’d encountered that. I had to tell them that yep, I sure had, and that just about anyone who’s worked in any field of science will be able to say the same thing.

But, I cautioned them, that doesn’t mean the whole scientific enterprise is about to collapse. Less-than-reliable papers have been a feature of it since the beginning. I’m very willing to put money down on the proposition that the first Fellows of the Royal Society grumbled to each other in private about that latest correspondence from so-and-so, which had those things in it that no one else seems to be able to get to work. It comes with the territory, and the (more or less built-in) error correcting mechanisms of scientific work take care of many of them. It can be intensely annoying to be part of that error-correcting machinery when you were intending to do something else with your time, but that’s the risk you take.

I told my questioner that wonky results are found in all sorts of journals, but the pattern was not what they expected. Sure, I said, there’s junk in the junk journals. I mean, some of that stuff is real, published by people who don’t know any better, who are taking the easy way out to rack up publications for some committee or government oversight body, or who just can’t seem to get their stuff published anywhere else. But there’s plenty of actual slop down there, and surely some actual fraud, too – it’s just that no one, for the most part, pays attention or ever tries to reproduce the stuff, because so much of it is of no interest to anyone (including, I’d also be willing to bet, the authors themselves in some cases).

Then, I told them, you have a solid layer of solid journals. For organic chemistry, I’d call this the “JOC stratum” – perfectly respectable journals publishing perfectly respectable papers. It’s not always the most exciting stuff in the world, but it’s almost always reliable, and when you get a literature reference from such work you feel pretty confident that you can get it to run for you, too. Most scientific work is actually in this category, fortunately for us. You don’t see as much real fraud in this part of the literature, for the same reason that no one goes to the trouble of counterfeiting $10 bills.

But after that, you come to the big, flashy stuff. I explained how each field has its own top-tier specialty journals, and after that come the multidisciplinary ones covering (say) all of chemistry (like JACS or Angewandte Chemie) or crossing fields entirely (like Science or Nature). In theory, papers published in the latter journals are of interest to people in totally different fields, although in practice it’s not like we can understand the fine details of each other’s work, and we’ll do well if we even look over the list of titles. But certainly, if there’s some big to-do in physics or evolutionary biology, I enjoy hearing about it and reading up on it to the extent I can.

Those journals, though, I went on to say, are where the unreliable results start cropping up again. My interlocutor was surprised at that one; he thought that the top journals would necessarily publish the most rock-solid stuff. Not so, I told him – the top journals are such desirable places to publish that people who really need to get papers in there often rush them out before the cake is quite baked. Or before they check to make sure that they added the raspberry jam inside it. Or that they didn’t actually use a half cup of salt instead of a half cup of sugar – you know, that sort of thing. And besides, the kinds of results that can make it into these journals are the high-profile, unexpected, cutting-edge stuff, and those are the tricky things that aren’t always going to work quite right, anyway, not until more people have had a chance to find out the things that can go wrong. Which is one of the big things that publication is supposed to do (other than help your grant renewals or get you tenure).

So there’s reproducibility and there’s reproducibility. At the low end, you have the stuff that Pauli used to describe as “Not even wrong”. At the high end, you occasionally get the wax melted off your wings as you fly too near the blazing sun of a big discovery. And while there is fraud, and while fraud needs to be discovered and expunged (and its practitioners removed from the field), at the low end it doesn’t have a chance to do much damage, and at the high end it tends to get discovered pretty quickly. And actual fraud up at that level is more rare than it is at the low end, for just that reason, which makes the cases where it does happen (as in various retracted stem cell work, or the physics papers of Jan Hendrik Schön) so notable. Those things waste time and effort when they do happen, because so many people are interested in the results, but it’s also true that getting them out of the literature as quickly as possible is not really a waste of time. You’d rather never see such things take place at all, but we staff the labs with humans, and any large sample of people is going to include some interesting and fraught personalities in it. . .

83 comments on “Thoughts on Reproducibility”

  1. luysii says:

    A friend, an emeritus professor of chemical engineering, referees a lot of papers. He estimates that 80% of the papers in his field, quantum chemistry, coming from China are absolute trash. According to him China gives bonuses to people getting published in high impact journals. What he finds particularly appalling is that he writes up a detailed list of corrections and improvements for the paper, and then finds it published totally unchanged in another journal. This is from 2 years ago.

    1. Xiao Bu says:

      It is entirely normal for a rejected paper to be resubmitted and eventually accepted into an appropriately ranked journal. That is one key purpose of people like your friend acting as reviewers. Furthermore, your friend’s claim that 80% of articles submitted from China are trash is not that far from the median rejection rate. Unless he reviews for a garbage journal… For example, Angewandte Chemie rejected 79% of submitted papers in 2013. Yes, it is true that Chinese scientists have cash incentives to publish in good journals, but I believe that depends on the institution and is not government funded. There are pros and cons to this arrangement, but whatever incentives Chinese scientists have to publish, the entire purpose of reviewers is to safeguard against fraudulent or low-quality publications. China’s efforts to promote growth in the sciences are working, and China is quickly becoming one of the largest contributors to high-impact journals.

      1. Li Zhi says:

        I disagree with the assertion that reviewers are to “safeguard against fraudulent … publications”. They have, in general, neither the time, skill set, inclination, nor resources to detect fraud. We are (arguably) fortunate that in many cases the fraud is so poorly executed that it’s obvious (e.g. reusing the same photographs) – probably an indication of the perpetrator’s mental state. Fraud is like murder; a good murder is the one no one knows about. (I’ll also call out Derek’s use of the phrase “perfectly respectable papers”: if it’s science, then by definition we should be treating it critically. We should be expecting and looking for errors of omission and commission, and not treating it as being perfectly anything. Yeah, pedantic, I know.) If we actually pay attention to the amount of lying the average human being does, then it should be pretty clear that fraud is far more common than we like to admit, and that we’re all less honest than we could be. We should therefore be amazed at the amount of solid, replicable work that is done, given our lyin’ cheatin’ ways.

      2. Morten G says:

        Hopefully Angewandte sets the bar for publication higher than “Well, it isn’t pure trash at least”.

  2. Hap says:

    The complicating factor for both fraud/ick at the low and high ends is the complicity of others in it – at the low end, someone is counting publications and so people decide to give them publications, and at the high end, someone or something gets ancillary glory (and grant money) from papers in big name journals and so doesn’t pay attention to the signs of problems the way they ought to. Systems in some cases encourage the bad stuff, and they need to be discouraged from doing so, if possible.

    People will do what they can get away with, but systems can mitigate or amplify their behavior.

    1. fluorogrol says:

      Yep. Change the incentives, change the behaviour. Just *how* that is achieved is the big question.

      1. Hap says:

        Sorry. I’m just the idea rat.

        1. Some idiot says:

          Squeak! 😉

          1. Postdocof20years says:

            Dude, y’all need to keep your ( scientific ) problems to yourselves, jeese.

          2. Derek Lowe says:

            Nah, having worked with several “idea rats” (and having tendencies toward that myself at times), I enjoyed the reference.

    2. Vader says:

      “People will do what they can get away with,”

      I’m not so cynical. Some people will do what they can get away with. That’s where the “removing them from the field” part comes in.

      Most scientists I’ve known have an ethic that they won’t knowingly publish junk. At most, they’ll publish stuff they’re a little uncertain of, because all human affairs are uncertain, and they may not exactly play up the uncertainty in the way they tell their story.

      1. Isidore says:

        “they’ll publish stuff they’re a little uncertain of, because all human affairs are uncertain, and they may not exactly play up the uncertainty in the way they tell their story.”

        It’s the “I want to believe” syndrome. Scientists have their biases, and where others might see ambiguity in the data they will discern and extract (by force, if necessary) certainty. This is what peer and editorial review is supposed to guard against, and perhaps at some point it did this better than it does today.

  3. Old Timer says:

    I’ve always suspected that in my field, organic chemistry/catalysis, fraud or unreproducible results had to be much lower than in almost any other field except physics or math. Does anyone have hard numbers on retractions/fraud by field? The reasoning behind this suspicion is obvious: proofs in math, measurements in physics, and spectroscopic data in organic chemistry are very easy to check. Only the most dubious fraudsters (as laid out in Derek’s previous post) go through the trouble of fabricating this data… and are usually caught. So much is unknown about what happens in biology (also a favorite topic around here) that things are often misinterpreted.

    1. anon says:

      Many proofs in math (and measurements in physics, think LHC for example) are extremely difficult to check because of their complexity. I don’t think there’s much fraud there, though; it seems to me that deliberately writing a faulty proof that would be convincing enough to fool others would be very difficult.

      1. luysii says:

        Incorrect proofs tend to follow mathematicians their entire career, no matter what they do. They scrupulously try to avoid them. Smale always mentions that Guckenheimer’s PhD thesis contained an error, even though he later became a prof at Cornell.

      2. Susan says:

        Actually, LHC results are very easy to check, because there are two to four (depending on the nature of the result) different experiments on LHC, with fairly similar physics reach. It is not a coincidence that ATLAS and CMS announced the discovery of the Higgs at the same time. The experiments are independent of each other, and to a fair extent independent of CERN, which hosts them – it doesn’t manage them.

        Fraud would also be very difficult in these experiments, because draft papers (which are typically produced by a smallish group) are internally reviewed by the collaboration before submission – and the internal reviewers are much more likely to know where the bodies are buried than the journal referees are! Most high-profile analyses will also have two or three groups working in parallel, and the final result won’t see the light of print unless the parallel analyses come up with comparable answers. Plus, we demand a 5-sigma significance for a discovery – we get p = 0.05 significance results every day, and ignore them (if your physics working groups produce several hundred plots between them – and they do, trust me – you’re going to get quite a few p < 0.05 results by chance, and a non-negligible number of p < 0.01).
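Susan’s look-elsewhere point is easy to check with a quick simulation (a sketch in Python; the figure of 300 analyses is a made-up stand-in for “several hundred plots”, and each analysis is modeled as a single z-score under the null):

```python
import random

random.seed(1)

# Model 300 independent "analyses" in which the true effect is zero:
# under the null, each one yields a z-score from the standard normal.
n_analyses = 300
z_scores = [random.gauss(0.0, 1.0) for _ in range(n_analyses)]

# Two-sided thresholds: |z| > 1.96 corresponds to p < 0.05,
# |z| > 2.576 to p < 0.01, and |z| > 5 is the particle-physics bar.
hits_05 = sum(abs(z) > 1.96 for z in z_scores)
hits_01 = sum(abs(z) > 2.576 for z in z_scores)
hits_5sigma = sum(abs(z) > 5.0 for z in z_scores)

print(f"p < 0.05 by chance: {hits_05}")      # expect around 15 of 300
print(f"p < 0.01 by chance: {hits_01}")      # expect around 3 of 300
print(f"5-sigma by chance:  {hits_5sigma}")  # expect essentially none
```

On pure noise, p < 0.05 “discoveries” show up by the dozen, which is exactly why the 5-sigma convention exists.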

        1. barry says:

          And yet there was that report of faster-than-light neutrinos from Gran Sasso…

          1. eyesoars says:

            Yes, there was that result, but even the authors (correctly!) expressed considerable skepticism of their own results. And they were, I believe, considerably relieved when an alternative explanation (a disconnected cable, IIRC) was found to explain them.

    2. Hap says:

      Unfortunately, I wouldn’t be as sure of total syntheses – I like the stories, but most total syntheses are performed once, and on small scale, so there is room for error. Because they aren’t generally reproduced (unless short, or of an interesting molecule that can be prepared on scale by that route or a very similar one), there’s also room for fraud.

      1. Some idiot says:

        This would be difficult/ impossible to implement broadly, but I love the idea behind Org Synth (and I love the preps, too). The principle that someone else has to repeat it before it is publishable. I understand 100% that this is essentially 100% impossible in biology, and impossible in practice in most parts of organic chemistry. I mean, what proportion of PIs are going to devote significant proportions of their time/resources to repeating others’ work?

        But I think the principle is excellent. And would reduce fraud significantly.

        1. Ian Malone says:

          One tricky aspect (well, two) is whom to get to do it and who pays. Often research uses the most cutting-edge (or custom) tools, and you need people who are capable of doing work to at least the same standard as the original workers (probably better, because most people will have some weakness, and their weakest relevant skills may need to be as good as the original authors’). Now these people, who have to be quite talented, you are asking to spend time re-doing other people’s work rather than following their own agenda, which doesn’t look spectacular on the CV. It might suit some people’s inclinations, but otherwise you might have to reward them quite well for that.
          In practice, as Derek says, we end up doing that as part of the error correcting part of science, when we try to build on other people’s results. So the cost and responsibility gets spread out (or, maybe more accurately, allocated randomly and unpredictably).

          1. Isidore says:

            Who to get to do it and who pays: what if a graduate student or post-doc from another lab gets to spend a short “sabbatical” in the lab that published the interesting synthesis, repeats it under the guidance and with the assistance of the person who did the work originally, gets credit and makes some useful connections in the process, and perhaps is included among the authors, as this work would be an integral part of verifying that what is reported is correct? Funding a student for a few weeks would not be prohibitive and could be split between labs. Again, it would not be applicable in 100% of the cases, but it might work for some.

        2. Li Zhi says:

          Sorry to disagree. When I run an assay, I do it in TRIPLICATE, because that way if I get one outlier I will toss it (as long as the other two are “sufficiently close”). (This isn’t in published work, btw; it’s industrial process development.) If a paper needs replication, it needs TWO (not one) independent replications. A single confirmation isn’t sufficient – you can’t decide between the results. (Seems to me that the N-ray experiments were replicated quite a few times, and that it took an actual visit to the originator’s lab, followed by a paper by that expert (both forensic and subject-matter; from Princeton, iirc), to quash them. The original PI went to his grave believing in them, though, so this isn’t a case of fraud, nor (probably, by the then-current standards) even of misconduct.)
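Li Zhi’s triplicate rule can be sketched in a few lines (Python; the 10% “sufficiently close” tolerance is a made-up illustration, since the real criterion would depend on the assay):

```python
def triplicate_result(a, b, c, tolerance=0.10):
    """Mean of three replicates, tossing one outlier if (and only if)
    the other two agree within `tolerance` (a relative difference).
    Returns None when no two replicates agree, i.e. rerun the assay."""
    lo, mid, hi = sorted([a, b, c])

    def close(x, y):
        return abs(x - y) <= tolerance * max(abs(x), abs(y), 1e-12)

    if close(lo, mid) and close(mid, hi):
        return (lo + mid + hi) / 3.0      # all three agree
    if close(lo, mid):
        return (lo + mid) / 2.0           # high outlier tossed
    if close(mid, hi):
        return (mid + hi) / 2.0           # low outlier tossed
    return None                           # no two agree

print(triplicate_result(0.98, 1.02, 1.00))  # all close: mean ~1.00
print(triplicate_result(0.98, 1.02, 2.50))  # 2.50 tossed: ~1.00
print(triplicate_result(0.50, 1.00, 2.00))  # no agreement: None
```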

          1. Li Zhi says:

            I meant to write “a single contradiction” (or failure to replicate). D’Oh!

        3. Mary Kuhner says:

          In my field there are papers being published in Science and Nature that should have been caught in review. Tomasetti and Vogelstein (2015) in Science will stand as an example. The statistics in this paper are deeply flawed in a way that post-publication review caught instantly. But the results were “newsworthy” and Vogelstein is a big name (and sits on the editorial board of Science), so it got published.

          It should have been a big red flag that this paper divides cancers into those for which environmental and hereditary factors are important and those for which they are not (and thus prevention measures are unlikely to work)–and puts melanoma into the “not environmental or hereditary” class. Melanoma! Do a quick reality check on how often people get melanoma on always-clothed versus sun-exposed parts of their bodies, or how often dark-skinned people get it compared to light-skinned people, and the problem should be obvious.

          I have gotten to the point that if I see a result in Genetics or Cancer Prevention I’ll think it’s probably okay, but if it’s in Science or Nature, well, wait and see.

          How did we get into a situation where the most prestigious places to publish are ones with such a poor record for accuracy? Where your high-profile publication is going to rub shoulders with memory water and arsenic life and melanoma that isn’t influenced by sun exposure?

      2. CR says:

        Maybe the compound itself would be a little harder to produce fraudulently; however, yields would be very easy to misrepresent. I would say there is a very high level of variability in reproducing literature procedures in organic chemistry – but most of those discrepancies can be (or are) attributed to just changing “hands”.

        1. DrOcto says:

          Of greater concern is the almost complete lack of purity data in the academic literature. “The NMR looks OK, therefore it must be 100% pure” is not sufficient, and is a likely source of variations in observed yield between ‘hands’.

      3. barry says:

        The Woodward/von Eggers Doering quinine synthesis has been questioned repeatedly through the years. The reduction on the aluminum metal surface is fiendishly sensitive to the preparation of the amalgam. Routes to less popular targets have received far less scrutiny.

        1. Jeff Seeman had a very comprehensive analysis of the W-D quinine synthesis and concluded that they were indeed successful; his conclusion was substantiated by a replication study from Colorado State. Link in handle.

          1. barry says:

            Yes, I should have finished the thought explicitly: although the Woodward/von Eggers Doering quinine synthesis was questioned, more careful re-examination revealed that it works. It just depends on a very touchy reduction step that others found hard – but not impossible – to reproduce.

          2. Humulonimbus says:

            IIRC Rabe was vindicated by aluminum powder aged long enough to build up some oxide (link in handle to CSU reproduction).

    3. Ian Malone says:

      Schön was a physicist; if a paper involves reporting of experimental results, then there’s scope for fraud. Fields like theoretical physics or pure mathematics, where you’re presenting a theorem, are proof against it, though. (As I suppose are most completely conjectural exercises like philosophy or literary criticism, though other misdeeds like plagiarism are still possible.)

  4. the seeker says:

    I’ve been fighting some nasty issues with reproducing some literature for some time, and it appears that something that at first did not seem like it would matter may be the linchpin.

    I’ve read elsewhere (maybe in a previous posting here) that a great debate in reproducibility is whether (a) a result isn’t reproduced unless it is done exactly as described in the reference, or (b) some slight changes are OK (the argument here being that if the observations are robust, then some deviations should be tolerable).

    My recent experience is telling me that every last detail does matter at times.

    Even then, how can certain things be described in an experimental section? Some things are extremely difficult and only the most elite of the field have the necessary skills to reproduce it (there was an article I can’t find now drawing an analogy to professional golfers being able to do what an average weekend warrior cannot). Then there are more ordinary lab skills that are required but may take considerable time to develop the right touch, and some people have an easier time with that than others (a simple example: I don’t like titrations with colorimetric indicators – it was just a fight to try to get the right color).

    1. Hap says:

      In chemistry, though, if you publish it (particularly as a method) people of average ability are expecting to be able to do it (and you’re supposed to describe exactly how to do so). There’s not an expectation like in golf or other sports that only the best should be able to use a method (and there’s not an objective way to sort people on bench ability in chemistry). Something may take lots of practice but that’s different from something that takes supernatural ability to perform, which will be useful to only a few (and if the few are in one group or place, is highly susceptible to fraud or delusion). If no one else can use the method, it is not useful, and if a datum can’t be reproduced, it can’t be taken as a datum by others.

      1. the seeker says:

        You’re right and I hope I didn’t come off as some kind of apologist for irreproducibility, because it is a frustrating waste of time and money. Indeed, I was venting because sometimes these things do seem “supernatural,” and I do not like feeling like my results are no longer bound by causality!

        As I mentioned, I’ve been dealing with irreproducibility for several months, and it seems maybe it’s due to some condition that I did not think was important. The references I have do not agree on the condition in question, nor did any point out its importance. So maybe they did not know either and made a “lucky” choice on this particular condition. I went about things wrong – my thought was, if the phenomenon is robust across these different choices of condition, then I need not concern myself with that particular condition. Where I went wrong for some time was not trying to reproduce any of their choices for that condition. So the phenomenon might be more robust in their choice “space” for that condition and far less so in my choice of “space” for that condition.

        So what I guess I’m getting at is: how much do unknown factors or interactions factor into reproducibility? What about cases like mine where the phenomenon is robust in some cases, but not all, and the lack of agreement between various authors on some condition gives the appearance that the condition does not matter, but it does?

        1. Hap says:

          I wasn’t assuming you were tolerant of bad stuff. I just figured that there should be enough explicit data in the published article to reproduce the results, and if there isn’t, there’s something wrong. Sometimes the experimental parameters haven’t been defined well enough to reproduce the phenomenon (Pd-free couplings, Fe-mediated coupling), or the authors haven’t fully explored how tolerant the method is to changes. I would imagine that how important or original the method is determines whether it’s published without this information, but the information is important for any method (unless the method isn’t useful at all, but then…)

          I don’t know how much tolerance I should have for lack of robustness; if the method is fully known, then there’s no excuse for not having enough detail to reproduce it, though.

          1. Some idiot says:

            The other factor is what I call the “just got lucky” syndrome, where there is a factor that the experimenter was unaware of, which was critical. I would guess that most of us have knowledge of such cases (the Woodward case may be one of those, although I am not familiar enough with the case to judge). In this sort of case, repeating the experiment in another lab is almost the only way of catching it.

            The best example I have witnessed was a PhD student I knew who was doing total synthesis. About a year or so after he nailed one of the early steps (really good yield; repeatable) he had to go back to bring a whole lot more material through. When he got to this step it stopped. He could not get it to work. After many, many months of extremely meticulous experiments (not to mention sanity-searching…) he found out that in his previous experiments he had dried the solvent with old (dusty) mol sieves, but in the newer experiments he had some shiny new mol sieves. When he added some finely-ground mol sieves to the reaction, it worked perfectly…

          2. NJBiologist says:

            “The other factor is what I call the “just got lucky” syndrome…”

            Köhler and Milstein, the people who worked out hybridoma formation for monoclonal antibody production, published on the basis of six consecutive successful independent preparations. They then left the project alone. A year later, they went back – and their next dozen preps all failed.

            It’s hard to avoid the thought that we could have lost an incredibly valuable technique, at least for a while, if some things had played out even a little differently.

    2. Harrison says:

      I think what you are talking about here is the distinction between replication (100% duplication of the original experiment) and reproducibility (the ability to see the general phenomenon with the necessary local changes). This is particularly relevant in biology, where you might be able to observe something only in replication under strict conditions, but not in a wider setting (and particularly where some reagents/conditions/etc. just cannot be duplicated). A result that fails to reproduce is then said to be lacking biological relevance.

      1. Li Zhi says:

        It seems to me that we’re not all on the same page. Without being able to create an exact replicate of the Universe as it was at the time the data was collected (we might agree to call this “going back in time”…), NO experiment can EVER be exactly reproduced. Personally, I see no difference between reproduced and replicated… the issue (raised above) seems to be a question of whether the data or the principle(s) are the items to be studied. I disagree that it is or should be anything other than the data which should be in question; conclusions and principles come and go. Of course, we cannot expect exact point-by-point replication of the data, either… So, if we agree that it is the authors’ responsibility to publish (in supplementary material) sufficient information to allow one “skilled in the art” to reproduce an equivalent data set, then we should, I think, agree that all reported data require the inclusion of explicit error bars. And we find ourselves in the dank dark jungle of statistics…

        1. Some idiot says:

          Hmmm… Not sure I agree with you here, although I feel that it would probably be straightforward to find common ground.

          I think you take the “100% reproducible” a bit too literally (even if that is what he says). To me, reproduction is getting pretty much the same result, whereas replication is more like robustness. This is a point I see pretty often as a process chemist: easy enough to repeat the experiment in the lab next door, but if problems crop up at another site, it probably means that there are a whole lot of factors that you don’t know enough about. Personally, to me robustness is the key, and robustness needs knowledge and understanding. That is the fun (and rewarding) bit… 🙂

  5. Chrispy says:

    It seems to me that there is a certain amount of widely accepted fraud. Organic chemists are familiar with the “Corey 1-2 inversion” where a 29% yield becomes a 92% yield. Yields are a disaster in organic chemistry, and everyone knows it. “Representative images” in microscopy are the best damn image you got, and everyone knows that, too. And the animal survival data in a totally gamed xenograft system? Yup: garbage.

    1. I caught a fish THIIIIIS BIIIIIG says:

      It’s not always outright fraud: sometimes it is wishful thinking coupled with a shady result. One of my coworkers in graduate school was working on a methods project and could not get above 45% for one of the substrates. Then, one day, they miraculously got 85% on the reaction. The raw data supported the yield (though we all know that ‘analytical’ balances can have a lot of drift, especially on the sub-mmol scale), and the result was published as-is. The reaction always worked, though I have my doubts about the validity of the reported number…

      1. eugene says:

        And that’s why, if you’re a reviewer for a methods paper, getting picky about the yield is one of the most useless things you can do. And if the paper tries to sell you on the yield only, you can ignore that point. The type of reaction is more important, given these known yield-gaming issues. As long as it works even at 30%, it’ll be useful to somebody; but if it doesn’t go at 99% like you claimed, you’ll potentially be pissing off a lot of process chemists, who will be out for blood (or at least contacting the editor).

  6. Druid says:

    Some of my pet hates in publications:
    1. Teaser abstracts. “A method/results are presented…” NO – TELL ME THE ANSWER IN SUMMARY FORM! Don’t force me to buy the whole paper for a mean and an sd.
    2. Having bought the whole paper, I find one small but vitally important piece of methodology has been conveniently left out, so I can’t repeat it. Or, almost as bad, it is in a reference, and when I have bought that, I find it is in a further reference, where it is still incomplete. AAaaaargh – I am stuck in a hall of mirrors!
    3. The marketing department needs a peer-reviewed paper to quote – and while you are at it, just sneak in some negative opinion about the competitor’s product; they can quote that too!
    There was a very good article on scientific publishing in the Guardian 27 June, explaining how we got into this expensive mess.

  7. old timer says:

    You really hit a sore point: the uselessness of abstracts and the hall-of-mirrors effect. It is no wonder that Sci-Hub is so popular.

    1. Publication Business says:

      … and this is the core of the problem:
      A massive publication industry that profits the most. This could be changed with some easy steps: Funding agencies should only consider publications in journals that adhere to these standards of conduct:
      1. Name the reviewers
      2. Publish the full content of the review
      3. Allow people to comment on the publication, in a manner like we do on this blog

  8. “I’m very willing to put money down on the proposition that the first Fellows of the Royal Society grumbled to each other in private about that latest correspondence from so-and-so, which had those things in it that no one else seems to be able to get to work.”

    Oh, and I’m sure they published plenty of purely anecdotal results themselves. Statistics as a science was non-existent back then.

  9. Dragon says:

    you can’t expect the risk of failure (or at least not understanding something completely) to be the same between the SpaceXs of the world and the local rocketry club. Same is true for comparing “reproducibility rates” (whatever that means) between studies in your CNS stratum and the slog journals (and even the solid middle JOCs). Things blow up on the launch pad sometimes when you are aiming for the stars…doesn’t mean they are frauds or sloppy.

  10. polymath says:

    I’m surprised that commenters so far haven’t discussed the abuse of statistical methods to create support for hypotheses from complete noise. This remains quite common in biology and social science; Andrew Gelman’s blog has many examples but he doesn’t bother much with examples any more, they’re so widespread.

    Perhaps Chemistry/MedChem has less of this, and Physics certainly has much less. But it’s prominent in areas close to MedChem.

    The problem is not fraud. Fraud happens but rarely. The problem is investigators not understanding how statistical tests work, and the assumptions that must hold for a test result to be relied upon. “If p is less than [0.05, 0.005, 0.0005] then the hypothesis is supported, otherwise the datasets are indistinguishable,” seems to be the most common sentiment, and it’s wrong wrong wrong. Bonus: if a replication attempt doesn’t work, you just find a new “moderator” condition that restores your p-value threshold and call it a “successful conceptual replication.”

    Authors don’t understand this [in some fields]. Reviewers don’t understand this. Editors don’t understand this. Without anything to push against this effect, it runs rampant, and so do useless results.

    If your field/journal has not succumbed to this problem, count your lucky stars.
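    To make the multiple-testing point concrete, here is a minimal sketch (with made-up numbers) of how pure noise clears a p < 0.05 bar once you run enough comparisons; it uses a normal approximation to the two-sample t-test, which is adequate at these group sizes:

```python
import math
import random
import statistics

def two_sided_p(a, b):
    """Approximate two-sided p-value for a difference in means
    (normal approximation to the two-sample t-test)."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
n_tests, n, alpha = 100, 30, 0.05

# Every comparison is two samples from the SAME distribution: no real effect.
false_hits = sum(
    two_sided_p([random.gauss(0, 1) for _ in range(n)],
                [random.gauss(0, 1) for _ in range(n)]) < alpha
    for _ in range(n_tests)
)
print(f"{false_hits} of {n_tests} pure-noise comparisons reached p < {alpha}")
```

    Roughly alpha × n_tests of the comparisons "succeed" by chance alone; report only those, quietly drop the rest, and the literature fills with effects nobody can reproduce.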

    1. eugene says:

      We don’t have this problem in organic/organometallic chemistry. We just make stuff and give the spectra to prove that we made it. One of the biggest reasons I went into the field is that I drank too much, so by third year I couldn’t really focus on all that statistics stuff. And I always wanted to do a trade, like be a mechanic who can fix up an engine, so chemistry looked the most similar.

      Of course, you do have stuff like kinetics and trying to figure out mechanisms from time to time, but then you just focus for a bit, and can relax with a nice synthesis afterwards.

    2. eyesoars says:

      Is it fraud if the statistics are knowingly skewed?

      For instance, in a psychological study in which I was a subject, quite a large number of results were presented between different groups of subjects, along with p-values. However, there was some stuffing of the groups: one of the most egregious examples was a subject group of size 2 (!)… and they were brothers, which of course was not mentioned in any publication. (There goes your ‘iid’ [independent, identically distributed] presumption.)

      This sort of thing was rampant…
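      To put a number on that iid violation, here is a small, purely hypothetical simulation (the group sizes and correlation value are made up for illustration). It shows that treating correlated subjects, like the brothers above, as independent inflates the false-positive rate of a naive test well past its nominal level:

```python
import math
import random
import statistics

def two_sided_p(z):
    # Two-sided tail probability of a standard normal variate.
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(pair_rho, n_pairs=15, trials=2000, alpha=0.05):
    """Fraction of pure-noise trials in which a naive one-sample z-test
    on the group mean reports p < alpha.  Subjects come in pairs that
    share a 'family' effect, giving within-pair correlation pair_rho."""
    random.seed(1)
    a = math.sqrt(pair_rho)       # loading on the shared family effect
    b = math.sqrt(1 - pair_rho)   # independent individual noise
    hits = 0
    for _ in range(trials):
        group = []
        for _ in range(n_pairs):
            family = random.gauss(0, 1)
            group += [a * family + b * random.gauss(0, 1) for _ in range(2)]
        m = statistics.mean(group)
        se = statistics.stdev(group) / math.sqrt(len(group))  # assumes iid!
        if two_sided_p(m / se) < alpha:
            hits += 1
    return hits / trials

print(false_positive_rate(0.0))   # independent subjects: close to alpha
print(false_positive_rate(0.81))  # strongly related pairs: well above alpha
```

      The observations still look like unit-variance noise, so the naive standard error is unchanged, but the mean wobbles far more than that standard error admits, and "significant" differences appear from nothing.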

      1. Pennpenn says:

        It’s fraud if it’s done deliberately, with intent (typically to deceive). Otherwise, it’s an error. It’s like the difference between a lie and a mistake. Since your question says “knowingly skewed”, that indicates a deliberate act, so yes, it’s fraud.

  11. dave w says:

    Hmm… wonder if “attempt to reproduce various published procedures” would be a good sort of assignment for students (perhaps at the advanced undergrad or early-stage grad school level)… it would expose them to the issue early (quite likely a good thing!) as well as increasing the chance that published results might actually see reproduction attempts (instead of just sitting in the literature).

    1. John says:

      Now I know how to design our advanced organic teaching lab…good idea.

    2. Anoni says:

      Out of 30 undergraduates doing a Fischer esterification of benzoic acid, only 25 will succeed. I don’t think sticking undergraduates on the problem is the best idea.

  12. jb says:

    How much of biology though is and has to be done with plastic?

    It really is the elephant in the room no one wants to address because we literally HAVE to do science with plastic that can interfere with results. How many studies are done with the SAME exact supplier with the SAME exact lots of pipette tips, culture flasks, and centrifuge tubes? Biology is much more complex and harder than synthesis. The number of variables for a simple experiment is orders of magnitude larger.

    1. barry says:

      Biologists are raised in a culture of controls (positive and negative) that’s foreign to most synthetic chemists. How many synthetic chemists set up the reaction of a known substrate next to their attempted reaction of e.g. LAH with a novel substrate? How many also set up a third flask with everything but the LAH? And a fourth without the stir-bar?
      Sensitivity to the lot# of a reagent (as when Kishi et al. found the nickel contaminant in their chromium reagent was critical to oxidative addition to vinyl halides) only occasionally enters a synthetic chemist’s mind.

    2. Mark Thorson says:

      It’s not just plastic. You wouldn’t believe the havoc caused in the disk drive industry by silicone mold release agents. Disk drives are very sensitive (in a bad way) to silicones. Syringes used to dispense adhesives used in disk drives have caused early failures due to silicone mold release agents used to make the syringes. Even putative silicone-free syringes can be contaminated by silicones from adjacent production lines. This is like the parasite that lives on the back of the flea that lives on the back of the dog, but it is very very real.

  13. dragon says:

    Sounds like that approach will really push the field, @eugene. Sorry that biology is more complex, but that’s why it’s not disappearing like the other trades.

  14. Lanning says:

    Every human endeavour except science — peer collaboration and camaraderie

    Science — peer hatred, gossip, and harassment

    Ergo, science failed and will fail for as long as culture continues.

    1. Isidore says:

      “Every human endeavour except science — peer collaboration and camaraderie.”

      Indeed! Like the US Congress, for instance.

  15. Komm says:

    Your comment about people not making fraudulent $10 bills reminds me of something somewhat amusing: the United States’ longest-running forger. He was a rather nice, elderly fellow, but a bit squeezed by living in New York, so he would crank out $1 bills in his tiny kitchen. It took the Secret Service a decade to find him, because he never used his bills in the same place twice, and no one questions a single.

  16. Neo says:

    Spot on, Derek. Being the first is all that matters. Publish it fast, even if it contains errors; you and your powerful friends will find an excuse later (“at that time, no one knew that factor was important”, etc.). This is the truth of academic research. In 2012, Amgen researchers made headlines when they declared that they had been unable to reproduce the findings in 47 of 53 ‘landmark’ cancer papers. Does anyone really think that this happens by chance? I’m willing to bet that the careers of those authors were never affected by publishing fancy garbage.

  17. luysii says:

    The following has received a few hits since this post went up —

    I’d actually forgotten I’d written it.

    Reproducibility isn’t all academic, and the failure to reproduce a widely cited study on the treatment of acute stroke probably subjected patients to needless risk until another study was published 9 years later. Interestingly, the authors of the second paper did not recognize that they had failed to reproduce the results of the first study, but plugged the therapy even more.

    It’s why the best patient protection is a thinking doc.

  18. putmebackinfrontofthehood says:

    I remember watching a “researcher” repeatedly throw out all the product from a reaction that had oiled out; they thought the prep wasn’t “reproducible”.

    I saw another “senior scientist” try to TLC a reaction mixture with all the product crashed out in the bottom of the test tube he had pipetted the sample into. The thought of adding some methanol to get everything into solution before taking the TLC never dawned on him.

    All people the companies were paying near-six-figure salaries, but hey, they were well liked by everyone, especially HR.

    Is chemistry really dead? Or were a lot of the people doing the work brain-dead?

  19. Dave says:

    There were certainly problems of reproducibility of Newton’s experiments with prisms.

  20. Some chemist says:

    It’s my impression that organic synthesis is a field with generally good reproducibility compared to, for instance, the biological sciences (probably due to differences in the complexity of the experiments). That being said, I think a lot of the academic literature is hard to reproduce, especially the yields.
    But this is in part the fault of the publishers. ACS standards for experimentals and compound purity don’t exactly support reproducibility. Sentences like “…extracted three times with DCM…” and “…purified by flash column chromatography.” basically cannot be reproduced, since the information is too sparse. How many litres (or better yet, kg) of DCM were used for the extractions? How many kg of silica? How large were the fractions?
    These things are very easy to record in the lab book, so why they shouldn’t be noted in the experimentals is a mystery to me. Additionally, I think HPLC purity should complement NMR spectra.

    There are numerous other things that could be noted, but I think it would be appropriate for the publishers to go first in raising the bar.

    1. putmebackinfrontofthehood says:

      As long as you’re paying the idiot in the corner office way more than me AND 100 times more stock options AND let him lay me off as soon as I discover something, I’m not going to tell you EXACTLY how I made ANYTHING!!!!

      If you need me to remake something for you, then call me back from the unemployment line and I will give you a nice bottle of the pure white powder you need.

    2. Another chemist says:

      Did you report your rotovap conditions in your papers? Removal of solvent is also a purification step. Including that level of detail is a waste of time in most cases. People can easily figure out workup and column conditions from a single TLC. If a TLC or an HPLC injection is too much work, then I don’t know what to tell you.

      1. Some chemist says:

        I agree that an organic chemist should be able to fish out all of the compound from the mixture regardless of the amount of detail provided on the chromatography. But rotovap conditions are somewhat important as well, at least if the compound is heat-sensitive.
        A perhaps more relevant example is temperature control (or the lack thereof). When you read experimentals like “…n-BuLi was added at -78 degrees.”, it’s probably not true in most cases. Just dumping the reactor into a dry ice/acetone bath does not mean the reaction mixture is at -78 degrees; it often takes quite a while to reach that temperature under those conditions. Furthermore, it’s difficult to maintain this temperature when doing very exothermic reactions (like deprotonations with BuLi). I was surprised the first time I actually had a thermometer in the reaction mixture when using BuLi.

  21. putmebackinfrontofthehood says:

    When you have a need to know EXACTLY how to reproduce something, there is a guy you can give the job to, called the process development chemist…..

    1. Some idiot says:

      I hear you, brother!!! 🙂

  22. Mark Thorson says:

    Maybe the solution is for someone to offer a prize, say a million dollars or ten $100,000 prizes, for the best falsification of a published article — where “best” factors in both the weight of the journal and the significance of the refutation. Showing the yield of reaction X was only 80% and not 90% would rank fairly low, but showing a reported result was due to fraud or a contaminated reagent would rank high. Just knowing that a dubious experiment would put a target on your back might change people’s behavior.

    1. SHunter says:

      What would be the prize for raising concerns about a whole field of biomedical research due to poor understanding of antibody cross reactivities? … oh … funding difficulties anyone ….

  23. 10 Fingers says:

    I’m wondering if others have similar stories of “reagent quality” issues with certain reactions along these lines:
    I used to have to run a lot of Cristol–Firth modified Hunsdiecker reactions (R-carboxylate to R-bromide transformation). They were quite sensitive to the quality (a.k.a. age) of the bromine in a not particularly obvious way: the starting material would still be consumed, but the yields would drop sharply (depending, all the way to ~zero). Enough of us had the same experience (despite several “protective” attempts around our bromine) that we just routinely shunted our bottles off to less sensitive reactions after a point (for which the bromine was just fine). I suspect that the only allusion to this ever made in a written prep was the use of the word “fresh” in connection with the bromine.

    To anyone who experienced this phenomenon at its worst (especially, as I did, when running the reaction for the first time) it would seem like bogus chemistry. Sorted out, the same reaction on the same starting material gave a 70% yield.

    1. Mister B. says:

      Same story, with potassium tert-butoxide (t-BuOK).
      I followed an early-90s paper involving this reagent… With a brand-new bottle from a well-known supplier I got poor yields…
      Recrystallization of the reagent gave me excellent yields, within ±5% of those reported.

      If even the quality of commercially available reagents has gone down over the years, what can we do?
      Re-purify everything?

  24. Isidore says:

    Many years ago I was told by a well-known American peptide chemist about a Russian article in a Soviet-era journal that reported some novel protecting group. Apparently the Russians had been doing a lot of interesting work in peptide synthesis, but little of it was being published in Western journals, and only some articles were translated, albeit with many months’ delay. This individual and a number of his US colleagues could not reproduce the Russians’ work: specifically, they could not get the protecting group off under the conditions specified in the article, and contacting the authors in the Soviet Union (pre-email days) proved difficult. It turned out that the deprotection step specified anhydrous TFA, and the TFA available in the US was truly anhydrous, but the material used in the Russian labs had a few percent of water, without which the deprotection did not work.

  25. Young Padawan says:

    The natural selection of bad science

  26. Ed says:

    Having been part of a team that developed assays and transferred them halfway across the world for high-throughput use, I can say that it would be wise not to jump immediately to the conclusion that a lack of reproducibility means fraud. Seemingly trivial things like the source and purity of reagents (including water) and setting pH values properly can be quite important.
    From the various responses here I get the impression that there is a tendency to solve challenges in isolation. In other words: how many of you actually picked up the phone and contacted the lab, and either tried the reaction in their lab or had someone from that lab set up the reaction in yours?

  27. anonymous says:

    Take a look at the spectra in the SI of this one and tell me if the ‘exclusive selectivity’ is reproducible, or even some of the products at all. And you can probably forget about the yields. It’s like the reviewers are starting to take the day off whenever a journal’s impact factor moves above 10. Sheesh.
