Attack of the Research Parasites

The New England Journal of Medicine is taking a lot of shots for this editorial on the sharing of data. After starting out by talking about how the idea of sharing clinical data is an appealing one, you have this:

However, many of us who have actually conducted clinical research, managed clinical studies and data collection and analysis, and curated data sets have concerns about the details. The first concern is that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters. Special problems arise if data are to be combined from independent studies and considered comparable. How heterogeneous were the study populations? Were the eligibility criteria the same? Can it be assumed that the differences in study populations, data collection and analysis, and treatments, both protocol-specified and unspecified, can be ignored?

A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

It’s that second paragraph that really sets people off, and it is unfortunately worded. Science actually advances on this sort of thing – calling people who use or build on previous data sets “research parasites” is actually fairly silly. And how dare anyone try to disprove what some other group has claimed! No, this is just weird. And that’s too bad, because that weirdness has totally overshadowed the paragraph before it, which is actually pretty sensible.

I don’t worry about parasitic scientists rooting through the piles of clinical data – I worry about people who are trying to prove that eating GMO food synergizes with your astrological sign in order to misalign your chakras. I worry about folks selling snake-oil supplements who will spend five minutes sifting through some database in order to say “Clinical studies prove. . .!” And I worry about people who will decide that every single drug that’s ever been through the clinic is obviously a toxic plot concocted by poisoners because, I mean, just look at these numbers. The NEJM is worried about misappropriation of data by rival research groups; I’m worried about the far, far greater number of headline-grabbing idiots. You’re going to wish for some good ol’ research parasites after these people get through flinging dirt everywhere.

Clinical data really are hard to parse and to interpret. Just look at the delays between the end of a clinical trial and the reporting of the data. From the outside, you’d think that a late night after the last patient reports would be enough to get everything lined up, but you’d be wrong. It takes weeks, maybe months, to be sure that you’ve handled the numbers appropriately and that you haven’t missed anything important (good or bad). I just mentioned the other day how hard it is to design decent experiments if you haven’t had much practice at it – designing a good clinical trial can be one of the hardest tests there is, and working up the data properly is an even less widely distributed skill.

So I think it’s perfectly reasonable to be concerned about how people will handle that job, but worrying about “research parasites” is not something that’s going to cause me much anxiety. We stand on each other’s shoulders in this business; that’s how science works. It’s not the scientists that worry me here.

Addendum: here are some people who aren’t afraid of research parasites, either!

40 comments on “Attack of the Research Parasites”

  1. cirby

    Your worries about the snake-oil salesmen are a bit off – those guys already just make up “studies” from whole cloth, or run flawed experiments from the get-go. Why bother pirating data from real research when you can custom-tailor it and get a grant at the same time? If you’re using publicly available data, someone can fact-check you without a lot of effort.

    On the other hand, there are a fair number of “scientists” from respectable institutions who output poor-quality research, or fudge data to get grant money. They’d REALLY like to make it so everyone else can’t look over their shoulder.

    Hiding the raw data just makes it easier for both of these kinds of people.

    1. Charlie Kilian in reply to cirby

      I can’t speak to the second problem, but as to the first, I think you have it backwards. Why would you run a fake trial when data is already available that could “prove” your point to the rubes? The higher the prestige of the journal the resulting paper was published in, the better for your snake oil claim!

      There may be good reasons yet for publicly sharing data, but one of the cons would almost certainly be muddier waters for the lay public.

      1. cirby in reply to Charlie Kilian

        You run a fake trial to get funding.

        You won’t get funding from other people’s data.

        …and you can always, ALWAYS make up fake reports, whether you have “real” data to base it on.

        You think the people selling homeopathic medicine and copper bracelets bother with real trials, or even the appearance of real trials? Hell, that recent “GMOs cause tumors” kerfuffle turned out to have been faked from the get-go – with Italian government funding.

      2. GaryM in reply to Charlie Kilian

        “Your worries about the snake-oil salesmen are a bit off – those guys already just make up “studies” from whole cloth, or run flawed experiments from the get-go. Why bother pirating data from real research when you can custom-tailor it and get a grant at the same time? ”

        This scenario occurs often in spirit science and is being “sold” to the public in books and conferences. Some recommend forgoing chemotherapy treatments for vegan diets and meditation as treatment plan alternatives based on placebo data in studies.

        There needs to be a better means of approving data usage, one that differentiates reproducing experimental results, validating methods, and meta-analysis from cases where data is applied as a vehicle to possibly defraud and injure the public.

  2. Immunoldoc

    I consider these sorts of comments particularly ironic given that much of the data is paid for by the public. I agree about quality concerns – that’s what reviews are for (in theory) – but frankly if you do work on the public dime my expectation is that whatever data is generated should become public domain within a reasonable period of time after completion of the study (one year, two years?). That gives the investigators plenty of time to publish what they consider to be the important findings and the institutions to file IP – and then open it up for others to analyze. If you don’t like it then find private money…

    1. anon in reply to Immunoldoc

      Immunoldoc, you said, “much of the data is paid for by the public”. If you are referring to clinical trial data that Derek’s post is about, you are way off the mark. The majority, by far, of clinical trials are paid for by pharmaceutical companies, not by public funds.

      1. Natural in reply to anon

        And who pays the pharmaceutical companies?

        1. Joshua Cranmer in reply to Natural

          Insurance companies?

        2. anon in reply to Natural

          “And who pays the pharmaceutical companies?”

          It is private individuals who find their products beneficial enough to buy them (directly, or indirectly through health insurance as noted above).

      2. immunoldoc in reply to anon

        Hey Anon,

        Was not referring to the trial referenced in Derek’s post. Certainly most trials are conducted by industry. I was merely commenting on the disturbing concept (to me) of research parasites as it relates to publicly funded work. Sorry if I wasn’t clear enough. Public funding=public data (can accept a reasonable exclusivity period).

  3. More data

    A question regarding accessibility of clinical trial data:

    Should one be able to find structure of the drug in a trial, and obtain preclinical data associated with it? Or would this compromise the company’s position on IP grounds? You would think they already have patents by the time the compound or formulation is in the clinic.

    Of course, for the majority of compounds a cursory Google image search will reveal the structure. But I’m having trouble with biologics. To be specific, I’ve been trying to find what’s in NN1953, an oral diabetes mix from NovoNordisk.

    1. me in reply to More data

      That data would be included in the Investigator Brochure, which would be provided to the clinical staff – but not to the public.

      Some of it may be in the patent. The stuff that is mandatory would all be held on file at the Health Authority, though.

      1. More data in reply to me

        Thank you. What is the reason to hide it from the public? Obviously that would include competitors, but what would they do with this data? Rush ahead with a trial of their own? Not like you can accelerate clinical end points. I’d argue that if anything that would help avoid the unnecessary “replicate” trials.

        1. me in reply to More data

          I’m not sure ‘hide’ is the correct term. It implies they are withholding data deliberately. What would they gain by making everything publicly available? And what exactly would they make available? The lab notebooks? The raw data from the spectrometer? Or would you expect publication-level output?

          And as for unnecessary trials, I doubt there would be anything in a preclinical package that would nullify the need for a trial – the whole purpose of a preclinical package is to determine the parameters of a potential clinical trial.

          As for NN1953: what exactly do you want to know? It’s insulin formulated to be orally available – there are ways and methods of making proteins orally available, none of them easy. Best to look at the drug delivery patents filed by Merrion Pharmaceuticals.

  4. Regarding the first paragraph, if we’re worried about people missing the minutiae of the dataset, shouldn’t we be demanding that trials be reported in more detail? If you already know that this information could be important, why not report it?

    The second paragraph’s tone reminds me of Diels’ and Alder’s 1928 Annalen paper:

    “We explicitly reserve for ourselves the application of the reaction discovered by us to the solution of such [synthetic] problems.”

    Can you imagine how stunted organic synthesis would be if anyone had respected this jejune posturing? I am simply baffled that scientifically-inclined people (some of influence, no less) seem to cling to this idea of results “belonging” to those who discovered them. Credit? Yes. Ownership? No.

  5. Another chemist

    Derek, I totally agree with you.

  6. Dieter

    James Watson? Using others’ work worked well for him. Even managed to be a misogynist and crucify the originator of much of the data used to form his conclusions.

    1. dearieme in reply to Dieter

      Did he really crucify Miss Frankland’s research student?

  7. Why are the clinical researchers not declaring all their assumptions and suppositions within the dataset? Shouldn’t this be mandatory for the FDA? If they did that, wouldn’t this problem be solved? All assumptions are either objectively declared or the basis behind a non-objective assumption is stated.

    As for the data itself, it may have been paid for by pharma or by taxes, but why can’t it all be made public, after a particular period of time? Especially if the FDA made decisions that could affect the public based on it. Heck, even the CIA opens up their files after a period of time.

    The data, actually belongs to some poor patient who may very well have died. She or he just provided you the permission to use it for your research, either because of their health or financial condition. Given a fair choice, I’m not sure they would have done that without sharing publicly, in the first case.

  8. Aaron

    From what I understand Walther Nernst originally formulated his heat theorem (aka the third law of thermodynamics) through analysis of previously published data. Was he a “research parasite”? Was he stealing from the research productivity planned by those original researchers? Certainly Theodore Richards may have felt that way, but history certainly didn’t.

  9. thy

    This article is upsetting. Especially the part where the doctors demand that anyone using their dataset should beg them for permission and then include them as co-authors (after assuring the doctors that the new paper won’t contradict anything they previously stated). They don’t seem to understand that when you publish something, you have to publish all your raw data. That is the essence of reproducible research. If you don’t want others to use your work to create their own, then don’t publish it to start with. It doesn’t matter what the source of funding is. The source of funding isn’t relevant. These guys are really belligerent, I don’t know how full of yourself you need to be to actually write and publish a perspective like this. I don’t even work in the biomedical field, but it’s alarming that these guys are in a position of any authority.

  10. watcher

    Letting any Joe Blow with a big name and reputation, or those who are wanna-be’s reanalyze clinical data, particularly that taken across studies in a meta-analysis, can be disastrous, lead to incorrect conclusions, and even force companies to take drugs that have been approved off the market. Oh yeah, it has already happened, for example with Avandia. This huge error through meta-analysis by a big name academic and initial judgement by the FDA coming from a huge amount of press, public attention, and political pressure forced the drug to be removed from the market, giving patients one less treatment option, losing the company sponsor billions in sales, forcing hundreds of staff to be “made redundant”. Later when new data was presented along with reanalysis by FDA and other experts, the decision made because of poor meta-analysis and pressure external to FDA was rescinded, but by then it was too late for the drug to be resurrected.

    1. Question Everything in reply to watcher

      Maybe we can use this as a lesson in how we can improve the expected rigor of meta-analyses, not restrict the use of clinical data. It’s unfortunate that a probably efficacious drug was torpedoed by this incident but, ultimately, the failure was in the uncritical acceptance of the sloppy reanalysis, not the reanalysis itself.

      1. watcher in reply to Question Everything

        The reanalysis was not supported by many, including statisticians within FDA. However, FDA mainly succumbed to external pressure due to the big name and reputation of the person involved in the meta-analysis (he did not do the analysis himself, but became the spokesperson, and it needs to be pointed out that he had been a consultant and was paid on behalf of a similar drug). Yet public opinion and concern, along with pressure from Congress, became overwhelming, greatly preventing FDA from doing its job as carefully and dutifully as expected. Even so, there is no lesson from trying to make decisions by meta-analysis where data is taken across many different studies using various study designs and read-outs, except that such analysis is not scientifically and statistically valid.

    2. Popper Korn in reply to watcher

      ” letting any Joe Blow with a big name and reputation, or those who are wanna-be’s reanalyze clinical data, particularly that taken across studies in a meta-analysis, can be disastrous, lead to incorrect conclusions,”

      This belief is exactly the problem.
      The idea of “Protecting” data from those people that are “too stupid” or “too evil” is the antithesis of Science. I also suspect it is also not what the consented patients would want.

  11. Two MD’s in an opinion piece – where’s the true researcher with a PhD?

    Regarding research parasites, this happens all the time in fields like Health Policy Management etc. Somebody collects a dataset, publishes it, and somebody else re-analyses the data and comes to a completely different opinion.

    The second link to the Montreal Institute’s announcement:
    “Starting this year, any work done there will conform to the principles of the “open-science” movement – all results and data will be made freely available at the time of publication, for example, and the institute will not pursue patents on any of its discoveries.”

    I think they are basically saying we will publish in open access journals or pay the fee to make the article accessible for free. That’s been around for some time, and various institutions have forced you to comply with it for years, e.g. the Max Planck Society.

    The only true difference is that they don’t pursue patents, and I’m afraid he hit the nail on the head with this statement. Universities love to sell innovation, creativity, entrepreneurship [add your ten favourite buzzwords] and that is easily recorded in numbers such as how many disclosures/patents/licenses per year or per department etc. But how many of these disclosures truly lead to a patent and how often is money generated with these?

    A nice statement about why some of us are in science to begin with:
    “It comes down to what is the reason for our existence? It’s to accelerate science, not to make money.”
    Unfortunately, with the current funding climate I’m afraid that will become more and more a dream of the next generations of scientists.

  12. Kaleberg

    Kepler was one of the great data parasites. He developed his laws of planetary motion based on Brahe’s Mars orbit data set. Amusingly when I searched to verify the spelling of Brahe’s name I came across a number of websites that assured me that Kepler not only stole Brahe’s data but murdered the man as well.
    Still, I can see a reason for giving the researcher who produces the data dibs on the low hanging research fruit. Maybe we need a decent interval rule sort of like the rule on spoilers too early in a film’s or television series’ run.

  13. Sok Puppette

    Back to the total charlatans in this late comment: The great thing about those guys is that they’re adaptable. If you give them data, they’ll abuse them. If you don’t give them data, they’ll say you’re covering something up. They always have a strategy available to win with their audience. Your best bet is to ignore them.

    1. In addition, the loons will come up with stuff on their own. For example, after the Fukushima accident, a large number of anti-nuclear nutjobs got Geiger counters and were using the resulting data to claim that “Fukushima has poisoned the entire Earth!”

  14. Anon

    If we couldn’t use each others’ data then Crick and Watson would never have gotten the structure of DNA. Mind you, it would have been polite just to ask Rosalind Franklin for her data directly, rather than misappropriating it via her supervisor.

  15. J Severs

    @Ayatollahoftheoutcomes: I am not sure how clinical researchers could “declar[e] all their assumptions and suppositions within the dataset”. Datasets capture what is in case report forms and also lab values. Assumptions would be more likely stated in the protocol and especially in the statistical analysis plan.

  16. steve

    So, by these criteria, Picasso was an art parasite because much of his art was influenced by Edvard Munch and Henri de Toulouse-Lautrec and later by African sculpture, Mozart was a music parasite because he was heavily influenced by Haydn, Einstein was a physics parasite because he stood on the shoulders of Newton and others, etc, etc. The Cochrane Collaboration is a network of parasites because they do meta-analyses of other people’s data. What a crock; NEJM should be ashamed of themselves.

  17. Cellbio

    Couple of perspectives to add:

    A trial by whom, and timing matters imo. If an academic doing CE or phase 4 studies on grant money, all data released at end of trial. If a publicly traded company, then it gets sticky. As much as I do not trust companies to do the right thing or to be timely, I trust less the market manipulators who could run pump and dump schemes or short selling rackets. On the other hand, companies are obligated to share material information so maybe the system works, albeit slowly and much less so for failed trials. Start-ups are another story.

    I work in investor funded start-up space, where value is tied to IP, both patents and know-how or insights about novel therapies, such as a target, NCE or population segment that are key to clinical success. Timing of release of information is key to retaining a competitive edge which is the engine for value creation. Without it, or the ability to protect what you have long enough to get to the finish line, no funding.

    However, once the story is to be told in a journal like NEJM, I think fuller data release is warranted. Group averages, or other summarized expressions of data are in most cases all that is required, but why not make available all the raw data (money for one thing). This assures that the goal of publishing is met, namely, to clearly articulate the value of a therapeutic agent and support those claims with rigorous data.

    As for the FDA, ALL data go into the final clinical study report. The data are analyzed by a pre-specified statistical plan for a specified endpoint. Further, all other data for the molecule in the IND generated are shared with the FDA in an annual IND report. These combined tell the whole story of a start-up, or for a big company, the whole program status. Sharing this publicly would give competitors a huge advantage by waiting to jump in after someone else solves first order challenges which would destroy risk taking and funding of new ventures. Imagine the ‘patent rip off’ strategy of drug development but on steroids, aided by outcomes, PK data etc.

    But again, once a sponsor is saying publicly, here is the proof, lay it out there.

  18. Pennpenn

    The most disturbing phrase in the quoted part of the article was “use the data to try to disprove what the original investigators had posited”. Guys, it’s science. That’s the point, people are supposed to try to disprove what you posit. If they can’t, good for you. If they can, that is important, and you should accept that your position has been disproved.

  19. I woke up this morning at 4 AM with the disturbing image in my mind of parasites attacking parasites in an endless loop cycle with no apparent victor and an infinite wash of losers. I came upon these thoughts from watching the 2016 election cycle and the various individuals involved who stand to gain from one victory or another. A quick Google search landed me here on this page. Apparently all gains are brief; just fleeting enough to let one shine in the limelight before getting brutally cut down by the arrows of wild misfortune. Science, it appears, is no different than any other sport.
