Here’s a report on the preclinical development of Sutent (sunitinib). The authors did a retrospective analysis of all the published studies and found a number of problems. It should be noted that these problems are not limited to this one drug – in fact, the authors explicitly extend their conclusions to a great deal of preclinical oncology research:
This systematic review and meta-analysis revealed that many common practices, like randomization, were rarely implemented. None of the published studies used ‘blinding’, whereby information about which animals are receiving the drug and which animals are receiving the control is kept from the experimenter, until after the test; this technique can help prevent any expectations or personal preferences from biasing the results. Furthermore, most tumors were tested in only one model system, namely, mice that had been injected with specific human cancer cells. This makes it difficult to rule out that any anti-cancer activity was in fact unique to that single model.
. . .evidence (suggests) that the anti-cancer effects of sunitinib might have been overestimated by as much as 45% because those studies that found no or little anti-cancer effect were simply not published. Though it is known that the anti-cancer activity of the drug increases with the dose given in both human beings and animals, an evaluation of the effects of all the published studies combined did not detect such a dose-dependent response.
The poor design and reporting issues identified provide further grounds for concern about the value of many preclinical experiments in cancer. . .
To be sure, animal models in oncology are already known to be of limited use. You’d think that being able to shrink tumors in a mouse model would be pretty predictive of the ability to do the same in a human, especially if those tumors are human-derived lines, but that isn’t quite the case. To use the framework laid out in this paper, that’s a problem with construct validity, a general disconnect between the clinic and the model (more on this below). The two other main problems are internal validity (bias or improperly accounted-for variation within the experiments themselves) and external validity (whether the results from any one particular model generalize, or are instead being driven by factors peculiar to that model).
The authors find problems in all three areas. For external validity, testing across different species and different sorts of tumors would help, and there was a general trend towards smaller effect sizes as the models became more diverse. As for internal validity, too many studies didn’t address dose-response, and (as mentioned in the quote above) randomization and blinding were a lot less common than they should have been. It should be noted, though, that when these practices did show up, they didn’t seem to have all that much influence on effect sizes, so they may (at least here) be less important.
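(For what it’s worth, the dose-response check mentioned above usually amounts to a meta-regression: regress each study’s effect size on dose, weighting by inverse variance, and see whether the slope is distinguishable from zero. Here’s a minimal sketch of that idea in Python – the numbers are invented for illustration, and this is not the authors’ actual analysis or data.)

```python
# A toy dose-response meta-regression: inverse-variance-weighted regression of
# per-study effect sizes on log dose. All numbers below are hypothetical.
import numpy as np
from scipy import stats

dose = np.array([10.0, 20.0, 40.0, 40.0, 80.0, 80.0])      # mg/kg (made up)
effect = np.array([0.35, 0.50, 0.45, 0.70, 0.60, 0.90])    # per-study effect sizes
se = np.array([0.20, 0.18, 0.25, 0.15, 0.22, 0.17])        # their standard errors

# Weighted least squares: effect ~ intercept + slope * log(dose), weights = 1/SE^2
x = np.column_stack([np.ones_like(dose), np.log(dose)])
w = 1.0 / se**2
xtwx = x.T @ (w[:, None] * x)
beta = np.linalg.solve(xtwx, x.T @ (w * effect))
cov = np.linalg.inv(xtwx)                  # approximate covariance of the estimates

slope, slope_se = beta[1], np.sqrt(cov[1, 1])
p = 2 * stats.norm.sf(abs(slope / slope_se))
# A slope indistinguishable from zero is the "no dose-dependent response" finding
print(f"dose-response slope = {slope:.3f} (SE {slope_se:.3f}), p = {p:.3f}")
```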
Publication bias varied quite a bit. For renal cell carcinoma, things actually looked pretty good, with thorough coverage relative to the effect size. (And that, clinically, is one of the drug’s major indications.) Looking at the published studies, though, you’d think that it was going to be of much more use for (say) small-cell lung cancer, where clinically it turned out to have little impact. Overall, as the authors say, “That all malignancy types tested showed statistically significant anti-cancer activity strains credulity”. And while that’s true, I wouldn’t rule out the general crappiness of the preclinical models pitching in on this as well: construct validity is a much larger problem in this field than outside observers tend to believe it is.
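(The usual way to check for this sort of publication bias is a funnel-plot asymmetry test such as Egger’s regression: regress each study’s standardized effect (effect divided by its standard error) on its precision (one over the standard error), and an intercept well away from zero suggests that small, imprecise studies with weak effects are missing from the literature. A quick sketch, again with made-up numbers rather than anything taken from the paper:)

```python
# A toy Egger-style asymmetry check: standardized effect regressed on precision.
# A nonzero intercept hints at small-study effects / publication bias.
import numpy as np
from scipy import stats

effect = np.array([0.80, 0.65, 0.90, 0.55, 0.75, 0.95, 0.60])  # hypothetical effects
se = np.array([0.30, 0.25, 0.35, 0.20, 0.28, 0.40, 0.22])      # hypothetical SEs

res = stats.linregress(1.0 / se, effect / se)   # x = precision, y = standardized effect
t = res.intercept / res.intercept_stderr
p = 2 * stats.t.sf(abs(t), df=len(effect) - 2)
print(f"Egger intercept = {res.intercept:.2f}, p = {p:.3f}")
```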
The authors aren’t having this, though:
One explanation for our findings is that human xenograft models, which dominated our meta-analytic sample, have little predictive value, at least in the context of receptor tyrosine kinase inhibitors. This is a possibility that contradicts other reports (Kerbel, 2003; Voskoglou-Nomikos et al., 2003). We disfavor this explanation in light of the suggestion of publication bias; also, xenografts should show a dose–response regardless of whether they are useful clinical models.
Should they? You’d hope so, but why does that have to be the case? And when you look at those two references they cite, you find this in the Kerbel paper:
It is not uncommon for new anti-cancer drugs or therapies to show highly effective, and sometimes even spectacular anti-cancer treatment results using transplantable tumors in mice. . .Unfortunately, such preclinical results are often followed by failure of the drug/therapy in clinical trials, or, if the drug is successful, it usually has only modest efficacy results, by comparison.
The Voskoglou-Nomikos one found that xenograft models were basically worthless for breast and colon cancer, but were predictive in non-small-cell lung and ovarian cancer as long as sufficiently broad panels of different cell lines were used. If you picked fewer cell lines and jumped the wrong way (and there’s no good way to know a priori which way that is), you were out of luck there, too. So I don’t see how these two reports so blatantly contradict the idea that xenografts – particularly xenografts during the period of sunitinib’s development – have (had) little predictive value. You could just as easily cite those two against this new paper’s claim here. And although they don’t cite it in this context, this paper’s own reference 28 (this paper, from the same era) has this to say about xenografts:
For 39 agents with both xenograft data and Phase II clinical trials results available, in vivo activity in a particular histology in a tumour model did not closely correlate with activity in the same human cancer histology, casting doubt on the correspondence of the pre-clinical models to clinical results.
So actually, I think the weight of the evidence is that the preclinical data for sunitinib look bad because the state of preclinical research in cancer, especially at the time, was bad. Construct validity was the single biggest problem that this meta-analysis seems to have identified, and I really think you can chalk that up to “damn xenografts, again”. Internal validity should have been better, but (as noted above) didn’t seem to influence the results all that much, and external validity is wrapped up in the xenograft problem as well. Finally, publication bias certainly seems to have been present, but in the indication where sunitinib was eventually approved (renal cell carcinoma), the published record looked far more representative than it did for the indications that were abandoned.
If this paper wants to come to the conclusion that preclinical oncology needs improvement, I absolutely agree. But I can’t endorse (not yet, anyway) the idea that sunitinib’s particular case was hurt by anything more than those same limits.
Update: more on this from Retraction Watch.