There was a mention yesterday in the comments section of a clinical trial that was stopped early due to efficacy. I’ve never been involved with a project where this has happened, myself – pretty much the opposite, for the most part! – but it does happen, and it’s generally cause for celebration.
Although not always. I wanted to link to this blog post by Hilda Bastian that has some examples of times when halting early wasn’t a good idea. Why wouldn’t it be? Well, you can get fooled. Even with a reasonably designed trial, you can get fooled, because statistics are what they are. The post has an excellent quote from an article reviewing an AML trial that was almost stopped early, but looked so odd (oddly good) that it was continued a few months longer. Whereupon the benefit evaporated:
Quite extreme chance effects can and do happen more often than many clinicians appreciate. At any one time, there are hundreds, if not thousands, of trials ongoing, often with analyses at several time points and with a number of subgroup analyses. Thus it is inevitable, with all these multiple comparisons being undertaken, that highly significant results (p<0.002) will sometimes occur by chance and that conventionally significant (p<0.05), but spurious, differences will occur frequently. Taken in isolation, these may well appear so striking to investigators that it will be difficult to believe that these are chance findings. No trial is immune from such random effects, no matter how well designed and conducted.
Absolutely right. Weird readings of this sort are not something that can be designed out; they’re an inevitable feature of running a lot of trials. The equivalent on a smaller scale is rooting through the subgroups of a single large trial and finding that by gosh, this group right over here responded just great: let’s run another trial! We found the people the drug works on! Well, maybe. But if you analyze enough subgroups, the chances of finding a spurious correlation are quite good (particularly since these subgroups all have smaller sample sizes). If you’re not careful, you’ll find yourself realizing that the drug seems to perform best on Libras who own Toyotas, which is not too useful.
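To put a rough number on that subgroup problem, here’s a minimal simulation sketch. The numbers are all hypothetical (20 subgroups, 50 patients per arm, a drug with no real effect, and the subgroups treated as independent, which real ones aren’t), but it shows how often at least one subgroup clears p < 0.05 purely by chance:

```python
# Hypothetical sketch: a "trial" where the drug does nothing, sliced into
# 20 subgroups. How often does at least one subgroup look "significant"
# at p < 0.05 anyway?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def spurious_hit_rate(n_subgroups=20, n_per_arm=50, n_sim=2000):
    hits = 0
    for _ in range(n_sim):
        found = False
        for _ in range(n_subgroups):
            # Both arms drawn from the same distribution: no true effect.
            drug = rng.normal(0.0, 1.0, n_per_arm)
            placebo = rng.normal(0.0, 1.0, n_per_arm)
            _, p = stats.ttest_ind(drug, placebo)
            if p < 0.05:
                found = True
                break
        hits += found
    return hits / n_sim

print(f"Chance of at least one 'significant' subgroup: {spurious_hit_rate():.0%}")
# With 20 independent looks at the 0.05 level, you expect roughly
# 1 - 0.95**20, or about 64%, of such trials to hand you a "winning" subgroup.
```

Change the assumptions however you like; the direction of the answer doesn’t change much. Slice the data enough ways and something will light up.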
Here’s a good example to think about:
To start: a tale of 2 trials of a drug for secondary progressive multiple sclerosis (SPMS) (interferon beta-1b). One started in Europe in 1994; the other got underway in the US and Canada in 1995. The European trial stopped 2 years early after interim results “gave clear evidence of efficacy. Treatment with interferon beta-1b delays sustained neurological deterioration” – the first treatment found to do that for SPMS.
So what then for the North American trial, still in its early stages? The knowledge base providing the ethical justification for their trial had shifted – and they had hundreds of people on placebos. The trial had a data monitoring committee (DMC). The DMC has the role of protecting participants against harm and making judgments about the data during a trial. (A DMC is also called a data and safety monitoring board (DSMB) or data monitoring and ethics committee (DMEC).)
The DMC looked at their data, and decided to keep going. They stopped early, too, in November 1999 – not because of benefit. Unfortunately, there was no benefit on delaying disability. They stopped early for futility – the belief that the outcome wasn’t going to change if they continued. (If you want to brush up on the basics of stopping trials early, I’ve written a primer over at Statistically Funny.)
Where did that leave people with SPMS? Despite 2 trials, the picture was murky. It took another big trial that didn’t stop early to be sure. According to a systematic review in 2011, the evidence that interferon beta-1b doesn’t work for SPMS is “conclusive” (PDF). (The drug is not approved for the indication of SPMS by the FDA.)
In this case, it’s thought that the first European trial that looked so promising might have admitted too many patients who had not actually progressed to secondary progressive MS. And that is, of course, another way that you can be led down the wrong path. Patient selection and enrollment is a major, major issue. You can produce answers all over the map if you get it wrong. But sometimes you find that getting it right means that you’re either not going to be able to enroll enough patients (or you’re enrolling them too slowly) to meet any kind of reasonable development timeline. An opposite mistake can be made, too: you can greenhouse your patient selection so carefully as to give you a fine trial readout that has little to do with what will happen in the broader patient population your drug will see after approval.
Of course, you can stop early for futility, too, and the same considerations apply – although I will say that since the overall clinical failure rate is around 90%, the odds are better that rosy results will deteriorate on closer inspection than that a bad result will improve (a quick back-of-the-envelope calculation below shows why). But it’s not impossible. Run enough trials, and handle enough data, and nothing is impossible. Bastian’s post has other examples, and I highly recommend it. Clinical trial design and interpretation is the most crucial part of this whole business, and it deserves plenty of thought and plenty of respect.
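Here’s that back-of-the-envelope calculation, with numbers that are mine and not from any particular trial: assume roughly a 10% prior chance the drug really works, an interim look that flags 80% of the true winners, and a 10% chance of a spurious “clear benefit” when the drug does nothing.

```python
# Back-of-the-envelope Bayes calculation with assumed numbers (not from any
# specific trial): ~10% prior probability the drug truly works, an interim
# look that flags 80% of real winners, and a 10% false-alarm rate for duds.
p_works = 0.10          # prior: overall clinical success rate (~10%)
p_flag_if_works = 0.80  # assumed chance of a "clear benefit" signal when the drug works
p_flag_if_dud = 0.10    # assumed chance of a spurious "clear benefit" when it doesn't

p_flag = p_flag_if_works * p_works + p_flag_if_dud * (1 - p_works)
p_works_given_flag = p_flag_if_works * p_works / p_flag

print(f"P(drug really works | promising interim look) ≈ {p_works_given_flag:.0%}")
# ≈ 47% with these numbers: even a rosy interim readout is close to a coin flip.
```

Shift the assumed numbers around and the posterior moves, but with a base failure rate that high, it takes very strong interim evidence before a promising early look deserves to be taken at face value.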