Yesterday’s post on yet another possible Alzheimer’s blood test illustrates, yet again, that understanding statistics is not a strength of most headline writers (or most headline readers). I’m no statistician myself, but I have a healthy mistrust of numbers, since I deal with the little rotters all day long in one form or another. Working in science will do that to you: every result, ideally, is greeted with the hearty welcoming phrase of “Hmm. I wonder if that’s real?”
A reliable source of material for the medical headline folks is the steady flow of observational studies. Eating broccoli is associated with this. Chocolate is associated with that. Standing on your head is associated with something else. When you see these sorts of stories in the news, you can bet, quite safely, that you’re not looking at the result of a controlled trial – one cohort eating broccoli while hanging upside down from their ankles, another group eating it while being whipped around on a carousel, while the control group gets broccoli-shaped rice puffs or eats the real stuff while being duct-taped to the wall. No, it’s hard to get funding for that sort of thing, and it’s not so easy to round up subjects who will stay the course, either. Those news stories are generated by people who’ve combed through large piles of data, from other studies, looking for correlations.
And those correlations are, as far as anyone can tell, usually spurious. Have a look at the 2011 paper by Young and Karr to that effect (here’s a PDF). If you go back and look at the instances where observational effects in nutritional studies have been tested by randomized, controlled trials, the track record is not good. In fact, it’s so horrendous that the authors state baldly that “There is now enough evidence to say what many have long thought: that any claim coming from an observational study is most likely to be wrong.”
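To get a feel for why combing through a big pile of data turns up so many "associations", here is a minimal simulation sketch. It is not from Young and Karr's paper; the function names, the sample sizes, and the use of |t| > 1.96 as a stand-in for "p < 0.05" are all my own assumptions. Every "dietary factor" in it is pure random noise, unrelated to the outcome by construction.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_hits(n_subjects=500, n_factors=200, seed=0):
    """Count noise-only factors that look 'significantly' correlated with the outcome.

    Assumption: |t| > 1.96 is used as an approximate two-sided 5% cutoff,
    which is reasonable for a few hundred subjects.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    hits = 0
    for _ in range(n_factors):
        # Each factor is independent noise: there is no real effect to find.
        factor = [rng.gauss(0, 1) for _ in range(n_subjects)]
        r = pearson_r(factor, outcome)
        t = r * math.sqrt((n_subjects - 2) / (1 - r * r))
        if abs(t) > 1.96:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 200 pure-noise factors passed the 5% cutoff")
```

With a 5% cutoff you would expect roughly one factor in twenty to clear the bar by chance alone, so a search across a couple of hundred candidates should hand you a pile of publishable-looking "findings" with nothing behind them.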
They draw the analogy between scientific publications and manufacturing lines, in terms of quality control. If you just inspect the final product rolling off the line for defects, you’re doing it the expensive way. You’re far better off breaking the whole flow into processes and considering each of those in turn, isolating problems early and fixing them, so you don’t make so many defective products in the first place. In the same way, Young and Karr have this to say about the observational study papers:
Consider the production of an observational study: Workers – that is, researchers – do data collection, data cleaning, statistical analysis, interpretation, writing a report/paper. It is a craft with essentially no managerial control at each step of the process. In contrast, management dictates control at multiple steps in the manufacture of computer chips, to name only one process control example. But journal editors and referees inspect only the final product of the observational study production process and they release a lot of bad product. The consumer is left to sort it all out. No amount of educating the consumer will fix the process. No amount of teaching – or of blaming – the worker will materially change the group behaviour.
They propose a process control for any proposed observational study that looks like this:
Step 0: Data are made publicly available. Anyone can go in and check them if they like.
Step 1: The people doing the data collection should be totally separate from the ones doing the analysis.
Step 2: All the data should be split, right at the start, into a modeling group and a group used for testing the hypothesis that the modeling suggests.
Step 3: A plan is drawn up for the statistical treatment of the data, but using only the modeling data set, and without the response that’s being predicted.
Step 4: This plan is written down, agreed on, and not modified as the data start to come in. That way lies madness.
Step 5: The analysis is done according to the protocol, and a paper is written up if there’s one to be written. Note that we still haven’t seen the other data set.
Step 6: The journal reviews the paper as is, based on the modeling data set, and they agree to do this without knowing what will happen when the second data set gets looked at.
Step 7: The second data set gets analyzed according to the same protocol, and the results of this are attached to the paper in its published form.
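The heart of that list (Steps 2 through 7) can be sketched in a few lines of code. This is only an illustration under my own assumptions, not anything Young and Karr specify: the record layout, the helper names (`split_once`, `hide_response`, `frozen_protocol`), the 50/50 split, and the toy "difference in means" analysis are all hypothetical, and the data are synthetic noise.

```python
import random
from statistics import mean


def split_once(records, seed=2011):
    """Step 2: shuffle with a fixed seed and split in half, once, up front."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def hide_response(rows):
    """Step 3: the planning stage sees the modeling rows minus the outcome."""
    return [{k: v for k, v in row.items() if k != "outcome"} for row in rows]


def frozen_protocol(rows):
    """Steps 4-5: stand-in for the written-down analysis (hypothetical):
    difference in mean outcome between exposed and unexposed subjects."""
    exposed = [r["outcome"] for r in rows if r["exposed"]]
    unexposed = [r["outcome"] for r in rows if not r["exposed"]]
    return mean(exposed) - mean(unexposed)


# Synthetic records for illustration only: exposure and outcome are unrelated.
data_rng = random.Random(0)
records = [
    {"id": i, "exposed": data_rng.random() < 0.5, "outcome": data_rng.gauss(0, 1)}
    for i in range(1000)
]

modeling, holdout = split_once(records)
planning_view = hide_response(modeling)      # what the analysts plan against
modeling_effect = frozen_protocol(modeling)  # Step 5: goes into the paper
holdout_effect = frozen_protocol(holdout)    # Step 7: same protocol, attached later

print(f"modeling set effect: {modeling_effect:+.3f}")
print(f"holdout set effect:  {holdout_effect:+.3f}")
```

The point of the design is that `frozen_protocol` cannot be touched between the two calls. Whatever the modeling set suggested, the holdout set gets exactly the same treatment, and both numbers end up in print.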
Now that’s a hard-core way of doing it, to be sure, but wouldn’t we all be better off if something like this were the norm? How many people would have the nerve, do you think, to put their hypothesis up on the chopping block in public like this? But shouldn’t we all?