I recently wrote a column for Chemistry World on the concept of effect size – the readership there comes from all areas of chemistry, so it’s perhaps not as familiar a concept to many of them, and I thought it worth highlighting. (Briefly, effect size is the difference between the means of your treatment group and control group, divided by the standard deviation – it’s a “corrected”, or standardized, difference between the two. A small clinical trial is likely to reach statistical significance only for things that have a rather large effect size, while a large trial can at times still reach significance for effects that are small enough to make no real-world difference.)
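To make that definition concrete, here is a minimal Python sketch. It assumes the common Cohen's d form of effect size (difference in means divided by the pooled standard deviation), and it uses a textbook normal approximation for how many subjects per group a two-sided, two-group comparison would need. The function names and the sample numbers are made up for illustration and are not taken from the column or from any real trial.

```python
import math
import statistics
from statistics import NormalDist


def cohens_d(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var_c = statistics.variance(control)
    pooled_sd = math.sqrt(
        ((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)
    )
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


def approx_n_per_group(d, alpha=0.05, power=0.80):
    """Rough per-group sample size needed to detect effect size d.

    Normal approximation for a two-sided test with equal group sizes:
    n ~ 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    # Invented illustrative measurements, not real data.
    treatment = [5.1, 6.0, 5.8, 6.3, 5.5]
    control = [4.9, 5.2, 5.0, 5.6, 4.8]
    print("effect size d:", round(cohens_d(treatment, control), 2))

    # Smaller effects need far larger trials to reach significance.
    for d in (1.0, 0.5, 0.1):
        print(f"d = {d}: about {approx_n_per_group(d)} subjects per group")
```

The second function shows the trade-off from the other direction: because the required sample size scales with 1/d², a very large trial has enough power to flag an effect of d = 0.1 as significant even though a difference that small may not matter in practice.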
Here’s an excellent blog post on the idea, illustrated by an example that you may have heard about. There was a study a few years ago that seemed to show that judges handed down harsher rulings right before lunch. The authors ascribed this to hunger, irritability, a desire to wrap things up, etc. But as that post shows, the effect size that the paper found is impossibly huge for a psychological effect:
If hunger had an effect on our mental resources of this magnitude, our society would fall into minor chaos every day at 11:45. Or at the very least, our society would have organized itself around this incredibly strong effect of mental depletion. Just like manufacturers take size differences between men and women into account when producing items such as golf clubs or watches, we would stop teaching in the time before lunch, doctors would not schedule surgery, and driving before lunch would be illegal. If a psychological effect is this big, we don’t need to discover it and publish it in a scientific journal – you would already know it exists.
There are other good examples given, along with links to papers that have tried to refute the “hungry judges” story in general. To not enough avail – it still gets trotted out as an example of the interesting and surprising findings that social science and psychology can provide. The point here, though, is that we should be wary of things that look too interesting and surprising, and should look for other causes when we find them. (In this case, one possibility is courtroom scheduling, where complicated cases are scheduled early, while plea bargains, mandatory sentences, and other more open-and-shut items get fitted in before lunch as time allows.)
The more startling the result (positive or negative), the more it needs to be interrogated. We have better chances, in biomedical research trials, of producing profound effects, but we still need to be open to all sorts of possible explanations for them. . .