This post is motivated by a twitter link to a recent blog post critical of the old but influential study An obesity-associated gut microbiome with increased capacity for energy harvest with impressive citation metrics. In the post, Matthew Dalby smartly used the available data to reconstruct the final weights of the two groups. He showed these final weights were nearly the same, which is not good evidence for a treatment effect, given that the treatment was randomized among groups.

This is true, but doesn’t quite capture the essence of the major problem with the analysis: a simple t-test of differences in percent weight change fails to condition on initial weight. And, in pre-post designs, groups with smaller initial measures are expected to have more change due to regression to the mean. This is exactly what was observed. In the fecal transplant study, the initial mean weight of the rats infected with ob/ob feces was smaller (by 1.2SD) than that of the rats infected with +/+ feces and, consequently, the expected difference in the change in weight is not zero but positive (this is the expected difference conditional on an inital difference). More generally, a difference in percent change as an estimate of the parametric difference in percent change is not biased, but it is also not an estimate of the treatment effect, except under very limited conditions explained below. If these conditions are not met, a difference in percent change as an estimate of the treatment effect is biased, unless estimated conditional on (or “adusted for”) the initial weight.

Regression to the mean also has consequences on the hypothesis testing approach taken by the authors. Somewhat perplexingly, a simple t-test of the post-treatment weights or of pre-post difference in weight, or of percent change in weight does not have elevated Type I error. This is demonstrated using simulation below. The explaination is, in short, the Type I error rate is also a function of the initial difference in weight. If the initial difference is near zero, the Type I error is much less than the nominal alpha (say, 0.05). But as the intial difference moves away from zero, the Type I error is much greater than the nominal alpha. Over the space of the initial difference, these rates average to the nominal alpha.

continue reading the whole post