Before I get into the discussion, I want to remind everyone why we randomize in the first place. It’s to reduce selection bias and to get rid of systematic variation among groups, which allows us to come to more precise and efficient causal inferences. Many critics claim that we can’t make valid causal inferences if there’s an imbalance in covariates between the groups.
Here’s an example. Say, hypothetically, we had two groups, and we wanted to see the effect of a statin on all-cause mortality and compare it to placebo. We randomized our participants to both groups. Now, imagine our placebo group had more smokers in it than the statin group. We may think that this imbalance in smoking distribution puts us in trouble because there were already substantial between-group differences before the trial even started.
So, we haven’t even started the study, and we got a significant result from our test of homogeneity. Now we’re thinking, “oh no, I need to randomize again or abandon ship, or I’ll attempt to fit this covariate in a generalized linear model like an ANCOVA.”
Critics of RCTs will argue that because an imbalance of known or unknown covariates between groups is always possible, RCTs cannot make proper causal inferences, especially small RCTs that are “unable to distribute confounders effectively.”
Unfortunately, there are several problems with these beliefs and approaches.
Tests of homogeneity cannot tell you whether you’ve “completely randomized,” and they are also inappropriate (see the CONSORT statement).
It’s not always possible to rerandomize to achieve better balance in trials (even though the pioneer of randomization is believed to have once said to rerandomize when an undesirable configuration was obtained).
Larger trials do not necessarily produce a better balance than smaller trials.
The balance of covariates has never been the goal of randomization.
Fitting covariates to a GLM after a test of significance is inappropriate.
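To make the first point in the list above concrete, here is a minimal simulation sketch. It assumes a single continuous baseline covariate, 50 participants per arm, and an ordinary pooled two-sample t-test as the "test of homogeneity"; the function names and the hard-coded critical value are my own choices for illustration. Because allocation in the simulation really is random, every "significant" baseline difference it finds is a chance finding.

```python
import random
import statistics

# Approximate 97.5th percentile of Student's t with 98 degrees of freedom
# (50 per arm); hard-coded so the sketch needs only the standard library.
T_CRIT_DF98 = 1.984


def baseline_test_rejects(rng, n_per_arm=50):
    """Randomize one trial and test the baseline covariate for a difference."""
    covariate = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_arm)]
    rng.shuffle(covariate)  # random allocation: first half arm A, second half arm B
    arm_a, arm_b = covariate[:n_per_arm], covariate[n_per_arm:]
    pooled_var = (statistics.variance(arm_a) + statistics.variance(arm_b)) / 2
    se = (pooled_var * 2 / n_per_arm) ** 0.5
    t_stat = (statistics.fmean(arm_a) - statistics.fmean(arm_b)) / se
    return abs(t_stat) > T_CRIT_DF98


def rejection_rate(reps=2000, seed=0):
    """Share of properly randomized trials with a 'significant' baseline test."""
    rng = random.Random(seed)
    return sum(baseline_test_rejects(rng) for _ in range(reps)) / reps


print(rejection_rate())
```

The printed share should sit near the nominal 5% level, which is the point: a significant baseline test says nothing about whether the randomization "worked," since the null hypothesis being tested is true by construction.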
RCTs were never designed to perfectly balance covariates or even have very similar distributions of them. The purpose of randomization is to distribute hidden covariates, not perfectly, but efficiently and randomly.
Ronald Fisher’s recommendation, when he originally wrote Statistical Methods for Research Workers in 1925, was to block known covariates and randomize all the hidden ones, while looking at the effects of known covariates on both between-group variance and within-group variance (where everyone is getting the same treatment). The effects on the within-group variance would give you some idea as to the impact of the covariate.
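As a rough illustration of blocking a known covariate, here is a sketch of permuted-block randomization within strata. This is one common modern way to do it, not a reconstruction of Fisher’s own procedure; the function name, the arm labels, and the participant IDs are all hypothetical.

```python
import random
from collections import defaultdict


def blocked_randomization(participants, stratum_of, block_size=4, seed=0):
    """Assign 'statin' or 'placebo' using permuted blocks within each stratum.

    participants: list of participant IDs (hypothetical)
    stratum_of:   dict mapping participant ID -> stratum label,
                  e.g. "smoker" / "non-smoker"
    """
    assert block_size % 2 == 0, "block size must be even for 1:1 allocation"
    rng = random.Random(seed)

    # Group participants by the known covariate we want to block on.
    by_stratum = defaultdict(list)
    for pid in participants:
        by_stratum[stratum_of[pid]].append(pid)

    assignment = {}
    for members in by_stratum.values():
        for start in range(0, len(members), block_size):
            # Each block holds equal numbers of both arms, in random order.
            block = ["statin", "placebo"] * (block_size // 2)
            rng.shuffle(block)
            for pid, arm in zip(members[start:start + block_size], block):
                assignment[pid] = arm
    return assignment


participants = list(range(16))
stratum_of = {pid: ("smoker" if pid < 8 else "non-smoker") for pid in participants}
allocation = blocked_randomization(participants, stratum_of)
```

With 8 smokers and 8 non-smokers and a block size of 4, each stratum ends up with exactly 4 participants per arm. Everything that was not blocked on is still left to chance, which is the division of labor described above.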
Now, will imbalances of hidden covariates affect the point estimates? Of course, but they will also yield large standard errors and wide compatibility intervals, which should make you less sure about the results of your study. But it does not mean your study is unable to make causal inferences.
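Here is a small simulation sketch of that claim. It assumes a continuous outcome, 20 participants per arm, a hidden covariate that shifts the outcome, and a pooled two-sample t interval; all names and parameter values are made up for illustration. The hidden covariate is never shown to the analysis, yet the interval still covers the true effect about as often as it should, because the covariate’s influence ends up in the standard error.

```python
import random
import statistics

# Approximate 97.5th percentile of Student's t with 38 degrees of freedom
# (20 per arm); hard-coded so the sketch needs only the standard library.
T_CRIT_DF38 = 2.024


def simulate_trial(rng, n_per_arm=20, true_effect=1.0, hidden_sd=2.0, noise_sd=1.0):
    """Run one small randomized trial and return a 95% interval for the effect."""
    n = 2 * n_per_arm
    hidden = [rng.gauss(0.0, hidden_sd) for _ in range(n)]  # unobserved covariate
    arms = [1] * n_per_arm + [0] * n_per_arm
    rng.shuffle(arms)  # simple 1:1 randomization
    treated, control = [], []
    for h, arm in zip(hidden, arms):
        outcome = true_effect * arm + h + rng.gauss(0.0, noise_sd)
        (treated if arm else control).append(outcome)
    diff = statistics.fmean(treated) - statistics.fmean(control)
    pooled_var = (statistics.variance(treated) + statistics.variance(control)) / 2
    se = (pooled_var * 2 / n_per_arm) ** 0.5
    return diff - T_CRIT_DF38 * se, diff + T_CRIT_DF38 * se


def coverage_and_width(hidden_sd, reps=2000, seed=0, true_effect=1.0):
    """Share of intervals containing the true effect, and their mean width."""
    rng = random.Random(seed)
    hits, widths = 0, []
    for _ in range(reps):
        lo, hi = simulate_trial(rng, true_effect=true_effect, hidden_sd=hidden_sd)
        hits += lo <= true_effect <= hi
        widths.append(hi - lo)
    return hits / reps, statistics.fmean(widths)


for sd in (0.0, 3.0):
    print(sd, coverage_and_width(hidden_sd=sd))
```

Under these assumptions the coverage should stay close to 95% whether the hidden covariate is weak or strong; what changes is the width of the interval, which grows as the hidden covariate becomes more influential.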
It’s also worth remembering that analyses of clinical trials are robust in that they account for these possible imbalances in both known and unknown covariates. When you have an imbalance in covariates, what do you get? Wide compatibility intervals, and that is okay. That’s what statistics is meant to do: allow you to gauge uncertainty. Furthermore, if covariates were balanced perfectly between groups, the standard analyses used for RCTs would become inappropriate, which is why the analyses for matched RCTs and normal RCTs are different.
But again, this doesn’t mean that we shouldn’t think about covariates. There are several valid methods to account for known covariates, such as minimization (although I am NOT a fan of it), fitting covariates into a GLM (decisions like this are made prior to the analysis, not based on tests of significance), or stratifying by that covariate. In fact, stratifying after the data are collected is known as post-stratification and is often used by trial statisticians.
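To show what stratifying by a covariate can look like in practice, here is a minimal sketch of a post-stratified estimate: the treatment-minus-control difference is computed within each stratum and then averaged, weighted by each stratum’s share of the sample. The function name, the record layout, and the toy numbers are invented for illustration, and the sketch assumes the stratification variable was chosen before looking at the results.

```python
from collections import defaultdict
from statistics import fmean


def post_stratified_difference(records):
    """Weighted average of within-stratum treatment-minus-control differences.

    records: iterable of (stratum, arm, outcome) tuples, with arm being
             "treatment" or "control". Every stratum needs at least one
             participant in each arm, otherwise fmean raises an error.
    """
    groups = defaultdict(lambda: {"treatment": [], "control": []})
    for stratum, arm, outcome in records:
        groups[stratum][arm].append(outcome)

    total = sum(len(g["treatment"]) + len(g["control"]) for g in groups.values())
    estimate = 0.0
    for g in groups.values():
        n_stratum = len(g["treatment"]) + len(g["control"])
        within_diff = fmean(g["treatment"]) - fmean(g["control"])
        estimate += (n_stratum / total) * within_diff
    return estimate


# Toy data: smokers have worse outcomes at baseline, and the control arm
# happens to contain more of them. The effect is -2 within both strata.
records = (
    [("smoker", "treatment", 8.0)] * 2
    + [("non-smoker", "treatment", 3.0)] * 4
    + [("smoker", "control", 10.0)] * 4
    + [("non-smoker", "control", 5.0)] * 2
)
print(post_stratified_difference(records))
```

In this toy data the raw difference in arm means is about -3.67 because the arms differ in smoking, while the post-stratified estimate recovers the within-stratum effect of -2. Real trial data would of course be noisier than this.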
So, in conclusion, causal inference in RCTs does not come from a perfect balance of covariates, and the statistical methods used in clinical trials are a lot more complicated than most people think.
This piece is primarily inspired by Stephen Senn’s paper on randomization.