Wednesday 9 September 2009

Testing for baseline balance in clinical trials

There are a number of interesting points in this paper. In particular, section 4: Some misconceptions about balance.

To paraphrase: consider a two-arm trial in which 200 patients (100 male and 100 female) are allocated at random to one of two treatments. The resulting proportions of males and females in each arm after randomization will almost certainly differ; this is far more likely than exact equality (50 males and 50 females in each arm), regardless of how well the randomization was performed. Indeed, exact balance in the covariates is probably more indicative of a non-randomized trial.
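
As a quick sanity check (my own sketch, not from the paper), the number of males landing in one arm follows a hypergeometric distribution, so the chance of exact balance can be computed directly:

```python
from math import comb

# 200 patients (100 male, 100 female) split at random into two arms of 100.
# The number of males in arm 1 is hypergeometric; exact balance means
# exactly 50 males (and hence 50 females) in each arm.
p_exact = comb(100, 50) * comb(100, 50) / comb(200, 100)
print(f"P(exactly 50 males per arm) = {p_exact:.3f}")
```

So even a perfectly performed randomization produces an exact 50/50 split of gender in each arm only about 11% of the time.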

Does this matter? Yes, if the covariate in question (gender in this case) has an effect on the outcome.

Suppose a model for the outcome y is:

y(1 = treatment) = mu + beta1*x + theta + error
y(0 = control) = mu + beta1*x + error

where theta is the true treatment effect, beta1 is the coefficient for the confounding covariate, and x is the covariate (gender). A naive estimate of theta is the difference of means between the two groups:

theta.hat = ybar(1) - ybar(0)

which will be biased by the value of

beta1(xbar(1) - xbar(0))

The magnitude of this bias depends on the distribution of x across the treatment groups and on the value of beta1.
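
A minimal simulation (my own illustration, with assumed values beta1 = 2 and theta = 1, and the error term set to zero to isolate the algebra) confirms the bias formula:

```python
import random
from statistics import mean

random.seed(1)
mu, beta1, theta = 0.0, 2.0, 1.0  # assumed values for illustration

# Randomize 200 patients (x = 1 for male, 0 for female) to two arms.
x = [1] * 100 + [0] * 100
random.shuffle(x)
arm = [1] * 100 + [0] * 100  # first 100 shuffled patients -> treatment

# Noiseless outcomes, so any bias comes purely from the covariate term.
y = [mu + beta1 * xi + theta * ai for xi, ai in zip(x, arm)]

ybar1 = mean(y[i] for i in range(200) if arm[i] == 1)
ybar0 = mean(y[i] for i in range(200) if arm[i] == 0)
xbar1 = mean(x[i] for i in range(200) if arm[i] == 1)
xbar0 = mean(x[i] for i in range(200) if arm[i] == 0)

theta_hat = ybar1 - ybar0
bias = beta1 * (xbar1 - xbar0)
print(theta_hat, theta + bias)  # identical: theta.hat = theta + beta1*(xbar1 - xbar0)
```

With the error removed, the naive estimate equals theta plus beta1 times whatever imbalance in x the randomization happened to produce.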

The problems with testing for imbalance:

It is common for people to judge balance via some kind of significance test on the group means. There are problems with this, as described in Senn's paper. First, all that matters is the observed data, not the wider population distribution over repeated identical trials, so the question "is that a real difference?" is meaningless and arises from confusing the sample with the population. Second, why should a difference of two standard errors define imbalance? As described earlier, any difference in the distribution of x is potentially relevant when beta1 is not zero, and it does not become irrelevant if p = 0.1. Moreover, because the p-value depends on the sample size, the same difference in means that counts as balanced in a small trial would be identified as imbalanced in a larger one.
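
To illustrate the sample-size point, here is a rough sketch (assumed numbers, using a simple two-sample z-test for proportions): the same 10-percentage-point difference in the proportion of males yields very different p-values in a small and a large trial:

```python
from math import sqrt, erf

def two_sided_p(p1, p2, n):
    """Two-sided z-test for equal proportions, n patients per arm."""
    pooled = (p1 + p2) / 2
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # 2 * (1 - Phi(z))

# Same imbalance (55% vs 45% male), two very different sample sizes.
print(two_sided_p(0.55, 0.45, 50))    # ~0.32: declared "balanced"
print(two_sided_p(0.55, 0.45, 2000))  # far below 0.001: declared "imbalanced"
```

The imbalance, and hence the potential bias beta1*(xbar(1) - xbar(0)), is identical in both cases; only the verdict of the test changes.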

The biggest problem with the significance-test approach is failing to condition on an important covariate just because its p-value is > 0.05: if the effect of the covariate on the outcome is substantial, the bias will be important. Conversely, even if balance is achieved in the important covariates, it does not follow that you can ignore them and perform an unconditional analysis. Although the estimate of the treatment effect will be unbiased, the standard errors will not be: the effect of the covariate on the variance is in fact maximised by the unconditional analysis.
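
A small sketch of this last point (my own example, with assumed values beta1 = 2, theta = 1 and unit error variance): even in a perfectly balanced trial, the standard error from the unconditional analysis is inflated by the covariate's contribution to the within-arm variance:

```python
import random
from statistics import pvariance

random.seed(0)
beta1, theta, n_cell = 2.0, 1.0, 50  # 50 patients per (arm, gender) cell

# A perfectly balanced trial: each arm has exactly 50 males and 50 females.
data = [(arm, male, theta * arm + beta1 * male + random.gauss(0, 1))
        for arm in (0, 1) for male in (0, 1) for _ in range(n_cell)]

def pooled_var(groups):
    """Pooled within-group variance over a partition of the outcomes."""
    ss = sum(pvariance(g) * len(g) for g in groups)  # sum of squared deviations
    df = sum(len(g) - 1 for g in groups)
    return ss / df

ys_by_arm = [[y for a, m, y in data if a == arm] for arm in (0, 1)]
ys_by_cell = [[y for a, m, y in data if a == arm and m == male]
              for arm in (0, 1) for male in (0, 1)]

n_arm = 2 * n_cell
se_unadj = (pooled_var(ys_by_arm) * 2 / n_arm) ** 0.5   # ignores gender
se_adj = (pooled_var(ys_by_cell) * 2 / n_arm) ** 0.5    # conditions on gender
print(se_unadj, se_adj)  # unadjusted SE is inflated by beta1^2 * var(x)
```

With exact balance the point estimates coincide, but the unconditional residual variance carries the extra beta1^2 * var(x) term, so the conditional analysis gives the smaller, correct standard error.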

Conclusion

This is from Senn's paper directly:

"Thus, to sum up, a conditional analysis of an unbalanced experiment produces a valid inference; an unconditional analysis of a balanced experiment does not. Question: what is the value of balance as regards validity of an inference? Answer: none."



First meeting - 8 September

We held our first meeting yesterday to discuss an old, but relevant paper by Stephen Senn. Here's a link to the abstract:

Testing for baseline balance - Stephen Senn

Our discussion revolved around a number of issues highlighted in the paper. We wondered whether it's necessary to add a 'Table 1' to a publication: although some members thought it was useful to include a table describing patient characteristics, others felt it's generally unnecessary to conduct significance tests to highlight differences between two groups in clinical trials or case-control studies. And although baseline tests may demonstrate that the randomization hasn't worked, it's very easy to manipulate a randomization and maintain baseline balance in a clinical trial.

We also talked about how to decide which covariates to add to an analysis. The paper specifically advises readers to choose covariates based on previous studies, and to fit those covariates in a multivariable regression whatever the degree of imbalance in the baseline tests. This approach was generally well received, and we talked about some of the issues around covariate selection. Some members have experience with causal diagrams and directed acyclic graphs, so we thought this might be an interesting area for a future seminar.

Since this was the first meeting, I thought it would be a good idea to look ahead and discuss ideas for future meetings. Areas for future discussion could include Mendelian randomization and approaches to missing data. I'd also like to invite speakers to discuss their own work, so if you have any ideas or would like to present your own work in progress, email me at nada.khan@dphpc.ox.ac.uk.

Next meeting is on October 21!