Optimal alpha – is the traditional statistical threshold α = 0.05 in phase III cancer clinical trials a problem?
University of New Brunswick
Clinical trials are conducted to test treatment safety and to provide high-quality data for healthcare decision making. They are usually divided into phases I through IV, with phase III being among the key steps for making decisions on new drug development. Most phase III trials continue to use traditional Null Hypothesis Significance Testing (NHST) and P-values. Many of the drawbacks of NHST revolve around its sensitivity to sample size and the arbitrary 1%, 5% and 10% statistical thresholds, which only account for how much Type I error the investigators are willing to accept, without fully considering either the probability of making a Type II error or the effect size. Many alternatives to NHST have been suggested, but few, if any, attempts have been made to address its key limitations. Optimal α is a method that is explicitly designed to address the key limitations of NHST: it provides a study-specific threshold that minimizes the overall probability of making an error, and it presents conclusions with explicit consideration of effect sizes. If using optimal α influences our decisions, e.g., concluding that a drug should have been approved when it was previously decided otherwise, then this could have large implications for clinical research. The purpose of this study was to re-analyze 2,197 statistical tests from published phase III cancer clinical trials using optimal α, and to compare the conclusions originally reached using traditional NHST thresholds with those reached using optimal α. My results show that the two approaches reached inconsistent conclusions in 23.6% of the tests. This suggests that if one of our goals in healthcare research is to minimize the costs of making mistakes, then using the arbitrary NHST thresholds often leads to wrong conclusions in clinical trials. Optimal α could improve our ability to make decisions, and clinicians using NHST should apply optimal α rather than the traditional thresholds of 1%, 5% and 10%.
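The idea of a study-specific threshold can be illustrated with a minimal sketch. It is not the procedure used in this study; it rests on several simplifying assumptions: a two-sided, two-sample z-test with equal group sizes, a standardized critical effect size, and equal weighting of Type I and Type II errors, so that the quantity minimized is the average (α + β)/2. The function names `type_ii_error` and `optimal_alpha` and the grid search are hypothetical choices made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def type_ii_error(alpha, effect_size, n_per_group):
    """Type II error (beta) of a two-sided, two-sample z-test.

    Assumes equal group sizes and a standardized effect size
    (difference in means divided by the common standard deviation).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic if the critical effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the statistic falls inside the non-rejection region.
    return _STD_NORMAL.cdf(z_crit - shift) - _STD_NORMAL.cdf(-z_crit - shift)


def optimal_alpha(effect_size, n_per_group, steps=10_000):
    """Grid-search the alpha that minimizes the average of alpha and beta.

    Equal weighting of the two error types is an assumption of this
    sketch, not a requirement of the general approach.
    """
    best_alpha, best_error = None, float("inf")
    for i in range(1, steps):
        alpha = i / steps
        beta = type_ii_error(alpha, effect_size, n_per_group)
        average_error = (alpha + beta) / 2
        if average_error < best_error:
            best_alpha, best_error = alpha, average_error
    return best_alpha, best_error


if __name__ == "__main__":
    for n in (20, 200):
        alpha, error = optimal_alpha(effect_size=0.5, n_per_group=n)
        print(f"n per group = {n}: optimal alpha = {alpha:.4f}, "
              f"average error = {error:.4f}")
```

Under these assumptions, the threshold that minimizes the average error depends on both the sample size and the critical effect size: small trials favour a threshold above 0.05, while large trials favour one below it, which is why a fixed α = 0.05 can lead to a different conclusion than a study-specific threshold.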