Misplaced Confidence in Observed Power

Zad Rafi

statistics

Another misinterpretation of what statistical power is and how trial results should be interpreted.

Published

September 30, 2018

Keywords

statistical power, observed power, post hoc power, statistical misinterpretation, clinical trials, hypothesis testing

Two months ago, a study came out in JAMA that compared the effect of the antidepressant escitalopram with that of placebo on long-term major adverse cardiac events (MACE).

The authors explained in the methods section of their paper how they calculated their sample size and what differences they were looking for between groups.

First, they used some previously published data to get an idea of the incidence rates to expect,

“Because previous studies in this field have shown conflicting results, there was no appropriate reference for power calculation within the designated sample size. The KAMIR study reported a 10.9% incidence of major adverse cardiac events (MACE) over 1 year… Therefore, approximately 50% MACE incidence was expected during a 5-year follow-up.”
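The quoted passage does not say how the 1-year KAMIR figure was stretched to 5 years, so here is a minimal sketch of two simple extrapolations that bracket the "approximately 50%" figure. Both extrapolation rules and the function name `extrapolate_incidence` are my own assumptions for illustration, not the authors' method.

```python
def extrapolate_incidence(annual_risk, years):
    """Return (linear, constant_risk) cumulative incidence estimates.

    Both rules are illustrative assumptions, not taken from the paper:
    - linear: the 1-year risk simply adds up year after year
    - constant_risk: the same risk applies each year to those still event-free
    """
    linear = min(annual_risk * years, 1.0)
    constant_risk = 1 - (1 - annual_risk) ** years
    return linear, constant_risk


# 10.9% over 1 year (KAMIR), extrapolated to the trial's 5-year follow-up
linear, constant_risk = extrapolate_incidence(0.109, 5)
print(f"linear: {linear:.1%}, constant annual risk: {constant_risk:.1%}")
```

The two rules give roughly 55% and 44%, so an expected 5-year incidence of about 50% sits between them.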

Then, they calculated the power that their sample size would give them for the differences they were interested in finding,

“Assuming 2-sided tests, α = .05, and a follow-up sample size of 300, the expected power was 70% and 96% for detecting 10% and 15% group differences, respectively.”

So far so good.
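For readers who want to see what a design-stage power calculation like this involves, here is a rough sketch using a normal approximation for the difference between two proportions. The paper's excerpt does not say which method the authors used, so the test, the 50% baseline incidence, and the function name `two_proportion_power` are all assumptions on my part.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the unpooled normal approximation; this is an illustrative choice,
    not necessarily the calculation used in the trial.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sqrt(
        p_control * (1 - p_control) / n_per_group
        + p_treated * (1 - p_treated) / n_per_group
    )
    z_effect = abs(p_control - p_treated) / se
    # Probability of landing in either rejection region
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Assumed 50% incidence under placebo, with 10- and 15-point absolute reductions
for n_per_group in (150, 300):
    for difference in (0.10, 0.15):
        power = two_proportion_power(0.50, 0.50 - difference, n_per_group)
        print(f"n per group = {n_per_group}, difference = {difference:.0%}: power = {power:.1%}")
```

Under these particular assumptions, the answer is quite sensitive to whether 300 is read as the total or the per-group sample size, so the sketch is better suited to exploring the calculation than to reproducing the paper's figures exactly. The important point is that every input here is fixed before any data are collected.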

Then, we get to the results,

“A significant difference was found: composite MACE incidence was 40.9% (61/149) in the escitalopram group and 53.6% (81/151) in the placebo group (hazard ratio [HR], 0.69; 95% CI, 0.49-0.96; P = .03). The model assumption was met (Schoenfeld P = .48). The estimated statistical power to detect the observed difference in MACE incidence rates between the 2 groups was 89.7%.”

Ouch. This bothered me so much that I wrote a letter to the editor (LTE) to point out the problem. Unfortunately, the LTE was rejected, but Andrew Althouse suggested that I discuss it over at DataMethods, so I did. I also discussed it on Twitter, but I still wanted to publish the LTE on my blog. Here it is.

This letter has now been preprinted on arXiv (arXiv:1907.08242).

In a similar tale, a group of surgeons published a methodological article advocating the practice of calculating observed power, which I discuss further in another post.
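To make the problem with observed power concrete: when power is computed by plugging the observed effect back in as if it were the true effect, the result for a simple z-test depends on nothing but the P-value and α. The sketch below shows that relationship. It is the general textbook case under a normal approximation, not a reconstruction of the calculation in the JAMA paper, and the function name `observed_power` is just for illustration.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Observed (post hoc) power of a two-sided z-test, given only its P-value.

    Treats the observed z-statistic as the true standardized effect, which is
    exactly what an observed-power calculation does.
    """
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1 - p_value / 2)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(z_observed - z_crit) + std_normal.cdf(-z_observed - z_crit)


# Observed power is a one-to-one function of the P-value:
# smaller P-values always give higher observed power.
for p in (0.20, 0.05, 0.03, 0.001):
    print(f"P = {p}: observed power = {observed_power(p):.1%}")
```

A result sitting exactly at P = .05 always has an observed power of about 50%, and any significant result has more than that, so the number adds nothing to what the P-value already says. For P = .03, this relationship gives roughly 58%, so the 89.7% reported in the paper evidently came from some other calculation that the quoted passage does not describe.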

Citation

For attribution, please cite this work as:

1. Rafi Z. (2018). ‘Misplaced Confidence in Observed Power’. Less Likely. https://lesslikely.com/statistics/misplaced-power.
