What Makes a Sensitivity Analysis?

Frequent & Mindless Rituals


Sensitivity analyses (SA) are common in trials and observational studies, but often little thought is given to what they should entail. This is not surprising, given that they are not usually taught in traditional settings, although historically, the statistical concepts that are taught in traditional settings don't have a great track record of proper application and interpretation either. Regardless, sensitivity analyses are an important component of many statistical analyses and are therefore worth understanding carefully.

The first word of the phrase is an obvious tell: it suggests an analysis that examines how sensitive or robust a result is. But a natural question is, sensitive to what, specifically? Can I vary anything at all in the analysis to test the sensitivity of the result? For example, if I obtained an odds ratio of 1.7 from the primary analysis in my experiment and then ran an extra analysis to see whether the odds ratio would change under resampling, is that a sensitivity analysis?

What if I did a completely different type of analysis? For example, if the primary analysis was a classical statistical test with an adjustment for multiple comparisons and I decided to run a hierarchical Bayesian regression with a prior concentrated on the null, would that be a sensitivity analysis? When a loose definition of sensitivity analysis is assumed, the possibilities of what can be varied are endless, and what counts as one differs from discipline to discipline and culture to culture.


A Common Scenario


It is very common in clinical trials, in particular, for the primary analysis to be an intention-to-treat (ITT) analysis and for a sensitivity analysis to be a per-protocol (PP) analysis. The following is one such example, from a high-profile trial published in the NEJM last year:1


The primary efficacy analysis was performed in the intention-to-treat population. Multiple imputation by chained equations was applied to account for missing data. The assumption that unobserved values were missing at random was deemed to be appropriate because we could not find any pattern among the missing values. A complete-case analysis and a per-protocol analysis were conducted as sensitivity analyses.


To many, this may seem perfectly fine, even great. The authors used multiple imputation to handle their missing data and explained their assumption about the missing-data mechanism. They then specified that, as sensitivity analyses, they would exclude all individuals with missing observations (a complete-case analysis) and run a per-protocol analysis. We will come back to this example later.


As mentioned above, there are many possible ways to conduct a sensitivity analysis, especially if the phrase is used in a loose or vague way. Luckily, many of us do not have to get into nonsensical philosophical arguments on Twitter about this, because the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) has given the topic much thought, as reflected by the recent addendum to its E9 guidance document (Statistical Principles for Clinical Trials), which has served as a reference for clinical trial statisticians for decades. In the addendum, titled Addendum on Estimands and Sensitivity Analysis in Clinical Trials, which has now been adopted by regulatory agencies around the world including the FDA and EMA, the ICH elaborates on the concept of estimands and the role sensitivity analyses play.

To be clear, estimands are not new; they have been discussed in the statistical literature since Tukey,2 but the ICH working group's addendum formalized their adoption in the clinical trial world, along with the importance of sensitivity analyses.3–5


Estimands & Sensitivity Analyses


A.5.2.1. Role of Sensitivity Analysis

Inferences based on a particular estimand should be robust to limitations in the data and deviations from the assumptions used in the statistical model for the main estimator. This robustness is evaluated through a sensitivity analysis. Sensitivity analysis should be planned for the main estimators of all estimands that will be important for regulatory decision making and labelling in the product information. This can be a topic for discussion and agreement between sponsor and regulator.

The statistical assumptions that underpin the main estimator should be documented. One or more analyses, focused on the same estimand, should then be pre-specified to investigate these assumptions with the objective of verifying whether or not the estimate derived from the main estimator is robust to departures from its assumptions. This might be characterised as the extent of departures from assumptions that change the interpretation of the results in terms of their statistical or clinical significance (e.g. tipping point analysis).

Distinct from sensitivity analysis, where investigations are conducted with the intent of exploring robustness of departures from assumptions, other analyses that are conducted in order to more fully investigate and understand the trial data can be termed “supplementary analysis” (see Glossary; A.5.3.). Where the primary estimand(s) of interest is agreed between sponsor and regulator, the main estimator is pre-specified unambiguously, and the sensitivity analysis verifies that the estimate derived is reliable for interpretation, supplementary analyses should generally be given lower priority in assessment.

A.5.2.2. Choice of Sensitivity Analysis

When planning and conducting a sensitivity analysis, altering multiple aspects of the main analysis simultaneously can make it challenging to identify which assumptions, if any, are responsible for any potential differences seen. It is therefore desirable to adopt a structured approach, specifying the changes in assumptions that underlie the alternative analyses, rather than simply comparing the results of different analyses based on different sets of assumptions. The need for analyses varying multiple assumptions simultaneously should then be considered on a case by case basis. A distinction between testable and untestable assumptions may be useful when assessing the interpretation and relevance of different analyses.

ICH E9 (R1) Guideline

The need for sensitivity analysis in respect of missing data is established and retains its importance in this framework. Missing data should be defined and considered in respect of a particular estimand (see A.4.). The distinction between data that are missing in respect of a specific estimand and data that are not directly relevant to a specific estimand gives rise to separate sets of assumptions to be examined in sensitivity analysis.


In the addendum, a sensitivity analysis is explicitly defined as an analysis that realistically varies the assumptions of the primary analysis while still targeting the same estimand, in order to examine whether the results are robust to violations of those assumptions and whether such violations could change the conclusions drawn. This is contrasted with analyses that are conducted to more fully investigate and understand the trial data, which are characterized as supplementary rather than sensitivity analyses.



Now, back to the NEJM paper. The reason an ITT analysis and a PP analysis cannot serve as primary and sensitivity analyses, respectively, is that they target entirely different estimands.6 Because they answer completely different questions, they are simply two different analyses. The additional PP analysis, although important to certain stakeholders, is better classified as a supplementary, non-sensitivity analysis.

Indeed, the following flowchart from Morris et al. (2014)6 is particularly useful as a guide to differentiate sensitivity analyses from non-sensitivity analyses.


Flowchart from Morris et al. (2014)

So what does a sensitivity analysis actually look like? Below, I walk through the analysis of a trial to give some examples.


An Example Using Trial Data


I’ll use a sample clinical trial dataset, which is described here:


"This dataset is typical of diastolic blood pressure data measured in small clinical trials in hypertension from the mid-to-late 1960s and for approximately a decade thereafter. During this time, hypertension was more severe, the number of effective treatments was relatively small, and the definition (DBP > 95 mmHg) of essential hypertension was not as stringent as it is now (DBP > 80 mmHg) - as seen in the 1967 report from the Veterans Administration Cooperative Study Group on Antihypertensive Agents (VA Study Group (1967)).

In Table 3.1, diastolic blood pressure (DBP) was measured (mmHg) in the supine position at baseline (i.e., DBP1) before randomization and monthly thereafter up to 4 months… Patients’ age and sex were recorded at baseline and represent potential covariates. The primary objective in this chapter in the analysis of this dataset is to test whether treatment A (new drug) may be effective in lowering DBP as compared to B (placebo) and to describe changes in DBP across the times at which it was measured. "


RNGkind(kind = "L'Ecuyer-CMRG")  # RNG that supports reproducible parallel streams
set.seed(1031) # My birthday

# Load the blood pressure trial dataset used throughout this post
library(readr)
BP <- read_csv("https://lesslikely.com/uploads/BP.csv", 
               col_types = cols(Group = col_factor(levels = c("A", "B")),
                                Sex = col_factor(levels = c("M", "F"))))

I have made some minor adjustments to this dataset and also generated a new variable (z) that is strongly correlated with the outcome of interest (PostDBP).


n     <- nrow(BP)              # length of vector
rho   <- 0.7                   # desired correlation = cos(angle)
theta <- acos(rho)             # corresponding angle
x1    <- BP$PostDBP            # fixed given data
x2    <- rnorm(n, 60, 10)      # new random data
X     <- cbind(x1, x2)         # matrix
Xctr  <- scale(X, center = TRUE, 
               scale = FALSE)  # centered columns (mean 0)
Id   <- diag(n)                               # identity matrix
Q    <- qr.Q(qr(Xctr[ , 1, drop = FALSE]))    # QR-decomposition, just matrix Q
P    <- tcrossprod(Q)          # = Q Q'       # projection onto space by x1
x2o  <- (Id-P) %*% Xctr[ , 2]                 # x2ctr orthogonal to x1ctr
Xc2  <- cbind(Xctr[ , 1], x2o)                # bind to matrix
Y    <- Xc2 %*% diag(1/sqrt(colSums(Xc2^2)))  # scale columns to length 1

z <- Y[ , 2] + (1 / tan(theta)) * Y[ , 1]     # final new vector
hist(z, col = "#d46c5b")

cor(x1, z)  
#> [1] 0.7
BP$z <- z
study <- BP
head(study, 10)
#> # A tibble: 10 x 7
#>    Subject Group BaselineDBP PostDBP   Age Sex         z
#>      <dbl> <fct>       <dbl>   <dbl> <dbl> <fct>   <dbl>
#>  1       1 A             114     105    43 F     -0.139 
#>  2       2 A             116     101    51 M     -0.164 
#>  3       3 A             119      98    48 F     -0.327 
#>  4       4 A             115     101    42 F     -0.130 
#>  5       5 A             116     105    49 M     -0.197 
#>  6       6 A             117     102    47 M     -0.150 
#>  7       7 A             118      99    50 F     -0.246 
#>  8       8 A             120     102    61 M     -0.185 
#>  9       9 A             114     103    43 M     -0.0853
#> 10      10 A             115      97    51 M     -0.263

Now, we can quickly explore the dataset.


# Inspecting Data  -------------------------------------
# Packages for the exploratory plots and models below:
library(Hmisc)        # spearman2(), varclus()
library(rms)          # rcs() splines used later
library(corrplot)     # correlation heatmap
library(wesanderson)  # wes_palette() colors
library(performance)  # model_performance() used below

study_formula <- as.formula(PostDBP ~ .)

plot(spearman2(study_formula, study))
abline(v = 0)


x_1 <- model.matrix(~ ., data = study)[, -1]

# Initial Data Analysis  ------------------------------

cormatrix <- cor(x_1)

col <- wes_palette("Zissou1", 10, type = "continuous")
corrplot(cormatrix, method = "color", col = col,  
         type = "upper", order = "hclust", 
         addCoef.col = "black", 
         tl.col = "black", tl.srt = 45, 
         diag = FALSE)


clus1 <- varclus(x_1, similarity = "h")

plot(clus1, ylab = "Hoeffding's D statistic", 
     lwd = 2.5, lty = 1, cex = 0.75)
title("Hierarchical Cluster Analysis")


We notice that there are missing data (which I actually generate below) and perhaps, if we did not know how the data were generated, we might suspect or start with the assumption that the data are missing at random (MAR).


library(mice)  # ampute(), quickpred(), parlmice(), pool(), complete(), etc.
library(VIM)   # aggr() for visualizing missingness patterns

# Introduce MNAR missingness into roughly 85% of rows (Subject, Group, Sex left complete)
bp_mnar <- ampute(data = BP[, -c(1, 2, 6)],
                  prop = 0.85, mech = "MNAR")

temp <- cbind(bp_mnar$amp, BP[, c(1, 2, 6)])
predMatrix <- make.predictorMatrix(temp)
# Inspect Data

str(temp) 
#> 'data.frame':    39 obs. of  7 variables:
#>  $ BaselineDBP: num  114 116 119 115 116 117 118 120 114 115 ...
#>  $ PostDBP    : num  105 101 98 101 NA 102 99 NA 103 97 ...
#>  $ Age        : num  43 NA 48 42 49 NA 50 61 NA 51 ...
#>  $ z          : num  -0.139 -0.164 -0.327 NA -0.197 ...
#>  $ Subject    : num  1 2 3 4 5 6 7 8 9 10 ...
#>  $ Group      : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ Sex        : Factor w/ 2 levels "M","F": 2 1 2 2 1 1 2 1 1 1 ...
sum(is.na(temp)) 
#> [1] 32

aggr(temp, col = c("#3f8f9b", "#d46c5b"))


Preliminary Analyses


summary(glm(PostDBP ~ Group + BaselineDBP + Age + Sex + z, data = temp))
#> 
#> Call:
#> glm(formula = PostDBP ~ Group + BaselineDBP + Age + Sex + z, 
#>     data = temp)
#> 
#> Deviance Residuals: 
#>       1        3        7       14       23       31       39  
#> -0.0153   0.3158  -0.6908   0.3903   0.0943  -0.3903   0.2961  
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   77.099     64.960    1.19     0.45
#> GroupB         5.523      1.840    3.00     0.20
#> BaselineDBP    0.410      0.718    0.57     0.67
#> Age           -0.394      0.389   -1.01     0.50
#> SexF           3.578      0.969    3.69     0.17
#> z             39.392      8.821    4.47     0.14
#> 
#> (Dispersion parameter for gaussian family taken to be 0.9785)
#> 
#>     Null deviance: 262.00000  on 6  degrees of freedom
#> Residual deviance:   0.97847  on 1  degrees of freedom
#>   (32 observations deleted due to missingness)
#> AIC: 20.09
#> 
#> Number of Fisher Scoring iterations: 2
model_performance(glm(PostDBP ~ Group + BaselineDBP + 
                      Age + Sex + z, data = temp))
#> # Indices of model performance
#> 
#> AIC    |    BIC | Nagelkerke's R2 |  RMSE | Sigma
#> -------------------------------------------------
#> 20.091 | 19.713 |           1.000 | 0.374 | 0.989
model_performance(glm(PostDBP ~ Group + rcs(BaselineDBP, 3) + 
                      rcs(Age, 3) + Sex + rcs(z, 3), 
                      data = temp))
#> # Indices of model performance
#> 
#> AIC      |      BIC | Nagelkerke's R2 |      RMSE | Sigma
#> ---------------------------------------------------------
#> -370.350 | -370.782 |           1.000 | 2.505e-13 |   Inf

We start off with some preliminary analyses to check what our model looks like when fit to these incomplete data. Because the glm() function automatically deletes rows with missing observations and performs a complete-case analysis, this gives us a sense of what our estimates would look like if we assumed the data were missing completely at random (MCAR), which is almost always unrealistic. It is usually more realistic to assume that the data are missing at random (MAR), \(\operatorname{Pr}\left(R=0 \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi\right)=\operatorname{Pr}\left(R=0 \mid Y_{\mathrm{obs}}, \psi\right)\), and this is often a reasonable starting assumption for the primary analysis.7 For that analysis, we include all available information (all the variables in our dataset) in the imputation model, impute the missing values using predictive mean matching, analyze each completed dataset, and pool the results.
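For reference, the pooling step follows Rubin's rules: if the \(m\) imputed datasets yield estimates \(\hat{\theta}_{j}\) with estimated within-imputation variances \(W_{j}\), the pooled estimate and its total variance are

\[
\bar{\theta}=\frac{1}{m} \sum_{j=1}^{m} \hat{\theta}_{j}, \qquad
\bar{W}=\frac{1}{m} \sum_{j=1}^{m} W_{j}, \qquad
B=\frac{1}{m-1} \sum_{j=1}^{m}\left(\hat{\theta}_{j}-\bar{\theta}\right)^{2}, \qquad
T=\bar{W}+\left(1+\frac{1}{m}\right) B,
\]

which is what pool() computes behind the scenes for each coefficient.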


The Primary Analysis


study$PostDBP <- temp$PostDBP

quickpred(temp)
#>             BaselineDBP PostDBP Age z Subject Group Sex
#> BaselineDBP           0       1   1 1       1     0   1
#> PostDBP               1       0   1 1       1     1   1
#> Age                   1       1   0 1       0     1   1
#> z                     0       1   1 0       1     1   1
#> Subject               0       0   0 0       0     0   0
#> Group                 0       0   0 0       0     0   0
#> Sex                   0       0   0 0       0     0   0

z1 <- parlmice(data = temp, method = "midastouch",
               m = 20, maxit = 10, 
               predictorMatrix = predMatrix,
               cluster.seed = 1031, n.core = 4, 
               n.imp.core = 5, cl.type = "FORK",
               ridge = 1e-04, remove.collinear = TRUE, 
               remove.constant = FALSE, allow.na = TRUE) 

densityplot(z1)

analysis1 <- with(z1, glm(PostDBP ~ Group + 
                          BaselineDBP + Age + Sex + z))
result1 <- pool(analysis1)
summary(result1)
#>          term  estimate std.error statistic     df   p.value
#> 1 (Intercept) 100.84364   38.1105   2.64608 15.091 1.826e-02
#> 2      GroupB   9.82931    1.4750   6.66374 13.789 1.157e-05
#> 3 BaselineDBP   0.01504    0.3850   0.03906 13.830 9.694e-01
#> 4         Age  -0.02399    0.1766  -0.13586  8.700 8.950e-01
#> 5        SexF   0.52096    1.0136   0.51398 14.787 6.149e-01
#> 6           z   5.08634    4.5593   1.11560  8.417 2.954e-01

We have now done our primary analysis, in which we imputed the missing data under the assumption of MAR. Unfortunately, we cannot verify from the observed data whether this assumption holds, but we can vary it and instead assume that the data are missing not at random (MNAR), \(\operatorname{Pr}\left(R=0 \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi\right)\), meaning that the probability of missingness depends on the unobserved values themselves, so that individuals with missing data may take systematically different values than those with complete data.



A Sensitivity Analysis


Suppose we now wish to handle these missing data under the assumption of MNAR. We would then be conducting a sensitivity analysis, because we are still targeting the same estimand and only varying the assumptions; the question and the target remain the same. To handle missing data that we assume are MNAR, there are a number of different, and often complex, approaches.

We start off with the δ-adjustment, controlled multiple imputation method,8 in which we assume that the group with missing observations differs systematically from the group with complete observations by some quantity δ. We specify a range of such quantities, add (or subtract) them from the imputed values, and then conduct our analysis on each adjusted, imputed dataset.
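Concretely, for a given \(\delta\), each value imputed under MAR is shifted before the analysis step,

\[
y_{\mathrm{imp}}^{\mathrm{MNAR}}=y_{\mathrm{imp}}^{\mathrm{MAR}}+\delta,
\]

so that \(\delta = 0\) reproduces the MAR analysis and increasingly large values of \(\delta\) encode increasingly severe departures from MAR for the unobserved outcomes. This is exactly what the post-processing step in the code below does.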


ini <- mice(temp, method = "midastouch", maxit = 0, 
            predictorMatrix = predMatrix)

ini$method["BaselineDBP"] <- "midastouch"
ini$method["Age"] <- "midastouch"
ini$method["Sex"] <- ""
ini$method["Group"] <- ""
ini$method["Subject"] <- ""
ini$method["z"] <- "midastouch"
meth <- ini$method

# Delta adjustments (in mmHg) to add to the imputed PostDBP values
delta <- c(0, 5, 10, 15, 20, 25, 30)

imp.all <- vector("list", length(delta))
post <- ini$post
for (i in 1:length(delta)) {
  d <- delta[i]
  # mice post-processing step: shift each imputed PostDBP value upward by delta
  cmd <- paste("imp[[j]][, i] <- imp[[j]][, i] +", d)
  post["PostDBP"] <- cmd
  imp <- parlmice(temp, method = meth, 
                  predictorMatrix = predMatrix,
                  m = 12, maxit = 10, 
                  cluster.seed = 1031, n.core = 4, 
                  n.imp.core = 3, cl.type = "FORK",
                  ridge = 1e-04, remove.collinear = TRUE, 
                  remove.constant = FALSE, allow.na = TRUE,
                  post = post, seed = i * 1031,  
                  printFlag = FALSE)
  imp.all[[i]] <- imp
}

output <- sapply(imp.all, 
                 function(x) pool(with(x, glm(PostDBP ~ Group + 
                                              BaselineDBP + Age + Sex + z))), 
                 simplify = "array", USE.NAMES = TRUE)
a <- (as.data.frame(t(output)))$pooled
r <- array(rep(NA, length(delta) * 6), 
           dim = c(length(delta), 6), dimnames = list(c(delta),
                                               c("Intercept", "Group", 
                                                 "BaselineDBP", "Age", 
                                                 "Sex", "z")))
for (i in 1:length(delta)) {
   r[i, ] <- cbind(a[[i]][["estimate"]])
}
r <- as.data.frame(r)

library(kableExtra)  # kbl() and the styling helpers used below

kbl(r, format = "html", escape = F, padding = 25, valign = "t",
  position = "t", align = "c", centering = T, longtable = T, color = "#777") %>%
  row_spec(row = 0, color = "#666",  bold = T) %>%
  column_spec(column = 1, color = "#666",  bold = T) %>%
  kable_classic(full_width = T, html_font = "Cambria", 
                lightable_options = c("basic", "hover", "responsive") , 
                font_size = 16, fixed_thead = list(enabled = F))  %>%
  kableExtra::scroll_box(width = "100%")
 δ    Intercept    Group    BaselineDBP      Age      Sex        z
 0        91.88    8.931         0.1507  -0.1611   0.8559   8.7732
 5       105.16    9.892        -0.0498   0.0621   0.7971   3.3268
10       114.08   11.742        -0.2232   0.3121   0.1908   1.0478
15       150.28   12.273        -0.5886   0.4564   0.8588   0.1938
20        94.93   11.165        -0.0216   0.2738   1.3412   6.2673
25        70.23   11.351         0.2007   0.2666   1.4047   4.4462
30        62.74   13.634         0.2252   0.3727   0.2044  -1.3675

The table above gives the various δ's (the first column, with values ranging from 0 to 30 in increments of 5) that were added to the imputed values in the multiply imputed datasets, and the corresponding coefficient estimates from the models fit to those datasets. So now we have conducted our first sensitivity analysis using the δ-adjustment, controlled multiple imputation method. This form of sensitivity analysis is also the one typically used and preferred by regulators.8
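In the spirit of the tipping-point analysis mentioned in the ICH addendum, a minimal sketch (assuming the `delta` vector and the data frame `r` created above) is to plot the pooled treatment coefficient against δ and look for the point, if any, at which the estimate crosses a decision-relevant threshold:


# A minimal tipping-point style display (assumes `delta` and `r` from above):
# plot the pooled Group (treatment) estimate against the delta adjustment.
plot(delta, r$Group, type = "b", pch = 16,
     xlab = "Delta added to imputed PostDBP values (mmHg)",
     ylab = "Pooled estimate for Group (B vs. A)")
abline(h = 0, lty = 2)  # 0 is used here purely as an illustrative "no effect" threshold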



Another Sensitivity Analysis


To further explore the MNAR assumption, we could also use a Heckman selection model \(P(Y,R)=P(Y)P(R|Y)\), as described by van Buuren here:


The selection model multiplies the marginal distribution \(P(Y)\) in the population with the response weights \(P(R|Y)\). Both \(P(Y)\) and \(P(R|Y)\) are unknown, and must be specified by the user. The model where \(P(Y)\) is normal and where \(P(R|Y)\) is a probit model is known as the Heckman model. This model is widely used in economics to correct for selection bias.


library(miceMNAR)  # generate_JointModelEq() and MNARargument() for Heckman-type imputation

JointModelEq <- generate_JointModelEq(data = temp, varMNAR = "PostDBP")
# Specify which variables enter the selection and outcome equations for PostDBP
JointModelEq[,"PostDBP_var_sel"] <- c(1,0,1,1,1,1,0)
JointModelEq[,"PostDBP_var_out"] <- c(1,0,1,1,0,1,0)

arg <- MNARargument(data = temp, varMNAR = "PostDBP", 
                    JointModelEq = JointModelEq)

arg$method["BaselineDBP"] <- "midastouch"
arg$method["Age"] <- "midastouch"
arg$method["Sex"] <- ""
arg$method["Group"] <- ""
arg$method["Subject"] <- ""
arg$method["z"] <- "midastouch"

imp1 <- parlmice(data = arg$data_mod,
                  method = arg$method,
                  predictorMatrix = arg$predictorMatrix,
                  JointModelEq = arg$JointModelEq,
                  control = arg$control,
                  maxit = 10, m = 20,
                  cluster.seed = 1031, n.core = 4, 
                  n.imp.core = 5, cl.type = "FORK",
                  ridge = 1e-04, remove.collinear = TRUE, 
                  remove.constant = FALSE, allow.na = TRUE)
densityplot(imp1)

analysis2 <- with(imp1, glm(PostDBP ~ Group + 
                            BaselineDBP + Age + Sex + z))
result2 <- pool(analysis2)
summary(result2)
#>          term  estimate std.error statistic     df   p.value
#> 1 (Intercept) 100.41890   38.6536   2.59792 15.819 1.955e-02
#> 2      GroupB   9.44191    1.4146   6.67445 17.688 3.193e-06
#> 3 BaselineDBP   0.03023    0.3856   0.07842 14.631 9.386e-01
#> 4         Age  -0.05397    0.1745  -0.30921  8.944 7.642e-01
#> 5        SexF   0.44827    0.8684   0.51621 26.469 6.100e-01
#> 6           z   6.42482    3.9082   1.64392 13.461 1.233e-01

So now we have conducted our second sensitivity analysis.



The results from the primary analysis, the first sensitivity analysis, and the second sensitivity analysis seem reasonably consistent. With the first sensitivity analysis, where we applied δ-adjustments, the coefficients changed drastically as the adjustments grew larger; however, such large δ-adjustments are not realistic, and the smaller, more plausible ones produced coefficients closer to those from the other two analyses.
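To make the comparison concrete, here is a small sketch (assuming the objects `result1`, `result2`, `r`, and `delta` created above) that collects the pooled treatment (Group) estimates from each analysis side by side:


# Collect the Group (treatment) estimates from the primary analysis, the
# delta-adjusted analyses, and the Heckman-type analysis
# (assumes result1, result2, r, and delta from above).
comparison <- data.frame(
  analysis = c("Primary (MAR, multiple imputation)",
               paste0("Delta-adjusted MI, delta = ", delta),
               "Heckman selection model (MNAR)"),
  group_estimate = c(summary(result1)$estimate[2],
                     r$Group,
                     summary(result2)$estimate[2])
)
comparison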


Supplementary Analyses


The following analyses are what I would mostly consider supplementary analyses: they may further investigate violations of certain assumptions, but they often go far beyond what would be necessary for a principled sensitivity analysis accompanying a primary analysis.


library(rstanarm)  # stan_glm() and the hs() regularized horseshoe prior

hs_prior <- hs(df = 3, global_df = 1, global_scale = 0.01, 
               slab_df = 4, slab_scale = 2.5)
summary(stan_glm(PostDBP ~ Group + BaselineDBP + 
                   Age + Sex + z, data = temp, refresh = 0, prior = hs_prior))
#> 
#> Model Info:
#>  function:     stan_glm
#>  family:       gaussian [identity]
#>  formula:      PostDBP ~ Group + BaselineDBP + Age + Sex + z
#>  algorithm:    sampling
#>  sample:       4000 (posterior sample size)
#>  priors:       see help('prior_summary')
#>  observations: 7
#>  predictors:   6
#> 
#> Estimates:
#>               mean   sd    10%   50%   90%
#> (Intercept) 109.2   34.1  92.1 105.8 128.8
#> GroupB        0.1    0.4  -0.1   0.0   0.2
#> BaselineDBP   0.0    0.3  -0.2   0.0   0.1
#> Age           0.0    0.2  -0.2   0.0   0.1
#> SexF          0.0    0.4  -0.1   0.0   0.1
#> z             0.0    0.4  -0.1   0.0   0.1
#> sigma         7.3    2.3   4.9   6.9  10.2
#> 
#> Fit Diagnostics:
#>            mean   sd    10%   50%   90%
#> mean_PPD 105.0    4.0 100.0 105.0 109.6
#> 
#> The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
#> 
#> MCMC diagnostics
#>               mcse Rhat n_eff
#> (Intercept)   0.6  1.0  3832 
#> GroupB        0.0  1.0  2220 
#> BaselineDBP   0.0  1.0  4021 
#> Age           0.0  1.0  3789 
#> SexF          0.0  1.0  3251 
#> z             0.0  1.0  3095 
#> sigma         0.0  1.0  3439 
#> mean_PPD      0.1  1.0  3883 
#> log-posterior 0.1  1.0  1325 
#> 
#> For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
summary(stan_glm(PostDBP ~ Group + BaselineDBP + 
                   Age + Sex + z, data = temp, refresh = 0))
#> 
#> Model Info:
#>  function:     stan_glm
#>  family:       gaussian [identity]
#>  formula:      PostDBP ~ Group + BaselineDBP + Age + Sex + z
#>  algorithm:    sampling
#>  sample:       4000 (posterior sample size)
#>  priors:       see help('prior_summary')
#>  observations: 7
#>  predictors:   6
#> 
#> Estimates:
#>               mean   sd    10%   50%   90%
#> (Intercept)  87.5  173.7 -96.7  83.6 286.8
#> GroupB        5.8    5.2   0.5   5.7  11.2
#> BaselineDBP   0.3    1.9  -1.9   0.3   2.3
#> Age          -0.3    1.0  -1.4  -0.4   0.8
#> SexF          3.5    3.8   0.4   3.5   6.5
#> z            37.9   24.8  11.7  38.4  63.4
#> sigma         2.8    2.5   0.8   2.0   5.6
#> 
#> Fit Diagnostics:
#>            mean   sd    10%   50%   90%
#> mean_PPD 105.0    2.1 103.2 105.0 106.8
#> 
#> The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
#> 
#> MCMC diagnostics
#>               mcse Rhat n_eff
#> (Intercept)   5.1  1.0  1140 
#> GroupB        0.2  1.0  1194 
#> BaselineDBP   0.1  1.0  1093 
#> Age           0.0  1.0  1133 
#> SexF          0.1  1.0  2333 
#> z             0.7  1.0  1107 
#> sigma         0.2  1.0   245 
#> mean_PPD      0.0  1.0  3029 
#> log-posterior 0.4  1.0   171 
#> 
#> For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Indeed, it is no surprise that the Bayesian model with rstanarm's default priors gives results similar to those from the base glm() fit, while a strongly regularizing prior such as the horseshoe with three degrees of freedom shrinks practically all of the coefficients toward zero. We now look at resampling and robust methods and the results they give.


library(boot)  # nonparametric bootstrap

# Refit the complete-case model on each bootstrap resample and return its coefficients
foo <- function(data, indices) {
  mod <- glm(PostDBP ~ Group + BaselineDBP + Age + Sex + z, 
             family = gaussian, data = data[indices, ])
  return(coef(mod))
}

boot(temp, foo, R = 1000, stype = "i", 
     sim = "ordinary", parallel = "multicore", ncpus = 4)
#> 
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> boot(data = temp, statistic = foo, R = 1000, sim = "ordinary", 
#>     stype = "i", parallel = "multicore", ncpus = 4)
#> 
#> 
#> Bootstrap Statistics :
#>     original  bias    std. error
#> t1*  77.0985      NA          NA
#> t2*   5.5234      NA          NA
#> t3*   0.4103      NA          NA
#> t4*  -0.3943      NA          NA
#> t5*   3.5777      NA          NA
#> t6*  39.3923      NA          NA
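As a complement to the bootstrap, and since robust methods were mentioned above, here is a hedged sketch of a robust (M-estimation) fit using MASS::rlm on a single completed dataset from the primary imputation object; the choice of rlm and of the first imputed dataset are illustrative assumptions on my part, not a pre-specified analysis.


# Illustrative robust regression (M-estimation via MASS::rlm) on the first
# completed dataset from the primary imputation object z1; a sketch only,
# not a pre-specified analysis.
library(MASS)
robust_fit <- rlm(PostDBP ~ Group + BaselineDBP + Age + Sex + z,
                  data = complete(z1, 1), maxit = 100)
summary(robust_fit)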

I hope it is clear from the examples above that it is quite easy to run additional analyses to explore whether the results you have obtained are trustworthy, but that it is quite difficult, and requires much thought, to conduct a careful and principled sensitivity analysis, one that considers realistic departures from assumptions and whether those departures would lead to substantial changes in results and decisions.

References


1. Mitjà O, Corbacho-Monné M, Ubals M, Alemany A, Suñer C, Tebé C, et al. (2020). ‘A Cluster-Randomized Trial of Hydroxychloroquine for Prevention of Covid-19.’ New England Journal of Medicine. 384:417–427. doi: 10.1056/NEJMoa2021801.
2. Mosteller F, Tukey JW. (1987). ‘Data analysis including statistics.’ In: The Collected Works of John W. Tukey: Philosophy and Principles of Data Analysis 1965-1986. CRC Press.
3. Akacha M, Bretz F, Ohlssen D, Rosenkranz G, Schmidli H. (2017). ‘Estimands and Their Role in Clinical Trials.’ Statistics in Biopharmaceutical Research. 9:268–271. doi: 10.1080/19466315.2017.1302358.
4. Permutt T. (2020). ‘Do Covariates Change the Estimand?’ Statistics in Biopharmaceutical Research. 12:45–53. doi: 10.1080/19466315.2019.1647874.
5. Mitroiu M, Oude Rengerink K, Teerenstra S, Pétavy F, Roes KCB. (2020). ‘A narrative review of estimands in drug development and regulatory evaluation: Old wine in new barrels?’ Trials. 21:671. doi: 10.1186/s13063-020-04546-1.
6. Morris TP, Kahan BC, White IR. (2014). ‘Choosing sensitivity analyses for randomised trials: principles.’ BMC Medical Research Methodology. 14:11. doi: 10.1186/1471-2288-14-11.
7. Buuren S van. (2018). ‘Flexible Imputation of Missing Data, Second Edition.’ CRC Press.
8. Permutt T. (2016). ‘Sensitivity analysis for missing data in regulatory submissions.’ Statistics in Medicine. 35:2876–2879. doi: 10.1002/sim.6753.
