Sensitivity analyses (**SA**) are common in trials and observational
studies, but often little thought is given to what they *should* entail.
None of this is surprising given that they are not usually taught in
traditional settings, although they have been discussed extensively
within the statistical and epidemiological
literature.^{1–7} Indeed, the **Handbook of
Missing Data Methodology**,^{8} the most authoritative book to date
on statistical methods for handling missing data, devotes several
chapters to sensitivity analyses and the different approaches available
for conducting them. Clearly, sensitivity analyses are an important
component of many statistical analyses, and are therefore worth understanding carefully.

This article will focus on what a sensitivity analysis is often assumed to be, based on practices in the literature; some differing points of view regarding sensitivity analyses; and a practical example of conducting principled sensitivity analyses in the context of missing data in a randomized clinical trial.

The first word of the phrase is an obvious tell: it suggests an
analysis that examines how sensitive or robust a result is. But a
natural question is, sensitive to what, exactly? Can I vary anything
in the analysis to test the sensitivity of the result? For example, if I
obtained an odds ratio of **1.7** from the primary analysis in my
experiment and I conducted an extra analysis to see whether the odds
ratio would change by resampling, is that a sensitivity analysis?

What if I did a completely different type of analysis? For example, if the primary analysis was a classical statistical test with an adjustment for multiple comparisons and I decided to run a hierarchical Bayesian regression with a concentrated prior on the null, would that be a sensitivity analysis? The possibilities of what can be varied are endless when a loose definition of sensitivity analysis is assumed, and conventions differ across disciplines and their cultures.

# Frequent Misconceptions

In clinical trials and observational studies in particular, it is
common to see the primary analysis being an
intent-to-treat (**ITT**) analysis and the sensitivity analysis being a
per-protocol (**PP**) analysis. The following is one such example from
a high-profile trial published in *The Lancet* in
2002,^{9}

The Multicentre Aneurysm Screening Study group randomised 67,800 men to receive an invitation to an abdominal ultrasound scan or not [6]. Of those invited to receive an abdominal scan, 20% did not accept. The primary analysis was by intention to treat, thus estimating the effect of being randomised to abdominal ultrasound. Another analysis investigated the complier average causal effect, which considers what the (average) effect of treatment was in patients who would have adhered to protocol however they were randomised [7].

To many, this may seem perfectly fine, even great. The authors used the
question they were primarily interested in as the main analysis, and
then conducted an additional analysis to see if these results are
consistent. Unfortunately, this is highly problematic, as Morris et al.
(2014)^{10}
describe:

These questions are different, and observing different results should not shake our confidence in either. The CACE analysis was a secondary analysis, not a sensitivity analysis.

It is common for authors to compare the results of intention-to-treat with per-protocol analysis; see for example [8, 9]. While it is hard to pin down the precise question of per-protocol analysis [10], this is clearly different to the question intention-to-treat addresses. Per-protocol analysis should not therefore be considered as a sensitivity analysis for intention-to-treat but as a secondary analysis, if at all.

As mentioned above, there are many possible ways to conduct a
sensitivity analysis, especially if the phrase is used in a loose/vague
way. Whether or not these are valid and principled approaches to
conducting sensitivity analyses is another question. Luckily, many of us
do not have to ponder the semantics of this day and night, because both
The Handbook of Missing Data Methodology and the **International
Council for Harmonisation of Technical Requirements for Pharmaceuticals
for Human Use** (**ICH**) have given this topic much thought. For the
latter, this is reflected by the recent addendum to its E9 guidance
document (*Statistical Principles for Clinical Trials*),
which has served as a reference for clinical trial statisticians for
decades. In the addendum, titled *Addendum on Estimands and Sensitivity
Analysis In Clinical
Trials*,
which has now been legally adopted by regulatory agencies around the
world including the FDA and EMA, they elaborate on the concept of
estimands and the role sensitivity analyses play.

To be clear, estimands are not new and have been discussed in the
statistical literature since Tukey^{11}
but the ICH working group’s addendum formalized their adoption in the
clinical trial world, along with the importance of sensitivity
analyses.^{12–14}

# Estimands & Sensitivity

A.5.2.1. Role of Sensitivity Analysis

Inferences based on a particular estimand should be robust to limitations in the data and deviations from the assumptions used in the statistical model for the main estimator. This robustness is evaluated through a sensitivity analysis. Sensitivity analysis should be planned for the main estimators of all estimands that will be important for regulatory decision making and labelling in the product information. This can be a topic for discussion and agreement between sponsor and regulator.

The statistical assumptions that underpin the main estimator should be documented. One or more analyses, focused on the same estimand, should then be pre-specified to investigate these assumptions with the objective of verifying whether or not the estimate derived from the main estimator is robust to departures from its assumptions. This might be characterised as the extent of departures from assumptions that change the interpretation of the results in terms of their statistical or clinical significance (e.g. tipping point analysis).

Distinct from sensitivity analysis, where investigations are conducted with the intent of exploring robustness of departures from assumptions, other analyses that are conducted in order to more fully investigate and understand the trial data can be termed “supplementary analysis” (see Glossary; A.5.3.). Where the primary estimand(s) of interest is agreed between sponsor and regulator, the main estimator is pre-specified unambiguously, and the sensitivity analysis verifies that the estimate derived is reliable for interpretation, supplementary analyses should generally be given lower priority in assessment.

A.5.2.2. Choice of Sensitivity Analysis

When planning and conducting a sensitivity analysis, altering multiple aspects of the main analysis simultaneously can make it challenging to identify which assumptions, if any, are responsible for any potential differences seen. It is therefore desirable to adopt a structured approach, specifying the changes in assumptions that underlie the alternative analyses, rather than simply comparing the results of different analyses based on different sets of assumptions. The need for analyses varying multiple assumptions simultaneously should then be considered on a case by case basis. A distinction between testable and untestable assumptions may be useful when assessing the interpretation and relevance of different analyses.

ICH E9 (R1) Guideline

The need for sensitivity analysis in respect of missing data is established and retains its importance in this framework. Missing data should be defined and considered in respect of a particular estimand (see A.4.). The distinction between data that are missing in respect of a specific estimand and data that are not directly relevant to a specific estimand gives rise to separate sets of assumptions to be examined in sensitivity analysis.

In other words, the guideline explicitly defines a sensitivity analysis as one that realistically varies the assumptions of the primary analysis while still targeting the same estimand, examines the robustness of the results to violations of those assumptions, and can possibly change the conclusions drawn. It contrasts this with more extensive analyses that investigate departures from assumptions more fully, characterizing those as supplementary rather than sensitivity analyses.

Similar views are echoed by the National Research Council’s advice on
clinical trials (National Research Council
2010)^{15}

Recommendation 15: Sensitivity analysis should be part of the primary reporting of findings from clinical trials.

Examining sensitivity to the assumptions about the missing data mechanism should be a mandatory component of reporting.

Now, back to the *Lancet* trial. The reason an **ITT** analysis and a
**PP** analysis cannot serve as primary and sensitivity analyses,
respectively, is that they target entirely *different
estimands*.^{10} Because they answer completely different questions,
they are simply two different analyses. The additional **PP** analysis,
although important to certain stakeholders, is better classified as a
supplementary analysis, not a sensitivity analysis.

Indeed, the following flowchart from Morris et al.
(2014)^{10}
is particularly useful as a guide to differentiate sensitivity analyses
from non-sensitivity analyses.

So what does a sensitivity analysis actually look like? Below, I walk through the analysis of a trial to give some examples.

# An Example From a Trial

I’ll use a sample clinical trial dataset, which is described here:

"This dataset is typical of diastolic blood pressure data measured in small clinical trials in hypertension from the mid-to-late 1960s and for approximately a decade thereafter.

During this time, hypertension was more severe, the number of effective treatments was relatively small, and the definition (DBP > 95 mmHg) of essential hypertension was not as stringent as it is now (DBP > 80 mmHg), as seen in the 1967 report from the Veterans Administration Cooperative Study Group on Antihypertensive Agents (VA Study Group, 1967). In Table 3.1, diastolic blood pressure (DBP) was measured (mmHg) in the supine position at baseline (i.e., DBP) before randomization… Patients’ age and sex were recorded at baseline and represent potential covariates.

The primary objective in this chapter in the analysis of this dataset is to test whether treatment A (new drug) may be effective in lowering DBP as compared to B (placebo) and to describe changes in DBP across the times at which it was measured."

For the sake of efficiency and ease, we will not be using the scripts and project setup that I discussed in my statistical workflow post, although I would urge others to do so, so that errors remain easy to catch. In particular, I would especially recommend the packages discussed in that post.

```
# Simulate Clinical Trial Data -------------------------------------
library(mvtnorm) # for rmvnorm()
set.seed(1031, kind = "Mersenne-Twister",
         normal.kind = "Inversion",
         sample.kind = "Rejection")
BaselineDBP <- rnorm(500, 117, 2)
Group <- rbinom(500, 1, 0.5)
Age <- rnorm(500, 50, 7)
R1 <- rnorm(500, 0, 1)
R2 <- rnorm(500, 1, 0.5)
errors <- rmvnorm(500, mean = c(0, 0),
                  sigma = matrix(c(1, 0.5, 0.5, 1),
                                 nrow = 2, byrow = TRUE))
PostDBP <- BaselineDBP + (Age * 0.33) +
  (Group * 9.7) + errors[, 1]
```

I have made some minor adjustments to this dataset and also generated a
new variable (`Z`) based on the existing ones in the dataset that is
strongly correlated with the outcome of interest (`PostDBP`).

```
# Constructing Covariates -------------------------------------
n <- length(BaselineDBP)                   # length of vector
rho <- 0.7                                 # desired correlation
theta <- acos(rho)                         # corresponding angle
x1 <- BaselineDBP                          # fixed given data
x2 <- rnorm(n, 60, 10)                     # new random data
X <- cbind(x1, x2)                         # matrix
Xctr <- scale(X, center = TRUE,
              scale = FALSE)               # centered columns
Id <- diag(n)                              # identity matrix
Q <- qr.Q(qr(Xctr[, 1, drop = FALSE]))     # QR-decomposition
P <- tcrossprod(Q)                         # projection onto span of x1
x2o <- (Id - P) %*% Xctr[, 2]              # x2 made orthogonal to x1
Xc2 <- cbind(Xctr[, 1], x2o)               # bind to matrix
Y1 <- Xc2 %*% diag(1 / sqrt(colSums(Xc2^2)))  # scale to unit length
Z <- Y1[, 2] + (1 / tan(theta)) * Y1[, 1]  # final new vector
BP_full <- data.frame(PostDBP, BaselineDBP,
                      Group, Age, Z, R1, R2)
```

```
# Explore Data -------------------------------------
str(BP_full)
#> 'data.frame': 500 obs. of 7 variables:
#> $ PostDBP : num 148 134 130 139 133 ...
#> $ BaselineDBP: num 119 117 117 116 118 ...
#> $ Group : int 1 0 0 1 0 0 0 1 1 0 ...
#> $ Age : num 59.3 54.3 44.3 46.1 51.7 ...
#> $ Z : num 0.00547 0.06653 0.07261 -0.02679 0.05205 ...
#> $ R1 : num -0.921 -0.359 -0.774 -0.195 0.864 ...
#> $ R2 : num 0.203 1.385 0.999 0.637 1.165 ...
summary(BP_full)
#> PostDBP BaselineDBP Group Age Z R1 R2
#> Min. :124.8 Min. :110.7 Min. :0.000 Min. :30.79 Min. :-0.188210 Min. :-3.13501 Min. :-0.6206
#> 1st Qu.:133.6 1st Qu.:115.7 1st Qu.:0.000 1st Qu.:46.17 1st Qu.:-0.040797 1st Qu.:-0.68427 1st Qu.: 0.6348
#> Median :138.2 Median :117.0 Median :1.000 Median :50.11 Median :-0.001858 Median : 0.02296 Median : 0.9880
#> Mean :138.5 Mean :117.0 Mean :0.506 Mean :50.47 Mean : 0.000000 Mean :-0.02118 Mean : 0.9916
#> 3rd Qu.:143.3 3rd Qu.:118.3 3rd Qu.:1.000 3rd Qu.:55.56 3rd Qu.: 0.040447 3rd Qu.: 0.72599 3rd Qu.: 1.3382
#> Max. :154.7 Max. :122.1 Max. :1.000 Max. :75.82 Max. : 0.258358 Max. : 2.67130 Max. : 2.4550
ztable(head(BP_full, 10))
```

PostDBP | BaselineDBP | Group | Age | Z | R1 | R2 |
---|---|---|---|---|---|---|
148.18 | 118.72 | 1 | 59.28 | 0.01 | -0.92 | 0.20 |
134.34 | 117.38 | 0 | 54.32 | 0.07 | -0.36 | 1.38 |
129.92 | 116.74 | 0 | 44.30 | 0.07 | -0.77 | 1.00 |
138.80 | 116.02 | 1 | 46.13 | -0.03 | -0.19 | 0.64 |
133.35 | 117.85 | 0 | 51.72 | 0.05 | 0.86 | 1.16 |
139.83 | 119.93 | 0 | 55.73 | 0.09 | -0.73 | 1.46 |
133.89 | 117.30 | 0 | 44.52 | -0.03 | 1.57 | 0.46 |
138.24 | 113.28 | 1 | 45.70 | -0.09 | -0.31 | 1.18 |
139.87 | 115.34 | 1 | 48.75 | -0.03 | -1.43 | 1.24 |
135.31 | 118.42 | 0 | 53.10 | 0.03 | 0.25 | -0.24 |

Now, we can quickly explore the other characteristics of the dataset, and first we look at the rank correlations between the predictors and the response variable.

```
# Inspect Data Attributes -------------------------------------
library(Hmisc)  # for spearman2()
library(scales) # for alpha()
study_formula <- as.formula(PostDBP ~ .)
plot(spearman2(study_formula, BP_full), pch = 19,
     cex = 1, col = alpha("#d46c5b", 0.95))
abline(v = 0, col = alpha("#d46c5b", 0.75), lty = 3)
```

We also examine a correlation matrix between the predictors of interest.

We will not be looking at the estimates from the fitted model yet. We are mainly looking at a few characteristics of this dataset before we move on to the next step.

Now that we have familiarized ourselves with this dataset, we notice
that there are missing data (which I actually generated below using a
custom function). Supposing we did not know how the missing data were
generated, we might suspect or start with the assumption that the data
are missing at random (**MAR**).

I do not plan on covering what the different missing data mechanisms (MCAR, MAR, MNAR) described by Rubin (1978) are, although familiarity with them is necessary to better understand the approaches used throughout this post.
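As a rough illustration of the three mechanisms (a toy sketch of my own, not the post's code, using an arbitrary covariate `x` and outcome `y`), missingness can be generated so that it depends on nothing, on observed data only, or on the unobserved values themselves:

```r
# Toy illustration of Rubin's three missingness mechanisms
set.seed(1031)
n <- 1000
x <- rnorm(n)       # always-observed covariate
y <- x + rnorm(n)   # outcome subject to missingness

r_mcar <- rbinom(n, 1, 0.3)             # MCAR: unrelated to x and y
r_mar  <- rbinom(n, 1, plogis(-1 + x))  # MAR: depends only on observed x
r_mnar <- rbinom(n, 1, plogis(-1 + y))  # MNAR: depends on unobserved y

y_mcar <- ifelse(r_mcar == 1, NA, y)
y_mar  <- ifelse(r_mar  == 1, NA, y)
y_mnar <- ifelse(r_mnar == 1, NA, y)
```

Under MNAR, the records that go missing are systematically different (here, higher values of `y` are more likely to be lost), which is exactly what a complete-case analysis cannot see.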

Of course, here we do have an idea of what the missing data mechanism is, but we shall assume the role of a data analyst who has just been given a dataset with missing values in the outcome by a clinical colleague.

```
# Generate Missing Data -------------------------------------
set.seed(1031)
# lesslikely() is a custom function (defined outside this post)
# that generates the missingness indicator
Ry <- ifelse(lesslikely() > 0, 1, 0)
PostDBP[Ry == 0] <- NA
BP_miss <- data.frame(PostDBP, BaselineDBP,
                      Group, Age, Z, R1, R2)
str(BP_miss)
#> 'data.frame': 500 obs. of 7 variables:
#> $ PostDBP : num NA 134 130 139 133 ...
#> $ BaselineDBP: num 119 117 117 116 118 ...
#> $ Group : int 1 0 0 1 0 0 0 1 1 0 ...
#> $ Age : num 59.3 54.3 44.3 46.1 51.7 ...
#> $ Z : num 0.00547 0.06653 0.07261 -0.02679 0.05205 ...
#> $ R1 : num -0.921 -0.359 -0.774 -0.195 0.864 ...
#> $ R2 : num 0.203 1.385 0.999 0.637 1.165 ...
sum(is.na(BP_miss))
#> [1] 258
```

```
library(VIM) # for aggr() and histMiss()
missing_col <- c("#3f8f9b", "#d46c5b")
aggr(BP_miss, col = alpha(missing_col, 0.60),
     plot = TRUE, prop = F, numbers = T)
```

```
par(mfrow = c(2, 2))
histMiss(BP_miss[, c("Age", "PostDBP")],
         only.miss = TRUE,
         col = alpha(c("#d46c5b", "gray"), 0.65))
histMiss(BP_miss[, c("Z", "PostDBP")],
         only.miss = TRUE,
         col = alpha(c("#d46c5b", "gray"), 0.65))
histMiss(BP_miss[, c("R1", "PostDBP")],
         only.miss = TRUE,
         col = alpha(c("#d46c5b", "gray"), 0.65))
histMiss(BP_miss[, c("R2", "PostDBP")],
         only.miss = TRUE,
         col = alpha(c("#d46c5b", "gray"), 0.65))
```

So now, we must examine this missing dataset and its characteristics and look for any systematic differences. We also examine the proportion of missingness and the influx and outflux patterns.

```
# Examine Flux Patterns -------------------------------------
round(fico(BP_miss), 1)[2:6]
#> BaselineDBP Group Age Z R1
#> 0.5 0.5 0.5 0.5 0.5
ztable(flux(BP_miss))
```

 | POBS | Influx | Outflux | AINB | AOUT | FICO |
---|---|---|---|---|---|---|
PostDBP | 0.48 | 0.48 | 0 | 1 | 0.00 | 0.00 |
BaselineDBP | 1.00 | 0.00 | 1 | 0 | 0.09 | 0.52 |
Group | 1.00 | 0.00 | 1 | 0 | 0.09 | 0.52 |
Age | 1.00 | 0.00 | 1 | 0 | 0.09 | 0.52 |
Z | 1.00 | 0.00 | 1 | 0 | 0.09 | 0.52 |
R1 | 1.00 | 0.00 | 1 | 0 | 0.09 | 0.52 |
R2 | 1.00 | 0.00 | 1 | 0 | 0.09 | 0.52 |

```
fluxplot(BP_miss, labels = FALSE, pch = 19,
col = "#d46c5b", cex = .95,
main = "Influx-Outflux Pattern")
```

# Exploratory Analyses

We start off by doing some preliminary analyses to eyeball differences between the observed data and the amputed data.


We quickly fit a model to these missing data using the `rlm()`
function from the `MASS` package and glance at what our
estimates look like.

```
# Exploratory Model Estimates -------------------------------------
library(MASS) # for rlm()
eda_1 <- rlm(PostDBP ~ Group + BaselineDBP +
               Age + Z + R1 + R2, data = BP_miss)
ztable(summary(eda_1)[["coefficients"]])
```

 | Value | Std. Error | t value |
---|---|---|---|
(Intercept) | -7.48 | 5.47 | -1.37 |
Group | 9.72 | 0.13 | 74.39 |
BaselineDBP | 1.05 | 0.05 | 22.56 |
Age | 0.35 | 0.01 | 38.61 |
Z | -1.76 | 1.49 | -1.18 |
R1 | -0.18 | 0.07 | -2.63 |
R2 | -0.20 | 0.13 | -1.59 |

`ztable(performance(eda_1))`

AIC | BIC | RMSE | Sigma |
---|---|---|---|
669.45 | 697.36 | 0.93 | 0.95 |

Because the `MASS::rlm()` function will automatically delete missing
observations and perform a complete-case analysis, we can gauge what
our estimates would look like if we assume that the data are missing
completely at random (**MCAR**), which is almost always unrealistic.

It is almost always more realistic to assume that the data are missing
at random (**MAR**),
\(\operatorname{Pr}\left(R=0 \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi\right)=\operatorname{Pr}\left(R=0 \mid Y_{\mathrm{obs}}, \psi\right)\).
This can often be a reasonable assumption to start off with for the
primary analysis.^{16}

# The Analysis Workflow

For our main analysis, we will build an imputation model to impute the missing data. We include all available information in the imputation model, meaning all the variables in our dataset, impute the missing data using the predictive mean matching method, and then analyze each imputed dataset and pool the results.

We do this with a fully-adjusted model, with all relevant covariates and
a reduced model with just the treatment group. This is because we want
both the adjusted and unadjusted estimates. We will then compare these
models using the likelihood-ratio
test^{17}
to see whether there are any substantial differences between the
fully-adjusted and reduced model.

# MI Power Analysis

However, before we can move on to this step, we must determine how many imputations to construct. To many, this may seem like a trivial or even nonsensical task, when anyone can impute a dataset with the click of a button and request any number of imputed datasets they wish.

The issue is that without giving careful thought to how many imputations are needed for a particular study, one runs the risk of imputing too few datasets and losing information, or of wasting resources imputing far more than necessary. The latter may seem harmless; however, as many authors have argued, it often yields no advantage whatsoever and only incurs costs.

The argument is that “the additional resources that would be required to create and store more than a few imputations would not be well spent” (Schafer 1997, 107), and “in most situations there is simply little advantage to producing and analyzing more than a few imputed datasets” (Schafer and Olsen 1998, 549).

Therefore, we will examine certain characteristics of our dataset with missing values and use them in an analysis to determine how many datasets we should impute. We will base this off a number of characteristics such as the fraction of missing information, the proportion of missing observations, and the losses that we are willing to incur.

We first start by running a “dry” or naive imputation of our data to quickly examine some of these characteristics. This imputed dataset will not be used for our primary analysis; it simply serves to guide our main imputation model. This is also why we have set the number of iterations to 0.

```
# Dry Imputation -------------------------------------
library(mice) # for mice(), make.predictorMatrix(), pool()
form <- list(PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2)
pred <- make.predictorMatrix(BP_miss)
mice(BP_miss, method = "pmm", m = 20,
     maxit = 0, cores = 4, donors = 5,
     seed = 1031, formulas = form, ridge = 0,
     predictorMatrix = pred, remove.collinear = FALSE,
     remove.constant = FALSE, allow.na = FALSE,
     printFlag = FALSE, visitSequence = "monotone") -> init_imp
head(init_imp$loggedEvents, 10)
#> NULL
```

Now that we have imputed our dataset, we may examine it for any issues before moving onto the next step of analyzing it.

```
str(complete(init_imp, 1))
#> 'data.frame': 500 obs. of 7 variables:
#> $ PostDBP : num 147 134 130 139 133 ...
#> $ BaselineDBP: num 119 117 117 116 118 ...
#> $ Group : int 1 0 0 1 0 0 0 1 1 0 ...
#> $ Age : num 59.3 54.3 44.3 46.1 51.7 ...
#> $ Z : num 0.00547 0.06653 0.07261 -0.02679 0.05205 ...
#> $ R1 : num -0.921 -0.359 -0.774 -0.195 0.864 ...
#> $ R2 : num 0.203 1.385 0.999 0.637 1.165 ...
densityplot(init_imp, plot.points = TRUE,
theme = aesthetics, col = col)
```

`bwplot(init_imp, PostDBP, col = col)`

Although it is only a dry run, there seem to be few issues with it.

```
init_res <- pool(with(init_imp,
                      rlm(PostDBP ~ Group +
                            BaselineDBP + Age +
                            Z + R1 + R2)))
init_res_sum <- summary(init_res)
colnames(init_res_sum) <- c("Term", "Estimate", "SE",
                            "Statistic", "df", "P-val")
ztable(init_res_sum)
```

Term | Estimate | SE | Statistic | df | P-val |
---|---|---|---|---|---|
(Intercept) | 35.07 | 23.38 | 1.50 | 61.57 | 0.14 |
Group | 6.46 | 0.64 | 10.18 | 43.87 | 0.00 |
BaselineDBP | 0.74 | 0.20 | 3.72 | 61.17 | 0.00 |
Age | 0.26 | 0.04 | 7.19 | 122.38 | 0.00 |
Z | -4.12 | 6.40 | -0.64 | 60.56 | 0.52 |
R1 | -0.01 | 0.26 | -0.05 | 94.39 | 0.96 |
R2 | 0.08 | 0.54 | 0.15 | 72.50 | 0.88 |

Now that we have our pooled estimates, we can quickly examine certain characteristics such as the fraction of missing information (FMI), the between-imputation variance, and the within-imputation variance. We focus on these particular quantities because they are crucial for several tasks in any statistical analysis plan, such as having enough power/precision, proper long-run coverage, etc.

We use the FMI from the dry imputation and a number of other criteria to determine how many imputations we need:

- Some have suggested taking the FIMO and multiplying by 100 to obtain the number of imputed datasets needed

\[m \approx 100 \lambda\]

- we wish to ensure that the monte-carlo standard error from the imputations is less than 10% of the between-imputation standard error

\[\mathrm{MCSE} < 0.10 \times \widehat{\mathrm{SE}}_{B}\]

- we wish to choose enough imputed datasets that the monte-carlo error, obtained by dividing the fraction of missing information by the number of imputed datasets, is roughly 0.01

\[\mathrm{FMI} / m \approx 0.01\]

- we also wish to minimize the total variance so that the square-root of it is no greater than the ideal variance and the corresponding confidence interval width.

\[T_{m}=\left(1+\frac{\gamma_{0}}{m}\right) T_{\infty}\]

- Furthermore, missing data experts commonly advocate imputing a number of datasets equal or similar to the percentage of missing observations, \(100\,\gamma_{0}\)
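As a rough numeric sketch of some of the criteria above (the FMI and between-imputation variance used here are illustrative placeholders, not estimates from this trial):

```r
# Illustrative placeholder values (not results from the trial)
fmi <- 0.5   # fraction of missing information (lambda)
B   <- 0.4   # between-imputation variance from a dry run

m_rule1 <- ceiling(100 * fmi)   # rule of thumb: m ~ 100 * lambda
m_rule2 <- ceiling(fmi / 0.01)  # from FMI / m ~ 0.01  =>  m = FMI / 0.01

# monte-carlo standard error of the pooled estimate for a candidate m
mcse <- function(B, m) sqrt(B / m)
```

With an FMI of 0.5, both rules point to roughly 50 imputations, and `mcse()` shows how the monte-carlo error shrinks as `m` grows.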

We start by extracting the FMI’s from our dry imputation and calculating all these numbers.

```
fmi <- max(summary(init_res, type = c("all"),   # maximum FMI
                   conf.int = TRUE, conf.level = 0.95)["fmi"])
m <- fmi / 0.01                                 # m from FMI / m ~ 0.01
mcse <- sqrt(max(init_res$pooled$b) / m)        # monte-carlo SE
se <- max((init_res$pooled$ubar) / sqrt(500))   # SE
```

What we find is that the fraction of missing information is nearly 50%, which is very high, and that to get a monte-carlo error of less than 0.01, we would need at least 50 imputed datasets. We also calculate both the monte-carlo standard error and the standard error from the dry run and find that they are practically equivalent.

However, we must also explore other possibilities for the number of imputed datasets to reduce the monte-carlo errors, so we run a quick function that checks the effect of the number of imputed datasets on the width of the confidence interval, which is directly tied to the monte-carlo error.

Indeed, Von Hippel (2018) proposed a relationship between the fraction of missing information and the number of imputations needed to achieve a targeted CI length or monte-carlo error using a quadratic formula,

\[m=1+\frac{1}{2}\left(\frac{\gamma_{0}}{\operatorname{CV}\left(\sqrt{U_{\ell}}\right)}\right)^{2}\]

where \(\operatorname{CV}\left(\sqrt{U_{\ell}}\right)=\mathrm{SD}\left(\sqrt{U_{\ell}}\right) / \mathrm{E}\left(\sqrt{U_{\ell}}\right)\) is the coefficient of variation (CV), summarizing the imputation variation in the SE estimates. We can construct a function to depict this relationship, calculate how many imputations we need for a given desired error, and display the distribution of this relationship by varying these parameters.

In the function below, adapted from von Hippel and van
Buuren,^{16} I vary both the
coefficient of variation and the \(\alpha\)-level.

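A minimal sketch of what such a function might look like (the function and argument names here are my own; the author's `powerimp()` used below may differ, e.g., by also incorporating the \(\alpha\)-level):

```r
# Hypothetical sketch of von Hippel's (2018) quadratic rule:
# m = 1 + (1/2) * (gamma / cv)^2, where gamma is the fraction of
# missing information and cv is the targeted coefficient of
# variation of the standard error estimates across imputations
vonhippel_m <- function(gamma, cv) {
  ceiling(1 + 0.5 * (gamma / cv)^2)
}

vonhippel_m(gamma = 0.5, cv = 0.05)
#> [1] 51
```

Halving the tolerated `cv` roughly quadruples the number of imputations required, which is why targeting very small monte-carlo errors quickly becomes expensive.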
With this function, we can now graphically examine the effects of different numbers of imputations on monte-carlo errors

We can see above that as we aim for smaller monte-carlo errors, we need many more imputations to achieve our desired target.

Now that I have constructed and graphed three differing relationships between monte carlo errors and number of imputations needed to achieve those errors, and varied the \(\alpha\)-levels, we can now move onto choosing a specific number of imputations to construct.

If I choose an \(\alpha\)-level of 5%, which is the maximum tolerable type-I error rate, and a coefficient of variation of 2.5%,

```
# Power || Information Analysis for Number of Imputations
powerimp(model = init_res, cv = 0.05, alpha = 0.01)
#> [1] 121
```

I need to construct approximately 120 imputations to achieve this coefficient of variation, which is what I will use for all the analyses from here on.

# The Primary Analysis

For the primary analysis, I will use the predictive-mean matching method
and specify 5 donors (based on a number of simulation studies that have
examined the effect of various donors
specified)^{18}, and fit a robust
regression that includes all the variables within the dataset. Our
imputation model will be

\[Y^{\operatorname{Post}} = \beta_{0} + \beta_{1}^{\operatorname{Group}} + \beta_{2}^{\operatorname{Base}} + \beta_{3}^{\operatorname{Age}} + \beta_{4}^{\operatorname{Z}} + \beta_{5}^{\operatorname{R1}} + \beta_{6}^{\operatorname{R2}} + \epsilon\]

```
# Impute Missing Data -------------------------------------
form <- list(PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2)
pred <- make.predictorMatrix(BP_miss)
mice(BP_miss, method = "pmm",
     predictorMatrix = pred,
     m = 20, maxit = 10, seed = 1031,
     cores = 8, formulas = form, ridge = 0,
     donors = 5, remove.collinear = FALSE,
     remove.constant = FALSE, allow.na = FALSE,
     printFlag = FALSE, visitSequence = "monotone") -> imp1
head(imp1$loggedEvents, 10)
#> NULL
```

Once again, we examine the imputed datasets for any issues before moving on to analyzing them.

Our imputed datasets seem to look fine so far. We now fit our models to each of these datasets and combine them using Rubin’s rules.
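For reference, Rubin's rules combine the \(m\) per-dataset estimates \(\hat{Q}_{\ell}\) and their within-imputation variances \(U_{\ell}\) as

\[\bar{Q}=\frac{1}{m} \sum_{\ell=1}^{m} \hat{Q}_{\ell}, \qquad T=\bar{U}+\left(1+\frac{1}{m}\right) B,\]

where \(\bar{U}\) is the average within-imputation variance, \(B=\frac{1}{m-1} \sum_{\ell=1}^{m}\left(\hat{Q}_{\ell}-\bar{Q}\right)^{2}\) is the between-imputation variance, and \(\sqrt{T}\) is the pooled standard error.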

```
# Primary Analysis -------------------------------------
analysis1 <- with(imp1, rlm(PostDBP ~ Group +
                              BaselineDBP + Age + Z + R1 + R2))
analysis2 <- with(imp1, rlm(PostDBP ~ Group))
anova(analysis1, analysis2, method = "D3", use = "LR")
#> test statistic df1 df2 dfcom p.value riv
#> 1 ~~ 2 0 5 95.65188 493 1 273.8017
result1 <- pool(analysis1)
results1_sum <- summary(result1)
colnames(results1_sum) <- c("Term", "Estimate", "SE",
                            "Statistic", "df", "P-val")
ztable(results1_sum)
```

Term | Estimate | SE | Statistic | df | P-val |
---|---|---|---|---|---|
(Intercept) | -7.46 | 5.44 | -1.37 | 52.70 | 0.18 |
Group | 9.68 | 0.12 | 78.33 | 68.20 | 0.00 |
BaselineDBP | 1.05 | 0.05 | 22.46 | 50.57 | 0.00 |
Age | 0.35 | 0.01 | 39.44 | 67.78 | 0.00 |
Z | -1.94 | 1.72 | -1.13 | 35.92 | 0.27 |
R1 | -0.17 | 0.07 | -2.58 | 50.12 | 0.01 |
R2 | -0.16 | 0.15 | -1.09 | 38.13 | 0.28 |

While we have our primary analysis estimates, we continue to examine our imputations.

```
ps <- rep(rowMeans(sapply(analysis1$analyses, fitted.values)),
          imp1$m + 1)
xyplot(imp1, PostDBP ~ ps | as.factor(.imp),
       xlab = "Average fitted value across analyses",
       ylab = "PostDBP", pch = c(1, 19), col = col)
```

We’ve done our primary analysis, in which we imputed the missing data
under the assumption of **MAR**. Unfortunately, we cannot verify
whether this assumption is true, but we can vary it and assume instead
that the data are missing not at random (**MNAR**),
\(\operatorname{Pr}\left(R=0 \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi\right)\),
so that the individuals with missing data may take systematically
different values than those with complete data. Our likelihood-ratio
test also suggests that there is very little difference between the
full model and the reduced model, so we will generally go with the
full model.

# SA I: Controlled MI

Now suppose we wish to handle these missing data under the assumption
of **MNAR**; we would then be conducting a sensitivity analysis,
because we are still targeting the same estimand and only varying the
assumptions. The question/target remains the same. To handle missing
data that we assume are **MNAR**, there are a number of different and
complex approaches.

We start off with the \(\delta\)-adjustment, controlled multiple
imputation method,^{19} in which we assume that the group with missing
observations differs systematically from the group with complete
observations by a certain quantity. We produce a range of these
quantities, add them to (or subtract them from) the imputed values, and
then conduct our analysis on the adjusted, imputed datasets. Here I
will assume that the average `PostDBP` in the group with missing
values differs from that of the group with complete observations by as
much as 5 mmHg in either direction.

```
# Controlled Multiple Imputation -------------------------------------
ini <- mice(BP_miss, method = "pmm", maxit = 0,
            predictorMatrix = pred)
ini$method["BaselineDBP"] <- ""
ini$method["PostDBP"] <- "pmm"
ini$method["Age"] <- ""
ini$method["R1"] <- ""
ini$method["Group"] <- ""
ini$method["R2"] <- ""
ini$method["Z"] <- ""
meth <- ini$method
delta <- c(-5:5)
imp.all <- vector("list", length(delta))
post <- ini$post
for (i in seq_along(delta)) {
  d <- delta[i]
  # post-processing: shift each imputed PostDBP value by delta
  cmd <- paste("imp[[j]][, i] <- imp[[j]][, i] +", d)
  post["PostDBP"] <- cmd
  imp <- mice(BP_miss, method = meth,
              predictorMatrix = pred,
              m = 5, maxit = 5, donors = 5,
              cores = 4, ridge = 0, formulas = form,
              remove.collinear = FALSE,
              remove.constant = FALSE,
              allow.na = FALSE, post = post,
              seed = i * 1031, printFlag = FALSE)
  imp.all[[i]] <- imp
}
```

Now that we have our imputed datasets that are modified by \(\delta\)-adjustments, we can fit our models to each imputed dataset and combine the estimates using Rubin’s rules.

```
# Sensitivity Analysis 1 -------------------------------------
output <- sapply(imp.all,
                 function(x) pool(with(x, rlm(PostDBP ~ Group +
                   BaselineDBP + Age + Z + R1 + R2))),
                 simplify = "array", USE.NAMES = TRUE)
a <- (as.data.frame(t(output)))$pooled
r <- array(rep(NA, length(delta) * 7),
           dim = c(length(delta), 7),
           dimnames = list(c(delta),
                           c("Intercept", "Group",
                             "BaselineDBP", "Age",
                             "Z", "R1", "R2")))
for (i in seq_along(delta)) {
  r[i, ] <- cbind(a[[i]][["estimate"]])
}
r <- as.data.frame(r)
r <- data.frame(Delta = row.names(r), r, row.names = NULL)
ztable(r)
```
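The pooling step that `pool()` performs above follows Rubin’s rules: average the per-imputation estimates, and combine within- and between-imputation variance. A minimal language-agnostic sketch (in Python, with hypothetical estimates rather than the actual model output):

```python
# Rubin's rules for pooling estimates from m imputed datasets.
# The estimates and variances below are made up, for illustration only.

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates Q_i and their squared
    standard errors U_i using Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                    # pooled point estimate
    u_bar = sum(variances) / m                    # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation
    t = u_bar + (1 + 1 / m) * b                   # total variance
    return q_bar, t

# Five hypothetical estimates of a Group coefficient with their variances:
q, t = pool_rubin([9.6, 9.7, 9.65, 9.72, 9.68],
                  [0.012, 0.011, 0.013, 0.012, 0.010])
print(round(q, 3), round(t, 4))
```

The total variance inflates the within-imputation variance by the between-imputation spread, which is how the uncertainty due to the missing data propagates into the pooled standard errors.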

Delta | Intercept | Group | BaselineDBP | Age | Z | R1 | R2 |
---|---|---|---|---|---|---|---|
-5 | -0.43 | 9.68 | 0.99 | 0.32 | -0.89 | -0.12 | -0.11 |
-4 | -3.38 | 9.63 | 1.01 | 0.32 | -1.69 | -0.14 | -0.10 |
-3 | -0.84 | 9.71 | 0.99 | 0.33 | -1.14 | -0.15 | -0.21 |
-2 | -3.53 | 9.67 | 1.02 | 0.34 | -1.08 | -0.15 | -0.17 |
-1 | -4.46 | 9.64 | 1.03 | 0.35 | -1.40 | -0.20 | -0.22 |
0 | -11.04 | 9.68 | 1.08 | 0.35 | -3.46 | -0.17 | -0.13 |
1 | -6.74 | 9.68 | 1.05 | 0.36 | -1.69 | -0.20 | -0.17 |
2 | -11.15 | 9.64 | 1.09 | 0.36 | -2.61 | -0.16 | -0.18 |
3 | -13.48 | 9.59 | 1.11 | 0.37 | -3.04 | -0.18 | -0.19 |
4 | -14.59 | 9.74 | 1.12 | 0.37 | -3.63 | -0.16 | -0.17 |
5 | -14.85 | 9.64 | 1.13 | 0.38 | -3.63 | -0.23 | -0.19 |

The table above gives the various \(\delta\)’s (the first column, with
values ranging from -5 to 5 in increments of 1) that were added to the
multiply imputed datasets, and the corresponding estimates from the
models fit to those datasets. So now we’ve conducted our first
sensitivity analysis using the \(\delta\)-adjustment, controlled multiple
imputation method. This form of sensitivity analysis is also typically
what is used and preferred by regulators.^{19}

> Regulators prefer simple methods that impute the missing outcomes under MAR, and then add an adjustment δ to the imputes, while varying δ over a plausible range and independently for each treatment group (Permutt 2016). The most interesting scenarios will be those where the difference between the δ’s correspond to the size of the treatment effect in the completers. Contours of the p-values may be plotted on a graph as a function of the δ’s to assist in a tipping-point analysis (Liublinska and Rubin 2014).

And as expected, \(\delta\) values near zero gave us results that were consistent with the primary analysis, while the more extreme \(\delta\) values gave us estimates that deviated substantially.
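The tipping-point idea mentioned above can be sketched as a simple scan over δ: find the smallest adjustment at which the interval estimate for the treatment effect first crosses zero. The numbers below are entirely hypothetical and only illustrate the logic:

```python
# Tipping-point scan: find the smallest delta at which the 95% interval
# for the treatment effect first includes zero. All numbers hypothetical.

def tipping_point(deltas, estimates, ses, z=1.96):
    """Return the first delta whose interval crosses zero, or None."""
    for d, est, se in zip(deltas, estimates, ses):
        lo, hi = est - z * se, est + z * se
        if lo <= 0 <= hi:
            return d
    return None

deltas = [0, 5, 10, 15, 20]
estimates = [9.7, 7.1, 4.4, 1.9, -0.6]   # effect shrinks as delta grows
ses = [0.5, 0.7, 1.0, 1.2, 1.5]
print(tipping_point(deltas, estimates, ses))  # → 15
```

If the tipping point sits at a δ far larger than anything clinically plausible, the conclusion is considered robust to the missing-data assumptions.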

Next, we quickly examine the state of a random sample of these imputations to make sure they’re reliable.

```
# Examine Imputations -------------------------------------
l <- sample(seq_along(delta), 1)
densityplot(imp.all[[l]], plot.points = TRUE,
            theme = aesthetics, col = col)
```

```
bwplot(imp.all[[l]], PostDBP, col = col)
```

Now we move onto our next and final sensitivity analysis.

# SA II: Selection Models

To further explore the **MNAR** assumption, we could also use a Heckman
selection model \(P(Y,R)=P(Y)P(R|Y)\), as described by van Buuren
here:^{16}

> The selection model multiplies the marginal distribution \(P(Y)\) in the population with the response weights \(P(R|Y)\). Both \(P(Y)\) and \(P(R|Y)\) are unknown, and must be specified by the user. The model where \(P(Y)\) is normal and where \(P(R|Y)\) is a probit model is known as the Heckman model. This model is widely used in economics to correct for selection bias.

We can implement this in `R` using the `miceMNAR` package, which is an extension to the `mice` package.
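Before turning to the implementation, it may help to see the factorization numerically. The same joint distribution \(P(Y,R)\) can be built from the selection-model pieces \(P(Y)\) and \(P(R|Y)\), and the pattern-mixture pieces \(P(R)\) and \(P(Y|R)\) recovered from it. A tiny discrete toy (hypothetical probabilities, chosen only for illustration):

```python
# The same joint P(Y, R) can be factored as a selection model
# P(Y) * P(R|Y) or as a pattern-mixture model P(R) * P(Y|R).
# Tiny discrete toy with two outcome levels; all numbers hypothetical.

p_y = {"low": 0.6, "high": 0.4}           # marginal P(Y)
p_r1_given_y = {"low": 0.9, "high": 0.5}  # response weights P(R=1|Y)

# Selection-model factorization -> joint probabilities
joint = {(y, r): p_y[y] * (p_r1_given_y[y] if r == 1 else 1 - p_r1_given_y[y])
         for y in p_y for r in (0, 1)}

# Recover the pattern-mixture pieces from the same joint:
p_r = {r: sum(joint[(y, r)] for y in p_y) for r in (0, 1)}
p_y_given_r0 = {y: joint[(y, 0)] / p_r[0] for y in p_y}

print(round(p_r[0], 2))   # probability of nonresponse
print(p_y_given_r0)       # outcome distribution among nonrespondents
```

Notice that among nonrespondents the "high" outcome dominates even though the marginal favors "low": response depending on \(Y\) is exactly the selection bias the Heckman model tries to correct.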

```
# Selection Models -------------------------------------
JointModelEq <- generate_JointModelEq(data = BP_miss,
varMNAR = "PostDBP")
JointModelEq[,"PostDBP_var_sel"] <- c(0, 1, 1, 1, 1, 1, 1)
JointModelEq[,"PostDBP_var_out"] <- c(0, 1, 1, 1, 0, 1, 1)
arg <- MNARargument(data = BP_miss, varMNAR = "PostDBP",
JointModelEq = JointModelEq)
imp2 <- mice(data = arg$data_mod,
method = arg$method, seed = 1031,
predictorMatrix = arg$predictorMatrix,
JointModelEq = arg$JointModelEq,
control = arg$control, formulas = form,
maxit = 5, m = 5, cores = 4,
ridge = 0, remove.collinear = FALSE,
remove.constant = FALSE, allow.na = FALSE,
printFlag = FALSE, visitSequence = "monotone")
head(imp2$loggedEvents, 10)
#> NULL
imp2$data$Group <- as.factor(imp2$data$Group)
densityplot(imp2, col = col,
plot.points = TRUE,
theme = aesthetics)
```

```
plot(imp2, layout = c(1, 2), col = col, theme = aesthetics)
bwplot(imp2, PostDBP, col = col)
```

Now we fit our models to these new datasets and pool them using Rubin’s rules.

```
# Sensitivity Analysis 2 -------------------------------------
analysis3 <- with(imp2, rlm(PostDBP ~ Group +
                              BaselineDBP + Age + Z + R1 + R2))
result3 <- pool(analysis3)
results3_sum <- summary(result3)
colnames(results3_sum) <- c("Term", "Estimate", "SE",
                            "Statistic", "df", "P-val")
ztable(results3_sum)
```

Term | Estimate | SE | Statistic | df | P-val |
---|---|---|---|---|---|
(Intercept) | -2.24 | 5.68 | -0.39 | 24.46 | 0.70 |
Group1 | 9.74 | 0.19 | 51.73 | 8.50 | 0.00 |
BaselineDBP | 1.01 | 0.05 | 20.62 | 22.76 | 0.00 |
Age | 0.34 | 0.01 | 42.60 | 383.20 | 0.00 |
Z | -0.68 | 1.38 | -0.50 | 65.96 | 0.62 |
R1 | -0.14 | 0.08 | -1.79 | 12.54 | 0.10 |
R2 | -0.21 | 0.14 | -1.53 | 25.59 | 0.14 |

Now we examine our imputed values.

```
ps2 <- rep(rowMeans(sapply(analysis3$analyses, fitted.values)),
           imp2$m + 1)
xyplot(imp2, PostDBP ~ ps2 | as.factor(.imp),
       xlab = "Probability that record is incomplete",
       ylab = "PostDBP", pch = c(1, 19), col = col)
```

So now we have conducted our second sensitivity analysis.

The results from the primary analysis and both sensitivity analyses seem somewhat consistent. In the first sensitivity analysis, where we applied \(\delta\)-adjustments, the coefficients changed drastically as the adjustments became larger. However, such large \(\delta\)-adjustments are not realistic, and the smaller adjustments led to coefficients that were closer to those of the other two analyses.

# Supplementary Analyses

The following analyses are what I mostly consider to be supplementary analyses: they may further probe violations of assumptions, but they often go far beyond what is necessary for a principled sensitivity analysis accompanying a primary analysis.

First, I like to check that the results from my analyses are consistent
across different statistical packages. So I may check whether the
results of my primary analysis in `R` are similar to the results from
another statistical package like `Stata`.

So in `Stata`, I would likely run something similar to the following in
order to mimic the primary analysis in `R`.

```
import delimited "https://lesslikely.com/datasets/temp.csv", numericcols(2 3 5 6 7 8)
#> . import delimited "https://lesslikely.com/datasets/temp.csv", numericcols(2 3 5 6 7 8)
#> (encoding automatically selected: ISO-8859-2)
#> (8 vars, 500 obs)
#>
#> .
#> . mi set mlong
#>
#> .
#> . mi register imputed baselinedbp postdbp age z group r1 r2
#> (291 m=0 obs now marked as incomplete)
#>
#> .
#> . mi impute chained (pmm, knn(5)) postdbp = baselinedbp age z r1 r2 group, burn
#> > in(10) add(40) rseed (1031) savetrace(trace1, replace) bootstrap
#> note: missing-value pattern is monotone; no iteration performed.
#>
#> Conditional models (monotone):
#> postdbp: pmm postdbp baselinedbp age z r1 r2 group , bootstrap
#> knn(5)
#>
#> Performing chained iterations ...
#>
#> Multivariate imputation Imputations = 40
#> Chained equations added = 40
#> Imputed: m=1 through m=40 updated = 0
#>
#> Initialization: monotone Iterations = 0
#> burn-in = 0
#>
#> postdbp: predictive mean matching
#>
#> ------------------------------------------------------------------
#> | Observations per m
#> |----------------------------------------------
#> Variable | Complete Incomplete Imputed | Total
#> -------------------+-----------------------------------+----------
#> postdbp | 209 291 291 | 500
#> ------------------------------------------------------------------
#> (Complete + Incomplete = Total; Imputed is the minimum across m
#> of the number of filled-in observations.)
#>
#> Note: Posterior parameters obtained using sampling with replacement.
#>
#> .
#> . mi estimate, mcerror: rreg postdbp i.group baselinedbp age z r1 r2
#>
#> Multiple-imputation estimates Imputations = 40
#> Robust regression Number of obs = 500
#> Average RVI = 1.4641
#> Largest FMI = 0.7196
#> Complete DF = 493
#> DF adjustment: Small sample DF: min = 50.16
#> avg = 93.58
#> max = 126.75
#> Model F test: Equal FMI F( 6, 285.9) = 1235.31
#> Prob > F = 0.0000
#>
#> ------------------------------------------------------------------------------
#> postdbp | Coefficient Std. err. t P>|t| [95% conf. interval]
#> -------------+----------------------------------------------------------------
#> 1.group | 10.41553 .1343827 77.51 0.000 10.14839 10.68266
#> | .0153068 .0088366 5.25 0.000 .0251455 .0224951
#> |
#> baselinedbp | .9541257 .0431295 22.12 0.000 .8687595 1.039492
#> | .0043726 .0017532 0.89 0.000 .005078 .0062013
#> |
#> age | .3238384 .0099478 32.55 0.000 .3040228 .343654
#> | .0011781 .0008449 2.90 0.000 .0021126 .0021126
#> |
#> z | -2.393966 1.341859 -1.78 0.077 -5.049314 .2613814
#> | .1349552 .0550873 0.12 0.019 .1878212 .16354
#> |
#> r1 | -.3490421 .0852886 -4.09 0.000 -.5203355 -.1777487
#> | .0112217 .0081904 0.49 0.000 .0159233 .0242459
#> |
#> r2 | -.1611967 .1469517 -1.10 0.276 -.4541912 .1317979
#> | .0176771 .008492 0.14 0.060 .0242385 .0257259
#> |
#> _cons | 6.453088 5.10019 1.27 0.208 -3.643662 16.54984
#> | .5205047 .2107395 0.12 0.043 .7220854 .6295723
#> ------------------------------------------------------------------------------
#> Note: Values displayed beneath estimates are Monte Carlo error estimates.
#>
#> .
#> . quietly mi passive: generate byte imputed = _mi_miss
#>
#> .
#> . replace imputed = 1 if imputed==.
#> (0 real changes made)
#>
#> .
#> . generate float postdbp_Delta_A1=postdbp
#> (291 missing values generated)
#>
#> .
#> . replace postdbp_Delta_A1=postdbp + 5 if imputed==1
#> (11,640 real changes made)
#>
#> .
#> . mi estimate, mcerror: rreg postdbp_Delta_A1 i.group baselinedbp age z r1 r2
#>
#> Multiple-imputation estimates Imputations = 40
#> Robust regression Number of obs = 500
#> Average RVI = 0.4338
#> Largest FMI = 0.4403
#> Complete DF = 493
#> DF adjustment: Small sample DF: min = 118.94
#> avg = 235.19
#> max = 315.98
#> Model F test: Equal FMI F( 6, 418.9) = 918.73
#> Prob > F = 0.0000
#>
#> ------------------------------------------------------------------------------
#> postdbp_De~1 | Coefficient Std. err. t P>|t| [95% conf. interval]
#> -------------+----------------------------------------------------------------
#> 1.group | 13.62051 .1967079 69.24 0.000 13.23315 14.00788
#> | .0145068 .0055858 2.00 0.000 .0195646 .017133
#> |
#> baselinedbp | 1.001326 .0687089 14.57 0.000 .8660981 1.136554
#> | .004666 .0012464 0.24 0.000 .0041898 .006217
#> |
#> age | .3137277 .0148054 21.19 0.000 .2845127 .3429427
#> | .0013031 .0006316 0.91 0.000 .001624 .0020197
#> |
#> z | -2.339549 2.121875 -1.10 0.271 -6.514338 1.83524
#> | .1355884 .0353007 0.06 0.026 .1713074 .1324027
#> |
#> r1 | -1.441007 .1151125 -12.52 0.000 -1.668943 -1.213072
#> | .0118336 .0065291 0.77 0.000 .0145274 .0207451
#> |
#> r2 | -.6562613 .2091866 -3.14 0.002 -1.068731 -.2437918
#> | .0174729 .0054034 0.12 0.001 .0202383 .021055
#> |
#> _cons | 3.054198 8.13298 0.38 0.708 -12.95496 19.06336
#> | .56593 .1605893 0.07 0.054 .7663276 .5137369
#> ------------------------------------------------------------------------------
#> Note: Values displayed beneath estimates are Monte Carlo error estimates.
```

As you can see from above, I imported the missing data I generated in
`R` into `Stata`, and then multiply imputed the datasets using the
predictive mean matching method with 5 donors. I chose the exact same
variables for the imputation model as I did with the primary and
sensitivity analyses. Just like the primary analysis, I also used robust
regression (`rreg` in `Stata`) to analyze the datasets. The numbers seem
fairly consistent with those from the primary analysis and some of the
results from the sensitivity analyses.

I will now conduct another supplementary analysis, in which I fit a
Bayesian regression model using a *t*-distribution so that it is
analogous to the other robust regression models I have utilized so far,
and I will specify a weakly informative prior.

```
# Bayesian Regression -------------------------------------
brm_form <- bf(PostDBP ~ Group + BaselineDBP +
                 Age + Z + R1 + R2)
preds <- c("Intercept", "Group", "BaselineDBP",
           "Age", "Z", "R1", "R2")
prior1 <- c(prior(normal(0, 100), class = Intercept),
            prior(normal(0, 100), class = b),
            prior(normal(0, 100), class = sigma))
b_1 <- brm_multiple(brm_form, data = imp1, prior = prior1,
                    silent = 2, family = student(), iter = 40, refresh = 0,
                    warmup = 10, chains = 1, inits = "0",
                    thin = 2, open_progress = FALSE,
                    combine = TRUE, sample_prior = "no",
                    seed = 1031, algorithm = "sampling", backend = "rstan",
                    control = list(max_treedepth = 10, adapt_delta = 0.8),
                    file = NULL, file_refit = "never")
b_1[["fit"]]@sim[["fnames_oi"]] <- preds
b_1
#> Family: student
#> Links: mu = identity; sigma = identity; nu = identity
#> Formula: PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2
#> Data: imp1 (Number of observations: 500)
#> Samples: 20 chains, each with iter = 40; warmup = 10; thin = 2;
#> total post-warmup samples = 300
#>
#> Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
ztable(posterior_summary(b_1))
```

Parameter | Estimate | Est.Error | Q2.5 | Q97.5 |
---|---|---|---|---|
Intercept | -100.40 | 172.83 | -500.80 | 196.50 |
Group | 3.28 | 3.34 | -4.45 | 8.19 |
BaselineDBP | 1.31 | 1.19 | -1.23 | 4.23 |
Age | 0.71 | 0.98 | -0.12 | 4.00 |
Z | -5.07 | 3.04 | -12.05 | -0.08 |
R1 | 0.44 | 2.43 | -6.78 | 4.28 |
R2 | 3.95 | 3.46 | -3.38 | 8.68 |

We examine our model diagnostics to look for any anomalies.


Nothing seems out of the ordinary.

Indeed, it is no surprise that the Bayesian model with a weakly
dispersed prior gives similar results to the base `rlm()` function,
while specifying a regularizing prior, like the horseshoe prior with
three degrees of freedom, would practically shrink all the coefficients
towards zero, including our covariate `Z`. We now look at robust methods
and resampling and the results they give.

Although this is currently under the supplementary analysis section, I
would very much consider these next two analyses to be sensitivity
analyses. The first method utilized here is known as the *random
indicator method*, an experimental method that essentially combines a
selection model and a pattern-mixture model and is designed to handle
data that are missing not at random. As van Buuren writes:^{16}

> The random indicator method (Jolani 2012)^{20} is an experimental iterative method that redraws the missing data indicator under a selection model, and imputes the missing data under a pattern-mixture model, with the objective of estimating δ from the data under relaxed assumptions. Initial simulation results look promising. The algorithm is available as the `ri` method in `mice`.

The script below utilizes this random indicator method, but bootstraps
the incomplete dataset before the imputation procedure; after multiply
imputing the datasets, it obtains estimates using maximum likelihood
multiple imputation (**MLMI**), an alternative and more efficient method
than the more popular posterior draw multiple imputation (**PDMI**).
More about **MLMI** can be found here.^{21}

```
# Bootstrap Inference -------------------------------------
imps <- bootMice(BP_miss, method = "ri", maxit = 5,
                 predictorMatrix = pred, nBoot = 10, nImp = 2,
                 seed = 1031, formulas = form,
                 ridge = 0, remove.collinear = FALSE,
                 remove.constant = FALSE, allow.na = TRUE,
                 printFlag = FALSE, visitSequence = "monotone")
analyseImp <- function(inputData) {
  coef(rlm(PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2,
           data = inputData))
}
ests <- bootImputeAnalyse(imps, analyseImp, quiet = TRUE)
ests <- as.data.frame(ests)
colnames(ests) <- c("Estimate", "Var",
                    "CI.ll", "CI.ul", "df")
rownames(ests) <- c("Intercept", "Group", "BaselineDBP",
                    "Age", "Z", "R1", "R2")
ztable(ests)
```

Term | Estimate | Var | CI.ll | CI.ul | df |
---|---|---|---|---|---|
Intercept | -6.97 | 18.40 | -18.24 | 4.31 | 4.66 |
Group | 9.71 | 0.01 | 9.45 | 9.97 | 3.00 |
BaselineDBP | 1.05 | 0.00 | 0.95 | 1.15 | 4.55 |
Age | 0.35 | 0.00 | 0.32 | 0.39 | 6.31 |
Z | -1.88 | 3.07 | -6.12 | 2.35 | 6.33 |
R1 | -0.15 | 0.01 | -0.35 | 0.06 | 7.90 |
R2 | -0.08 | 0.02 | -0.45 | 0.29 | 5.79 |

For our second supplementary approach, we will utilize the GAMLSS
method, an extension of the generalized linear model and the generalized
additive model. For the vast majority of the imputations we have
conducted so far, we have drawn values from an assumed normal
distribution; GAMLSS allows us to be far more flexible in our
distributional assumptions. However, it is also worth noting that normal
imputations are generally robust against violations of the normality
assumption:

> In general, normal imputations appear to be robust against violations of normality. Demirtas, Freels, and Yucel (2008) found that flatness of the density, heavy tails, non-zero peakedness, skewness and multimodality do not appear to hamper the good performance of multiple imputation for the mean structure in samples n > 400, even for high percentages (75%) of missing data in one variable. The variance parameter is more critical though, and could be off-target in smaller samples.

Regardless, we will be using the GAMLSS method and also imputing from a *t*-distribution, which is much more robust to outliers.
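As a quick sanity check on that last claim: a *t*-distribution with few degrees of freedom places far more probability on extreme values than a normal does. A small simulation (in Python, purely illustrative, using the standard representation of the *t* as a normal over a scaled chi-square) makes the point:

```python
# Why impute from a t-distribution: its heavy tails give extreme values
# noticeably more probability than a normal does. Illustrative simulation.
import random

random.seed(1031)

def t_draw(df):
    """Draw from Student's t: a standard normal divided by the square
    root of an independent chi-square over its degrees of freedom."""
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / (chi2 / df) ** 0.5

n = 50_000
t_tail = sum(abs(t_draw(4)) > 3 for _ in range(n)) / n
z_tail = sum(abs(random.gauss(0.0, 1.0)) > 3 for _ in range(n)) / n
print(t_tail, z_tail)  # the t(4) tail probability is far larger
```

Because the tails are heavy, an imputation model based on the *t*-distribution does not treat occasional extreme observations as evidence that the whole distribution must shift.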

```
data(db)
data <- subset(db, age > 1 & age < 2, c("age", "head"))
names(data) <- c("age", "hc")
synthetic <- rep(c(FALSE, TRUE), each = nrow(data))
data2 <- rbind(data, data)
row.names(data2) <- 1:nrow(data2)
data2[synthetic, "hc"] <- NA
imp <- mice(data2, m = 10, meth = "gamlssTF", seed = 88009,
            print = FALSE)
syn <- subset(mice::complete(imp), synthetic)
```

Indeed, I shall try one more supplementary analysis, which piqued my curiosity. I recently came across a few papers concluding that random forests outperformed other methods at imputing real-world data, basing these conclusions largely on the root mean squared error (RMSE). This is problematic for a number of reasons: imputation is not the same as prediction, so the usual loss functions are not equally useful, and most of the settings used in the software were simple defaults.

Indeed, the following simulation shows why relying so heavily on RMSE is misleading.

```
create.data <- function(beta = 1, sigma2 = 1, n = 50,
                        run = 1) {
  set.seed(seed = run)
  x <- rnorm(n)
  y <- beta * x + rnorm(n, sd = sqrt(sigma2))
  cbind(x = x, y = y)
}
make.missing <- function(data, p = 0.5) {
  rx <- rbinom(nrow(data), 1, p)
  data[rx == 0, "x"] <- NA
  data
}
test.impute <- function(data, m = 5, method = "norm", ...) {
  imp <- mice(data, method = method, m = m, print = FALSE)
  fit <- with(imp, lm(y ~ x))
  tab <- summary(pool(fit), "all", conf.int = TRUE)
  tab[2, c(3, 8, 9)]
}
simulate <- function(runs = 10) {
  # dimnames must match the array extent: 3 columns, not 4
  res <- array(NA, dim = c(2, runs, 3))
  dimnames(res) <- list(c("norm.predict", "norm.nob"),
                        as.character(1:runs),
                        c("estimate", "2.5 %", "97.5 %"))
  for (run in 1:runs) {
    data <- create.data(run = run)
    data <- make.missing(data)
    res[1, run, ] <- test.impute(data, method = "norm.predict",
                                 m = 2)
    res[2, run, ] <- test.impute(data, method = "norm.nob")
  }
  res
}
rmse <- function(truedata, imp, v = "x") {
  mx <- is.na(mice::complete(imp, 0))[, v]
  mse <- rep(NA, imp$m)
  for (k in seq_len(imp$m)) {
    filled <- mice::complete(imp, k)[mx, v]
    true <- truedata[mx, v]
    mse[k] <- mean((filled - true)^2)
  }
  sqrt(mean(mse))
}
simulate2 <- function(runs = 10) {
  res <- array(NA, dim = c(2, runs, 4))
  dimnames(res) <- list(c("rf", "midastouch"),
                        as.character(1:runs),
                        c("RMSE", "estimate", "2.5 %", "97.5 %"))
  for (run in 1:runs) {
    truedata <- create.data(run = run)
    data <- make.missing(truedata)
    imp <- mice(data, method = "rf", m = 2,
                print = FALSE)
    res[1, run, 1] <- rmse(truedata, imp)
    imp <- mice(data, method = "midastouch", print = FALSE)
    res[2, run, 1] <- rmse(truedata, imp)
    res[1, run, 2:4] <- as.numeric(test.impute(data, method = "rf", m = 5))
    res[2, run, 2:4] <- as.numeric(test.impute(data, method = "midastouch", m = 5))
  }
  res
}
```

Indeed, both the random forest method and the predictive mean matching method yield similar RMSEs. A more informative approach to evaluating imputation methods is to look at other measures, including:

- Raw Bias: the difference between the expected value of the estimate and the truth, \(RB=E(\hat{Q})-Q\), which may also be expressed as a percentage, \(100 \times \frac{E(\hat{Q})-Q}{Q}\)

- Coverage Rate: which is simply the proportion of times the intervals actually contain the true parameter value

- Average Width: the width of the interval estimates, with longer widths indicating less information
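To see why RMSE rewards the wrong behavior, note that imputing every missing value with its conditional mean minimizes squared error by construction, yet it understates the variability of the imputed variable. A toy sketch (in Python, with simulated data, not the trial data):

```python
# Why RMSE favors the wrong imputation: conditional-mean imputes minimize
# squared error but shrink the variance. Toy simulated data for illustration.
import random
import statistics

random.seed(1031)
n = 2_000
x = [random.gauss(0, 1) for _ in range(n)]
true_y = [xi + random.gauss(0, 1) for xi in x]       # y = x + noise
miss = [random.random() < 0.5 for _ in range(n)]     # ~50% of y missing

# "Prediction" imputation: plug in the conditional mean E[y|x] = x.
pred_imp = [xi if m else yi for xi, yi, m in zip(x, true_y, miss)]
# "Proper" stochastic imputation: draw from y|x by adding residual noise.
stoch_imp = [xi + random.gauss(0, 1) if m else yi
             for xi, yi, m in zip(x, true_y, miss)]

def rmse(imputed):
    errs = [(im - t) ** 2 for im, t, m in zip(imputed, true_y, miss) if m]
    return (sum(errs) / len(errs)) ** 0.5

# Mean imputation "wins" on RMSE but understates the variance of y:
print(round(rmse(pred_imp), 2), round(rmse(stoch_imp), 2))
print(round(statistics.variance(pred_imp), 2),
      round(statistics.variance(stoch_imp), 2))
```

The method with the better RMSE is exactly the one that distorts the distribution, which is why bias, coverage, and interval width are the more meaningful criteria for imputation.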

```
res1 <- simulate(100)
ztable(apply(res1, c(1, 3), mean, na.rm = TRUE))
res2 <- simulate2(100)
ztable(apply(res2, c(1, 3), FUN = mean, na.rm = TRUE))
```

Method | RMSE | estimate | 2.5 % | 97.5 % |
---|---|---|---|---|
rf | 1.00 | 1.01 | 0.63 | 1.38 |
midastouch | 1.01 | 1.03 | 0.62 | 1.44 |

The authors specifically used the `missForest` package. I tried using that package but encountered too many difficulties, so I simply specified the `rf` method in `mice` and worked from there.

```
form <- list(PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2)
rf <- mice(BP_miss, method = "rf", ntree = 20,
           predictorMatrix = pred,
           m = 20, maxit = 5, seed = 1031,
           cores = 4, formulas = form, ridge = 0,
           remove.collinear = FALSE, remove.constant = FALSE,
           allow.na = FALSE, printFlag = FALSE,
           visitSequence = "monotone")
rf1 <- with(rf, rlm(PostDBP ~ Group +
              BaselineDBP + Age +
              Z + R1 + R2))
ztable(summary(pool(rf1)))
```

term | estimate | std.error | statistic | df | p.value |
---|---|---|---|---|---|
(Intercept) | 14.54 | 8.80 | 1.65 | 55.42 | 0.10 |
Group | 9.60 | 0.19 | 50.19 | 89.28 | 0.00 |
BaselineDBP | 0.88 | 0.08 | 11.60 | 53.46 | 0.00 |
Age | 0.32 | 0.01 | 21.77 | 67.46 | 0.00 |
Z | 0.14 | 2.06 | 0.07 | 109.41 | 0.95 |
R1 | -0.08 | 0.10 | -0.80 | 78.37 | 0.43 |
R2 | -0.09 | 0.18 | -0.53 | 135.18 | 0.60 |

# Full Analysis Set

After we run all those analyses, we can finally check our numbers
against the full dataset with no missing observations to see how our
approaches to handling the missing data have performed. We first look at
the fully adjusted model fit with `rlm()`.

```
# Full Analysis Set -------------------------------------
final_full_mod <- rlm(PostDBP ~ Group + BaselineDBP +
                        Age + Z + R1 + R2, data = BP_full)
ztable(summary(final_full_mod)[["coefficients"]])
```

Term | Value | Std. Error | t value |
---|---|---|---|
(Intercept) | -1.96 | 3.86 | -0.51 |
Group | 9.73 | 0.09 | 103.37 |
BaselineDBP | 1.01 | 0.03 | 30.93 |
Age | 0.34 | 0.01 | 49.15 |
Z | -1.85 | 1.05 | -1.76 |
R1 | -0.10 | 0.05 | -2.16 |
R2 | -0.09 | 0.09 | -1.02 |

```
ztable(check_collinearity(final_full_mod))
```

Term | VIF | SE_factor |
---|---|---|
Group | 1.01 | 1.01 |
BaselineDBP | 1.97 | 1.40 |
Age | 1.00 | 1.00 |
Z | 1.98 | 1.41 |
R1 | 1.00 | 1.00 |
R2 | 1.01 | 1.00 |

We can also graphically inspect our assumptions.

Then we look at the unadjusted model, where the only predictor is the group to which the participants were randomized.

```
# Unadjusted Model -------------------------------------
final_reduced_mod <- rlm(PostDBP ~ Group, data = BP_full)
ztable(summary(final_reduced_mod)[["coefficients"]])
```

Term | Value | Std. Error | t value |
---|---|---|---|
(Intercept) | 133.56 | 0.20 | 653.25 |
Group | 9.76 | 0.29 | 33.95 |

We can also compare the two to see if there are any substantial differences.

```
# Compare Models -------------------------------------
ztable(compare_performance(final_full_mod, final_reduced_mod))
```

Name | Model | AIC | BIC | RMSE | Sigma |
---|---|---|---|---|---|
final_full_mod | rlm | 1424.59 | 1458.31 | 0.99 | 1.00 |
final_reduced_mod | rlm | 2575.88 | 2588.53 | 3.16 | 3.17 |

While the results have mostly been consistent, I also wanted to run a
few additional analyses of my own to see whether these effect estimates
were consistent. I first constructed the confidence distribution using
the `concurve` package to see the range of parameter values consistent
with the data, given the background model assumptions.^{22, 23}

```
# Compatibility of Parameter Values -------------------------------------
fit1 <- lm(PostDBP ~ Group + BaselineDBP +
             Age + Z + R1 + R2, data = BP_full)
curve1 <- curve_gen(fit1, var = "Group", method = "rlm",
                    log = FALSE, penalty = NULL,
                    m = NULL, steps = 1000,
                    table = TRUE)
#> Error in mclapply(X, FUN, ..., mc.cores = mc.cores, mc.preschedule = mc.preschedule, : 'mc.cores' > 1 is not supported on Windows
ggcurve(curve1[[1]], type = "c", levels = c(0.5, 0.75, 0.95, 0.99)) +
labs(title = "Confidence Curve for 'Group' Parameter",
subtitle = "Displays interval estimates at every level",
x = "Parameter Value (Treatment Effect)") +
theme_less()
#> Error in ncol(data): object 'curve1' not found
```

We can see that the overall treatment effect is approximately 9.7-9.9, and for most of the principled methods we used to handle the missing data, we did not obtain similar estimates. For example, neither the primary analysis nor the controlled multiple imputation methods yielded the correct treatment effect estimate in the adjusted model.

In fact, nearly all of the methods used, except for the Heckman
selection model, yielded overestimates of the treatment effect. Even
though such analyses can be substantially improved with more
specification and more information, the reality is that data assumed to
be **MNAR** are difficult to handle, and no method can guarantee the
correct answer. Indeed, we knew the missing data mechanism, via our
custom function

```
# Missing Data Mechanism -------------------------------------
lesslikely <- function() {
  set.seed(1031)
  logistic <- function(x) exp(x) / (1 + exp(x))
  1 - rbinom(n, 1, logistic(errors[, 2]))
}
set.seed(1031)
Ry <- ifelse(lesslikely() > 0, 1, 0)
PostDBP[Ry == 0] <- NA
```

\[\begin{equation} \Pr(R_1=0)=\psi_0+\frac{e^{Y_1}}{1+e^{Y_1}}\psi_1 \end{equation}\]

For **MNAR**, we set \(\psi_\mathrm{MNAR}=(0,1)\).

Thus, we obtain the following model:

\[\begin{align} \mathrm{MNAR}&: \mathrm{logit}(\Pr(R_1=0)) = Y_1 \end{align}\]

Even the random indicator method, which has shown much promise in
simulation studies^{20} and offers the advantage of not having to
specify a pattern-mixture or selection model, could not recover the
correct treatment effect estimate either, showing the difficulty of
handling missing data and the limitations of automated approaches.

I hope it is clear from the examples above that it is quite easy to run analyses with missing data and use a variety of ad hoc or even principled approaches to explore whether the results you’ve obtained are trustworthy, or at least to give that impression. Doing a careful and principled sensitivity analysis, however, is quite difficult: it requires much thought about which departures from assumptions are realistic and whether those departures would lead to substantial changes in results and, in turn, in conclusions and decisions.

# Statistical Environment

```
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19043)
#>
#> Locale:
#> LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
#> LC_NUMERIC=C LC_TIME=English_United States.1252
#>
#> Package version:
#> bayesplot_1.8.1 boot_1.3-28 bootImpute_1.2.0 brms_2.15.0 car_3.0-11 concurve_2.7.7
#> cowplot_1.1.1 doParallel_1.0.16 foreach_1.5.1 future_1.21.0 gamlss_5.3-4 ggcorrplot_0.1.3
#> ggplot2_3.3.5 here_1.0.1 Hmisc_4.5-0 ImputeRobust_1.3-1 kableExtra_1.3.4 knitr_1.33
#> lattice_0.20-44 magrittr_2.0.1 MASS_7.3-54 mice_3.13.0 miceMNAR_1.0.2 missForest_1.4
#> mvtnorm_1.1-2 parallel_4.1.0 parallelly_1.26.1 performance_0.7.2 qqplotr_0.0.5 rms_6.2-0
#> rstan_2.21.2 showtext_0.9-2 Statamarkdown_0.6.1 summarytools_0.9.9 tidyr_1.1.3 VIM_6.1.0
#> wesanderson_0.3.6
```

# References

*Journal of the Royal Statistical Society Series A (Statistics in Society)*. **168**:267–306. doi: 10.1111/j.1467-985X.2004.00349.x.

*International Journal of Epidemiology*. **43**:1969–1985. doi: 10.1093/ije/dyu149.

*American Journal of Epidemiology*. doi: 10.1093/aje/kwab072.

*American Journal of Epidemiology*. doi: 10.1093/aje/kwab068.

*American Journal of Epidemiology*. doi: 10.1093/aje/kwab069.

*The Lancet*. **360**:1531–1539. doi: 10.1016/S0140-6736(02)11522-4.

*BMC Medical Research Methodology*. **14**:11. doi: 10.1186/1471-2288-14-11.

*Statistics in Biopharmaceutical Research*. **9**:268–271. doi: 10.1080/19466315.2017.1302358.

*Statistics in Biopharmaceutical Research*. **12**:45–53. doi: 10.1080/19466315.2019.1647874.

*Trials*. **21**:671. doi: 10.1186/s13063-020-04546-1.

*New England Journal of Medicine*. **367**:1355–1360. doi: 10.1056/NEJMsr1203730.

*Biometrika*. **79**:103–111. doi: 10.2307/2337151.

*Computational Statistics & Data Analysis*. **22**:425–446. doi: 10.1016/0167-9473(95)00057-7.

*Statistics in Medicine*. **35**:2876–2879. doi: 10.1002/sim.6753.

*arXiv:1210.0870 [stat]*. http://arxiv.org/abs/1210.0870.

*BMC Medical Research Methodology*. **20**:244. doi: 10.1186/s12874-020-01105-9.

*arXiv:1909.08583 [stat.ME]*. http://arxiv.org/abs/1909.08583.