- Valid \(P\)-values Are Uniform Under the Null Model
- Posterior Predictive \(P\)-values
- Table 1: Some \(P\)-values, \(S\)-values, Maximum Likelihood Ratios, and Likelihood Ratio Statistics
- Frequentist Interval Estimate Percentiles Refer to Long-Run Coverage
- Brown et al. (2017) Reanalysis
- Table 2: \(P\)-values, \(S\)-values, and Likelihoods for Targeted Hypotheses About Hazard Ratios for Brown et al.
- Plot the point estimate and 95% compatibility interval
- \(P\)-value and \(S\)-value Functions
- Figure 3: \(P\)-value (Compatibility) Function
- Cumulative Confidence (Compatibility) Distribution
- Figure 4: \(S\)-value (Surprisal) Function
- Likelihood (Support) Functions
- Figure S1: Relative Likelihood Function
- Figure S2: Deviance Function \(2ln(MLR)\)

- Statistical Package Citations
- Environment
- References

The following post provides *some* of the code that was used to construct the figures and tables from Rafi & Greenland, 2020.^{1} An enhanced PDF version of the paper can be found here. For further discussion of the computations, see the appendix of the main paper, along with our technical supplement.^{2}

Disclaimer: I am responsible for all the code and mistakes below; none of them can be attributed to my coauthors or my fellow package developers.

To recreate the functions, I recommend installing the latest version of `concurve` from CRAN, as it has patched some issues with graphing when the outcome is binary. Use the script below to get the latest version and load the `R` package. A number of other `R` packages are also used in this post and are listed below.

```
install.packages("concurve")
library("concurve")
```

## Valid \(P\)-values Are Uniform Under the Null Model

Here we show that valid \(P\)-values have a specific property when the null model is true: they are uniformly distributed. We first generate two variables (\(Y\), \(X\)) from the same normal distribution, with a \(\mu\) of 0 and a \(\sigma\) of 1, each with a total of `1000` observations, so by construction there is no relationship between the two variables. We regress the outcome variable \(Y\) on \(X\), iterate this `100000` times, and compute the `100000` resulting \(P\)-values to see their overall distribution, which we then plot using a histogram.

```
library("ggplot2")  # needed for the plotting below
RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1031)  # set.seed() must be called, not assigned to
n.sim <- 100000
t.sim <- numeric(n.sim)
n.samp <- 1000
pb <- txtProgressBar(min = 0, max = n.sim,
                     initial = 0, style = 3)
for (i in 1:n.sim) {
  X <- rnorm(n.samp, mean = 0, sd = 1)
  Y <- rnorm(n.samp, mean = 0, sd = 1)
  df <- data.frame(X, Y)
  t <- glm(Y ~ X, data = df)
  t.sim[i] <- coef(summary(t))[2, 4]  # P-value for the X coefficient
  setTxtProgressBar(pb, i)
}
ggplot(NULL, aes(x = t.sim)) +
  geom_histogram(bins = 30, col = "black",
                 fill = "#99c7c7", alpha = 0.25) +
  labs(title = "Distribution of P-values Under the Null",
       x = "P-value") +
  scale_x_continuous(breaks = seq(0, 1, 0.10)) +
  theme_bw()
```

This can also be shown using the `TeachingDemos` `R` package, which has a function dedicated to demonstrating this phenomenon.

```
library("TeachingDemos")
RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1031)
obs_p <- Pvalue.norm.sim(n = 1000, mu = 0,
                         mu0 = 0, sigma = 1,
                         sigma0 = 1, test = "z",
                         alternative = "two.sided",
                         alpha = 0.05, B = 100000)
```

```
ggplot(NULL, aes(x = obs_p)) +
geom_histogram(bins = 30, col = "black",
fill = "#99c7c7", alpha = 0.75) +
labs(title = "Distribution of P-values Under the Null",
x = "P-value") +
scale_x_continuous(breaks = seq(0, 1, 0.10)) +
theme_bw()
```

As you can see, when the null model is true, the distribution of \(P\)-values is uniform, and the \(S\)-values corresponding to valid \(P\)-values are exponentially distributed. We run the same simulation as before, but then convert the obtained \(P\)-values into \(S\)-values to see how they are distributed.

```
RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1031)
n.sim <- 100000
t.sim <- numeric(n.sim)
n.samp <- 1000
pb <- txtProgressBar(min = 0, max = n.sim,
                     initial = 0, style = 3)
for (i in 1:n.sim) {
  X <- rnorm(n.samp, mean = 0, sd = 1)
  Y <- rnorm(n.samp, mean = 0, sd = 1)
  df <- data.frame(X, Y)
  t <- glm(Y ~ X, data = df)
  t.sim[i] <- coef(summary(t))[2, 4]
  setTxtProgressBar(pb, i)
}
ggplot(NULL, aes(x = -log2(t.sim))) +
  geom_histogram(bins = 30, col = "black",
                 fill = "#d46c5b", alpha = 0.5) +
  labs(title = "Distribution of S-values Under the Null",
       x = "S-value (Bits of Information)") +
  theme_bw()
```

## Posterior Predictive \(P\)-values

Despite the name, posterior predictive \(P\)-values are not valid frequentist \(P\)-values because they do not meet the uniformity criterion; instead, they are pulled towards `0.5`. For further discussion, see our technical supplement, along with Greenland (2019), for comprehensive theoretical discussions.^{2, 3}

A quick excerpt from our main paper and the technical supplement explains why this is so:

As discussed in the Supplement, in Bayesian settings one may see certain \(P\)-values that are not valid frequentist \(P\)-values, the primary example being the posterior predictive \(P\)-value;^{4, 5} unfortunately, the negative logs of such invalid \(P\)-values do not measure surprisal at the statistic given the model, and so are not valid \(S\)-values.

The decision rule “reject H if \(p \leq \alpha\)” will reject H \(100\alpha\)% of the time under sampling from a model M obeying H (i.e., the Type-1 error rate of the test will be \(\alpha\)) provided the random variable \(P\) corresponding to \(p\) is valid (uniform under the model M used to compute it), but not necessarily otherwise.^{6} This is one reason why frequentist writers reject invalid \(P\)-values (such as posterior predictive \(P\)-values, which highly concentrate around 0.50) and devote considerable technical coverage to uniform \(P\)-values.^{4, 5, 6} A valid \(P\)-value (“U-value”) translates into an exponentially distributed \(S\)-value with a mean of 1 nat or \(\log_{2}(e)=1.443\) bits, where \(e\) is the base of the natural logs.

Uniformity is also central to the “refutational information” interpretation of the \(S\)-value used here, for it is necessary to ensure that the \(P\)-value \(p\) from which \(s\) is derived is in fact the percentile of the observed value of the test statistic in the distribution of the statistic under M, thus making small \(p\) surprising under M and making \(s\) the corresponding degree of surprise. Because posterior predictive \(P\)-values do not translate into sampling percentiles of the statistic under the hypothesis (in fact, they are pulled toward 0.5 from the correct percentiles),^{4, 5} the resulting negative log does not measure surprisal at the statistic given M, and so is not a valid \(S\)-value in our terms.
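The 1.443-bit mean cited in this excerpt is easy to verify numerically. The following quick check is ours, not from the paper: it draws uniform \(P\)-values (as under a true null model) and averages the resulting \(S\)-values.

```
u <- runif(1e6)  # valid P-values are Uniform(0, 1) under the null
s <- -log2(u)    # corresponding S-values
mean(s)          # approximately log2(exp(1)) = 1.443 bits
```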

And indeed, we can show this phenomenon below with simulations. Here we fit a simple Bayesian regression model with a weakly informative `normal(0, 10)` prior using `rstanarm`, where both the predictor and the response variable come from the same distribution with the same location and scale parameters (\(\mu = 0\), \(\sigma = 1\)). We then calculate the observed test statistics, along with their distributions, convert them into posterior predictive \(P\)-values, and plot them using `bayesplot` functions. Then, we iterate this `1000` times to examine the distribution of posterior predictive \(P\)-values and compare them to standard \(P\)-values, which are known to be uniform.

But first, we’ll generate the distribution of the test statistic from one model.

```
library("bayesplot")
library("rstan")
library("rstanarm")
rstan_options(auto_write = TRUE)
options(mc.cores = 4)
teal <- c("#99c7c7")
color_scheme_set("teal")
RNGkind(kind = "L'Ecuyer-CMRG")
n.samp <- 1000
X <- rnorm(n.samp, mean = 0, sd = 1)
Y <- rnorm(n.samp, mean = 0, sd = 1)
df <- data.frame(X, Y)
mod1 <- stan_glm(Y ~ X, data = df, chains = 1, cores = 4,
refresh = 0, prior = normal(0, 10))
yrep <- posterior_predict(mod1)
h <- ppc_stat(Y, yrep, stat = "median", freq = TRUE, binwidth = 0.01) +
labs(title = "Distribution of Posterior Test Statistic") +
theme_bw() +
yaxis_text(on = TRUE)
values_all <- h[["plot_env"]][["T_yrep"]]
prob_to_find <- h[["plot_env"]][["T_y"]]
quantInv <- function(distr, value) ecdf(distr)(value)
posterior_p_value <- quantInv(values_all, prob_to_find)
l <- ppc_stat_2d(Y, yrep) +
theme_bw() +
labs(title = "Distribution of Posterior Test Statistic")
plot(h)
```

```
plot(l)
```

```
print(prob_to_find)
[1] -0.03265453
```

The above is the posterior predictive test statistic for one model, along with its distribution. Now let’s iterate this process, convert the results into posterior predictive \(P\)-values, and generate their distribution.

```
rstan_options(auto_write = TRUE)
options(mc.cores = 4)
RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1031)
n.sim <- 1000
ppp <- numeric(n.sim)
n.samp <- 1000
pb <- txtProgressBar(min = 0, max = n.sim,
                     initial = 0, style = 3)
quantInv <- function(distr, value) ecdf(distr)(value)
for (i in seq_along(ppp)) {
  X <- rnorm(n.samp, mean = 0, sd = 1)
  Y <- rnorm(n.samp, mean = 0, sd = 1)
  df <- data.frame(X, Y)
  mod1 <- stan_glm(Y ~ X, data = df, chains = 1, cores = 4,
                   iter = 1000, refresh = 0, prior = normal(0, 10))
  yrep <- posterior_predict(mod1)
  h <- ppc_stat(Y, yrep, stat = "median")
  values_all <- h[["plot_env"]][["T_yrep"]]  # replicated test statistics
  prob_to_find <- h[["plot_env"]][["T_y"]]   # observed test statistic
  ppp[i] <- quantInv(values_all, prob_to_find)
  setTxtProgressBar(pb, i)
}
ggplot(NULL, aes(x = ppp)) +
  geom_histogram(bins = 30, col = "black", fill = "#99c7c7", alpha = 0.5) +
  labs(title = "Distribution of Posterior Predictive P-values",
       subtitle = "From Simulation of 1000 Simple Linear Regressions",
       x = "Posterior Predictive P-value") +
  scale_x_continuous(breaks = seq(0, 1, 0.10)) +
  theme_bw()
```

We can also show this in `Stata 16.1`, using its new `bayesstats ppvalues` command, which generates posterior predictive \(P\)-values.

```
clear
set obs 10000
/**
Generate variables from a normal distribution like before
*/
generate x = rnormal(0, 1)
generate y = rnormal(0, 1)
/**
We set up our model here
*/
bayesmh y x, likelihood(normal({var})) prior({var}, normal(0, 10)) ///
prior({y:}, normal(0, 10)) rseed(1031) saving(coutput_pred, replace) mcmcsize(1000)
/**
We use the bayespredict command to make predictions from the model
*/
bayespredict (mean:@mean({_resid})) (var:@variance({_resid})), ///
rseed(1031) saving(coutput_pred, replace)
/**
Then we calculate the posterior predictive P-values
*/
bayesstats ppvalues {mean} using coutput_pred
```

As we can see from the `R` plot above, the test statistics are generally concentrated around `0.5`, and the distribution (as shown via the `R` code above) hardly resembles the uniform distribution of \(P\)-values under the null model. Further, the negative base-2 log transformation of posterior predictive \(P\)-values is not exponentially distributed, making them invalid \(S\)-values in our terms.

```
ggplot(NULL, aes(x = (-log2(ppp)))) +
geom_histogram(bins = 30, col = "black",
fill = "#d46c5b", alpha = 0.5) +
labs(title = "Distribution of -log2(Posterior Predictive P-values)",
subtitle = "S-values Produced From Posterior Predictive P-values",
x = "-log2(Posterior Predictive P-value)") +
theme_bw()
```

## Table 1: Some \(P\)-values, \(S\)-values, Maximum Likelihood Ratios, and Likelihood Ratio Statistics

Here, we look at some common \(P\)-values, and how they correspond to other information statistics and likelihood measures.

```
pvalue <- c(0.99, 0.90, 0.50, 0.25, 0.10,
0.05, 0.025, 0.01, 0.005, 0.0001,
paste("5 sigma (~ 2.9 in 10 million)"),
paste("1 in 100 million (GWAS)"),
paste("6 sigma (~ 1 in a billion)"))
svalue <- round(c(-log2(c(0.99, 0.90, 0.50, 0.25, 0.10,
0.05, 0.025, 0.01, 0.005, 0.0001)), 21.7, 26.6, 29.9), 2)
pvalue[1:10] <- formatC(pvalue[1:10], format = "e", digits = 2)
mlr <- round(c(1.00, 1.01, 1.26, 1.94, 3.87, 6.83, 12.3, 27.6, 51.4, 1935,
(5.2 * (10^5)), (1.4 * (10^7)), (1.3 * (10^8))), 2)
# likelihood ratio statistic
lr <- round(c(0.00016, 0.016, 0.45, 1.32, 2.71, 3.84, 5.02,
6.63, 7.88, 15.1, 26.3, 32.8, 37.4), 2)
table1 <- data.frame(pvalue, svalue, mlr, lr)
colnames(table1) <- c("P-value <br/> (compatibility)",
"S-value (bits)",
"Maximum Likelihood Ratio",
"Deviance Statistic 2ln(MLR)")
kable(table1, format = "html", col.names = colnames(table1),
escape = FALSE, table.attr = "class=\"striped\"",
caption = "Table 1 $P$-values and binary $S$-values,
with corresponding maximum-likelihood ratios (MLR)
and deviance (likelihood-ratio) statistics for a
simple test hypothesis H under background assumptions A") %>%
kable_styling(bootstrap_options = c("hover", "condensed", "responsive"))
```

| P-value (compatibility) | S-value (bits) | Maximum Likelihood Ratio | Deviance Statistic 2ln(MLR) |
| --- | --- | --- | --- |
| 0.99 | 0.01 | 1.000e+00 | 0.00 |
| 0.9 | 0.15 | 1.010e+00 | 0.02 |
| 0.5 | 1.00 | 1.260e+00 | 0.45 |
| 0.25 | 2.00 | 1.940e+00 | 1.32 |
| 0.1 | 3.32 | 3.870e+00 | 2.71 |
| 0.05 | 4.32 | 6.830e+00 | 3.84 |
| 0.025 | 5.32 | 1.230e+01 | 5.02 |
| 0.01 | 6.64 | 2.760e+01 | 6.63 |
| 0.005 | 7.64 | 5.140e+01 | 7.88 |
| 1e-04 | 13.29 | 1.935e+03 | 15.10 |
| 5 sigma (~ 2.9 in 10 million) | 21.70 | 5.200e+05 | 26.30 |
| 1 in 100 million (GWAS) | 26.60 | 1.400e+07 | 32.80 |
| 6 sigma (~ 1 in a billion) | 29.90 | 1.300e+08 | 37.40 |

The 5 and 6 sigma cutoffs are discussed further elsewhere.^{7}
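As a quick check (ours, not part of the original code), the sigma rows of Table 1 can be reproduced from one-sided normal tail areas:

```
p5 <- pnorm(-5)          # ~2.9e-07, roughly 2.9 in 10 million
p6 <- pnorm(-6)          # ~9.9e-10, roughly 1 in a billion
-log2(c(p5, 1e-8, p6))   # S-values: ~21.7, ~26.6, ~29.9 bits
```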

## Frequentist Interval Estimate Percentiles Refer to Long-Run Coverage

Here we simulate a study in which one group of 100 participants has a \(\mu\) of `100` and a \(\sigma\) of `20`, while the second group has the same number of participants but a mean of `80` and a standard deviation of `20`. We compare the groups using a Welch’s \(t\)-test, generate 95% compatibility intervals `100` times, and then plot them. Since we know the true mean difference is `20`, we wish to see how often the interval estimates cover this true parameter value.

```
# Simulate one study and return its 95% CI for the mean difference
sim <- function() {
  fake <- data.frame(x = rnorm(100, 100, 20),
                     y = rnorm(100, 80, 20))
  t.test(x = fake$x, y = fake$y,
         conf.level = 0.95)$conf.int
}
RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1031)
z <- replicate(100, sim(), simplify = FALSE)
df <- data.frame(do.call(rbind, z))
df$studynumber <- (1:length(z))
intrvl.limit <- c("lower.limit", "upper.limit", "studynumber")
colnames(df) <- intrvl.limit
df$point <- ((df$lower.limit + df$upper.limit) / 2)
df$covered <- (df$lower.limit <= 20 & 20 <= df$upper.limit)
df$coverageprob <- ((as.numeric(table(df$covered)[2]) / nrow(df) * 100))
ggplot(data = df, aes(x = studynumber, y = point,
ymin = lower.limit, ymax = upper.limit)) +
geom_pointrange(mapping = aes(color = covered),
fill = "#99c7c7", size = .60, alpha = 0.35) +
geom_hline(yintercept = 20, lty = 1, color = "red", alpha = 0.30) +
coord_flip() +
labs(title = "Simulated 95% Frequentist Interval Estimates",
subtitle = "True Mean Difference is 20",
x = "Study Number",
y = "Estimate (Point + Interval)",
caption = "Red Intervals Do Not Cover Population Parameter") +
theme_bw() + # use a white background
theme(legend.position = "none") +
annotate(geom = "text", x = 102, y = 30,
label = "Overall Coverage (%) =", size = 3.5,
color = "#99c7c7", alpha = 0.950) +
annotate(geom = "text", x = 102, y = 35,
label = df$coverageprob, size = 3.5,
color = "#99c7c7", alpha = 0.950) +
theme(text = element_text(size = 13.5)) +
theme(plot.title = element_text(size = 16),
plot.subtitle = element_text(size = 12),
plot.caption = element_text(size = 10))
```

This shows that the intervals vary simply as a result of randomness, and that the proper interpretation of the attached percentile is about long-run coverage of the true parameter. However, this does not preclude interpretation of a single interval estimate from a study.^{8} As we write in the *Replace unrealistic “confidence” claims with compatibility measures* section of our paper:^{1}

The fact that “confidence” refers to the procedure behavior, not the reported interval, seems to be lost on most researchers. Remarking on this subtlety, when Jerzy Neyman discussed his confidence concept in 1934 at a meeting of the Royal Statistical Society, Arthur Bowley replied, “I am not at all sure that the ‘confidence’ is not a confidence trick.”^{9} And indeed, 40 years later, Cox and Hinkley warned, “interval estimates cannot be taken as probability statements about parameters, and foremost is the interpretation ‘such and such parameter values are consistent with the data.’”^{8} Unfortunately, the word “consistency” is used for several other concepts in statistics, while in logic it refers to an absolute condition (of noncontradiction); thus, its use in place of “confidence” would risk further confusion.

To address the problems above, we exploit the fact that a 95% CI summarizes the results of varying the test hypothesis H over a range of parameter values, displaying all values for which \(p > 0.05\)^{10} and hence \(s < 4.32\) bits.^{3, 11} Thus the CI contains a range of parameter values that are more compatible with the data than are values outside the interval, under the background assumptions.^{3, 12} Unconditionally (and thus even if the background assumptions are uncertain), the interval shows the values of the parameter which, when combined with the background assumptions, produce a test model that is “highly compatible” with the data in the sense of having less than 4.32 bits of information against it. We thus refer to CIs as compatibility intervals rather than confidence intervals;^{3, 11, 13} their abbreviation remains “CI.” We reject calling these intervals “uncertainty intervals,” because they do not capture uncertainty about background assumptions.^{13}

Indeed, this also relates to our paper on deemphasizing the model assumptions that are often behind common statistical outputs, and why it is necessary to treat them as uncertain rather than given.^{14} For example, a 95% interval estimate is assumed to be free of bias in its construction; however, its coverage claims no longer hold once systematic errors are in play. Many of these common, classical statistical procedures are designed to deal with random variation, and far less so with bias (a relatively new field, with shiny new methods).

# Brown et al. (2017) Reanalysis

Taken from the Brown et al. data.^{15, 16}

Here we take the reported statistics from the Brown et al. studies in order to run statistical tests of different alternative test hypotheses and use those results to construct various functions.

We calculate the standard errors from the point estimate and the confidence (compatibility) limits.

```
se <- log(2.59 / 0.997) / 3.92
logUL <- log(2.59)
logLL <- log(0.997)
logpoint <- log(1.61)
logpoint + (1.96 * se)
[1] 0.9535654
logpoint - (1.96 * se)
[1] -0.001097013
```

## Table 2: \(P\)-values, \(S\)-values, and Likelihoods for Targeted Hypotheses About Hazard Ratios for Brown et al.

```
testhypothesis <- c("Halving of hazard", "No effect (null)",
"Point estimate", "Doubling of hazard",
"Tripling of hazard", "Quintupling of hazard")
hazardratios <- c(0.5, 1, 1.61, 2, 3, 5)
pvals <- c(1.6e-06, 0.05, 1.00, 0.37, 0.01, 3.2e-06)
svals <- round(-log2(pvals), 2)
mlr2 <- c((1.0 * 10^5), 6.77, 1.00, 1.49, 26.2, (5.0 * 10^4))
lr <- round(c(
exp((((log(0.5 / 1.61)) / se)^2) / (2)),
exp((((log(1 / 1.61)) / se)^2) / (2)),
exp((((log(1.61 / 1.61)) / se)^2) / (2)),
exp((((log(2 / 1.61)) / se)^2) / (2)),
exp((((log(3 / 1.61)) / se)^2) / (2)),
exp((((log(5 / 1.61)) / se)^2) / (2))), 3)
LR <- formatC(lr, format = "e", digits = 2)
table2 <- data.frame(testhypothesis, hazardratios,
pvals, svals, mlr2, LR)
colnames(table2) <- c("Test Hypothesis", "HR",
"P-values", "S-values",
"MLR", "LR")
knitr::kable(table2, format = "html", col.names = colnames(table2),
caption = "HR = Hazard Ratio, MLR = Maximum Likelihood Ratio,
LR = Likelihood-ratio Statistic") %>%
kable_styling(bootstrap_options = c("hover", "condensed", "responsive"))
```

| Test Hypothesis | HR | P-values | S-values | MLR | LR |
| --- | --- | --- | --- | --- | --- |
| Halving of hazard | 0.50 | 1.6e-06 | 19.25 | 1.00e+05 | 1.02e+05 |
| No effect (null) | 1.00 | 5.0e-02 | 4.32 | 6.77e+00 | 6.77e+00 |
| Point estimate | 1.61 | 1.0e+00 | 0.00 | 1.00e+00 | 1.00e+00 |
| Doubling of hazard | 2.00 | 3.7e-01 | 1.43 | 1.49e+00 | 1.49e+00 |
| Tripling of hazard | 3.00 | 1.0e-02 | 6.64 | 2.62e+01 | 2.62e+01 |
| Quintupling of hazard | 5.00 | 3.2e-06 | 18.25 | 5.00e+04 | 5.03e+04 |
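As a check on the hard-coded `pvals` in the code above (ours, not part of the original script), the \(P\)-values for the targeted hazard ratios can be recovered from the standard error via the normal approximation on the log hazard-ratio scale:

```
se <- log(2.59 / 0.997) / 3.92  # standard error from the 95% CI
hr <- c(0.5, 1, 1.61, 2, 3, 5)  # targeted hazard ratios
z <- log(hr / 1.61) / se        # Z-scores against the point estimate
2 * pnorm(-abs(z))              # approx. 1.6e-06, 0.05, 1.00, 0.37, 0.01, 3.2e-06
```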

## Plot the point estimate and 95% compatibility interval

```
label <- c("Brown et al. (2017)\n JAMA",
"Brown et al. (2017)\n J Clin Psychiatry")
point <- c(1.61, 1.7)
lower <- c(0.997, 1.1)
upper <- c(2.59, 2.6)
df <- data.frame(label, point, lower, upper)
```

Here we plot the 95% compatibility interval estimate reported from the high-dimensional propensity score analysis.

```
ggplot(data = df, mapping = aes(x = label, y = point,
ymin = lower, ymax = upper, group = label)) +
geom_pointrange(color = "#007C7C", size = 1.5, alpha = 0.4) +
geom_hline(yintercept = 1, lty = 3, color = "#d46c5b", alpha = 0.5) +
coord_flip() +
scale_y_log10(breaks = scales::pretty_breaks(n = 10), limits = c(0.80, 4)) +
labs(title = "Association Between Serotonergic Antidepressant Exposure
\nDuring Pregnancy and Child Autism Spectrum Disorder",
subtitle = "Reported 95% Compatibility Intervals From Primary Results",
x = "Study",
y = "Hazard Ratio (JAMA) +
Adjusted Pooled Odds Ratio (J Clin Psychiatry)") +
annotate(geom = "text", x = 1.5, y = 1, size = 5,
label = "Null parameter value of 1",
color = "#d46c5b", alpha = 0.75) +
annotate(geom = "text", x = 2.2, y = 1.8, size = 5,
label = "Point Estimate = 1.61,
Lower Limit = 0.997, Upper Limit = 2.59",
color = "#000000", alpha = 0.75) +
annotate(geom = "text", x = 1.2, y = 1.8, size = 5,
label = "Point Estimate = 1.7,
Lower Limit = 1.1, Upper Limit = 2.6",
color = "#000000", alpha = 0.75) +
theme_light() +
theme(axis.text.y = element_text(angle=90, hjust=1, size = 13)) +
theme(axis.text.x = element_text(size = 13)) +
theme(title = element_text(size = 13))
```

Once again, the authors reported
that^{15}

In the 2837 pregnancies (7.9%) exposed to antidepressants, 2.0% (95% CI, 1.6%-2.6%) of children were diagnosed with autism spectrum disorder. Risk of autism spectrum disorder was significantly higher with serotonergic antidepressant exposure (4.51 exposed vs 2.03 unexposed per 1000 person-years; between-group difference, 2.48 95% CI, 2.33-2.62 per 1000 person-years) in crude (HR, 2.16 95% CI, 1.64-2.86) and multivariable-adjusted analyses (HR, 1.59 95% CI, 1.17-2.17) (Table 2). After inverse probability of treatment weighting based on the HDPS, the association was not significant (HR, 1.61 95% CI, 0.997-2.59) (Table 2).

and concluded with

In children born to mothers receiving public drug coverage in Ontario, Canada, in utero serotonergic antidepressant exposure compared with no exposure was not associated with autism spectrum disorder in the child. Although a causal relationship cannot be ruled out, the previously observed association may be explained by other factors.

We will show, using the graphical and tabular summaries below, why a more nuanced summary such as

“After HDPS adjustment for confounding, a 61% hazard elevation remained; however, under the same model, every hypothesis from no elevation up to a 160% hazard increase had \(p\) > 0.05; Thus, while quite imprecise, these results are consistent with previous observations of a positive association between serotonergic antidepressant prescriptions and subsequent ASD. Because the association may be partially or wholly due to uncontrolled biases, further evidence will be needed for evaluating what, if any, proportion of it can be attributed to causal effects of prenatal serotonergic antidepressant use on ASD incidence.”

would have been far more appropriate, as we discuss in our
paper.^{1}

## \(P\)-value and \(S\)-value Functions

To use this information to construct a \(P\)-value function, we can use the `concurve` functions.

`library("concurve")`

We enter the reported point estimates and compatibility limits and produce all possible intervals, \(P\)-values, and \(S\)-values. This is calculated assuming normal approximations with the `curve_rev()` function.

```
curve1 <- curve_rev(point = 1.61, LL = 0.997, UL = 2.59,
measure = "ratio")
```

It is stored in the object `curve1`.

We can generate a table of the relevant statistics from this reanalysis, similar to the table above, this time using the `concurve` function `curve_table()`, which gives us interval estimates at various percentiles (25%, 50%, 75%, 95%, etc.). These are in turn used to construct the entire confidence (compatibility) distribution, or \(P\)-value function.
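The \(P\)-value, \(S\)-value, and CDF columns of that table follow directly from the interval level; here is a quick illustration of the relationship for a few levels (ours, not `concurve` output):

```
level <- c(0.25, 0.50, 0.75, 0.95)
p <- 1 - level          # P-value at each interval's limits
s <- -log2(p)           # S-value in bits
cdf <- (1 + level) / 2  # CDF position of the upper limit
round(data.frame(level, p, s, cdf), 3)
```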

```
kable(curve_table(curve1[[1]]), format = "html", row.names = F,
col.names = colnames(curve_table(curve1[[1]])),
table.attr = "class=\"striped\"",
caption = "Table of Statistics for Various
Interval Estimate Percentiles") %>%
kable_styling(bootstrap_options = c("hover", "condensed", "responsive"))
```

| Lower Limit | Upper Limit | Interval Width | Interval Level (%) | CDF | P-value | S-value (bits) |
| --- | --- | --- | --- | --- | --- | --- |
| 1.490 | 1.740 | 0.250 | 25.0 | 0.625 | 0.750 | 0.415 |
| 1.366 | 1.897 | 0.531 | 50.0 | 0.750 | 0.500 | 1.000 |
| 1.217 | 2.131 | 0.914 | 75.0 | 0.875 | 0.250 | 2.000 |
| 1.178 | 2.200 | 1.021 | 80.0 | 0.900 | 0.200 | 2.322 |
| 1.134 | 2.286 | 1.152 | 85.0 | 0.925 | 0.150 | 2.737 |
| 1.079 | 2.403 | 1.325 | 90.0 | 0.950 | 0.100 | 3.322 |
| 0.999 | 2.595 | 1.596 | 95.0 | 0.975 | 0.050 | 4.322 |
| 0.933 | 2.779 | 1.846 | 97.5 | 0.988 | 0.025 | 5.322 |
| 0.860 | 3.015 | 2.155 | 99.0 | 0.995 | 0.010 | 6.644 |

## Figure 3: \(P\)-value (Compatibility) Function

Now we plot the \(P\)-value (compatibility) function of the Brown et al. data, stored in `curve1`, using the `ggcurve()` function from `concurve`.

```
ggcurve(curve1[[1]], type = "c", measure = "ratio",
nullvalue = c(1), levels = c(0.5, 0.75, 0.95)) +
labs(title = expression(paste(italic(P), "-value (Compatibility) Function")),
subtitle = expression(paste(italic(P),
"-values for a range of hazard ratios (HR)")),
x = "Hazard Ratio (HR)",
y = expression(paste(italic(P),"-value"))) +
geom_vline(xintercept = 1.61, lty = 1, color = "gray", alpha = 0.2) +
geom_vline(xintercept = 2.59, lty = 1, color = "gray", alpha = 0.2) +
theme_cowplot() +
theme(plot.title = element_text(size = 12),
plot.subtitle = element_text(size = 11),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
text = element_text(size = 11))
```

It is practically the same as the published version from Rafi &
Greenland, 2020.^{1}

```
curve1 <- curve_rev(point = 1.61, LL = 0.997, UL = 2.59,
measure = "ratio")
curve2 <- curve_rev(point = 1.7, LL = 1.1, UL = 2.6,
measure = "ratio")
```

## Cumulative Confidence (Compatibility) Distribution

Although we do not cover this figure in our paper, it is easy to construct with `concurve`’s `ggcurve()` function by specifying `type = "cdf"`, which shows the median estimate within the confidence distribution. The horizontal line that meets the curve is at approximately the same point as the maximum of the \(P\)-value function/confidence (compatibility) curve.

```
ggcurve(curve1[[2]], type = "cdf", measure = "ratio", nullvalue = c(1)) +
labs(title = "P-values for a range of hazard ratios (HR)",
subtitle = "Association Between Serotonergic Antidepressant Exposure
\nDuring Pregnancy and Child Autism Spectrum Disorder",
x = "Hazard Ratio (HR)",
y = "Cumulative Compatibility") +
theme_cowplot() +
theme(plot.title = element_text(size = 12),
plot.subtitle = element_text(size = 11),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
text = element_text(size = 11))
```

We will also calculate the likelihoods so that we can generate the corresponding likelihood functions.

```
lik2 <- curve_rev(point = 1.7, LL = 1.1, UL = 2.6,
type = "l", measure = "ratio")
lik1 <- curve_rev(point = 1.61, LL = 0.997, UL = 2.59,
type = "l", measure = "ratio")
```

We can also see how consistent these results are with previous studies conducted by the same research group, given the overlap of the functions, which can be compared using the `plot_compare()` function. Let’s compare the relative likelihood functions from both studies to see how consistent the results are.

```
plot_compare(data1 = lik1[[1]], data2 = lik2[[1]],
type = "l1", measure = "ratio", nullvalue = TRUE,
title = "Brown et al. 2017. J Clin Psychiatry. vs.
\nBrown et al. 2017. JAMA.",
subtitle = "J Clin Psychiatry: OR = 1.7, 1/6.83 LI: LL = 1.1, UL = 2.6
\nJAMA: HR = 1.61, 1/6.83 LI: LL = 0.997, UL = 2.59",
xaxis = "Hazard Ratio / Odds Ratio") +
theme_cowplot() +
theme(plot.title = element_text(size = 12),
plot.subtitle = element_text(size = 11),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
text = element_text(size = 11))
```

and the \(P\)-value functions.

```
plot_compare(data1 = curve1[[1]], data2 = curve2[[1]],
type = "c", measure = "ratio", nullvalue = TRUE,
title = "Brown et al. 2017. J Clin Psychiatry. vs.
\nBrown et al. 2017. JAMA.",
subtitle = "J Clin Psychiatry: OR = 1.7, 1/6.83 LI: LL = 1.1, UL = 2.6
\nJAMA: HR = 1.61, 1/6.83 LI: LL = 0.997, UL = 2.59",
xaxis = "Hazard Ratio / Odds Ratio") +
theme_cowplot() +
theme(plot.title = element_text(size = 12),
plot.subtitle = element_text(size = 11),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
text = element_text(size = 11))
```

## Figure 4: \(S\)-value (Surprisal) Function

Plot the \(S\)-value (surprisal) function of the Brown et al. data with `ggcurve()`.

```
ggcurve(data = curve1[[1]], type = "s", measure = "ratio", nullvalue = c(1),
title = "S-value (Surprisal) Function",
subtitle = "S-Values (surprisals) for a range of hazard ratios (HR)",
xaxis = "Hazard Ratio", yaxis1 = "S-value (bits of information)") +
theme_cowplot() +
theme(plot.title = element_text(size = 12),
plot.subtitle = element_text(size = 11),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
text = element_text(size = 11))
```

which is quite close to the figure from our paper.

## Likelihood (Support) Functions

Here we provide the code to show how we constructed the likelihood functions from our paper, which are the supplementary figures found here (S1) and here (S2).

Calculate and Plot Likelihood (Support) Intervals.

Here we use the reported estimates to construct the \(Z\)-scores and standard errors, which are in turn used to compute the likelihoods.

```
hrvalues <- seq(from = 0.65, to = 3.98, by = 0.01)
se <- log(2.59 / 0.997) / 3.92
zscore <- sapply(hrvalues, function(i) (log(i / 1.61) / se))
```

## Figure S1: Relative Likelihood Function

We then standardize all the likelihoods by the maximum of the likelihood function^{17} to produce relative likelihoods, likelihood intervals, and the corresponding relative likelihood function.^{18}

```
support <- exp((-zscore^2) / 2)
likfunction <- data.frame(hrvalues, zscore, support)
ggplot(data = likfunction, mapping = aes(x = hrvalues, y = support)) +
  geom_line() +
  geom_vline(xintercept = 1, lty = 3, color = "#d46c5b", alpha = 0.75) +
  geom_hline(yintercept = (1 / 6.83), lty = 1, color = "#333333", alpha = 0.25) +
  geom_ribbon(aes(x = hrvalues, ymin = min(support), ymax = support),
              fill = "#d46c5b", alpha = 0.10) +
  labs(title = "Relative Likelihood Function",
       subtitle = "Relative likelihoods for a range of hazard ratios",
       x = "Hazard Ratio (HR)",
       y = "Relative Likelihood \n1/MLR") +
  annotate(geom = "text", x = 1.65, y = 0.2,
           label = "1/6.83 LI = 0.997, 2.59", size = 4, color = "#000000") +
  theme_cowplot() +
  theme(plot.title = element_text(size = 12),
        plot.subtitle = element_text(size = 11),
        axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 12),
        text = element_text(size = 11)) +
  scale_x_log10(breaks = scales::pretty_breaks(n = 10)) +
  scale_y_continuous(expand = expansion(mult = c(0.01, 0.0125)),
                     breaks = seq(0, 1, 0.10))
```

The \(\frac{1}{6.83}\) likelihood interval corresponds to the 95% compatibility interval, as shown by the figure from our paper.
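
This correspondence arises because the relative likelihood under the normal model equals \(exp(-Z^{2}/2)\), which at the 95% interval limits (\(Z = \pm 1.96\)) is \(exp(-1.96^{2}/2) \approx 1/6.83\). A quick numerical check in base R:

```r
# Relative likelihood at the 95% interval limits, where Z = +/- 1.96
rel_lik <- exp(-(1.96^2) / 2)
round(c(rel_lik, 1 / rel_lik), 3)   # approximately 0.146 and 6.83
```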

Below we use the calculated \(Z\)-scores to construct the log-likelihood function, which is the upward-concave parabola \(\frac{Z^{2}}{2} = ln(MLR)\), the negative of the relative log-likelihood.

```
support <- (zscore^2) / 2
likfunction <- data.frame(hrvalues, zscore, support)
ggplot(data = likfunction, mapping = aes(x = hrvalues, y = support)) +
  geom_line() +
  geom_vline(xintercept = 1, lty = 3, color = "#d46c5b", alpha = 0.75) +
  geom_ribbon(aes(x = hrvalues, ymin = support, ymax = max(support)),
              fill = "#d46c5b", alpha = 0.10) +
  labs(title = "Log-Likelihood Function",
       subtitle = "Log-likelihoods for a range of hazard ratios",
       x = "Hazard Ratio (HR)",
       y = "ln(MLR)") +
  theme_cowplot() +
  theme(plot.title = element_text(size = 12),
        plot.subtitle = element_text(size = 12),
        axis.title.x = element_text(size = 13),
        axis.title.y = element_text(size = 13),
        text = element_text(size = 15)) +
  scale_x_log10(breaks = scales::pretty_breaks(n = 10)) +
  scale_y_continuous(expand = expansion(mult = c(0.01, 0.0125)),
                     breaks = seq(0, 7, 1))
```

## Figure S2: Deviance Function \(2ln(MLR)\)

Known as the deviance function, this is twice the log maximum likelihood ratio, \(Z^{2} = 2ln(MLR)\), which maps directly to the \(S\)-value function.

```
support <- (zscore^2)
likfunction <- data.frame(hrvalues, zscore, support)
ggplot(data = likfunction, mapping = aes(x = hrvalues, y = support)) +
  geom_line() +
  geom_vline(xintercept = 1, lty = 3, color = "#d46c5b", alpha = 0.75) +
  geom_hline(yintercept = 3.84, lty = 1, color = "#333333", alpha = 0.25) +
  geom_ribbon(aes(x = hrvalues, ymin = support, ymax = max(support)),
              fill = "#d46c5b", alpha = 0.10) +
  annotate(geom = "text", x = 1.65, y = 4.4,
           label = "1/6.83 LI = 0.997, 2.59", size = 4, color = "#000000") +
  labs(title = "Deviance Function",
       subtitle = "Deviance statistics for a range of hazard ratios",
       x = "Hazard Ratio (HR)",
       y = "Deviance Statistic \n2ln(MLR)") +
  theme_cowplot() +
  theme(plot.title = element_text(size = 12),
        plot.subtitle = element_text(size = 11),
        axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 12),
        text = element_text(size = 11)) +
  scale_x_log10(breaks = scales::pretty_breaks(n = 10)) +
  scale_y_continuous(expand = expansion(mult = c(0.01, 0.0125)),
                     breaks = seq(0, 14, 2))
```

And the version from our paper can be found here.
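
The mapping to the \(S\)-value function can be checked numerically: the deviance statistic \(Z^{2}\) is referred to a \(\chi^{2}_{1}\) distribution, so the 3.84 reference line in the figure corresponds to \(P = 0.05\) and an \(S\)-value of \(-log_{2}(0.05) \approx 4.3\) bits. A sketch in base R:

```r
# Convert a deviance statistic to a P-value and then to an S-value
deviance <- 3.84                    # reference line drawn in the figure
p <- 1 - pchisq(deviance, df = 1)   # two-sided P-value under the normal model
s <- -log2(p)                       # surprisal in bits of information
round(c(p = p, s = s), 2)           # approximately p = 0.05, s = 4.32 bits
```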

It is important to note that although we have calculated these
likelihood functions manually here, they can also be generated easily
using the `curve_lik()` function, which takes inputs from the
`ProfileLikelihood` `R` package. For further discussion, please see the
following article, which gives several examples for a wide variety of
models.
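
As a rough illustration of that workflow (a sketch only: the data frame `dat` and the variables `y` and `x` are hypothetical placeholders, and the argument pattern follows the `concurve` profile-likelihood article referenced above):

```r
library("ProfileLikelihood")
library("concurve")

# Hypothetical data frame `dat` with a binary outcome `y` and an exposure `x`.
# Profile the likelihood for the coefficient of `x` over a grid of values
prof <- profilelike.glm(y ~ 1, data = dat, profile.theta = "x",
                        family = binomial(link = "logit"), length = 500)
# Convert the profile into likelihood functions and plot the relative likelihood
lik <- curve_lik(prof, dat)
ggcurve(lik[[1]], type = "l1", measure = "ratio")
```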

Further, we urge some caution. Although we endorse the construction and
presentation of likelihood functions alongside \(P\)-value and
\(S\)-value functions and their tabulations, the use of pure likelihood
methods has been highly controversial among some statisticians, with
some going as far as to say that *likelihood is blind*. Not all
statisticians share this view, however, and some have responded to such
criticisms in kind; see the discussants.^{19} Thus, we support providing
both \(P\)-value/\(S\)-value and likelihood-based functions for a
complete picture.

Indeed, we can see how they all easily map to one another when plotted side by side, with the following script.

```
library("cowplot")
# `confcurve`, `surprisalcurve`, `relsupportfunction`, and `deviancefunction`
# are the plot objects saved in the earlier sections
plot_grid(confcurve, surprisalcurve,
          relsupportfunction, deviancefunction,
          ncol = 2, nrow = 2)
```

A clearer plot comparing the four functions is seen below (click on the image to view in full).

Additional adjustments that were made to the figures from Rafi &
Greenland, 2020^{1} were done using Adobe Illustrator and
Photoshop.

All errors are ours, and we welcome critical feedback and reporting of errors. To report possible errors in our analyses, please post them as a comment below or file a bug here. We will compile any reported errors or flaws into a public errata.

# Statistical Package Citations

Please remember to cite the `R` packages if you use any of the `R`
scripts from above. The citation for our paper^{1} can be found below in
the References section, or it can be downloaded
here
for a reference manager.

```
citation("concurve")
citation("TeachingDemos")
citation("ProfileLikelihood")
citation("rstan")
citation("rstanarm")
citation("bayesplot")
citation("knitr")
citation("kableExtra")
citation("ggplot2")
citation("cowplot")
citation("Statamarkdown")
```

# Environment

The analyses were run on:

```
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstanarm_2.21.1 Rcpp_1.0.6 rstan_2.21.2 StanHeaders_2.21.0-7 bayesplot_1.8.0 TeachingDemos_2.12
[7] Statamarkdown_0.5.2 kableExtra_1.3.1 cowplot_1.1.1 ggplot2_3.3.3 concurve_2.7.7 rmarkdown_2.6
loaded via a namespace (and not attached):
[1] readxl_1.3.1 uuid_0.1-4 backports_1.2.1 systemfonts_0.3.2 plyr_1.8.6
[6] igraph_1.2.6 splines_4.0.3 crosstalk_1.1.1 rstantools_2.1.1 inline_0.3.17
[11] digest_0.6.27 htmltools_0.5.1 rsconnect_0.8.16 fansi_0.4.2 magrittr_2.0.1
[16] openxlsx_4.2.3 credentials_1.3.0 RcppParallel_5.0.2 matrixStats_0.57.0 officer_0.3.16
[21] xts_0.12.1 askpass_1.1 prettyunits_1.1.1 colorspace_2.0-0 rvest_0.3.6
[26] haven_2.3.1 xfun_0.20 dplyr_1.0.3 callr_3.5.1 crayon_1.3.4.9000
[31] bcaboot_0.2-1 jsonlite_1.7.2 lme4_1.1-26 survival_3.2-7 zoo_1.8-8
[36] glue_1.4.2 survminer_0.4.8 gtable_0.3.0 webshot_0.5.2 V8_3.4.0
[41] car_3.0-10 pkgbuild_1.2.0 abind_1.4-5 scales_1.1.1 DBI_1.1.1
[46] rstatix_0.6.0 miniUI_0.1.1.1 viridisLite_0.3.0 xtable_1.8-4 foreign_0.8-81
[51] km.ci_0.5-2 stats4_4.0.3 DT_0.17 htmlwidgets_1.5.3 httr_1.4.2
[56] threejs_0.3.3 ellipsis_0.3.1 pkgconfig_2.0.3 loo_2.4.1 farver_2.0.3
[61] reshape2_1.4.4 tidyselect_1.1.0 labeling_0.4.2 rlang_0.4.10 later_1.1.0.1
[66] munsell_0.5.0 pbmcapply_1.5.0 cellranger_1.1.0 tools_4.0.3 cli_2.2.0
[71] generics_0.1.0 broom_0.7.3 ggridges_0.5.3 evaluate_0.14 stringr_1.4.0
[76] fastmap_1.0.1 yaml_2.2.1 sys_3.4 processx_3.4.5 knitr_1.30
[81] zip_2.1.1 survMisc_0.5.5 purrr_0.3.4 nlme_3.1-151 mime_0.9
[86] xml2_1.3.2 shinythemes_1.1.2 compiler_4.0.3 rstudioapi_0.13 curl_4.3
[91] ggsignif_0.6.0 statmod_1.4.35 tibble_3.0.5 stringi_1.5.3 highr_0.8
[96] ps_1.5.0 blogdown_1.1 forcats_0.5.0 gdtools_0.2.3 lattice_0.20-41
[101] Matrix_1.3-2 nloptr_1.2.2.2 markdown_1.1 shinyjs_2.0.0 KMsurv_0.1-5
[106] vctrs_0.3.6 pillar_1.4.7 lifecycle_0.2.0 data.table_1.13.6 ProfileLikelihood_1.1
[111] flextable_0.6.1 httpuv_1.5.5 R6_2.5.0 bookdown_0.21 promises_1.1.1
[116] gridExtra_2.3 rio_0.5.16 codetools_0.2-18 MASS_7.3-53 gtools_3.8.2
[121] boot_1.3-25 colourpicker_1.1.0 assertthat_0.2.1 openssl_1.4.3 withr_2.4.0
[126] metafor_2.4-0 shinystan_2.5.0 hms_1.0.0 grid_4.0.3 minqa_1.2.4
[131] tidyr_1.1.2 carData_3.0-4 ggpubr_0.4.0 shiny_1.5.0 base64enc_0.1-3
[136] dygraphs_1.1.1.6
```

# References

Rafi Z, Greenland S (2020). Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprisal. *BMC Medical Research Methodology* **20**:244. doi: 10.1186/s12874-020-01105-9.

*arXiv:2008.12991 [stat.ME]*. http://arxiv.org/abs/2008.12991.

*The American Statistician* **73**:106–114. doi: 10.1080/00031305.2018.1529625.

*Journal of the American Statistical Association* **95**:1127–1142. doi: dpvq8c.

*Journal of the American Statistical Association* **95**:1143–1156. doi: gg7krv.

*The American Statistician* **73**:1–3. doi: 10.1080/00031305.2016.1277161.

*Synthese* **194**:395–432. doi: 10.1007/s11229-014-0525-z.

*Journal of the Royal Statistical Society* **4**:558–625. doi: 10.2307/2342192.

*The American Statistician* **73**:262–270. doi: 10.1080/00031305.2018.1543137.

*European Journal of Epidemiology* **31**:337–350. doi: 10.1007/s10654-016-0149-3.

*BMJ* **366**. doi: 10.1136/bmj.l5381.

*arXiv:1909.08583 [stat.ME]*. http://arxiv.org/abs/1909.08583.

*Journal of the American Medical Association* **317**:1544–1552. doi: 10.1001/jama.2017.3415.

*The Journal of Clinical Psychiatry* **78**:e48–e58. doi: 10.4088/JCP.15r10194.

*Journal of the Korean Statistical Society* **37**:191–211. doi: bn8cgk.