Earlier this year, my colleagues and I were discussing the relationship between saturated fat and cardiovascular disease. One of us was writing an article on the topic and we were discussing an unusual trial often included in meta-analyses.
That trial is the Finnish Mental Hospital Study, a crossover study that compared patients on a control diet with a certain amount of saturated fat to patients on an intervention diet that replaced the saturated fat with polyunsaturated fats.
Here is a summary of the trial,
“A controlled intervention trial, with the purpose of testing the hypothesis that the incidence of coronary heart disease (CHD) could be decreased by the use of serum-cholesterol-lowering (SCL) diet, was carried out in 2 mental hospitals near Helsinki in 1959–71.
The subjects were hospitalized middle-aged men. One of the hospitals received the SCL diet, i.e. a diet low in saturated fats and cholesterol and relatively high in polyunsaturated fats, while the other served as the control with a normal hospital diet. Six years later the diets were reversed, and the trial was continued another 6 years.”
The study didn’t just include men, it also included women and is discussed in a separate paper by the same research group.
In total, the “two studies” (really just one study) had a sample size of 818 participants (for hard CVD events), so they often weigh quite a bit in meta-analyses.
I’d like to bring attention to one particular meta-analysis published eight years ago by Mozaffarian, Micha, & Wallace, 2010. It’s one of the most cited meta-analyses on this topic, with Google Scholar indicating that it’s been cited by over 900+ academic sources. Web of Science indicates that it’s been cited by 466 papers at the time of writing this post.
Source: Web of Science
Clearly, it’s a well known study.
The meta-analysis of interest describes its inclusion and exclusion criteria as,
“We searched for all RCTs that randomized adults to increased total or n-6 PUFA consumption for at least 1 year without other major concomitant interventions (e.g., blood pressure or smoking control, other multiple dietary interventions, etc.), had an appropriate control group without this dietary intervention, and reported (or had obtainable from the authors) sufficient data to calculate risk estimates with standard errors for effects on occurrence of “hard” CHD events (myocardial infarction, CHD death, and/or sudden death). Studies were excluded if they were observational or otherwise nonrandomized.”
So the authors state that the included studies must be randomized trials that are at least a year long and that they are excluding studies that are non-randomized or observational.
Here’s a list of the studies they included. Note the design of the Finnish studies (Turpeinen, 1979 & Miettinen, 1983), which I’ll touch upon below.
What Were the Results?
Mozaffarian D, Micha R, Wallace S (2010)
“Combining all trials, the pooled risk reduction for CHD events was 19% (RR = 0.81, 95% CI 0.70–0.95, p = 0.008)”
The 2010 meta-analysis found that replacing saturated fats in the diet with polyunsaturated fats had a notable, statistically significant reduction on CHD events. A 19% reduction is certainly, nothing to ignore and the confidence interval (CI) leans towards an effect. It seems promising as a dietary intervention. That could be a reason why the study is cited so widely. The quality of the trials included in the meta-analysis was low to moderate,
“Many of the trials had design limitations, such as single-blinding, inclusion of electrocardiographically defined clinical endpoints, or open enrollment. All trials utilized blinded endpoint assessment. Quality scores were in the modest range and relatively homogeneous: all trials had quality scores of either 2 or 3.”
And there was some suggestion of publication bias (could also be small-study effects),
Mozaffarian D, Micha R, Wallace S (2010)
“Visual inspection of the resulting funnel plot indicated some potential for publication bias (Figure S1), with a borderline Begg’s test (continuity corrected p = 0.07), although such determinations are limited when the number of studies is relatively small.”
Regardless, the effects are quite interesting and worth exploring further.
What Went Wrong?
A major problem in this meta-analysis is that the two Finnish studies included in the quantitative analysis were not randomized. The authors made it clear with their inclusion criteria that they only wanted to include trials that were randomized.
The two Finnish Mental Hospital studies were labeled as “cluster randomized”, which you can see in the table of characteristics from above. When this meta-analysis was published, several individuals were critical that a “cluster-randomized trial” was being labeled as a randomized trial, especially when there were only two clusters (two hospitals). This is a valid criticism because a cluster-randomized trial with only one cluster per condition is invalid for any between-group statistical comparisons. Brown et al., 2015 explain in this comprehensive article,
A particularly pernicious and invalid design that requires recognition is the inclusion of only one cluster per condition… Such designs are unable to support any valid analysis for an intervention effect, absent strong and untestable assumptions (11, 12). In such designs, the variation that is due to the cluster is not identifiable apart from the variation due to the condition.
A one-cluster-per-condition design is analogous to assigning one person to the treatment and one person to the control in an ordinary (nonclustered) RCT, measuring each person’s outcome multiple times, treating the multiple observations per person like independent observations, and interpreting the results like a valid RCT. In such a situation, the observations on person A can be tested as to whether they are significantly different from those on person B but cannot support an inference about the effect of treatment per se.
So it is clear that a one-cluster-per-condition design is not valid to ascertain much about the intervention. However, many individuals (if not all) failed to notice that the Finnish Mental Hospital studies were not even cluster randomized! There is no indication in any of the five published papers from these two studies that there is any randomization. You can check all five papers here:
|International Journal of Epidemiology||1983||Dietary Prevention of Coronary Heart Disease in Women: The Finnish Mental Hospital Study|
|Circulation||1979||Effect of Cholesterol-Lowering Diet on Mortality from Coronary Heart Disease and Other Causes|
|American Journal of Clinical Nutrition||1968||Dietary Prevention of Coronary Heart Disease: Long-Term Experiment: I. Observations on Male Subjects|
|International Journal of Epidemiology||1979||Dietary Prevention of Coronary Heart Disease: The Finnish Mental Hospital Study|
|The Lancet||1972||Effect of Cholesterol-Lowering Diet on Mortality from Coronary Heart-Disease and Other Causes a Twelve-Year Clinical Trial in Men and Women|
Furthermore, cluster-randomized trials were not common when these studies were being conducted, which is why we should be skeptical of these being cluster-randomized trials.
Yet, these two studies were mistakenly labeled as being “cluster randomized” and therefore were included in the meta-analysis. Both of these studies contributed a total weight of 16% to the analysis.
And again, the authors found a pretty notable reduction in CVD events (RR: 0.81, 95% CI 0.70 – 0.95, p = 0.008)
Correcting the Error
So what happens to the results when you correct this mistake by removing the two studies?
R and find out. If you’d like to reproduce the analysis on your own, you can find all the code at the bottom of this blog post.
As you can see above, rerunning the analysis after removing the Finnish studies results in the effect size shrinking from a 19% reduction to a 13% reduction (RR: 0.87, 95% CI 0.76 - 1.00). That’s a large difference!
If we’re concerned about statistical significance, the results are no longer significant. It’s worth noting that the upper bound of the confidence interval barely contains the null value (1) and the lower bound includes a value as low as 0.76. It’s clear that the CI still seems to lean towards an effect.
Regardless of your statistical philosophy, this was a noteworthy, objective mistake. It was a mistake in labeling two studies as meeting the inclusion criteria, and correcting for this mistake leads to a substantial change in the results. Yet, this error has not been corrected for in the journal. In fact, this study has been around for eight years with no corrections or retractions.
I reached out to both the authors and the editors of PLoS, but to date, there are no updates or corrections on the article itself. Therefore, I suspect several people who read the article or cite it, are not aware that the summary effects are incorrect and that some of the studies in the analysis should not be there!
It is very important to note that correcting for the errors in this study does not lead to completely different conclusions. Although the effect is no longer statistically significant, it is still there based on the effect size and coverage of the confidence intervals. However, the effect is reduced.
Systematic reviews by other groups including Cochrane did not include the Finnish studies in their meta-analyses because the authors didn’t believe that a “cluster randomized trial” with so few clusters (2) met the inclusion criteria for a randomized trial (also worth remembering, that there is no indication in any of the papers that this was even cluster randomized!). Some of these systematic reviews that exclude the Finnish studies still find a benefit to replacing saturated fats in the diet with polyunsaturated fats.
However, other meta-analyses have also found no statistically significant benefit to replacing saturated fats with polyunsaturated fats.
Clearly, there is quite a bit of disagreement on this topic. Regardless, the meta-analysis in question still made a large error and it is a problem for the following reasons (even if the overall conclusions of it were not to change after the correction):
- Two prominent studies were misclassified
- The studies did not meet the inclusion criteria but were included
- Inclusion in the analysis leads to a substantially different effect size than without inclusion
- The meta-analysis is constantly cited and misleading other readers
I’m certainly not suggesting that errors do not happen, especially when undertaking such large, comprehensive projects. In fact, I would probably be suspicious if there were never any errors when such large projects were conducted!
However, I believe that when such errors are pointed out, they should be corrected as quickly and transparently as possible. Hopefully, the authors and the editors address this issue soon to prevent any further confusion.
Here is the dataset to load into
R as a dataframe.
pufaMETA <- read.csv("https://data.lesslikely.com/uploads/PUFA.csv", header = TRUE) pufaMETA
## Study_ID PUFA_Events PUFA_Total Control_Events Control_Total ## 1 DARTS 132 1018 144 1015 ## 2 LA Veterans 53 424 71 422 ## 3 Minnesota CS 131 4541 121 4516 ## 4 MRC Soy 45 199 51 194 ## 5 Oslo Diet Heart 61 206 81 206 ## 6 STARS 2 27 5 28
suppressMessages(library("metafor")) ## Meta-Analysis dat <- escalc(measure = "RR", ai = PUFA_Events, n1i = PUFA_Total, ci = Control_Events, n2i = Control_Total, data = pufaMETA) dat[, c(1, 6:7)]
## Study_ID yi vi ## 1 DARTS -0.0900 0.0126 ## 2 LA Veterans -0.2971 0.0282 ## 3 Minnesota CS 0.0739 0.0155 ## 4 MRC Soy -0.1506 0.0317 ## 5 Oslo Diet Heart -0.2836 0.0190 ## 6 STARS -0.8799 0.6272
res <- rma(yi, vi, data = dat) confint(res)
## ## estimate ci.lb ci.ub ## tau^2 0.0066 0.0000 0.3010 ## tau 0.0812 0.0000 0.5486 ## I^2(%) 21.4317 0.0000 92.5694 ## H^2 1.2728 1.0000 13.4578
par(mar = c(4, 4, 1, 2)) res <- rma(ai = PUFA_Events, n1i = PUFA_Total, ci = Control_Events, n2i = Control_Total, data = dat, measure = "RR", slab = paste(Study_ID, sep = ", "), method = "DL") res_REML <- rma(ai = PUFA_Events, n1i = PUFA_Total, ci = Control_Events, n2i = Control_Total, data = dat, measure = "RR", slab = paste(Study_ID, sep = ", "), method = "REML")
Although I have used both the Dersimonian-Laird estimator, as used in the original analysis by the authors, I have chosen instead to also conduct the analysis using the restricted maximum likelihood estimator (REML) and plot that instead, as it often gives wider interval estimates, with better coverage properties.
## Data Visualization - Forest Plot Structure forest(res_REML, xlim = c(-16, 6), at = log(c(0.05, 0.25, 1, 2)), atransf = exp, ilab = cbind(dat$PUFA_Events, dat$PUFA_Total, dat$Control_Events, dat$Control_Total), ilab.xpos = c(-9.5, -8, -6, -4.5), cex = 0.85, ylim = c(-1, 8.6), xlab = "Risk Ratio", mlab = "", psize = 1.7, pch = 16) ## Heterogeneity text(-16, -1, pos = 4, cex = 0.85, bquote(paste("Random Effects (Q = ", .(formatC(res_REML$QE, digits = 2, format = "f")), ", df = ", .(res_REML$k - res$p), ", p = ", .(formatC(res_REML$QEp, digits = 2, format = "f")), "; ", I^2, " = ", .(formatC(res_REML$I2, digits = 1, format = "f")), "%)"))) op <- par(cex = 0.85, font = 4) ## Bold Font par(font = 2) ## Columns text(c(-9.5, -8, -6, -4.5), 7.5, c("Events ", " Total", "Events ", " Total")) text(c(-8.75, -5.25), 8.5, c("PUFA Diet", "Control Diet")) text(-16, 7.5, "Study Name", pos = 4) text(6, 7.5, "Risk Ratio [95% CI]", pos = 2)
## ## Random-Effects Model (k = 6; tau^2 estimator: REML) ## ## tau^2 (estimated amount of total heterogeneity): 0.0066 (SE = 0.0183) ## tau (square root of estimated tau^2 value): 0.0812 ## I^2 (total heterogeneity / total variability): 21.43% ## H^2 (total variability / sampling variability): 1.27 ## ## Test for Heterogeneity: ## Q(df = 5) = 5.9549, p-val = 0.3106 ## ## Model Results: ## ## estimate se zval pval ci.lb ci.ub ## -0.1361 0.0720 -1.8911 0.0586 -0.2771 0.0050 . ## ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
suppressMessages(library("concurve")) library("ggplot2") PUFA_curve <- curve_meta(res_REML, measure = "ratio", cores = 4L) ggcurve(PUFA_curve[], nullvalue = c(1), levels = c(0.5, 0.75, 0.95), title = "P-value Function of PUFA Meta-Analysis", subtitle = "Reanalysis of Mozaffarian et al. 2010", xaxis = "Risk Ratio") + labs(caption = "Restricted Maximum Likelihood Estimator used")
Although I believe the results still suggest an effect, if another analyst was not convinced by these effects and set interval nulls that spanned from risk ratios from 0.9 to 1.1, as being equivalent to no effect, then these results would be in some trouble! We can see this easily with
concurve, which allows us to set the interval null regions.
ggcurve(PUFA_curve[], nullvalue = c(0.9, 1.1), levels = c(0.5, 0.75, 0.95), title = "P-value Function of PUFA Meta-Analysis", subtitle = "Reanalysis of Mozaffarian et al. 2010", xaxis = "Risk Ratio") + annotate(geom = "text", x = 1.0, y = 0.50, label = "Interval Null Region", size = 4, color = "#000000", alpha = 0.5) + labs(caption = "Restricted Maximum Likelihood Estimator used")
Edit: Dr. Mozaffarian has replied to some of these criticisms in the comments section of his paper on PLOS Medicine, and unfortunately, I have found the responses to be very poor. The response pretty much sums up to, "to the best of our knowledge when we were critically appraising the literature, this study seemed like a randomized trial, so we labeled it as such, and it may be appropriate to label it quasi-randomized, just because we believe that.
Unfortunately, I don’t think that’s how good science progresses at all, and in fact, this error has the potential to cause serious confusion in the future if it goes uncorrected.
The analyses were run on:
## R version 4.0.3 (2020-10-10) ## Platform: x86_64-apple-darwin17.0 (64-bit) ## Running under: macOS Catalina 10.15.6 ## ## Matrix products: default ## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib ## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib ## ## Random number generation: ## RNG: Mersenne-Twister ## Normal: Inversion ## Sample: Rejection ## ## locale: ##  en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 ## ## attached base packages: ##  stats graphics grDevices utils datasets methods base ## ## other attached packages: ##  ggplot2_3.3.2 concurve_2.7.7 metafor_2.4-0 Matrix_1.2-18 ## ## loaded via a namespace (and not attached): ##  tidyr_1.1.2 splines_4.0.3 carData_3.0-4 ProfileLikelihood_1.1 askpass_1.1 ##  cellranger_1.1.0 yaml_2.2.1 bcaboot_0.2-1 gdtools_0.2.2 pillar_1.4.6 ##  backports_1.1.10 lattice_0.20-41 glue_1.4.2 uuid_0.1-4 digest_0.6.25 ##  ggsignif_0.6.0 colorspace_1.4-1 htmltools_0.5.0 pkgconfig_2.0.3 broom_0.7.1 ##  haven_2.3.1 bookdown_0.21 xtable_1.8-4 purrr_0.3.4 scales_1.1.1 ##  km.ci_0.5-2 openxlsx_4.2.2 officer_0.3.14 rio_0.5.16 KMsurv_0.1-5 ##  tibble_3.0.4 openssl_1.4.3 farver_2.0.3 generics_0.0.2 car_3.0-10 ##  ellipsis_0.3.1 ggpubr_0.4.0 withr_2.3.0 credentials_1.3.0 survival_3.2-7 ##  magrittr_1.5 crayon_184.108.40.20600 readxl_1.3.1 evaluate_0.14 nlme_3.1-149 ##  rstatix_0.6.0 forcats_0.5.0 xml2_1.3.2 foreign_0.8-80 blogdown_0.21 ##  tools_4.0.3 data.table_1.13.0 hms_0.5.3 lifecycle_0.2.0 stringr_1.4.0 ##  flextable_0.5.11 munsell_0.5.0 zip_2.1.1 compiler_4.0.3 survminer_0.4.8 ##  pbmcapply_1.5.0 systemfonts_0.3.2 rlang_0.4.8 grid_4.0.3 sys_3.4 ##  base64enc_0.1-3 rmarkdown_2.4 boot_1.3-25 gtable_0.3.0 abind_1.4-5 ##  curl_4.3 R6_2.4.1 zoo_1.8-8 gridExtra_2.3 knitr_1.30 ##  dplyr_1.0.2 survMisc_0.5.5 stringi_1.5.3 parallel_4.0.3 Rcpp_1.0.5 ##  vctrs_0.3.4 tidyselect_1.1.0 xfun_0.18