Influential Errors | the Diet Heart Tale

Earlier this year, my colleagues and I were discussing the relationship between saturated fat and cardiovascular disease. One of us was writing an article on the topic and we were discussing an unusual trial often included in meta-analyses.

That trial is the Finnish Mental Hospital Study, a crossover study that compared patients on a control diet with a certain amount of saturated fat to patients on an intervention diet that replaced the saturated fat with polyunsaturated fats.

Here is a summary of the trial,

“A controlled intervention trial, with the purpose of testing the hypothesis that the incidence of coronary heart disease (CHD) could be decreased by the use of serum-cholesterol-lowering (SCL) diet, was carried out in 2 mental hospitals near Helsinki in 1959–71.

The subjects were hospitalized middle-aged men. One of the hospitals received the SCL diet, i.e. a diet low in saturated fats and cholesterol and relatively high in polyunsaturated fats, while the other served as the control with a normal hospital diet. Six years later the diets were reversed, and the trial was continued another 6 years.”

The study didn’t just include men, it also included women and is discussed in a separate paper by the same research group.

In total, the “two studies” (really just one study) had a sample size of 818 participants (for hard CVD events), so they often weigh quite a bit in meta-analyses.

I’d like to bring attention to one particular meta-analysis published eight years ago by Mozaffarian, Micha, & Wallace, 2010. It’s one of the most cited meta-analyses on this topic, with Google Scholar indicating that it’s been cited by over 900+ academic sources. Web of Science indicates that it’s been cited by 466 papers at the time of writing this post.

Web of Science metrics showing the number of times the Mozaffarian meta-analysis has been cited

Source: Web of Science

Clearly, it’s a well known study.

The meta-analysis of interest describes its inclusion and exclusion criteria as,

“We searched for all RCTs that randomized adults to increased total or n-6 PUFA consumption for at least 1 year without other major concomitant interventions (e.g., blood pressure or smoking control, other multiple dietary interventions, etc.), had an appropriate control group without this dietary intervention, and reported (or had obtainable from the authors) sufficient data to calculate risk estimates with standard errors for effects on occurrence of “hard” CHD events (myocardial infarction, CHD death, and/or sudden death). Studies were excluded if they were observational or otherwise nonrandomized.”

So the authors state that the included studies must be randomized trials that are at least a year long and that they are excluding studies that are non-randomized or observational.

Here’s a list of the studies they included. Note the design of the Finnish studies (Turpeinen, 1979 & Miettinen, 1983), which I’ll touch upon below.

Characteristics of the included studies in the meta-analysis

What Were the Results?

Forest plot from the Mozaffarian meta-analysis showing a reduction in CVD events by replacing saturated fat intake with polyunsaturated fat intake

Mozaffarian D, Micha R, Wallace S (2010)

“Combining all trials, the pooled risk reduction for CHD events was 19% (RR = 0.81, 95% CI 0.70–0.95, p = 0.008)”

The 2010 meta-analysis found that replacing saturated fats in the diet with polyunsaturated fats had a notable, statistically significant reduction on CHD events. A 19% reduction is certainly, nothing to ignore and the confidence interval (CI) leans towards an effect. It seems promising as a dietary intervention. That could be a reason why the study is cited so widely. The quality of the trials included in the meta-analysis was low to moderate,

“Many of the trials had design limitations, such as single-blinding, inclusion of electrocardiographically defined clinical endpoints, or open enrollment. All trials utilized blinded endpoint assessment. Quality scores were in the modest range and relatively homogeneous: all trials had quality scores of either 2 or 3.”

And there was some suggestion of publication bias (could also be small-study effects),

Characteristics of the included studies in the meta-analysis

Mozaffarian D, Micha R, Wallace S (2010)

“Visual inspection of the resulting funnel plot indicated some potential for publication bias (Figure S1), with a borderline Begg’s test (continuity corrected p = 0.07), although such determinations are limited when the number of studies is relatively small.”

Regardless, the effects are quite interesting and worth exploring further.

What Went Wrong?

A major problem in this meta-analysis is that the two Finnish studies included in the quantitative analysis were not randomized. The authors made it clear with their inclusion criteria that they only wanted to include trials that were randomized.

The two Finnish Mental Hospital studies were labeled as “cluster randomized”, which you can see in the table of characteristics from above. When this meta-analysis was published, several individuals were critical that a “cluster-randomized trial” was being labeled as a randomized trial, especially when there were only two clusters (two hospitals). This is a valid criticism because a cluster-randomized trial with only one cluster per condition is invalid for any between-group statistical comparisons. Brown et al., 2015 explain in this comprehensive article,

A particularly pernicious and invalid design that requires recognition is the inclusion of only one cluster per condition… Such designs are unable to support any valid analysis for an intervention effect, absent strong and untestable assumptions (11, 12). In such designs, the variation that is due to the cluster is not identifiable apart from the variation due to the condition.

A one-cluster-per-condition design is analogous to assigning one person to the treatment and one person to the control in an ordinary (nonclustered) RCT, measuring each person’s outcome multiple times, treating the multiple observations per person like independent observations, and interpreting the results like a valid RCT. In such a situation, the observations on person A can be tested as to whether they are significantly different from those on person B but cannot support an inference about the effect of treatment per se.

So it is clear that a one-cluster-per-condition design is not valid to ascertain much about the intervention. However, many individuals (if not all) failed to notice that the Finnish Mental Hospital studies were not even cluster randomized! There is no indication in any of the five published papers from these two studies that there is any randomization. You can check all five papers here:

Journal Year Title
International Journal of Epidemiology 1983 Dietary Prevention of Coronary Heart Disease in Women: The Finnish Mental Hospital Study
Circulation 1979 Effect of Cholesterol-Lowering Diet on Mortality from Coronary Heart Disease and Other Causes
American Journal of Clinical Nutrition 1968 Dietary Prevention of Coronary Heart Disease: Long-Term Experiment: I. Observations on Male Subjects
International Journal of Epidemiology 1979 Dietary Prevention of Coronary Heart Disease: The Finnish Mental Hospital Study
The Lancet 1972 Effect of Cholesterol-Lowering Diet on Mortality from Coronary Heart-Disease and Other Causes a Twelve-Year Clinical Trial in Men and Women

Furthermore, cluster-randomized trials were not common when these studies were being conducted, which is why we should be skeptical of these being cluster-randomized trials.

Yet, these two studies were mistakenly labeled as being “cluster randomized” and therefore were included in the meta-analysis. Both of these studies contributed a total weight of 16% to the analysis.

And again, the authors found a pretty notable reduction in CVD events (RR: 0.81, 95% CI 0.70 – 0.95, p = 0.008)

Correcting the Error

So what happens to the results when you correct this mistake by removing the two studies?

Let’s open R and find out. If you’d like to reproduce the analysis on your own, you can find all the code at the bottom of this blog post.

My reanalysis of Mozaffarian's study with the exclusion of the Finnish study graphed onto a forest plot, showing a substantial reduction in the effect size

My reanalysis

As you can see above, rerunning the analysis after removing the Finnish studies results in the effect size shrinking from a 19% reduction to a 13% reduction (RR: 0.87, 95% CI 0.76 - 1.00). That’s a large difference!

If we’re concerned about statistical significance, the results are no longer significant. It’s worth noting that the upper bound of the confidence interval barely contains the null value (1) and the lower bound includes a value as low as 0.76. It’s clear that the CI still seems to lean towards an effect.

Regardless of your statistical philosophy, this was a noteworthy, objective mistake. It was a mistake in labeling two studies as meeting the inclusion criteria, and correcting for this mistake leads to a substantial change in the results. Yet, this error has not been corrected for in the journal. In fact, this study has been around for eight years with no corrections or retractions.

I reached out to both the authors and the editors of PLoS, but to date, there are no updates or corrections on the article itself. Therefore, I suspect several people who read the article or cite it, are not aware that the summary effects are incorrect and that some of the studies in the analysis should not be there!

It is very important to note that correcting for the errors in this study does not lead to completely different conclusions. Although the effect is no longer statistically significant, it is still there based on the effect size and coverage of the confidence intervals. However, the effect is reduced.

Systematic reviews by other groups including Cochrane did not include the Finnish studies in their meta-analyses because the authors didn’t believe that a “cluster randomized trial” with so few clusters (2) met the inclusion criteria for a randomized trial (also worth remembering, that there is no indication in any of the papers that this was even cluster randomized!). Some of these systematic reviews that exclude the Finnish studies still find a benefit to replacing saturated fats in the diet with polyunsaturated fats.

However, other meta-analyses have also found no statistically significant benefit to replacing saturated fats with polyunsaturated fats.

Clearly, there is quite a bit of disagreement on this topic. Regardless, the meta-analysis in question still made a large error and it is a problem for the following reasons (even if the overall conclusions of it were not to change after the correction):

  • Two prominent studies were misclassified
  • The studies did not meet the inclusion criteria but were included
  • Inclusion in the analysis leads to a substantially different effect size than without inclusion
  • The meta-analysis is constantly cited and misleading other readers

I’m certainly not suggesting that errors do not happen, especially when undertaking such large, comprehensive projects. In fact, I would probably be suspicious if there were never any errors when such large projects were conducted!

However, I believe that when such errors are pointed out, they should be corrected as quickly and transparently as possible. Hopefully, the authors and the editors address this issue soon to prevent any further confusion.

Reanalysis Script

Here is the dataset to load into R as a dataframe.

pufaMETA <- read.csv("", 
                     header = TRUE)
##          Study_ID PUFA_Events PUFA_Total Control_Events Control_Total
## 1           DARTS         132       1018            144          1015
## 2     LA Veterans          53        424             71           422
## 3    Minnesota CS         131       4541            121          4516
## 4         MRC Soy          45        199             51           194
## 5 Oslo Diet Heart          61        206             81           206
## 6           STARS           2         27              5            28

## Meta-Analysis

dat <- escalc(measure = "RR", ai = PUFA_Events, n1i = PUFA_Total,
ci = Control_Events, n2i = Control_Total, data = pufaMETA)
dat[, c(1, 6:7)]
##          Study_ID      yi     vi 
## 1           DARTS -0.0900 0.0126 
## 2     LA Veterans -0.2971 0.0282 
## 3    Minnesota CS  0.0739 0.0155 
## 4         MRC Soy -0.1506 0.0317 
## 5 Oslo Diet Heart -0.2836 0.0190 
## 6           STARS -0.8799 0.6272
res <- rma(yi, vi, data = dat)
##        estimate   ci.ub 
## tau^2    0.0066 0.0000  0.3010 
## tau      0.0812 0.0000  0.5486 
## I^2(%)  21.4317 0.0000 92.5694 
## H^2      1.2728 1.0000 13.4578
par(mar = c(4, 4, 1, 2))

res <- rma(ai = PUFA_Events, n1i = PUFA_Total, ci = Control_Events, 
        n2i = Control_Total, data = dat, measure = "RR",
           slab = paste(Study_ID, sep = ", "), method = "DL")

res_REML <- rma(ai = PUFA_Events, n1i = PUFA_Total, ci = Control_Events, 
        n2i = Control_Total, data = dat, measure = "RR",
           slab = paste(Study_ID, sep = ", "), method = "REML")

Although I have used both the Dersimonian-Laird estimator, as used in the original analysis by the authors, I have chosen instead to also conduct the analysis using the restricted maximum likelihood estimator (REML) and plot that instead, as it often gives wider interval estimates, with better coverage properties.

## Data Visualization - Forest Plot Structure

forest(res_REML, xlim = c(-16, 6), at = log(c(0.05, 0.25, 1, 2)), atransf = exp,
       ilab = cbind(dat$PUFA_Events, dat$PUFA_Total, dat$Control_Events, dat$Control_Total),
       ilab.xpos = c(-9.5, -8, -6, -4.5), cex = 0.85, ylim = c(-1, 8.6),
       xlab = "Risk Ratio", mlab = "", psize = 1.7, pch = 16)

## Heterogeneity 

text(-16, -1, pos = 4, cex = 0.85, bquote(paste("Random Effects (Q = ",
   .(formatC(res_REML$QE, digits = 2, format = "f")), ", df = ", .(res_REML$k - res$p),
    ", p = ", .(formatC(res_REML$QEp, digits = 2, format = "f")), "; ", I^2, " = ",
     .(formatC(res_REML$I2, digits = 1, format = "f")), "%)")))

op <- par(cex = 0.85, font = 4)

## Bold Font

par(font = 2)

## Columns

text(c(-9.5, -8, -6, -4.5), 7.5, c("Events ", " Total", "Events ", " Total"))
text(c(-8.75, -5.25),     8.5, c("PUFA Diet", "Control Diet"))
text(-16,                7.5, "Study Name",  pos = 4)
text(6,                  7.5, "Risk Ratio [95% CI]", pos = 2)

Now suppose we also wished to see the P-value function of these results, we could simply do so using the concurve R package’s curve_meta() function.

## Random-Effects Model (k = 6; tau^2 estimator: REML)
## tau^2 (estimated amount of total heterogeneity): 0.0066 (SE = 0.0183)
## tau (square root of estimated tau^2 value):      0.0812
## I^2 (total heterogeneity / total variability):   21.43%
## H^2 (total variability / sampling variability):  1.27
## Test for Heterogeneity:
## Q(df = 5) = 5.9549, p-val = 0.3106
## Model Results:
## estimate      se     zval    pval   ci.ub 
##  -0.1361  0.0720  -1.8911  0.0586  -0.2771  0.0050  . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
PUFA_curve <- curve_meta(res_REML, measure = "ratio", cores = 4L)
ggcurve(PUFA_curve[[1]], nullvalue = c(1), levels = c(0.5, 0.75, 0.95),
        title = "P-value Function of PUFA Meta-Analysis",
        subtitle = "Reanalysis of Mozaffarian et al. 2010",
        xaxis = "Risk Ratio") +
  labs(caption = "Restricted Maximum Likelihood Estimator used")

Although I believe the results still suggest an effect, if another analyst was not convinced by these effects and set interval nulls that spanned from risk ratios from 0.9 to 1.1, as being equivalent to no effect, then these results would be in some trouble! We can see this easily with concurve, which allows us to set the interval null regions.

ggcurve(PUFA_curve[[1]], nullvalue = c(0.9, 1.1), 
        levels = c(0.5, 0.75, 0.95),
        title = "P-value Function of PUFA Meta-Analysis",
        subtitle = "Reanalysis of Mozaffarian et al. 2010",
        xaxis = "Risk Ratio") +
  annotate(geom = "text", x = 1.0, y = 0.50, 
           label = "Interval Null Region", 
           size = 4, color = "#000000", alpha = 0.5) +
  labs(caption = "Restricted Maximum Likelihood Estimator used")

Edit: Dr. Mozaffarian has replied to some of these criticisms in the comments section of his paper on PLOS Medicine, and unfortunately, I have found the responses to be very poor. The response pretty much sums up to, "to the best of our knowledge when we were critically appraising the literature, this study seemed like a randomized trial, so we labeled it as such, and it may be appropriate to label it quasi-randomized, just because we believe that.

Unfortunately, I don’t think that’s how good science progresses at all, and in fact, this error has the potential to cause serious confusion in the future if it goes uncorrected.


The analyses were run on:

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.6
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## Random number generation:
##  RNG:     Mersenne-Twister 
##  Normal:  Inversion 
##  Sample:  Rejection 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## other attached packages:
## [1] ggplot2_3.3.2  concurve_2.7.7 metafor_2.4-0  Matrix_1.2-18 
## loaded via a namespace (and not attached):
##  [1] tidyr_1.1.2           splines_4.0.3         carData_3.0-4         ProfileLikelihood_1.1 askpass_1.1          
##  [6] cellranger_1.1.0      yaml_2.2.1            bcaboot_0.2-1         gdtools_0.2.2         pillar_1.4.6         
## [11] backports_1.1.10      lattice_0.20-41       glue_1.4.2            uuid_0.1-4            digest_0.6.25        
## [16] ggsignif_0.6.0        colorspace_1.4-1      htmltools_0.5.0       pkgconfig_2.0.3       broom_0.7.1          
## [21] haven_2.3.1           bookdown_0.21         xtable_1.8-4          purrr_0.3.4           scales_1.1.1         
## [26] km.ci_0.5-2           openxlsx_4.2.2        officer_0.3.14        rio_0.5.16            KMsurv_0.1-5         
## [31] tibble_3.0.4          openssl_1.4.3         farver_2.0.3          generics_0.0.2        car_3.0-10           
## [36] ellipsis_0.3.1        ggpubr_0.4.0          withr_2.3.0           credentials_1.3.0     survival_3.2-7       
## [41] magrittr_1.5          crayon_1.3.4.9000     readxl_1.3.1          evaluate_0.14         nlme_3.1-149         
## [46] rstatix_0.6.0         forcats_0.5.0         xml2_1.3.2            foreign_0.8-80        blogdown_0.21        
## [51] tools_4.0.3           data.table_1.13.0     hms_0.5.3             lifecycle_0.2.0       stringr_1.4.0        
## [56] flextable_0.5.11      munsell_0.5.0         zip_2.1.1             compiler_4.0.3        survminer_0.4.8      
## [61] pbmcapply_1.5.0       systemfonts_0.3.2     rlang_0.4.8           grid_4.0.3            sys_3.4              
## [66] base64enc_0.1-3       rmarkdown_2.4         boot_1.3-25           gtable_0.3.0          abind_1.4-5          
## [71] curl_4.3              R6_2.4.1              zoo_1.8-8             gridExtra_2.3         knitr_1.30           
## [76] dplyr_1.0.2           survMisc_0.5.5        stringi_1.5.3         parallel_4.0.3        Rcpp_1.0.5           
## [81] vctrs_0.3.4           tidyselect_1.1.0      xfun_0.18

  • Cite this blog post

  • See also:

    comments powered by Disqus