statistics on Less Likely
https://lesslikely.com/archives/statistics/
Simulation of a Two-Group Parallel-Arm RCT with Interim Analyses
https://lesslikely.com/statistics/interim-analysis/
Mon, 01 Feb 2021

Recently, Andrew Althouse informed me that he was going to simulate a two-group parallel-arm randomized trial with interim analyses using the rpact R package, so I offered to help in constructing the R code to do so. He already has a number of R scripts on his GitHub repo for doing similar simulations, which can be viewed here, along with a number of tweets explaining these simulations. For this example, his goal was to simulate a trial where the outcome was binary and the probability of death for each group could be tuned, in addition to:
- the total number of participants
- the number of interim analyses
- the schedule of the interim analyses
- the group-sequential design used

along with the usual trial analysis parameters, such as:
- the \(\alpha\)-level
- the type of test (1-sided vs. 2-sided)

The goal was to be able to produce a table of various statistics, such as:
- the odds ratio
- the confidence limits
- the \(P\)-value
- the number of successes

for each of the interim analyses specified.
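To motivate why group-sequential designs are needed at all, here is a quick simulation sketch (in Python purely for illustration; the post's own code is in R) showing that repeatedly testing accumulating two-arm binary data at an unadjusted two-sided \(\alpha = 0.05\) inflates the overall type I error well above 5%:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2021)
n_sims, n_looks, n_per_look = 4000, 5, 100
z_crit = norm.isf(0.025)  # unadjusted two-sided 0.05 threshold at every look

rejections = 0
for _ in range(n_sims):
    # Two arms with the SAME 40% event rate, i.e., no true treatment effect
    a = rng.random(n_looks * n_per_look) < 0.40
    b = rng.random(n_looks * n_per_look) < 0.40
    for k in range(1, n_looks + 1):
        n = k * n_per_look  # cumulative sample size per arm at look k
        p1, p2 = a[:n].mean(), b[:n].mean()
        pooled = (a[:n].sum() + b[:n].sum()) / (2 * n)
        se = np.sqrt(pooled * (1 - pooled) * 2 / n)
        if se > 0 and abs(p1 - p2) / se > z_crit:
            rejections += 1  # "significant" at some look; trial stops early
            break

print(rejections / n_sims)  # noticeably larger than the nominal 0.05
```

The adjusted boundaries that rpact computes (e.g., O'Brien-Fleming or Pocock) exist precisely to pull this cumulative error rate back down to the nominal \(\alpha\).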
The function below reflects our efforts to do so; it also returns several plots from the rpact package for the chosen design, along with a plot comparing that design to other designs. To get similar results, you will need to load the R function first and then enter the proper inputs. While there may be more efficient ways to write the code (for example, using lapply() instead of for loops), we have chosen not to do so. We have also tried to minimize the number of R packages necessary for the function to work, but the following will be required:
- rpact
- stringr

You can quickly install and load both (along with ggplot2, which is used for the plots) using:
```r
req_packs <- c("rpact", "stringr", "ggplot2")
install.packages(req_packs)
lapply(req_packs, library, character.only = TRUE)
```

Setting up the Function

```r
#' @title Simulation of a Two-Group Parallel-Arm Trial With Interim Analyses
#' @docType Custom function for simulation from the rpact package
#' @author Andrew Althouse with edits by Zad Rafi
#' NOTE: If you want to confirm "type 1 error" under different stopping rules,
#' set death1 = death2 in the two treatment arms (i.e., no treatment effect)
#' NOTE: I have set this one up to test the power for a treatment that would reduce mortality
#' from 40% in control group (1) to 30% in treatment group (2)
#' NOTE: Trial Design Parameters - Part 1
#' Here we will specify the basics: total N patients to enroll, and death rate for each treatment arm
#' NOTE: Trial Design Parameters - Part 2
#' Here we will define the interim analysis strategy and stopping rules
#' For this trial we will include provisions for efficacy stopping only (no pre-specified futility stopping)
#' We will use the rpact package to compute the stopping/success thresholds at the interim and final analysis
#' NOTE: Required packages: rpact and stringr
#' @param nSims # The number of simulations, the default is 1000
#' @param nPatients # the planned maximum number of patients to include in each RCT
#' @param death1 # the event rate for patients receiving 'treatment 1' in these trials
#' @param death2 # the event rate for patients receiving 'treatment 2' in these trials
#' @param nLooks # the number of looks that will take place (INCLUDING the final analysis)
#' @param analyses_scheduled # schedule of interim analyses
#' @param sided # whether the test is 1-sided or 2-sided
#' @param alpha # specified alpha level, the default is 0.
```

Tables, Graphs, and Computations from Rafi & Greenland (2020)
https://lesslikely.com/statistics/RG2020BMC/
Sat, 12 Dec 2020

The following post provides some of the code that was used to construct the figures and tables from Rafi & Greenland (2020).1 An enhanced PDF version of the paper can be found here. For further discussion of the computations, see the appendix of the main paper, along with our technical supplement.2
Disclaimer: I am responsible for all the code and mistakes below, and none of them can be attributed to my coauthors or my fellow package developers.
In order to recreate the functions, I would recommend installing the latest version of concurve from CRAN, as it has patched some issues with graphing when the outcome is binary. Use the script below to get the latest version and load the R package. A number of other R packages are also used in this post, which are listed below.
```r
install.packages("concurve")
library("concurve")
```

Valid \(P\)-values Are Uniform Under the Null Model

Here we show that valid \(P\)-values have specific properties when the null model is true. We first generate two variables (\(Y\), \(X\)) that come from the same normal distribution with a \(\mu\) of 0 and a \(\sigma\) of 1, each with a total of 1000 observations, and assume that there is no relationship between the two. We run a simple t-test between \(Y\) and \(X\), iterate this 100,000 times to compute 100,000 \(P\)-values, and then plot the overall distribution of the \(P\)-values as a histogram.
```r
RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1031)
n.sim <- 100000
t.sim <- numeric(n.sim)
n.samp <- 1000
for (i in 1:n.sim) {
  X <- rnorm(n.samp, mean = 0, sd = 1)
  Y <- rnorm(n.samp, mean = 0, sd = 1)
  t <- t.test(X, Y)
  t.sim[i] <- t$p.value
}
ggplot(NULL, aes(x = t.sim)) +
  geom_histogram(bins = 30, col = "black", fill = "#99c7c7", alpha = 0.25) +
  labs(title = "Distribution of P-values Under the Null", x = "P-value") +
  scale_x_continuous(breaks = seq(0, 1, 0.10)) +
  theme_bw()
```

This can also be shown using the TeachingDemos R package, which has a function dedicated to demonstrating this phenomenon.
```r
library("TeachingDemos")

RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1031)
obs_p <- Pvalue.norm.sim(n = 1000, mu = 0, mu0 = 0, sigma = 1, sigma0 = 1,
                         test = "t", alternative = "two.sided",
                         alpha = 0.05, B = 100000)
ggplot(NULL, aes(x = obs_p)) +
  geom_histogram(bins = 30, col = "black", fill = "#99c7c7", alpha = 0.25) +
  labs(title = "Distribution of P-values Under the Null", x = "P-value") +
  scale_x_continuous(breaks = seq(0, 1, 0.10)) +
  theme_bw()
```

As you can see, when the null model is true, the distribution of \(P\)-values is uniform. Valid \(P\)-values are uniform under the null hypothesis, and their corresponding \(S\)-values are exponentially distributed. We run the same simulation as before, but then convert the obtained \(P\)-values into \(S\)-values to see how they are distributed.
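The R code for this conversion is truncated in this excerpt, but the idea is easy to sketch (here in Python, purely for illustration): take uniform null \(P\)-values and apply \(s = -\log_2(p)\); the resulting \(S\)-values follow an exponential distribution.

```python
import numpy as np

rng = np.random.default_rng(1031)
p = rng.random(100_000)   # valid P-values under the null: Uniform(0, 1)
s = -np.log2(p)           # S-values (Shannon surprisals), in bits

# If P ~ Uniform(0, 1), then -log2(P) ~ Exponential(rate = ln 2),
# so the mean S-value should be close to 1 / ln(2), about 1.44 bits
print(s.mean())
```

A histogram of `s` shows the characteristic exponential decay: small surprisals are common, large ones rare.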
What Makes a Sensitivity Analysis?
https://lesslikely.com/statistics/sensitivity/
Sat, 12 Dec 2020

This post will be split into two sections. In the first part, I will discuss the nature and difficulty of conducting sensitivity analyses and the quality of many published sensitivity analyses in various scientific fields, including medicine. I will then discuss a general framework that has existed for a long time but has seen a major resurgence among regulatory agencies around the world, one that may help researchers conduct these analyses and others in general.
In the second part of this post, I will use data from a historical parallel-arm RCT and analyze it from start to finish. The caveat is that there are extensive missing data, so I will have to utilize a number of principled (non ad hoc) sensitivity analysis and missing data techniques to better understand how these data were collected and how informative they are. Much of the code in this second part will be in R and Stata, but references will also be given for specific techniques and theory.
Cargo-Cult Uncertainty

Sensitivity analyses are an important part of statistical science and many other disciplines when conducted in a principled and systematic manner. However, in the published literature there are many inconsistencies, misconceptions, and highly misleading findings from these analyses. A major obstacle to preventing these issues from recurring is that sensitivity analysis techniques are quite difficult to learn and often involve advanced statistical methods that even many statisticians have difficulty with.
Yet the appearance of difficulty does not dissuade certain researchers from adopting them for their own work, so that they may give their colleagues and the stakeholders they report to the impression of rigor and methodological expertise. When this is done mindlessly, researchers will often skip over learning the details and theory and, as with many statistical procedures, rely on the default settings built into the statistical software.
While there are many suites, commands, functions, and libraries available to conduct such analyses, many of these procedures merely masquerade as meticulous sensitivity analyses: to their users, they are often a formality to appease stakeholders, offering a false sense of confidence even though the users often have little to no idea what the procedures are actually doing. Thus, like many statistical procedures new and old, they too will inevitably be abused as they become more popular and as they become required in research reports.
Stark & Saltelli,1 along with many others such as Gigerenzer2 and Greenland3, have written about this in the past. Below is an excerpt in which Stark describes how much of statistics consists of people masquerading as data analysts and being rewarded as experts despite not having a clue what they are doing.

Quality Control in Statistical Analyses
https://lesslikely.com/statistics/statistics-quality-control/
Sat, 13 Jun 2020

Experienced statisticians and data analysts are familiar with stories where a coding error has led to an entire conclusion changing, or even a retraction.1 It's the sort of stuff that keeps people up at night. Unfortunately, not many of us think about these scenarios until we realize they could very well happen to any of us.
To me, it seems that many of these issues could be avoided by having a principled data management and statistical workflow and making it as transparent, open, and reproducible as possible. I'd like to quickly go over a few things that I've found helpful over the years, starting with data management and data entry before moving on to analysis workflows. I largely consider this to be a living document, and I'm sure many readers will have far better suggestions, so please leave them down below in the comments!
Before I go on, I want to emphasize that backing up your data, scripts, and using version control is extremely important. There is no debate about this. It’s necessary so that other collaborators/colleagues can inspect your work and catch potential mistakes or see overall progress, but more importantly, it will prevent you from losing your data in a disaster, and it’ll help you catch your own mistakes, since you’ll be the most familiar with the data and scripts.
Data Entry & Management

A nice paper that I'd like to review is Broman & Woo (2018) on how to manage your data when working with spreadsheets.2 The sad reality is that even though spreadsheets like Microsoft Excel or Google Sheets are available everywhere and easy to use, there are many risks when working with them; just ask any statistician who works in genetics, or any bioinformatician.3
One of the most fatal errors occurred recently when a group of researchers lost thousands of documented COVID cases because they entered the data for each case as a column rather than a row, and Excel limits how many columns and rows it can handle (1,048,576 rows and 16,384 columns, according to Microsoft). As a result, most of these cases were lost: an enormous waste of resources due to a careless mistake, and a reminder of the dangers of recklessly inputting data and conducting statistical analyses. There is no doubt that reviewing principles of good data management and workflow is essential for any data analyst. I'd like to touch on some of the most important points of the Broman & Woo (2018) paper before moving on to some other "principles" I'd like to share:

Book Review: Regression and Other Stories by Gelman, Hill, and Vehtari
https://lesslikely.com/statistics/regression-stories/
Tue, 11 Jun 2019

Over a decade ago, Andrew Gelman and Jennifer Hill gave applied researchers a comprehensive book (Data Analysis Using Regression and Multilevel/Hierarchical Models) on fitting simple and complex statistical models in R, both from a classical framework and a Bayesian one. Now they're back with an updated version and a new author (Aki Vehtari).
Much has changed in applied statistics since 2006 (when the book was first released). The primary software used at the time and in the book to fit Bayesian models was BUGS (Bayesian inference Using Gibbs Sampling).
However, both BUGS and some of the R code in the first edition are now outdated. The new edition updates the R code and contains intuitive instructions on how to fit simple and complex models using the probabilistic programming language, Stan (also developed by Gelman and colleagues), which is now used in several fields (even for studying wine!).
Indeed, running a Bayesian regression model in R is now as simple as
```r
# I use the sample PlantGrowth dataset in R
library("rstanarm")
library("bayesplot")

pg <- PlantGrowth
model1 <- stan_glm(weight ~ group, data = pg, refresh = 0)
summary(model1)
#>
#> Model Info:
#>  function:     stan_glm
#>  family:       gaussian [identity]
#>  formula:      weight ~ group
#>  algorithm:    sampling
#>  sample:       4000 (posterior sample size)
#>  priors:       see help('prior_summary')
#>  observations: 30
#>  predictors:   3
#>
#> Estimates:
#>               mean   sd   10%   50%   90%
#> (Intercept)  5.0    0.2  4.8   5.0   5.3
#> grouptrt1   -0.4    0.3 -0.7  -0.4   0.0
#> grouptrt2    0.5    0.3  0.2   0.5   0.9
#> sigma        0.6    0.1  0.5   0.6   0.8
#>
#> Fit Diagnostics:
#>            mean   sd   10%   50%   90%
#> mean_PPD  5.1    0.2  4.9   5.1   5.3
#>
#> The mean_ppd is the sample average posterior predictive distribution of the
#> outcome variable (for details see help('summary.stanreg')).
#>
#> MCMC diagnostics
#>               mcse Rhat n_eff
#> (Intercept)   0.0  1.0  2866
#> grouptrt1     0.0  1.0  3054
#> grouptrt2     0.0  1.0  3192
#> sigma         0.0  1.0  2944
#> mean_PPD      0.0  1.0  3666
#> log-posterior 0.0  1.0  1688
#>
#> For each parameter, mcse is Monte Carlo standard error, n_eff is a crude
#> measure of effective sample size, and Rhat is the potential scale reduction
#> factor on split chains (at convergence Rhat=1).

plot(model1)
```

Another key difference between the first edition and the new edition is that the 2006 book attempted to cover several topics at once: it contained instructions for fitting everything from simple models in a classical framework up to multilevel models in a Bayesian framework. The new edition reduces this information overload by splitting itself into two volumes.
The first volume (Regression and Other Stories) covers fitting simple and complex models using R and Stan, and is oriented towards the applied researcher or statistician, who wants a smooth introduction to fitting Bayesian models using Stan without diving into much theory or math.
A draft copy of the table of contents in the new edition can be found here, though it's very likely that the published edition will have some changes.

Book Review: Fisher, Neyman, and the Creation of Classical Statistics
https://lesslikely.com/statistics/classical-lehmann/
Sun, 30 Dec 2018

Erich Lehmann's last book,1 which was published after his death, is on the history of classical statistics and its creators: specifically, how his mentor Jerzy Neyman and Neyman's adversary Ronald Fisher helped lay the foundations for the methods that are used today in several fields.
A Very Brief History of Classical Statistics

This post is intended to be a general review/summary of the book, which I recommend to anyone who is interested in statistics and science. The book clears up several misconceptions people have about how frequentist statistics came to be the dominant school of statistics. Thus, I want to go over four topics from Lehmann's book that I believe people should know more about:
How the founders of classical statistics viewed Bayesian inference
What they each developed
How they came to become so conflicted
And how their views changed over time
Where Are The Bayesians?

As Stephen Senn points out in his Fisher Memorial Lecture at the Royal Statistical Society, there is a common myth that everyone who practiced applied statistics before the early 20th century was using Bayesian inference and doing everything correctly, but then Fisher came in and created significance testing, thus giving researchers a powerful tool to easily hack their data and produce publishable results, and now we have several replication crises because of this.
Of course, this is far from the truth, and any thorough investigation into the history of statistics will clear this up, among many other misconceptions.
As several individuals may know, it was Thomas Bayes who came up with Bayes theorem and it was Richard Price who disseminated most of his writings after Bayes’s death. However, as many self-identified Bayesians will attest, using Bayes’ theorem does not make one a Bayesian. It is actually quite hard to know how Bayes would react to modern Bayesian inference. The Bayesian inference that we are familiar with today can be attributed to Pierre-Simon Laplace, who popularized what is now known as “objective Bayes.”
Back then, it was not called "Bayesian inference" but was referred to as "inverse probability," and it was a method used by many before the dominance of classical statistics. So this is one part that the common myths get right: inverse probability did indeed have a moment in history before the dominance of frequentist statistics.
Laplace, and several others popularized such methods, but around the end of the 19th century, the tides began to shift. Several mathematicians and statisticians began to discourage the use of inverse probability because they saw it as a nonrigorous method of data analysis.
This can be seen in the following passages about Fisher.
“His first publication on this new approach to inference was a 1930 paper “Inverse probability.” The paper begins with a critique of the inverse (Bayesian) method. This section ends with Fisher’s asking:

P-values Are Tough And S-values Can Help
https://lesslikely.com/statistics/s-values/
Sun, 11 Nov 2018

The \(P\)-value doesn’t have many fans. There are those who don’t understand it, often treating it as a measure it’s not, whether that’s a posterior probability, the probability of getting results due to chance alone, or some other bizarre/incorrect interpretation.1–3 Then there are those who dislike it because they think the concept is too difficult to understand, or because they see it as a noisy statistic we’re not interested in.
However, the groups of people mentioned above aren’t mutually exclusive. Many who dislike and criticize the \(P\)-value also do not understand its properties and behavior. This is unfortunate, given how important and widely used they are. In this article, which could also have been titled, \(P\)-values: More Than You Ever Wanted to Know, I take on the task of explaining:
- what \(P\)-values are
- the assumptions behind them
- their properties and behavior
- different schools of interpretation
- misleading criticisms of \(P\)-values
- some valid issues in interpretation
- how these issues can be resolved

What is a P-value Anyway?

Some Definitions & Descriptions

The \(P\)-value is the probability of getting a result (specifically, a test statistic) at least as extreme as what was observed, if every model assumption used to compute it, in addition to the targeted test hypothesis (usually a null hypothesis), were correct.3–5
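To make the definition concrete, here is a minimal numerical example (Python/scipy, for illustration): the two-sided \(P\)-value attached to an observed test statistic is just the tail probability of its reference distribution, here a standard normal.

```python
from scipy.stats import norm

z_observed = 1.96  # test statistic computed from the data
# Probability of a result at least this extreme (in either direction),
# computed assuming the test hypothesis and all other model assumptions hold
p = 2 * norm.sf(abs(z_observed))
print(round(p, 3))  # 0.05
```

A z-statistic of 1.96 sits exactly at the conventional two-sided 5% boundary, which is why that pairing is so familiar.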
A simple, mathematically rigorous definition of a \(P\)-value (for those interested) is given by Stark (2015).
Let \(P\) be the probability distribution of the data \(X\), which takes values in the measurable space \(\mathcal{X}\). Let \(\left\{R_{\alpha}\right\}_{\alpha \in[0,1]}\) be a collection of \(P\)-measurable subsets of \(\mathcal{X}\) such that (1) \(P\left(R_{\alpha}\right)=\alpha\) and (2) if \(\alpha^{\prime}<\alpha\), then \(R_{\alpha^{\prime}} \subset R_{\alpha}\). Then the \(P\)-value of \(H_{0}\) for data \(X=x\) is \(\inf_{\alpha \in[0,1]}\left\{\alpha: x \in R_{\alpha}\right\}\).
A descriptive but technical definition is given by Sander Greenland below. The description can seem dense, so feel free to skip over it for now and revisit it after reading the rest of the post.
A single \(P\)-value \(p\) is the quantile location of a directional measure of divergence \(t = t(y;M)\) of the data point \(y\) (usually, the vector in \(n\)-space formed by \(n\) individual observations) from a test model manifold \(M\) in the \(n\)-dimensional expectation space defined by the logical structure of the data generator (“experiment” or causal structure) that produced the data \(y\). \(M\) is the subset of the \(Y\)-space into which the conjunction of the model constraints (assumptions) forces the data expectation, or where it predicts \(y\) would be were there no ‘random’ variability. I also use \(M\) to denote the set of all the model constraints, as well as their conjunction.
With this logical set-up, the observed \(P\)-value is the quantile \(p\) for the observed value \(t\) of \(T = t(Y;M)\).

We May Not Understand Control Groups
https://lesslikely.com/statistics/control-group-effects/
Sun, 28 Oct 2018

It’s well known that randomized trials are some of the most efficient ways to make causal inferences and to determine how much something (an intervention) differs from the comparator (some sort of control). Random assignment helps make these goals easier by minimizing selection bias and making the distribution of prognostic factors between groups random (not balanced).1
Discussions (similar to the one above) praising the efficiency of randomized trials are widespread; however, few of these discussions take a close look at some of the common assumptions that individuals hold regarding randomized trials. And unfortunately, these common assumptions may be based on outdated evidence and simplistic ideas.
The Placebo Effect Isn’t What We Often Think

For example, in placebo-controlled trials, many individuals assume that much of the improvement seen in the control group over time is due to the placebo effect, as modeled by the graph below.
However, inquiries into this topic have yielded contradictory results, in that the placebo effect may not be as powerful as we assume.
One systematic review2 that looked at 130 clinical trials concluded the following,
“We found little evidence in general that placebos had powerful clinical effects. Although placebos had no significant effects on objective or binary outcomes, they had possible small benefits in studies with continuous subjective outcomes and for the treatment of pain. Outside the setting of clinical trials, there is no justification for the use of placebos.”
Thus, the placebo effect had some small effects in areas where it is difficult to objectively measure a phenomenon and where there is a higher likelihood of encountering measurement error. However, measurement error is not the only thing that could explain some of the improvements seen in control groups, which we often attribute to the placebo effect.
We simply need to reread two sections of a paper by the very anesthesiologist who popularized the placebo effect3 and who claimed that it had the ability to put patients’ conditions into remission.
Here, Henry Beecher (the anesthesiologist) claims that placebo effects are powerful:
“It is evident that placebos have a high degree of therapeutic effectiveness in treating subjective responses, decided improvement, interpreted under the unknowns technique as a real therapeutic effect, being produced in 35.2 ± 2.2% of cases….
Placebos have not only remarkable therapeutic power but also toxic effects. These are both subjective and objective. The reaction (psychological) component of suffering has power to produce gross physical change.”
Then he reports another observation from his patients.
“The evidence is that placebos are most effective when the stress is greatest.”
If you measured someone’s stress (let’s say objectively, via cortisol measurements) and they were extremely stressed and if you followed up after a certain period of time, it would be very likely that the next measurement of their stress (cortisol levels) would be less than the first measurement.
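This is regression to the mean, and it can be sketched with a small simulation (Python, for illustration; the noise model and variable names are my own assumptions): people selected because their first measurement was extreme will, on average, score closer to the population mean on a second, independent measurement.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
true_level = rng.normal(0, 1, n)           # stable underlying stress level
first = true_level + rng.normal(0, 1, n)   # noisy measurement 1
second = true_level + rng.normal(0, 1, n)  # noisy measurement 2

# Select people whose FIRST measurement was extreme (top 5%)
extreme = first > np.quantile(first, 0.95)
print(first[extreme].mean(), second[extreme].mean())
# The second average sits much closer to the overall mean of 0
```

No treatment, placebo or otherwise, was applied between the two measurements; the apparent "improvement" comes purely from selecting on a noisy extreme.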
A slightly reversed situation would likely apply too: if you measured someone’s happiness (in some theoretical objective way) and they were extremely happy, it’s very likely that their next measurement of happiness would not be as high.

Misplaced Confidence in Observed Power
https://lesslikely.com/statistics/misplaced-power/
Sun, 30 Sep 2018

Two months ago, a study came out in JAMA that compared the effectiveness of the antidepressant escitalopram to placebo for long-term major adverse cardiac events (MACE).
The authors explained in the methods section of their paper how they calculated their sample size and what differences they were looking for between groups.
First, they used some previously published data to get an idea for incidence rates,
“Because previous studies in this field have shown conflicting results, there was no appropriate reference for power calculation within the designated sample size. The KAMIR study reported a 10.9% incidence of major adverse cardiac events (MACE) over 1 year… Therefore, approximately 50% MACE incidence was expected during a 5-year follow-up.”
Then, they calculated their sample size based on some differences they were interested in finding,
“Assuming 2-sided tests, α = .05, and a follow-up sample size of 300, the expected power was 70% and 96% for detecting 10% and 15% group differences, respectively.”
So far so good.
Then, we get to the results,
“A significant difference was found: composite MACE incidence was 40.9% (61/149) in the escitalopram group and 53.6% (81/151) in the placebo group (hazard ratio [HR], 0.69; 95% CI, 0.49-0.96; P = .03). The model assumption was met (Schoenfeld P = .48). The estimated statistical power to detect the observed difference in MACE incidence rates between the 2 groups was 89.7%.”
Ouch. This issue ended up bothering me so much that I wrote a letter to the editor (LTE) to point out the problem. Unfortunately, the LTE got rejected, but Andrew Althouse suggested that I discuss this over at DataMethods, so I did; I also discussed it on Twitter, but wanted to publish the LTE on my blog as well. Here it is.
This letter has now been preprinted on arXiv.
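The underlying problem is that "observed power" adds no information: for a z-test it is a deterministic function of the \(P\)-value alone. A small sketch (Python, for illustration; the function name and z-test framing are mine, not the JAMA authors' calculation):

```python
from scipy.stats import norm

def observed_power(p, alpha=0.05):
    # Plug the observed effect back in as if it were the true effect --
    # the flawed premise behind "observed" (post hoc) power
    z_obs = norm.isf(p / 2)        # z-statistic implied by the two-sided p
    z_crit = norm.isf(alpha / 2)   # critical value, about 1.96
    return norm.sf(z_crit - z_obs) + norm.sf(z_crit + z_obs)

# A result sitting exactly at p = 0.05 always has ~50% observed power,
# regardless of sample size or outcome
print(round(observed_power(0.05), 2))  # 0.5
```

Reporting 89.7% "estimated statistical power to detect the observed difference" therefore just restates the \(P\)-value of 0.03 in a more impressive-sounding unit.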
In a similar tale, a group of surgeons published a methodological article advocating this practice of calculating observed power, which I discuss further here.

The Bradford Hill Criteria Don't Hold Up
https://lesslikely.com/statistics/bradford-hill-criteria-dont-hold/
Thu, 06 Sep 2018

In 1965, the epidemiologist Austin Bradford Hill, who helped link smoking to lung cancer, gave a speech in which he presented his viewpoints on how we can arrive at causation from correlation.
This lecture was a bit of a game changer at the time given that the tobacco industry was employing statisticians, medical doctors, and even popular science writers to push the idea that the relationship between smoking and lung cancer was merely a correlation, not a causal one.
Both the tobacco industry and empiricists argued that the existing data was not very convincing because there were no human experiments showing that smoking causes lung cancer. Of course, this was a sticky situation because it was neither ethical nor practical to randomize people and force them to smoke cigarettes and compare rates of lung cancer to a control group.
Austin Bradford Hill and his co-investigator, Richard Doll, were able to argue with compelling data from the British Doctors’ Study that smoking cigarettes does indeed cause lung cancer, when combining these data with several other data (molecular, cellular) and referring to a set of viewpoints.
Since then, these viewpoints have been used as a sort of checklist/criteria in several papers and by several authors to assess causality when randomized trials weren’t possible. For example, Science-Based Medicine often refers to these viewpoints as criteria and recently discussed their usage in a blog post reviewing a book about hormonal therapy.
In this post, drawing heavily from Rothman, Greenland & Lash, 2008 (from here on, RGL), I want to discuss these viewpoints, and why many of them don’t often hold up when critically examined.
Hill’s nine viewpoints were the following:
Strength

Stronger associations, according to Hill, were more compelling for causal relationships than weaker associations, because unmeasured confounding can more easily produce a weak association between two phenomena. He uses the example of smoking and lung cancer and compares it to thrombosis in smokers,
“…prospective inquiries into smoking have shown that the death rate from cancer of the lung in cigarette smokers is nine to ten times the rate in non-smokers and the rate in heavy cigarette smokers is twenty to thirty times as great. On the other hand the death rate from coronary thrombosis in smokers is no more than twice, possibly less, the death rate in non-smokers.”
It does indeed sound compelling. Science-Based Medicine uses the same argument for alternative medicine,
“If acupuncture or homeopathy were 400 times superior to placebo, there would no discussion of its validity. Many medical therapies are not 400 times as effective as placebo, but the strength of the association between cause and effect is well above background noise.”
Why It Doesn’t Hold Up

Several causal relationships that we know of today, such as the relationship between smoking and cardiovascular disease, and between environmental tobacco smoke and lung cancer, involve weak associations. However, Hill didn’t discount weak associations, as seen here,
“In thus putting emphasis upon the strength of an association we must, nevertheless, look at the obverse of the coin.”

Misuse of Standard Error in Clinical Trials
https://lesslikely.com/statistics/standard-error-clinical-trials/
Tue, 07 Aug 2018 00:00:00 +0000https://lesslikely.com/statistics/standard-error-clinical-trials/Calculating Standard Deviation Calculating Standard Error Misuse of Standard Error References Reporting effect sizes with their accompanying standard errors is necessary because it lets the reader gauge the magnitude of the treatment effect and the amount of uncertainty in that estimate. It is far better than providing no effect size at all and focusing only on statements of statistical significance.
Although many authors provide standard errors with the intention of relaying the amount of uncertainty in the estimate, there are several misconceptions about when the standard error should be reported, and so it is often misused.
I believe it is worth explaining what exactly the standard error is and how it differs from something like the standard deviation. The standard deviation is a descriptive statistic that tells us about the variability of the data. Let’s quickly take a look at how it’s calculated using some random data.
Calculating Standard Deviation

\[4, 5, 10, 6, 7, 10, 10\]
\[4 + 5 + 10 + 6 + 7 + 10 + 10 = 52\]
The mean is: \(52 / 7 = 7.43\)
Next, we calculate the sum of squared errors, which can be thought of as the total deviance from the mean. We calculate this by taking each value in the data (\(4, 5, 10, 6, 7, 10, 10\)) and subtracting the mean from each value, which gives us the deviance.
\[(4 - 7.43) = -3.43\]
\[(5 - 7.43) = -2.43\]
\[(10 - 7.43) = 2.57\]
\[(6 - 7.43) = -1.43\]
\[( 7 - 7.43) = -0.43\]
\[(10 - 7.43) = 2.57\]
\[(10 - 7.43) = 2.57\]
All of the above values are deviances. We square each of them and add them together to get the sum of squared errors.
\[(-3.43)^2 + (-2.43)^2 + (2.57)^2 + (-1.43)^2 + (-0.43)^2 + (2.57)^2 + (2.57)^2 =\]
Sum of Squared Errors = \(39.68\)
We get the variance by taking the sum of squared errors and dividing by the number of data points \(-1\).
\[39.68 / (7-1) = 6.61\]
Variance = \(6.61\)
Finally, we get the standard deviation by taking the square root of the variance, \[(\sqrt{6.61})\]
Standard deviation = \(2.57\)
There we have it. Again, the standard deviation is a descriptive statistic which tells us about the variability of the data.
Calculating Standard Error

The standard error, on the other hand, is the standard deviation of the sampling distribution. It can be thought of as the standard deviation of several means. It is an analytic statistic that tells us about the uncertainty in our point estimate. It is calculated by taking the standard deviation and dividing it by the square root of the total number of data points.
\[2.57 / (\sqrt7) = 0.97\]
Standard error of the mean = \(0.97\)
This and the mean would typically be reported as such \(7.43 ± 0.97\)
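The whole worked example above can be checked with a short script. This is just an illustration redoing the same arithmetic (the blog’s own examples are in R; Python is used here only for the sketch):

```python
import math

data = [4, 5, 10, 6, 7, 10, 10]

# Mean: 52 / 7 ≈ 7.43
mean = sum(data) / len(data)

# Sum of squared deviations from the mean ≈ 39.7
ss = sum((x - mean) ** 2 for x in data)

# Sample variance divides by n - 1 ≈ 6.62
variance = ss / (len(data) - 1)

# Standard deviation ≈ 2.57 (descriptive: variability of the data)
sd = math.sqrt(variance)

# Standard error of the mean ≈ 0.97 (uncertainty in the estimated mean)
se = sd / math.sqrt(len(data))

print(round(mean, 2), round(sd, 2), round(se, 2))  # 7.43 2.57 0.97
```

The tiny discrepancy with the hand calculation (39.71 vs. 39.68) comes from rounding the mean to 7.43 before squaring.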
Misuse of Standard Error

In clinical trials, participants are often recruited nonrandomly but are assigned to groups via random assignment. This random allocation allows us to make inferences about those treatment differences (Kempthorne, 1977; Senn & Auclair, 1990).

Vitamin E, Mortality, and the Bayesian Gloss
https://lesslikely.com/statistics/bayesian-vitamin-e/
Wed, 20 Jun 2018 00:00:00 +0000https://lesslikely.com/statistics/bayesian-vitamin-e/Bayesian data analysis is beginning to gain traction in several fields, for a few reasons: it allows individuals to represent uncertainty using probability distributions, and it preserves information that is typically lost with point estimates and dichotomization.
Bayesian inference also allows for relevant background information to be incorporated into a model using a more continuous approach rather than making binary decisions about what to include.
This approach generally plays out well when the background information being incorporated is based on empirical evidence. However, when a prior is poorly chosen and is given a lot of weight, it transfers into the posterior distribution and can yield results that are no more helpful than using a frequentist approach.
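The point about prior weight can be made concrete with a conjugate normal-normal model: the posterior mean is a precision-weighted average of the prior mean and the data estimate, so an overconfident prior simply swamps the likelihood. This is a generic sketch, not the model from the meta-analysis discussed below, and all numbers are made up for illustration:

```python
import math

def posterior_normal(prior_mean, prior_sd, est, est_se):
    """Posterior mean/sd for a normal likelihood with a normal prior."""
    w_prior = 1 / prior_sd ** 2   # prior precision
    w_data = 1 / est_se ** 2      # data precision
    post_mean = (w_prior * prior_mean + w_data * est) / (w_prior + w_data)
    post_sd = math.sqrt(1 / (w_prior + w_data))
    return post_mean, post_sd

# Hypothetical data estimate: log risk ratio of 0.20 with SE 0.10
est, est_se = 0.20, 0.10

# A weakly informative null-centered prior barely moves the estimate...
weak_mean, _ = posterior_normal(0.0, 1.0, est, est_se)

# ...but an overconfident null-centered prior drags it nearly to zero.
strong_mean, _ = posterior_normal(0.0, 0.02, est, est_se)

print(round(weak_mean, 3), round(strong_mean, 3))  # 0.198 0.008
```

With the tight prior, the posterior no longer reflects what the data showed, which is exactly the failure mode described here.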
Sander Greenland discusses this particular issue in depth in his response to a Bayesian meta-analysis that was done on Vitamin E and mortality.
To give some context, it was well known in the biochemical literature at the time that antioxidants reduced oxidative stress, so naturally, everyone was interested to see whether antioxidant supplements, such as Vitamin E could be used for any health outcomes.
Several randomized controlled trials had investigated the effects of vitamin E supplementation on mortality, and a few conventional meta-analyses that pooled these studies found that:
high doses of vitamin E supplementation increased the risk of mortality
low doses of vitamin E reduced the risk of mortality
studies that were at low risk of bias showed that vitamin E increased the risk of mortality
studies that were at high risk of bias showed that vitamin E decreased the risk of mortality.
The data weren’t all in agreement. But based on what we knew of mechanisms, combined with the human data, it made sense to believe that vitamin E, at specific dosages, may have helped certain people (people with deficiencies), been useless for a lot of people, and harmed others (induced toxicity). It doesn’t make much sense to conclude that vitamin E has no effect and to extrapolate this to the whole population.
Furthermore, many of the meta-analyses pooled trials that used the synthetic form of Vitamin E, dl-tocopheryl acetate, a racemic mixture in which only 50% is one of the 8 known active, naturally occurring molecules with vitamin E activity, and extrapolated these results to the entire class of molecules.
It would be equivalent to studying only Vitamin K1 in trials, which is found in plants, finding little to no activity on various biomarkers, and then extrapolating those results to the entire Vitamin K class with meta-analyses. So from the beginning, many of these meta-analyses and their conclusions were already displaying ignorance of biochemistry.
In 2009, a group of researchers wanted to end the confusion regarding vitamin E and mortality, and decided to use a Bayesian hierarchical meta-analysis that incorporated prior information into the model, thinking it would be the solution to the mixed results found in the past. They explain their reasoning for using a Bayesian meta-analysis,
“This relationship should be modeled in a robust fashion and presented in a way that clarifies the strength of the relationship and, to the extent possible, resolves residual uncertainties about the relationship and its strength.”

Problems with the Number Needed to Treat
https://lesslikely.com/statistics/problems-with-nnt/
Sun, 27 May 2018 00:00:00 +0000https://lesslikely.com/statistics/problems-with-nnt/Calculations Incidence Proportions Absolute Risk Reduction Relative Risk Numbers Needed to Treat Major Problem: Missing Information Comparator Uncertainty Time Period Rounding NNTs References The number needed to treat (NNT) is a popular statistic used in medicine, and its use is encouraged by groups like Cochrane and CONSORT. Even the famous statistician Doug Altman was a big proponent of it. Why is it so popular? Most believe that the NNT is more understandable than measures like odds ratios or risk ratios, or statistics like the absolute risk reduction. The NNT is also believed to convey more meaningful information.
In this blog post, I am going to discuss:
what the NNT is
how to calculate an NNT
some of the problems associated with the use of the NNT
solutions to the problems associated with the use of the NNT
The NNT is defined as the number of patients that need to be treated for one patient to experience a positive outcome or for one patient not to experience an adverse outcome. So, an NNT of 100 means that 100 patients need to be treated for one patient actually to benefit. The NNT is the inverse of the absolute risk reduction (ARR) (1/ARR). I will show how they are calculated with a sample contingency table from a controlled study comparing the drug Apixaban to placebo for recurrent thromboembolism (RT). Apixaban is the treatment for the first row and placebo is the treatment for the second row. Bolded numbers are the totals.
|              | Recurring Events | Non-Recurring Events | Total |
|--------------|------------------|----------------------|-------|
| Apixaban     | 14               | 826                  | 840   |
| Control      | 73               | 756                  | 829   |
| Total        | 87               | 1582                 | 1669  |

Calculations

We will calculate four different measures: incidence proportions, absolute risk reduction, relative risk, and then the NNT.
Incidence Proportions

In the group that received the drug, 14 out of 840 people experienced RT (14/840), or about 1.7%. In the control group, 73 out of 829 people experienced RT (73/829), which is 8.8%. These are incidence proportions.
Absolute Risk Reduction

We calculate the absolute risk reduction by taking the difference between the incidence proportions: 8.8% (control group) − 1.7% (treatment group) = 7.1%. So, the use of the drug Apixaban lowered the absolute risk of RT by 7.1 percentage points.
Relative Risk

Most people do not enjoy thinking in terms of absolute risk reduction; they prefer to think in terms of relative risk. We calculate the relative risk by dividing one incidence proportion by the other: 8.8% / 1.7% = 5.2.
In words, patients who took placebo were about 5.2 times as likely to develop RT as patients who took Apixaban.
We can also do 1.7% / 8.8% = 0.19, which means that people who were treated with Apixaban had 19% of the risk that the control group had in developing RT.
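These calculations, along with the NNT (the inverse of the ARR, as defined earlier), can be reproduced in a few lines. Python is used here purely for illustration; note that 5.2 in the text comes from first rounding the proportions to 8.8% and 1.7%, while the unrounded ratio is about 5.3:

```python
# 2x2 table from the Apixaban example
treated_events, treated_n = 14, 840
control_events, control_n = 73, 829

# Incidence proportions
p_treated = treated_events / treated_n   # ≈ 0.017 (1.7%)
p_control = control_events / control_n   # ≈ 0.088 (8.8%)

# Absolute risk reduction
arr = p_control - p_treated              # ≈ 0.071 (7.1 percentage points)

# Relative risk, in both directions
rr_control_vs_treated = p_control / p_treated  # ≈ 5.3 unrounded
rr_treated_vs_control = p_treated / p_control  # ≈ 0.19

# Number needed to treat = 1 / ARR
nnt = 1 / arr                            # ≈ 14

print(round(arr * 100, 1), round(nnt))  # 7.1 14
```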
Numbers Needed to Treat

Relative risk is more easily understood than absolute risk reduction, but relative risk can also be deceiving, and most in medicine prefer to work with absolute risk reduction, so they use the number needed to treat, which is calculated by taking the inverse of the absolute risk reduction: 1 / 7.1% ≈ 14.

Calculating Observed Power Is Just Transforming Noise
https://lesslikely.com/statistics/observed-power-magic/
Fri, 04 May 2018 00:00:00 +0000https://lesslikely.com/statistics/observed-power-magic/“We Respectfully Disagree” Zombie Power The Frustration & Wrath of Statisticians My Response Script to Reproduce Simulations Environment Around April of last year, a group of surgeons published a paper in the Annals of Surgery (apparently one of the most-read journals in surgical science) in which they suggested that the CONSORT and STROBE guidelines be modified to recommend calculations of post-hoc power, specifically observed power.
They write,
“But, as 80% power is difficult to achieve in surgical studies, we argue that the CONSORT and STROBE guidelines should be modified to include the disclosure of power—even if less than 80%—with the given sample size and effect size observed in that study.”
Some folks noticed, including Andrew Gelman, who decided to write a letter to the editor (LTE) explaining why this is a bad idea.
Gelman writes in the LTE,
“This would be a bad idea. The problem is that the (estimated) effect size observed in a study is noisy, especially so in the sorts of studies discussed by the authors. Using estimated effect size can give a terrible estimate of power, and in many cases can lead to drastic overestimates of power (thus, extreme overconfidence of the sort that is rightly deplored by Bababekov et al. in their article), with the problem becoming even worse for studies that happen to achieve statistical significance.
The problem is well known in the statistical and medical literatures; see, e.g., Lane and Dunlap (1978), Hedges (1984), Goodman and Berlin (1994), Senn (2002), and Lenth (2007). For some discussion of the systemic consequences of biased power calculations based on noisy estimates of effect size, see Button et al. (2013), and for an alternative approach to design and power analysis, see Gelman and Carlin (2014).”
“We Respectfully Disagree”

The authors replied, of course, and most of their argument boils down to,
“We respectfully disagree that it is wrong to report post hoc power in the surgical literature. We fully understand that P value and post hoc power based on observed effect size are mathematically redundant; however, we would point out that being redundant is not the same as being incorrect… We also respectfully disagree that knowing the power after the fact is not useful in surgical science.”
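The “mathematically redundant” point can be made concrete. For a two-sided z-test at α = 0.05, post-hoc “observed power” is a deterministic, one-to-one function of the p-value, so it adds no information; in particular, p = 0.05 always maps to an observed power of exactly 50%. A minimal sketch (not from either paper):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def norm_ppf(q, lo=-10.0, hi=10.0):
    """Inverse standard normal CDF by bisection (adequate for illustration)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def observed_power(p, alpha=0.05):
    """Post-hoc power computed from the observed z, itself recovered from p."""
    z = norm_ppf(1 - p / 2)            # observed |z| implied by the p-value
    z_crit = norm_ppf(1 - alpha / 2)   # two-sided critical value (≈ 1.96)
    return norm_cdf(z - z_crit) + norm_cdf(-z - z_crit)

print(round(observed_power(0.05), 3))  # 0.5: p = 0.05 gives 50% "power"
```

So a table of observed powers is just a table of transformed p-values, which is why every nonsignificant study is guaranteed to look "underpowered."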
Andrew Gelman noticed, happened to blog about it, and also got a chance to write a response to the authors’ reply to his letter about the original paper.
Guess now it’s over. Or well… we thought!
Zombie Power

The same group of surgeons came back with a completely new paper, in which they conducted a review of the surgical literature, removed all significant results, and calculated the observed power of each nonsignificant study.
They summarize their findings,
“In this analysis of the current, high-impact surgical literature, we found the median power in studies concluding insignificant differences was 0.16, with only four studies reaching the standard of 0.8.”
They also draw some interesting parallels in their discussion section to show how change can be difficult but necessary,
“The continued inability of surgical studies to reach power of 0.…”

Myth: Covariates Need to Be Balanced in RCTs
https://lesslikely.com/statistics/equal-covariates/
Tue, 10 Apr 2018 00:00:00 +0000https://lesslikely.com/statistics/equal-covariates/Before I get into the discussion, I want to remind everyone why we randomize in the first place. It’s to reduce selection bias and to get rid of systematic variation among groups, which allows us to come to more precise and efficient causal inferences. Many critics claim that we can’t make valid causal inferences if there’s an imbalance in covariates between the groups.
Here’s an example. Say, hypothetically, we had two groups, and we wanted to see the effect of a statin on all-cause mortality and compare it to placebo. We randomized our participants to both groups. Now, imagine our placebo group had more smokers in it than the statin group. We may think that this imbalance in smoking distribution puts us in trouble because there were already substantial between-group differences before the trial even started.
So, we haven’t even started the study, and we got a significant result from our test of homogeneity. Now we’re thinking, “oh no, I need to randomize again or abandon ship, or I’ll attempt to fit this covariate in a generalized linear model like an ANCOVA.”
Critics of RCTs will argue that because there’s also always the possibility of there being an imbalance of known or unknown covariates between groups, RCTs cannot make proper causal inferences, especially small RCTs that are “unable to distribute confounders effectively.”
Unfortunately, there are several problems with these beliefs and approaches.
Tests of homogeneity cannot tell you whether you’ve “completely randomized,” and they are also inappropriate (see the CONSORT statement).
It’s not always possible to rerandomize to achieve better balance in trials (even though the pioneer of randomization is believed to have once said to rerandomize when an undesirable configuration was obtained).
Larger trials do not necessarily produce a better balance than smaller trials
The balance of covariates has never been the goal of randomization
Fitting covariates to a GLM after a test of significance is inappropriate
RCTs were never designed to perfectly balance covariates or even have very similar distributions of them. The purpose of randomization is to distribute hidden covariates, not perfectly, but efficiently and randomly.
Ronald Fisher’s recommendation, when he originally wrote Statistical Methods for Research Workers in 1925, was to block on known covariates while randomizing all the hidden ones, and to look at the effects of known covariates on both the between-group variance and the within-group variance (where everyone is getting the same treatment). The effects on the within-group variance would give you some idea of the covariate’s impact.
Now, will imbalances of hidden covariates affect the point estimates? Of course, but they will also yield large standard errors and wide compatibility intervals, which should make you less sure about the results of your study. But it does not mean your study is unable to make causal inferences.
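A quick simulation illustrates why chance imbalance doesn’t invalidate the inference: over repeated randomizations, a strong binary confounder falls where it may, yet the simple difference in means remains unbiased for the (here, null) treatment effect. All numbers below are made up for illustration:

```python
import random
import statistics

random.seed(1)

def one_trial(n=100, confounder_effect=2.0):
    # A binary confounder (e.g., smoking) strongly shifts the outcome;
    # the treatment itself has zero true effect.
    smoker = [random.random() < 0.3 for _ in range(n)]
    outcome = [confounder_effect * s + random.gauss(0, 1) for s in smoker]
    # Randomly assign half to treatment, half to control
    idx = list(range(n))
    random.shuffle(idx)
    treat, ctrl = idx[: n // 2], idx[n // 2:]
    return (statistics.mean(outcome[i] for i in treat)
            - statistics.mean(outcome[i] for i in ctrl))

# Individual trials can be off because smokers land unevenly by chance...
estimates = [one_trial() for _ in range(2000)]

# ...but the randomization distribution is centered on the true effect (0),
# and the trial-to-trial spread is exactly what the standard error captures.
print(round(statistics.mean(estimates), 3), round(statistics.stdev(estimates), 2))
```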
It’s also worth remembering that analyses of clinical trials are robust in that they account for these possible imbalances in both known and unknown covariates. When you have an imbalance in covariates, what do you get?

Stan
https://lesslikely.com/statistics/stan/
Thu, 01 Jan 1970 00:33:38 +0000https://lesslikely.com/statistics/stan/R Markdown Stan Python R Stata Julia R Stata References Please don’t mind this post. I use this to try out various highlighting styles for my code and formats.
\begin{algorithm}
\caption{Quicksort}
\begin{algorithmic}
\PROCEDURE{Quicksort}{$A, p, r$}
  \IF{$p < r$}
    \STATE $q = $ \CALL{Partition}{$A, p, r$}
    \STATE \CALL{Quicksort}{$A, p, q - 1$}
    \STATE \CALL{Quicksort}{$A, q + 1, r$}
  \ENDIF
\ENDPROCEDURE
\PROCEDURE{Partition}{$A, p, r$}
  \STATE $x = A[r]$
  \STATE $i = p - 1$
  \FOR{$j = p$ \TO $r - 1$}
    \IF{$A[j] < x$}
      \STATE $i = i + 1$
      \STATE exchange $A[i]$ with $A[j]$
    \ENDIF
  \ENDFOR
  \STATE exchange $A[i + 1]$ with $A[r]$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

\[Y^{\operatorname{Post}} = \beta_{0} + \beta_{1}^{\operatorname{Group}} + \beta_{2}^{\operatorname{Base}} + \beta_{3}^{\operatorname{Age}} + \beta_{4}^{\operatorname{Z}} + \beta_{5}^{\operatorname{R1}} + \beta_{6}^{\operatorname{R2}} + \epsilon\]
R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
Stan

df1 <- read.csv("/Users/zad/Dropbox/LessLikely/ts.csv")
df1$y <- ts(df1$Sales)
df1$ds <- as.Date(df1$Time.Increment)
splits <- initial_time_split(df1, prop = 0.5)
train <- training(splits)
test <- testing(splits)
interactive <- TRUE

# Forecasting with auto.arima
library("forecast")
md <- auto.arima(train$y)
fc <- forecast(md, h = 12)

model_fit_arima_no_boost <- arima_reg() %>%
  set_engine(engine = "auto_arima") %>%
  fit(y ~ ds, data = training(splits))

# Model 2: arima_boost ----
model_fit_arima_boosted <- arima_boost(
  min_n = 2,
  learn_rate = 0.015
) %>%
  set_engine(engine = "auto_arima_xgboost") %>%
  fit(y ~ ds + as.numeric(ds) + factor(month(ds, label = TRUE), ordered = FALSE),
      data = training(splits))

# Model 3: ets ----
model_fit_ets <- exp_smoothing() %>%
  set_engine(engine = "ets") %>%
  fit(y ~ ds, data = training(splits))

model_fit_lm <- linear_reg() %>%
  set_engine("lm") %>%
  fit(y ~ as.numeric(ds) + factor(month(ds, label = TRUE), ordered = FALSE),
      data = training(splits))

# Model 4: prophet ----
model_fit_prophet <- prophet_reg() %>%
  set_engine(engine = "prophet") %>%
  fit(y ~ ds, data = training(splits))
#> Error in sampler$call_sampler(c(args, dotlist)): c++ exception (unknown reason)

# Model 6: earth ----
model_spec_mars <- mars(mode = "regression") %>%
  set_engine("earth")

recipe_spec <- recipe(y ~ ds, data = training(splits)) %>%
  step_date(ds, features = "month", ordinal = FALSE) %>%
  step_mutate(date_num = as.numeric(ds)) %>%
  step_normalize(date_num) %>%
  step_rm(ds)

wflw_fit_mars <- workflow() %>%
  add_recipe(recipe_spec) %>%
  add_model(model_spec_mars) %>%
  fit(training(splits))

models_tbl <- modeltime_table(
  model_fit_arima_no_boost,
  model_fit_arima_boosted,
  model_fit_ets,
  model_fit_prophet,
  model_fit_lm,
  wflw_fit_mars
)
#> Error in eval_tidy(xs[[j]], mask): object 'model_fit_prophet' not found

calibration_tbl <- models_tbl %>%
  modeltime_calibrate(new_data = testing(splits))
#> Error in modeltime_calibrate(., new_data = testing(splits)): object 'models_tbl' not found

calibration_tbl %>%
  modeltime_forecast(
    new_data = testing(splits),
    actual_data = df1
  ) %>%
  plot_modeltime_forecast(
    .legend_max_width = 25, # For mobile screens
    .interactive = interactive
  )
#> Error in modeltime_forecast(., new_data = testing(splits), actual_data = df1): object 'calibration_tbl' not found

calibration_tbl %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(
    .interactive = FALSE
  )
#> Error in c(".type", ".calibration_data") %in% names(object): object 'calibration_tbl' not found

refit_tbl <- calibration_tbl %>%
  modeltime_refit(data = df1)
#> Error in modeltime_refit(.

Using Stata
https://lesslikely.com/statistics/stata/
Mon, 01 Jan 0001 00:00:00 +0000https://lesslikely.com/statistics/stata/Cite R Packages Session info Although concurve was originally designed to be used in R, it is possible to achieve very similar results in Stata. We can use some datasets that are built into Stata to show how to achieve this. I’ll use the Statamarkdown R package so that I can obtain Stata outputs using RMarkdown via my Stata 16 package.
First, let’s load the auto2 dataset which contains data about cars and their characteristics.
sysuse auto2
#> . sysuse auto2
#> (1978 automobile data)

Browse the data set in your data browser to get more familiar with some of the variables. Let’s say we’re interested in the relationship between miles per gallon and price. We could fit a very simple linear model to assess that relationship.
First, let’s visualize the data with a scatter plot.
sysuse auto2
#> . sysuse auto2
#> (1978 automobile data)
#>
#> . scatter price mpg, mcolor(dkorange) scale(0.70)
#>
#> . graph export "scatter.svg", replace
#> file scatter.svg saved as SVG format

[scatter.svg: scatter plot of price against mpg]
That’s what our data looks like. Clearly there seems to be an inverse relationship between miles per gallon and price.
Now we could fit a very simple linear model with miles per gallon being the predictor and price being the outcome and get some estimates of the relationship.
sysuse auto2
#> . sysuse auto2
#> (1978 automobile data)
#>
#> . regress price mpg
#>
#>       Source |       SS           df       MS      Number of obs   =        74
#> -------------+----------------------------------   F(1, 72)        =     20.26
#>        Model |   139449474         1   139449474   Prob > F        =    0.0000
#>     Residual |   495615923        72  6883554.48   R-squared       =    0.2196
#> -------------+----------------------------------   Adj R-squared   =    0.2087
#>        Total |   635065396        73  8699525.97   Root MSE        =    2623.7
#>
#> ------------------------------------------------------------------------------
#>        price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
#> -------------+----------------------------------------------------------------
#>          mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
#>        _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
#> ------------------------------------------------------------------------------

That’s what our output looks like.
Our output also gives us 95% consonance (confidence) intervals by default. But suppose we wished to fit a fractional polynomial model, graph it, and get the confidence bands; here’s what we would do.
sysuse auto2
#> . sysuse auto2
#> (1978 automobile data)
#>
#> . mfp: glm price mpg
#>
#> Deviance for model with all terms untransformed = 1373.079, 74 observations
#>
#> Variable    Model (vs.)   Deviance  Dev diff.     P     Powers (vs.)
#> ----------------------------------------------------------------------
#> mpg         Lin.   FP2    1373.079    19.565   0.000+   1   -2 -2
#>             FP1           1356.927     3.413   0.182   -2
#>             Final         1356.927                     -2
#>
#>
#> Transformations of covariates:
#>
#> -> gen double Impg__1 = X^-2-.2204707671 if e(sample)
#>    (where: X = mpg/10)
#>
#> Final multivariable fractional polynomial model for price
#> --------------------------------------------------------------------
#>     Variable |    -----Initial-----          -----Final-----
#>              |   df    Select   Alpha   Status   df   Powers
#> -------------+------------------------------------------------------
#>          mpg |    4    1.0000   0.0500    in      2   -2
#> --------------------------------------------------------------------
#>
#> Generalized linear models          Number of obs   = 74
#> Optimization : ML                  Residual df     = 72
#>                                    Scale parameter = 5533697
#> Deviance     = 398426217.