Statistical Glossary

Zad Rafi

Definitions of common statistical terms, with notes on their proper interpretation

A reference glossary of foundational statistical concepts — P-values, confidence intervals, S-values, likelihoods, Bayesian and frequentist terminology — with formal definitions and common misinterpretations.

Author: Zad Rafi
Affiliation: Less Likely
Published: May 26, 2026

Keywords: glossary, statistics, p-value, confidence interval, s-value, likelihood, bayes factor, null hypothesis, statistical power, effect size

This glossary collects working definitions for terms that appear frequently in statistical practice. The first two entries, P-value and confidence interval, are treated at greater depth because they are the most widely used and the most widely misinterpreted¹. Subsequent entries are intentionally shorter and are ordered roughly alphabetically.

Where a term is treated in more depth elsewhere on the site, you’ll find a cross-link to the relevant post.

P-value

The P-value is the probability, assuming a specified statistical model is correct, of obtaining a test statistic at least as extreme as the one actually observed.

Formally, for an observed test statistic $t_{\text{obs}}$ and a model $M$ (which includes the null hypothesis $H_{0}$ together with all auxiliary assumptions about sampling, independence, distributional form, and measurement),

\[ P = \Pr\!\bigl(T \ge t_{\text{obs}} \,\big|\, M \bigr) \]

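for a one-sided test. Two-sided P-values are commonly computed by doubling the smaller one-sided P-value (capped at $1$); for symmetric reference distributions this equals the total probability in both tails beyond $\pm t_{\text{obs}}$, and other conventions exist for asymmetric distributions.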
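A minimal sketch of the definition, assuming the reference distribution of $T$ under $M$ is standard normal (a z-statistic). That choice is an illustrative assumption; t, chi-squared, or exact reference distributions slot in the same way. The helper names `p_one_sided` and `p_two_sided` are hypothetical.

```python
from statistics import NormalDist

# Reference distribution of T under the model M (assumed standard normal here).
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def p_one_sided(t_obs: float) -> float:
    """Upper-tail P-value: Pr(T >= t_obs | M)."""
    return 1.0 - STD_NORMAL.cdf(t_obs)


def p_two_sided(t_obs: float) -> float:
    """Double the smaller one-sided tail, capped at 1."""
    upper = 1.0 - STD_NORMAL.cdf(t_obs)
    lower = STD_NORMAL.cdf(t_obs)
    return min(1.0, 2.0 * min(upper, lower))


print(round(p_one_sided(1.96), 4))  # 0.025
print(round(p_two_sided(1.96), 4))  # 0.05
```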

Properties

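Takes values in the unit interval $[0, 1]$.
Uniformly distributed on $[0, 1]$ when the model $M$ is exactly correct and the test statistic is continuous; for discrete test statistics, a valid P-value instead satisfies $\Pr(P \le u \mid M) \le u$ for every $u$.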
Smaller P-values indicate that the data are less compatible with the model than larger ones.
The P-value refers to the entire model — not just the null hypothesis. A small P-value can reflect a failure of any assumption inside the model, including but not limited to the null².
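The uniformity property can be checked by simulation. The sketch below assumes a toy model in which every assumption holds exactly: independent N(0, 1) draws, known σ = 1, and a two-sided z-test of μ = 0. The function name and simulation sizes are illustrative choices, not part of any standard API.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def simulate_p_values(n_sims: int, n: int, seed: int = 1) -> list[float]:
    """Two-sided z-test P-values for H0: mu = 0 when the data really are N(0, 1).

    Every assumption of the model holds here (independence, normality,
    known sigma = 1, true mu = 0), so the P-values should be Uniform(0, 1).
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (mean - 0) / (sigma / sqrt(n)) with sigma = 1
        p_values.append(2.0 * (1.0 - STD_NORMAL.cdf(abs(z))))
    return p_values


p_vals = simulate_p_values(n_sims=20_000, n=10)
for cut in (0.05, 0.25, 0.50):
    share = sum(p <= cut for p in p_vals) / len(p_vals)
    print(f"Pr(P <= {cut:.2f}) is about {share:.3f}")  # each share should be near the cutoff
```

If the data were instead generated with a shifted mean, or with a larger σ than the test assumes, the P-values would pile up near 0; this is the sense in which a small P-value can flag a failure of any part of $M$.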

Common misinterpretations

What a P-value is not

A P-value is not:

the probability that the null hypothesis is true;
the probability that the data arose by chance alone;
$1$ minus the probability that the alternative hypothesis is true;
a measure of the size or importance of an effect;
a measure of the strength of evidence against the null, except in a very narrow technical sense.

These interpretations are wrong because they confuse $\Pr(\text{data} \mid \text{hypothesis})$ with $\Pr(\text{hypothesis} \mid \text{data})$; the latter requires Bayes’ theorem and a prior³,⁴.

The American Statistical Association’s 2016 statement on P-values³ is the canonical reference for what P-values can and cannot tell us.

Confidence interval

A confidence interval is an interval estimator constructed so that, under repeated application of the same procedure to data generated by the same process, the interval contains the true parameter value a specified proportion of the time.

Formally, a $(1-\alpha)$ confidence interval for a parameter $\theta$ is a random interval $\bigl[L(\mathbf{X}),\; U(\mathbf{X})\bigr]$ depending on the data $\mathbf{X}$ such that

\[ \Pr\!\bigl(L(\mathbf{X}) \le \theta \le U(\mathbf{X}) \,\big|\, \theta\bigr) \ge 1 - \alpha \quad \text{for every value of } \theta. \]

The probability statement is about the interval, which is random before the data are observed. Once the data are in hand and the interval is computed, the parameter is either inside it or it isn’t — but we don’t know which (Neyman, 1937).

Properties

The conventional $95\%$ level corresponds to $\alpha = 0.05$, so the parameter values inside the interval are those whose two-sided P-values are at least $0.05$.
For a normally distributed estimator, the standard form is \[ \hat{\theta} \pm z_{1 - \alpha/2}\,\widehat{\mathrm{SE}}(\hat{\theta}), \] where $z_{1-\alpha/2}$ is the appropriate normal quantile and $\widehat{\mathrm{SE}}$ is the estimated standard error.
Coverage is a long-run, frequency property of the procedure — not a property of any particular computed interval.
The interval contains exactly those parameter values that would not be rejected by a test at significance level $\alpha$. For this reason, some authors call it a compatibility interval⁵.
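A short sketch of the normal-theory interval for a mean and of the test-inversion property in the last item above. The data are invented, and using the normal quantile $z_{1-\alpha/2}$ with only 12 observations is a simplification kept to match the formula above; at this sample size a t quantile would usually be preferred. The helper names are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()


def wald_interval(data: list[float], alpha: float = 0.05) -> tuple[float, float, float, float]:
    """Return (lower, upper, estimate, se) for the mean: estimate +/- z_{1-alpha/2} * SE_hat."""
    n = len(data)
    estimate = fmean(data)
    se_hat = stdev(data) / n ** 0.5          # estimated standard error of the mean
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)    # about 1.96 when alpha = 0.05
    return estimate - z * se_hat, estimate + z * se_hat, estimate, se_hat


def p_two_sided(theta0: float, estimate: float, se_hat: float) -> float:
    """Two-sided z-test P-value for H0: theta = theta0."""
    z = (estimate - theta0) / se_hat
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


# Hypothetical measurements, invented purely for illustration.
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4, 5.9, 4.9, 5.2, 4.6]
lower, upper, est, se = wald_interval(data)
print(f"95% interval: [{lower:.2f}, {upper:.2f}]")
# Inverting the test: P is 1 at the estimate, about 0.05 at either limit, below 0.05 outside.
for theta0 in (est, lower, upper, upper + se):
    print(f"theta0 = {theta0:.2f}  P = {p_two_sided(theta0, est, se):.3f}")
```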

Common misinterpretations

What a confidence interval is not

A $95\%$ confidence interval is not:

a $95\%$ probability that the true parameter lies in the calculated interval (that is a credible interval and requires a prior);
a range that contains $95\%$ of future observations (that is a prediction interval);
a range of equally plausible parameter values (the values near the point estimate are typically more compatible with the data than those at the edges);
a region outside of which results are “non-significant”: the reverse holds, because a test of a parameter value $\theta_{0}$ at level $\alpha$ is significant exactly when $\theta_{0}$ lies outside the corresponding $(1-\alpha)$ interval.

A useful way to understand the interval is through coverage simulation: repeatedly drawing samples and constructing an interval each time. Roughly $95\%$ of those intervals — across the simulations, not across parameter values — will cover the true parameter.
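A coverage simulation along these lines, assuming normally distributed data with a known standard deviation (so the nominal 95% is exact) and arbitrary illustrative values for μ, σ, and n. The function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist, fmean

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def coverage(n_sims: int, n: int, mu: float, sigma: float, seed: int = 2026) -> float:
    """Share of 95% intervals (sigma assumed known) that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = Z_975 * sigma / n ** 0.5  # fixed, because sigma is assumed known
    hits = 0
    for _ in range(n_sims):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_sims


print(coverage(n_sims=10_000, n=25, mu=10.0, sigma=2.0))  # close to 0.95
```

Each individual interval either contains μ or it does not; only the long-run share approaches 0.95.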

Bayes factor

The Bayes factor comparing hypotheses $H_{1}$ and $H_{0}$ is the ratio of the marginal likelihoods of the data under each hypothesis:

\[ \mathrm{BF}_{10} \;=\; \frac{\Pr(\text{data} \mid H_{1})}{\Pr(\text{data} \mid H_{0})}. \]

It is the factor by which the data convert prior odds into posterior odds: $\Pr(H_{1} \mid \text{data}) / \Pr(H_{0} \mid \text{data}) = \mathrm{BF}_{10} \times \Pr(H_{1}) / \Pr(H_{0})$. Unlike a P-value, a Bayes factor requires the analyst to fully specify both hypotheses, including prior distributions for any unknown parameters (Kass & Raftery, 1995).
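A small worked example, assuming binomial data, a point null $H_{0}: p = 0.5$, and a uniform prior on $p$ under $H_{1}$. The uniform prior is an assumption chosen because its marginal likelihood has the closed form $1/(n+1)$; a different prior would give a different Bayes factor, which is the prior-dependence noted above. The function name and the equal prior odds are illustrative.

```python
from math import comb


def bf10_binomial(k: int, n: int, p0: float = 0.5) -> float:
    """BF_10 for H0: p = p0 versus H1: p ~ Uniform(0, 1), after k successes in n trials."""
    # Marginal likelihood under H1: integral of C(n,k) p^k (1-p)^(n-k) over a uniform prior = 1/(n+1).
    marginal_h1 = 1.0 / (n + 1)
    # Under the point null there is nothing to average over.
    marginal_h0 = comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
    return marginal_h1 / marginal_h0


for k in (5, 9):
    bf = bf10_binomial(k, n=10)
    prior_odds = 1.0  # hypothetical: H1 and H0 equally probable a priori
    posterior_odds = bf * prior_odds
    print(f"k = {k}: BF10 = {bf:.3f}, Pr(H1 | data) = {posterior_odds / (1 + posterior_odds):.3f}")
```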

Bayesian inference

A school of inference in which probability represents a degree of belief about a proposition, updated as data arrive via Bayes’ theorem:

\[ \underbrace{\Pr(\theta \mid \mathbf{X})}_{\text{posterior}} \;\propto\; \underbrace{\Pr(\mathbf{X} \mid \theta)}_{\text{likelihood}} \;\times\; \underbrace{\Pr(\theta)}_{\text{prior}}. \]

Bayesian methods deliver full probability distributions over parameters and predictions, conditional on the specified model and prior.
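The proportionality in Bayes’ theorem can be made concrete with a grid approximation. This sketch assumes a binomial likelihood, hypothetical data (7 successes in 10 trials), and two illustrative priors; normalising the product over the grid supplies the constant that $\propto$ omits. All names are hypothetical.

```python
from collections.abc import Callable


def grid_posterior(k: int, n: int, prior: Callable[[float], float], grid_size: int = 1001):
    """Posterior for a binomial proportion on an evenly spaced grid over [0, 1].

    Bayes' theorem is applied pointwise (likelihood x prior) and the result is
    normalised to sum to 1, which supplies the constant hidden by the proportionality sign.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    unnormalised = [t ** k * (1 - t) ** (n - k) * prior(t) for t in grid]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]


def posterior_mean(grid: list[float], weights: list[float]) -> float:
    return sum(t * w for t, w in zip(grid, weights))


def flat(t: float) -> float:
    return 1.0  # uniform prior on [0, 1]


def sceptical(t: float) -> float:
    return t * (1 - t)  # Beta(2, 2)-shaped prior, peaked at 0.5


k, n = 7, 10  # hypothetical data: 7 successes in 10 trials
for name, prior in (("flat", flat), ("sceptical", sceptical)):
    grid, post = grid_posterior(k, n, prior)
    print(f"{name} prior -> posterior mean {posterior_mean(grid, post):.3f}")
```

The flat prior gives a posterior mean near $(k+1)/(n+2) \approx 0.667$, while the prior concentrated around 0.5 pulls it to about $9/14 \approx 0.643$: same likelihood, different prior, different posterior.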

Bayesian credible interval

An interval $[L, U]$ such that the posterior probability of the parameter lying in $[L, U]$ equals a chosen level, e.g. $\Pr(L \le \theta \le U \mid \mathbf{X}) = 0.95$. This is the interval that can be interpreted as “a 95% probability that the parameter lies in the interval” — but only because it conditions on a prior.
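A sketch of an equal-tailed 95% credible interval, assuming a binomial likelihood with a Beta$(a, b)$ prior so that the posterior is Beta$(a + k, b + n - k)$ by conjugacy. The limits are estimated from Monte Carlo draws using the standard library’s `random.betavariate`; an exact Beta quantile function would avoid the simulation error. The function name and example data are hypothetical.

```python
import random
from statistics import quantiles


def credible_interval_95(k: int, n: int, a: float = 1.0, b: float = 1.0,
                         draws: int = 50_000, seed: int = 11) -> tuple[float, float]:
    """Equal-tailed 95% credible interval for a binomial proportion (Monte Carlo)."""
    rng = random.Random(seed)
    # Conjugacy: Beta(a, b) prior + k successes in n trials -> Beta(a + k, b + n - k) posterior.
    sample = [rng.betavariate(a + k, b + n - k) for _ in range(draws)]
    cuts = quantiles(sample, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


lo, hi = credible_interval_95(k=7, n=10)  # uniform prior, 7 successes in 10 trials
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
```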

Effect size

A quantitative summary of the magnitude of a phenomenon. Examples include a mean difference, a risk ratio, an odds ratio, a correlation coefficient, or a standardised effect such as Cohen’s $d$:

\[ d \;=\; \frac{\bar{x}_{1} - \bar{x}_{2}}{s_{\text{pooled}}}. \]

Effect sizes describe how much; P-values describe how compatible. Confusing the two is one of the most common errors in applied research⁶.
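The formula for Cohen’s $d$ translates directly, assuming the usual pooled standard deviation $s_{\text{pooled}} = \sqrt{\bigl((n_{1} - 1)s_{1}^{2} + (n_{2} - 1)s_{2}^{2}\bigr)/(n_{1} + n_{2} - 2)}$. The two groups below are made-up numbers, and `cohens_d` is a hypothetical helper.

```python
from statistics import fmean, variance


def cohens_d(x1: list[float], x2: list[float]) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (fmean(x1) - fmean(x2)) / pooled_var ** 0.5


group_1 = [5, 6, 7, 8, 9]  # made-up data
group_2 = [3, 4, 5, 6, 7]
print(round(cohens_d(group_1, group_2), 3))  # 1.265
```

Here both groups have variance 2.5 and means 7 and 5, so $d = 2/\sqrt{2.5} \approx 1.26$.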

Frequentist inference

A school of inference in which probability refers to the long-run relative frequency of events under hypothetical repetitions of an experiment. Parameters are treated as fixed but unknown quantities; uncertainty is described by properties of estimators across repeated sampling (e.g. unbiasedness, coverage, type I error rate).

Likelihood

For a fixed dataset $\mathbf{X}$, the likelihood function is the probability (or density) of the data viewed as a function of the parameter $\theta$:

\[ L(\theta) \;=\; \Pr(\mathbf{X} \mid \theta). \]

Likelihood is not a probability over parameters — it is a function of $\theta$ for fixed data. Its shape is what most inferential procedures (maximum likelihood, likelihood ratio tests, Bayesian updating) ultimately depend on (Fisher, 1922).
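A sketch for a binomial model with hypothetical data (7 successes in 10 trials). It evaluates $L(\theta)$ on a grid, locates the maximum-likelihood estimate, forms a likelihood ratio, and integrates $L(\theta)$ over $\theta$ to show that the area is $1/(n+1)$ rather than 1, one way to see that the likelihood is not a probability distribution over $\theta$.

```python
from math import comb


def binomial_likelihood(theta: float, k: int, n: int) -> float:
    """L(theta) = Pr(k successes in n trials | theta), viewed as a function of theta."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)


k, n = 7, 10  # hypothetical data
grid = [i / 1000 for i in range(1001)]
values = [binomial_likelihood(t, k, n) for t in grid]
theta_hat = grid[values.index(max(values))]
print(theta_hat)  # 0.7, the maximum-likelihood estimate k / n

# Likelihood ratio: how much better theta = 0.7 explains the data than theta = 0.5.
lr = binomial_likelihood(0.7, k, n) / binomial_likelihood(0.5, k, n)
print(round(lr, 2))  # 2.28

# Midpoint-rule integral over theta: 1 / (n + 1), not 1.
m = 10_000
area = sum(binomial_likelihood((i + 0.5) / m, k, n) for i in range(m)) / m
print(round(area, 4))  # 0.0909
```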

NHST (Null Hypothesis Significance Testing)

The hybrid testing procedure most often taught in introductory courses: compute a test statistic, compute a P-value against a null hypothesis, and reject the null if the P-value falls below a pre-specified threshold $\alpha$ (typically $0.05$).

NHST mixes elements of Fisherian significance testing and Neyman–Pearson hypothesis testing, which are conceptually distinct frameworks. The hybrid form has been criticised for producing dichotomous “significant / not significant” thinking that ignores effect sizes and uncertainty¹,⁵.
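The three-step procedure can be written as a small function. This sketch assumes a large-sample z-test with a known standard error; `nhst_z` is a hypothetical name, and the estimates are invented. The `null_value` argument reflects that $H_{0}$ need not be zero.

```python
from statistics import NormalDist


def nhst_z(estimate: float, se: float, null_value: float = 0.0, alpha: float = 0.05):
    """Hypothetical helper: large-sample z-test of H0: theta = null_value."""
    z = (estimate - null_value) / se                             # step 1: test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))                       # step 2: two-sided P-value under H0
    decision = "reject H0" if p < alpha else "do not reject H0"  # step 3: compare with alpha
    return z, p, decision


for est in (1.00, 0.95):  # invented estimates with the same standard error
    z, p, decision = nhst_z(est, se=0.5)
    print(f"estimate = {est:.2f}: z = {z:.2f}, P = {p:.3f} -> {decision}")
```

Two nearly identical estimates land on opposite sides of $\alpha = 0.05$ (P ≈ 0.046 versus P ≈ 0.057), which illustrates the dichotomisation problem described above.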

Null hypothesis

A specific value or set of values for a parameter that is taken as the reference point for a statistical test, usually denoted $H_{0}$. In a comparison of two group means, $H_{0}$ is typically $\mu_{1} - \mu_{2} = 0$ (“no difference”), but the null hypothesis can be any value the analyst chooses to test.

Power

The probability that a test correctly rejects the null hypothesis when a specified alternative is true:

\[ \text{Power} \;=\; 1 - \beta \;=\; \Pr(\text{reject } H_{0} \mid H_{1} \text{ true}). \]

Power depends on the effect size, the sample size, the variability in the data, and the chosen significance level $\alpha$. When results are filtered by statistical significance, underpowered studies tend to exaggerate the magnitude of effects (Type M error) and are more likely to get the sign wrong (Type S error)⁷.
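A normal-approximation power calculation for a two-sided, two-sample z-test, assuming equal group sizes and a known common standard deviation. The function name and example numbers are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sided_z(effect: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample z-test with known common SD."""
    se = sd * (2 / n_per_group) ** 0.5          # SE of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # rejection threshold for |Z|
    shift = effect / se                         # mean of Z when the alternative is true
    # Reject in either tail: Pr(Z > z_crit) + Pr(Z < -z_crit) with Z ~ N(shift, 1).
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


for n in (20, 64):
    print(f"n = {n} per group: power = {power_two_sided_z(0.5, 1.0, n):.2f}")
```

With a standardised effect of 0.5, about 64 per group gives roughly 80% power, while 20 per group gives about 35%.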

Posterior distribution

The probability distribution over parameters after observing data, $\Pr(\theta \mid \mathbf{X})$. The posterior is the answer that Bayesian inference is designed to produce. All Bayesian summaries — point estimates, credible intervals, predictive distributions — are derived from it.

Prior distribution

The probability distribution that encodes information about parameters before observing data, $\Pr(\theta)$. Priors can be informative (reflecting substantive knowledge), weakly informative (regularising), or intended to be “non-informative” (a more contentious notion than it sounds).

S-value

A transformation of the P-value into Shannon information bits:

\[ S \;=\; -\log_{2}(P). \]

An S-value of $s$ bits represents the same amount of information against the model as observing $s$ heads in a row from a hypothetically fair coin. The transformation puts P-values on a scale where differences are interpretable (3 bits and 6 bits of information are very different; $P = 0.10$ and $P = 0.01$ are easier to confuse)⁸.
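The transformation is a one-liner; this sketch uses Python’s `math.log2`, and `s_value` is a hypothetical helper name.

```python
from math import log2


def s_value(p: float) -> float:
    """Shannon information (bits) against the model carried by a P-value."""
    return -log2(p)


for p in (0.5, 0.10, 0.05, 0.01, 0.005):
    print(f"P = {p}: S = {s_value(p):.2f} bits")
```

For example, $P = 0.05$ corresponds to about 4.3 bits, slightly more surprising than four heads in a row from a fair coin.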

Standard error

The standard deviation of an estimator’s sampling distribution — i.e., how much the estimate would vary from sample to sample if the experiment were repeated. For an estimator $\hat{\theta}$,

\[ \mathrm{SE}(\hat{\theta}) \;=\; \sqrt{\operatorname{Var}(\hat{\theta})}. \]

The standard error is what goes into the denominator of test statistics ($t$, $z$) and into the half-width of confidence intervals. It is not the standard deviation of the data — that’s the sample SD. The two are related by $\mathrm{SE}(\bar{x}) = s / \sqrt{n}$ for the sample mean.
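A simulation contrasting the two quantities, assuming normal data with σ = 10 and n = 25 (illustrative values). The SD of a single sample estimates σ; the SD of many sample means estimates the standard error, which theory puts at σ/√n = 2.

```python
import random
from statistics import fmean, stdev

rng = random.Random(7)
n, sigma = 25, 10.0

# One observed sample: its SD (s) describes the spread of individual observations.
sample = [rng.gauss(50.0, sigma) for _ in range(n)]
s = stdev(sample)
se_hat = s / n ** 0.5  # estimated standard error of the sample mean

# Many hypothetical repetitions of the study: the SD of their means is the SE.
means = [fmean([rng.gauss(50.0, sigma) for _ in range(n)]) for _ in range(5_000)]
se_sim = stdev(means)

print(f"sample SD s:                  {s:.2f}")
print(f"estimated SE s / sqrt(n):     {se_hat:.2f}")
print(f"SD of 5,000 simulated means:  {se_sim:.2f} (theory: {sigma / n ** 0.5:.2f})")
```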

Type I and Type II error

In the Neyman–Pearson framework, the two ways a test can be wrong:

Type I error: rejecting the null hypothesis when it is actually true. Its probability is $\alpha$ (the significance level).
Type II error: failing to reject the null when an alternative is true. Its probability is $\beta$; power is $1 - \beta$.

Choosing $\alpha$ and the sample size sets a target for both error rates. The two trade off: lowering $\alpha$ raises $\beta$ at a fixed sample size⁹.
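The trade-off can be tabulated with a normal approximation, assuming a two-sided z-test and a fixed standardised effect (true effect divided by its standard error, so the sample size is held fixed). The value 2.5 and the helper name are arbitrary illustrations.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_two_sided_z(shift: float, alpha: float) -> float:
    """Type II error rate of a two-sided z-test when Z ~ N(shift, 1) under the alternative."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    power = STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)
    return 1 - power


shift = 2.5  # hypothetical standardised effect, held fixed (same sample size throughout)
for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: beta = {beta_two_sided_z(shift, alpha):.3f}")
```

In this setting β rises from about 0.20 at α = 0.10 to about 0.30 at α = 0.05 and beyond 0.5 at α = 0.01.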

References

1. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. (2016). “Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations.” European Journal of Epidemiology. 31:337–350. doi: 10.1007/s10654-016-0149-3.

2. Greenland S. (2019). “Valid P-values behave exactly as they should: Some misleading criticisms of P-values and their resolution with S-values.” The American Statistician. 73:106–114. doi: 10.1080/00031305.2018.1529625.

3. Wasserstein RL, Lazar NA. (2016). “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician. 70:129–133. doi: 10.1080/00031305.2016.1154108.

4. Goodman S. (2008). “A Dirty Dozen: Twelve P-Value Misconceptions.” Seminars in Hematology. 45:135–140. doi: 10.1053/j.seminhematol.2008.04.003.

5. Amrhein V, Greenland S, McShane B. (2019). “Scientists rise up against statistical significance.” Nature. 567:305. doi: 10.1038/d41586-019-00857-9.

6. Cohen J. (1988). “Statistical Power Analysis for the Behavioral Sciences.” Erlbaum Associates, Hillsdale.

7. Gelman A, Loken E. (2014). “The statistical crisis in science.” American Scientist.

8. Rafi Z, Greenland S. (2020). “Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise.” BMC Medical Research Methodology. 20:244. doi: 10.1186/s12874-020-01105-9.

9. Neyman J, Pearson ES. (1933). “On the Problem of the Most Efficient Tests of Statistical Hypotheses.” Philosophical Transactions of the Royal Society of London Series A, Containing Papers of a Mathematical or Physical Character. 231:289–337. doi: 10.1098/rsta.1933.0009.

Related entries on this site

P-value: P-values Are Tough And S-values Can Help (statistics/s-values.html), an in-depth treatment of P-value interpretation and the S-value transformation.
Confidence interval: Bootstrap methods (statistics/bootstrap.html), a non-parametric route to confidence intervals.
Confidence interval: Model assumptions (statistics/assumptions.html); coverage relies on the model being approximately right.
S-value: P-values Are Tough And S-values Can Help (statistics/s-values.html), the long-form discussion.
S-value: S-value calculator (https://lesslikely.shinyapps.io/svalue/), for converting P-values to S-values interactively.

Further reading

For deeper treatments of the concepts covered here, the following references are particularly useful:

Greenland, S., et al. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology. https://doi.org/10.1007/s10654-016-0149-3
Wasserstein, R. L. & Lazar, N. A. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician. https://doi.org/10.1080/00031305.2016.1154108
Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature. https://doi.org/10.1038/d41586-019-00857-9
Rafi, Z. & Greenland, S. (2020). Semantic and cognitive tools to aid statistical science. BMC Medical Research Methodology. https://doi.org/10.1186/s12874-020-01105-9

A note on terminology

Several entries above flag a tension between conventional definitions and newer terminology proposed in the statistical reform literature, for example the suggestion to read confidence intervals as “compatibility intervals.” Both framings are mathematically equivalent; the disagreement is about which interpretation is least likely to mislead. This glossary uses the conventional terminology while pointing out the relevant alternatives.
