DOI: 10.4244/EIJV10I11A235

Tools and Techniques - Statistical: It’s statistically significant, but is it clinically relevant?

Ron T. van Domburg*, PhD; Isabella Kardys, PhD; Matti Lenzen, PhD; Sara Baart, MSc; Eric Boersma, PhD; Sanne E. Hoeks, PhD

Statistical hypothesis testing is a key element of evaluating the results of medical research and may result in a so-called “significant” finding. However, the value of “statistical significance” must not be overstated, as it describes only one aspect of the results of a study. The interpretation of medical research data from the perspective of clinical relevance, though far less emphasised, is of equal importance. The purpose of this article is to provide clinicians with an outline of the two essential concepts of statistical significance and clinical relevance, or clinical significance as others prefer. We aim to increase awareness of the subject, and we do not intend to provide a complete overview, which can be found elsewhere in the literature [1].

By applying a statistical hypothesis test, the investigator aims to quantify the evidence against the so-called null hypothesis, which usually states that there is no difference (i.e., the effect is “null”). The appropriate statistical test provides the investigator with a probability value (p-value), which indicates the strength of the evidence against the null hypothesis. Mathematically, the p-value is the probability of obtaining an effect at least as large as the one observed when the null hypothesis is actually true. For example, if a statistical test shows a p-value of 0.30, then a difference at least as large as the observed one would occur with a probability of 30% even if in reality there is no true difference. Statisticians explain this phenomenon by the concept of “random sampling error”. According to this concept, the study patients constitute a random sample out of a large population. A p-value of 0.30 indicates that, just by natural variation (or “chance”), such an effect would be found in about three of 10 such samples. If the p-value is very small, then the study data are difficult to explain by chance alone, and the null hypothesis will be rejected. The effect is then considered “statistically significant”. Conventionally, a threshold p-value of 0.05 (5%) is used to declare statistical significance.
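The idea of random sampling error can be illustrated with a small simulation. The sketch below is not part of the original analysis: it assumes two arms drawn from the same normal distribution (so the null hypothesis is true by construction), a known standard deviation, and arbitrary illustrative values for the number of trials, the arm size and the random seed. Under these assumptions, roughly 30% of the simulated trials should yield p≤0.30 and roughly 5% should yield p≤0.05, purely by chance.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_null_p_values(n_trials=2000, n_per_arm=50, sd=10.0, seed=1):
    """Simulate trials in which both arms share the same true mean.

    All parameter values are illustrative assumptions, not taken from the text.
    """
    rng = random.Random(seed)
    # Standard error of the difference in means for a known, common sd.
    se = sd * (2.0 / n_per_arm) ** 0.5
    p_values = []
    for _ in range(n_trials):
        arm_a = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        z = (mean(arm_a) - mean(arm_b)) / se
        p_values.append(two_sided_p(z))
    return p_values


p_values = simulate_null_p_values()
share_below_030 = sum(p <= 0.30 for p in p_values) / len(p_values)
share_below_005 = sum(p <= 0.05 for p in p_values) / len(p_values)
print(f"share of null trials with p <= 0.30: {share_below_030:.3f}")
print(f"share of null trials with p <= 0.05: {share_below_005:.3f}")
```

The second figure is the reason a threshold of 0.05 is described as accepting a 5% risk of declaring significance when no true effect exists.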

However, statistical significance does not automatically mean that the observed effect is also clinically relevant, and vice versa. A p-value indicates in an objective way how unlikely the observed effect would be if chance alone were at work, but it provides no information on the magnitude or importance of that effect. Clinical relevance can be conceptualised as a difference that is large enough to justify clinicians changing the standard of care. Therefore, when evaluating the results of a study, one must address both the statistical significance and the clinical relevance of the findings.

Assume a (hypothetical) placebo-controlled randomised clinical trial to investigate the effect of a new drug for arterial hypertension in a sample of 2×10,000 patients (10,000 per arm). At the end of the trial, the mean change in systolic blood pressure in the patients randomised to the active treatment turns out to be –5 mmHg, compared with –4.5 mmHg in the placebo group, i.e., a mean difference of –0.5 mmHg. The p-value is 0.001, which is much less than 0.05, and, consequently, the null hypothesis of no difference in mean blood pressure reduction can be rejected. However, from a clinical point of view, one might question the relevance of an average effect on systolic blood pressure of –0.5 mmHg. Implementing the new drug in all future patients who fulfil the trial inclusion and exclusion criteria is therefore not self-evident.
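The hypothetical trial can be reproduced numerically. The text does not state the standard deviation of the blood pressure change, so the sketch below assumes a common value of 10.75 mmHg in both arms, chosen only because it gives a p-value close to 0.001 at 10,000 patients per arm. It uses a simple two-sided z-test and repeats the calculation for smaller (also hypothetical) arm sizes to show that the same –0.5 mmHg difference is far from significant in a smaller trial.

```python
from math import sqrt
from statistics import NormalDist


def z_test_means(diff, sd, n_per_arm):
    """Two-sided z-test for a difference in means, equal arm sizes, common sd."""
    se = sd * sqrt(2.0 / n_per_arm)  # standard error of the difference
    z = diff / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


ASSUMED_SD = 10.75  # mmHg; an assumption, not stated in the text
MEAN_DIFFERENCE = -5.0 - (-4.5)  # active minus placebo, in mmHg

p_by_arm_size = {}
for n_per_arm in (100, 1000, 10000):  # only 10,000 comes from the text
    z, p = z_test_means(MEAN_DIFFERENCE, ASSUMED_SD, n_per_arm)
    p_by_arm_size[n_per_arm] = p
    print(f"n per arm = {n_per_arm:>6}: z = {z:.2f}, p = {p:.4f}")
```

The effect size is identical in all three rows; only the sample size, and with it the p-value, changes.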

An example from the literature is the RIO-Rimonabant trial [2], in which 1,047 overweight or obese patients with type 2 diabetes were randomised to rimonabant, a drug intended to reduce body weight, or placebo. After one year of follow-up, the mean weight loss was statistically significantly larger in the patients randomised to rimonabant than in those randomised to placebo (placebo: –1.4 kg; rimonabant: –2.3 kg, p=0.01). Thus, the mean difference was 0.9 kg in favour of the new drug, in a sample with an average body weight of 97 kg. One could question here the clinical relevance of a reduction of roughly 1% of body weight.

The GUSTO IIb trial randomised 1,138 patients presenting within 12 hours of acute myocardial infarction to primary angioplasty (pPCI, n=565) or accelerated thrombolytic therapy with recombinant tissue plasminogen activator (t-PA, n=573). Mortality at 30 days (pPCI: 5.7%, t-PA: 7.0%; p=0.37) was not statistically significantly different [3]. A meta-analysis by Keeley et al showed that pPCI was statistically significantly better than thrombolytic therapy at reducing overall short-term death (6.9% vs. 9.3%; p=0.0002) [4]. Thus, although the decrease in mortality in the GUSTO IIb trial was not statistically significant and caused no change in treatment, the effect size was large enough to be considered clinically relevant. The meta-analysis had more power to show that this effect was also statistically significant, and it eventually caused a massive move from thrombolytic therapy to pPCI.
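The GUSTO IIb comparison can be approximated with a pooled two-proportion z-test. This is a sketch under stated assumptions, not the trial's own analysis: the event counts are reconstructed by rounding the reported percentages times the arm sizes, and the "ten times larger" trial is purely hypothetical and does not represent the meta-analysis data. With these inputs the first p-value comes out in the neighbourhood of the reported 0.37, while the same mortality rates in the larger hypothetical sample become statistically significant.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-sided z-test for a difference between two proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value


# Event counts are reconstructed from rounded percentages (an assumption).
ppci_n, tpa_n = 565, 573
ppci_deaths = round(0.057 * ppci_n)
tpa_deaths = round(0.070 * tpa_n)

diff, z, p_gusto = two_proportion_z_test(ppci_deaths, ppci_n, tpa_deaths, tpa_n)
print(f"reported arm sizes: difference = {diff:.3%}, z = {z:.2f}, p = {p_gusto:.2f}")

# Hypothetical trial with identical mortality rates but ten times the patients.
SCALE = 10
diff_big, z_big, p_big = two_proportion_z_test(
    ppci_deaths * SCALE, ppci_n * SCALE, tpa_deaths * SCALE, tpa_n * SCALE
)
print(f"10x hypothetical:   difference = {diff_big:.3%}, z = {z_big:.2f}, p = {p_big:.4f}")
```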

Thus, when evaluating the validity of a study in the cardiovascular literature, the reader must consider both the clinical and the statistical significance of the study results. Successful study planning requires an explicit definition of the clinically meaningful primary study endpoint, an estimate of the proposed treatment effect, and an estimate of the sample size necessary to demonstrate the difference of interest. Understanding the direct relationship between sample size and power is crucial for the critical judgement of any study conclusion. An inadequate sample size may fail to detect clinically important differences, whereas an excessively large sample size may show statistically significant differences which are far from clinically relevant.
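The relationship between sample size and power can be made concrete with the usual normal-approximation formula for comparing two proportions. The sketch below is illustrative only: the two-sided alpha of 0.05 and the power levels of 80% and 90% are assumed conventions, not values given in the text, and the mortality rates are borrowed from the GUSTO IIb example above. Under these assumptions, a trial would need on the order of 5,500 patients per arm to detect a difference between 7.0% and 5.7%, far more than the roughly 570 per arm that were randomised.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect p1 vs p2.

    Normal-approximation formula for two independent proportions with
    equal arm sizes and a two-sided test; alpha and power are assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


n_80 = n_per_arm_two_proportions(0.070, 0.057, power=0.80)
n_90 = n_per_arm_two_proportions(0.070, 0.057, power=0.90)
print(f"patients per arm for 80% power: {n_80}")
print(f"patients per arm for 90% power: {n_90}")
```

Halving the difference to be detected roughly quadruples the required sample size, because the difference enters the formula squared.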

In conclusion, a sound understanding of both statistical significance and clinical relevance is crucial for a correct interpretation of clinical trial results.

Conflict of interest statement

The authors have no conflicts of interest to declare.

Volume 10 Number 11
Mar 20, 2015



The Official Journal of EuroPCR and the European Association of Percutaneous Cardiovascular Interventions (EAPCI)

Impact factor: 7.6
2023 Journal Citation Reports®
Science Edition (Clarivate Analytics, 2024)
Online ISSN 1969-6213 - Print ISSN 1774-024X
© 2005-2024 Europa Group - All rights reserved