DOI: 10.4244/EIJV9I12A245

Tools and Techniques - Statistics: How many variables are allowed in the logistic and Cox regression models?

Ron van Domburg*, PhD; Sanne Hoeks, PhD; Isabella Kardys, MD, PhD; Mattie Lenzen, PhD; Eric Boersma, PhD

Introduction

Multivariable statistical analyses are frequently used today and commonly appear in the medical literature. The results are often expressed in statements such as “After adjustment for other baseline characteristics, the use of DES was associated with a 21% reduction of restenosis as compared with BMS”. Among other applications, multivariable methods such as logistic regression and Cox proportional hazards regression are often used to adjust a “target” parameter for differences in baseline characteristics (or variables), to search for predictors of adverse cardiac outcomes or to develop risk prediction models. The primary advantage of multivariable analyses is the possibility to adjust for multiple variables simultaneously.

The increased use of multivariable methods does not automatically imply that these analyses are well conducted. Many studies report incorrect application of these methods. Incorrect conclusions may result if methodological guidelines and mathematical assumptions are ignored. In the current paper, we address an important issue that is often neglected, i.e., the number of variables which are allowed in multivariable regression models.

Points of attention in multivariable analysis are the total number of patients and the number of outcome events in the patient population used to perform the analysis. Although the total number of patients enrolled in a study is always important to know, the statistical strength of a multivariable analysis is driven by the number of events. Many studies have applied multivariable analyses using only a small number of events, forgetting the golden rule: if there are no or almost no events, there is nothing to predict or to investigate.

In addition to the above, one of the major pitfalls of multivariable models is the number of variables (the variable of interest as well as variables such as age, gender, diabetes, prior MI, etc.) analysed in the model. Statistical packages such as SPSS, SAS and Stata place no restrictions on the number of variables that can be entered in the model, and give no warning if too many variables are used. Multivariable methods render incorrect results if an insufficient number of outcome events (such as death or major adverse cardiac events [MACE]) is available relative to the number of variables analysed in the model [1], or in other words if the ratio of events per variable (EPV) is too small. For example, if, in a cohort of 1,000 patients, nine variables are examined in relation to 45 deaths, the EPV is 45/9 = 5. In multivariable models, an EPV which is too small affects the accuracy (risk estimates) and precision (95% confidence intervals) of the odds or hazard ratios of the variables, which may result in misleading findings [2]. The consequence might be a spurious significant association between a variable and the outcome event (type I error) or, conversely, a failure to detect a true association between a variable and the outcome event (type II error).
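The EPV arithmetic from the example above can be written out as a short Python sketch. The function name `events_per_variable` is our own illustrative choice and does not come from any statistical package. Note that the cohort size (1,000 patients) does not enter the calculation; only the number of events and the number of variables do.

```python
def events_per_variable(n_events: int, n_variables: int) -> float:
    """Return the ratio of outcome events to variables in the model (EPV)."""
    if n_variables <= 0:
        raise ValueError("the model must contain at least one variable")
    return n_events / n_variables


# Example from the text: 45 deaths, nine variables examined.
epv = events_per_variable(n_events=45, n_variables=9)
print(f"EPV = {epv}")  # EPV = 5.0
```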

On theoretical grounds, Harrell et al suggested a minimum of 10 to 20 EPV [2]. Peduzzi et al performed a simulation study and suggested that at least 10 EPV are needed to maintain the validity of the model [3]. Both found that, with decreasing EPV, the bias of the odds or hazard ratios increased (Table 1).
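Read the other way round, the EPV threshold gives a rough upper limit on the number of variables that a given number of events can support. The following is a minimal sketch of that rule of thumb; the function name `max_variables` is hypothetical, and the 45 events are reused from the earlier example purely for illustration.

```python
def max_variables(n_events: int, min_epv: int = 10) -> int:
    """Largest number of variables that keeps events per variable >= min_epv."""
    if n_events < 0 or min_epv <= 0:
        raise ValueError("n_events must be >= 0 and min_epv must be > 0")
    return n_events // min_epv


# With 45 events, compare the lower and upper end of the 10 to 20 EPV range.
for min_epv in (10, 20):
    print(f"min EPV {min_epv}: at most {max_variables(45, min_epv)} variables")
# min EPV 10: at most 4 variables
# min EPV 20: at most 2 variables
```

Under this rule of thumb, the nine-variable model from the earlier example would need at least 90 events to reach 10 EPV.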

If one is interested in the relation between a specific variable of interest (e.g., DES versus BMS) and an outcome event (e.g., MACE), then a propensity score might be a good alternative to adjust for confounders when fewer than 10 EPV are available [4,5]. The propensity score can be calculated in a separate logistic regression analysis. In brief, it consists of entering the baseline characteristics into a logistic model while using the variable to be compared (in our example DES vs. BMS) as the “outcome event”. As a result, a probability (propensity) of receiving a DES rather than a BMS can be determined for every patient, based on his/her individual characteristics. This “summary” or propensity score - which is in fact one variable representing a larger number of baseline characteristics - can then be entered into a logistic or Cox model that also contains the variable of interest (DES vs. BMS) and that examines the real outcome event (e.g., MACE). The propensity score will be addressed in detail in a future paper in this series.
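The two-step procedure described above can be sketched in plain Python. Everything in this example is an assumption made for illustration: the patient data are simulated, the covariates (`age_z`, `diabetes`, `prior_mi`) and all coefficients are invented, and the logistic model is fitted with a simple gradient ascent routine (`fit_logistic`) instead of the maximum likelihood routines of a statistical package, which would be the appropriate choice in a real analysis.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, y, lr=0.5, iters=1000):
    """Fit a logistic model by plain gradient ascent (illustration only).

    rows: covariate tuples without intercept; returns [intercept, b1, ...].
    """
    n, k = len(rows), len(rows[0]) + 1
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for x, yi in zip(rows, y):
            xi = [1.0] + list(x)
            resid = yi - sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(k):
                grad[j] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta, x):
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


# Simulated cohort (all numbers are invented for this sketch).
rng = random.Random(0)
baseline, des, mace = [], [], []
for _ in range(300):
    age_z = rng.gauss(0, 1)                      # standardised age
    diabetes = 1.0 if rng.random() < 0.30 else 0.0
    prior_mi = 1.0 if rng.random() < 0.25 else 0.0
    p_des = sigmoid(0.2 - 0.6 * age_z + 0.8 * diabetes - 0.5 * prior_mi)
    d = 1 if rng.random() < p_des else 0
    p_mace = sigmoid(-2.0 + 0.4 * age_z + 0.5 * diabetes + 0.5 * prior_mi - 0.3 * d)
    baseline.append((age_z, diabetes, prior_mi))
    des.append(d)
    mace.append(1 if rng.random() < p_mace else 0)

# Step 1: model treatment (DES vs. BMS) from the baseline characteristics.
ps_beta = fit_logistic(baseline, des)
propensity = [predict(ps_beta, x) for x in baseline]

# Step 2: outcome model with only two variables: treatment and propensity score.
outcome_rows = [(d, p) for d, p in zip(des, propensity)]
out_beta = fit_logistic(outcome_rows, mace)
adjusted_or = math.exp(out_beta[1])  # odds ratio for DES vs. BMS

print(f"events: {sum(mace)}, variables in outcome model: 2")
print(f"propensity-adjusted odds ratio for DES: {adjusted_or:.2f}")
```

The point of the design is that the outcome model contains two variables instead of one per baseline characteristic, which raises the EPV for the same number of events. The treatment model in step 1 is not subject to the same constraint, since its "events" are the treated patients, who are usually far more numerous than the outcome events.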

Conclusion

In conclusion, the validity of multivariable logistic or Cox regression analyses becomes problematic when there are too few events and the number of events per variable becomes less than 10. The odds ratios and hazard ratios may be biased and their 95% confidence intervals may not be reliable. We recommend at least 10 EPV when performing multivariable analyses.

Conflict of interest statement

The authors have no conflicts of interest to declare.

Volume 9 Number 12
Apr 22, 2014