Abstract
Background: Delayed diagnosis or misdiagnosis of acute myocardial infarction (AMI) is not unusual in daily practice. Since a 12-lead electrocardiogram (ECG) is crucial for the detection of AMI, a systematic algorithm to strengthen ECG interpretation may have important implications for improving diagnosis.
Aims: We aimed to develop a deep learning model (DLM) as a diagnostic support tool based on a 12-lead electrocardiogram.
Methods: This retrospective cohort study included 1,051/697 ECGs from 737/287 coronary angiogram (CAG)-validated STEMI/NSTEMI patients and 140,336 ECGs from 76,775 non-AMI patients at the emergency department. The DLM was trained and validated in 80% and 20% of these ECGs. A human-machine competition was conducted. The area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were used to evaluate the performance of the DLM.
Results: The AUC of the DLM for STEMI detection was 0.976 in the human-machine competition, which was significantly better than that of the best physicians. Furthermore, the DLM independently demonstrated sufficient diagnostic capacity for STEMI detection (AUC=0.997; sensitivity, 98.4%; specificity, 96.9%). Regarding NSTEMI detection, the AUC of the combined DLM and conventional cardiac troponin I (cTnI) increased to 0.978, which was better than that of either the DLM (0.877) or cTnI (0.950).
Conclusions: The DLM may serve as a timely, objective and precise diagnostic decision support tool to assist emergency medical system-based networks and frontline physicians in detecting AMI and subsequently initiating reperfusion therapy.
Introduction
Acute myocardial infarction (AMI) remains a major public health issue despite global advances in diagnosis and management1. AMI refers to the evidence of acute myocardial injury detected by abnormal cardiac biomarkers with necrosis in a clinical setting consistent with myocardial ischaemia. The categories of ST-segment elevation myocardial infarction (STEMI) and non-ST-segment elevation acute coronary syndrome (NSTE-ACS) based on the presentation of a 12-lead electrocardiogram (ECG) have customarily been included in the concept of acute coronary syndrome (ACS)2. Patients with symptoms suggestive of myocardial ischaemia and ST-segment elevation on the ECG require timely reperfusion therapy to reduce cardiac morbidity and mortality3. Likewise, patients with non-ST-segment elevation myocardial infarction (NSTEMI) considered to be in the very high or high risk categories require an immediate/early invasive strategy to prevent a worse prognosis4.
However, prompt management depends on rapid recognition and precise diagnosis. Despite established diagnostic criteria, rapid recognition of AMI remains a critical challenge for emergency physicians. Previous studies reported that the rate of misdiagnosis of AMI at first medical contact ranged from 2 to 30%5,6,7. Failure to identify high-risk ECG findings in patients with AMI results in lower quality care and more adverse outcomes. One of the leading causes of missed identification in the diagnostic process was incorrect interpretation of a diagnostic test8,9. Systematic processes to improve ECG interpretation may therefore have important implications for improving diagnosis. Since the principal diagnostic tool for AMI is a 12-lead ECG, a more detailed analysis of the ECG may significantly speed up this process.
The current artificial intelligence revolution that started with a deep learning model (DLM) has provided us with an unprecedented opportunity to improve the healthcare system, and it has been proven to be effective in medical applications10,11,12. Additionally, DLMs were confirmed to be superior to cardiologists in ECG interpretation when they were trained by large annotated ECG data sets13,14. To our knowledge, the available and applicable ECG databases of AMI were relatively small. Our study aimed to develop a DLM to detect AMI in a timely, objective and precise manner by a 12-lead ECG. More than 100,000 AMI-associated ECGs were collected and used to train the DLM. Facilitated by the system’s powerful computing ability, the performance of the trained model was compared with that of physicians, including cardiologists and an emergency physician. The diagnostic power for STEMI and NSTEMI by the DLM and conventional cardiac troponin I (cTnI) was also evaluated.
Methods
STUDY DESIGN
This was a single-centre, case-control study. The data were provided by the Tri-Service General Hospital, Taipei, Taiwan, and the retrospective design was ethically approved by the institutional review board (IRB No. 2-107-05-168). An electronic health system was built for collecting ECGs and medical records. The study period was from January 2012 to December 2018.
STUDY POPULATION
AMI patients presenting to the emergency department (ED) who received a coronary angiogram (CAG) to rule in type I AMI and to confirm the infarct-related artery (IRA) of STEMI were recruited2. AMI patients with no electronic ECG available, right side ECG, posterior ECG and pacemaker rhythm were excluded. Non-AMI patients presenting to the ED during the same period were recruited, while excluding those with a history of AMI or any elevated cTnI during their ED stay. The definitions of AMI, STEMI, NSTEMI, non-AMI, and non-STEMI in this study are provided in Supplementary Table 1 and Supplementary Appendix 1. The AMI cases were divided into development (80%) and validation (20%) cohorts by date. The ECGs in the development cohort were excluded from the validation cohort. There was no overlap of patients between these two cohorts.
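The date-based division into development and validation cohorts can be illustrated with a minimal Python sketch. It assumes that each ECG record carries a patient identifier and an acquisition date; the field names `patient_id` and `date` and the function `split_by_date` are hypothetical, and the procedure actually used in the study may differ. Whole patients are assigned to a single cohort so that the two cohorts share no patients.

```python
from datetime import date


def split_by_date(records, dev_fraction=0.8):
    """Split ECG records into development and validation cohorts by date.

    Patients are ordered by the date of their first ECG, and the earliest
    `dev_fraction` of patients form the development cohort. All ECGs from a
    patient go to the same cohort, so the cohorts share no patients.
    """
    first_seen = {}
    for rec in records:
        pid = rec["patient_id"]
        if pid not in first_seen or rec["date"] < first_seen[pid]:
            first_seen[pid] = rec["date"]

    # Order patients chronologically (patient id breaks ties deterministically).
    ordered = sorted(first_seen, key=lambda pid: (first_seen[pid], pid))
    n_dev = round(len(ordered) * dev_fraction)
    dev_ids = set(ordered[:n_dev])

    development = [r for r in records if r["patient_id"] in dev_ids]
    validation = [r for r in records if r["patient_id"] not in dev_ids]
    return development, validation


# Toy records (invented for illustration): five patients, one with two ECGs.
records = [
    {"patient_id": "P1", "date": date(2012, 3, 1)},
    {"patient_id": "P2", "date": date(2013, 6, 15)},
    {"patient_id": "P1", "date": date(2018, 11, 2)},
    {"patient_id": "P3", "date": date(2015, 1, 20)},
    {"patient_id": "P4", "date": date(2016, 9, 9)},
    {"patient_id": "P5", "date": date(2018, 12, 30)},
]
development, validation = split_by_date(records)
```

In this sketch the late ECG of patient P1 stays in the development cohort, because keeping patients separate takes priority over strict chronological ordering of individual ECGs.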
ADJUDICATED FINAL DIAGNOSIS
Adjudication of the final diagnosis was performed by three board-certified interventional cardiologists who did not participate in the human-machine competition and who retrospectively and independently reviewed the AMI cases according to the clinical presentations, serial ECGs, serial cTnI levels and angiographic findings to make the final diagnosis of STEMI and NSTEMI, as recommended in the current guidelines2,3,4. In situations of disagreement about the diagnosis, cases were reviewed and adjudicated in consensus meetings.
DATA COLLECTION AND IMPLEMENTATION OF THE DLM
Data collection and DLM implementation are shown in Supplementary Appendix 1 and Supplementary Figure 1. ECG recordings were collected using a Philips 12-lead ECG machine (PH080A), and the DLM was based on ECG12Net, which had previously been developed14. The output of the DLM was the probability of STEMI, NSTEMI, and non-AMI.
HUMAN-MACHINE COMPETITION
We evaluated the performance of participating physicians using a competition set of 450 ECGs, which included 174 STEMI, 138 NSTEMI, and 138 non-AMI ECGs. The STEMI ECGs, based on the IRA, were further classified into the left main coronary artery (LMCA), left anterior descending artery (LAD), left circumflex artery (LCx), or right coronary artery (RCA). Five cardiologists and one emergency attending physician participated in the competition. The Philips 12-lead algorithm was also included to detect AMI in the competition15. The physicians had no access to any patient information and no knowledge of the data. Their responses were entered into an online standardised data entry program. We calculated the sensitivities, specificities, and kappa values to compare their results with those of the DLM.
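The three summary measures used in the competition can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes that the kappa statistic is the unweighted Cohen's kappa over the three classes, and the function names and the toy readings are invented for the example.

```python
from collections import Counter


def sensitivity_specificity(truth, predicted, positive="STEMI"):
    """One-vs-rest sensitivity and specificity for the `positive` class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(truth, predicted):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts, pred_counts = Counter(truth), Counter(predicted)
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / n**2
    return (observed - expected) / (1 - expected)


# Toy readings (invented): six ECGs, one NSTEMI read as non-AMI.
truth = ["STEMI", "STEMI", "NSTEMI", "NSTEMI", "non-AMI", "non-AMI"]
reader = ["STEMI", "STEMI", "NSTEMI", "non-AMI", "non-AMI", "non-AMI"]

sens, spec = sensitivity_specificity(truth, reader)
kappa = cohen_kappa(truth, reader)
```

Sensitivity and specificity describe a single class against the rest, whereas kappa summarises agreement across all three classes at once, which is why it suits a global ranking.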
STATISTICAL ANALYSIS
The study cohort was divided into training, validation, and competition sets. We presented their characteristics as the means and standard deviations, numbers of patients, or percentages where appropriate. They were compared using either the Student’s t-test or the chi-square test, as appropriate. The statistical analysis was performed using R software version 3.4.4 (R Foundation for Statistical Computing, Vienna, Austria).
All analyses were performed at the ECG level rather than the patient level. A significance level of p<0.05 was used throughout the analysis. The primary analysis was to evaluate the performance of the DLM, the physicians and the Philips algorithm for STEMI detection in the human-machine competition. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were applied to evaluate the competition results. We also used precision-recall ROC (PRROC) to evaluate the model performance in hypothetical real-world situations. Because the proportions of STEMI, NSTEMI, and non-AMI were distorted in the competition set, we re-weighted the samples based on the incidences in the real world (0.1%, 0.2%, and 99.7% of STEMI, NSTEMI, and non-AMI cases, respectively)16,17,18. The secondary analysis was performed on the whole validation cohort. We included more clinical information, such as patient characteristics and laboratory tests, to improve the model performance. A multivariable logistic regression model was used to integrate the DLM and clinical information. A series of logistic regression models identified the effects of different clinical information on the performance of STEMI and NSTEMI detection. The AUC was applied to evaluate the changes in model performance. The research interests, model comparison and statistical methods in this study are summarised in detail in Supplementary Table 2.
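The effect of re-weighting the competition set to real-world proportions can be illustrated with a short Python sketch. The class sizes and target proportions are those stated above; the flagged counts, the function names and the weighting scheme (target proportion divided by observed proportion) are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def class_weights(labels, target_proportions):
    """Weight per class = target proportion / proportion observed in the sample."""
    counts = Counter(labels)
    n = len(labels)
    return {c: target_proportions[c] / (counts[c] / n) for c in counts}


def weighted_precision_recall(labels, flagged, weights, positive="STEMI"):
    """Precision and recall for `positive`, with each ECG counted by its weight."""
    tp = fp = fn = 0.0
    for label, is_flagged in zip(labels, flagged):
        w = weights[label]
        if is_flagged and label == positive:
            tp += w
        elif is_flagged:
            fp += w
        elif label == positive:
            fn += w
    return tp / (tp + fp), tp / (tp + fn)


# Class sizes follow the competition set; the flagged counts are invented.
labels = ["STEMI"] * 174 + ["NSTEMI"] * 138 + ["non-AMI"] * 138
flagged = ([True] * 150 + [False] * 24      # STEMI ECGs flagged as STEMI
           + [True] * 10 + [False] * 128    # NSTEMI ECGs flagged as STEMI
           + [True] * 4 + [False] * 134)    # non-AMI ECGs flagged as STEMI

target = {"STEMI": 0.001, "NSTEMI": 0.002, "non-AMI": 0.997}
weights = class_weights(labels, target)
precision, recall = weighted_precision_recall(labels, flagged, weights)
unweighted_precision = 150 / (150 + 10 + 4)
```

Recall is unchanged by the re-weighting because it depends only on STEMI cases, whereas precision falls sharply once non-AMI ECGs dominate, which is why precision-recall analysis is informative for a rare condition.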
Results
BASELINE CHARACTERISTICS OF THE COHORTS
There were 1,051 ECGs before CAG from 737 STEMI patients, 697 ECGs before CAG from 287 NSTEMI patients and 140,336 ECGs from 76,775 non-AMI patients in this study. The development and validation cohorts included records from 58,056 and 19,743 patients, respectively. The characteristics and laboratory data are shown in Supplementary Table 3, and a detailed description is shown in Supplementary Appendix 2.
PREDICTION OF STEMI, NSTEMI AND NON-AMI
The results of the human-machine competition are summarised in Figure 1. The AUC of the DLM in the human-machine competition was 0.976 for STEMI detection, with a corresponding sensitivity and specificity of 89.7% and 94.6%, respectively. In contrast, the sensitivities and specificities for STEMI detection among the physicians and the Philips algorithm ranged from 60.5-92.6% and 76.0-97.5%, respectively, which were lower than those of the DLM. The PRROC analysis demonstrated the feasibility of an automatic ECG screening system, which revealed that the AUC of the DLM for STEMI detection was 0.586 in the hypothetical real world. The DLM achieved 63.2% precision and 50.3% recall using the appropriate cut-off point. These values were significantly better than those of all the physicians and the Philips algorithm.
Figure 1. Performance comparison for STEMI detection in the human-machine competition. The area under the receiver operating characteristic curve (AUC) was generated by the prediction of the DLM. The triangles, the square and the diamond denote the cardiologists, the emergency physician and the Philips algorithm, respectively. A) The ROC curve in the competition set (STEMI=174, NSTEMI=138, and non-AMI=138). B) The precision-recall ROC curve in the revised proportion of the hypothetical real world (STEMI=0.1%, NSTEMI=0.2%, and non-AMI=99.7%).
Performance rankings and consistency analysis of STEMI detection among the DLM, the physicians and the Philips algorithm in the human-machine competition were carried out (Figure 2). The DLM achieved the best global performance (kappa=0.645) (Figure 2A), whereas the physicians had relatively better STEMI detection but poor discrimination of NSTEMI and non-AMI. The consistency analysis of AMI detection among the DLM, the physicians and the Philips algorithm is shown in the heatmap (Figure 2B).
Figure 2. Performance rankings and consistency analysis of STEMI detection among the DLM, the physicians and the Philips algorithm in the human-machine competition. A) Global performance rankings based on the class-3 kappa values. V(X) denotes (V) visiting staff with (X) years of experience. B) Consistency analysis as a heatmap coloured based on the values; the values in each cell were the kappa values of each pair.
ANALYSIS OF IRA OF STEMI
The DLM achieved the best global performance (kappa=0.629) for the IRA detection of STEMI (Supplementary Figure 2). As shown in Supplementary Figure 3, after exclusion of LMCA and LCx, the AUC of the DLM for anterior STEMI detection was 0.975, with a corresponding sensitivity of 92.6%, which outperformed all participating physicians. Moreover, the AUC of the DLM in inferior STEMI detection was 0.974, with a corresponding sensitivity of 84.8%, which was better than that of all physicians except one. In the combined detection of anterior and inferior STEMI, the DLM had better performance than all physicians (AUC, 0.975; sensitivity, 89.4%).
INTERPRETATIONS OF STEMI ECGs BY THE DLM AND PHYSICIANS
Selected examples of STEMI ECGs in the human-machine competition are shown in Figure 3. A typical STEMI ECG with an IRA of the LAD (Figure 3A) was consistently detected by both the DLM and the physicians. One STEMI ECG with an IRA of the RCA (Figure 3B) was misdiagnosed by the DLM but correctly recognised by the best cardiologists. One STEMI ECG with an IRA of the RCA (Figure 3C) was misdiagnosed by both the DLM and the best cardiologists. The DLM correctly detected the ECG (Figure 3D) as STEMI with an IRA of the LAD, which was misdiagnosed by the best cardiologists.
Figure 3. Interpretations of selected STEMI ECGs by the DLM and physicians in the human-machine competition. A) Both the DLM and the best cardiologists consistently detected STEMI. B) The DLM misdetected STEMI, which was correctly detected by the best cardiologists. C) Both the DLM and the best cardiologists misdetected STEMI. D) The DLM correctly detected STEMI, which was misdetected by the best cardiologists.
Among the 138 NSTEMI ECGs, 58 ECGs were detected as non-AMI by the DLM, with an accuracy of 58.0%, which was worse than that of the best cardiologist (75.4%). This discrepancy was due to a more conservative AMI diagnostic strategy by the DLM. In contrast, among 138 non-AMI ECGs, the specificity of the DLM (96.4%) was much better than those of the two best cardiologists (82.6% and 64.5%). After adjustment for the specificities, the misdiagnosis of NSTEMI by the DLM was clearly lower than that by the best cardiologists (Table 1). Nevertheless, the DLM offered the best performance in AMI detection under the standardisation of the best cardiologists. The ECG lead-specific analyses for the detection of STEMI and the corresponding IRA are shown in Supplementary Figure 4, and a detailed description is shown in Supplementary Appendix 2.
LOGISTIC REGRESSION ANALYSIS OF STEMI AND NSTEMI
Univariate and multivariate logistic regression analyses in the development cohort revealed that male sex, prior CAD, cTnI, haemoglobin, total cholesterol and low-density lipoprotein (LDL) levels were independent risk factors for STEMI and NSTEMI detection (Supplementary Figure 5).
DIAGNOSTIC VALUE ANALYSIS
We evaluated the performance of the DLM after adjusting for significant patient characteristics, disease histories, and laboratory data to ensure consistency across a wide range of putative confounding variables in the validation cohort. The DLM had significantly better performance than cTnI in detecting STEMI, with an AUC of 0.997 with a corresponding sensitivity and specificity of 98.4% and 96.9%, respectively (Figure 4A). However, cTnI had significantly better performance than the DLM in detecting NSTEMI. The AUC for NSTEMI detection by the combination of the DLM and the first recorded cTnI increased to 0.978, with a corresponding sensitivity and specificity of 91.6% and 96.7%, respectively, which was better than that of the DLM (0.877) or cTnI (0.950) individually (Figure 4B). Using the DLM independently was sufficient to detect STEMI, and the addition of patient characteristics did not significantly improve its performance. However, cTnI was found to improve the diagnostic accuracy for NSTEMI better than any additional characteristics (Supplementary Figure 6).
Figure 4. Comparison of the diagnostic value between the DLM and cTnI in the validation cohort. The area under the receiver operating characteristic curve (AUC) was generated from the logistic regression analysis using the validation cohort. The p-values represent the comparison among the DLM, cTnI and the DLM plus cTnI. A) Regarding STEMI detection: DLM vs cTnI, p<0.001; DLM vs DLM+cTnI, p=ns. B) Regarding NSTEMI detection: cTnI vs DLM, p<0.01; cTnI+DLM vs cTnI, p<0.05.
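The idea of combining the DLM output with cTnI in a logistic regression and comparing AUCs can be sketched as follows. This is a hedged illustration in plain Python rather than the R analysis used in the study: the data are synthetic, the variable names (`dlm_prob`, `log_ctni`) and the gradient-descent fitting routine are assumptions, and the AUC is computed on the same data used for fitting, so it says nothing about the reported results.

```python
import math
import random


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Predicted probability for features x given weights [bias, w1, ...]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, d = len(features), len(features[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(features, labels):
            err = predict(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Synthetic data (invented): 60 NSTEMI-like and 140 non-AMI-like cases.
random.seed(0)
labels = [1] * 60 + [0] * 140
dlm_prob = [min(1.0, max(0.0, random.gauss(0.6 if y else 0.3, 0.2))) for y in labels]
log_ctni = [random.gauss(1.0 if y else 0.0, 1.0) for y in labels]

features = list(zip(dlm_prob, log_ctni))
weights = fit_logistic(features, labels)
combined = [predict(weights, x) for x in features]

print("AUC, DLM alone:     ", round(auc(dlm_prob, labels), 3))
print("AUC, cTnI alone:    ", round(auc(log_ctni, labels), 3))
print("AUC, DLM plus cTnI: ", round(auc(combined, labels), 3))
```

The rank-based AUC used here is equivalent to the area under the ROC curve; in practice a held-out cohort, as in the validation analysis, is needed for an unbiased comparison.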
Discussion
In this study, we established a DLM to detect STEMI precisely through ECG analysis, which applied a deep convolutional network to extract notable ECG features with a development cohort of more than 100,000 ECGs. All AMI patients were validated by CAG, and the corresponding IRA of STEMI was identified. Most importantly, our DLM performed better than the physicians in STEMI detection, with a high sensitivity of 89.7% and specificity of 94.6%.
The application of deep learning technology in the cardiovascular field for arrhythmias, dyskalaemia, and valvular heart disease has recently grown in popularity13,14,19,20,21. However, no large-scale study has been designed for AMI detection. Previous DLMs for AMI detection by a 12-lead ECG mainly used the Physikalisch-Technische Bundesanstalt (PTB) diagnostic ECG database22,23. These studies may be limited because they were not further validated. Moreover, comparisons between the DLM and physicians were lacking. In comparison with previous studies, we enrolled the largest number of clinically validated ECGs for development and validation. Additionally, we further confirmed the role of cTnI in assisting with NSTEMI detection by our DLM. All these results highlight the strengths of the current study.
The sensitivity and specificity for STEMI detection by the DLM were better than those of the physicians. ECG is the timeliest tool among all objective detection methods for AMI. However, the low sensitivity and the disagreement in interpreting ECGs between physicians remain issues. The sensitivity of subjective interpretation for AMI detection using a 12-lead ECG ranged only from 61 to 74%, with a specificity ranging from 72 to 89%24,25,26. In contrast, previous prehospital computer algorithm interpretations for STEMI had a sensitivity of approximately 69%27,28. Our DLM provided extraordinary performance that could support decision-making systems in clinical practice.
The DLM could objectively identify STEMI based on analysing and learning a large number of ECGs. Moreover, subtle ECG changes in the earliest phase of STEMI, which are easily missed by physicians, could be correctly recognised by the DLM. Nevertheless, prior MI or cardiomyopathy might mislead the DLM owing to baseline ST-T changes. Therefore, information regarding previously available ECGs or the history of cardiovascular disease may be needed to strengthen the capacity of the DLM for STEMI detection further.
The performance of our DLM on the detection of STEMI equivalents and STEMI mimics was further evaluated. STEMI equivalents, including de Winter sign, Wellens’ syndrome, hyperacute T-waves, ST elevation in the lead aVR with diffuse ST depression, ST elevation in the presence of bundle branch block, and posterior wall AMI, representing coronary occlusion without meeting the traditional ST elevation criteria, were crucial for timely recognition29,30. Additionally, high take-off T presentations, such as hyperkalaemia, benign early repolarisation, left ventricular hypertrophy and Brugada syndrome, which mimic STEMI, were usually misdiagnosed, leading to false initiation of primary PCI31,32. Our study demonstrated that the DLM exhibited excellent diagnostic power in the detection of STEMI equivalents (except for type 1 Wellens’ syndrome) and provided extraordinary differentiating capacity in the detection of high take-off T (Supplementary Figure 7, Supplementary Figure 8). Further prospective and large ECG validation data sets are needed to confirm the discriminating abilities of the DLM.
Our DLM has several potential clinical applications. First, the DLM could provide decision support and a high-risk alarm system for AMI that could help to reduce medical errors in the ED resulting from intense time pressures or heavy workloads and harried staff during busy working hours. Second, the DLM could be incorporated into ECG machines in ambulances to facilitate telemedicine and shorten the decision time before initiation of reperfusion therapies. Third, our DLM could be applied in rural and remote areas and places lacking experts to facilitate ECG interpretation and promote diagnostic accuracy, thereby initiating timely management and improving the prognosis of STEMI patients. Finally, the DLM could be incorporated into a wearable device for AMI detection, especially for patients with an extremely high risk of atherosclerotic cardiovascular disease. Accordingly, our DLM exhibits diagnostic benefits and may improve the quality of health care in the near future (Central illustration).
Central illustration. Schematic diagram of the development, validation and future application of the current deep learning model for detecting AMI. The DLM was developed and trained on more than 100,000 ECGs. Compared with cardiologist-level physicians, the DLM exhibited the best performance in the detection of STEMI. The validated model achieved excellent diagnostic power with a sensitivity of 98.4% and a specificity of 96.9% for STEMI detection. With the ability of real-time detection, precise diagnosis and early alarm, the application of DLM for STEMI detection, including in-hospital, pre-hospital settings, telemedicine and wearable devices, would improve the quality of health care of cardiovascular disease in the near future.
Limitations
Some limitations of this study should be noted. First, the human-machine competition was based on a well-designed retrospective study. A real-world prospective study should be conducted to verify the clinical impact of the DLM. Second, only six attending physicians participated in the competition with the DLM. Although their performance in AMI detection was relatively consistent with that in previous studies, comparisons should be made with more physicians to confirm the superiority of the DLM33. Third, the studied patients were enrolled from only one academic medical centre, although the diagnosis and management of AMI was based on the guidelines. Multicentre validation is needed to confirm the value and application of this study. Fourth, there were fewer NSTEMI cases than STEMI cases, which may limit the capacity for NSTEMI detection by our DLM. Fifth, during the study period, cTnI rather than hsTnI was used for AMI diagnosis. Sixth, information on prior ECGs to improve diagnostic performance was not available in our DLM system. Seventh, the impacts of coronary collateral flow on ST-T changes during AMI and the performance of the DLM in the detection of STEMI were not analysed. Finally, only patients in the ED were enrolled, which may have led to selection bias and constrained the generalisability of the results.
Conclusions
We established an optimal DLM to detect STEMI based on a 12-lead ECG with better accuracy than physicians. Integration of a DLM may assist frontline physicians in recognising AMI in a timely and precise manner to prevent delayed diagnosis or misdiagnosis of AMI and thereby provide prompt reperfusion therapy. Further prospective validation with prehospital and in-hospital ECG tests is needed to confirm the performance of our DLM.
Impact on daily practice
STEMI can now be recognised using this cardiologist-level algorithm, achieving real-time STEMI diagnosis and early alarms. A comprehensive ecosystem has been established including in-hospital, pre-hospital and wearable devices, improving the quality of care in AMI.
Funding
This work was supported by the Ministry of Science and Technology, Taiwan (MOST 108-2314-B-016-001 to C. Lin, MOST 109-2314-B-016-026 to C. Lin), the National Science and Technology Development Fund Management Association, Taiwan (MOST 108-3111-Y-016-009 and MOST 109-3111-Y-016-002 to C. Lin), and the Cheng Hsin General Hospital, Taiwan (CHNDMC-109-19 to C. Lin).
Conflict of interest statement
The authors have no conflicts of interest to declare.