Abstract
Aims: The SYNTAX™ score has been designed to better anticipate the risks of percutaneous or surgical revascularisation, taking into account the functional impact of the coronary circulation with all its anatomic components including the presence of bifurcations, total occlusions, thrombus, calcification, and small vessels. The purpose of this paper is to describe the baseline assessment of the SYNTAX™ score in the Syntax randomised trial, the corelab reproducibility, the potential difference in score assessment between the investigator and the corelab, and to ascertain the impact on one-year outcome after either percutaneous coronary intervention (PCI) or coronary artery bypass surgery (CABG) in patients with complex coronary artery disease.
Methods and results: To assess the reliability of Syntax™ scoring, 100 diagnostic angiograms from the Syntax trial were randomly selected and assessed independently by two observers. Intra-observer variability was assessed by analysing 91 sets of angiograms after an interval of at least eight weeks by one of the observers. Clinical outcomes in the randomised cohort of the Syntax trial up to one year are presented with stratification by tertile group of the SYNTAX™ score. The weighted kappa value for the inter-observer reproducibility on the global score was 0.45, while the intra-observer weighted kappa value was 0.59. The SYNTAX™ score as calculated by investigators consistently underscored the corelab score by 3.4 points. When the Syntax randomised cohort was stratified by tertiles of the SYNTAX™ score, there were similar or non-significantly different MACCE rates in those with low or intermediate scores; however in the top tertile the MACCE rate was greater in those receiving PCI compared to CABG.
Conclusions: The SYNTAX™ score is a visual coronary score with an acceptable corelab reproducibility that has an impact on the one-year outcome of those having PCI, whereas it has no effect on the one-year outcome following surgical revascularisation. The SYNTAX™ score tool is likely to be useful in a wide range of patients with complex coronary disease.
Introduction
In previously published randomised trials including 2- and 3-vessel disease, patient selection or exclusion criteria resulted in only 2-12% of the patients screened being randomised1. During the initial debate on the design of the Syntax study, it was argued that although patients with 2- or 3-vessel disease had been included in previous randomised trials, in the “real world” surgeons were often confronted with more complex anatomy and comorbidities. Therefore, the all-comer approach became the cornerstone of the Syntax trial, reducing exclusion criteria to a minimum (previous intervention, acute myocardial infarction and concomitant cardiac surgery)2.
The anatomic heterogeneity in the patients enrolled in previous randomised trials renders their interpretation difficult. For example, a patient with 3-vessel disease and multiple lesions in each vascular territory (including long lesions, bifurcations and chronic total occlusions) was pooled together with a patient with three focal lesions in the mid-portions of each coronary artery. Both were conventionally named “3-vessel disease”, despite the fact that the first patient represents a greater therapeutic challenge for the interventional cardiologist, and has a completely different prognosis compared to the second patient regardless of the revascularisation strategy. Thus the interpretation of the results of previously conducted randomised trials is severely limited by the absence of grading of the severity of coronary artery disease, and by the lack of comparison of lesion complexity based on pretreatment angiographic criteria3.
In the Syntax trial, the decision to refer the patient for either surgery or percutaneous coronary intervention (PCI) was the result of a pretreatment consensus reached between the cardiac surgeon and the interventional cardiologist. In this so-called “Heart-Team Conference” the surgeon and interventional cardiologist fully assessed anginal status, comorbidities, coronary anatomy and left ventricular function. Although other scoring systems, such as the Braunwald, NYHA or CCS classifications, could be used to assess anginal status, and the EuroSCORE and Parsonnet score could be used to assess the patient history, comorbidities, and pulmonary and cardiovascular function4,5, there was no available comprehensive score to describe the coronary anatomy in detail. Therefore, the SYNTAX™ score has been designed to better anticipate the risks of percutaneous or surgical revascularisation, taking into account the functional impact of the coronary circulation with all its anatomic components, including bifurcations, total occlusions, thrombus, calcification, small vessels etc. The SYNTAX™ score was not initially devised to predict short or long term prognosis, but was a score designed to allow a detailed objective assessment, and therefore comparison of the coronary anatomy between one patient and another. During the heart-team conference, the calculation of the SYNTAX™ score became pivotal in the selection of the revascularisation strategy. As a result of the heart-team conference the population was subdivided into three groups: patients judged to be only eligible for cardiac surgery, patients eligible for PCI, and patients potentially amenable to both types of revascularisation.
In designing the SYNTAX™ score, the authors selected six pre-existing classifications or scores to create a complex algorithm, mixing anatomical and functional characteristics that might increase the risk and complexity of percutaneous or surgical treatment (see appendix, online as supplementary data at www.eurointervention.org). At the time of the design, it was not known whether the complexity of the coronary anatomy, as described by the score, would have an impact on the outcome of surgery. The purpose of the present paper is to describe the baseline assessment of the SYNTAX™ score, the corelab reproducibility, the potential difference in the score assessment between the investigator and the corelab, and to ascertain the impact of the score on the short- and long-term outcome of PCI and coronary artery bypass graft surgery (CABG). At the time it was designed it was anticipated that the prospective, blind, raw SYNTAX™ score would be retrospectively weighted, based on the short- and long-term outcomes of the Syntax trial.
Methods
The Syntax trial
The design of the Syntax trial has been described in detail elsewhere6. Between March 2005 and April 2007, 4,337 patients were screened, leading to randomisation of 1,800 patients with LM and/or 3VD to CABG (n=897) or PCI with TAXUS Express2 (n=903) at one of 23 sites in the US (n=245) and 62 sites in Europe (n=1,555). Almost 30% of screened patients were found to be amenable to only one treatment option and were enrolled in either the CABG (n=1,077) or PCI (n=198) nested registries, while 9.4% of patients were not willing to participate or had a treatment preference.
Assessment of coronary angiograms
To assess the reliability of Syntax scoring, we randomly selected 100 diagnostic angiograms from the Syntax trial. All the angiographic variables pertinent to calculating the SYNTAX™ score were obtained by reviewing the diagnostic angiograms acquired before the procedure. Those films were assessed independently by two corelab technicians who were blinded to the clinical baseline characteristics, procedural data and clinical outcomes. In case of disagreement, the opinion of a third observer, a supervising cardiologist, was obtained and the final decision was made by consensus. To assess intra-observer variability, 91 sets of angiograms were re-analysed at least eight weeks later by one of the two observers, who remained blinded to the results of the first analysis.
SYNTAX™ score and angiographic analysis
Each coronary lesion producing >50% luminal obstruction in vessels ≥1.5 mm was separately scored, and the lesion scores were summed to provide the overall SYNTAX™ score. The score was calculated using dedicated software that integrates (a) the number of lesions with their specific weighting factors based on the amount of myocardium distal to the lesion according to the score of Leaman et al7, and (b) the morphologic features of each single lesion, as reported in the appendix. An example of SYNTAX™ score calculation in one subject is shown in Figure 1.
Figure 1. An example of Syntax scoring.
Statistical analysis
The degree of agreement was measured with the weighted kappa statistic, which reflects the agreement between two or more observations using weights to quantify the relative difference between categories8,12. It is usual to consider kappa values greater than 0.75 to represent excellent agreement beyond chance, values below 0.40 to represent poor agreement beyond chance, and values between 0.40 and 0.75 to represent fair to good agreement beyond chance. The reproducibility of Syntax scoring was evaluated by calculating the intra-observer and inter-observer variability, which was defined as the difference between the corresponding measurements expressed as a percent of their mean. All variables were expressed as mean±standard deviation or median and range. A 2-tailed P value of <0.05 was considered to indicate statistical significance. The incidence of events over time was studied with the use of the Kaplan-Meier method, whilst log-rank tests were applied to evaluate differences between the treatment groups. Patients lost to follow-up were considered at risk until the date of last contact, at which point they were censored.
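As an illustration of the agreement measure described above, the following is a minimal sketch of a weighted kappa for two sets of ordinal ratings, together with the interpretation bands quoted in the text. The weighting scheme used in the study is not stated here, so the linear and quadratic options, the function names and the example ratings are assumptions made purely for illustration.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories, weighting="linear"):
    """Weighted kappa for two raters scoring the same cases.

    Ratings are integer category indices 0..n_categories-1 (for example
    tertile groups). The weighting scheme is an assumption of this sketch.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")

    n = len(ratings_a)
    k = n_categories

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Penalty grows with the distance between the two categories.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed_dis = sum(disagreement(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed_dis / expected_dis


def interpret_kappa(kappa):
    """Map a kappa value to the bands quoted in the statistical analysis."""
    if kappa > 0.75:
        return "excellent"
    if kappa < 0.40:
        return "poor"
    return "fair to good"


# Hypothetical tertile assignments by two observers (not study data).
observer_1 = [0, 1, 2, 0, 1, 2, 1, 2]
observer_2 = [0, 1, 2, 1, 1, 2, 0, 2]
kappa = weighted_kappa(observer_1, observer_2, n_categories=3)
print(round(kappa, 2), interpret_kappa(kappa))
```

Applied to the values reported in this paper, the same bands would place a global-score kappa of 0.45 or 0.59 in the fair to good range.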
Results
Corelab reproducibility
At the corelab, the value of the first measurement was, on average, 30.3 versus 29.2 for the second measurement, with an SD of 11.5 and 11.3, respectively. The mean of the differences (measure of accuracy) was 2.1 with an SD of 9.1 (measure of precision), which reflects the core laboratory inter-observer variability. As shown in Table 1, the weighted kappa value for the observations of the global score was 0.45, while the weighted kappa value for the number of lesions was 0.59.
The weighted kappa values were 0.82 for the diagnosis of total occlusions, 0.41 for bifurcation lesions and 0.63 for ostial lesions. Inconsistency in the scoring was mainly due to the presence of lesions in small vessels and at bifurcations. The weighted kappa for tertile partitioning of the SYNTAX™ score (0-22, 23-32, ≥33) was 0.52.
Table 2 presents the weighted kappa values for intra-observer reproducibility.
The weighted kappa value for the global score was 0.59, while the weighted kappa values for the number of lesions, total occlusions and bifurcation lesions were 0.71, 0.85 and 0.68, respectively. The weighted kappa for tertile partitioning of the SYNTAX™ score (0-22, 23-32, ≥33) was 0.61.
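The mean and SD of the paired differences quoted above can be reproduced for any two sets of readings with a few lines of code. The sketch below also includes the variability measure defined in the statistical analysis (the difference expressed as a percent of the mean of the two measurements). The function names and the example scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def paired_difference_summary(first, second):
    """Return (mean, SD) of the signed differences first - second."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    return mean(diffs), stdev(diffs)


def percent_of_mean(a, b):
    """Absolute difference between two readings as a percent of their mean."""
    return abs(a - b) / ((a + b) / 2) * 100


# Hypothetical SYNTAX scores from two readings of the same three angiograms.
first_reading = [31, 22, 43]
second_reading = [30, 20, 40]
print(paired_difference_summary(first_reading, second_reading))
print(percent_of_mean(30, 20))
```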
SYNTAX™ score – corelab scoring vs on-site scoring
Figure 2 shows the SYNTAX™ score in the CABG registry, the randomised cohorts and the PCI registry; average values as well as ranges are shown for the corelab and the site.
Figure 2. Bar graph of raw SYNTAX™ scores in each cohort of the Syntax trial: a comparison between Corelab assessment and site reporting. CABG: coronary artery bypass graft; RCT: randomised controlled trial; PCI: percutaneous coronary intervention.
The following observations can be made from these data: 1) the CABG registry has the highest score (37.8±13.3), the second highest group is the PCI registry with an average score of 31.6±12.3, whilst the randomised cohorts had the lowest scores of around 28-29, almost 10 points below the level of the CABG registry; 2) the investigators consistently underscored the corelab score by 3.4 points; 3) as expected by design, the scores in the two randomised cohorts are comparable (29.1±9.1 for the CABG vs. 28.4± for the PCI cohort, p=0.19).
SYNTAX™ score according to treatment groups
Figure 3 shows the distribution of the SYNTAX™ score in the PCI registry, the CABG registry and the cohort randomised to surgery or PCI.
Figure 3. SYNTAX™ score distribution in the registries and in the randomised cohorts.
The score distribution in these different subgroups is more or less Gaussian. The Gaussian curves of the SYNTAX™ score for patients randomised to CABG and PCI are almost superimposable. The distribution of the score for the PCI registry is shifted rightward with a mean value of 31.6±12.3, and the distribution of the SYNTAX™ score in the CABG registry is shifted even further to the right with a mean value of 37.8±13.3. When the scores of the randomised patients were divided into tertiles, the upper boundary of the lowest tertile was 22, the second tertile ranged from 23 to 32, and the lower boundary of the highest tertile was 33.
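The tertile partitioning described above can be written as a simple classification rule. The sketch below uses the boundaries reported in the text (0-22, 23-32, ≥33). How fractional scores between 22 and 23 or between 32 and 33 should be handled is not stated, so assigning them to the intermediate group is an assumption of this example, as is the function name.

```python
def syntax_tertile(score):
    """Assign a raw SYNTAX score to the tertile groups used in this report.

    Boundaries follow the text: low 0-22, intermediate 23-32, high >=33.
    Fractional scores falling between these boundaries are placed in the
    intermediate group, which is an assumption of this sketch.
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    if score <= 22:
        return "low"
    if score < 33:
        return "intermediate"
    return "high"


# Hypothetical scores for illustration only.
for s in (12, 22, 23, 32, 33, 41):
    print(s, syntax_tertile(s))
```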
SYNTAX™ score and outcome at one year
As previously reported6 – and demonstrated in Figure 4A and 4B – there was no difference in outcome amongst patients randomised to surgery between those who had low, intermediate or high scores; the major adverse cardiovascular and cerebrovascular event (MACCE) rates at one year were 14.4%, 11.7% and 10.7% for low, intermediate and high scores respectively (p=0.38).
Figure 4A. Kaplan-Meier estimates of MACCE rate up to 12 months in the cohort randomised to CABG treatment stratified by tertile of SYNTAX™ score. There are no statistically significant differences between the 3 curves (p=0.38).
Figure 4B. Kaplan-Meier estimates of MACCE rate up to 12 months in the cohort randomised to PCI treatment stratified by tertile of SYNTAX™ score. The curves separate by 12 months with statistical significance by log-rank test (p=0.007).
In those randomised to PCI there is a significant separation (log-rank p=0.007) of the cumulative event rate curves between patients with low, intermediate and high scores, with respective MACCE rates at 12 months of 13.5%, 16.6%, and 23.3%.
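For readers unfamiliar with how such cumulative event rate curves are built, the following is a minimal sketch of the Kaplan-Meier (product-limit) estimate with censoring of patients lost to follow-up, as outlined in the statistical analysis. It is not the analysis code used in the trial; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier_event_rate(times, events):
    """Cumulative event rate (1 - survival) at each distinct event time.

    times  : follow-up time for each patient (e.g. in months)
    events : 1 if the patient had an event at that time, 0 if censored
    Returns a list of (time, cumulative_event_rate) tuples.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit step: only events change the estimate.
            survival *= 1 - n_events / at_risk
            curve.append((t, 1 - survival))
        # Censored patients leave the risk set without changing the estimate.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up data (months, event flag), not study data.
months = [2, 4, 4, 6, 8]
macce = [1, 1, 0, 0, 1]
for t, rate in kaplan_meier_event_rate(months, macce):
    print(t, round(rate, 3))
```

A comparison between tertile groups, as in Figures 4A and 4B, would additionally require a log-rank test, which is not shown here.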
These data would suggest that patients with a low SYNTAX™ score, regardless of the presence of left main stem or 3-vessel disease, have comparable outcomes after revascularisation with PCI or CABG (Figure 5A-C); furthermore, the MACCE rate in this SYNTAX™ score cohort is not influenced by diabetic status9.
Figure 5A-C shows side-by-side Kaplan-Meier curves for patients with left main disease and for patients with 3-vessel disease, according to the tertiles of the SYNTAX™ score in the overall population.
Therefore, the selected revascularisation strategy in this group of patients will depend on individual patient characteristics, patient preference and the physician choice.
Patients with 3-vessel disease and intermediate SYNTAX™ scores had, irrespective of their diabetic status, a higher MACCE rate following PCI than after bypass surgery6,9. Ultimately, the final selection of treatment in this group will depend on patient characteristics and comorbidity; however, PCI remains a valid option for those patients with left main disease who do not have diabetes (Figures 5B and 6). The MACCE rate in patients with high scores (≥33), with or without diabetes, is significantly higher in patients having PCI compared to CABG; it is therefore inferred that PCI in these patients is limited by a higher repeat revascularisation rate, and that they might be considered surgical candidates (Figure 5C).
Discussion
The present report underscores the important prognostic value of the SYNTAX™ score. When the general principles of analysis (i.e. the heart-team decision, SYNTAX™ score, and diabetic status) are applied to the entire enrolled population (n=3,075), it appears that numerically one-third of all the patients could reasonably be treated by PCI, whilst two-thirds of the patients might be referred to surgery, with the caveat that the present assessment is based on the result of one-year outcome. It has been repeatedly demonstrated that Kaplan-Meier curves related to the outcome of surgery or PCI diverge with time, and based on the 5-year outcome the current partition into surgical and PCI candidates might be reviewed more conservatively in the near future3,10,11. This conclusion is based on the prospective, and thus blind and unbiased, evaluation of the SYNTAX™ score prior to randomisation by a blinded corelab that was unaware of the clinical status of the patient.
Overall, in the registries and in the randomised cohorts, the evaluation of the score by the corelab was somewhat more stringent, and the score was numerically higher than that calculated by the participating site. The critical question remains as to whether this potentially powerful prognostic index, at least for PCI, is a reproducible parameter. As with any visual and categorical parameter, reproducibility should be assessed by kappa statistics. In the present study, the kappa values for inter- and intra-observer reproducibility of the global SYNTAX™ score were above 0.40 but below 0.70, which indicates fair to good agreement8; obviously there is room for improvement. The reproducibility of the score in the future will likely improve with the quality of angiography, the standardisation of the angiographic views acquired during diagnostic imaging, the provision of a SYNTAX™ score tutorial with examples based on real images, operator training, the use of objectively quantified parameters (e.g. stenosis severity and length), consensus between highly qualified observers (technician or interventional cardiologist) and user-friendly software facilitating on-line correction. In addition, taking into account the fact that angiograms performed in the Syntax trial and registry may have been of higher quality than in routine clinical practice, it would be very important to evaluate the reproducibility of the SYNTAX™ score in “real world” practice.
Kappa statistics are a generally accepted method of evaluating agreement between observers and are most useful when observations are frequent and have a Gaussian distribution (Figure 3). It is well known that visual estimates of lesion characteristics are less accurate in comparison to quantitatively derived parameters, as has been demonstrated in previously conducted variability and quality control studies. Beauman and Vogel12 compared visual estimations of lesion severity to quantitative analyses of percent diameter stenosis of coronary and phantom obstructions. Quantitatively assessed coronary arteries comprising a 50% diameter stenosis and 50% phantom stenosis recordings were visually scored in ranges from 15 to 80 percent, and 30 to 95 percent respectively. Determination of the reference diameter showed that only 41% of the estimations were within 10% of the range of the quantitatively derived diameter.
Another study in 50 lesions13 reported an inter-observer agreement of 73% for stenosis length (defined as the length of that portion of the stenosis that had a >30% reduction in luminal diameter using the adjacent normal vessel diameter as a ‘yardstick’ or unit) and 64% for lesion eccentricity (defined as asymmetrical positioning in one or more views), resulting in kappa values of 0.38 and 0.25, respectively. The Cardialysis corelab in 1993 reported14 the level of inter-observer agreement for observations made on 151 lesions: 79% for lesion eccentricity, 71% for branch point involvement, 86% for location in a bend, 98% for presence of thrombus, 90% for presence of calcification and 75% for the lesion type according to the ACC/AHA classification. These results were largely confirmed in a second evaluation reported in 199615. Another study of 403 coronary lesions using the kappa statistic showed an excellent agreement for type C lesions (k=0.85); good agreement for TIMI flow (k=0.73), ABC classification (k=0.48), angulation (k=0.48) and side branch (k=0.40); and poor agreement for eccentricity, tortuosity, lesion calcification, and in the distinction of discrete, diffuse and tubular lesion length. The SYNTAX™ score analysed in its constituent components largely confirmed the results previously reported.
An issue of essential relevance, which contributes to the poor agreement within and between investigators, is the lack of a clear description of the definitions of lesion characteristics being assessed. Length of lesion can be interpreted, for example, as the length of plaque related to the pre-defined size of the catheter on the image. An alternative definition is the length where the lumen diameter has a stenosis >70%, or >50%, or >30%. This can then be expressed in absolute diameters, or in terms of normal lumen diameter ratio. Lesion length can also be defined as the calliper measurement of the distance from the proximal to the distal shoulder of the lesion in the projection that best elongates the stenosis. For the SYNTAX™ score <10 and >20 mm were deliberately chosen as cut-off points for lesion length because these leave the least room for variation in interpretation.
A panel assessment gives a substantial improvement in inter- and intra-observer agreement. It is clear that the weighted sum of several simultaneous observations eliminates the most extreme disagreements, whereas the assessor working in isolation can develop his own interpretation and thus deviate from the original definitions.
Serial observations as in pre-readings, with knowledge of the results of the first observer’s judgement, may result in higher kappa values for qualitatively assessed lesion characteristics. The mechanism of improved agreement in case of pre-reading, however, differs from improved agreement following panel assessment. In serial readings, the first assessment is dominant and respected by the second reviewer, who tends to comply, resulting in an improved outcome.
So far the assessment of the SYNTAX™ score as a prognostic index has only been reported in the ARTS-II registry. Valgimigli et al16 specifically divided the population with 3-vessel disease into tertiles according to SYNTAX™ score and reported the outcome separately. It is noticeable that the MACCE rate in the highest tertile of the ARTS-II registry (SYNTAX™ score >26) at one year is 21.5%, which is identical to the MACCE rate observed in the highest tertile of the Syntax trial (SYNTAX™ score ≥33) in the subgroup with 3-vessel disease (Figure 5C).
In the Syntax study, and in the subgroups of patients with 3-vessel disease and/or left main disease, the prognostic value of the SYNTAX™ score is even more significant. Irrespective of their diabetic status, the one-year outcome of all patients with left main and/or 3-vessel disease with a SYNTAX™ score of 22 or less was comparable between those randomised to PCI or surgery6. Patients with 3-vessel disease with intermediate or high scores, with or without diabetes, had significantly lower repeat revascularisation rates with surgical revascularisation than with percutaneous treatment. However, non-diabetic patients with an intermediate score and a left main lesion (isolated or not) have an excellent outcome with PCI when compared to surgery. The take-home message is that in an all-comer population with left main and 3-vessel disease, numerically one-third of these patients could be legitimately treated by PCI and two-thirds might be referred to surgery. This initial assessment will have to be re-evaluated after medium-term follow-up out to five years. In addition, the cut-offs for the low, intermediate and high SYNTAX™ score classification should be further standardised and re-evaluated in other cohorts to establish the robustness of this scoring system in the prediction of outcomes.
Finally, we should emphasise that the analysis of the outcome was related to the raw data of the score, which was based on an arbitrary ranking of the complexity of the lesions. The impact of certain anatomic parameters (tortuosity, ostial lesion etc.) on predicted outcome may have been overestimated or underestimated and should be re-evaluated on the basis of the actual outcome at one year. The process of simplifying and weighting the SYNTAX™ score will be a retrospective exercise, based on complex statistical analysis, and will again need to be prospectively tested on a different patient population. It might be more straightforward to combine a prognostic index of mortality such as the EuroSCORE with the descriptive coronary score of the Syntax trial, to provide more accurate risk assessment of the outcome.
The data presented in this report are the result of post-hoc subgroup analyses. They were based on a tertile division of the entire study population, with the partitioning criteria being subsequently applied to subgroups of patients with either left main stem or 3-vessel disease. None of the subgroup analyses (with SYNTAX™ score tertile defined a posteriori) were prespecified or statistically powered. It should be emphasised that the global hierarchical statistical hypothesis of non-inferiority of PCI as compared to surgery for treatment of left main and/or 3-vessel disease was not confirmed; therefore, the observational data provided in the present report are hypothesis generating, and should be further validated in order to be formally incorporated in guidelines on appropriateness of revascularisation for left main or 3-vessel disease17.