## Evaluation

Submitted (semi-)automatically obtained segmentations are evaluated against the reference standard, which was defined through a consensus reading by two experts. More information on how the reference standard was obtained can be found on the Data page. Details about the submission format can be found on the Submit page.

True positive (identified coronary calcifications), false positive (non-coronary calcifications) and false negative (missed coronary calcifications) labels are assigned to lesions accordingly. In this study, coronary calcifications are defined as connected groups of voxels (3D 6-connectivity) with intensities greater than 130 HU. Therefore, all connected voxels with the same non-negative label in the (semi-)automatically obtained segmentation are interpreted as one lesion. Performance of the methods is assessed with respect to:

1. The number of identified calcifications.
2. The volume of the identified calcifications.
3. The Agatston score of the identified calcifications.

Evaluation based on the number and volume of detected calcifications is calculated irrespective of the artery-specific labels. In addition, when artery-specific labels are provided, an evaluation is performed per artery. The Agatston score is evaluated only on a per-patient basis, and thus irrespective of the artery-specific labels.
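The lesion definition above (6-connected groups of voxels with intensities greater than 130 HU) can be illustrated with a small flood-fill sketch. This is not the challenge's evaluation code. The function name `label_lesions`, the nested-list volume layout `volume_hu[z][y][x]` and the `threshold_hu` parameter are assumptions made for this example only.

```python
from collections import deque

# 6-connectivity: two voxels are neighbours only if they share a face.
NEIGHBOURS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_lesions(volume_hu, threshold_hu=130):
    """Group voxels with intensity > threshold_hu into 6-connected lesions.

    volume_hu is a nested list indexed as volume_hu[z][y][x] (assumed layout).
    Returns a list of lesions, each a set of (z, y, x) voxel coordinates.
    """
    nz, ny, nx = len(volume_hu), len(volume_hu[0]), len(volume_hu[0][0])
    seen = set()
    lesions = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if (z, y, x) in seen or volume_hu[z][y][x] <= threshold_hu:
                    continue
                # Breadth-first flood fill starting from this seed voxel.
                lesion = {(z, y, x)}
                seen.add((z, y, x))
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in NEIGHBOURS_6:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if (inside and n not in seen
                                and volume_hu[n[0]][n[1]][n[2]] > threshold_hu):
                            seen.add(n)
                            lesion.add(n)
                            queue.append(n)
                lesions.append(lesion)
    return lesions


# Two bright voxels that touch only diagonally form two separate lesions.
diagonal = [[[200, 0], [0, 200]], [[0, 0], [0, 0]]]
print(len(label_lesions(diagonal)))  # prints 2
```

With 6-connectivity, voxels that touch only along an edge or a corner are not merged, which is why the diagonal pair in the usage example counts as two lesions.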

## Performance analysis

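Sensitivity is computed as the number of true positives (TP) divided by the total number of positives in the reference standard, i.e. true positives plus false negatives (FN):

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$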

The positive predictive value (PPV) is computed as the fraction of true positives (TP) among all identified calcifications, i.e. true positives plus false positives (FP):

$$\text{PPV} = \frac{TP}{TP + FP}$$

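Sensitivity and PPV are computed for both the number and the volume of identified calcifications. In addition, an F1 score is computed as the harmonic mean of the volume-based sensitivity and PPV:

$$F_1 = \frac{2 \cdot \text{Sensitivity}_{\text{vol}} \cdot \text{PPV}_{\text{vol}}}{\text{Sensitivity}_{\text{vol}} + \text{PPV}_{\text{vol}}}$$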
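As a rough illustration of the sensitivity, PPV and F1 definitions in this section, the following sketch computes all three from true positive, false positive and false negative totals. The function names and the example numbers are hypothetical. The inputs can be lesion counts or volumes, and returning NaN for an empty denominator is an assumption of this sketch, not a rule stated by the challenge.

```python
import math


def sensitivity(tp, fn):
    """TP / (TP + FN); NaN when the reference contains no positives."""
    total = tp + fn
    return tp / total if total > 0 else math.nan


def ppv(tp, fp):
    """TP / (TP + FP); NaN when the method identified nothing."""
    total = tp + fp
    return tp / total if total > 0 else math.nan


def f1_score(sens, ppv_value):
    """Harmonic mean of sensitivity and PPV."""
    if math.isnan(sens) or math.isnan(ppv_value) or sens + ppv_value == 0:
        return math.nan
    return 2 * sens * ppv_value / (sens + ppv_value)


# Hypothetical totals for one method (counts, or volumes in mm^3).
tp, fp, fn = 6, 2, 4
sens = sensitivity(tp, fn)
prec = ppv(tp, fp)
print(round(sens, 3), round(prec, 3), round(f1_score(sens, prec), 3))
# prints: 0.6 0.75 0.667
```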

The automatically obtained volume and Agatston scores are compared with the reference scores by means of the two-way intraclass correlation coefficient (ICC) for absolute agreement. The ICC is computed independently for the volume scores of the left anterior descending artery (LAD), left circumflex artery (LCX) and right coronary artery (RCA), as well as for the total volume and Agatston scores per patient.
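A minimal sketch of a two-way ICC for absolute agreement between automatic and reference scores is given below. The text does not state whether the single-measurement or the average-measurement form is used. The sketch assumes the single-measurement form, often written ICC(A,1), with the two "raters" being the automatic method and the reference standard. The function name and example values are hypothetical.

```python
import math


def icc_absolute_agreement(automatic, reference):
    """Two-way, single-measurement ICC for absolute agreement (assumed form).

    automatic and reference are equally long sequences with one score per patient.
    """
    if len(automatic) != len(reference):
        raise ValueError("both score lists must have the same length")
    n, k = len(automatic), 2  # n patients, k = 2 raters
    if n < 2:
        raise ValueError("at least two patients are required")

    rows = list(zip(automatic, reference))
    grand = sum(a + r for a, r in rows) / (n * k)
    row_means = [(a + r) / k for a, r in rows]
    col_means = [sum(automatic) / n, sum(reference) / n]

    # Two-way ANOVA sums of squares (patients = rows, raters = columns).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = max(ss_total - ss_rows - ss_cols, 0.0)  # guard against rounding

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    return (ms_rows - ms_error) / denom if denom != 0 else math.nan


# Hypothetical per-patient volume scores: the method overestimates by 1 everywhere.
print(round(icc_absolute_agreement([1, 2, 3], [2, 3, 4]), 3))  # prints 0.667
```

Because the coefficient measures absolute agreement, a systematic offset lowers the ICC even when the two score lists are perfectly correlated, as in the usage example.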

Based on the Agatston score, each patient is assigned to one of four cardiovascular risk categories, defined by Agatston scores of 0, 1-100, 101-300 and >300. The linearly weighted kappa coefficient is used to assess the agreement of this patient risk categorization with the reference standard.
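The risk categorization and the linearly weighted kappa can be sketched as follows. The function names are hypothetical. Treating the upper bounds 100 and 300 as inclusive, and placing any score above 0 and up to 100 in the second category, are assumptions for fractional scores that the text does not cover.

```python
import math


def risk_category(agatston_score):
    """Map an Agatston score to category 0 (0), 1 (1-100), 2 (101-300) or 3 (>300)."""
    if agatston_score <= 0:
        return 0
    if agatston_score <= 100:
        return 1
    if agatston_score <= 300:
        return 2
    return 3


def linearly_weighted_kappa(automatic, reference, n_categories=4):
    """Cohen's kappa with linear disagreement weights |i - j| / (n_categories - 1)."""
    if len(automatic) != len(reference) or not automatic:
        raise ValueError("need two non-empty category lists of equal length")
    n = len(automatic)

    # Observed joint proportions: rows = automatic, columns = reference.
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, r in zip(automatic, reference):
        observed[a][r] += 1 / n
    auto_marginal = [sum(row) for row in observed]
    ref_marginal = [sum(observed[i][j] for i in range(n_categories))
                    for j in range(n_categories)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j) / (n_categories - 1)
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * auto_marginal[i] * ref_marginal[j]

    if expected_disagreement == 0:
        return math.nan  # undefined when every patient falls in one category
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical per-patient Agatston scores.
auto_cats = [risk_category(s) for s in [0, 50, 150, 400]]
ref_cats = [risk_category(s) for s in [0, 80, 120, 350]]
print(linearly_weighted_kappa(auto_cats, ref_cats))  # prints 1.0
```

With linear weights, a patient placed one category away from the reference is penalized one third as much as a patient placed three categories away.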