**Evaluation Criteria**

Each team's submission is first ranked separately on each evaluation metric. The average of a team's ranks across the metrics is then used as its overall rank.

The evaluation metrics are the Dice Similarity Coefficient (DSC) and the 95% Hausdorff Distance (95% HD). For each metric, the per-organ scores are weighted by the organs' importance weights and then averaged.
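The aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the example weights, and the tie handling (ties receive arbitrary consecutive ranks) are hypothetical, since the organizers' actual weights and tie-breaking rule are not given here.

```python
import numpy as np

def weighted_score(per_organ_scores, weights):
    """Weighted mean of one team's per-organ scores for a single metric.

    The importance weights are placeholders; real values are not stated here.
    """
    return float(np.average(per_organ_scores, weights=weights))

def rank_values(values, higher_is_better):
    """Return ranks where 1 is best.

    Assumption: tied values get arbitrary consecutive ranks (stable order).
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(-values if higher_is_better else values, kind="stable")
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def average_rank(dsc_scores, hd95_scores):
    """Average each team's DSC rank (higher is better) and HD95 rank (lower is better)."""
    dsc_rank = rank_values(dsc_scores, higher_is_better=True)
    hd_rank = rank_values(hd95_scores, higher_is_better=False)
    return (dsc_rank + hd_rank) / 2.0
```

For example, `average_rank([0.90, 0.80, 0.85], [3.0, 5.0, 4.0])` ranks the first team best on both metrics, so it also has the lowest (best) average rank.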

**Dice Similarity Coefficient (DSC):** The Dice metric measures the volumetric overlap between the segmentation result and the annotation. It is computed as

$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

where A is the set of foreground voxels in the annotation and B is the corresponding set of foreground voxels in the segmentation result.
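As a minimal sketch, the Dice coefficient for a single organ can be computed from two binary masks of the same shape. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this example, not part of the challenge description.

```python
import numpy as np

def dice_coefficient(annotation, prediction):
    """Dice overlap between two binary masks of identical shape."""
    a = np.asarray(annotation, dtype=bool)   # foreground voxels in the annotation (A)
    b = np.asarray(prediction, dtype=bool)   # foreground voxels in the result (B)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)
```

A multi-organ label map would typically be evaluated by calling this once per organ label, for example `dice_coefficient(gt == label, pred == label)`.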

**95% HD:** The maximum Hausdorff distance is the largest distance from a point in one set to the nearest point in the other set. More formally, the maximum Hausdorff distance from set X to set Y is a maximin function, defined as:

$$d_H(X, Y) = \max_{x \in X} \, \min_{y \in Y} \lVert x - y \rVert$$

95% HD is similar to the maximum HD, but it is based on the 95th percentile of the distances between boundary points in X and Y rather than on the maximum. Using this percentile reduces the impact of a very small subset of outliers.
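The difference between the maximum and the 95th-percentile variant can be illustrated on two point sets. The sketch below makes several assumptions: boundary points are already extracted and passed as `(N, D)` coordinate arrays, the symmetric value is the larger of the two directed percentiles (other conventions exist), and the function names are hypothetical. It builds the full pairwise distance matrix, so it is only suitable for small point sets.

```python
import numpy as np

def directed_distances(x, y):
    """For every point in x, the Euclidean distance to its nearest point in y."""
    diff = x[:, None, :] - y[None, :, :]           # shape (len(x), len(y), D)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def hausdorff(x, y, percentile=95.0):
    """Symmetric percentile Hausdorff distance; percentile=100 gives the maximum HD.

    Assumption: the symmetric value is the max of the two directed percentiles.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_xy = np.percentile(directed_distances(x, y), percentile)
    d_yx = np.percentile(directed_distances(y, x), percentile)
    return float(max(d_xy, d_yx))

# Two nearly identical contours; y contains a single far-away outlier point.
x = np.array([[i, 0] for i in range(100)])
y = np.array([[i, 0] for i in range(99)] + [[99, 50]])
max_hd = hausdorff(x, y, percentile=100.0)   # dominated by the outlier
hd95 = hausdorff(x, y, percentile=95.0)      # ignores the single outlier
```

In this toy case the single outlier determines the maximum HD, while the 95th-percentile value is unaffected by it.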