Automatic Assessment of Lower-Limb Alignment from Computed Tomography

Background: Preoperative planning of lower-limb realignment surgical procedures necessitates the quantification of alignment parameters by using landmarks placed on medical scans. Conventionally, alignment measurements are performed on 2-dimensional (2D) standing radiographs. To enable fast and accurate 3-dimensional (3D) planning of orthopaedic surgery, automatic calculation of the lower-limb alignment from 3D bone models is required. The goal of this study was to develop, validate, and apply a method that automatically quantifies the parameters defining lower-limb alignment from computed tomographic (CT) scans.
Methods: CT scans of the lower extremities of 50 subjects were both manually and automatically segmented. Thirty-two manual landmarks were positioned twice on the bone segmentations to assess intraobserver reliability in a subset of 20 subjects. The landmarks were also positioned automatically using a shape-fitting algorithm. The landmarks were then used to calculate 25 angles describing the lower-limb alignment for all 50 subjects.
Results: The mean absolute difference (and standard deviation) between repeat measurements using the manual method was 2.01 ± 1.64 mm for the landmark positions and 1.05° ± 1.48° for the landmark angles, whereas the mean absolute difference between the manual and fully automatic methods was 2.17 ± 1.37 mm for the landmark positions and 1.10° ± 1.16° for the landmark angles. The manual method required approximately 60 minutes of manual interaction, compared with 12 minutes of computation time for the fully automatic method. The intraclass correlation coefficient showed good to excellent reliability between the manual and automatic assessments for 23 of 25 angles, and the same was true for the intraobserver reliability in the manual method. The mean for the 50 subjects was within the expected range for 18 of the 25 automatically calculated angles.
Conclusions: We developed a method that automatically calculated a comprehensive range of 25 measurements that defined lower-limb alignment in considerably less time, and with differences relative to the manual method that were comparable to the differences between repeated manual assessments. This method could thus be used as an efficient alternative to manual assessment of alignment. Level of Evidence: Diagnostic Level III. See Instructions for Authors for a complete description of levels of evidence.

and 3D images, both due to the difference between the natures of these measurement modalities 10 and the difference in limb loading between patients in the supine and standing positions 10,11 . Fürmetz et al. 9 proposed definitions for the manual placement of 24 different landmarks, such as the points on the femoral head, the femoral and tibial condyles, and the tibial plafond, to establish standardized protocols for alignment measurements. Various methods to automate anatomical landmark placement have been proposed to reduce both time and interobserver variation. Subburaj et al. 12,13 proposed methods that used the bone surface curvature. Several studies used alignment of the axis of inertia, model cross-sections, or shape-fitting [14][15][16] . Other authors matched statistical shape models (SSMs) of bones annotated with predefined landmarks to the bones of new patients [17][18][19][20] . All of these methods required initial manual segmentation of the bones. Furthermore, none of these methods provided all of the landmarks and angles defined by Paley.
In this study, we introduced a fully automatic method for lower-limb malalignment assessment from computed tomography (CT). We employed this method twice: first, to validate the automatic approach by comparing manual, semi-automatic, and automatic landmark and angle calculations for 20 subjects and to compare the time needed for the different methods; and second, to automatically quantify the variation in lower-limb alignment for 50 subjects and to compare the results to the normal variation.

Materials and Methods
Figure 1 shows the step-by-step manual and automatic workflows for assessment of lower-limb alignment. The steps are explained in detail below.

Data
Bilateral, non-weight-bearing, lower-limb CT scans of 50 subjects in the supine position were obtained retrospectively from the UMC (University Medical Center) Utrecht as DICOM (Digital Imaging and Communications in Medicine) data and were used without reformatting. Detailed scan parameters have been previously described by Kuiper et al. 21 . The subjects had no previous clinical indication of lower-limb malalignment and had undergone a CT scan because of unrelated medical reasons (e.g., vascular indications). The study was judged not to be subject to the Medical Research Involving Human Subjects Act (WMO) by the Medical Ethical Committee, as described in institutional review board protocol number 16-612/C.
Fig. 2 All directly positioned landmarks on the proximal femur (Fig. 2-A), distal femur (Fig. 2-B), proximal tibia (Fig. 2-C), and distal tibia (Fig. 2-D). The radius of each sphere indicates the mean difference between repeat measurements for the manual method (green) or between the manual and fully automatic methods (red) over all subjects. Landmarks along the shaft (Fig. 2-E) were always calculated automatically and are shown in dark gray. See Table I for landmark definitions.

Manual Segmentation
The femur and tibia were labeled as such and segmented in each scan by 2 operators with 2 years of experience, using both the CT Bone Segmentation module and manual editing available in Mimics (Mimics Medical 21.0; Materialise). Each operator performed the segmentation of 25 subjects.

Automatic Segmentation
An artificial intelligence-based workflow was used for automatic segmentation. The details of this method have been described by Kuiper et al. 21 . The workflow used a neural network to segment 6 different bones: the femur, tibia, fibula, hip, talus, and calcaneus. Only the femoral and tibial segmentations were used in this study. The differences between the manual and automatic segmentations were quantified by 4 commonly used metrics 22 : the Dice similarity coefficient (DSC), the mean absolute surface distance (MASD), the Hausdorff distance (HD), and the 95th percentile of the Hausdorff distance (HD95). Both manual and automatic segmentations were converted to triangulated 3D models using the marching cubes algorithm 23 .
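The overlap and surface-distance metrics named above can be illustrated with a minimal Python sketch. It computes the DSC on sets of voxel indices and the HD and MASD on lists of surface points. The function names and the brute-force nearest-neighbor search are illustrative assumptions and not the implementation used in this study. Definitions of the MASD differ between authors; the pooled symmetric mean used here is only one common convention. The HD95 would replace the maximum with the 95th percentile of the same distances.

```python
import math


def dice(voxels_a, voxels_b):
    """Dice similarity coefficient of two segmentations given as voxel index sets."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # two empty segmentations agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def directed_distances(source, target):
    """Distance from every source point to its nearest target point (brute force)."""
    return [min(math.dist(p, q) for q in target) for p in source]


def surface_distances(surface_a, surface_b):
    """Return (HD, MASD) for two surfaces given as lists of 3D points.

    HD is the maximum and MASD the mean of the pooled distances A->B and B->A.
    """
    pooled = directed_distances(surface_a, surface_b)
    pooled += directed_distances(surface_b, surface_a)
    return max(pooled), sum(pooled) / len(pooled)
```

For meshes with many vertices, a spatial index such as a k-d tree would be a more practical choice than the quadratic search shown here.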

Manual Landmarks
Thirty-two landmarks based on the methods proposed by Fürmetz et al. 9 , but adapted to include all landmarks defined by Paley 1 , were positioned twice on each of the 3D bone models of 20 randomly chosen subjects by 1 trained observer (R.J.A.K.), using the open-source software 3D Slicer 24 , to establish the intraobserver reliability. A 1-week interval between the 2 measurements was introduced to reduce memorization bias. The acronym and definition of each landmark are summarized in Table I and are shown in Figure 2. Landmarks were defined using a single point, with the exception of the following.
The femoral head center (FHC) and the femoral neck center (FNC) were each positioned using 3D data (Fig. 3). The FNC was defined as the center of the femoral neck circumference in the plane separating the femoral head from the rest of the femur (the plane corresponding to the smallest circumference) 18 . The FHC was defined as the center of a sphere fitted to the femoral head 25 .
The femoral condyle center (FCC), tibial medial condyle center (TMCC), tibial lateral condyle center (TLCC), tibial center spine (TCS), and tibial center condyle (TCC) were each defined as the mean of several other landmarks (Table I). Eight additional landmarks along the medial axis of the femur (FA1-FA4) and the tibia (TA1-TA4) were never placed manually, but were defined as the midpoint of the femoral or tibial cross-section in the transverse plane at 15%, 30%, 65%, or 75% of that bone's length. Bone length was measured from the most proximal point to the most distal point of the bone model. These percentages were empirically found to closely correspond to the landmarks defined by Paley 1 .
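Two of the landmark definitions above lend themselves to a short illustration: fitting a sphere to the femoral head and locating a shaft landmark at a fixed percentage of the bone length. The Python sketch below uses an algebraic least-squares sphere fit and approximates the cross-section midpoint by the centroid of the vertices within a thin slab. Both choices, the function names, the slab thickness, and the assumption that z is the proximal-distal axis (proximal = larger z) are illustrative assumptions and not necessarily what was used in this study.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_sphere(points):
    """Algebraic least-squares sphere fit; returns (center, radius).

    Model: x^2 + y^2 + z^2 = 2a*x + 2b*y + 2c*z + d, with d = r^2 - a^2 - b^2 - c^2.
    """
    rows = [(2.0 * x, 2.0 * y, 2.0 * z, 1.0) for x, y, z in points]
    rhs = [x * x + y * y + z * z for x, y, z in points]
    # Normal equations (A^T A) p = A^T rhs for the 4 unknowns a, b, c, d.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(4)]
    a, b, c, d = solve_linear(ata, atb)
    return (a, b, c), math.sqrt(d + a * a + b * b + c * c)


def shaft_landmark(vertices, fraction, slab=1.0):
    """Approximate cross-section midpoint at `fraction` of the bone length.

    The length is measured from the most proximal (largest z) vertex.
    """
    zs = [v[2] for v in vertices]
    z_top, z_bottom = max(zs), min(zs)
    level = z_top - fraction * (z_top - z_bottom)
    ring = [v for v in vertices if abs(v[2] - level) <= slab / 2.0]
    if not ring:
        raise ValueError("no vertices in slab; increase the slab thickness")
    n = len(ring)
    return (sum(v[0] for v in ring) / n, sum(v[1] for v in ring) / n, level)
```

The vertex centroid is only a reasonable stand-in for the cross-section midpoint when the mesh is approximately isotropic; an area-weighted centroid of the cut contour would be less sensitive to uneven vertex density.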

Mean Bone Model Construction
Fifty manually segmented femora and tibiae were used to construct the mean bone models in 4 steps. First, 1 subject was randomly chosen to serve as the template. Second, the template femur and tibia were remeshed to obtain approximately isotropic meshes with a 1-mm edge length. Third, for all other subjects, the mesh was rigidly registered and scaled using an iterative closest point (ICP) 26 algorithm to align it to the template. Fourth, the template was nonrigidly registered to the new subject using an adapted version of the nonrigidICP algorithm 26 developed by Audenaert et al. 27 , which is available online 28 . After registration, the deformed template was used as a substitute for the original mesh of that patient. The mean coordinates of each vertex in all transformed template meshes then defined the corresponding vertex of the mean femur or mean tibia.
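Once every subject is represented by a deformed copy of the same template, all meshes share vertex ordering and the final averaging step reduces to a vertex-wise mean. The Python sketch below shows only that last step. It assumes the registration has already been done, and the function name and the list-of-tuples mesh representation are illustrative assumptions.

```python
def mean_shape(meshes):
    """Vertex-wise mean of meshes that share the same vertex ordering.

    Each mesh is a list of (x, y, z) tuples; vertex i of every mesh is assumed
    to correspond to the same anatomical location after nonrigid registration.
    """
    if not meshes:
        raise ValueError("at least one mesh is required")
    n_vertices = len(meshes[0])
    if any(len(mesh) != n_vertices for mesh in meshes):
        raise ValueError("all meshes must have the same number of vertices")
    count = len(meshes)
    return [
        tuple(sum(mesh[i][k] for mesh in meshes) / count for k in range(3))
        for i in range(n_vertices)
    ]
```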

Automatic Landmarks
Automatic landmark positioning was performed by fitting the mean bone model to a new patient, using the combined rigid and nonrigid ICP algorithms. The bone models were first split into a proximal part and a distal part along the transverse plane at 50% of the length, to improve registration in case of large deformations. After registration, the vertices of the mean bone model corresponded to the geometry of the bone model of the new subject. For each subject, a mean bone model based on the other 49 subjects was used to avoid bias.
Landmarks were automatically found by averaging the locations of the manual landmarks on the fitted mean bone model. In the cases of the FHC and FNC, the mean of the corresponding vertices was used to construct a new plane, from which the points were calculated as described earlier. For the automatic landmarks of the 20 manually annotated subjects, only the manual landmarks of other subjects were used to avoid bias.
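One possible reading of this landmark-transfer step is sketched below in Python. Each manual landmark is mapped to the nearest vertex of the template registered to its own subject, and the positions of those vertices on the model fitted to the new subject are averaged. The nearest-vertex mapping, the data layout, and the function names are assumptions made for illustration and are not confirmed details of the method. The `exclude` argument mirrors the leave-one-out rule described above.

```python
import math


def nearest_vertex(vertices, point):
    """Index of the vertex closest to `point`."""
    return min(range(len(vertices)), key=lambda i: math.dist(vertices[i], point))


def transfer_landmark(training, fitted_vertices, exclude=None):
    """Average the training landmarks as they appear on a newly fitted model.

    training: dict mapping subject id -> (registered_vertices, manual_landmark),
        where registered_vertices share vertex ordering with fitted_vertices.
    fitted_vertices: mean bone model after fitting to the new subject.
    exclude: subject id to leave out (the subject being evaluated), if any.
    """
    picks = []
    for subject_id, (vertices, landmark) in training.items():
        if subject_id == exclude:
            continue
        picks.append(fitted_vertices[nearest_vertex(vertices, landmark)])
    if not picks:
        raise ValueError("no training landmarks left after exclusion")
    n = len(picks)
    return tuple(sum(p[k] for p in picks) / n for k in range(3))
```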

Alignment Parameters
Both manual and automatic landmarks were used to calculate the angles described by Paley 1 . Before angle calculation, the legs were first rotated to align the FHC-to-tibial center plafond (TCP) line with the proximal-distal axis and align the femoral medial condyle posterior (FMCP)-to-femoral lateral condyle posterior (FLCP) line with the medial-lateral axis. Table II shows the angle definitions and their normal range of variation. A visual overview of the angles is shown in Figures 4 and 5. The angles were separated into 3 groups that are reflected by their prefix. Anatomical partial (ap) angles were calculated using the anatomical bone axis close to the joint, anatomical total (at) angles were calculated using the total anatomical bone axis, and mechanical (m) angles were calculated using the mechanical axis.
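After the leg has been rotated into the anatomical frame, an angle between two landmark-defined lines can be computed on their projection into one anatomical plane. The Python sketch below illustrates this under an assumed axis convention (x = medial-lateral, y = anterior-posterior, z = proximal-distal), which is not stated in the text. It returns the angle between the two direction vectors in the range 0° to 180°. Which of the two supplementary angles corresponds to a given clinical parameter depends on the landmark order and is left to the caller.

```python
import math

# Assumed convention: indices of the two axes kept in each projection.
PLANE_AXES = {"coronal": (0, 2), "sagittal": (1, 2), "axial": (0, 1)}


def projected_angle(p1, p2, q1, q2, plane):
    """Angle in degrees between lines p1->p2 and q1->q2 projected onto `plane`."""
    i, j = PLANE_AXES[plane]
    u = (p2[i] - p1[i], p2[j] - p1[j])
    v = (q2[i] - q1[i], q2[j] - q1[j])
    norm_u, norm_v = math.hypot(*u), math.hypot(*v)
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a line has zero length in the chosen projection")
    cos_theta = (u[0] * v[0] + u[1] * v[1]) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```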

Evaluation
We performed the first part of the study on 20 subjects using the manual, semi-automatic (semi-auto), and fully automatic (full-auto) workflows to study the impact of the automatic steps on landmark positioning and angle calculation. The manual workflow used manual segmentation and manual landmark positioning. The semi-auto workflow used manual segmentation but automatic landmark positioning. The full-auto workflow used automatic segmentation and landmark positioning. The time needed for each method was measured in minutes. The absolute intraobserver differences in landmarks and angles were defined as the differences between the repeated manual measurements. The semi-auto and full-auto differences were calculated relative to the mean of the 2 manual measurements.
Table II footnote: *Each angle is defined as the smallest angle between the 2 lines formed by the landmarks shown in Table I. Angles were calculated from a 2D perspective in the coronal, sagittal, or axial plane.
The second part of the study was only performed using the full-auto workflow and compared the parameters of the study population of 50 subjects with the normal variation as summarized in Table II.
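The two kinds of landmark difference used in the first part of the evaluation can be written down in a few lines of Python. This is a minimal sketch in which the function names are hypothetical: the intraobserver difference is the distance between the two manual placements, and an automatic placement is compared with the mean of the two manual placements.

```python
import math


def intraobserver_difference(first, second):
    """Distance between two repeated manual placements of one landmark."""
    return math.dist(first, second)


def method_difference(automatic, first, second):
    """Distance between an automatic landmark and the mean of 2 manual placements."""
    mean_manual = tuple((a + b) / 2.0 for a, b in zip(first, second))
    return math.dist(automatic, mean_manual)
```

The same pattern applies to angles, with the absolute difference in degrees taking the place of the Euclidean distance.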

Statistical Analysis
The intraclass correlation coefficient (ICC) for the intraobserver difference in each angle was calculated using a 2-way mixed effects, absolute agreement, single-rater model. The semi-auto and full-auto ICCs were calculated using a multiple-measurements model 29 . The distribution of the differences between the values determined with the full-auto and manual methods was assessed using Bland-Altman analysis. The equivalence of each angle calculated using the full-auto method and the manual method was tested using paired two 1-sided tests (pTOST), which check whether the mean difference falls within a predefined limit-of-agreement margin. This test used the intraobserver limit of agreement, defined as 1.96 · s_intra, where s_intra is the average of the standard deviation for each pair of repeated manual measurements. As the pTOST requires independent data, the tests were performed on the left and right legs separately. Measurement of the 25 angles in left and right legs separately resulted in a total of 50 comparisons. With the Bonferroni correction applied, significance was set at p < 0.001.
Fig. 4 An example of the coronal alignment parameters. For each parameter, the actual angle for this patient is shown, followed by the normal range in square brackets. The color of each text box corresponds to the color of the lines that were used to calculate that angle. Lines that were used to calculate >1 parameter are shown as a pair of parallel lines of different colors, but are identical. These images were constructed using the fully automatic method. LPFA = lateral proximal femoral angle, LDFA = lateral distal femoral angle, MNSA = medial neck-shaft angle, and MPFA = medial proximal femoral angle.
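Several quantities from the statistical analysis can be sketched with the Python standard library alone: the Bland-Altman bias and limits of agreement, the intraobserver margin 1.96 · s_intra, and the Bonferroni-corrected significance level. The function names are hypothetical, and the 1.96 multiplier for the Bland-Altman limits is the conventional choice and is assumed here. The pTOST itself requires a t distribution and is not implemented in this sketch.

```python
import statistics


def bland_altman(automatic, manual):
    """Return (bias, lower limit, upper limit) of agreement for paired values."""
    diffs = [a - m for a, m in zip(automatic, manual)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def intraobserver_margin(first, second):
    """1.96 * s_intra, with s_intra the mean per-pair standard deviation."""
    pair_sds = [statistics.stdev(pair) for pair in zip(first, second)]
    return 1.96 * statistics.mean(pair_sds)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons
```

With a familywise level of 0.05 and the 50 comparisons described above, `bonferroni_alpha(0.05, 50)` gives the 0.001 threshold used in the study.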

Validation
The differences between the manual and automatic segmentations of the 50 subjects are shown in Table III. Further analysis of the segmentation accuracy can be found in the study by Kuiper et al. 21 . The operators who performed manual segmentation of the bones reported mean segmentation times of approximately 40 minutes per leg. The mean time (and standard deviation) for manual landmark positioning was 20.3 ± 3.2 minutes per leg. The mean times for the automatic workflow were 10.1 ± 2.2 minutes for segmentation and 2.2 ± 0.2 minutes for landmark positioning.
The mean differences between the repeated manual assessments and between the manual and full-auto methods are illustrated for each landmark on the bone model in Figure 2. Figure 6 shows the corresponding distribution of differences in each landmark for the 3 methods. The mean difference averaged over all landmarks was 2.01 ± 1.64 mm between the repeated assessments with the manual method, 2.12 ± 1.38 mm for the semi-auto method compared with the manual method, and 2.17 ± 1.37 mm for the full-auto method compared with the manual method. The difference between the full-auto and manual methods ranged from 0.36 ± 0.19 mm for the FHC to 3.63 ± 2.21 mm for the femoral greater trochanter (FGT).
Figure 7 shows the distribution of the angular differences for the manual, semi-auto, and full-auto methods. The mean absolute angular difference was 1.05° ± 1.48° between repeated assessments using the manual method, 0.98° ± 1.15° for the semi-auto method compared with the manual method, and 1.10° ± 1.16° for the full-auto method compared with the manual method. The difference for the full-auto method ranged from 0.21° ± 0.15° for the mHKA to 2.75° ± 1.71° for the mechanical posterior distal femoral angle (mPDFA).
Fig. 5 An example of the sagittal alignment parameters calculated from the manual landmarks. For each parameter, the actual angle for this patient is shown, followed by the normal range in square brackets. The color of each text box corresponds to the lines that were used to calculate that angle. Lines that were used to calculate >1 parameter are shown as a pair of parallel lines of different colors, but are identical. These images were constructed using the fully automatic method. PPTA = posterior proximal tibial angle, ADTA = anterior distal tibial angle, and ANSA = anterior neck-shaft angle.
Fig. 6 Boxplots showing the distribution of the absolute differences between repeated measurements of each landmark with the manual method (red), between the manual and semi-automatic methods (green), and between the manual and fully automatic methods (blue). See Table I for definitions of the landmarks. Each box indicates the interquartile range (IQR), the line within the box indicates the median, the whiskers indicate points ≤1.5 IQR widths from the box, and the + symbols indicate outliers.
Fig. 7 Boxplots showing the distribution of the absolute differences between repeated measurements of each angle with the manual method (red), between the manual and semi-automatic methods (green), and between the manual and fully automatic methods (blue). See Table II.

Reliability Assessment
The results of the ICC analysis of the angular measurements are shown in Table IV. Both the intraobserver reliability for the manual method and the reliability of the full-auto method compared with the manual method were good to excellent for 23 of the 25 measured angles. However, the Bland-Altman plots in Appendix Figure S.1 expose systematic bias between the full-auto and manual values for the remaining 2 angles, the apPDFA and mPDFA. The mean and 95% confidence interval (CI) of the difference between these methods for each angle, and the corresponding p value calculated using the pTOST, are summarized in Appendix Supplemental Table 1.

Population Variation Assessment
Table V shows the alignment parameters of all 50 subjects as calculated using the full-auto method. The mean fell outside of the expected normal range for 7 of the 25 angles 1 . These were the anatomical partial anterior neck shaft angle (apANSA) and the 3 variations (ap, at, and m) of the anterior distal tibial angle (ADTA) and lateral distal tibial angle (LDTA).
Table IV footnotes: ‡Manual indicates the agreement between repeated assessments using the manual method. §Semi-auto indicates the agreement between the mean of the manual assessments and the semi-auto method. #Full-auto indicates the agreement between the mean of the manual assessments and the full-auto method.

Discussion
In this study, we developed a method that automatically calculated a comprehensive range of measurements that defined lower-limb alignment. By comparing manual and automatic measurements for both landmark placement and angle calculation in 20 subjects, we found that our method achieved results that closely corresponded to those of manual assessment (with differences that were generally comparable with the intraobserver reliability for the manual method). Additionally, the automatic workflow required approximately 12 minutes, compared with approximately 60 minutes for the manual workflow. We applied the automatic method to a data set of 50 subjects and found that the mean of most measurements fell within the expected range.
A benefit of fitting deformable bone models is that adding more landmarks is trivial, as positioning the landmark once is sufficient to find it in all subsequent subjects. This enables inclusion of landmarks without easily discernible morphological features, in contrast to methods that use specific bone morphology features [12][13][14][15][16] . Further research using the proposed method could focus on defining better 3D measurements to assess the bone morphometry.
The mean of 20.3 ± 3.2 minutes necessary for manual landmark positioning was similar to the time reported by Fürmetz et al. 9 (17.22 ± 4.0 minutes). Additionally, the intraobserver reliability over all angular measurements was similar between our study (mean, 1.05° [range, 0.15° to 4.68°]) and the study by Fürmetz et al. 9 (mean, 1.26° [range, 0.18° to 4.64°]).
The mean differences between the full-auto and manual methods were 0.05 mm larger for landmarks and 0.12° larger for angles than the differences between the semi-auto and manual methods, indicating that automatic segmentation had only a small negative impact on the accuracy. The mean differences between the full-auto and manual methods were also only 0.16 mm larger for landmarks and 0.05° larger for angles than the intraobserver differences, showing that the accuracy of the manual and automatic workflows also closely corresponded.
For most angles, the intraobserver and intermethod ICCs were similar and indicative of good or excellent reliability. Only the apPDFA and mPDFA showed poor or moderate reliability, both for the repeated manual assessments and for the comparison between the full-auto and manual methods.
The Bland-Altman analysis showed that the 95% CIs for the differences between the full-auto and manual methods were similar to the intraobserver limit of agreement. The plots exposed no systematic bias except in the apPDFA and mPDFA, which showed a greater difference between those methods for more extreme angles. A qualitative inspection of these subjects showed that this was caused by a bias in the automatic method toward a more conservative estimate of the angle, by placing the femoral anterior point (FAP) and femoral posterior point (FPP) landmarks at approximately the same height along the distal femur.
The pTOST analysis showed statistical equivalence between the manually and automatically calculated angles, as the latter were within the intraobserver limit of agreement of the former for all angles except for the anatomical partial medial proximal tibial angle (apMPTA) of the left leg (p = 0.0018). This was slightly higher than the p < 0.001 threshold, which was required due to Bonferroni correction.
Assessing the mean angles on all 50 subjects, we found that the LDTA, ADTA, and ANSA fell outside the expected normal ranges 1 . This could be due to differences in axial alignment between 2D and 3D scans, which could affect alignment measurements 30 . It could also be due to a difference in demographic characteristics between the populations in this study and the previous studies, as age, sex, and race and/or ethnicity could influence the mean alignment parameters [31][32][33] .
A limitation of the study was that the CT scans were made with the patient in the supine position rather than in a weightbearing state. Roth et al. 11 noted significant differences of  35 , or combine information from standing leg radiographs and CT scans to create a virtual standing CT scan 36 .
Another limitation was that the subjects in this study did not have indications of lower-limb malalignment or pathologies such as osteoarthritis, which could influence the accuracy of both manual and automatic measurements. The inclusion of symptomatic patients who had malalignment to study the relationship between the morphometric parameters and clinical measures and the inclusion of cadaveric samples for the direct measurement of these parameters are therefore subjects for further research.
In conclusion, the results of the proposed method closely corresponded to those of manual assessment, based on both segmentation accuracy and landmark and angle calculations. The method performed all steps fully automatically and in considerably less time. It would therefore be a valuable tool for fast and accurate assessment of lower-limb alignment.