The negative effects of particulate debris on the outcome of total hip arthroplasties (THA) are well known. Therefore, reliable techniques for in vivo evaluation of radiographic polyethylene penetration are valuable. Charnley and Cupic6 first measured penetration by the uniradiographic method, which later was modified to a duoradiographic method.13 Livermore et al19 reported a method using a transparent overlay with concentric circles, which is similar to other techniques.23 The first computer-assisted method to measure polyethylene penetration was introduced in 1991.14 Devane et al8,9 developed Polypenetration (Draftware Inc, Vevay, IN), and Martell and Berdia20 created Hip32 (University of Chicago, Chicago, IL). Both used edge detection and could be used to measure two-dimensional (2-D) and three-dimensional (3-D) polyethylene penetration.
Computer software has been used to measure the accuracy and precision of penetration in phantom laboratory simulations and in retrieved samples.5,7,11,17,20 When comparing conclusions from these studies, numerous issues must be considered. First, the definitions of the measurements can differ.5,7,11,17 Second, scatter, absorption of the radiation by soft tissue, and the penetration patterns occurring in vivo cannot be recreated in the laboratory. Third, in clinical radiographs, the center of the xray beam often differs from the center in phantom studies. The center of the xray beam and patient positioning also vary in followup radiographs. Fourth, although clinical pelvic radiographs contain slight distortions of the metal shell and head, the computer-assisted measurement of polyethylene penetration assumes the metal shell and femoral head are true circles on the radiographs. Finally, retrieved components usually have high polyethylene penetration and the sample size is smaller than in phantom studies. Therefore, it is difficult to compare results between the two study types.
The true penetration measurement is unknown on clinical radiographs, with the exception of revised or retrieved components. Therefore, the difference between the measured value and the true wear value cannot be reported. However, the precision between two clinical radiographs can be reported. The precision of 2-D measurements of polyethylene using clinical radiographs has not been addressed specifically in previous studies.
Our objective was to determine the precision of computer-assisted methods for assessing THA wear via analysis of clinical radiographs, and to assess the limitation of penetration measurements of computer-assisted methods during the first 5 postoperative years in hips implanted with a highly cross-linked polyethylene liner.
MATERIALS AND METHODS
We studied two patient groups surgically treated using a noncemented Converge® (Zimmer Inc, Warsaw, IN) metal shell and a Durasul® highly cross-linked polyethylene liner10 (Zimmer Inc). The cross-linking was achieved by electron-beam radiation to 9.5 Mrads at 125°C and sterilization in a vacuum. A fully hemispheric titanium alloy shell with a multihole configuration was used. The first group of 70 patients (81 hips) operated on in 2003 was used to study precision. A cobalt-chrome head was used in all patients. A 28-mm head was used in eight patients (eight hips), a 32-mm head was used in 24 patients (28 hips), a 38-mm head was used in 29 patients (35 hips), and a 44-mm head was used in nine patients (10 hips). Two hundred thirty-six pairs of radiographs were measured, and pelvic radiographs were taken postoperatively while the patients were in the supine position. Six weeks postoperatively, routine pelvic radiographs were taken of all patients while they were in the supine and standing positions. These radiographs were taken just minutes apart. Thirty-one of 70 patients (37 hips) had pelvic radiographs taken 6 months postoperatively while they were in the supine position.
The second patient group was used to assess the limitations of penetration measurements. These patients were operated on in 1999 using a noncemented Interop® acetabular shell (Sulzer, Austin, TX), a Durasul® liner (Zimmer Inc), and a 28-mm cobalt-chrome femoral head (Zimmer Inc). The metal shell and Converge® (Zimmer Inc) cup are essentially the same implant but with different names. The average age of the patients was 61.5 ± 18.9 years. The average followup was 3.8 ± 1.1 years (range, 2-5.4 years). Forty-four patients (51 hips) had 6-week postoperative radiographs taken while they were in the supine position, which were used as a baseline. However, not all patients had annual radiographs. We used 160 followup radiographs for our measurements.
Anteroposterior (AP) pelvic radiographs were taken with the xray beam centered over the symphysis pubis. The distance from the xray beam to the xray film was 40 inches for radiographs of patients in supine and standing positions. To ensure heavier-set patients were captured completely on the xray film, the distance was increased accordingly. All radiographs were taken by two experienced technicians using the same technique. No positioning aids were used while taking the radiographs. The patients' legs were maximally internally rotated, and the patients were placed in the same position.
We digitized the radiographs with an xray scanner (Sierra Plus, Vidar Systems Corp, Herndon, VA), acquiring the images in Adobe Photoshop (Adobe Inc, San Jose, CA) through a Vidar TWAIN driver at a resolution of 150 dots per inch (2100 × 2550 pixels).
Measuring penetration using computer-assisted techniques requires reproducibly identifying the center of the acetabular metal shell and femoral head on serial radiographs. The acetabular center was used as a reference point for tracking motion of the femoral head center with time. For 2-D analysis, the motion of the femoral head with respect to the acetabular center was tracked on serial AP radiographs. We used Hip32 software (Hip Analysis Suite, Version 4.5, University of Chicago, Chicago, IL) to measure 2-D penetration (creep and wear).20 We used the term penetration because we do not intend to separate creep from wear.
To determine 2-D wear, Hip32 software measures the postoperative radiograph (the reference radiograph) and a followup radiograph. The postoperative radiograph can be any radiograph that serves as a reference point for comparison. The postoperative radiograph (baseline) was the radiograph taken in the operating room after the THA. We used the term observation radiograph for the followup radiograph. The observation radiograph initially was measured on the computer followed by measurement of the baseline radiograph by an observer (ZW). In doing so, the computer automatically could calculate the penetration and direction of penetration from each pair of radiographs. Each computer measurement was copied to the hard drive for review and to verify the direction of polyethylene penetration.
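The quantity derived from each pair of radiographs can be pictured as the change in the offset of the femoral head center from the acetabular center. The following Python sketch is illustrative only. It is not the Hip32 implementation; the function name, the coordinate convention (millimeters, assumed already corrected for magnification), and the example coordinates are assumptions, and the edge detection that locates the two centers is omitted.

```python
import math

def penetration_2d(cup_ref, head_ref, cup_obs, head_obs):
    """Return (magnitude_mm, angle_deg) of head-center motion relative to the cup center.

    Each argument is an (x, y) center in millimeters. Hypothetical helper for
    illustration; not the Hip32 algorithm.
    """
    # Offset of the head center from the cup center on each radiograph.
    ref_dx, ref_dy = head_ref[0] - cup_ref[0], head_ref[1] - cup_ref[1]
    obs_dx, obs_dy = head_obs[0] - cup_obs[0], head_obs[1] - cup_obs[1]
    # Change in that offset between the reference and observation radiographs.
    dx, dy = obs_dx - ref_dx, obs_dy - ref_dy
    magnitude = math.hypot(dx, dy)
    # Direction of the penetration vector, expressed in [0, 360) degrees.
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, angle

# Hypothetical centers: concentric on the reference film, offset on the observation film.
mag, ang = penetration_2d((0.0, 0.0), (0.0, 0.0), (10.0, 5.0), (10.3, 5.4))
```

Because the cup center is subtracted on each film separately, a rigid shift of the whole image between radiographs does not change the result; only relative motion of the head center does.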
We first tested precision using a computer-assisted method. Next, we assessed the limitations of using this computer method to measure the penetration on followup serial radiographs. To determine the intraobserver repeatability of the results, we used 477 pairs of radiographs measured twice in succession by the same observer (ZW).
We performed two tests of precision on the radiographs from the first group of patients. All our calculations for the tests of precision deal solely with magnitudes, not vectors. The first test of precision estimated the precision from different patient positions by comparing the penetration between radiographs obtained at 6 weeks with patients in the supine and standing positions. Penetration was measured between the postoperative baseline radiographs and the 6-week radiographs of the patients in the supine position, and then again between the baseline and 6-week radiographs of the patients in the standing position. The measurement error between these two penetrations was calculated.
The second test for precision estimated the precision of the penetration measurements via direct and indirect measurements from the baseline radiographs and the 6-week and 6-month radiographs of 31 patients (37 hips) in the supine position. The direct method measured the penetration between the 6-week and 6-month radiographs. The indirect method measured the penetration between the baseline and 6-week observation radiographs and between the baseline and 6-month observation radiographs. We subtracted the 6-week penetration from the 6-month penetration. The measurement error between the direct and indirect methods was calculated.
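The direct and indirect comparison described above reduces to simple arithmetic on penetration magnitudes. A minimal sketch follows, assuming one value per hip for each measurement; the function names and the example values are hypothetical and are not data from this study.

```python
def indirect_penetration(base_to_6wk, base_to_6mo):
    """Indirect 6-week-to-6-month penetration: 6-month value minus 6-week value (mm)."""
    return base_to_6mo - base_to_6wk

def method_differences(direct, base_to_6wk, base_to_6mo):
    """Per-hip difference between the direct and indirect penetration (mm)."""
    return [
        d - indirect_penetration(w, m)
        for d, w, m in zip(direct, base_to_6wk, base_to_6mo)
    ]

# Hypothetical magnitudes in millimeters for two hips.
direct = [0.20, 0.05]        # measured 6 weeks -> 6 months
base_to_6wk = [0.30, 0.10]   # measured baseline -> 6 weeks
base_to_6mo = [0.45, 0.25]   # measured baseline -> 6 months
diffs = method_differences(direct, base_to_6wk, base_to_6mo)
```

The resulting differences are the inputs to the bias, repeatability, and limits-of-agreement statistics.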
To assess the limitation of penetration measurements of computer-assisted methods, we determined the relationship between followup time and penetration measurements in hips with low wear in the second group. During the followup, the cumulative total penetration was expected to increase, and we assumed negative polyethylene penetration could not occur. The relationship between followup and penetration was estimated by the correlation coefficient.4
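As an illustration of this step, the sketch below computes a correlation coefficient between followup time and measured penetration. It assumes Pearson's product-moment coefficient, which the text does not specify, and the example values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical followup times (years) and penetration values (mm), including negatives.
followup_years = [1, 2, 3, 4, 5]
penetration_mm = [0.2, -0.1, 0.3, -0.2, 0.1]
r_value = pearson_r(followup_years, penetration_mm)
```

If measurement error dominates true wear, as in this hypothetical series, the coefficient stays near zero instead of approaching 1 as cumulative penetration would predict.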
We divided the direction of penetration into four zones (Figs 1, 2). Zone 1 was a loading zone, Zone 2 was a nonloading zone into the polyethylene liner, and Zones 3 and 4 were nonloading zones away from the polyethylene liner. Although manual methods measure penetration in only one direction,19,20 the computer-assisted polyethylene penetration method could identify the direction of the penetration vector. Penetration should be directed into Zone 1 (loading zone), or at least directed toward the liner (Zones 1 and 2) and not away from it (Zones 3 and 4).
The penetration vector varied at the followups. The maximum vector displacement accounted for maximal changes, and was the absolute value of the difference between two vector angles. For example, two vector angles were measured if a patient had two followup radiographs. The absolute value of the difference between the two vector angles was the patient's maximum vector displacement. If a patient had three or more followup radiographs, we calculated the absolute value of the difference between vector Angles 1 and 2, 1 and 3, and 2 and 3. The largest absolute difference was the maximum vector displacement.
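The pairwise rule above can be written compactly. This is a minimal sketch with a hypothetical function name and hypothetical angles; it takes the plain absolute difference exactly as described and applies no wraparound at 360°, because the text does not describe one.

```python
from itertools import combinations

def max_vector_displacement(angles_deg):
    """Largest absolute difference between any two vector angles (degrees)."""
    if len(angles_deg) < 2:
        raise ValueError("at least two followup vector angles are required")
    # Compare every pair of followup angles: (1, 2), (1, 3), (2, 3), ...
    return max(abs(a - b) for a, b in combinations(angles_deg, 2))

# Hypothetical vector angles from three followup radiographs of one hip.
displacement = max_vector_displacement([30.0, 95.5, 60.0])
```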
Because the terms accuracy and precision often are confused, we used the definitions recommended by the American Society for Testing and Materials (ASTM),2 and the International Organization for Standardization (ISO) (Fig 3).18 Accuracy was defined as the closeness of agreement between a test result and the accepted reference value.2 When applied to a set of test results, accuracy involves a combination of random error (precision) and a common systematic error (bias). The term accuracy should not be used to avoid confusion; instead, precision and bias should be used to convey the information associated with accuracy.18 Bias was defined as the numerical value difference between the average value of all measurements and the reference value.2 In contrast with random error, bias is the total systematic error. There may be one or more systematic error components contributing to bias. Bias cannot be established if an accepted reference value is unavailable. The concept of bias may be used to describe the systematic difference between two operators, two test sites, and two test methods. In our study, bias was the difference between two measurement methods or intraobserver measurements rather than the difference between the measurement and the true penetration, as true penetration was not available on our clinical radiographs. A larger difference between the test value and the accepted reference is reflected by a larger bias value.18
Precision was defined as the closeness of agreement between independent test results obtained under stipulated conditions.2 Precision depends on the distribution of random errors and is not related to the true value or referenced value. Independent test results are results obtained in a manner not influenced by any previous results on the same or similar test object. Quantitative measures of precision depend critically on the stipulated conditions. Precision usually is expressed in terms of imprecision and is computed as the standard deviation (SD) or some multiple of the SD. Less precision is reflected by a larger SD. Repeatability and reproducibility conditions are particular sets of extreme stipulated conditions.18
We used the ASTM preferred index of precision.2 The preferred index was the 95% limit on the difference between the two test results. The ASTM preferred index of precision was calculated as follows:
\[ r = 1.96 \sqrt{2}\, S_r \approx 2.77\, S_r \]
In the equation, \(r\) is the 95% repeatability limit2 and \(S_r\) is the repeatability standard deviation derived from ASTM E691.1
The limit of agreement3 is a statistical method for comparing two measurement techniques. The limits of agreement are the mean difference ± 1.96 times the SD of the differences. For normally distributed differences, approximately 95% of the differences between paired radiographic wear measurements will fall within these limits (roughly the mean difference ± 2 SD). The limits of agreement are estimates of the values applying to a whole population. We used Analyse-It® (Analyse-It Software, Ltd, Leeds, UK) software to calculate the limit of agreement.
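The limits of agreement can be sketched in a few lines. The example below is not the Analyse-It implementation; the function name and the paired values are hypothetical, and the sample SD (n − 1 denominator) is assumed.

```python
import statistics

def limits_of_agreement(method_a, method_b):
    """Return (bias, lower, upper) for paired measurements from two methods."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)       # mean difference between the methods
    sd = statistics.stdev(diffs)        # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical penetration magnitudes (mm) for four hips measured two ways.
supine = [0.5, 0.2, 0.9, 0.4]
standing = [0.3, 0.4, 0.6, 0.5]
bias, lower, upper = limits_of_agreement(supine, standing)
```

A small bias with wide limits, as in this hypothetical case, indicates that the two conditions agree on average but can differ substantially for an individual hip.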
The rest of the statistical analysis was performed with SPSS software (SPSS Inc, Chicago, IL). The Kolmogorov-Smirnov test for normal distribution was used before additional statistical analysis was done. The reliability analysis of the SPSS software was used to calculate the intraclass correlation coefficient and repeatability SD. This reliability analysis of SPSS software does not directly provide the repeatability standard deviation, but does provide the within people mean square. Therefore, the repeatability SD equals the square root of the within people mean square. For the limitation component of our objective, we calculated the correlation between followup time and polyethylene penetration using correlation analysis.12 The vector of penetration was analyzed using the mean and SD.
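For duplicate measurements, the repeatability SD and the 95% repeatability limit can be obtained as sketched below. This is a hand calculation under stated assumptions, not the SPSS procedure: it assumes exactly two measurements per pair of radiographs, so that the within-subject sum of squares is half the sum of squared differences, and it uses the ASTM relation r = 1.96 × √2 × Sr. The function name and values are hypothetical.

```python
import math

def repeatability(first, second):
    """Return (S_r, r) for duplicate measurements of the same items.

    S_r is the square root of the within-subject mean square; r is the
    95% repeatability limit, 1.96 * sqrt(2) * S_r.
    """
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("need two non-empty sequences of equal length")
    # With two measurements per item, SS_within = sum(d^2) / 2 with n degrees of freedom.
    ss_within = sum((a - b) ** 2 for a, b in zip(first, second)) / 2.0
    ms_within = ss_within / n
    s_r = math.sqrt(ms_within)
    return s_r, 1.96 * math.sqrt(2.0) * s_r

# Hypothetical repeated penetration measurements (mm) by one observer.
s_r, r_limit = repeatability([0.10, 0.20, 0.30], [0.12, 0.18, 0.30])
```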
RESULTS
Precision was influenced mainly by radiographic variances (Table 1). Precision was high (0.1 mm) when pairs of radiographs were measured twice (intraobserver measurement). However, the precision was worse (0.89 mm) when the patient's position was changed in our first test of precision. Precision weakened further if a followup series of radiographs was tested (1.16 mm), such as in our second test of precision. The intraclass correlation coefficient was high (0.998) in intraobserver measurement, but decreased to 0.877 and 0.383 in our first and second tests of precision, respectively. All biases were small. Although the limit of agreement of intraobserver measurement had a narrow range, the range was wide in the two tests for precision.
The correlation analysis for the limitation component of our objective showed there was no relationship between followup time and polyethylene penetration (Fig 4). A measurement error was assumed when the direction of penetration was away from the polyethylene liner (Fig 5). Eighty-seven of 160 radiographs (54.4%) had a vector direction toward the liner (Zones 1 and 2), whereas the remaining 45.6% were directed away from the liner (Zones 3 and 4). Sixty of 160 (37.5%) measurements were negative (Fig 6). At the last followup of 44 patients (51 hips), 33.3% of measurements were negative. Of 17 patients (21 hips) with a 5-year followup, 42.9% of measurements were negative. Thirty-four patients (39 hips) had at least two followup radiographs with followup between 2 and 5.4 years, and the mean of the maximum vector variation was 61.9° ± 47.1° (range, 1.8°-178.6°) for each series of radiographs (Fig 7).
DISCUSSION
The Hip32 computer-assisted technique for measuring penetration on clinical radiographs was not precise in hips with total penetration less than 1.1 mm. Because Durasul® (Zimmer Inc) components showed a total mean 5-year penetration of 0.11 mm (range, 0.06-0.47 mm), this computer method would not be precise even 5 years postoperatively.10 The primary source of penetration measurement error was related to slight distortions or blurring on the edges of the femoral head and acetabular components in clinical radiographs. When AP radiographs of the pelvis were taken, the center of the xray beam was focused on the midpoint between both femoral heads. But, because the femoral head and metal shell are not true circles on a radiograph, it was impossible to correctly estimate the centers of the head and metal shell. Consequently, the position and relationship between the center of the metal femoral head and the acetabular component were not correctly identified (Fig 7).
Our study has several limitations. We used postoperative radiographs as a baseline for testing precision, and compared with 6-week and subsequent radiographs, the postoperative radiographs were more likely to vary. However, this did not affect the results of our two tests of precision because the baseline radiograph served as its own control. For example, in the first test of precision, the postoperative radiograph served as a baseline for the radiographs of the patients in the standing and supine positions. The same was true for the second test. The postoperative radiograph served as a baseline for the radiographs obtained 6 weeks and 6 months after surgery. When we measured penetration on radiographs obtained during clinical followup, this concern was eliminated because the radiographs obtained 6 weeks after surgery served as a baseline for measuring the penetration and the vector of penetration.
Another issue was whether the vector should be considered and the displacement calculated between two observations. Because the purpose of this study was to estimate the precision between observations, not to estimate displacement between two observations, only the magnitudes were evaluated. Furthermore, the intraclass correlation coefficient, precision, and the limits of agreement are scalar quantities and consequently cannot be calculated using vector components. Therefore, in this study, we used only the magnitude of penetration to calculate precision.
Although the centers were not estimated precisely, reliability analysis showed computer-assisted 2-D radiographic measurements consistently determined the relative position of the femoral head center and acetabular component for a given pair of radiographs by one observer (ZW). Collier et al7 reported an average error of 0 ± 0.01 mm in reliability analysis in a phantom study. Our measurement of the clinical radiographs had an average error of 0.001 mm (with a precision of 0.1 mm), which was similar to that in the phantom study. The high repeatability of the application for a given radiograph was confirmed by an intraclass correlation coefficient of 0.998. Therefore, observer error can be ignored. The main source of error in measuring a series of radiographs occurred because of the Hip32 software measurements and/or radiographs. Because radiographs in the phantom study showed less error, we think the main source of measurement error in clinical tests was radiographic variances unavoidable in a clinical setting. Measurement software should be designed to adjust for these variances. Therefore, the source of error was related indirectly to the weakness of the software.
The precision of computer-assisted measurement of polyethylene penetration was better in phantom studies than in clinical radiographic measurements. One reason is that phantom studies cannot duplicate all variable penetration patterns of the femoral head occurring in different radiographs in vivo, mostly because of xray position. Collier et al,7 in their phantom study, reported a fixed position of the xray beam reduced error, but movement of an xray beam could overestimate the wear (0.26-0.4 mm). This error commonly occurs with clinical radiographs because patient positioning is slightly different in each radiograph. The influence of patient positioning was best seen when we compared radiographs of patients in the supine position with radiographs of patients in the standing position.
Error is also reduced in a laboratory setting because there is no scatter or absorption of radiation from the xray beam by soft tissue, which allows for high-quality radiographs and observation of the entire head of the femur. Although this quality sometimes can be obtained on clinical radiographs, radiographic variations have an adverse effect on the measurements. Phantom studies represent a best case scenario, not a full representation of in vivo conditions.
The variability of precision in clinical radiographs was shown by the two precision tests. The best precision was the simple comparison of wear between the 6-week radiographs of patients in the standing position and 6-week radiographs of patients in the supine position. The precision worsened as more variables were introduced. Adding a third variable, by comparing a series of radiographs from the same patient, worsened the index of precision from 0.89 mm to 1.16 mm.
The precision of the clinical radiograph measurement method was similar to the precision reported in a retrieval study.17 Retrieval studies have tested polyethylene liners with linear penetration of nearly 0.2 mm per year and a total penetration of at least 1 mm.11,17,20 Hui et al17 validated radiographic techniques with retrieved liners with an average of 9.2 years followup (range, 7.1-11.9 years). They reported 0.24-mm precision using the Hip32 2-D technique. However, this precision was calculated on a per year penetration rate and not by total penetration.17 Their precision would be better calculated using the total wear and the repeatability SD, which was 0.388 mm (2.77 × 0.388). Their precision was 0.94 mm, which was similar to our range of 0.89 to 1.16 mm for comparing two clinical radiographs.
The average radiographic variation of the vector of load for each hip was 61.9° ± 47.1° during 5 years. The directional vector of wear was toward the cup in slightly more than half (54.4%) of the measurements. Because a vector away from the cup surface was considered an error of measurement, the error in direction of penetration was 45.6%. This error did not decrease during 5 years of followup, with 42.9% of hips with 5-year radiographs still having a negative penetration. When we eliminated negative penetration values, the average penetration was 0.46 mm. When negative penetration values were included, the penetration was 0.12 mm. In addition, volumetric wear should not be used if there is negative linear wear seen on the clinical radiographs.
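The effect of discarding negative values on the reported average can be seen with a small example. The sketch below uses hypothetical penetration values, not data from this study, and a hypothetical function name.

```python
import statistics

def mean_penetration(values_mm, include_negative=True):
    """Mean penetration (mm), optionally discarding negative measurements."""
    kept = [v for v in values_mm if include_negative or v >= 0]
    return statistics.mean(kept)

# Hypothetical measurements (mm) in which random error produces negative values.
measurements = [0.6, -0.4, 0.3, -0.1, 0.1]
mean_all = mean_penetration(measurements)
mean_positive_only = mean_penetration(measurements, include_negative=False)
```

When the errors are roughly symmetric around a small true value, dropping the negative half of the distribution inflates the mean, which is consistent with the gap between the two averages reported above.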
Heisel et al15 reported average penetration with negative and positive numbers. Hopper et al16 showed negative penetration in their figures but did not directly average these numbers. Martell et al21,22 did not report negative wear values. Hopper et al16 and Martell et al21,22 used the method of least squares linear regression based on the magnitude of the vector to calculate the slope of the best fit, which became the slope of the penetration rate.24 This method will give lower penetration than the usual method of calculating a mean of all values.24 The vectors of wear were ignored and the negative wear numbers were hidden from readers. Results calculated from only three or slightly more radiographs are not reliable because the statistical power is too low.
Two-dimensional computer-assisted radiographic measurements of polyethylene penetration had reproducible measurements on the same radiograph. However, measurements of different clinical radiographs were not so precise. When the total penetration on an individual radiograph was less than 1.1 mm (the best precision of the Hip32 computer-assisted technique), the penetration was in the range of measurement error. The penetration of Durasul® (Zimmer Inc) polyethylene usually was less than 1.1 mm at 5 years followup.
1. American Society for Testing and Materials. Standard practice for conducting an interlaboratory study to determine the precision of a test method. Annual Book of ASTM Standards. Philadelphia, PA: American Society for Testing and Materials; 1999;E691-E692.
2. American Society for Testing and Materials. Standard practice for use of the terms precision and bias in ASTM test methods. Annual Book of ASTM Standards. Philadelphia, PA: American Society for Testing and Materials; 2002;E177-E190a.
3. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307-310.
4. Bland JM, Altman DG. Measurement error and correlation coefficients. BMJ
5. Bragdon CR, Malchau H, Yuan X, Perinchief R, Karrholm J, Borlin N, Estok DM, Harris WH. Experimental assessment of precision and accuracy of radiostereometric analysis for the determination of polyethylene wear in a total hip replacement model. J Orthop Res
6. Charnley J, Cupic Z. The nine and ten year results of the low-friction arthroplasty of the hip. Clin Orthop Relat Res. 1973;95:9-25.
7. Collier MB, Kraay MJ, Rimnac CM, Goldberg VM. Evaluation of contemporary software methods used to quantify polyethylene wear after total hip arthroplasty. J Bone Joint Surg Am
8. Devane PA, Bourne RB, Rorabeck CH, Hardie RM, Horne JG. Measurement of polyethylene wear in metal-backed acetabular cups: I. Three-dimensional technique. Clin Orthop Relat Res. 1995;319:303-316.
9. Devane PA, Horne JG. Assessment of polyethylene wear in total hip replacement. Clin Orthop Relat Res
10. Dorr LD, Wan Z, Shahrdar C, Sirianni L, Boutary M, Yun A. Clinical performance of a Durasul highly cross-linked polyethylene acetabular liner for total hip arthroplasty at five years. J Bone Joint Surg Am
11. Ebramzadeh E, Sangiorgio SN, Lattuada F, Kang JS, Chiesa R, McKellop HA, Dorr LD. Accuracy of measurement of polyethylene wear with use of radiographs of total hip replacements. J Bone Joint Surg Am
12. Fox J. Linear Statistical Models and Related Methods: With Applications to Social Research (Wiley Series in Probability and Mathematical Statistics). New York, NY: John Wiley and Sons; 1984.
13. Halley DK, Charnley J. Results of low friction arthroplasty in patients thirty years of age or younger. Clin Orthop Relat Res. 1975;112:180-191.
14. Hardinge K, Porter ML, Jones PR, Hukins DW, Taylor CJ. Measurement of hip prostheses using image analysis: the maxima hip technique. J Bone Joint Surg Br
15. Heisel C, Silva M, Rosa MA, Schmalzried TP. Short-term in vivo wear of cross-linked polyethylene. J Bone Joint Surg Am. 2004;86:748-751.
16. Hopper RH Jr, Young AM, Orishimo KF, McAuley JP. Correlation between early and late wear rates in total hip arthroplasty with application to the performance of marathon cross-linked polyethylene liners. J Arthroplasty. 2003;18(suppl 1):60-67.
17. Hui AJ, McCalden RW, Martell JM, MacDonald SJ, Bourne RB, Rorabeck CH. Validation of two and three-dimensional radiographic techniques for measuring polyethylene wear after total hip arthroplasty. J Bone Joint Surg Am
18. International Organization for Standardization. Statistics - Vocabulary and symbols. Design of experiments. ISO Standards Handbook-Statistical Methods for Quality Control. Geneva, Switzerland: International Organization for Standardization; 1999:3534-3.11, 3.13, 3.14.
19. Livermore J, Ilstrup D, Morrey B. Effect of femoral head size on wear of the polyethylene acetabular component. J Bone Joint Surg Am
20. Martell JM, Berdia S. Determination of polyethylene wear in total hip replacements with use of digital radiographs. J Bone Joint Surg Am
21. Martell JM, Berkson E, Berger R, Jacobs J. Comparison of two and three-dimensional computerized polyethylene wear analysis after total hip arthroplasty. J Bone Joint Surg Am
22. Martell JM, Verner JJ, Incavo SJ. Clinical performance of a highly cross-linked polyethylene at two years in total hip arthroplasty: a randomized prospective trial. J Arthroplasty. 2003;18(suppl 1):55-59.
23. Scheier H, Sandel J. Wear affecting the plastic cup in metal-plastic endoprostheses. In Gschwend N, Debrunner HU (eds). Total Hip Prosthesis. Baltimore, MD: H. Huber; 1976:186-190.
24. Sychterz CJ, Engh CA Jr, Yang A, Engh CA. Analysis of temporal wear patterns of porous-coated acetabular components: distinguishing between true wear and so-called bedding-in. J Bone Joint Surg Am
© 2006 Lippincott Williams & Wilkins, Inc.