Materials and Methods
We used purposive sampling from a larger study of serial ultrasound biometry involving 274 low-risk pregnant women.8 The criterion for selection was the measurement of four variables by ultrasound, namely biparietal diameter (BPD), femur length (FL), fetal abdominal area, and abdominal circumference (AC), within 7 days of delivery at term (over 37 weeks' gestation). Fifty subjects (18%) satisfied that criterion. Gestational ages were calculated with reference to the crown-rump length instead of the last menstrual date because the former provided more reliable estimates of the true fetal age.9 The BPD was measured from leading edge to leading edge at the level of the cavum septum pellucidum.10 Femur length was measured according to the method of O'Brien et al.11 The fetal abdominal area and AC were measured at the level of the umbilical vein12 by tracing the outline of the trunk on the screen of the ultrasound machine. The outline is circular or elliptical and includes the fetal spine, umbilical vein, and stomach. Three measurements were made of each variable and the mean for each was recorded. All measurements were done by one of the authors (PO), who used a real-time ultrasound scanner (Aloka SSD-650, Tokyo, Japan) with a 3.5-MHz curvilinear probe.
Estimates of fetal weight were calculated using the Aoki,13 Campbell and Wilkin,12 Shepard et al,14 and Hadlock et al3 formulas. These formulas utilize one or more of those fetal biometric measurements to calculate the fetal weight. The formulas used to calculate the estimated fetal weight (EFW) are shown in Table 1.
The newborn birth weights (in grams) were measured using the same balance independently of the ultrasonic in utero weight estimation.
Statistical analysis was conducted to compare criterion validity, defined as the level of concordance between the observed measurements and the true or definitive state recorded independently.15 In this study, the observed measurements of fetal weights were those obtained by ultrasound measurements of the biometric variables within 7 days of delivery plus any additional weight gain between the ultrasound scan and delivery. Previous research8 has shown that between 37 and 40 weeks' gestation, the average observed weight gain was 25 g per day. Therefore, 25 g was added to the EFW for each day between the ultrasound scan and delivery of the fetus. The true state, generally known as the reference standard, was the documented birth weight obtained by balance at delivery. We assessed the concordance between the EFW and the birth weight by the use of two methods16–18: the Bland and Altman limits of agreement method19 and the intraclass correlation coefficient.15
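The adjustment for weight gain between scan and delivery can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `adjust_efw`, its parameter names, and the example values are hypothetical, and only the 25 g per day figure comes from the text.

```python
def adjust_efw(efw_g: float, days_to_delivery: int, gain_g_per_day: float = 25.0) -> float:
    """Add the assumed daily weight gain to an ultrasound estimate of fetal weight.

    efw_g            estimated fetal weight at the scan, in grams
    days_to_delivery days between the scan and delivery (0 to 7 in this study)
    gain_g_per_day   average daily gain at 37 to 40 weeks (25 g in the text)
    """
    if days_to_delivery < 0:
        raise ValueError("days_to_delivery must not be negative")
    return efw_g + gain_g_per_day * days_to_delivery


# Hypothetical example: a scan 4 days before delivery adds 4 x 25 g = 100 g.
adjusted = adjust_efw(3200.0, 4)
```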
The Bland and Altman limits of agreement method19 plots the difference between the adjusted fetal weight estimated by ultrasound and the actual birth weight against their mean values to provide a visual assessment of agreement between the two sets of measurements. The overall mean difference between the ultrasonically derived fetal weight and birth weight and the limits of agreement (mean difference ± 1.96 × standard deviation [SD]) are also plotted on the graph. In any given situation, the mean difference reflects the extent of systematic error introduced by the measurement under evaluation, whereas the limits of agreement reflect the random error.19
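The numeric part of the limits of agreement method can be sketched as below. This is an illustration under stated assumptions, not the authors' analysis: the function name `limits_of_agreement` and the sample weights are hypothetical, and the difference is taken as birth weight minus ultrasound estimate so that a positive mean difference corresponds to underestimation. That sign convention is an assumption.

```python
from statistics import mean, stdev


def limits_of_agreement(estimated_g, birth_weight_g):
    """Return the mean difference and the 95% limits of agreement in grams.

    Assumption: difference = birth weight - ultrasound estimate.
    """
    if len(estimated_g) != len(birth_weight_g) or len(estimated_g) < 2:
        raise ValueError("need two equal-length series with at least 2 subjects")
    diffs = [b - e for e, b in zip(estimated_g, birth_weight_g)]
    bias = mean(diffs)   # systematic error
    sd = stdev(diffs)    # sample SD of the differences (random error)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical data: differences are 100, 50 and 150 g (mean 100, SD 50).
bias, lower, upper = limits_of_agreement([3000, 3100, 3200], [3100, 3150, 3350])
```

For a plot, each difference would be drawn against the mean of the two measurements for that subject, with horizontal lines at `bias`, `lower`, and `upper`.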
The intraclass correlation coefficient as a method of assessing validity is not widely reported in the biomedical literature,15 so we will describe it in detail here. The intraclass correlation coefficient is mathematically defined as follows:
$$\text{intraclass correlation coefficient} = \frac{\sigma^2_{\text{subject}}}{\sigma^2_{\text{subject}} + \sigma^2_{\text{measurement method}} + \sigma^2_{\text{error}}}$$
where $\sigma^2_{\text{subject}}$ is the variance in fetal weight between subjects; $\sigma^2_{\text{measurement method}}$ is the variance in fetal weight between the adjusted ultrasonic estimation and the measured birth weight at delivery; and $\sigma^2_{\text{error}}$ refers to the measurement error.
From that equation it can be seen that the intraclass correlation coefficient is the proportion of the total variance in the measurements ($\sigma^2_{\text{subject}} + \sigma^2_{\text{measurement method}} + \sigma^2_{\text{error}}$) that is due to true variation between subjects. An intraclass correlation coefficient value of unity indicates that the total variance in the observed measurements is entirely a result of the difference between subjects, and there is no variance due to measurement method or error. In that situation, validity is perfect as there is no discordance between the adjusted fetal weights estimated by ultrasound and the measured birth weights. On the other hand, large values for $\sigma^2_{\text{measurement method}}$ and $\sigma^2_{\text{error}}$ will lead to a low intraclass correlation coefficient. An intraclass correlation coefficient value of zero therefore indicates complete lack of validity. An intraclass correlation coefficient value of greater than 0.75 was set a priori to indicate the minimum acceptable level of validity.20 Fisher transformation was used to generate the 95% confidence interval (CI) of the intraclass correlation coefficient for each formula for calculating fetal weight.21 The purpose of this study was to obtain precise estimates of intraclass correlation coefficients for the various ultrasonic formulas for estimating fetal weight rather than to compare the estimates with one another. Assuming that the intraclass correlation coefficient is approximately 0.8 (95% CI 0.7, 0.9) and α = 0.05, the estimated sample size required for the study would be 31 subjects.15
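The ratio of variance components and the Fisher-transformed confidence interval can be sketched as follows. The function names and the `df_offset` parameter are hypothetical. The text does not state which standard error was used with the Fisher transformation, so the sketch assumes the common form 1/sqrt(n - 3); some treatments of the intraclass coefficient for paired measurements use n - 3/2 instead.

```python
import math


def icc_from_components(var_subject: float, var_method: float, var_error: float) -> float:
    """Proportion of total variance that is due to variation between subjects."""
    total = var_subject + var_method + var_error
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_subject / total


def fisher_ci(icc: float, n_subjects: int, z_crit: float = 1.96, df_offset: float = 3.0):
    """Approximate 95% CI via Fisher's z transformation (requires -1 < icc < 1).

    Assumption: standard error of z is 1 / sqrt(n_subjects - df_offset).
    """
    z = math.atanh(icc)                      # Fisher z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n_subjects - df_offset)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# With 50 subjects and a coefficient of 0.90, the interval is roughly 0.83 to 0.94.
low, high = fisher_ci(0.90, 50)
```

Working on the z scale and transforming back keeps the interval inside the admissible range and makes it asymmetric around the point estimate, which is why the reported intervals are wider below the estimate than above it.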
Analysis of variance was used to calculate the components of variance ($\sigma^2$) using the following equations:
where $MS_{\text{subject}}$, $MS_{\text{measurement method}}$, and $MS_{\text{error}}$ refer to the mean square between subjects, measurement methods, and the error term in the analysis of variance table generated by Minitab Statistical Software (Minitab Inc., State College, PA).
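One way the variance components could be obtained from the mean squares is sketched below. It assumes the textbook estimators for a two-way analysis of variance with one observation per cell (n subjects by k = 2 measurement methods): the subject component is (MS subject - MS error) / k, the method component is (MS measurement method - MS error) / n, and the error component is MS error. These estimators are an assumption and are not taken from the source. The function name `variance_components` and the sample data are hypothetical.

```python
def variance_components(estimated_g, birth_weight_g):
    """Variance components from a subjects-by-methods ANOVA without replication."""
    if len(estimated_g) != len(birth_weight_g) or len(estimated_g) < 2:
        raise ValueError("need two equal-length series with at least 2 subjects")
    rows = list(zip(estimated_g, birth_weight_g))
    n, k = len(rows), 2
    grand = sum(a + b for a, b in rows) / (n * k)
    row_means = [(a + b) / k for a, b in rows]                   # one per subject
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]  # one per method

    ss_subject = k * sum((m - grand) ** 2 for m in row_means)
    ss_method = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = sum(
        (rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )

    ms_subject = ss_subject / (n - 1)
    ms_method = ss_method / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Assumed estimators; a negative value can occur in small samples.
    return {
        "subject": (ms_subject - ms_error) / k,
        "method": (ms_method - ms_error) / n,
        "error": ms_error,
    }


# Hypothetical data with a constant offset of 1 between the two methods.
comp = variance_components([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
icc = comp["subject"] / (comp["subject"] + comp["method"] + comp["error"])
```

In this toy example the two methods differ only by a constant offset, so the error component is zero and the coefficient falls below 1 purely because of the systematic difference between methods.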
Results
The descriptive statistics of the ultrasonic biometric variables, adjusted ultrasonic fetal weight estimations using the four formulas, time, and the actual birth weights are shown in Table 2. The average time interval between ultrasonic assessment and delivery ± SD was 3.6 ± 1.9 days. The mean gestational age at delivery ± SD was 275.0 ± 6.5 days.
Figure 1 shows the agreement between the adjusted, ultrasound-estimated fetal weight and the actual birth weight assessed visually by the limits of agreement method. All four formulas tended to underestimate the fetal weight, ie, the mean adjusted fetal weights derived from all the formulas were less than the mean birth weight. The smallest mean differences were obtained with the Shepard and Aoki formulas (51.4 g and 60.5 g, respectively), whereas the Campbell and Hadlock formulas produced larger mean differences (141.8 g and 190.7 g, respectively). The Aoki formula generated the smallest range between the limits of agreement (−324.2 to 445.2 g), whereas the Campbell formula produced the largest range (−286.5 to 570.1 g). The ranges between the limits of agreement generated with the Hadlock (−209.5 to 590.9 g) and Shepard (−365.1 to 467.9 g) formulas were intermediate between those produced by the Aoki and Campbell formulas.
Among the four formulas for estimating fetal weight by ultrasound, the highest intraclass correlation coefficients were generated with the Aoki and Shepard formulas (Table 3). Their intraclass correlation coefficients were identical, ie, 0.90 (95% CI 0.83, 0.94). The intraclass correlation coefficients obtained with the Hadlock and Campbell formulas were 0.84 (95% CI 0.73, 0.91) and 0.85 (95% CI 0.75, 0.91), respectively.
Discussion
Our study showed that fetal weight estimated by ultrasound using the Aoki, Campbell, Shepard, and Hadlock formulas is a valid estimate of actual weight. The credibility of our findings depends on the rigor of design, conduct, and analysis of our study. Our study fulfills the design and conduct criteria for avoiding bias in the assessment of validity in clinical measurements.16,18,21 In particular, we used an appropriate reference standard for fetal weight measurement, and the author who calculated the results of ultrasound estimation of fetal weight was masked to the measurement of newborn weight.
The high level of validity for estimating fetal weight obtained with the Aoki formula might be because it uses three fetal biometric variables (BPD, fetal abdominal area, and FL). The Shepard formula, with a similarly high intraclass correlation coefficient for validity, uses two fetal biometric measurements (BPD and AC). The use of only one fetal biometric parameter (AC) to estimate fetal weight by the Campbell formula could explain why its validity is lower than that of the Aoki and Shepard formulas. Although the Hadlock formula uses AC in addition to FL for estimating fetal weight, it appears that AC should be used in combination with BPD rather than FL to achieve a level of validity that is similar to that observed with the Aoki formula. We would therefore recommend the Aoki or Shepard formula as the formula of choice in the calculation of fetal weight from ultrasonically measured biometric variables.
Validity studies are also subject to bias in the choice of analytic methods. There is disagreement in the literature with respect to the most appropriate index of concordance for assessing validity for continuous or dimensional data.15 We have summarized our results using two approaches. The limits of agreement method by Bland and Altman19 has been advocated as a simple way to assess validity that is more easily understood by clinicians. Indeed, its importance has been hailed as the statistical counterpart of the discovery of the polymerase chain reaction.22 It overcomes the limitation of the conventional Pearson correlation coefficient by explicitly separating the systematic bias effect of the measurement method from random error.15 This distinction between systematic and random bias, although desirable, can also pose a problem in the interpretation of results when the validity of several measurement methods is compared. For example, of the four formulas used for estimating fetal weight in this study, the largest mean difference between fetal weight estimated by ultrasound and actual birth weight was obtained with the Hadlock formula (190.7 g). Although this formula generated the largest systematic bias, the magnitude of its random bias (ie, range of the limits of agreement) was smallest (800.5 g), with the exception of the Aoki formula. Therefore, the graphic presentation of the results without a simple numeric index can also make it more difficult to provide meaningful comparison of the validity of several measurement methods.16 Furthermore, this method of evaluating validity measures only absolute agreement, and the extent to which (dis)agreement is present against a background of true variability in the observations is not assessed numerically.15 In other words, the significance of the (dis)concordance between the two measurements is not evaluated against the diversity of the measurements routinely encountered in clinical practice.
The use of intraclass correlation coefficient to assess validity overcomes all the preceding limitations; therefore, it was our preferred index of concordance for this measurement variability. It provided a quantitative assessment of the variability inherent in the four measurement methods against a background of dispersion seen in the observations. The results from this study should be extrapolated only to obstetric populations with characteristics that are similar to those in this study.15 The newborn birth weight was 2080–4430 g in our study population. This range determines the limits of generalizability of our findings, and caution should therefore be used in judging the validity of ultrasonic estimation of fetal weight outside of those limits.
References
1. Simon NV, Levisky JS, Shearer DM, O'Lear MS, Flood JT. Influence of fetal growth patterns on sonographic estimation of fetal weight. J Clin Ultrasound 1987;15:376–83.
2. Lin CC, Moawad AH, Rosenow PJ, River P. Acid-base characteristics of fetuses with intrauterine growth retardation during labor and delivery. Am J Obstet Gynecol 1980;137:553–9.
3. Hadlock FP, Harrist RB, Sharman RS, Deter RL, Park SK. Estimation of fetal weight with the use of head, body and femur measurements—A prospective study. Am J Obstet Gynecol 1985;151:333–7.
4. Deter RL, Harrist RB, Hadlock FP, Carpenter RJ. The use of ultrasound in the assessment of normal fetal growth: A review. J Clin Ultrasound 1981;9:481–93.
5. Jordaan HVF. Estimation of fetal weight by ultrasound. J Clin Ultrasound 1983;11:59–66.
6. Hill LM, Breckle R, Wolfgram KR, O'Brien PC. Evaluation of three methods for estimating fetal weight. J Clin Ultrasound 1986;14:171–8.
7. Hirata GI, Medearis AL, Horenstein J, Bear MB, Platt LD. Ultrasonographic estimation of fetal weight in the clinically macrosomic fetus. Am J Obstet Gynecol 1990;162:238–42.
8. Owen P, Donnet ML, Ogston SA, Christie AD, Howie PW, Patel NB. Standards for ultrasound fetal growth velocity. Br J Obstet Gynaecol 1996;103:60–9.
9. Geirsson RT, Busby-Earle RMC. Certain dates may not provide a reliable estimate of gestational age. Br J Obstet Gynaecol 1991;98:108–9.
10. Campbell S. An improved method of fetal cephalometry by ultrasound. J Obstet Gynaecol Br Commonw 1968;75:568–73.
11. O'Brien GD, Queenan JT, Campbell S. Assessment of gestational age in the second trimester by real-time ultrasound measurement of the femur length. Am J Obstet Gynecol 1981;139:544–8.
12. Campbell S, Wilkin D. Ultrasonic measurement of fetal abdomen circumference in the estimation of fetal weight. Br J Obstet Gynaecol 1975;82:689–97.
13. Aoki M. Fetal weight calculation; Osaka University method. In: Yoshihide C, ed. Ultrasound in obstetrics and gynaecology. 2nd ed. Kyoto: Kinpodo, 1990:95–107 [Japanese].
14. Shepard MJ, Richards VA, Berkowitz RL, Warsof SL, Hobbins JC. An evaluation of two equations for predicting fetal weight by ultrasound. Am J Obstet Gynecol 1982;142:47–54.
15. Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. 2nd ed. Oxford, United Kingdom: Oxford University Press, 1998.
16. Khan KS, Chien PFW, Honest MR, Norman GR. Evaluating measurement variability in clinical investigations: The case of ultrasonic estimation of urinary bladder volume. Br J Obstet Gynaecol 1997;104:1036–42.
17. Chien PFW, Neven P, Khan KS, Agustsson P, Patel NB, Ogston S. The validity and reliability of real-time ultrasound estimation of bladder volume in postnatal women. J Obstet Gynaecol 1996;16:224–7.
18. Nwosu CR, Khan KS, Chien PFW, Honest MR. Is real-time ultrasonic bladder volume estimation reliable and valid? An overview. Scand J Urol Nephrol 1998;32:325–30.
19. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;2:307–10.
20. Kramer MS, Feinstein AR. The biostatistics of concordance. Clin Pharmacol Ther 1981;29:111–23.
21. Dunn G, Everitt B. Clinical biostatistics: An introduction to evidence-based medicine. London: Edward Arnold, 1995:35–63.
22. McDonough PG. Measurement error—how much of a difference does it take to make a difference? Fertil Steril 1997;67:790–1.