#### Materials and Methods

We used purposive sampling from a cohort of 274 low-risk pregnant women participating in a study of serial ultrasound biometry.^{8} The selection criterion was ultrasound measurement of four variables, namely biparietal diameter (BPD), femur length (FL), fetal abdominal area, and abdominal circumference (AC), within 7 days of delivery at term (over 37 weeks' gestation). Fifty subjects (18%) satisfied that criterion. Gestational ages were calculated from the crown-rump length rather than the last menstrual date because the former provided more reliable estimates of the true fetal age.^{9} The BPD was measured from leading edge to leading edge at the level of the cavum septum pellucidum.^{10} Femur length was measured according to the method of O'Brien et al.^{11} The fetal abdominal area and AC were measured at the level of the umbilical vein^{12} by tracing the outline of the trunk on the screen of the ultrasound machine. The outline is circular or elliptical and includes the fetal spine, umbilical vein, and stomach. Three measurements were made of each variable, and the mean of each was recorded. All measurements were made by one of the authors (PO) using a real-time ultrasound scanner (Aloka SSD-650, Tokyo, Japan) with a 3.5-MHz curvilinear probe.

Fetal weight was estimated using the Aoki,^{13} Campbell and Wilkin,^{12} Shepard et al,^{14} and Hadlock et al^{3} formulas, each of which uses one or more of the fetal biometric measurements described above. The formulas used to calculate the estimated fetal weight (EFW) are shown in Table 1.

All newborn birth weights (in grams) were measured on the same balance, independently of the in utero ultrasonic weight estimation.

Statistical analysis was conducted to compare criterion validity, defined as the level of concordance between the observed measurements and the true or definitive state recorded independently.^{15} In this study, the observed measurements of fetal weights were those obtained by ultrasound measurements of the biometric variables within 7 days of delivery plus any additional weight gain between the ultrasound scan and delivery. Previous research^{8} has shown that between 37 and 40 weeks' gestation, the average observed weight gain was 25 g per day. Therefore, 25 g was added to the EFW for each day between the ultrasound scan and delivery of the fetus. The true state, generally known as the reference standard, was the documented birth weight obtained by balance at delivery. We assessed the concordance between the EFW and the birth weight by the use of two methods^{16–18}: the Bland and Altman limits of agreement method^{19} and the intraclass correlation coefficient.^{15}
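The adjustment for weight gain between scan and delivery can be illustrated with a minimal sketch. The function name and the example values are hypothetical; only the 25 g per day increment comes from the text.

```python
GAIN_G_PER_DAY = 25  # average weight gain between 37 and 40 weeks, as stated in the text


def adjust_efw(efw_g, days_to_delivery, gain_per_day=GAIN_G_PER_DAY):
    """Add the expected weight gain between scan and delivery to the EFW (grams)."""
    if days_to_delivery < 0:
        raise ValueError("the ultrasound scan must precede delivery")
    return efw_g + gain_per_day * days_to_delivery


# Hypothetical fetus: EFW of 3200 g at a scan performed 4 days before delivery.
print(adjust_efw(3200, 4))  # 3300
```

The adjusted value, not the raw EFW, is what would then be compared with the birth weight.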

The Bland and Altman limits of agreement method^{19} plots the difference between the adjusted fetal weight estimated by ultrasound and the actual birth weight against their mean values to provide a visual assessment of agreement between the two sets of measurements. The overall mean difference between the ultrasonically derived fetal weight and birth weight and the limits of agreement (mean difference ± 1.96 × standard deviation [SD]) are also plotted on the graph. In any given situation, the mean difference reflects the extent of systematic error introduced by the measurement under evaluation, whereas the limits of agreement reflect the random error.^{19}
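The limits of agreement calculation described above can be sketched as follows. This is an illustration only: the weights are invented, the function name is hypothetical, and the sign convention (birth weight minus adjusted EFW, so that a positive mean difference indicates underestimation by ultrasound) is an assumption inferred from how the results are reported.

```python
from statistics import mean, stdev


def limits_of_agreement(adjusted_efw, birth_weight, z=1.96):
    """Return (mean difference, lower limit, upper limit) in grams.

    Differences are taken as birth weight minus adjusted EFW (assumed convention).
    """
    diffs = [bw - efw for efw, bw in zip(adjusted_efw, birth_weight)]
    bias = mean(diffs)   # systematic error
    sd = stdev(diffs)    # sample SD of the differences (random error)
    return bias, bias - z * sd, bias + z * sd


def plot_points(adjusted_efw, birth_weight):
    """(x, y) pairs for the plot: mean of the two values vs their difference."""
    return [((efw + bw) / 2, bw - efw) for efw, bw in zip(adjusted_efw, birth_weight)]


# Invented example data (grams), not taken from the study.
efw = [3050, 3200, 3400, 2900]
bw = [3100, 3300, 3380, 3050]
bias, lower, upper = limits_of_agreement(efw, bw)
print(f"mean difference {bias:.1f} g, limits {lower:.1f} to {upper:.1f} g")
```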

The intraclass correlation coefficient as a method of assessing validity is not widely reported in the biomedical literature,^{15} so we will describe it in detail here. The intraclass correlation coefficient is mathematically defined as follows:

$$\text{intraclass correlation coefficient} = \frac{\sigma^{2}_{\text{subject}}}{\sigma^{2}_{\text{subject}} + \sigma^{2}_{\text{measurement method}} + \sigma^{2}_{\text{error}}}$$

where σ^{2}_{subject} is the variance in fetal weight between subjects; σ^{2}_{measurement method} is the variance in fetal weight between the adjusted ultrasonic estimation and the measured birth weight at delivery; and σ^{2}_{error} refers to the measurement error.

From that equation it can be seen that the intraclass correlation coefficient is the proportion of the total variance in the measurements (σ^{2}_{subject} + σ^{2}_{measurement method} + σ^{2}_{error}) that is due to true variation between subjects. An intraclass correlation coefficient value of unity indicates that the total variance in the observed measurements is entirely a result of the difference between subjects, and that none is attributable to measurement method or error. In that situation, validity is perfect because there is no discordance between the adjusted fetal weights estimated by ultrasound and the measured birth weights. On the other hand, large values for σ^{2}_{measurement method} and σ^{2}_{error} will lead to a low intraclass correlation coefficient. An intraclass correlation coefficient value of zero therefore indicates complete lack of validity. An intraclass correlation coefficient value of greater than 0.75 was set a priori as the minimum acceptable level of validity.^{20} Fisher transformation was used to generate the 95% confidence interval (CI) of the intraclass correlation coefficient for each formula for calculating fetal weight.^{21} The purpose of this study was to obtain precise estimates of intraclass correlation coefficients for the various ultrasonic formulas for estimating fetal weight rather than to compare the various estimates for them. Assuming that the intraclass correlation coefficient is approximately 0.8 (95% CI 0.7, 0.9) and α = 0.05, the estimated sample size required for the study would be 31 subjects.^{15}
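A confidence interval based on the Fisher transformation can be sketched as below. The text does not state which standard error was used; this sketch assumes the common approximation 1/√(n − 3) on the transformed scale, and the function name is hypothetical.

```python
from math import atanh, sqrt, tanh


def fisher_ci(icc, n_subjects, z_crit=1.96):
    """Approximate 95% CI for a correlation-type coefficient via Fisher's z.

    Assumes a standard error of 1 / sqrt(n - 3) on the transformed scale.
    """
    if n_subjects <= 3:
        raise ValueError("need more than 3 subjects")
    z = atanh(icc)                    # Fisher transformation
    se = 1.0 / sqrt(n_subjects - 3)   # assumed standard error
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Coefficient of 0.90 with the 50 subjects in this study.
low, high = fisher_ci(0.90, 50)
print(f"{low:.2f}, {high:.2f}")  # 0.83, 0.94
```

Under this assumption, a coefficient of 0.90 estimated from 50 subjects gives an interval of about 0.83 to 0.94.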

Analysis of variance was used to calculate the components of variance (σ^{2}) using the following equations:

where MS_{subject}, MS_{measurement method}, and MS_{error} refer to the mean square between subjects, measurement methods, and the error term in the analysis of variance table generated by Minitab Statistical Software (Minitab Inc., State College, PA).
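As an illustration of how variance components can be derived from the mean squares, the sketch below assumes the standard method-of-moments estimators for a two-way layout without replication (n subjects, k = 2 measurement methods): σ^{2}_{error} = MS_{error}, σ^{2}_{subject} = (MS_{subject} − MS_{error})/k, and σ^{2}_{measurement method} = (MS_{measurement method} − MS_{error})/n. These estimators, the truncation of negative components at zero, and the function name are assumptions, not a reproduction of the Minitab output.

```python
def icc_from_anova(method_a, method_b):
    """Intraclass correlation from a two-way ANOVA without replication.

    method_a, method_b: paired measurements (e.g. adjusted EFW and birth weight).
    Returns (icc, var_subject, var_method, var_error).
    """
    n, k = len(method_a), 2
    if n < 2 or len(method_b) != n:
        raise ValueError("need at least two paired measurements")

    grand = (sum(method_a) + sum(method_b)) / (n * k)
    subject_means = [(a + b) / k for a, b in zip(method_a, method_b)]
    method_means = (sum(method_a) / n, sum(method_b) / n)

    # Sums of squares for subjects, methods, and the residual (error) term.
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_method = n * sum((m - grand) ** 2 for m in method_means)
    ss_error = sum(
        (x - s - method_means[j] + grand) ** 2
        for a, b, s in zip(method_a, method_b, subject_means)
        for j, x in enumerate((a, b))
    )

    ms_subject = ss_subject / (n - 1)
    ms_method = ss_method / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Method-of-moments variance components; negative estimates set to zero (assumption).
    var_error = ms_error
    var_subject = max(0.0, (ms_subject - ms_error) / k)
    var_method = max(0.0, (ms_method - ms_error) / n)

    total = var_subject + var_method + var_error
    icc = var_subject / total if total > 0 else float("nan")
    return icc, var_subject, var_method, var_error


# Invented weights (grams): the second method reads a constant 100 g higher.
efw = [3000, 3200, 3500, 2800]
bw = [w + 100 for w in efw]
print(icc_from_anova(efw, bw))
```

In this invented example the constant 100 g offset appears entirely in the measurement method component and lowers the coefficient below unity even though the two methods rank the subjects identically.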

#### Results

The descriptive statistics of the ultrasonic biometric variables, adjusted ultrasonic fetal weight estimations using the four formulas, time, and the actual birth weights are shown in Table 2. The average time interval between ultrasonic assessment and delivery ± SD was 3.6 ± 1.9 days. The mean gestational age at delivery ± SD was 275.0 ± 6.5 days.

Figure 1 shows the agreement between the adjusted, ultrasound-estimated fetal weight and the actual birth weight assessed visually by the limits of agreement method. All four formulas tended to underestimate the fetal weight, ie, the mean adjusted fetal weights derived from all the formulas were less than the mean birth weight. The smallest mean differences were obtained with the Shepard and Aoki formulas (51.4 g and 60.5 g, respectively), whereas the Campbell and Hadlock formulas produced larger mean differences (141.8 g and 190.7 g, respectively). The Aoki formula generated the smallest range between the limits of agreement (−324.2 to 445.2 g), whereas the Campbell formula produced the largest range (−286.5 to 570.1 g). The ranges between the limits of agreement generated with the Hadlock (−209.5 to 590.9 g) and Shepard (−365.1 to 467.9 g) formulas were intermediate between those produced by the Aoki and Campbell formulas.
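The reported limits can be checked with simple arithmetic: the midpoint of each pair of limits should equal the mean difference, and the width divided by 2 × 1.96 gives the SD of the differences implied by the rounded limits. The sketch below uses only the figures quoted above; the variable and function names are hypothetical.

```python
# Limits of agreement (grams) as reported in the text.
limits = {
    "Aoki": (-324.2, 445.2),
    "Campbell": (-286.5, 570.1),
    "Hadlock": (-209.5, 590.9),
    "Shepard": (-365.1, 467.9),
}


def summarize(lower, upper, z=1.96):
    """Midpoint (mean difference), width, and implied SD for one pair of limits."""
    width = upper - lower
    return {
        "mean_diff": (lower + upper) / 2,
        "width": width,
        "implied_sd": width / (2 * z),
    }


summary = {name: summarize(*pair) for name, pair in limits.items()}
by_width = sorted(summary, key=lambda name: summary[name]["width"])

for name in by_width:
    s = summary[name]
    print(f"{name}: mean difference {s['mean_diff']:.1f} g, "
          f"width {s['width']:.1f} g, implied SD {s['implied_sd']:.1f} g")
```

The midpoints reproduce the reported mean differences, and the widths order the formulas as Aoki, Hadlock, Shepard, Campbell. The Hadlock width computed from the rounded limits is 800.4 g, against the 800.5 g quoted in the Discussion, a difference consistent with rounding.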

Among the four formulas for estimating fetal weight by ultrasound, the highest intraclass correlation coefficients were generated with the Aoki and Shepard formulas (Table 3). Their intraclass correlation coefficients were identical, ie, 0.90 (95% CI 0.83, 0.94). The Hadlock and Campbell formulas yielded intraclass correlation coefficients of 0.84 (95% CI 0.73, 0.91) and 0.85 (95% CI 0.75, 0.91), respectively.

#### Discussion

Our study showed that fetal weight estimated by ultrasound using the Aoki, Campbell, Shepard, and Hadlock formulas is a valid estimate of actual weight. The credibility of our findings depends on the rigor of design, conduct, and analysis of our study. Our study fulfills the design and conduct criteria for avoiding bias in the assessment of validity in clinical measurements.^{16,18,21} In particular, we used an appropriate reference standard for fetal weight measurement, and the author who calculated the results of ultrasound estimation of fetal weight was masked to the measurement of newborn weight.

The high level of validity for estimating fetal weight obtained with the Aoki formula might be because it uses three fetal biometric variables (BPD, fetal abdominal area, and FL). The Shepard formula, with a similarly high intraclass correlation coefficient for validity, uses two fetal biometric measurements (BPD and AC). The use of only one fetal biometric parameter (AC) to estimate fetal weight by the Campbell formula could explain why its validity is lower than that of the Aoki and Shepard formulas. Although the Hadlock formula uses AC in addition to FL for estimating fetal weight, it appears that AC should be used in combination with BPD rather than FL to achieve a level of validity similar to that observed with the Aoki formula. We would therefore recommend the Aoki or Shepard formula as the formula of choice for calculating fetal weight from ultrasonically measured biometric variables.

Validity studies are also subject to bias in the choice of analytic methods. There is disagreement in the literature with respect to the most appropriate index of concordance for assessing validity for continuous or dimensional data.^{15} We have summarized our results using two approaches. The limits of agreement method by Bland and Altman^{19} has been advocated as a simple way to assess validity that is more easily understood by clinicians. Indeed, its importance has been hailed as the statistical counterpart of the discovery of the polymerase chain reaction.^{22} It overcomes the limitation of the conventional Pearson correlation coefficient by explicitly separating the systematic bias effect of the measurement method from random error.^{15} This distinction between systematic bias and random error, although desirable, can also pose a problem in the interpretation of results when the validity of several measurement methods is compared. For example, of the four formulas used for estimating fetal weight in this study, the largest mean difference between fetal weight estimated by ultrasound and actual birth weight was obtained with the Hadlock formula (190.7 g). Although this formula generated the largest systematic bias, the magnitude of its random error (ie, range of the limits of agreement) was smallest (800.5 g), with the exception of the Aoki formula. Therefore, the graphic presentation of the results without a simple numeric index can also make it more difficult to provide meaningful comparison of the validity of several measurement methods.^{16} Furthermore, this method of evaluating validity measures only absolute agreement, and the extent to which (dis)agreement is present against a background of true variability in the observations is not assessed numerically.^{15} In other words, the significance of the (dis)concordance between the two measurements is not evaluated against the diversity of the measurements routinely encountered in clinical practice.

The use of the intraclass correlation coefficient to assess validity overcomes all the preceding limitations; therefore, it was our preferred index of concordance for this measurement variability. It provided a quantitative assessment of the variability inherent in the four measurement methods against a background of dispersion seen in the observations. The results from this study should be extrapolated only to obstetric populations with characteristics that are similar to those in this study.^{15} The newborn birth weight ranged from 2080 to 4430 g in our study population. This range determines the limits of generalizability of our findings, and caution should therefore be used in judging the validity of ultrasonic estimation of fetal weight outside of those limits.