Bone age (BA) is a measure of an individual's skeletal maturity and indicates a child's growth potential. Other applications of BA include height prediction and age estimation when chronological age (CA) may be unknown, as in children seeking asylum in other countries and in competitive sports.
The two methods of BA assessment in common use are i) the atlas method [Greulich-Pyle (GP) and Gilsanz-Ratib (GR)] and ii) the epiphyseal scoring method [Tanner-Whitehouse 3 (TW-3) method]. The GP atlas was prepared from 1,000 Caucasian children of North European descent born between 1917 and 1942 in the USA. The GR atlas, a relatively new method developed in 2005, was produced by creating artificial, idealized, averaged sex- and age-specific images of skeletal development. The TW method was derived from British children in 1950 and was revised as the TW-3 method in 2001 (with the inclusion of children from different ethnic populations and the abolition of the 20-bone score). In contrast to the atlas methods, the TW method is not based on pattern matching; rather, it scores the maturity stage of the epiphyses of the long and short bones of the hand (13-bone method) or of these bones together with the carpal bones (20-bone method).
Although the GP atlas is the most commonly used method of BA estimation worldwide, it is based only on Caucasian children, and its data are over 7 decades old. Hence, there are concerns regarding its accuracy and reliability in other ethnic populations. A systematic review showed that the GP atlas could be imprecise and should be used with caution in Asian and African children. This makes the TW-3 method an attractive option for BA estimation in children of other ethnicities, such as Indian children, because Asian children were included during its development. However, only a handful of Indian studies have compared the GP atlas with the TW method, and these were based on the TW-2 method and performed several decades ago. In addition, most studies worldwide comparing methods of BA estimation have used correlation analysis. Correlation coefficients measure the strength of association between two variables, not the agreement between them: a method that reads every child exactly 1 year young still correlates perfectly with CA. Also, the wider the range of values analyzed (in our study, 2–16.5 years), the higher the correlation tends to be. The objectives of this study were thus 1) to assess the relationship between BA, as assessed by different methods (GP, GR, and TW-3), and CA in healthy Indian children, and to study differences between genders, and 2) to assess which of the three methods of BA assessment is most suitable for Indian children.
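The distinction between association and agreement can be illustrated with a short Python sketch. All ages below are synthetic, not study data. The first hypothetical method reads every child exactly 1 year young. It correlates perfectly with CA despite a 1-year RMS deviation. The second scatters ±0.5 years around CA. For it, restricting the age range from 2–16.5 to 8–10 years lowers r, although the scatter is unchanged.

```python
import numpy as np

# Synthetic chronological ages (years) spanning the study range, 2 to 16.5.
ca = np.arange(2.0, 17.0, 0.5)

# Case 1: a hypothetical method that reads every child exactly 1 year young.
ba_offset = ca - 1.0
r_offset = np.corrcoef(ca, ba_offset)[0, 1]               # association: perfect
rms_offset = np.sqrt(np.mean((ba_offset - ca) ** 2))      # agreement: off by 1 year

# Case 2: identical +/-0.5-year scatter around CA; only the age range differs.
noise = np.where(np.arange(ca.size) % 2 == 0, 0.5, -0.5)
ba_noisy = ca + noise
r_wide = np.corrcoef(ca, ba_noisy)[0, 1]
narrow = (ca >= 8.0) & (ca <= 10.0)
r_narrow = np.corrcoef(ca[narrow], ba_noisy[narrow])[0, 1]

print(f"1-year offset: r = {r_offset:.3f}, RMS deviation = {rms_offset:.2f} y")
print(f"same scatter:  r(2-16.5 y) = {r_wide:.3f}, r(8-10 y) = {r_narrow:.3f}")
```

Agreement-oriented summaries such as the RMS deviation expose the systematic offset that r cannot see.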
This was a cross-sectional, observational study (June–September 2018). Five schools and preschools were randomly selected from Pune district (western India) and approached for permission to carry out anthropometric measurements and BA X-rays. After approval from the institutional ethics committee and consent from the management of each school, written consent was obtained from the parent or caregiver of each child before study commencement.
All children in the schools approached were invited to participate. Children were eligible if they were physically and mentally healthy, their parents gave consent, and, for children older than 7 years, the child gave assent. Standing height was measured to the nearest millimeter using a portable stadiometer (Leicester Height Meter, Child Growth Foundation, UK), and weight was measured to the nearest 100 g using an electronic scale. Body mass index (BMI) was computed as weight in kilograms divided by the square of height in meters; height, weight, and BMI were converted to Z-scores. Children with a past history of chronic illness or a previous fracture of the left hand, and those whose height, weight, or BMI was above +2 or below −2 SD for age and gender, were excluded. Of the 1,016 children, 908 (477 boys and 431 girls) met the inclusion and exclusion criteria. As the TW-3 method generates Z-scores only for boys aged ≤16.5 years and girls aged ≤16 years, 57 older children were excluded. The study was therefore carried out on 851 children (438 boys, 413 girls), all of whom underwent X-rays for BA assessment.
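The BMI formula and the exclusion rules above can be sketched as follows. The sketch rests on several assumptions. Height, weight, and BMI Z-scores are taken as already computed against a growth reference, which is not specified here. A child is excluded if any one of the three Z-scores lies outside ±2. Sex is coded as "M"/"F". The `Child` record and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Child:
    sex: str              # "M" or "F" (assumed coding)
    age_years: float
    weight_kg: float
    height_cm: float
    # Z-scores assumed precomputed against an external growth reference.
    height_z: float
    weight_z: float
    bmi_z: float
    chronic_illness: bool = False
    left_hand_fracture: bool = False

def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) divided by the square of height (m)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

def meets_criteria(child: Child) -> bool:
    """Exclude chronic illness, left-hand fracture, or any Z-score outside +/-2."""
    if child.chronic_illness or child.left_hand_fracture:
        return False
    return all(-2.0 <= z <= 2.0 for z in (child.height_z, child.weight_z, child.bmi_z))

def within_tw3_range(child: Child) -> bool:
    """TW-3 Z-scores are available for boys <= 16.5 y and girls <= 16 y."""
    limit = 16.5 if child.sex == "M" else 16.0
    return child.age_years <= limit

child = Child("F", 10.2, 30.0, 125.0, height_z=-0.4, weight_z=0.1, bmi_z=0.5)
print(round(bmi(child.weight_kg, child.height_cm), 1),
      meets_criteria(child), within_tw3_range(child))
```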
CA was computed as the difference between the date of the X-ray and the date of birth. The 851 children were then divided into prepubertal and pubertal groups for analysis. Girls aged 2–9 years were classified as prepubertal and those aged 9.1–18 years as pubertal; boys aged 2–11 years were classified as prepubertal and those aged 11.1–18 years as pubertal. We performed Preece-Baines growth modeling (PBGM) of the anthropometric data to evaluate the timing of the pubertal height spurt. The PBGM is a family of curves that conform to the shape of the human growth curve and is one of the most widely accepted models of human growth. The model is represented by the formula:
$$\text{Height} = H_1 - \frac{2(H_1 - H_0)}{e^{S_0(t-\theta)} + e^{S_1(t-\theta)}}$$
where t is age. The parameter H1 is the estimated adult height; S0 and S1 are rate constants; and H0 and θ are related to the height and age at the take-off of the adolescent growth spurt. The first and second derivatives of the equation are used to calculate the biological parameters, viz. the age at peak height velocity (PHV), the age at take-off (T0), the velocity at PHV, and the velocity at T0. Using this model, it is possible to confirm the timing of the pubertal height spurt in any given population, which serves as a proxy for Tanner stage 2 in girls and Tanner stage 3 in boys. In our study, the age of the pubertal spurt was 13.9 years in boys and 11.7 years in girls, which corresponds to the average timing of the onset of puberty in Indian children.
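The Preece-Baines model and the derivation of the take-off and PHV milestones can be sketched in Python. This sketch does not solve the second derivative for zero analytically. Instead, it evaluates the analytic first derivative (height velocity) on a fine age grid. It takes the velocity maximum as PHV and the preceding minimum as take-off, which locates the same stationary points numerically. The parameter values are hypothetical, chosen only to give a plausible curve, and are not those fitted in this study.

```python
import numpy as np

def pb1_height(t, h1, h0, s0, s1, theta):
    """Preece-Baines model: height (cm) at age t (years)."""
    u = t - theta
    return h1 - 2.0 * (h1 - h0) / (np.exp(s0 * u) + np.exp(s1 * u))

def pb1_velocity(t, h1, h0, s0, s1, theta):
    """Analytic first derivative of pb1_height, i.e. height velocity (cm/year)."""
    u = t - theta
    e0, e1 = np.exp(s0 * u), np.exp(s1 * u)
    return 2.0 * (h1 - h0) * (s0 * e0 + s1 * e1) / (e0 + e1) ** 2

def growth_milestones(params, t_min=2.0, t_max=18.0, step=0.001):
    """Grid search: PHV = velocity maximum; take-off (T0) = velocity minimum before PHV."""
    t = np.arange(t_min, t_max + step, step)
    v = pb1_velocity(t, **params)
    i_phv = int(np.argmax(v))
    i_t0 = int(np.argmin(v[:i_phv]))
    return {"age_T0": float(t[i_t0]), "vel_T0": float(v[i_t0]),
            "age_PHV": float(t[i_phv]), "vel_PHV": float(v[i_phv])}

# Hypothetical parameters for illustration only (not fitted to the study data).
boys = dict(h1=174.0, h0=162.0, s0=0.08, s1=1.2, theta=14.0)
print(growth_milestones(boys))
```

In practice, the five parameters would first be estimated by fitting the model to each child's or group's height data, for example by nonlinear least squares, before the milestones are read off.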
Bone age X-ray
Radiographs of the left hand and wrist were obtained in the posteroanterior (P-A) position. The elbow and wrist were on the same axis, the thumb was at an angle of 30°, and the fingers were neither too close together nor too widely spaced. The X-ray beam was centered over the head of the third metacarpal bone from a distance of 76 cm. All radiographic examinations were performed by the same technician using a portable X-ray generator (Philips, Eindhoven, Netherlands) operating at 50 mA, and digital images were generated by a standard computed radiography system (Fuji, Tokyo, Japan).
Bone age assessment
As this was a blinded study, all subject-identifying information except gender, including the subject's CA, was masked during BA estimation. Initially, 30 randomly selected X-rays, different from those used in the study, were analyzed by the three methods of BA assessment (GP atlas, GR atlas, and TW-3 method) to check for interobserver variation. TW-3 bone age was derived from the Radius-Ulna-Short bone (RUS) score of 13 bones. These assessments were carried out independently by four pediatric endocrinologists (VK, NL, HP, and PP) trained in methods of BA estimation. To assess intraobserver variation, the same four observers independently re-evaluated the 30 radiographs by the three methods after a 4-week interval. Statistically significant interobserver (disagreement among the four observers) or intraobserver differences were checked for clinical significance. Differences within the range of SDs given for each method of BA assessment were considered clinically nonsignificant, and differences outside this range were considered clinically significant. As there were no clinically significant interobserver or intraobserver variations, all 851 X-rays (blinded except for gender) were analyzed independently by the four observers using all three methods.
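The observer-variation checks described above can be sketched as follows. The sketch assumes that readings are in months and that "clinically significant" means the absolute mean difference exceeds the SD given by the atlas for that age. SciPy's `ttest_rel` and `f_oneway` stand in for the SPSS procedures actually used. The function names and readings are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_rel

def intraobserver_check(first_read, second_read, atlas_sd_months):
    """Paired t-test between two sessions, plus the clinical-significance rule."""
    first = np.asarray(first_read, dtype=float)
    second = np.asarray(second_read, dtype=float)
    p_value = float(ttest_rel(first, second).pvalue)
    mean_diff = float(np.mean(second - first))          # months
    clinically_significant = abs(mean_diff) > atlas_sd_months
    return mean_diff, p_value, clinically_significant

def interobserver_p(*observer_reads):
    """One-way ANOVA across observers (each argument: one observer's readings)."""
    return float(f_oneway(*observer_reads).pvalue)

# Hypothetical readings (months) of four radiographs read 4 weeks apart.
diff, p, clin = intraobserver_check([100, 110, 120, 130], [103, 114, 123, 134],
                                    atlas_sd_months=9.0)
print(f"mean difference {diff:.1f} mo, P = {p:.4f}, clinically significant: {clin}")
```

As in the study, a small but consistent shift can be statistically significant while remaining well inside the atlas SD.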
For statistical analysis, the final BA for each child by each method was the mean of the readings of all four observers for that child and method. These mean bone ages were then converted into Z-scores. For the GP and GR atlases, Z-scores were calculated using the mean ages and standard deviations given in each atlas. For the TW-3 method, Z-scores were calculated from the CA, gender, and RUS score by the software provided with the atlas.
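A minimal sketch shows how CA, the consensus BA, and a BA Z-score might be computed for the atlas methods. Two choices here are assumptions made for illustration. One is the Z-score form (BA − CA)/SD, with the SD read from the atlas for the child's age and sex. The other is the use of 365.25-day years. The TW-3 Z-scores in the study came from the software supplied with the atlas, which this sketch does not reproduce.

```python
from datetime import date
from statistics import mean

def chronological_age_years(dob: date, xray_date: date) -> float:
    """CA = date of X-ray minus date of birth, in years of 365.25 days (assumed)."""
    return (xray_date - dob).days / 365.25

def consensus_bone_age(readings_years):
    """Final BA for one child and one method: mean of the four observers' readings."""
    return mean(readings_years)

def ba_z_score(ba_years, ca_years, atlas_sd_years):
    """Assumed form of the atlas Z-score: (BA - CA) / SD for the child's age and sex."""
    return (ba_years - ca_years) / atlas_sd_years

ca = chronological_age_years(date(2010, 1, 1), date(2018, 7, 2))
ba = consensus_bone_age([8.0, 8.5, 8.25, 8.25])   # hypothetical readings (years)
print(round(ca, 2), ba, round(ba_z_score(ba, 9.0, 0.75), 2))
```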
Statistical analyses were carried out using SPSS (v. 26, Chicago, IL, USA). Outcome variables were tested for normality before analysis. Interobserver variation was assessed using one-way ANOVA and intraobserver variation using the paired-sample t-test. Correlations between CA and BA measured by each method were analyzed using Pearson correlation coefficients; coefficients of 0.7 to 1 were defined as strong, 0.4 to 0.7 as moderate, and <0.4 as weak. To decide which method of BA assessment was most suitable in our cohort, tests of proportions and root mean square (RMS) deviations were computed. Z-scores of the difference between BA and CA in the range −1 to +1 were defined as within normal limits, and values above +1 or below −1 as outside normal limits (4, 5). The proportion of children within normal limits was then calculated for each of the three methods, and differences between these proportions were tested using the related-samples Cochran's Q test. The method with the highest proportion of children classified within normal limits was considered the best fit. RMS deviations of the differences between BA and CA were also calculated to confirm the results of the test of proportions; the method with the least RMS deviation was considered the most suitable of the three. P values <0.05 were considered statistically significant.
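The suitability comparison above can be sketched in Python. Each child's BA–CA Z-score is flagged as within or outside ±1, and the per-method proportions are computed. These are then tested with Cochran's Q, implemented from its textbook formula with a chi-square reference distribution on k − 1 degrees of freedom. Finally, RMS deviations are computed. This is an illustrative reimplementation, not the SPSS procedure used in the study, and the Z-scores are invented.

```python
import numpy as np
from scipy.stats import chi2

def within_normal_limits(z):
    """1 where the BA-CA Z-score lies in [-1, +1], else 0."""
    return (np.abs(np.asarray(z, dtype=float)) <= 1.0).astype(int)

def cochrans_q(binary):
    """Cochran's Q for an (n_children x k_methods) matrix of 0/1 outcomes."""
    x = np.asarray(binary, dtype=int)
    k = x.shape[1]
    col_totals = x.sum(axis=0)      # children within limits, per method
    row_totals = x.sum(axis=1)      # methods within limits, per child
    grand = x.sum()
    denom = k * grand - np.sum(row_totals ** 2)
    if denom == 0:
        raise ValueError("Q is undefined when every child has the same outcome under all methods")
    q = (k - 1) * (k * np.sum(col_totals ** 2) - grand ** 2) / denom
    return float(q), float(chi2.sf(q, df=k - 1))

def rms_deviation(ba_years, ca_years):
    """Root mean square of the BA - CA differences (years)."""
    d = np.asarray(ba_years, dtype=float) - np.asarray(ca_years, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

# Invented Z-scores for six children; columns are GP, GR, TW-3.
z = np.array([[-1.5,  0.8, -0.4],
              [-1.8, -1.3,  0.2],
              [-0.9,  0.5,  0.1],
              [-2.1, -1.1, -0.6],
              [-1.2, -0.7,  0.4],
              [ 1.6,  1.4, -0.9]])
ok = within_normal_limits(z)
print("proportion within limits:", ok.mean(axis=0))
print("Cochran's Q, P:", cochrans_q(ok))
```

Cochran's Q is used rather than separate chi-square tests because each child contributes one outcome per method, so the three proportions are paired, not independent.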
Initially, the bone ages of 30 children's X-rays (15 boys and 15 girls), as assessed by all four observers using the three methods of BA assessment, were analyzed for interobserver and intraobserver variation. Although there were statistically significant intraobserver differences for the GP, GR, and TW-3 methods, none of these were clinically significant as judged by the SDs given in the methods. For example, for the GP method, VK and PP showed differences of 3.4 and 4.1 months, respectively. Though statistically significant, these were below the clinically significant thresholds of 9.0 and 10.7 months, respectively, given by the GP atlas. There were no statistically or clinically significant interobserver differences for the GP atlas, GR atlas, or TW-3 method (P > 0.1). As there were no clinically significant interobserver or intraobserver differences, the mean BA of the four observers for each method was used for further analysis.
Data on the 851 children (438 boys, 413 girls) aged 2–16.5 years, including mean CA, anthropometric parameters, and BA assessed by the different methods (GP atlas, GR atlas, and TW-3 method), are presented in Tables 1 and 2.
The correlations between CA and BA for each method of assessment by age group and gender are described in Table 3. Correlations between CA and BA were statistically significant for all methods (GP atlas, GR atlas, and TW-3 method) (P < 0.05), both overall and within each gender and age group [Table 3].
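The age-based pubertal grouping from the Methods and the correlation-strength labels can be sketched as follows. The cutoffs treat girls older than 9 years and boys older than 11 years as pubertal. How ages between 9.0 and 9.1 (or 11.0 and 11.1) years were handled is not stated, so this boundary choice is an assumption. Assigning r = 0.7 to "strong" and r = 0.4 to "moderate" is likewise assumed. SciPy's `pearsonr` supplies the coefficient and its P value, and the data are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def puberty_group(sex, ca_years):
    """Age-based grouping: pubertal if girls > 9 y or boys > 11 y (boundary assumed)."""
    cutoff = 9.0 if sex == "F" else 11.0
    return "pubertal" if ca_years > cutoff else "prepubertal"

def correlation_strength(r):
    """Strong: >= 0.7; moderate: 0.4 to < 0.7; weak: < 0.4 (ties assigned upward)."""
    magnitude = abs(r)
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    return "weak"

def ca_ba_correlation(ca_years, ba_years):
    """Pearson r, its two-sided P value, and the strength label."""
    r, p = pearsonr(ca_years, ba_years)
    return float(r), float(p), correlation_strength(r)

# Hypothetical CA and BA (years) for four girls.
ca = np.array([2.0, 4.0, 6.0, 8.0])
ba = np.array([1.5, 4.2, 5.8, 8.1])
print([puberty_group("F", a) for a in ca], ca_ba_correlation(ca, ba))
```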
We compared, across methods, the proportions of children whose BA–CA difference was classified within normal limits (BA within ±1 SD of CA). On this measure, the TW-3 method was the most suitable overall (P < 0.05) [Figure 1]. Similarly, the TW-3 method had the highest proportion within normal limits in both boys (P = 0.08) and girls (P < 0.05) [Figure 1]. On dividing the groups by pubertal status, the TW-3 method was again the most applicable in prepubertal boys (P < 0.05), prepubertal girls (P > 0.1), and pubertal girls (P < 0.05) [Figure 2]. However, in pubertal boys, the GR atlas was the most suitable (P < 0.05). RMS deviations of the differences between CA and BA for each method showed a trend similar to that of the test of proportions. The TW-3 method had the least RMS deviation [Table 4], suggesting that it was possibly the most suitable method for BA assessment of the children in our study.
Figures 3 and 4 summarize the relationship between BA and CA for the three methods of assessment. In boys up to 9 years of age, BA estimated by the TW-3 method was closest to CA (maximum delay, 6 months). However, in the pubertal years, GR atlas bone ages were closest to CA, whereas the GP atlas and TW-3 method underestimated BA. In contrast, in girls, TW-3 bone ages were close to CA irrespective of pubertal status. Interestingly, in girls, the GP and GR atlases underestimated BA until 12 years of age and overestimated it thereafter.
Our cross-sectional, observational study in healthy Indian children shows that, of the methods used for BA assessment, the TW-3 method was the most suitable overall. This holds for girls irrespective of pubertal status and for prepubertal boys. However, for boys in puberty, the GR method was more suitable; BA assessment using the TW-3 method was less precise in this group and needs to be used with caution. Also, irrespective of the method used, BA was underestimated in both boys and girls until the pubertal growth spurt. In girls, after the pubertal growth spurt, BA advanced rapidly and was overestimated by all methods.
The GP method, the most commonly used method worldwide, is based on Caucasian children studied in the early 20th century. Thus, applying the GP method regardless of a child's ethnicity may lead to errors and imprecision in estimation. Similar imprecision of GP-based BA estimation in Indian children and in children of other ethnicities has been documented earlier.[717-24] In our study, the GP atlas underestimated BA in boys irrespective of pubertal status and in girls up to 12 years of age, and overestimated BA in girls after 12 years. Similar findings of BA being underestimated in Korean, French, and Indian children have been documented in studies by Kim et al., Zabet et al., and Patel et al., respectively.
Many studies worldwide have compared different methods of BA estimation. Studies using correlation coefficients (which measure only the strength of association, not agreement, between methods) by Kim et al., Lin et al., and Kim et al. showed no differences between various BA methods. Similarly, in our study, the correlations between CA and BA were very strong for all three methods. However, by the test of proportions (the number of X-rays in which estimated BA was within normal limits, i.e., ±1 SD of CA) and by RMS deviations, the TW-3 method was the most suitable (except in pubertal boys). Similar findings of the TW method being more suitable than the GP method were reported by Bull et al., Pinchi et al., and Buken et al. There are two main reasons for the greater accuracy of TW-3 compared with the other methods. First, it rests on a mathematical scoring system rather than on pattern matching (as in the GP and GR methods). Second, multiethnic populations (Europeans, Americans, and Asians) were used in its development. Furthermore, the TW method has been updated over time (TW-3 is currently in use) and therefore reflects secular trends, making it more applicable. Also, the TW-3 method gives BA at 1-month intervals, whereas the intervals between standards in the GP and GR atlases range from 3 months to 1 year depending on the age of the child.
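The difference in resolution between scoring and atlas matching can be illustrated with a short sketch. It sums 13 hypothetical bone-stage scores, maps the total to an age by linear interpolation, and reports the result in whole months. It then snaps the same age to the nearest standard of a hypothetical atlas with 1-year spacing. The score-to-age table and the interpolation step are invented for illustration; they are not the published TW-3 RUS table or its conversion procedure.

```python
import numpy as np

# HYPOTHETICAL score-to-age table for illustration; NOT the published TW-3 RUS table.
RUS_SCORE_POINTS = np.array([0.0, 100.0, 300.0, 600.0, 1000.0])
BONE_AGE_POINTS = np.array([2.0, 5.0, 9.0, 13.0, 16.5])   # years

def rus_bone_age_months(stage_scores):
    """Sum 13 per-bone maturity scores and interpolate the total to bone age (months)."""
    if len(stage_scores) != 13:
        raise ValueError("the RUS system scores 13 bones")
    total = float(sum(stage_scores))
    age_years = float(np.interp(total, RUS_SCORE_POINTS, BONE_AGE_POINTS))
    return total, round(age_years * 12)

def nearest_atlas_standard(age_years, standards_years):
    """Atlas-style reading: snap to the closest available standard plate."""
    standards = np.asarray(standards_years, dtype=float)
    return float(standards[np.argmin(np.abs(standards - age_years))])

total, months = rus_bone_age_months([20] * 13)            # hypothetical stage scores
print(total, months, nearest_atlas_standard(months / 12, [7.0, 8.0, 9.0]))
```

With these invented numbers, the scoring route reports 98 months. The atlas-style reading can only return 8.0 years, 96 months, because no plate lies between 8 and 9 years.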
As the GR atlas is a relatively new method of BA estimation, only a handful of studies have assessed its applicability, and there are no studies in Indian children. In the study by Schmidt et al., the GR atlas overestimated BA in 14- to 18-year-old girls by up to 0.6 years and underestimated BA in 14- to 18-year-old boys by up to 0.5 years. In our children older than 13 years, BA was overestimated by up to 0.7 years in girls and underestimated by up to 0.6 years in boys. Interestingly, we found that the GR method was less accurate than the GP and TW-3 methods in the prepubertal years, but its accuracy improved in puberty.
There is a paucity of studies on Indian children using the TW method; those available were performed a few decades ago and used the TW-2 method. Prakash et al. assessed Indian children using the TW-2 method. They found that RUS scores of 6- to 7-year-old and 13- to 14-year-old boys were advanced, whereas RUS scores of 8- to 12-year-old boys were on the 50th centile. In the same study, girls' RUS maturity scores were slightly advanced except at 6 years. To our knowledge, ours is the first study using the TW-3 method to estimate BA in Indian children.
Interestingly, in our study, BA in boys was delayed relative to CA by all methods (GP much more so than TW-3), except in a few age groups in which the GR method was mildly advanced. In Indian girls, on the other hand, BA was underestimated by all methods until about 12 years (although less so than in boys) and advanced rapidly after 12 years. This pattern of skeletal development is unusual and differs markedly from that of other Asian populations. In Chinese children, the GP and GR methods underestimated BA in 5- to 10-year-old boys and overestimated it after 12 years. In Chinese girls older than 5 years, BA was overestimated (less so than in Chinese boys). In Thai children (8–16 years), BA was assessed by the GP and TW-3 methods. Boys' BA was underestimated until 12 years of age and overestimated thereafter, while girls' BA was overestimated at all ages. Given these differences in BA estimation across Asian populations, country-specific standards may be more useful in determining BA.
The strengths of our study include the large number of children analyzed and the inclusion of children at both extremes of age (preschool and adolescent). To our knowledge, this is the first study assessing the applicability and suitability of the GR and TW-3 methods of BA assessment in Indian children. It is also the first Indian study comparing BA in Indian children using three different methods (GP, GR, and TW-3). Finally, our study used statistical methods better suited to comparing methods of BA assessment (test of proportions and RMS deviation) than the correlation coefficients used by most studies.
Our study is limited by the fact that the children were from a single center, and these findings may vary across different parts of the country. Another limitation is the absence of sexual maturity data in our cohort. We tried to overcome this by using Preece-Baines modeling to evaluate the timing of the pubertal height spurt (which correlates with sexual maturity). This age-based presumption of pubertal status has its own limitations, as any given population includes children with early and late puberty, which may affect bone age. Hence, the results of this study based on the age-based classification of pubertal status should be interpreted with caution and need to be validated by future studies.
In conclusion, BA is underestimated in Indian boys irrespective of the method used. In Indian girls, BA is underestimated until the pubertal growth spurt, after which it advances rapidly. Among the three methods of BA assessment (Greulich-Pyle, Gilsanz-Ratib, and Tanner-Whitehouse 3), bone ages estimated by the TW-3 method were closest to CA in our cohort (except in pubertal boys). Hence, it seems reasonable to recommend the TW-3 method for BA estimation in Indian girls of all ages. Although the TW-3 method may be used in younger Indian boys, it should be used with caution in older boys, at least until an Indian standard bone age atlas is developed.
Declaration of patient consent
The authors certify that they have obtained all appropriate patient consent forms. In the form the patient(s) has/have given his/her/their consent for his/her/their images and other clinical information to be reported in the journal. The patients understand that their names and initials will not be published and due efforts will be made to conceal their identity, but anonymity cannot be guaranteed.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.