
APPLIED SCIENCES: Physical Fitness and Performance

Validity and Reliability of Activity Measures in African-American Girls for GEMS


Medicine & Science in Sports & Exercise: March 2003 - Volume 35 - Issue 3 - p 532-539
doi: 10.1249/01.MSS.0000053702.03884.3F


Methods to measure physical activity or energy spent on physical activity include direct observation, self-report, activity monitoring by motion sensors, heart rate monitoring, indirect and direct calorimetry, and doubly labeled water. All of these techniques offer advantages and disadvantages, and the choice of an instrument is dependent on its purpose, participant characteristics, study design, and resources (3,8). Most challenging is the assessment of physical activity in children 10 yr of age or younger.

The Girls health Enrichment Multi-site Studies (GEMS) is a collaborative two-phase research project to develop (phase 1) and test (phase 2) several obesity prevention interventions for 8- to 10-yr-old African-American (AA) girls (11). Three-month pilot studies were planned for phase 1 and full-scale trials for phase 2. All centers planned to use common measures to assess physical activity. These measures ideally consist of a self-report measure to include information on specific physical activities and an electronic monitor or accelerometer to obtain an objective measure of overall activity level. A validation study was designed during phase 1 to assist in the choice of physical activity measures suitable for the GEMS target population. The decision was to be based on reliability, validity, logistics, and cost. Three field centers, a coordinating center, and the NHLBI project office participated in the validation study.

An electronic accelerometer was used as the criterion measure against which all instruments could be compared for this validation study. Accelerometers, such as the MTI/CSA (Manufacturing Technologies, Inc., Fort Walton Beach, FL), have demonstrated good reliability (r = 0.3–0.5) and high validity compared with heart rate and energy expenditure by calorimetry (r = 0.6–0.9) (4,5,16), have the ability to detect different intensities of physical activity, and can distinguish activity by time of day. The use of pedometers as an indicator of physical activity was also investigated because of their low cost (13) and the high reported correlation between the pedometer and direct observation (r = 0.95) among 12-yr-old children (6). Of interest was whether the pedometer could provide a good objective measure of physical activity in lieu of an accelerometer.

The Self-Administered Physical Activity Checklist (SAPAC) (12) and Activitygram (19) were two self-report measures of children’s levels and types of activity. The SAPAC, a paper and pencil checklist of activities assessing the previous day’s activities, had been previously validated among older children in comparison with the Caltrac accelerometer (r = 0.32) (12). The SAPAC was modified and renamed the GEMS Activity Questionnaire (GAQ). The Activitygram, developed by the Cooper Institute for Aerobics Research, (version 6.0, Human Kinetics, Champaign, IL), was based conceptually on the Previous Day Physical Activity Recall (PDPAR) (18). It is a computer software tool that assesses 3 d of activity and had been preliminarily validated.

The purpose of this article is to report the reliability (which reflects both instrument reliability and intraindividual variability) and the validity (using the MTI/CSA as a criterion standard) of the physical activity instruments under consideration by the GEMS investigators for use in young AA girls. An additional objective is to determine the optimal number of days for MTI/CSA data collection.



In the spring and summer of 2000, 85 girls were recruited from urban areas of Houston, Minneapolis/St. Paul, and Memphis to participate in the validation study. The eligibility requirements were chosen to select girls who would be broadly representative of girls to be randomized in the planned GEMS pilot studies and subsequent full-scale trials, specifically healthy AA girls 8–9 yr of age. The girls were recruited from schools and community centers by using fliers and newspaper ads. Girls were excluded from the validation study if they were taking medications or had medical illnesses affecting growth, had conditions limiting participation in physical activity, had conditions limiting participation in measurements, or had problems likely to reduce adherence to the study protocol.

Parents provided written informed consent, and girls gave assent to participate in the validation study. Consent forms were approved by each participating institution’s Human Subject Review Board.

Study Design

The study was designed to compare the criterion method, the MTI/CSA, against a pedometer and two self-report methods, the Activitygram and GAQ. The study entailed 6 d of the girls’ involvement (Table 1). Activity monitoring data were collected for four consecutive days, referred to as days 1, 2, 3, and 4. Girls came to the research centers the day before the start of activity monitoring, referred to as clinic visit 1, for study orientation, completion of the first GAQ questionnaire, placement of the pedometer (Digiwalker, SW200, Yamax Co., Yamasa Corp., Tokyo, Japan) and MTI/CSA monitor (Model 7164WAM, Manufacturing Technologies, Inc., MTI), and instructions for the activity log. The day after the 4-d data collection period, the girls returned to the research centers to return the two monitors and complete the Activitygram and second GAQ questionnaire, in alternating order. For the majority of girls (N = 51), activity monitoring data were collected on Friday–Monday. For the remaining 17 girls, activity monitoring data were collected on Sunday–Wednesday. For the GAQ questionnaire, all girls always recalled a weekday for “yesterday’s activities,” i.e., Thursday and Monday for the majority of girls, and a Friday and a Monday for the minority. By this design, the data collection periods for the MTI/CSA, pedometer, and Activitygram were comparable within each girl, and the “yesterday” recalled in the GAQ questionnaire was always a weekday. Therefore, days of data collection for methods comparison were controlled for within a child.

Table 1. Design of validation study.

Demographic and Anthropometric Measurements

Age, ethnicity, household education, and income were obtained from the parent by questionnaire. Following a standardized protocol, GEMS staff measured the girl’s weight twice to the nearest 0.1 kg on an electronic scale (Seca, Model 770, Hamburg, Germany). Height was measured twice, to the nearest 0.1 cm using a portable stadiometer (Shorr Height Measuring Board, Olney, MD).

Activity Monitors

The pedometer and MTI/CSA were attached to a belt that was worn around the child’s midline above the hip, pedometer on the left, and MTI/CSA on the right. Girls were instructed to wear both of the monitors all the time, even when sleeping, and to remove them only if the monitors would get completely wet, such as when showering or swimming. The log was used to record activities for the Activitygram, the times when the MTI/CSA and pedometers were taken off, reasons for removal, and the pedometer count each night. Verbal and written instructions for each instrument were given. GEMS staff called the girl or her caregiver at least once during the 4 d to make sure she was wearing the monitors and to answer any questions.

MTI/CSA accelerometer.

The MTI/CSA monitor was programmed to begin at midnight of the day of the first clinic visit and continue for four complete days, ending at midnight. The monitor was set to accumulate the number of counts in 1-min epochs. A custom-designed Excel macro was used to download and process the data. Data were excluded from analysis if the monitor was taken off; epochs were coded manually (and checked by the computer program) as “off” if, during waking hours, 20 or more consecutive minutes contained only zeros. Based on previous experience, consecutive zeros for >20 min are not observed in an awake child wearing an accelerometer. Data were processed to identify four time periods of the day (midnight–6 a.m., 6 a.m.–noon, noon–6 p.m., and 6 p.m.–midnight). Counts·min−1 were aggregated into the four time periods within each day, into a total for that day, and into a grand total over the 4 d.
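The processing rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Excel macro: the function names are hypothetical, the rule is applied to the whole day rather than to waking hours only, and expressing each period as counts per worn minute is an assumption about how the totals were normalized.

```python
from typing import Dict, List

MIN_ZERO_RUN = 20     # consecutive zero minutes treated as "monitor off" (from the text)
PERIOD_MINUTES = 360  # four 6-hour periods per day, starting at midnight


def flag_nonwear(counts: List[int], min_run: int = MIN_ZERO_RUN) -> List[bool]:
    """Mark every minute that belongs to a run of at least `min_run` zeros."""
    off = [False] * len(counts)
    i = 0
    while i < len(counts):
        if counts[i] != 0:
            i += 1
            continue
        j = i
        while j < len(counts) and counts[j] == 0:
            j += 1
        if j - i >= min_run:
            off[i:j] = [True] * (j - i)
        i = j
    return off


def summarize_day(counts: List[int]) -> Dict[str, object]:
    """Summarize one day of 1440 one-minute epochs that start at midnight."""
    off = flag_nonwear(counts)
    totals = [0, 0, 0, 0]
    worn = [0, 0, 0, 0]
    for minute, (count, is_off) in enumerate(zip(counts, off)):
        if is_off:
            continue  # excluded: monitor assumed to be off
        period = minute // PERIOD_MINUTES
        totals[period] += count
        worn[period] += 1
    # Assumption: each summary is expressed as counts per worn minute.
    period_cpm = [t / w if w else None for t, w in zip(totals, worn)]
    worn_minutes = sum(worn)
    day_cpm = sum(totals) / worn_minutes if worn_minutes else None
    return {"period_cpm": period_cpm, "worn_minutes": worn_minutes, "day_cpm": day_cpm}


# Made-up example day: constant activity with two stretches of zeros.
day = [5] * 1440
day[100:125] = [0] * 25   # 25 zero minutes: treated as monitor off
day[400:410] = [0] * 10   # 10 zero minutes: kept as worn, inactive time
summary = summarize_day(day)
print(summary["worn_minutes"], round(summary["day_cpm"], 2))
```

Because the girls were asked to wear the monitors while sleeping, a real implementation would need to restrict the zero-run rule to waking hours, as the text specifies; the sketch leaves that step to the caller.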


Each night, before going to bed, the girl (with the help of her parent or caregiver) was asked to record the pedometer count and then to reset the pedometer to zero. The average number of steps per minute for each of the 4 d was computed, defined as number of steps for the day divided by the number of minutes the monitor was worn. Also, the grand average number of steps per minute across the 4 d was computed.
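As a small worked illustration of this computation, the following Python sketch derives daily steps per minute and a grand average. The function names and the example numbers are hypothetical, and taking the grand average as the mean of the daily rates (rather than total steps divided by total minutes) is an assumption, since the text does not say which was used.

```python
from typing import List, Optional


def daily_step_rates(steps: List[int], minutes_worn: List[int]) -> List[Optional[float]]:
    """Steps per minute for each day; None when no wear time was recorded."""
    return [s / m if m > 0 else None for s, m in zip(steps, minutes_worn)]


def grand_average(rates: List[Optional[float]]) -> Optional[float]:
    """Mean of the available daily rates (assumed definition of the grand average)."""
    valid = [r for r in rates if r is not None]
    return sum(valid) / len(valid) if valid else None


# Made-up values for four days of monitoring.
rates = daily_step_rates([12000, 9000, 15000, 6000], [1200, 1000, 1250, 1000])
print(rates, grand_average(rates))
```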

Self-Report Measures


At the first visit, each girl was given an activity log to record her physical activity during the 4 d she wore the activity monitors. The activity log was designed to look like a simplified version of the Activitygram software. Upon returning to the field center 5 d later, the girl used the last 3 d of activities she had recorded in the log as a prompt while completing the Activitygram.

The Activitygram software prompted the girl to recall her physical activities, or rest, for each 30-min interval beginning at 7 a.m. and ending at 10:30 p.m. For each activity, the girl chose an intensity level that best described how the activity felt; i.e., “easy” (light), “not too tiring” (moderate), or “very tiring” (vigorous). Intensity levels were coded 1–3, with rest coded as 0. The girl also chose the duration of the activity as either “some of the time” (coded as 15 min) or “all of the time” (coded as 30 min). Multiplying the score by the number of minutes in any interval gave “intensity-minutes” for that interval. For each day, the number of minutes in each intensity level was calculated, and the intensity-minutes score was aggregated to three time periods per day (7 a.m.–12 noon, after 12 noon–6 p.m., and after 6 p.m.–10:30 p.m.), for a total for that day, and a grand total over the 3 d.
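The scoring just described can be sketched as follows. This is a hedged illustration rather than the Activitygram software itself: the entry format, the function names, and the period labels are hypothetical, and rest intervals are assumed to carry a duration like any other entry.

```python
from typing import Dict, List, Tuple

INTENSITY_CODE = {"rest": 0, "easy": 1, "not too tiring": 2, "very tiring": 3}
DURATION_MINUTES = {"some of the time": 15, "all of the time": 30}

# One entry per 30-min interval: (start minute since midnight, intensity, duration).
Entry = Tuple[int, str, str]


def period_of(start_minute: int) -> str:
    """Assign an interval to one of the three reporting periods by its start time."""
    if start_minute < 12 * 60:
        return "morning"    # 7 a.m.-12 noon
    if start_minute < 18 * 60:
        return "afternoon"  # after 12 noon-6 p.m.
    return "evening"        # after 6 p.m.-10:30 p.m.


def score_day(entries: List[Entry]) -> Dict[str, object]:
    """Intensity-minutes per period, the day total, and minutes per intensity level."""
    by_period = {"morning": 0, "afternoon": 0, "evening": 0}
    minutes_by_level = {0: 0, 1: 0, 2: 0, 3: 0}
    for start, intensity, duration in entries:
        level = INTENSITY_CODE[intensity]
        minutes = DURATION_MINUTES[duration]
        by_period[period_of(start)] += level * minutes
        minutes_by_level[level] += minutes
    return {
        "by_period": by_period,
        "total": sum(by_period.values()),
        "minutes_by_level": minutes_by_level,
    }


# Made-up entries for a single day.
entries = [
    (7 * 60, "easy", "all of the time"),
    (9 * 60, "rest", "all of the time"),
    (12 * 60, "very tiring", "some of the time"),
    (18 * 60 + 30, "not too tiring", "all of the time"),
]
result = score_day(entries)
print(result["by_period"], result["total"])
```

The 3-d grand total would then be the sum of the three daily totals.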


The GAQ was completed at each of the two clinic visits, 5 d apart. The checklist listed 28 activities typically performed by AA girls, along with pictures of the activities (bicycling, exercise, climbing on playground equipment, basketball, baseball/softball, football, soccer, volleyball, racket sports, ball playing, games, outdoor play, water play, swimming laps, jump rope, dance, outdoor chores, indoor chores, walking and running, walking, running, gymnastics, skates, hiking, weight lifting, martial arts, yoga, and cheerleading). For each activity, girls were asked to check off whether they had engaged in that activity yesterday, and duration was ascertained by three categories (see below). They also were asked whether they usually take part in the activity, and frequency was ascertained by three categories (see below). The last seven questions of the GAQ asked about sedentary activities (TV or video watching, computer or video games, arts and crafts, board games, homework or reading, talking on phone or hanging out, listening to music, or playing an instrument) performed yesterday and usually, both having five categories of duration.

A total physical activity score was estimated for the 28 physical activities performed yesterday, applying the code 0 for the response “none,” 1 for the response “less than 15 min,” and 10 for the response “15 min or more.” Analogously, a total activity score for usual activities was based on frequency of physical activity performed. The scoring was 0 for the response “none,” 1 for the response “a little,” and 10 for the response “a lot.” This scoring system, designated as “0,” “1,” and “10,” gave a girl more credit if she recorded a few activities “a lot” than if she recorded many activities “a little.” The scoring helps differentiate very active from less active girls. Scores were weighted according to intensity level of the activity using appropriate MET values for children for each of the 28 physical activities (7). A MET-weighted average was computed as Σ(METk × Scorek) ÷ ΣMETk, where k indexes the activities. For the seven sedentary questions, codes were assigned as 0 for “none,” 0.25 for “less than 30 min,” 0.75 for “30 min–1 h,” 1.5 for “1–3 h,” and 2.5 for “more than 3 h.” The GAQ summary scores were computed as the total score divided by the number of nonmissing items.
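To make the 0–1–10 scoring and the MET weighting concrete, here is a minimal Python sketch. The function names are hypothetical, the MET values in the example are placeholders rather than the published values from reference 7, and skipping missing items in both the numerator and the denominator is an assumption about how missing responses were handled.

```python
from typing import Dict, Optional

YESTERDAY_CODE = {"none": 0, "less than 15 min": 1, "15 min or more": 10}
SEDENTARY_CODE = {
    "none": 0.0,
    "less than 30 min": 0.25,
    "30 min-1 h": 0.75,
    "1-3 h": 1.5,
    "more than 3 h": 2.5,
}


def met_weighted_score(responses: Dict[str, Optional[str]],
                       met: Dict[str, float]) -> Optional[float]:
    """MET-weighted average of the 0-1-10 codes over the nonmissing activities."""
    numerator = 0.0
    denominator = 0.0
    for activity, label in responses.items():
        if label is None:  # missing item: skipped (assumption)
            continue
        numerator += met[activity] * YESTERDAY_CODE[label]
        denominator += met[activity]
    return numerator / denominator if denominator else None


def sedentary_score(responses: Dict[str, Optional[str]]) -> Optional[float]:
    """Total sedentary code divided by the number of nonmissing items."""
    codes = [SEDENTARY_CODE[label] for label in responses.values() if label is not None]
    return sum(codes) / len(codes) if codes else None


# Placeholder MET values for three of the 28 activities (illustrative only).
met = {"walking": 3.0, "running": 8.0, "dance": 5.0}
answers = {"walking": "15 min or more", "running": "none", "dance": "less than 15 min"}
print(met_weighted_score(answers, met))
```

The same function would apply to the usual-activity responses after swapping in a code table for “none,” “a little,” and “a lot.”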

A physical activity score was calculated for all 28 activities. For yesterday’s activities in the GAQ, all girls recalled a weekday, because they were recalling yesterday. The score from the first GAQ administration (reflecting activity performed before wearing the MTI/CSA) and the second GAQ administration were examined, as well as the average of the two GAQ administrations.

Statistical Analysis

The order of administration of the GAQ and Activitygram was randomized to protect against possible bias from order effects. A randomization sequence was generated using a permuted block design with varying block size. At a significance level of 0.05 and 80% power, a sample of 68 allowed us to detect a Pearson correlation of 0.36 between instruments (10).

A priori, MTI/CSA data were considered complete if the girl completed 70% of the day (1000 min). These criteria excluded 15 girls. Each girl must have worn the MTI/CSA for at least 3 of the 4 d, and 1 of these days must have been the last day (day 4, the same day that the GAQ data reflected). All data from the pedometer were included. Summary scores on the GAQ and Activitygram were set to missing if 25% or more of the individual items were missing.
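These inclusion rules translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and computing the summary score as the mean of the nonmissing items is an assumption consistent with the scoring description.

```python
from typing import List, Optional

MIN_MINUTES_PER_DAY = 1000  # about 70% of a 1440-min day
MIN_COMPLETE_DAYS = 3       # of the 4 monitored days
MAX_MISSING_FRACTION = 0.25


def girl_usable(minutes_worn_by_day: List[int]) -> bool:
    """At least 3 complete days, one of which must be the last day (day 4)."""
    complete = [m >= MIN_MINUTES_PER_DAY for m in minutes_worn_by_day]
    return sum(complete) >= MIN_COMPLETE_DAYS and complete[-1]


def summary_or_missing(items: List[Optional[float]]) -> Optional[float]:
    """Mean of nonmissing items, or None when 25% or more of the items are missing."""
    missing = sum(1 for x in items if x is None)
    if not items or missing / len(items) >= MAX_MISSING_FRACTION:
        return None
    present = [x for x in items if x is not None]
    return sum(present) / len(present)


print(girl_usable([1200, 900, 1100, 1050]))   # three complete days including day 4
print(girl_usable([1200, 1100, 1050, 900]))   # day 4 incomplete
print(summary_or_missing([1, 2, 3, None, 5]))
```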

Characteristics of the sample were summarized using simple descriptive statistics. For reliability, Pearson correlation coefficients were calculated for the two GAQ administrations, and a paired t-test was performed to determine whether the two scores were different. For the MTI/CSA, pedometer, and Activitygram that had multiple days of measurement, consistency over time was assessed by calculating the intraclass correlation (ICC). Differences between the daily means were tested using repeated measures analysis of variance (PROC MIXED in SAS). Age and field center were included as adjustment variables in the analysis of variance model. If differences were found among the days, pairwise comparisons were performed to determine which days were different. Specifically, an ANCOVA-type model in PROC MIXED was developed. Then, an LSMEANS statement with the PDIFF option was used to test the pairwise comparisons among the days. Pearson correlation coefficients between the MTI/CSA and test methods were used to assess the validity of the instruments.
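For readers who want to see how an intraclass correlation of this kind can be obtained, the following Python sketch uses the one-way random-effects ANOVA estimator for balanced data (every girl measured on the same number of days). This is a swapped-in textbook estimator, not the authors' SAS procedure; the function name is hypothetical, and no covariate adjustment is made.

```python
from typing import List


def icc_oneway(data: List[List[float]]) -> float:
    """One-way random-effects ICC for balanced data (rows = persons, columns = days).

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), which estimates
    between-person variance / (between-person + within-person variance).
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need at least 2 persons and 2 days, with balanced rows")
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    # Mean squares between persons and within persons.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(data, means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Made-up data: identical values within each person give perfect consistency.
print(icc_oneway([[1, 1], [2, 2], [3, 3]]))
```

The estimate can be negative when within-person variation exceeds between-person variation, and it is undefined (division by zero) when every observation is identical.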

The ICC was computed (between-person variance/total variance), and the Spearman-Brown prophecy formula was applied to estimate an optimal number of repeated measures for the MTI/CSA. The optimality criterion was the number of repeated measures needed, or k, for which the between-person variance was at least 80% of the total variance. The total variance is defined as the within-person variance divided by k, plus the between-person variance. The formula used was 0.8 = (ICC·k)/[1 + ICC·(k − 1)], which was solved for k, the number of measures needed to obtain the desired reliability (9).
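The link between the variance definition and the Spearman-Brown formula, and the solution for k, can be written out as follows. The symbols \sigma_b^2 (between-person variance), \sigma_w^2 (within-person variance), and R (target reliability) are notation introduced here for illustration; they do not appear in the original text.

```latex
\begin{align*}
\mathrm{ICC} &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}, \\
R_k &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2 / k}
     = \frac{k \cdot \mathrm{ICC}}{1 + (k - 1)\,\mathrm{ICC}}, \\
R\,\bigl[1 + (k - 1)\,\mathrm{ICC}\bigr] &= k \cdot \mathrm{ICC}
  \quad\Longrightarrow\quad
  k = \frac{R\,(1 - \mathrm{ICC})}{\mathrm{ICC}\,(1 - R)} .
\end{align*}
```

With R = 0.8 this reduces to k = 4(1 − ICC)/ICC. For example, an ICC of 0.37 gives k ≈ 6.8, which rounds up to 7 d.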


Participant Characteristics

A total of 85 girls were enrolled, and 68 girls provided usable data for the analysis (see the Statistical Analysis section for the a priori rules, i.e., the requirement of 1000 min of recorded data). There were no significant differences in mean age, weight, height, body mass index (BMI), and household income between girls excluded from the analyses and the girls included. Variances did not differ significantly for age, weight, BMI, or household income, but they did differ for height. Households of included girls tended to have a higher educational level than those of excluded girls (P = 0.0028).

The mean ± SD for age, weight, height, and BMI of the girls were 9.0 ± 0.6 yr, 37.7 ± 10.7 kg, 139.1 ± 9.0 cm, and 19.4 ± 4.9 kg·m−2. The highest household education level attained was 10.5% high school graduate or GED or less, 20.9% technical school, 43.3% some college, and 25.4% college graduate or more. The annual household income was 17.9% less than $19,999; 47.8% between $20,000 and $39,999; 14.9% between $40,000 and $59,999; and 19.4% with $60,000 or more.

Reliability and Consistency


MTI/CSA counts per minute were highest in the afternoon, followed by the evening, and then the morning (Fig. 1).

Figure 1. Average MTI/CSA scores (counts·min−1) per day by time of day.

The ICC for average MTI/CSA counts·min−1 across 4 d was 0.37 (P < 0.0001). The average total MTI/CSA counts·min−1 were significantly different across the 4 d (P = 0.02, Table 2). Day 1 was significantly different from days 3 and 4, and day 2 was different from day 3 (all P < 0.05). For the majority of the girls (51 of the 68 girls or 75%), day 1 was a Friday and day 4 was a Monday. For only 7 girls, day 1 was a Saturday, and for 10 girls, day 4 was a Saturday.

Table 2. Reliability and summary statistics for the MTI/CSA, pedometer, and Activitygram.


The ICC for average pedometer steps·min−1 across 4 d was 0.08 (P = 0.094). The average (± SD for all 4 means) number of steps·min−1 across the 4 d was 8.2 ± 4.9 steps·min−1, and there was no difference across days (P = 0.63) (Table 2).


The amount of time reported spent on physical activity was highest for light, and lowest for vigorous activities (Fig. 2). The ICC for average Activitygram intensity-minutes across 3 d was 0.24 (P = 0.005). The intensity-minutes across days categorized by time periods are shown in Figure 3. Intensity-minutes were significantly different across the days (P = 0.02). Significantly more activity was observed on day 3 than day 4 (317 vs 217, P = 0.006), and higher activity was observed during the 12 noon–6 p.m. (435 ± 299) than both the 7 a.m.–12 noon (234 ± 229) and 6 p.m.–10:30 p.m. (130 ± 162) time periods.

Figure 2. Average number of minutes spent per day by intensity level from Activitygram.
Figure 3. Average Activitygram scores (intensity-minutes) per day by time of day.


Girls tended to have higher scores for usual than yesterday physical activities (Table 3). The GAQ scores for yesterday and usual physical activity (all 28 activities) were highly correlated (r = 0.8, P < 0.0001). GAQ mean scores for the day before the clinic visit and day 4 were not different for yesterday (P = 0.10); however, usual physical activities were different (P = 0.003). The GAQ MET-weighted scores for the day before the clinic visit and day 4 for yesterday and usual physical activity (18 activities) were significantly different (P = 0.04 and P = 0.02). Nevertheless, only a few individual physical activities differed significantly. These included “walking and running yesterday,” P = 0.013; “baseball and softball usual,” P = 0.004; “walking and running usual,” P = 0.003; and “yoga usual,” P = 0.023. Scores for sedentary activities were moderately correlated (r = 0.3–0.5, P < 0.005), and no differences were observed between the day before the clinic visit and day 4 for any of the yesterday and usual sedentary activities.

Table 3. Reliability and summary statistics for the various components of the MET-weighted GAQ.

Validity of the Physical Activity Instruments:

MTI/CSA versus pedometer.

The overall correlation between average of the 4 d of the pedometer (steps·min−1) and MTI/CSA monitor (counts·min−1) was moderate, at r = 0.47 (Table 4). The correlations of each of the four individual days between the pedometer and the MTI/CSA monitor were significant for day 3 but not for days 1, 2, and 4.

Table 4. Validity: Pearson correlations of pedometer, Activitygram, and GAQ with MTI/CSA.

MTI/CSA versus Activitygram.

The correlation between the average of the three days of the Activitygram score and the 4-d average MTI/CSA score was significant (Table 4). When each day was examined separately, the correlation was significant only for day 4 (P = 0.024). The average Activitygram score was significantly but moderately correlated with the average MTI/CSA counts per minute for the 7 a.m.–12 noon (r = 0.43, P = 0.0003) and 6 p.m.–midnight (r = 0.32, P = 0.0088) periods, but no significant association was observed for the 12 noon–6 p.m. period (data not shown).

MTI/CSA versus GAQ.

Physical activities. The correlations between the GAQ MET-weighted score derived from all 28 activities and the MTI/CSA average counts per minute were not significant, regardless of whether yesterday or usual activities were used and whether a single GAQ administration or the average of the two was used (range of r-values −0.05 to 0.21, P values 0.078–0.864; data not shown).

Further examination of the GAQ items showed that there were 18 activities in which girls reported participating more frequently and more reliably (comparing the first with the second administration of usual activities). The other 10 activities were typically recorded as “none” by the girls. The 10 activities deleted were baseball/softball, football, volleyball, racket sports, swimming laps, gymnastics, hiking, weight lifting, martial arts, and yoga. Scores were calculated both for all 28 activities and for the subset of 18 activities. Using the scoring scheme of 0–1–10, the correlations of the GAQ MET-weighted score with the MTI/CSA average counts per minute were significant for the subgroup of 18 activities (Table 4). The correlation for the average of the two usual days was r = 0.29, P = 0.02. No significant correlation was observed between the GAQ MET-weighted score of the remaining 10 activities and the average MTI/CSA counts per minute, with the scoring of 0–1–10 (r = −0.045, P = 0.70).

TV watching and sedentary activities excluding TV. Scores for TV watching and sedentary activities excluding TV were not correlated with the MTI/CSA scores. The correlations of the average of the two GAQ TV watching for both yesterday and usual scores with the average 4-d MTI/CSA counts per minute were r = −0.145 and −0.004 (P = 0.24 and 0.98, respectively). The corresponding correlations for sedentary activities excluding TV for yesterday and usual were r = 0.0227 and −0.0916 (P = 0.85 and 0.46, respectively).

Number of days for MTI/CSA monitoring. Applying the Spearman-Brown prophecy formula to these data suggested that a minimum of 7 d of measurement would be required to achieve an ICC of 0.8.


The goal of this validation study was to examine the reliability, or consistency over time, and validity, using the MTI/CSA as a criterion standard, of physical activity assessment instruments under consideration for the GEMS pilot study and subsequent phase 2 trial. The target population is AA girls age 8–10 yr. Three instruments, two self-reports and one monitor, were examined. Reliability was highest for 18 physical activities assessed by the GAQ (r = 0.8, P < 0.0001) and was moderate for the MTI/CSA accelerometer (ICC = 0.37). Lower reliability was observed for the Activitygram (ICC = 0.24), and reliability was not significant for the pedometer (ICC = 0.08). Validity correlations between the instruments and the MTI/CSA were significant when the averages of 3 or 4 d, or two administrations, were used, for the pedometer, Activitygram, and GAQ, ranging from r = 0.47 to 0.29 (P = 0.0001–0.05). As has been reported by others, precision improves with multiple administrations (13).

The MTI/CSA was chosen as a criterion standard because it has been shown to be valid in children (4,5). Three-day average MTI/CSA counts were highly correlated with heart rate (r = 0.6) (4) and with energy expenditure (r = 0.9) (16). Previously reported reliability ranging from 0.32 to 0.53 (4) is similar to what we observed (r = 0.42). However, we observed a significant difference of about 10% between the days in the average daily MTI/CSA scores. These differences could be due to variations in how well the girls followed the protocol, how often and for what duration the monitors were removed, how well the instruments could record different types of activities, other random nonspecific effects, and true differences in physical activity performed over the 4 d (8,13).

The pedometer demonstrated low reliability (ICC = 0.08) even though the mean number of steps per minute was not different between days. The pedometer was significantly correlated with the MTI/CSA (r = 0.47, P < 0.0001), which could be attributed principally to the moderately high correlation for day 3 (r = 0.64, P < 0.0001). In other investigations in children where the study was structured and pedometer readings were obtained by the investigator, very high correlations were observed between the pedometer and direct observation (r = 0.93–0.95) (6,14,15) and with oxygen consumption (r = 0.92) (2). It is likely that the lower validity we observed (r = 0.47) for the pedometer compared with other studies was because of the need to rely on the girls’ cooperation and ability to follow the instructions and record the pedometer’s readings each night before going to bed, as well as the instruments’ different sensitivities to various types of movement. Box plots of pedometer readings showed several girls having almost double the values that are typically observed for a single day.

For self-report instruments, imprecision in cognitive processing and in recall of physical activity, both complex cognitive tasks, hampers reliability (1), particularly in young children. The SAPAC checklist, from which the GAQ was adapted, had moderate reliability when tested on the same day in another study (r = 0.65) (12). For physical activities, we observed a correlation of 0.8 between the first and second administration of the GAQ taken 5 d apart, with lower correlations for sedentary activities. Higher average scores were reported for usual activities than yesterday activities, and there was a lower percentage of extreme disagreement between the first and second administrations of the GAQ for the usual than the yesterday scores. These results suggest better reliability for the usual than yesterday score.

Other investigators have reported correlations between activity checklists and several criterion standards ranging from 0.26 to 0.60 (7,12). In our study, the best performance of the GAQ, leading to a correlation of 0.29, occurred with the average score from two administrations rather than one, scores from usual rather than yesterday activities, use of 18 physical activities which were performed more frequently and reliably than all 28 physical activities listed on the checklist, and use of 0–1–10 scoring scheme that gave more weight to activities performed “a lot” compared with those performed “a little.”

The PDPAR, upon which the Activitygram was based, had high test-retest reliability when administered on the same day in a prior study (r = 0.98) (18). We observed a lower correlation across 3 d, r = 0.24, and the scores among the 3 d were significantly different. Although the reported validity has been moderate (0.53–0.77) for the Activitygram (19) and the PDPAR (18), we observed a lower validity correlation (r = 0.37) for the Activitygram. Although the Activitygram was moderately and significantly correlated with the MTI/CSA during the morning (r = 0.43) and evening (r = 0.32) time periods, no significant correlation was observed for the 12 noon–6 p.m. time period (r = 0.06). Thus, the Activitygram provided information similar to that of the MTI/CSA on activity levels in the less-active morning and evening periods but not for the most active part of the day. This lower correlation may be due to the fact that self-reports tend to overestimate physical activity (13), making the most active time period of the day more prone to inaccuracy.

Although appealing to children because it is computer-administered, the Activitygram was difficult to administer to 8- and 9-yr-old girls. The logs used as a prompt were usually incomplete, and the ability of young girls to recall activity over 3 d appeared to be limited. In addition, the Activitygram took longer to administer than the GAQ and was sometimes stressful for the girl. This was likely related to the young age and cognitive abilities of these girls.

Based on this study, the pedometer was considered not a sufficiently good measure of physical activity to replace the MTI/CSA. Compared with the MTI/CSA, the pedometer depends on significant cooperation of the participant to record readings each night and had low reliability. Of the two self-reports, despite their similar validity and reliability, the GAQ was chosen to be used in the GEMS pilot study rather than the Activitygram because of its greater ease of administration. The GAQ also identifies specific types of physical activity performed, whereas the Activitygram identifies activities by broad categories.

Knowledge about reliability helps in guiding investigators as to how many days of measurement are necessary to capture a good estimate of overall activity. Trost et al. (17) estimated that between 4 and 5 d of MTI/CSA measurements were necessary to achieve a reliability of 0.80 in children. Our calculations indicated that 7 d of monitoring the girls’ physical activity would be required to attain a reliability of 0.80. Reliabilities of 0.64, 0.70, and 0.75 can be achieved with 3, 4, and 5 d of monitoring of these girls, respectively. In designing studies, the sample size, subject compliance, logistics, and cost will weigh into the decision as to how many monitoring days are feasible.
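The figures in this paragraph can be checked numerically with the Spearman-Brown formula and the single-day ICC of 0.37 reported for the MTI/CSA. The Python sketch below is an illustration with hypothetical function names, not the authors' code.

```python
import math


def reliability(icc: float, k: int) -> float:
    """Spearman-Brown reliability of the mean of k days, given the single-day ICC."""
    return k * icc / (1 + (k - 1) * icc)


def days_needed(icc: float, target: float = 0.8) -> int:
    """Smallest whole number of days whose mean reaches the target reliability."""
    return math.ceil(target * (1 - icc) / (icc * (1 - target)))


ICC_MTI_CSA = 0.37  # single-day ICC reported in the Results
for k in (3, 4, 5, 7):
    print(k, round(reliability(ICC_MTI_CSA, k), 2))
print(days_needed(ICC_MTI_CSA))
```

Because `days_needed` rounds up a floating-point quotient, a target that is met exactly at a whole number of days could be pushed up by one through rounding error; for the values used here the quotient is about 6.8, so the result is unambiguous.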

Limitations of our study include the loss of data from the pedometer, some questionable recordings of pedometer counts, the exclusion of a fairly large number of girls from the analyses, and the use of both weekday and weekend days in the data set.

In summary, accurately measuring physical activity in young 8- to 9-yr-old girls is a challenge. Although the pedometer was not suitable, the MTI/CSA was found to be feasible for measuring overall activity level. It was not too intrusive, had a low level of participant burden, and was able to capture most activities and the time of day when the activities are performed. In contrast, self-report instruments for girls this age to assess overall activity level as well as types of activities are limited in terms of validity. Although of high reliability, it is likely that the GAQ will need more improvement to be a suitable self-report checklist for preadolescent AA girls.

We would like to thank the girls who participated in the study; the study coordinators for participant recruitment; and the GEMS Steering Committee, consisting of: Robert Klesges, Ph.D., University of Memphis; Mary Story, Ph.D., R.D., University of Minnesota; Thomas N. Robinson, M.D., M.P.H., Stanford University; Tom Baranowski, Ph.D., Baylor College of Medicine; James Rochon, Ph.D., The George Washington University Biostatistics Center; and Eva Obarzanek, Ph.D., M.P.H., R.D., National Heart, Lung, and Blood Institute.

This study was sponsored by the National Heart, Lung, and Blood Institute (U01 HL65160, U01 HL62662, U01 HL62663, U01 HL62732, and U01 HL62668).


1. Baranowski, T. R. Validity and reliability of self-report of physical activity: an information processing perspective. Res. Q. 59: 314–327, 1988.
2. Eston, R. G., A. V. Rowlands, and D. K. Ingledew. Validity of heart rate, pedometry, and accelerometry for predicting the energy cost of children’s activities. J. Appl. Physiol. 84: 362–371, 1998.
3. Freedson, P. S. Electronic motion sensors and heart rate as measures of physical activity in children. J. Sch. Health 61: 220–223, 1991.
4. Janz, K. F. Validation of the CSA accelerometer for assessing children’s physical activity. Med. Sci. Sports Exerc. 26: 369–375, 1994.
5. Janz, K. F., J. Witt, and L. T. Mahoney. The stability of children’s physical activity as measured by accelerometry and self-report. Med. Sci. Sports Exerc. 27: 1326–1332, 1995.
6. Kilanowski, C., A. Consalvi, and L. H. Epstein. Validation of an electronic pedometer for measurement of physical activity in children. Pediatr. Exerc. Sci. 11: 63–68, 1999.
7. Kimm, S. Y. S., N. W. Glynn, A. A. Kriska, et al. Longitudinal changes in physical activity in a biracial cohort during adolescence. Med. Sci. Sports Exerc. 32: 1445–1454, 2000.
8. Kohl, H. W., III, J. E. Fulton, and C. J. Caspersen. Assessment of physical activity among children and adolescents: a review and synthesis. Prev. Med. 31: S54–S76, 2000.
9. Levin, S., D. R. Jacobs, M. Richardson, B. E. Ainsworth, and A. S. Leon. Intra-individual variation of estimates of usual physical activity. Ann. Epidemiol. 9: 481–488, 1999.
10. Machin, D., M. J. Campbell, P. M. Fayers, and A. P. Y. Pinol. Sample Size Tables for Clinical Studies, 2nd Ed. Oxford: Blackwell Scientific Publications, 1997, pp. 168–173.
11. Rochon, J., R. C. Klesges, M. Story, et al. Common design elements of the Girls health Enrichment Multi-site Studies (GEMS). Ethnicity and Disease (in press).
12. Sallis, J. F., P. K. Strikmiller, D. W. Harsha, et al. Validation of interviewer-and self-administered physical activity checklists for fifth grade students. Med. Sci. Sports Exerc. 28: 840–851, 1996.
13. Sallis, J. F., and B. E. Saelens. Assessment of physical activity by self-report: status, limitations, and future directions. Res. Q. Exerc. Sport 71: 1–14, 2000.
14. Saris, W. H. M., and R. A. Binkhorst. The use of pedometer and actometer in studying daily physical activity in man. Part II: validity of pedometer and actometer measuring the daily physical activity. Eur. J. Appl. Physiol. 37: 229–235, 1977.
15. Saris, W. H. M., and R. A. Binkhorst. The use of pedometer and actometer in studying daily physical activity in man. Part I: reliability of pedometer and actometer. Eur. J. Appl. Physiol. 37: 219–228, 1977.
16. Trost, S. G., D. S. Ward, S. M. Moorehead, P. D. Watson, W. Riner, and J. R. Burke. Validity of the computer science and applications (CSA) activity monitor in children. Med. Sci. Sports Exerc. 30: 629–633, 1998.
17. Trost, S. G., R. R. Pate, P. S. Freedson, J. F. Sallis, and W. C. Taylor. Using objective physical activity measures with youth: how many days of monitoring are needed? Med. Sci. Sports Exerc. 32: 426–431, 2000.
18. Weston, A. T., R. Petosa, and R. R. Pate. Validation of an instrument for measurement of physical activity in youth. Med. Sci. Sports Exerc. 29: 138–143, 1997.
19. Welk, G. J., D. A. Dzewaltowski, G. J. Ryan, E. M. Sepulveda-Jowers, and J. L. Hill. Convergent validity of the Previous Day Physical Activity Recall and the ACTIVITYGRAM assessment. Med. Sci. Sports Exerc. 33: S144, 2001.


© 2003 The American College of Sports Medicine