There are questions we as epidemiologists ask when we come across unanticipated findings. Is the result plausible? Does it fit with prior knowledge and expectations? Should I discard the result because it is so discordant with what we know that it suggests serious methodological error? We know the danger that we can fall into if we rely completely on prior expectations—medicine is replete with well-established evidence that is later refuted by new data. Discarding such findings can lead us to miss important new insights.
A case in point is the paper by Nguyen and colleagues1 in this issue of Epidemiology, using data from the National Longitudinal Study of Adolescent Health (Add Health Study). The authors conducted a standard analysis to describe the mean blood pressure and percent of persons who were hypertensive in their sample population. They then compared the results with the national sample in the National Health and Nutrition Examination Survey (NHANES). Because their estimates were so discordant from the NHANES, they must have asked the plausibility questions described above. But, rather than give up, they proceeded to use this discordance as an opportunity to investigate reasons for the differences, with the hope of gaining some insight as to why measured blood pressure could differ so substantially between presumably similar populations.
Although blood pressure is a simple concept, it is extremely hard to measure accurately. Anyone who has worked with blood pressure measurement knows the long list of measurement difficulties. Measurement methodology could be the culprit for discordant findings: possible reasons include variation in cuff size, machine type, measurement setting, period of rest, technician training, or white-coat effect, to name a few. Comparisons can also be influenced by differences in personal characteristics such as age, sex, race, social and economic status, obesity or body mass index, fitness, medication use, and population source and geography. Nguyen et al1 investigated many of these possibilities. Nonetheless, in the end, they could not explain the differences. The results show that, for almost all comparisons and adjustments, mean blood pressure is about 10 mm Hg higher in the Add Health Study than in NHANES for both systolic and diastolic pressures. This is a substantial discordance.
What then do we do when our analytic methods do not provide the anticipated answers? If we have measured without error all relevant variables, then we should (in principle) be able to explain all observed variations. The 2 key problems, however, are that we do not identify all the relevant variables and we certainly do not measure them without error. Consequently, we must keep looking for explanations.
There seem to be 2 missing pieces in the analysis. One is knowledge of the validity of the sphygmomanometer device itself as implemented in this field study, and the other is possible differences in population selection for this study compared with NHANES. Although the authors describe their testing of the pressure measurement in the automated device, they did not describe comparisons with a standard mercury device. The oscillometric sphygmomanometer used in the Add Health Study did undergo certification and approval by the British Hypertension Society, with no suggestion of bias as large as the differences observed between the populations. However, the problem with oscillometric devices in general is that the algorithm used to estimate the blood pressure values is proprietary. Consequently, as users, we never really know how blood pressure is estimated from the oscillometric measurements. As part of NHANES, the mercury device was compared with an oscillometric device (different from the one in this study). In NHANES, the oscillometric and mercury devices differed by a mean of less than 2 mm Hg in the age group 20-49 years.2 Thus, there is no clear reason why the sphygmomanometer used by Add Health Study personnel should have provided a higher average blood pressure. However, use of a device that estimates blood pressure from an algorithm, with measurement by many different technicians in variable environments, might nonetheless result in unanticipated bias.
The issue of differences in results caused by differences in sampling is one for which we probably do not have the necessary information. The NHANES is a national sample of the US noninstitutionalized population, with a response rate to the examination of 75%. The Add Health Study began as a sample of students in grades 7 through 12, drawn from a representative sample of schools in the United States. In the initial recruitment, the response rate was 79%, and for Wave IV (source of data for this analysis) the response was 80% of those starting the study. The sample sizes are quite different, with the Add Health Study providing data for more than 14,000 people and NHANES more than 700—although for mean values of a measure such as blood pressure, 700 is sufficient to provide a very narrow confidence interval. The analyses by Nguyen et al1 compare results by a variety of measured characteristics, but none can address the more undefined differences, such as the effect of a greater nonresponse in the Add Health Study or selection from schools as compared with household sampling.
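The point that even 700 participants yield a very narrow confidence interval for a mean blood pressure can be illustrated with a quick back-of-the-envelope calculation. The sketch below assumes, purely for illustration, a population standard deviation of about 12 mm Hg for systolic pressure; the sample sizes are those quoted above.

```python
import math

def ci_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """95% confidence-interval half-width for a sample mean: z * sd / sqrt(n)."""
    return z * sd / math.sqrt(n)

SD_MMHG = 12.0  # assumed SD of systolic blood pressure (illustrative only)

for n in (700, 14000):
    # Half-width of the 95% CI around the estimated mean, in mm Hg
    print(f"n = {n:>6}: mean estimated to within \u00b1{ci_half_width(SD_MMHG, n):.2f} mm Hg")
```

Under this assumption, the NHANES sample of roughly 700 pins the mean to within about 1 mm Hg, so sampling variability alone cannot account for a 10 mm Hg discordance between the studies.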
Totally unexplained differences in well-designed and well-implemented studies are not new. A classic case is the contradictory findings from 2 epidemiology studies (Framingham Heart Study3 and the Nurses' Health Study4) on the relation between postmenopausal estrogen use and coronary heart disease. In an editorial that accompanied the publication of both papers, after consideration of various reasons for the differences, Bailar stated that he “must resort to the investigator's great cop-out: More research is needed.”5 (p1081) For the estrogen question, there was subsequent research in the form of clinical trials that essentially resolved the question, and further research in observational populations has added additional insight.6 For the paper at hand, only a little more analysis can be done to probe the discrepancy. In the absence of explanations, I would stick with NHANES as the best estimate of mean blood pressure and hypertension in this age group because of its national sample design, “gold standard” sphygmomanometer, and long-term standardization and consistency. What to do with implausible findings? As was done in this study, discrepancies should be exposed to stimulate further inquiry.
ABOUT THE AUTHOR
PAUL SORLIE is Chief of the Epidemiology Branch at the National Heart, Lung, and Blood Institute, National Institutes of Health. He has been involved in the design and implementation of many of the large cohort studies funded by the National Heart, Lung, and Blood Institute and has had long-term research interests in the prevalence and determinants of elevated blood pressure.
1. Nguyen QC, Tabor JW, Entzel PP, et al. Discordance in national estimates of hypertension among young adults. Epidemiology.
2. Ostchega Y, Nwankwo T, Sorlie PD, Wolz M, Zipf G. Assessing the validity of the Omron HEM-907XL oscillometric blood pressure measurement device in a national survey environment. J Clin Hypertens.
3. Wilson PW, Garrison RJ, Castelli WP. Postmenopausal estrogen use, cigarette smoking, and cardiovascular morbidity in women over 50: the Framingham Study. N Engl J Med.
4. Stampfer MJ, Willett WC, Colditz GA, Rosner B, Speizer FE, Hennekens CH. A prospective study of postmenopausal estrogen therapy and coronary heart disease. N Engl J Med.
5. Bailar JC III. When research results are in conflict. N Engl J Med.
6. Hoover RN. The sound and the fury, was it all worth it? Epidemiology.