Data Mining Methods Find Demographic Predictors of Preterm Birth

Goodwin, Linda K.; Iannacchione, Mary Ann; Hammond, W. Ed; Crockett, Patrick; Maher, Sean; Schlitz, Kaye


Background Preterm births in the United States increased from 11.0% to 11.4% between 1996 and 1997, and they remain a complex healthcare problem.

Objective The objective of this research was to compare traditional statistical methods with emerging methods, called data mining or knowledge discovery in databases, in identifying accurate predictors of preterm birth.

Method An ethnically diverse sample of pregnant women (N = 19,970) provided data on 1,622 variables for analysis. Preterm birth predictors were evaluated using both traditional statistical analyses and newer data mining analyses.

Results Seven demographic variables (maternal age, plus binary coding for county of residence, education, marital status, payer source, race, and religion) yielded an area under the curve (AUC) of .72 when predictive accuracy was tested with receiver operating characteristic (ROC) curves. Adding hundreds of other variables increased the AUC by only .03.
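The AUC reported above can be read as the probability that a randomly chosen preterm case receives a higher predicted risk than a randomly chosen term case. The sketch below computes it from that rank-based (Mann-Whitney) definition. It is an illustrative assumption, not the authors' analysis code; the function name `roc_auc`, the 1/0 outcome coding, and the toy inputs are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney rank-sum statistic.

    Hypothetical coding: labels are 1 for preterm birth, 0 for term birth;
    scores are predicted risks from any model (e.g. one built on the
    demographic predictors). Tied scores receive their average rank,
    which counts a positive/negative tie as half a correct ordering.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalized by the number of pairs.
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: three of the four preterm/term pairs are ordered correctly. Under this reading, the study's .72 versus roughly .75 compares how often a demographic-only model and a model with hundreds of additional variables rank a preterm case above a term case.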

Conclusion Similar results across data mining methods suggest that the findings are data-driven rather than method-dependent, and that demographic variables offer a small, parsimonious set of predictors with reasonable accuracy in predicting preterm birth outcomes in a racially diverse population.

Linda K. Goodwin, PhD, RN, is Division Chief, Health Systems and Primary Care, and Director, Nursing Informatics Program, School of Nursing and Community and Family Health Medicine (Joint Appointment), Duke University, Durham, NC.

Mary Ann Iannacchione, BSN, RN, is Project Director, Clinical Research Nurse, Durham, NC.

W. Ed Hammond, PhD, is Professor, Medical Informatics, Durham, NC.

Patrick W. Crockett, PhD, is Statistical Consultant, Chapel Hill, NC.

Sean Maher, MS, is Chief Technical Officer, Boston, MA.

Kaye C. Schlitz, RNC, is Clinical Information Systems Administrator, Durham, NC.

Editor’s Note:

This manuscript was accepted for the 50th Anniversary Issue.

Accepted for publication May 21, 2001.

Funding for this work was provided by the National Library of Medicine R01 LM-06488, $867,443.00 (Direct Costs), LK Goodwin, Principal Investigator, (1997–2000).

The authors thank Elizabeth Tornquist for her expertise and assistance in editing this manuscript.

Address correspondence to Linda K. Goodwin, PhD, RN, Box 3322, Duke University, Durham, NC 27710. (e-mail:

© 2001 Lippincott Williams & Wilkins, Inc.