To the Editor:
Epidemiologic research must often collect extensive data with limited financial resources. Large numbers of participants and detailed questionnaires, which are often not designed for automated data capture, are typical. Double data entry, combined with subsequent or simultaneous comparison of the two entries and creation of a final dataset, is state-of-the-art in clinical trials1 and has been recommended for epidemiologic studies.2,3 However, double data entry substantially increases costs compared with single data entry. Our Medline search did not identify any reports assessing the data quality achieved by single versus double data entry in epidemiologic studies under real-world conditions. We therefore investigated the frequency and sources of errors occurring during single data entry, and the potential improvement from double data entry, within an ongoing multicenter environmental cohort study.
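The comparison step of double data entry can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the study: it assumes a hypothetical layout in which each database maps a record ID to a dictionary of field values stored exactly as typed, and the function names `compare_entries` and `build_final` are invented for this example. Every field where the two independent entries disagree is listed for checking against the paper questionnaire, and the confirmed values are then written into the final dataset.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: record ID -> {field name -> value exactly as typed}.
Record = Dict[str, str]
Database = Dict[str, Record]


def compare_entries(first: Database, second: Database) -> List[Tuple[str, str, str, str]]:
    """List every field where the two independent entries disagree.

    Returns (record_id, field, first_value, second_value) tuples. A record or
    field missing from one entry is compared as an empty string, so it is
    flagged rather than silently skipped.
    """
    discrepancies = []
    for record_id in sorted(first.keys() | second.keys()):
        rec_a = first.get(record_id, {})
        rec_b = second.get(record_id, {})
        for field in sorted(rec_a.keys() | rec_b.keys()):
            value_a = rec_a.get(field, "")
            value_b = rec_b.get(field, "")
            if value_a != value_b:
                discrepancies.append((record_id, field, value_a, value_b))
    return discrepancies


def build_final(first: Database, resolutions: Dict[Tuple[str, str], str]) -> Database:
    """Create the final dataset from the first entry plus manual resolutions.

    `resolutions` maps (record_id, field) to the value confirmed against the
    paper questionnaire. The input databases are not modified.
    """
    final = {record_id: dict(rec) for record_id, rec in first.items()}
    for (record_id, field), value in resolutions.items():
        final.setdefault(record_id, {})[field] = value
    return final
```

For example, if one operator typed an age as "34" and the other as "43", `compare_entries` returns that single field, and the value found on the questionnaire is passed to `build_final` as a resolution.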
We compared 2 databases produced by single data entry with the reference database (created by double entry followed by comparison and correction of errors), using all records from August through October 2003. We defined a data entry error as any deviation, in a character or digit of any database field, between a single-entry database and the reference database. We stratified by type of questionnaire (interview or self-administered) and type of variable (closed questions: dichotomous or categorical; open questions: continuous or text field). The error rate was calculated as the number of observed deviations divided by the total number of database fields. In addition, we examined the reasons for errors in a random sample of about 10% of all observed discrepancies.
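The error-rate calculation described above can be sketched as follows, under stated assumptions. Each field of the reference database contributes once to the denominator, and a field whose single-entry value differs from the reference in any character counts as one deviation. The stratum labels (for instance a combination of questionnaire type and variable type), the function names `error_rates` and `review_sample`, and the fixed random seed are all hypothetical choices for illustration, not details taken from the study.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, str]
Database = Dict[str, Record]


def error_rates(
    single: Database, reference: Database, stratum_of: Dict[str, str]
) -> Tuple[Dict[str, float], float, List[Tuple[str, str]]]:
    """Compare a single-entry database field by field with the reference.

    `stratum_of` maps a field name to a (hypothetical) stratum label.
    Returns per-stratum error rates, the overall error rate, and the
    list of (record_id, field) deviations.
    """
    n_fields: Counter = Counter()
    n_deviations: Counter = Counter()
    deviations: List[Tuple[str, str]] = []
    for record_id, ref_record in reference.items():
        entered = single.get(record_id, {})
        for field, ref_value in ref_record.items():
            stratum = stratum_of[field]
            n_fields[stratum] += 1  # every reference field is in the denominator
            if entered.get(field, "") != ref_value:  # any differing character
                n_deviations[stratum] += 1
                deviations.append((record_id, field))
    by_stratum = {s: n_deviations[s] / n_fields[s] for s in n_fields}
    total = sum(n_fields.values())
    overall = sum(n_deviations.values()) / total if total else 0.0
    return by_stratum, overall, deviations


def review_sample(deviations: List[Tuple[str, str]], fraction: float = 0.10, seed: int = 2003):
    """Draw a reproducible random sample (about `fraction`) of deviations for manual review."""
    if not deviations:
        return []
    k = max(1, round(fraction * len(deviations)))
    return random.Random(seed).sample(deviations, k)
```

Fields typed into the single-entry database but absent from the reference are ignored here, because the reference defines which fields exist; a real comparison may need to treat them differently.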
Overall, the observed error rates ranged from 0.54% to 0.72% (Table). Error rates were higher for open text fields (1.2%–2.1%) than for open continuous variables (0.2%–0.8%) or closed dichotomous variables (0.4%–0.5%).
Most errors (72%) arose from interpretation problems on the part of the data entry staff. These were mostly caused by additional handwritten comments from fieldworkers or study participants, or by incorrectly completed questionnaires, such as multiple ticks on questions that allowed only one. Classic mistakes, such as slipping to the wrong input line or mistyping, accounted for only 28% of the observed errors.
Quality assurance measures, such as training of fieldworkers (eg, standardized interviewing) and data entry staff (eg, rules for handling frequent problems), as well as monitoring of completed questionnaires, help to lower error rates in data entry.4 In our study, with adequately trained and experienced staff, the overall error rate of single data entry was only slightly above 0.5%.
Under these conditions, double data entry would improve data quality only marginally but would substantially increase the time and cost of data entry. Besides twice the time needed to enter the data, time must be allowed for programming the database comparisons, consulting the documentation to investigate deviations, and making the corrections. In monetary terms, double data entry costs about 2.5 times as much as single data entry. Software that integrates the comparison into the second entry exists, but such solutions are themselves costly and time-consuming to implement.
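The trade-off can be made explicit with a little arithmetic. The sketch below takes the cost factor of about 2.5 and the single-entry error rate of roughly 0.5% from this letter; the budget and number of fields in the usage note are purely hypothetical, and the default assumption that double entry removes every entry error is optimistic. The function name `cost_per_error_avoided` is invented for illustration.

```python
def cost_per_error_avoided(
    single_entry_cost: float,
    n_fields: int,
    single_error_rate: float,
    double_cost_factor: float = 2.5,
    residual_error_rate: float = 0.0,
) -> float:
    """Extra cost of double over single entry, per data entry error removed.

    Assumes double entry costs `double_cost_factor` times as much as single
    entry and lowers the field-level error rate from `single_error_rate` to
    `residual_error_rate` (0.0 = optimistic: every entry error is caught).
    """
    extra_cost = (double_cost_factor - 1.0) * single_entry_cost
    errors_avoided = (single_error_rate - residual_error_rate) * n_fields
    if errors_avoided <= 0:
        raise ValueError("double entry must lower the error rate to avoid any errors")
    return extra_cost / errors_avoided
```

With a hypothetical single-entry cost of 10,000 currency units for 1,000,000 fields at a 0.5% error rate, the extra 15,000 units would correct about 5,000 fields, or roughly 3 units per corrected field; whether that is worthwhile depends on how much those errors would bias the analysis.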
In conclusion, at a time of increasingly limited financial resources, single data entry with concomitant quality control5 may be worth considering as a way to allocate research funds more cost-effectively in epidemiologic studies.
Stephan K. Weiland
Department of Epidemiology, University of Ulm, Ulm, Germany
1. European Agency for the Evaluation of Medicinal Products (EMEA). ICH Topic E 6 Guideline for Good Clinical Practice. Step 5, Consolidated Guideline 1.5.96. Note for Guidance on Good Clinical Practice (CPMP/ICH/135/95), 2002. http://www.emea.eu.int/pdfs/human/ich/013595en.pdf
2. Bellach BM, Hense HW, Hoffmann W. Arbeitsgruppe Epidemiologische Methoden der DAE. Leitlinien und Empfehlungen zur Sicherung von Guter Epidemiologischer Praxis (GEP). 1999. http://www.rki.de/GESUND/EPIDEM/GEP_LANG.PDF
3. Whitney CW, Lind BK, Wahl PW. Quality assurance and quality control in longitudinal studies. Epidemiol Rev
4. Gibson D, Harvey AJ, Everett V, Parmar MKB, on behalf of the CHART Steering Committee. Is double data entry necessary? The CHART trials. Control Clin Trials
5. Day S, Fayers P, Harvey D. Double data entry: what value, what price? Control Clin Trials