Assigning Identification Numbers to Study Participants
We follow a unique numbering system to identify the study participants (both the pregnant women and their newborns) and their respective clinical specimens.8 A woman receives a unique 7-digit identification number (ID) for each pregnancy registered. The first digit of an ID identifies the study site where the pregnancy is registered, the last digit is always zero (0), and the 5 digits in between are participant identifiers generated sequentially by the data server of the respective site. Each live birth also receives a 7-digit ID. The first 6 digits are the same as those of the corresponding mother, while the seventh digit gives the child's birth order within the same pregnancy (1 for first birth, 2 for second birth, etc.). The list of pregnancy IDs is generated by the ANISA data management software at each site, and CHWs allocate those IDs to study participants. Each CHW can only use the IDs from her own list.
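The numbering scheme can be illustrated with a short Python sketch. It is not the ANISA software, which was written in .NET. The function names are hypothetical, and the restriction of the site digit to 1-9 is our assumption, since the text states only that the first digit identifies the site.

```python
def make_pregnancy_id(site_digit: int, serial: int) -> str:
    """Build a 7-digit pregnancy ID: site digit + 5-digit serial + trailing 0."""
    if not 1 <= site_digit <= 9:  # assumed range; the paper does not list site codes
        raise ValueError("site digit must be a single digit from 1 to 9")
    if not 0 <= serial <= 99999:
        raise ValueError("serial must fit in 5 digits")
    return f"{site_digit}{serial:05d}0"


def make_newborn_id(pregnancy_id: str, birth_order: int) -> str:
    """Derive a newborn ID: first 6 digits of the mother's ID + birth order."""
    if len(pregnancy_id) != 7 or not pregnancy_id.isdigit() or pregnancy_id[-1] != "0":
        raise ValueError("not a valid pregnancy ID")
    if not 1 <= birth_order <= 9:
        raise ValueError("birth order must be a single digit from 1 to 9")
    return pregnancy_id[:6] + str(birth_order)


def pregnancy_id_of(newborn_id: str) -> str:
    """Recover the mother's pregnancy ID by resetting the last digit to 0."""
    return newborn_id[:6] + "0"


if __name__ == "__main__":
    mother = make_pregnancy_id(site_digit=3, serial=42)
    twin_2 = make_newborn_id(mother, birth_order=2)
    print(mother, twin_2, pregnancy_id_of(twin_2))
```

Because the newborn ID shares its first 6 digits with the pregnancy ID, mother and child records can be linked without a separate lookup table.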
Data Management Application
The ANISA data management application was designed and developed using Microsoft SQL Server 2008 R2 for data storage, .NET with C# code-behind and Visual Basic 6.0 for the interfaces, and Crystal Reports 8.0 for reporting.9,10 Some site-specific modifications have been made to capture site-specific information (eg, the local identification part). Every site uses this application for entry and storage of its own data. The 5 primary components of the ANISA data management application are detailed in the following sections.
The primary graphical user interface for data entry relies on traditional keyboard input. It is designed to match the DCFs, with an emphasis on ease and speed of data entry. Simultaneous data entry is possible through multiple client computers connected to a local server. Validation rules are set to prevent inconsistencies and other errors during data entry. These validation rules include logical checks, range checks, uniqueness checks of IDs and skip rules that prevent entry of information that does not apply. The data entry system has 2 parts linked by the study ID: the field data entry system and the laboratory data entry system. The interface for laboratory data entry also supports a bar-code reader for logging specimens into the repository and entering specimen testing results. This interface can also import molecular test results from the molecular TaqMan array machine (Life Technologies, Foster City, CA). Additional user interfaces include report generation, ID creation, data editing and text message processing systems.
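The four kinds of validation rule named above can be sketched in Python as follows. This is an illustration only: the field names (`birth_weight_g`, `referred`, `referral_place`, and so on) and the numeric bounds are hypothetical and are not taken from the ANISA DCFs.

```python
from datetime import date


def validate_record(record: dict, existing_ids: set) -> list:
    """Return a list of validation messages; an empty list means the record passes."""
    errors = []

    # Uniqueness check: a study ID may be entered only once.
    if record["study_id"] in existing_ids:
        errors.append("study_id already exists")

    # Range check: the bounds here are illustrative, not the study's.
    weight = record.get("birth_weight_g")
    if weight is not None and not 500 <= weight <= 6000:
        errors.append("birth_weight_g out of range")

    # Logical check: a visit cannot precede the birth.
    if record["visit_date"] < record["birth_date"]:
        errors.append("visit_date precedes birth_date")

    # Skip rule: referral_place applies only when the child was referred.
    if record.get("referred") != "yes" and record.get("referral_place"):
        errors.append("referral_place must be skipped unless referred is 'yes'")

    return errors


if __name__ == "__main__":
    record = {
        "study_id": "1000011",
        "birth_weight_g": 90,  # likely a keystroke error
        "birth_date": date(2013, 1, 5),
        "visit_date": date(2013, 1, 7),
        "referred": "no",
        "referral_place": "",
    }
    for message in validate_record(record, existing_ids=set()):
        print(message)
```

Returning all messages at once, rather than stopping at the first failure, lets a data entry operator correct a form in a single pass.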
A relational database management system is used for the databases, and all data tables are linked through primary and foreign keys. The primary key of a relational table uniquely identifies each record in that table. A foreign key in one table refers to the primary key of another table and thereby defines the relationship between the 2 tables. The detailed architecture of the database system is shown in Figure 2.
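A minimal sketch of this primary key and foreign key linkage is given below. It uses SQLite through Python's standard `sqlite3` module as a stand-in for Microsoft SQL Server, and the two tables and their columns are hypothetical simplifications of the schema in Figure 2.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a parent and a child table."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE pregnancy ("
        " pregnancy_id TEXT PRIMARY KEY,"  # primary key: one row per pregnancy
        " site TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE newborn ("
        " newborn_id TEXT PRIMARY KEY,"
        # Foreign key: every newborn row must point at an existing pregnancy.
        " pregnancy_id TEXT NOT NULL REFERENCES pregnancy(pregnancy_id))"
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    conn.execute("INSERT INTO pregnancy VALUES ('1000010', 'Site 1')")
    conn.execute("INSERT INTO newborn VALUES ('1000011', '1000010')")
    try:
        # No pregnancy '2000010' exists, so the database rejects this row.
        conn.execute("INSERT INTO newborn VALUES ('2000011', '2000010')")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

Enforcing the relationship in the database itself means that an orphan newborn record cannot be stored, whichever client computer the entry comes from.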
Data Storage and Uploading System
Each study site maintains its ANISA database on a local server and extracts its own data for error checking, reporting and study monitoring. Sites upload their data weekly to the central server located at the Child Health Research Foundation, Dhaka, Bangladesh. ANISA uses ADO.NET's SqlBulkCopy class for quick uploading of the data via the internet. If a data upload to the central server is not successful or is not carried out as scheduled, an automatic notification email is sent to the responsible persons at the site and the DCC.
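The notification logic described above (alert on a failed upload, or on a missed weekly schedule) might be sketched as follows. The actual system uses SqlBulkCopy and email. Here the `upload` and `notify` callables, the recipient labels and the function names are all hypothetical placeholders, so the sketch shows only the control flow.

```python
from datetime import datetime, timedelta

UPLOAD_INTERVAL = timedelta(days=7)  # sites upload weekly


def is_overdue(last_success: datetime, now: datetime) -> bool:
    """True when more than one upload interval has passed since the last success."""
    return now - last_success > UPLOAD_INTERVAL


def run_upload(upload, notify, recipients) -> bool:
    """Attempt an upload; on any failure, notify the recipients and report False."""
    try:
        upload()
    except Exception as exc:  # any failure should trigger a notification
        notify(recipients, f"Data upload failed: {exc}")
        return False
    return True


if __name__ == "__main__":
    def flaky_upload():
        raise ConnectionError("connection dropped")

    def print_notice(to, message):
        print("to", to, "->", message)

    run_upload(flaky_upload, print_notice, ["site data lead", "DCC"])
    print(is_overdue(datetime(2014, 1, 1), datetime(2014, 1, 9)))
```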
Specimen Tracking and Biorepository System
Both the newborn and the specimen IDs are recorded in the specimen collection form and the physician assessment form so that both IDs can be linked in the databases. A virtual specimen biorepository is created in every site database, mirroring the actual location of a specimen in the freezer. The specimen tracking and biorepository system is described in detail elsewhere in this supplement.8
Text Message System
ANISA uses mobile phone-based text messages as an easily accessible method of information input and output for the data management application. This permits entering some critical data in the site databases in real time. The use of text messaging in ANISA is described separately in this supplement.11
During the development of the ANISA data management application, software testing personnel were assigned to work with the programmers. After development of the data entry interface for each DCF, that interface was tested with different ranges/values to check whether the application could flag inconsistent data, follow skip rules and capture the values within ranges. The beta version of the data management application was also tested by 3 of the site teams with real study data.
The ANISA DCC aims to ensure generation of high quality and reliable data from all sites through the following set of activities, processes and procedures.
Training on the ANISA Data Management System
Before the pilot phase of the project, we conducted a 3-day workshop on the ANISA data management system. The lead of the data team and a data supervisor from each site attended the workshop, where the following issues were covered:
- Use of the instruction manual for the ANISA data management system and code book of the data files;
- Configuring the MS SQL server and restoring the ANISA data management system in the site servers;
- Installation and use of ANISA data management application;
- Guidelines for data entry and comparing the first and second entry of data;
- Guidelines for data editing, cleaning and uploading;
- Configuring the database server to run the text message software.
Each site uses its own monitoring procedure to check data consistency and validity. Field supervisors review each DCF filled out by a CHW before entry into the system. A clinical supervisor checks each DCF filled out by the study physicians. In addition, using STATA 12.0 the DCC has developed a program to identify inconsistent values within and between tables and variables. This program generates a list of study IDs with inconsistent entries, which is shared with the sites for corrective action.
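The kind of between-table check that such a program performs can be sketched in Python. The DCC program itself is written in STATA, and its actual rules are not listed here. The two rules below (a newborn record without a matching pregnancy record, and a recorded live-birth count that disagrees with the number of newborn records) and the `live_births` field are hypothetical examples.

```python
def find_inconsistent_ids(pregnancies: dict, newborn_ids: list) -> list:
    """Return a sorted list of study IDs that fail a cross-table check.

    pregnancies maps pregnancy ID -> {"live_births": int} (hypothetical field).
    newborn_ids is the list of IDs found in the newborn table.
    """
    flagged = set()
    births_seen = {}

    for newborn_id in newborn_ids:
        # The mother's ID is the newborn ID with the last digit reset to 0.
        mother_id = newborn_id[:6] + "0"
        births_seen[mother_id] = births_seen.get(mother_id, 0) + 1
        if mother_id not in pregnancies:
            flagged.add(newborn_id)  # newborn without a pregnancy record

    for pregnancy_id, row in pregnancies.items():
        if row["live_births"] != births_seen.get(pregnancy_id, 0):
            flagged.add(pregnancy_id)  # recorded count and records disagree

    return sorted(flagged)


if __name__ == "__main__":
    pregnancies = {
        "1000010": {"live_births": 2},
        "1000020": {"live_births": 1},
    }
    newborn_ids = ["1000011", "1000021", "1000031"]
    print(find_inconsistent_ids(pregnancies, newborn_ids))
```

The output is a plain list of IDs, which matches the form in which inconsistencies are shared with the sites for corrective action.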
Double Data Entry
Keystroke errors during data entry are common and can affect quality. The ANISA data management software allows study sites to enter the same data twice. ANISA’s built-in comparison program checks the entries in both sets of data and identifies the difference between the 2 entries. Data supervisors at the study sites check the mismatches and correct the values after reviewing the DCFs. A web page (http://chu.icddrb.org/anisa) that automatically displays site-specific data entry progress has been designed to monitor data entry at each site.
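A double-entry comparison of this kind can be sketched as below. This is not ANISA's built-in comparison program. The data layout (a dictionary of records keyed by study ID) and the `<record>` marker for an ID entered only once are assumptions made for the illustration.

```python
def compare_entries(first: dict, second: dict) -> list:
    """Compare two entry passes and list every mismatch.

    Each pass maps study ID -> {field name: value}. The result is a list of
    (study_id, field, first_value, second_value) tuples.
    """
    mismatches = []
    for study_id in sorted(first.keys() | second.keys()):
        a, b = first.get(study_id), second.get(study_id)
        if a is None or b is None:
            # The record was keyed in only one of the two passes.
            mismatches.append((study_id, "<record>", a, b))
            continue
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                mismatches.append((study_id, field, a.get(field), b.get(field)))
    return mismatches


if __name__ == "__main__":
    first_pass = {"1000010": {"age": 24, "parity": 1}}
    second_pass = {"1000010": {"age": 42, "parity": 1}}  # digits transposed
    for mismatch in compare_entries(first_pass, second_pass):
        print(mismatch)
```

The comparison only locates disagreements. Deciding which value is correct still requires a supervisor to consult the paper DCF, as described above.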
Data Security and Audit Trail
The data at site servers are secured by user-specific passwords that require separate permissions for data entry and editing. An audit trail system was added to the data management application for keeping track of every update made in the databases after data entry and identifying the persons who made the changes with time and date.
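One common way to implement such an audit trail is a database trigger that copies the old and new values into a log table on every update. The sketch below shows this with SQLite through Python's `sqlite3` module. Whether ANISA uses triggers is not stated in this paper, and the table and column names are hypothetical.

```python
import sqlite3


def build_audited_db() -> sqlite3.Connection:
    """Create a table whose weight updates are logged automatically."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE newborn (
            newborn_id TEXT PRIMARY KEY,
            weight_g   INTEGER,
            edited_by  TEXT
        );
        CREATE TABLE audit_log (
            newborn_id   TEXT,
            old_weight_g INTEGER,
            new_weight_g INTEGER,
            changed_by   TEXT,
            changed_at   TEXT
        );
        -- After each weight update, record who changed what and when.
        CREATE TRIGGER newborn_audit AFTER UPDATE OF weight_g ON newborn
        BEGIN
            INSERT INTO audit_log VALUES (
                OLD.newborn_id, OLD.weight_g, NEW.weight_g,
                NEW.edited_by, datetime('now')
            );
        END;
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_audited_db()
    conn.execute("INSERT INTO newborn VALUES ('1000011', 3000, 'operator1')")
    conn.execute(
        "UPDATE newborn SET weight_g = 3100, edited_by = 'supervisor1' "
        "WHERE newborn_id = '1000011'"
    )
    for row in conn.execute("SELECT * FROM audit_log"):
        print(row)
```

Keeping the log in the database layer means an edit is recorded regardless of which interface made it.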
Each site is responsible for making a backup of their site-specific databases at periodic intervals. At the Coordination Centre, we create a backup of the entire database on an optical disc once a month and store it in a location separate from that of the server. In addition, we maintain a backup copy of the entire database on a different server in a different physical location.
CHALLENGES AND KEY LESSONS
All the ANISA study sites have experience in data collection with printed DCFs. A key challenge in using paper-based DCFs is that any change to a form creates difficulties in data capture, entry and analysis. Although the site teams evaluated the DCFs during their creation and the piloting of the study, the DCC still makes changes to the DCFs as necessary. These changes are reflected simultaneously in the DCFs across all sites. The data management application is updated accordingly to facilitate smooth data entry with the modified DCFs. We tried to create a data entry interface that allows speedy data entry and minimizes backlog. Most of the DCF data are entered into the database within 1 month of collection (Fig. 3), which helps to monitor study progress in near real time. We use MS SQL Server for data storage, which allows convenient export of the data to other formats (eg, MS Excel, STATA, SAS) for analysis.
Programmers from the DCC resolve issues with the data management software as they arise. They also use TeamViewer software to access the site servers remotely for troubleshooting. The combination of the high volume of transmitted data and fluctuating internet connectivity has repeatedly caused problems during uploading of data to the central database. We have separated the data uploading system from the audit table so that data uploads run more smoothly. A snapshot of the successful data upload status in the central server at 2 stages of the study is shown in Figure 4A, B. Data cleaning has been a continuous challenge throughout the project, as missing values and data entry errors are almost unavoidable in an extensive database. The DCC regularly meets with site data teams, either in person or online (Webex and Skype), to review the progress of data entry and cleaning. These meetings help to overcome any data entry backlog and to correct data entry problems, including missing values and inconsistencies. Some key lessons learned during the use of our data system at field level are listed in Table 3.
Although the design and implementation of the ANISA data management application across multiple sites was challenging, centralized management of the system has resulted in smooth handling of study data. It has also facilitated sharing of the workload by enabling sites to enter data locally and allowing the DCC to focus on identifying specific data inconsistencies and clarifying ambiguities. Site teams are also not required to develop their individual data capture and management software and are thus relieved from troubleshooting the system. Data entry at all study sites is thus expedited; ANISA has a minimal data entry backlog at any given point in time and is producing a harmonized dataset from all sites with very limited data entry errors. This gives us confidence in the quality and validity of the ANISA data and of the resulting study outcomes.
We acknowledge the contribution of Mr. Abu Mohammad Saleheen from the International Centre for Diarrhoeal Disease Research, Bangladesh in developing the ANISA software. We are indebted to CSL Software Resources Ltd. for helping us in designing the text messages component of the ANISA data management software. We thank the site teams for their valuable feedback on this application to make it functional.
The ANISA Methods Group: Aarti Kumar: Community Empowerment Lab, Lucknow, India; Abdul Momin Kazi: The Aga Khan University, Karachi, Pakistan; Abdullah H. Baqui: Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland; Anita K. Zaidi: The Aga Khan University, Karachi, Pakistan; Anuradha Bose: Christian Medical College, Vellore, India; Arif Billah: Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland; Daniel E. Roth: Department of Paediatrics, Hospital for Sick Children and University of Toronto, Canada; Derrick Crook: John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom; Hamidul Haque: Child Health Research Foundation, Dhaka, Bangladesh; Jonas M. Winchell: Centers for Disease Control and Prevention, Atlanta, Georgia; Maksuda Islam: Child Health Research Foundation, Dhaka, Bangladesh; Mathuram Santosham: Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland; Maureen H. Diaz: Centers for Disease Control and Prevention, Atlanta, Georgia; Nazma Begum: Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland; Nong Shang: Centers for Disease Control and Prevention, Atlanta, Georgia; Pinaki Panigrahi: University of Nebraska Medical Center, Omaha, Nebraska; Sajid B. Soofi: The Aga Khan University, Karachi, Pakistan; Shahida M. Qureshi: The Aga Khan University, Karachi, Pakistan; Shamim A. Qazi: Department of Maternal, Newborn, Child and Adolescent Health, World Health Organization, Geneva, Switzerland; Sheraz Ahmed: The Aga Khan University, Karachi, Pakistan; Stephen P. Luby: Stanford Woods Institute for the Environment, Stanford University, Stanford, California; Vishwajeet Kumar: Community Empowerment Lab, Lucknow, India; Yoonjoung Choi: Centers for Disease Control and Prevention, Atlanta, Georgia; Zulfiqar A. Bhutta: The Aga Khan University, Karachi, Pakistan; and Stephanie J. Schrag, DPhil: Centers for Disease Control and Prevention, Atlanta, Georgia.
1. Onyango AW, Pinol AJ, de Onis M. Managing data for a multicountry longitudinal study: experience from the WHO Multicentre Growth Reference Study. Food Nutr Bull. 2004;25(1 Suppl):S46–S52.
2. Biswas K, Carty C, Horney R, et al. Data management and other logistical challenges for the GEMS: the data coordinating center perspective. Clin Infect Dis. 2012;55(Suppl 4):S254–S261.
3. Krishnankutty B, Bellary S, Kumar NB, et al. Data management in clinical research: an overview. Indian J Pharmacol. 2012;44:168–172.
4. Prokscha S. Practical Guide to Clinical Data Management. 3rd ed. Boca Raton, FL: CRC Press; 2012.
5. National Institute of Population Studies (Pakistan). Pakistan Demographic and Health Survey 2006–2007. Calverton, MD: Macro International, Inc; 2007.
6. International Institute for Population Sciences (India). India Demographic and Health Survey 2005–2006. Calverton, MD: Macro International, Inc; 2007.
7. USAID, NIPORT. Bangladesh Demography and Health Survey 2007. Calverton, MD: Macro International Inc; 2007.
8. Connor NE, Hossain T, Rahman QS, et al. Development and implementation of the ANISA labeling and tracking system for biological specimens. Pediatr Infect Dis J. 2016;35(Suppl 1):S29–S34.
9. Carter J. Database Design and Programming with Access, SQL, Visual Basic, and ASP. New York, NY: McGraw-Hill; 2002.
10. Peck G. Crystal Reports 2008: The Complete Reference. New York, NY: McGraw-Hill; 2008.
11. Islam MS, Rahman QS, Hossain T, et al. Using text messages for critical real-time data capture in the ANISA study. Pediatr Infect Dis J. 2016;35(Suppl 1):S35–S38.
Keywords: data management; population-based; young infants; multicenter; ANISA
Copyright © 2016 Wolters Kluwer Health, Inc. All rights reserved.