Centralized Data Management in a Multicountry, Multisite Population-based Study

Rahman, Qazi Sadeq-ur MSc; Islam, Mohammad Shahidul MSc; Hossain, Belal MSc; Hossain, Tanvir MSc; Connor, Nicholas E. MSc; Jaman, Md. Jahiduj MA; Rahman, Md. Mahmudur MSc; Ahmed, A. S. M. Nawshad Uddin FCPS; Ahmed, Imran MA; Ali, Murtaza MBBS; Moin, Syed Mamun Ibne BSc; Mullany, Luke PhD; Saha, Samir K. PhD; Arifeen, Shams El DrPH; for the ANISA Methods Group

The Pediatric Infectious Disease Journal: May 2016 - Volume 35 - Issue 5 - p S23–S28
doi: 10.1097/INF.0000000000001102
ANISA Supplement

Background: A centralized data management system was developed for data collection and processing for the Aetiology of Neonatal Infection in South Asia (ANISA) study. ANISA is a longitudinal cohort study involving neonatal infection surveillance and etiology detection in multiple sites in South Asia. The primary goal of designing such a system was to collect and store data from different sites in a standardized way to pool the data for analysis.

Methods: We designed the data management system centrally and implemented it to enable data entry at individual sites. This system uses validation rules and audit trails to reduce errors. The study sites employ a dual data entry method to minimize keystroke errors. They upload collected data weekly to a central server via the internet to create a pooled central database. Any inconsistent data identified in the central database are flagged and corrected after discussion with the relevant site. The ANISA Data Coordination Centre in Dhaka provides technical support for operating, maintaining and updating the data management system centrally. Password-protected login identifications and audit trails are maintained for the management system to ensure the integrity and safety of stored data.

Conclusion: Centralized management of the ANISA database makes it possible to use common data capture forms (DCFs), adapted to site-specific contextual requirements. DCFs and data entry interfaces allow on-site data entry. This reduces the workload as DCFs do not need to be shipped to a single location for entry. It also improves data quality as all data collected for ANISA go through the same quality check and cleaning process.

From the Centre for Child and Adolescent Health, International Centre for Diarrhoeal Disease Research, Bangladesh, Dhaka, Bangladesh; Child Health Research Foundation, Dhaka, Bangladesh; International Center for Maternal and Newborn Health, Department of International Health, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland; Aga Khan University, Karachi, Pakistan; and Centers for Disease Control and Prevention, Atlanta, Georgia.

Accepted for publication January 10, 2016.

The members of the ANISA Methods Group are listed in the Acknowledgments.

The ANISA study is funded by the Bill & Melinda Gates Foundation (Grant No. OPPGH5307). The authors have no other funding or conflicts of interest to disclose.

Address for correspondence: Shams El Arifeen, DrPH, Director, Centre for Child and Adolescent Health, International Centre for Diarrhoeal Disease Research, Bangladesh, Mohakhali, Dhaka 1212, Bangladesh. E-mail: shams@icddrb.org.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial License 4.0 (CC BY-NC), where it is permissible to download, share, remix, transform, and build upon the work provided it is properly cited. The work cannot be used commercially.

Collection, management and transmission of reliable and scientifically sound data are critical to the success of any research study. A centralized data management system can improve the data quality and performance of multicenter studies.1–3 Aetiology of Neonatal Infection in South Asia (ANISA) is a longitudinal cohort study being carried out in 5 population-based sites in Bangladesh, India and Pakistan. We developed and implemented a management system centrally for capture, entry, storage and distribution of the ANISA data. In this study, approximately 300,000 married women of reproductive age (15–49 years) are under surveillance across the 5 study sites for a 2-year period. We expect to register 66,000 live births in this population and follow them up to the age of 59 days. The data system is designed to collect, store and transmit information related to sociodemographics, pregnancy, antenatal and essential newborn care for the enrolled pregnant women and their newborns; and home-based follow-up information for the young infants. The young infant follow-up includes clinical assessment for possible serious bacterial infections by community health workers (CHWs) and care-seeking history, physician assessments for possible serious bacterial infections and the laboratory testing of clinical specimens. This article describes the design and development of the ANISA data management system as implemented across the participating entities.

ANISA DATA MANAGEMENT SYSTEM DESIGN REQUIREMENTS

The ANISA Data Coordination Centre (DCC) ensures that all data for ANISA are recorded, processed, securely stored and transmitted using standardized methods at every stage. It helps coordinate and monitor data management activities, and provides support to the study sites when necessary. In consultation with the ANISA Study Coordination Team, the DCC established requirements and principles for the ANISA data management system (Table 1).

TABLE 1

ANISA DATA MANAGEMENT SYSTEM

ANISA requires data collection from communities, hospitals and laboratories. This makes data management very complex, as each data set must be properly linked at the individual level. It also necessitates having appropriate data security and audit systems in place.4 Therefore, the DCC has not only designed and developed a system to meet these needs but also works closely with the site data teams to ensure that the system functions optimally at all locations. The design and development process of the ANISA data management system involves 5 key activities, each described in the sections below.

Data Capture Forms

Data capture forms (DCFs) were designed in a manner that allows efficient collection and processing of data at all study sites.4 The ANISA Study Coordination Team and DCC have been responsible for developing the study procedures and DCFs. Following the research protocol, we initially mapped out procedures for enrollment and follow-up of pregnant women and newborns. A literature review was conducted to determine maternal and neonatal risk factors that could be measured practically in the context of all ANISA study sites. The Coordination team reviewed the latest available Demographic and Health Survey instruments from Bangladesh, India and Pakistan to ensure that similar socio-demographic variables were measured in a comparable way using tried and tested questions.5–7 The drafts of DCFs were shared with the sites for review and testing at field level. Each site data team re-examined the forms, tested them in the field and shared their feedback. The number of feedback reports we received on each version of the DCFs is shown in Figure 1. Once the generic DCFs were finalized, site-specific DCFs were developed. These site-specific forms differ from each other only in the mother and newborn identification sections and site-specific responses to some of the questions, for example, caste, types of facilities and providers. DCFs were translated into local languages by the site teams. DCC checked the translations of the forms independently to ensure that the intended meanings of questions and responses were preserved in these translations. Detailed instruction manuals for each form were also provided to the sites. A summary of the ANISA DCFs is given in Table 2.

TABLE 2

FIGURE 1

Assigning Identification Numbers to Study Participants

We use a unique numbering system to identify the study participants (both the pregnant women and their newborns) and their clinical specimens.8 A woman receives a unique 7-digit identification number (ID) for each pregnancy registered. The first digit of the ID denotes the study site where the pregnancy is registered, the last digit is always zero (0), and the 5 digits in between are study participant identifiers generated sequentially by the data server of the respective site. Each live birth also receives a 7-digit ID. The first 6 digits are the same as those of the corresponding mother’s pregnancy ID, while the seventh digit denotes the child’s birth order within that pregnancy (1 for first birth, 2 for second birth, etc.). The list of pregnancy IDs is generated by the ANISA data management software at each site, and CHWs allocate them to study participants. Each CHW can only use the IDs from her own list.
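
The numbering scheme can be illustrated with a minimal Python sketch. It is not the ANISA software; the function names, the assumption that site digits run from 1 to 9 and the assumption that serial numbers start at 1 are ours, introduced only for illustration.

```python
def pregnancy_id(site_digit: int, serial: int) -> str:
    """Build a 7-digit pregnancy ID: site digit + 5-digit serial + trailing 0."""
    if not 1 <= site_digit <= 9:          # assumed range of site digits
        raise ValueError("site digit must be 1-9")
    if not 1 <= serial <= 99999:          # serial must fit in 5 digits
        raise ValueError("serial must be between 1 and 99999")
    return f"{site_digit}{serial:05d}0"


def newborn_id(preg_id: str, birth_order: int) -> str:
    """Derive a newborn ID: first 6 digits of the pregnancy ID + birth order."""
    if len(preg_id) != 7 or not preg_id.isdigit() or preg_id[-1] != "0":
        raise ValueError("not a valid pregnancy ID")
    if not 1 <= birth_order <= 9:
        raise ValueError("birth order must be 1-9")
    return preg_id[:6] + str(birth_order)


def mother_of(child_id: str) -> str:
    """Recover the pregnancy ID that a newborn ID belongs to."""
    return child_id[:6] + "0"
```

Under these assumptions, the second live birth from pregnancy 3000420 would receive the ID 3000422, and replacing the final digit with zero links the newborn back to the mother's record.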

Data Management Application

The ANISA data management application was designed and developed using Microsoft SQL Server 2008 R2 for data storage, .NET with C# code-behind and Visual Basic 6.0 for interfaces, and Crystal Reports 8.0 for reporting.9,10 Some site-specific modifications have been made to capture site-specific information (eg, the local identification part). Every site uses this application for entry and storage of its own data. The 5 primary components of the ANISA data management application are detailed in the following sections.

User Interfaces

Data are entered primarily by keyboard through a graphical user interface that is designed to match the DCFs, with an emphasis on ease and speed. Simultaneous data entry is possible through multiple client computers connected to a local server. Validation rules are set to prevent inconsistencies and other errors during data entry. These validation rules include logical checks, range checks, uniqueness checks of IDs and skip rules that prevent entry of inapplicable information. The data entry system has 2 parts linked by the study ID: the field data entry system and the laboratory data entry system. The interface for laboratory data entry also supports a bar-code reader for logging specimens in the repository and for entry of specimen testing results. This interface can also import molecular test results from the TaqMan array machine (Life Technologies, Foster City, CA). Additional user interfaces include report generation, ID creation, data editing and text message processing systems.
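
The four kinds of validation rule named above can be sketched as follows. This is an illustrative Python example rather than the ANISA application code (which is written in C# and Visual Basic); the field names and the birth weight limits are hypothetical.

```python
def validate_record(record: dict, existing_ids: set) -> list:
    """Return a list of error messages; an empty list means the record passes."""
    errors = []

    # Uniqueness check: the study ID must not already be in the database.
    if record["study_id"] in existing_ids:
        errors.append("study_id already entered")

    # Range check: hypothetical plausible limits for birth weight in grams.
    weight = record.get("birth_weight_g")
    if weight is not None and not 500 <= weight <= 6000:
        errors.append("birth_weight_g out of range")

    # Logical check: a visit cannot precede the birth.
    # ISO-formatted dates (YYYY-MM-DD) compare correctly as strings.
    if record["visit_date"] < record["birth_date"]:
        errors.append("visit_date before birth_date")

    # Skip rule: referral details apply only when the infant was referred.
    if record["referred"] == "no" and record.get("referral_facility"):
        errors.append("referral_facility must be blank when referred is 'no'")

    return errors
```

A data entry screen built on such a function would refuse to save the record, or prompt the operator, whenever the returned list is not empty.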

Databases

A relational database management system is used, and all data tables are linked through primary and foreign keys. The primary key of a relational table uniquely identifies each record in the table. A foreign key in one table refers to the primary key of another table and thereby defines the relationship between the 2 tables. The detailed architecture of the database system is shown in Figure 2.
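
The role of primary and foreign keys can be shown with a small, self-contained sketch. It uses SQLite through Python's standard library as a stand-in for Microsoft SQL Server, and the two-table schema is a hypothetical simplification, not the ANISA schema shown in Figure 2.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a pregnancy table and a newborn table."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE pregnancy (
        pregnancy_id TEXT PRIMARY KEY
    );
    CREATE TABLE newborn (
        newborn_id   TEXT PRIMARY KEY,
        pregnancy_id TEXT NOT NULL REFERENCES pregnancy(pregnancy_id)
    );
    """)
    return conn


def try_insert_newborn(conn: sqlite3.Connection, newborn_id: str, pregnancy_id: str) -> bool:
    """Insert a newborn row; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO newborn VALUES (?, ?)", (newborn_id, pregnancy_id))
    except sqlite3.IntegrityError:
        return False
    return True
```

In this sketch a newborn row is accepted only if its pregnancy ID already exists in the pregnancy table, so a newborn record cannot be stored without a matching registered pregnancy.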

FIGURE 2

Data Storage and Uploading System

Each study site maintains its ANISA database on a local server and extracts its own data for error checking, reporting and study monitoring. Sites upload their data weekly to the central server located at the Child Health Research Foundation, Dhaka, Bangladesh. ANISA uses ADO.NET’s SqlBulkCopy class for quick uploading of the data via the internet. If data upload to the central server is not successful or is not carried out as scheduled, an automatic notification email is sent to the responsible persons at the site and the DCC.
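
The check for missed uploads might look like the following Python sketch. It only shows the scheduling logic; the function names, the site labels and the 7-day threshold are assumptions for illustration, and the actual sending of email is left out.

```python
from datetime import date, timedelta


def overdue_sites(last_upload: dict, today: date, max_gap_days: int = 7) -> list:
    """Return sites whose latest successful upload is more than max_gap_days old."""
    cutoff = today - timedelta(days=max_gap_days)
    return sorted(site for site, when in last_upload.items() if when < cutoff)


def notification_text(site: str, last: date) -> str:
    """Compose the body of a reminder for the site team and the DCC."""
    return f"No successful upload from {site} since {last.isoformat()}."
```

A scheduled job could call `overdue_sites` once a day and pass each returned site to whatever mail service is in use.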

Specimen Tracking and Biorepository System

Both the newborn and the specimen IDs are recorded in the specimen collection form and the physician assessment form so that both IDs can be linked in the databases. A virtual specimen biorepository is created in every site database, mirroring the actual location of each specimen in the freezer. The specimen tracking and biorepository system is described in detail elsewhere in this supplement.8

Text Message System

ANISA uses mobile phone-based text messages as an easily accessible method of information input and output for the data management application. This permits entering some critical data in the site databases in real time. The use of text messaging in ANISA is described separately in this supplement.11

Application Testing

During the development of the ANISA data management application, software testing personnel were assigned to work with the programmers. After development of the data entry interface for each DCF, that interface was tested with different ranges/values to check whether the application could flag inconsistent data, follow skip rules and capture the values within ranges. The beta version of the data management application was also tested by 3 of the site teams with real study data.

Quality Control

The ANISA DCC aims to ensure generation of high quality and reliable data from all sites through the following set of activities, processes and procedures.

Training on the ANISA Data Management System

Before the pilot phase of the project, we conducted a 3-day workshop on the ANISA data management system. The lead of the data team and a data supervisor from each site attended the workshop, where the following issues were covered:

  • Use of the instruction manual for the ANISA data management system and code book of the data files;
  • Configuring the MS SQL server and restoring the ANISA data management system in the site servers;
  • Installation and use of ANISA data management application;
  • Guidelines for data entry and comparing the first and second entry of data;
  • Guidelines for data editing, cleaning and uploading;
  • Configuring the database server to run the text message software.

Data Cleaning

Each site uses its own monitoring procedure to check data consistency and validity. Field supervisors review each DCF filled out by a CHW before entry into the system. A clinical supervisor checks each DCF filled out by the study physicians. In addition, the DCC has developed a program in Stata 12.0 to identify inconsistent values within and between tables and variables. This program generates a list of study IDs with inconsistent entries, which is shared with the sites for corrective action.
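
A consistency check of this kind can be sketched in a few lines. The example below is written in Python rather than Stata and is not the DCC program; the two checks and the data layout are hypothetical.

```python
def inconsistent_ids(births: dict, visits: list) -> list:
    """Flag study IDs with inconsistent entries.

    births maps newborn_id -> birth date (ISO string, YYYY-MM-DD).
    visits is a list of (newborn_id, visit date) pairs.
    """
    flagged = set()
    for newborn_id, visit_date in visits:
        if newborn_id not in births:
            # Between-table check: a visit exists without a birth record.
            flagged.add(newborn_id)
        elif visit_date < births[newborn_id]:
            # Between-variable check: the visit is dated before the birth.
            flagged.add(newborn_id)
    return sorted(flagged)
```

The sorted list it returns corresponds to the list of study IDs that would be sent back to a site for review against the paper DCFs.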

Double Data Entry

Keystroke errors during data entry are common and can affect data quality. The ANISA data management software allows study sites to enter the same data twice. ANISA’s built-in comparison program checks the 2 sets of entries and identifies any differences between them. Data supervisors at the study sites check the mismatches and correct the values after reviewing the DCFs. A web page (http://chu.icddrb.org/anisa) that automatically displays site-specific data entry progress has been designed to monitor data entry at each site.
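
The comparison of first and second entry can be illustrated as follows. This Python sketch is not ANISA's built-in comparison program; the data layout (a dictionary of records keyed by study ID) and the field names in the test are assumptions.

```python
def compare_entries(first: dict, second: dict) -> list:
    """Compare two data entry passes.

    Each argument maps study_id -> {field: value}.
    Returns a list of (study_id, field, first_value, second_value) mismatches.
    A record or field missing from one pass is reported with None on that side.
    """
    mismatches = []
    for study_id in sorted(set(first) | set(second)):
        rec1 = first.get(study_id, {})
        rec2 = second.get(study_id, {})
        for field in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(field), rec2.get(field)
            if v1 != v2:
                mismatches.append((study_id, field, v1, v2))
    return mismatches
```

Each tuple in the result points a data supervisor to one field on one DCF, which is the unit of work when mismatches are resolved against the paper form.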

Data Security and Audit Trail

Data on the site servers are secured by user-specific passwords, with separate permissions required for data entry and editing. An audit trail system was added to the data management application to track every update made to the databases after data entry and to record who made each change, with the date and time.
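
One common way to implement such an audit trail is a database trigger that copies the old and new values into a log table. The sketch below shows the idea with SQLite through Python's standard library; it is an assumption about how an audit trail could work, not a description of the ANISA implementation, and the table and column names are hypothetical.

```python
import sqlite3


def open_audited_db() -> sqlite3.Connection:
    """Create an in-memory database whose weight updates are logged by a trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE newborn (
        newborn_id     TEXT PRIMARY KEY,
        birth_weight_g INTEGER,
        edited_by      TEXT
    );
    CREATE TABLE audit_log (
        newborn_id TEXT,
        old_value  INTEGER,
        new_value  INTEGER,
        changed_by TEXT,
        changed_at TEXT
    );
    -- Record who changed the value, the old and new values, and when.
    CREATE TRIGGER log_weight_update
    AFTER UPDATE OF birth_weight_g ON newborn
    BEGIN
        INSERT INTO audit_log
        VALUES (OLD.newborn_id, OLD.birth_weight_g, NEW.birth_weight_g,
                NEW.edited_by, datetime('now'));
    END;
    """)
    return conn
```

Because the trigger runs inside the database, every correction is logged regardless of which interface was used to make it.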

Data Backup

Each site is responsible for backing up its site-specific database at periodic intervals. At the Coordination Centre, we create a backup of the entire database on an optical disc once a month and store it in a location separate from that of the server. In addition, we maintain a backup copy of the entire database on a different server in a different physical location.

CHALLENGES AND KEY LESSONS

All the ANISA study sites have experience in data collection with printed DCFs. A key challenge in using paper-based DCFs is that any change to a form creates difficulties in data capture, entry and analysis. Although the site teams evaluated the DCFs during their creation and the piloting of the study, the DCC still makes changes to the DCFs as necessary. These changes are reflected simultaneously in the DCFs across all sites. The data management application is updated accordingly to facilitate smooth data entry with the modified DCFs. We tried to create a data entry interface that allows speedy data entry and minimizes backlog. Most of the DCF data are entered into the database within 1 month of collection (Fig. 3), which helps to monitor study progress in near real time. We use MS SQL Server for data storage, which allows convenient export of the data to other formats (eg, MS Excel, Stata, SAS) for analysis.

FIGURE 3

Programmers from the DCC resolve issues with the data management software as they arise. They also use TeamViewer software to access the site servers remotely for troubleshooting. The combination of the high volume of transmitted data and fluctuating internet connectivity has repeatedly caused problems during uploading of data to the central database. We have separated the data uploading system from the audit table to make data uploads run more smoothly. A snapshot of the successful data upload status in the central server at 2 stages of the study is shown in Figure 4A, B. Data cleaning has been a continuous challenge throughout the project, as missing values and data entry errors are almost unavoidable in an extensive database. The DCC regularly meets with site data teams either in person or online (Webex and Skype) to review the progress of data entry and cleaning. These meetings help to overcome any data entry backlog and correct data entry problems including missing values and inconsistencies. Some key lessons learned during the use of our data system at field level are listed in Table 3.

TABLE 3

FIGURE 4

SUMMARY

Although the design and implementation of the ANISA data management application across multiple sites was challenging, centralized management of the system has resulted in smooth handling of study data. It has also facilitated sharing of the workload by enabling sites to enter data locally and allowing the DCC to focus on identifying specific data inconsistencies and clarifying ambiguities. Site teams are also not required to develop their individual data capture and management software and are thus relieved from troubleshooting the system. Data entry at all study sites is thus expedited; ANISA has a minimal data entry backlog at any given point in time and is producing a harmonized dataset from all sites with very limited data entry errors. This gives us confidence in the quality and validity of the ANISA data and of the resulting study outcomes.

ACKNOWLEDGMENTS

We acknowledge the contribution of Mr. Abu Mohammad Saleheen from the International Centre for Diarrhoeal Disease Research, Bangladesh in developing the ANISA software. We are indebted to CSL Software Resources Ltd. for helping us in designing the text messages component of the ANISA data management software. We thank the site teams for their valuable feedback on this application to make it functional.

The ANISA Methods Group: Aarti Kumar: Community Empowerment Lab, Lucknow, India; Abdul Momin Kazi: The Aga Khan University, Karachi, Pakistan; Abdullah H. Baqui: Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland; Anita K. Zaidi: The Aga Khan University, Karachi, Pakistan; Anuradha Bose: Christian Medical College, Vellore, India; Arif Billah: Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland; Daniel E. Roth: Department of Paediatrics, Hospital for Sick Children and University of Toronto, Canada; Derrick Crook: John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom; Hamidul Haque: Child Health Research Foundation, Dhaka, Bangladesh; Jonas M. Winchell: Centers for Disease Control and Prevention, Atlanta, Georgia; Maksuda Islam: Child Health Research Foundation, Dhaka, Bangladesh; Mathuram Santosham: Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland; Maureen H. Diaz: Centers for Disease Control and Prevention, Atlanta, Georgia; Nazma Begum: Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland; Nong Shang: Centers for Disease Control and Prevention, Atlanta, Georgia; Pinaki Panigrahi: University of Nebraska Medical Center, Omaha, Nebraska; Sajid B. Soofi: The Aga Khan University, Karachi, Pakistan; Shahida M. Qureshi: The Aga Khan University, Karachi, Pakistan; Shamim A. Qazi: Department of Maternal, Newborn, Child and Adolescent Health, World Health Organization, Geneva, Switzerland; Sheraz Ahmed: The Aga Khan University, Karachi, Pakistan; Stephen P. Luby: Stanford Woods Institute for the Environment, Stanford University, Stanford, California; Vishwajeet Kumar: Community Empowerment Lab, Lucknow, India; Yoonjoung Choi: Centers for Disease Control and Prevention, Atlanta, Georgia; Zulfiqar A. Bhutta: The Aga Khan University, Karachi, Pakistan; and Stephanie J. Schrag, DPhil: Centers for Disease Control and Prevention, Atlanta, Georgia.

REFERENCES

1. Onyango AW, Pinol AJ, de Onis M. Managing data for a multicountry longitudinal study: experience from the WHO Multicentre Growth Reference Study. Food Nutr Bull. 2004;25(1 Suppl):S46–S52
2. Biswas K, Carty C, Horney R, et al. Data management and other logistical challenges for the GEMS: the data coordinating center perspective. Clin Infect Dis. 2012;55(Suppl 4):S254–S261
3. Krishnankutty B, Bellary S, Kumar NB, et al. Data management in clinical research: an overview. Indian J Pharmacol. 2012;44:168–172
4. Prokscha S. Practical Guide to Clinical Data Management. 3rd ed. Boca Raton, FL: CRC Press; 2012
5. National Institute of Population Studies (Pakistan). Pakistan Demographic and Health Survey 2006–2007. Calverton, MD: Macro International Inc; 2007
6. International Institute for Population Sciences (India). India Demographic and Health Survey 2005–2006. Calverton, MD: Macro International Inc; 2007
7. USAID, NIPORT. Bangladesh Demographic and Health Survey 2007. Calverton, MD: Macro International Inc; 2007
8. Connor NE, Hossain T, Rahman QS, et al. Development and implementation of the ANISA labeling and tracking system for biological specimens. Pediatr Infect Dis J. 2016;35(Suppl 1):S29–S34
9. Carter J. Database Design and Programming with Access, SQL, Visual Basic, and ASP. New York, NY: McGraw-Hill; 2002
10. Peck G. Crystal Reports 2008: The Complete Reference. New York, NY: McGraw-Hill; 2008
11. Islam MS, Rahman QS, Hossain T, et al. Using text messages for critical real-time data capture in the ANISA study. Pediatr Infect Dis J. 2016;35(Suppl 1):S35–S38
Keywords:

data management; population-based; young infants; multicenter; ANISA

Copyright © 2016 Wolters Kluwer Health, Inc. All rights reserved.