Ojwang', James K. MSc*; Lee, Veronica C. MPH†; Waruru, Anthony MPhil*; Ssempijja, Victor MSc*; Ng'ang'a, John G. BS‡; Wakhutu, Brian E. BS‡; Kandege, Nicholas O. BS‡; Koske, Danson K. BS‡; Kamiru, Samuel M. BS‡; Omondi, Kenneth O. BS‡; Kakinyi, Mutua MSc§; Kim, Andrea A. PhD, MPH*; Oluoch, Tom MSc*; for the KAIS Study Group
Survey and healthcare data collection increasingly rely on mobile electronic devices as information and communication technology (ICT) equipment improves in power, capability, and energy efficiency.1–3 A recent International Telecommunication Union survey showed that prices of ICT hardware are decreasing and that long-lasting batteries now allow longer periods of uninterrupted data collection.4
Large-scale surveys using paper-based data collection (PDC) have significant data management challenges and require substantial time for double data entry, cleaning, and analysis. In developing countries, these surveys are often conducted under challenging conditions that make PDC vulnerable to data loss, poor data quality, and other data management inefficiencies.1,2 Considerable effort is often required to resolve data inconsistencies before meaningful analysis can be done and findings disseminated.2
Electronic data capture (EDC) at the point of data entry offers several benefits for large surveys. Previous household surveys using personal digital assistants (PDAs) found high data completeness and accuracy and needed minimal time for data cleaning.5–7 Although EDC use is increasing in many sectors, many public health surveys continue to use PDC for a variety of reasons, including perceived higher costs and concerns over data security in EDC systems. The Kenya Ministry of Health has recognized the importance of EDC and, in its Health Sector Strategic Plan for Health Information Systems, recommends using EDC to provide cost-efficient, timely, reliable, and available health information to inform evidence-based decisions.8
The second Kenya AIDS Indicator Survey (KAIS 2012) was the first national survey to use EDC to capture data at the household level and transmit them in real time from the field to a central database. In this article, we discuss the feasibility of using EDC in a national household-based survey with regard to development, implementation, and lessons learned.
DEVELOPMENT OF THE KAIS 2012 APPLICATION
KAIS 2012 was a nationally representative, population-based survey of persons aged 18 months to 64 years in Kenya, the methods of which have been described elsewhere.9 The design and development of the KAIS 2012 EDC system began 1 year before survey implementation. We compared netbooks, PDAs, and laptops to assess which hardware would best meet the requirements of the survey, and we chose netbooks based on their portability, data storage capacity, processing speed, battery life, typing convenience, and cost. The Mirus Schoolmate Convertible Netbook (Mirus Innovations, Mississauga, Ontario, Canada) (Fig. 1) was selected for its cost, long battery life, durability, and flexibility of use either as a tablet or with a keyboard.
KAIS 2012 Software Application
For the data collection platform, a team of 6 data programmers designed a novel application for KAIS 2012 over a period of 38 person-months. The application was programmed in Microsoft Visual Studio 2010 (.NET Framework; Visual Basic and C#) (Microsoft, Redmond, WA), used Microsoft SQL Server 2008 (Microsoft) as the database management system, and used Microsoft Replication Republishing Architecture (Microsoft) to construct the data sharing architecture. The application consisted of 6 data collection modules: the household interview, the adult female interview, the adult male interview, the children's interview, the specimen collection log, and the home-based testing and counseling (HBTC) log. Translations of the household and individual interview questionnaires into 13 local languages were programmed into the application. The application interface captured different types of responses using radio buttons, check boxes, drop-down menus, and text boxes. All questions requiring a date response used dedicated date fields to ensure proper entry of the day, month, and year (Fig. 2).
A key feature of the application was automated eligibility determination for individual interviews, specimen collection, and HBTC, which told KAIS 2012 field team members whether each household member was eligible for the survey and, if so, which data collection modules were required (Fig. 3). The program also prevented noneligible individuals from proceeding with the interview, specimen collection, and HBTC.
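The eligibility logic can be pictured as a rule function that maps each household member to the modules they require, with an empty result blocking every module. The sketch below is illustrative only: the 18-month to 64-year survey range comes from the text, but the interview cut-offs (`ADULT_MIN_YEARS`, `CHILD_INTERVIEW_MIN_YEARS`), the `consented` flag, and all function and module names are hypothetical placeholders, not the KAIS 2012 rules.

```python
from dataclasses import dataclass

# Survey age range from the text: 18 months to 64 years, expressed in months.
MIN_AGE_MONTHS = 18
MAX_AGE_MONTHS = 64 * 12 + 11  # includes every month of age 64

# HYPOTHETICAL module cut-offs for illustration only; not the KAIS 2012 rules.
ADULT_MIN_YEARS = 15
CHILD_INTERVIEW_MIN_YEARS = 10


@dataclass
class Member:
    name: str
    age_months: int
    sex: str         # "F" or "M"
    consented: bool  # hypothetical flag gating specimen collection and HBTC


def required_modules(m: Member) -> list[str]:
    """Return the data collection modules this household member is eligible for."""
    if not (MIN_AGE_MONTHS <= m.age_months <= MAX_AGE_MONTHS):
        return []  # outside the survey age range: no modules at all
    years = m.age_months // 12
    modules = []
    if years >= ADULT_MIN_YEARS:
        modules.append("adult_female_interview" if m.sex == "F" else "adult_male_interview")
    elif years >= CHILD_INTERVIEW_MIN_YEARS:
        modules.append("children_interview")
    if m.consented:
        modules += ["specimen_collection", "hbtc"]
    return modules


def can_start(m: Member, module: str) -> bool:
    """Block any module the member is not eligible for."""
    return module in required_modules(m)
```

For example, `required_modules(Member("B", 30 * 12, "M", True))` returns `["adult_male_interview", "specimen_collection", "hbtc"]`, while a 12-month-old returns an empty list, so `can_start` refuses every module.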
We implemented several data quality control measures. We automated all skip patterns and developed user prompts to remind interviewers to check for completeness of responses. Warning messages appeared in red on the screen to inform the interviewer about response inconsistencies. For questions that required numeric responses, we restricted the range of possible responses to ensure valid data entry. We also designed database constraints to prevent multiple records from being created for 1 survey respondent. This was an important application feature given that multiple sessions, or visits, with the survey participant could be required to complete all components of the survey. If a field team member tried to create a duplicate record, a warning message alerted the field team member that a data record already existed for that participant.
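These three controls (a skip pattern, a numeric range restriction, and a one-record-per-respondent constraint) can be sketched at the database level. The example below uses Python's built-in `sqlite3` as a stand-in for the SQL Server 2008 back end described above; the table, column names, and the 15–64 age range are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE adult_interview (
            participant_id TEXT NOT NULL UNIQUE,  -- one record per respondent
            age_years INTEGER NOT NULL CHECK (age_years BETWEEN 15 AND 64),
            ever_tested INTEGER NOT NULL CHECK (ever_tested IN (0, 1)),
            last_test_year INTEGER                -- asked only if ever_tested = 1
        )""")
    return conn


def save_interview(conn, pid, age, ever_tested, last_test_year=None) -> str:
    # Skip pattern: the follow-up question is skipped (stored as NULL)
    # when the respondent has never been tested.
    if not ever_tested:
        last_test_year = None
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO adult_interview VALUES (?, ?, ?, ?)",
                         (pid, age, int(ever_tested), last_test_year))
        return "saved"
    except sqlite3.IntegrityError as exc:
        if "UNIQUE" in str(exc):
            return f"warning: a record already exists for participant {pid}"
        return f"warning: invalid response ({exc})"
```

Enforcing the rules in the database, rather than only in the form, means every code path that writes data is checked, which is why losing the constraints in the field (described later) allowed duplicates to appear.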
We designed data transmission to operate at 2 levels. The first level was intrateam data sharing through a wireless local area network (WLAN). We configured the field team supervisor's netbook to be the network hub, allowing it to receive data from and send data to the team members' (user) netbooks. User netbooks “pulled” data from the supervisor's netbook indicating which persons in the household were eligible for individual interviews, blood specimen collection, and HBTC. Once a survey module was completed, the user netbook “pushed” the data to the supervisor's netbook. If the WLAN failed, we provided a back-to-back cable to connect the 2 netbooks directly for data transfer.
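The hub-and-spoke pattern can be sketched in memory. The class and method names below are hypothetical, and the real system used SQL Server replication over the WLAN rather than Python objects. The sketch only shows the flow: users pull what the hub holds (for example, the eligibility roster) and push completed modules back.

```python
from dataclasses import dataclass, field


@dataclass
class Netbook:
    owner: str
    records: dict = field(default_factory=dict)  # record_id -> record


class SupervisorHub(Netbook):
    """Hypothetical stand-in for the supervisor's netbook acting as network hub."""

    def push(self, user: Netbook, record_ids) -> None:
        """A user netbook sends its completed module records to the hub."""
        for rid in record_ids:
            self.records[rid] = user.records[rid]

    def pull(self, user: Netbook) -> int:
        """A user netbook copies every hub record it does not yet hold."""
        new = {rid: rec for rid, rec in self.records.items() if rid not in user.records}
        user.records.update(new)
        return len(new)
```

Because each pull copies the full team dataset, every netbook ends up holding the team's data, which is why this exchange also served as a back-up mechanism.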
The second level of data transmission was between the supervisor's netbook in the field and the central data server in Nairobi. We used a cellular modem to transmit data through a secure virtual private network (VPN) with Kenya's largest Global System for Mobile Communications (GSM) network service provider. Once field supervisors “pushed” their data to the central server, the data were replicated there, allowing for real-time data monitoring.
Data Back-up and Security
We established a data back-up system to create multiple back-up files at various levels. Paper questionnaires were available in the event of technology failure. At the field level, daily back-ups were automated within the application, saved onto secure digital (SD) cards, and then transferred monthly onto an external hard drive during national supervision visits. The “pulling and pushing” of data between the supervisor and user netbooks also served as a back-up mechanism; all data collected by the team resided on each of the team's 6 netbooks. At the central level, we programmed an automatic incremental back-up within the server and saved daily back-ups to an external hard drive that was stored in a separate, locked location.
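An incremental back-up writes only what has changed since the previous back-up. The following is a minimal sketch of that idea using a per-record version number as a high-water mark; it is a generic illustration, not the SQL Server mechanism used on the KAIS 2012 server, and the file naming and `version` field are assumptions.

```python
import json
from pathlib import Path


def incremental_backup(records: dict, last_version: int, backup_dir: Path) -> int:
    """Write records changed since last_version; return the new high-water mark."""
    changed = {rid: r for rid, r in records.items() if r["version"] > last_version}
    if not changed:
        return last_version  # nothing new, no file written
    new_version = max(r["version"] for r in changed.values())
    # Zero-padded names keep lexical order equal to chronological order.
    out = backup_dir / f"backup_{last_version + 1:06d}_{new_version:06d}.json"
    out.write_text(json.dumps(changed, sort_keys=True))
    return new_version


def restore(backup_dir: Path) -> dict:
    """Replay incremental files in order; later files overwrite earlier records."""
    merged = {}
    for f in sorted(backup_dir.glob("backup_*.json")):
        merged.update(json.loads(f.read_text()))
    return merged
```

Storing the back-up files in a separate location, as described above, is what protects against loss of the server itself; the incremental format only keeps each daily copy small.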
For data security, all netbooks and servers were password protected. We established role-based access controls on the field team netbooks, allowing the field supervisor to access all data records and field team members to access only the records in their own user accounts. To protect identifying information, we encrypted individual identifiers with binary codes and encrypted all data during transfer to the central data server through the VPN. We installed anti-virus software on each netbook and gave field team members security cables to prevent theft.
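The role-based rule described above reduces to a filter applied before any record is shown. In this minimal sketch the role names and the `created_by` field are hypothetical; unknown roles are refused outright rather than given default access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    username: str
    role: str  # hypothetical roles: "supervisor" or "member"


def visible_records(user: User, records: list[dict]) -> list[dict]:
    """Supervisors see every record; team members see only their own."""
    if user.role == "supervisor":
        return list(records)
    if user.role == "member":
        return [r for r in records if r["created_by"] == user.username]
    raise PermissionError(f"unknown role: {user.role}")
```

Failing closed on an unrecognized role is a deliberate choice here: a misconfigured account gets no data instead of all data.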
Testing of the KAIS 2012 application and data sharing architecture was a critical component of the application development process. Ongoing development testing occurred throughout the application development period, before survey implementation. For user acceptance testing (UAT), we developed standard operating procedures to implement a strategy for rigorous and repeated external testing of the application. The objectives of the UAT were to determine if the application could collect data that were consistent, coherent, and complete; to identify vulnerabilities or threats to collecting these data; to ascertain risks to data security and confidentiality; to develop solutions to the vulnerabilities or risks identified; and to determine if the data sharing structure could transmit data within the team and to the central data server. The UAT also sought input on the appearance and usability of the application's interface, including the layout of the screen, ease of navigation from one screen to the next, and interpretability of the data entry fields. Feedback during the UAT was reviewed to determine which changes could be implemented to improve the application.
Technical Support Structure
We established a technical support system to assist the field teams at 2 levels. Field team supervisors, who were required to have intermediate computer skills, were the first level of technical support and were trained to troubleshoot basic issues, such as resetting passwords. Issues that could not be resolved by the field supervisors were escalated to the second level of technical support: the team's assigned ICT support person, who provided assistance by phone or through remote access to the field supervisor's netbook. The ICT support person also provided in-person technical support during the monthly national field supervision visits and on an ad hoc basis according to demonstrated need. The ICT support person was responsible for installing application updates and repairing or replacing damaged or lost equipment. We developed a standard reporting template for field team members to record any issues with netbooks or the software application. These reports were reviewed centrally by the ICT team to determine solutions for issues raised.
Survey Field Team Training
Field staff were recruited based on their education level, previous work experience, and computer skills. We conducted a 3-week training for all KAIS 2012 field team members on the purpose of the survey, their respective content area (interview, specimen collection, or HBTC), and the use of the KAIS 2012 application. Each survey team comprised 1 field supervisor, 3 interviewers, 2 laboratory technicians, and 1 HBTC service provider. The training focused on the technical skills needed to implement the survey according to each team member's specific role. Additional training taught field staff how to use the KAIS 2012 application for data entry and data sharing, and included instructions on equipment security and proper equipment handling. Field team supervisors received an additional 1-day training immediately before survey implementation on how to troubleshoot basic technology issues and conduct daily quality control checks. We also provided an end-user manual, which included the standard operating procedures for handling the netbook and using the KAIS 2012 application. Field team members also had the opportunity to practice using the netbooks for data collection in actual household settings during the survey pilot.
IMPLEMENTATION OF THE KAIS 2012 APPLICATION
Between October 2012 and February 2013, 40 field teams with 240 netbooks were deployed across the country to implement KAIS 2012. The total technology-related hardware and VPN cost was US $4673 per team, or approximately US $187,000 for all 40 teams (Table 1). Had PDC been used, we estimate that the cost of printing alone for all data collection tools would have been approximately US $185,000. The 40 field teams entered a total of 68,202 records for the household and individual interviews, specimen collection logs, and HBTC logs into the netbooks.
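The cost comparison can be checked directly from the figures reported in the text. This short calculation uses only those figures; the variable names are illustrative.

```python
TEAMS = 40
EDC_COST_PER_TEAM_USD = 4673         # hardware + VPN per team, from the text
PDC_PRINTING_ESTIMATE_USD = 185_000  # estimated printing cost had PDC been used

edc_total = TEAMS * EDC_COST_PER_TEAM_USD
difference = edc_total - PDC_PRINTING_ESTIMATE_USD

print(f"EDC total: ${edc_total:,}")                 # EDC total: $186,920
print(f"EDC minus PDC printing: ${difference:,}")   # EDC minus PDC printing: $1,920
```

The exact total of $186,920 rounds to the approximately $187,000 reported, so EDC hardware and connectivity cost roughly the same as printing alone, before counting PDC data entry and cleaning.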
Acceptability and Usability of EDC
During survey training and data collection, the netbook and the KAIS 2012 application were highly acceptable to field team members, who also reported that they preferred netbooks to PDC. In feedback to the national supervision teams, field team members reported some difficulty navigating the application and sharing data, particularly during the first week of the survey, and required constant technical support from the ICT teams. However, field teams stated that after conducting the first few interviews, they were able to confidently operate the netbook and application, enter data, and transfer data to and from the supervisor's netbook. The automated skip patterns and checks made interviews more efficient and reduced the time the survey team spent in each household. The median time for completing an interview was 38.3 minutes (interquartile range: 25.5–51.0) during the first month of the survey and decreased to 20.4 minutes (interquartile range: 13.6–29.5) by the end of the survey (Fig. 4).
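Summaries like the medians and interquartile ranges above can be computed per survey period with Python's `statistics` module. This sketch uses made-up durations, not survey data; the grouping labels are hypothetical.

```python
import statistics
from collections import defaultdict


def summarize_durations(minutes: list[float]) -> tuple[float, float, float]:
    """Return (median, Q1, Q3) for a list of interview durations in minutes."""
    q1, _, q3 = statistics.quantiles(minutes, n=4)  # default "exclusive" method
    return statistics.median(minutes), q1, q3


def summarize_by_period(records):
    """records: iterable of (period_label, minutes) pairs."""
    groups = defaultdict(list)
    for period, minutes in records:
        groups[period].append(minutes)
    return {period: summarize_durations(values) for period, values in groups.items()}
```

With illustrative durations of 10, 20, 30, 40, and 50 minutes, `summarize_durations` returns a median of 30 with quartiles 15.0 and 45.0. Different quartile methods (`method="inclusive"` versus the default) give slightly different IQR bounds, so the method should be stated when reporting.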
One of the broader concerns with the use of EDC in a national survey was how it would be accepted by survey participants. Teams reported that participants and community members viewed them as professional staff since they were using electronic technology for data entry. Some participants also reported they felt more confident that their information would be kept private on the netbooks, compared to paper records that could be lost or mistakenly viewed. Throughout data collection, no survey respondent declined the use of netbooks for EDC.
Our main challenge during field implementation was the loss of database constraints due to a system malfunction. Without the constraints, duplicate records were created for some participants, which temporarily affected the quality of the data collected in the field. Because the issue was identified during the first week of the survey, when teams were already in the field, programmers could not manually reconfigure the application, which would have been the preferred approach for a systematic and efficient response. Instead, we developed a “Duplicates Detection” tool, distributed as a software update, to identify invalid duplicate records in the netbook databases, remove them, and reinstate the database constraints. During the first national supervision visit, the ICT team guided the field team supervisors in installing the application update and reinstating the database constraints. In addition, the supervisors were trained to use the Duplicates Detection tool to remove duplicate records. Within the first 4 weeks of data collection, all 40 teams had been reached and the database constraints reinstated.
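A duplicates-detection step of this kind can be sketched in three parts: group records by participant key, delete all but one per key, and recreate the unique constraint so duplicates cannot recur. The sketch uses `sqlite3` as a stand-in for SQL Server. The table and key names are hypothetical, and keeping the most recently inserted row is an assumed retention rule; the actual tool's rule is not described here.

```python
import sqlite3


def find_duplicates(conn, table="household_member", key="participant_id"):
    # table/key are trusted internal constants, so f-string SQL is acceptable here.
    sql = f"SELECT {key}, COUNT(*) FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()


def remove_duplicates_and_restore_constraint(conn, table="household_member",
                                             key="participant_id") -> int:
    """Keep the latest row per key (assumed rule), then recreate the unique index."""
    with conn:  # single transaction: delete and re-constrain together
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid NOT IN "
            f"(SELECT MAX(rowid) FROM {table} GROUP BY {key})")
        removed = cur.rowcount
        conn.execute(
            f"CREATE UNIQUE INDEX IF NOT EXISTS ux_{table}_{key} ON {table}({key})")
    return removed
```

Duplicates must be removed before the unique index is created, because the index creation itself fails while duplicate keys remain.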
Each survey participant who provided a blood specimen was assigned a specimen identification number in the form of a barcode, which was used to uniquely identify the participant's blood specimen in the specimen collection log and HBTC log. Teams were instructed to use their barcode scanners to ensure that the barcode numbers entered in the specimen collection log and HBTC log were identical for the same individual. However, during the national supervision visits, reports of discrepant barcode numbers for the same individual indicated that teams were not using their barcode scanners but were manually entering barcode numbers into the application, further contributing to data entry errors. To address this, the ICT team developed a “Barcode Discrepancy tool” as a software update to examine each survey participant's records in the database and list the participants whose barcodes differed between the 2 modules. Field team supervisors were instructed to run the Barcode Discrepancy tool at the end of each day as part of their quality control checks and resolve any data conflicts in consultation with the team. We also reinforced the use of barcode scanners during the national supervision visits.
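The core of a barcode discrepancy check is a join on participant ID followed by a comparison. In this minimal sketch each log is represented as a dictionary from participant ID to barcode; the representation and the example barcodes are hypothetical.

```python
def barcode_discrepancies(specimen_log: dict[str, str],
                          hbtc_log: dict[str, str]) -> list[tuple[str, str, str]]:
    """List (participant_id, specimen barcode, HBTC barcode) where the two differ."""
    report = []
    # Only participants present in both modules can be compared.
    for pid in sorted(specimen_log.keys() & hbtc_log.keys()):
        if specimen_log[pid] != hbtc_log[pid]:
            report.append((pid, specimen_log[pid], hbtc_log[pid]))
    return report
```

For instance, if the specimen log records `KA0002` and the HBTC log records `KA0020` for the same participant, that pair appears in the report for the supervisor to resolve, a typical transposition error from manual entry.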
Intrateam data sharing operated well during the survey; no teams reported having trouble with WLAN connectivity. However, data transmission from the field to the central server through the VPN encountered several challenges with cellular connectivity. Thirty-one of the 40 field teams had difficulty connecting to the cellular network at least once during the survey. Low to no network coverage in remote locations and heavy traffic on the cellular network in urban locations contributed to suboptimal connectivity, affecting approximately one-quarter to one-third of the teams at any given point during the 3-month data collection period. To prevent data loss from poor connectivity, we designed data transmission to resume from the point at which the connection was disrupted, so teams did not have to restart the data transfer process and the data that had already reached the central server were protected.
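Resumable transmission works by asking the receiver how much it has already acknowledged and sending only the remainder. The simulation below is a generic sketch of that idea, not the replication mechanism actually used. `CentralServer`, `FlakyLink`, and the `fail_at` set, which simulates dropped connections, are all hypothetical.

```python
class CentralServer:
    def __init__(self) -> None:
        self.received: list[bytes] = []

    def acknowledged(self) -> int:
        return len(self.received)  # number of chunks safely stored

    def accept(self, chunk: bytes) -> None:
        self.received.append(chunk)


class FlakyLink(Exception):
    """Raised when the simulated cellular connection drops."""


def transmit(chunks: list[bytes], server: CentralServer, fail_at: set[int]) -> int:
    """Send chunks starting at the server's acknowledged position."""
    for i in range(server.acknowledged(), len(chunks)):
        if i in fail_at:
            fail_at.discard(i)  # each simulated drop happens once
            raise FlakyLink(f"connection lost before chunk {i}")
        server.accept(chunks[i])
    return server.acknowledged()


def transmit_with_retries(chunks, server, fail_at, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            return transmit(chunks, server, fail_at), attempt
        except FlakyLink:
            continue  # reconnect and resume; nothing already sent is resent
    raise RuntimeError("gave up; data remain on the field netbooks")
```

With drops before chunks 3 and 7, all 10 chunks arrive after 3 attempts with no chunk sent twice. Data not yet acknowledged still exist only in the field, which is the vulnerability described in the next paragraph.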
However, connectivity problems hampered real-time data monitoring across the 40 teams. Additionally, untransmitted data were vulnerable to loss because both the data and their back-ups remained in the field. To address these challenges, some teams traveled to a location with cell phone network coverage to transmit their data, while other teams required an ICT team member to visit, collect the data, and manually transmit them to the server.
Data Back-up and Security
Two of the 40 field supervisor netbooks experienced technical problems that prevented them from sharing data with team members and with the central server. To resolve this, these netbooks were reconfigured with back-up databases. None of the previously collected data on those netbooks were lost during the reconfiguration process.
We experienced the loss of 1 netbook during the data collection period. However, participant privacy was not compromised as the netbook was password protected, and the data in the database and on the secure digital card were encrypted. After the survey was completed, we reformatted all netbooks to remove all data collected during KAIS 2012 and checked all netbooks to confirm that the KAIS 2012 software application and database had been removed.
The data management process was divided into 2 phases: (1) cleaning the raw databases to address data conflicts, completeness, and internal consistency and (2) merging the 16 raw SQL database tables to produce the final 10 databases for analysis. The first phase focused on removal of duplicate records (n = 44) by reviewing summary details for each record and resolving discrepant barcode numbers for a single specimen (n = 154). The majority of these discrepancies were due to data entry error.
In the second phase, we merged the 16 raw datasets to develop the final analysis databases. We merged data tables using a unique ID to create complete interview records for each survey participant. We then used this unique ID to link a participant's interview record with the corresponding specimen collection and HBTC records for that participant. We also linked data records across survey participants, such as for a parent and child or for sex partners. Because the data quality control measures built into the application enforced skip patterns and limited missing information, outliers, and nonsensical responses, little data cleaning was needed once the final analysis databases were merged.
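The merge-then-link procedure can be sketched with plain dictionaries keyed by the unique ID. The field names (`uid`, `specimen`, `hbtc`) and the pair format for cross-participant links are hypothetical, and later tables overwrite earlier ones on any column-name collision.

```python
def build_participant_records(interview_tables, specimen_by_uid, hbtc_by_uid):
    """Merge interview tables on 'uid', then attach specimen and HBTC records."""
    merged: dict[str, dict] = {}
    for table in interview_tables:
        for row in table:
            record = merged.setdefault(row["uid"], {"uid": row["uid"]})
            record.update({k: v for k, v in row.items() if k != "uid"})
    for uid, record in merged.items():
        record["specimen"] = specimen_by_uid.get(uid)  # None if no specimen
        record["hbtc"] = hbtc_by_uid.get(uid)          # None if no HBTC record
    return merged


def link_pairs(merged, pairs):
    """Keep (uid_a, uid_b) links, e.g. child-parent or partners, where both exist."""
    return [(a, b) for a, b in pairs if a in merged and b in merged]
```

Dropping links whose other member has no record makes unmatched IDs visible as a shorter link list rather than as silent errors in the analysis dataset.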
Preparation of the final KAIS 2012 dataset for analysis took a team of 4 people approximately 2.5 months. If PDC had been used, we estimate that data entry and data cleaning would have taken twice as long, at an estimated cost of US $56,000 for a team of 32 persons to complete data processing.
KAIS 2012 demonstrates the feasibility of using an EDC system in a large national household-based survey. Despite the technological challenges encountered during survey implementation, the benefits of EDC were apparent in the quality of the data and the financial savings when considering the printing and human resources costs of PDC. Given the minimal data management requirements, data were available for analysis within 2.5 months after the completion of data collection.
We learned several important lessons during the design and implementation of this EDC system, and these factors should be considered when using technology for data collection. A key consideration is the time and monetary investment required for designing and implementing such a system. This determination should be made at the outset of survey planning to ensure that adequate human and financial resources can be dedicated to the project and that a realistic timeline accounting for the available resources is developed. Limited human resources were one of the challenges encountered during the development phase of the KAIS 2012 EDC system; additional resources were needed to execute the numerous technology-related tasks.
Careful and continuous user testing during the development phase is essential for any EDC system. Adequate time is needed to ensure that the preprogrammed skip patterns, logic checks, and data value and range restrictions are functioning as they should. Any change to the data collection tools during actual data collection requires reprogramming, which may disrupt the configuration of these automated checks or other elements of the application. The KAIS 2012 application underwent continuous user testing up until survey implementation, but technical issues, namely the loss of database constraints, still occurred. Whenever changes are made, repeated testing of the whole system by multiple users is critical for confirming that the application and the preprogrammed checks are operating correctly. User testing is also important for verifying that the data entered into the application are stored and correctly coded in the back-end databases. Although user testing requires considerable time investment during the survey planning phase, ensuring these controls are in place will reduce the number of data quality checks needed during data collection and the amount of time needed for data cleaning while improving the quality of the data collected.
Poor cellular network access presents several challenges to the functionality of an EDC system. Connectivity problems between the field and the central server, caused by limited network coverage, prevented comprehensive real-time data monitoring across the 40 survey teams. Because access to data in real time is a unique advantage of an electronic system over PDC, the effect of network limitations on that advantage must be considered during planning. In situations of no or limited network coverage, data back-up mechanisms at the field level become especially important to ensure that data are not lost should an adverse event occur.
EDC systems offer real opportunities for delivering quality data in a shorter period compared with paper-based platforms. KAIS 2012 demonstrated that EDC is feasible for conducting household-based studies in Kenya and that netbooks are acceptable data collection tools among users and survey respondents. As technology continues to improve and cellular network coverage expands, EDC systems should be considered for future national studies.
We thank the survey teams for their work during KAIS data collection and all persons who participated in this national survey. The authors would like to thank Angela Broad, Silas Mulwa, Serenita Lewis, Yakubu Owolabi, Timothy Kellogg, and Mike Grasso for their input on the development of the EDC system; Nicky Okeyo and Daniel Kwaro for their technical support; Wolfgang Hladik, Mike Grasso and George Rutherford for reviewing and providing input on the manuscript; and the KAIS Study Group for their contribution to the design of the survey and collection of the data set: Willis Akhwale, Sehin Birhanu, John Bore, Angela Broad, Robert Buluma, Thomas Gachuki, Jennifer Galbraith, Anthony Gichangi, Beth Gikonyo, Margaret Gitau, Joshua Gitonga, Mike Grasso, Malayah Harper, Andrew Imbwaga, Muthoni Junghae, Mutua Kakinyi, Samuel Mwangi Kamiru, Nicholas Owenje Kandege, Lucy Kanyara, Yasuyo Kawamura, Timothy Kellogg, George Kichamu, Andrea Kim, Lucy Kimondo, Davies Kimanga, Elija Kinyanjui, Stephen Kipkerich, Danson Kimutai Koske, Boniface O. K'Oyugi, Veronica Lee, Serenita Lewis, William Maina, Ernest Makokha, Agneta Mbithi, Joy Mirjahangir, Ibrahim Mohamed, Rex Mpazanje, Nicolas Muraguri, Patrick Murithi, Lilly Muthoni, James Muttunga, Jane Mwangi, Mary Mwangi, Sophie Mwanyumba, Francis Ndichu, Anne Ng'ang'a, James Ng'ang'a, John Gitahi Ng'ang'a, Lucy Ng'ang'a, Carol Ngare, Bernadette Ng'eno, Inviolata Njeri, David Njogu, Bernard Obasi, Macdonald Obudho, Edwin Ochieng, Linus Odawo, Jacob Odhiambo, Caleb Ogada, Samuel Ogola, David Ojakaa, James Kwach Ojwang, George Okumu, Patricia Oluoch, Tom Oluoch, Kenneth Ochieng Omondi, Osborn Otieno, Yakubu Owolabi, Bharat Parekh, George Rutherford, Sandra Schwarcz, Shanaaz Sharrif, Victor Ssempijja, Lydia Tabuke, Yuko Takanaka, Mamo Umuro, Brian Eugene Wakhutu, Celia Wandera, John Wanyungu, Wanjiru Waruiru, Anthony Waruru, Paul Waweru, Larry Westerman, and Kelly Winter.