Whalen, Christopher J. MA*; Donnell, Deborah PhD†; Tartakovsky, Michael MS*
Conducting, supporting, and sponsoring basic and clinical research in settings with poor infrastructure is a serious challenge for those working to advance the use of technologies for data collection or capture and communication. Improving support for these technologies is critical for research clinics and laboratories in high-income and resource-limited settings (RLS).
Sites with reliable infrastructure can increasingly rely on sustainable availability of Internet connectivity for communication and data exchange to support research activities. However, many research sites in RLS still struggle with unreliable Internet access. Building systems that can overcome this is challenging for the information technology (IT) professionals who support the investigators and their protocols. A contributing factor is the lack of skilled IT support personnel on-site to assist researchers in overcoming the obstacles to reliable Internet connectivity, to support advanced IT technologies, and to maintain the basic infrastructure needed to implement them.1 We discuss the challenges in providing and maintaining reliable IT infrastructure at these sites and present both the opportunities and some ethical considerations in implementing these new technologies for data management and communication.
CHALLENGES FOR INFORMATION TECHNOLOGY INFRASTRUCTURE
Reliable Electrical Power
Modern computing and scientific systems require continuous conditioned power to maintain the equipment and avoid loss of data.2,3 In RLS, one of the most difficult challenges in supporting IT infrastructure for field clinics and laboratories, and in cities, has been the lack of reliable and high-quality utility power.
There are strategies for delivering conditioned power regardless of location; examples include proper sizing of the power system, automatic voltage switches, and uninterruptible power supplies (UPS). Correct sizing of power systems and circuits for either a single device (such as a flow cytometer or server) or whole facilities is critical. A spike in power demand when a device starts, called inrush or startup load, is common for many devices, such as refrigeration systems and laser printers. A power delivery system, such as a UPS or custom-built inverter/rectifier/battery system, must be sized to accommodate this inrush demand; failing to do so reduces the operational life of the equipment or causes failure of the systems using, conditioning, and sharing that circuit. Automatic voltage switches, which cut power to the supplied systems (including UPS and inverters) when the utility or generator supply fluctuates, can significantly improve the operational life of the equipment. Of the 3 modern types of UPS (“on-line,” “off-line,” and “line interactive”), the “on-line” types provide the best continuous conditioning of power in areas with unstable power.4
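The sizing rule above can be illustrated with a short calculation. The following is a minimal sketch, not a substitute for an electrical engineer's assessment. It assumes that the worst case is every device running while the device with the largest startup surge switches on, and it adds an arbitrary 25% safety margin. The `Load` class, the wattages, and the inrush factors are hypothetical values chosen for illustration; real figures come from equipment data sheets, and UPS ratings in volt-amperes additionally depend on power factor, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    running_watts: float         # steady-state draw
    inrush_factor: float = 1.0   # startup draw as a multiple of running draw (assumed)


def required_capacity_watts(loads, headroom=0.25):
    """Estimate the capacity a UPS or inverter needs for a set of loads.

    Assumes every load is running while the load with the largest startup
    surge switches on, then adds a safety margin (headroom).
    """
    running_total = sum(load.running_watts for load in loads)
    # Extra power drawn above steady state while a device starts.
    largest_surge = max(
        (load.running_watts * (load.inrush_factor - 1.0) for load in loads),
        default=0.0,
    )
    return (running_total + largest_surge) * (1.0 + headroom)


# Illustrative values only; not measurements from any site.
loads = [
    Load("server", 400),
    Load("laser printer", 300, inrush_factor=3.0),
    Load("freezer", 500, inrush_factor=4.0),
]
print(round(required_capacity_watts(loads)))  # 3375
```

In this example the steady-state load is only 1200 W, yet the startup surge of the freezer nearly triples the capacity that the power system must deliver, which is why sizing on running load alone leads to failures.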
Options for power generation at remote sites or even at core facilities now encompass solar, wind, and traditional generators (sizing and need for battery capacity during dark or windless periods are critical during initial evaluation). Required capacity, availability of local maintenance, and the expected run time are the basic evaluation criteria for generators. Most commonly, sites install standby generators, designed to operate for a few hours at a time to bridge infrequent and short interruptions. Installing a standby generator at a location that suffers long and/or repeated interruptions in power will result in frequent breakdowns and high maintenance. Continuous or primary generators, designed to deliver power for either constant or variable loads over long periods, are a more viable option.5
Adherence to electrical codes is a critical aspect of supporting modern information and communication technologies (and laboratory equipment) in remote locations. Failure to ground electrical supply grids, network wiring, and antennas leaves the equipment unprotected from power surges, yet ensuring correct grounding in locations with poor compliance with electrical codes can be difficult. Proper lightning protection is critical in locations where there is uncertainty about the utility or facility grounding because lightning surges can both damage sensitive electronic components and interrupt Internet connectivity by causing interference on network cabling.6
It is generally preferable to use locally supplied equipment because of ready access to local maintenance in the event of failure. However, this can mean committing to using equipment and devices that are only available in specific regions with uncertainty about quality and compatibility. For example, selecting touch screen devices for direct participant data entry in a trial in Brazil, Thailand, and South Africa involved using 3 different devices marketed and sold in local markets and entailed thorough testing of the interaction of each device with server/client-based data entry software, and resolving device driver and software incompatibility. The alternate approach, selecting standardized devices and equipment, may have avoided some software complexities but inherently would have required long-distance maintenance contracts on 3 continents and incurred the hurdle of importing equipment and ensuring compatibility with local electrical, Internet, and cellular systems.
Internet Connections and Service Providers
Rapid changes in the Internet access market throughout the world, including RLS in recent years, have dramatically expanded the available choices of Internet service providers. Most metropolitan areas have broadband Internet access, although uncertain Internet service provider technical competence and oversubscription can still cause uneven performance. Even in remote locations, there are now options for data and Internet from mobile phone providers that provide reliable, albeit expensive, Internet access. In addition, there are new satellite technologies available that face competition from terrestrial mobile networks and optical fiber providers, forcing improved performance and pricing.
Management of the local area networks is still the primary determinant of performance of an Internet connection. Fortunately, new products on the market are providing much simpler means of managing Internet connections. Cloud-managed routers and content filtering provide for easy network monitoring and the ability to prioritize the traffic for applications and Web sites used directly for research.7 The most common causes of network congestion are the use of multimedia and social networking and virus-infected computers that participate in spamming or botnet attacks on other Internet sites. The 2 most effective ways to preserve the integrity of a network are restricting administrative access (so that users cannot install software or make changes to the system) and disabling autoplay (so that USB flash drives and CDs do not automatically run programs upon insertion). For guidance on securing computers, the US National Institute of Standards and Technology maintains extensive checklists as part of the United States Government Configuration Baseline.8
Implementing and maintaining a server infrastructure in remote locations is both expensive and complicated. Unmanaged servers endanger the data, bandwidth, and the other devices on the network. Malware protection is essential for servers, desktops, laptops, and mobile devices: systems with inadequate antivirus protection and lack of connectivity to the Internet to download virus signatures for detection or system patches and updates present risks to the research data and to collaborating research partners.9 Furthermore, systems running pirated operating systems result in unpatched systems that become virus reservoirs.
All clinical data guidelines require the backup of data for emergency recovery. Annex B to ISO 15189, which specifies the requirements for quality and competence particular to medical laboratories, states, “Efficient back-up should be in place to prevent loss of patient result data in case of hardware or software failure.”10 Unfortunately, tape is still the best solution for backups. Although cloud systems are appealing because of seemingly ubiquitous access, restoring from cloud systems can be lengthy and difficult. Cloud-based systems can contend seriously for bandwidth in parts of the world where that is still a constraint. Disk-to-disk systems have problems with portability; it is difficult to send disk backup systems off-site to retain for disaster recovery. Thus, simple tape solutions remain the best option for data backup. Typically, the simplest default backup configurations are best, and vendor-specific hardware compression should be avoided.
Mobile Technologies and Electronic Data Capture
Mobile tools and electronic data capture (EDC) technology could be a method for conducting clinical field research that does not rely heavily on local IT infrastructure and its attendant challenges. An attraction of direct EDC is eliminating the need for dual data entry, ultimately leading to reduced cost and time conducting clinical research in any environment, not only in RLS. International regulatory bodies are still developing guidelines for directly collecting data electronically for clinical protocols. Monitoring and oversight of studies that use these technologies is also still developing. Although there are solutions available now that offer the ability to collect research data using tablets, laptops, and smart phones, not all of them are compliant with the appropriate regulatory framework.11
Electronic Data Capture
True EDC systems may virtually eliminate the need for paper source documents and provide real-time access to the data. In these protocols, capturing data directly to a remote database offers many advantages for field operations, researchers, and clinical staff in the community. However, a serious concern for local investigators and institutions may be the physical location of the “source” data. Traditional data capture, with paper case report forms (CRFs) or locally installed servers, provides an in-country/at-site “master” copy of the source data, even when electronic copies of the data in multisite studies are stored centrally for primary analysis.12 These are serious concerns with broad scope that require the attention of the international community. Two proposed methods for solving some of these problems are presented below. For protocols that capture the data to a remote server, reliable access to the Internet becomes critical, and good local IT infrastructure is even more important.
Recent advances in mobile technology and application development are facilitating data collection in remote locations with the ability to switch between “on-line” and “off-line” modes and reliable management of seamless data synchronization using a secure and compliant remote server (public or private “cloud”). Compliant data management is a function of the application selected to collect data on the mobile devices, and applications for mobile EDC need to reliably synchronize data collected offline, provide full audit trails to maintain regulatory compliance, and manage form version control.
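The 3 application requirements named above (offline synchronization, audit trails, and form version control) can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the design of any particular EDC product. The class `OfflineFormStore`, its field names, and the `upload` callback are hypothetical. A compliant system would also need authentication, encryption, and tamper-resistant storage, none of which is shown.

```python
import json
from datetime import datetime, timezone


class OfflineFormStore:
    """Hypothetical on-device store: records queue locally until a sync succeeds."""

    def __init__(self, form_version):
        self.form_version = form_version
        self.records = {}       # record_id -> {"data", "form_version", "synced"}
        self.audit_trail = []   # append-only list of who changed what and when

    def _log(self, user, record_id, action, detail):
        self.audit_trail.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "record_id": record_id,
            "action": action,
            "detail": detail,
        })

    def save(self, user, record_id, data):
        """Store a record locally, tagged with the form version used to collect it."""
        action = "update" if record_id in self.records else "create"
        self.records[record_id] = {
            "data": dict(data),
            "form_version": self.form_version,
            "synced": False,
        }
        self._log(user, record_id, action, json.dumps(data, sort_keys=True))

    def sync(self, user, upload):
        """Try to upload unsynced records; `upload` returns True on success."""
        sent = 0
        for record_id, record in self.records.items():
            if record["synced"]:
                continue
            if upload(record_id, record):
                record["synced"] = True
                self._log(user, record_id, "sync", "uploaded")
                sent += 1
        return sent


store = OfflineFormStore(form_version="v2")
store.save("collector01", "HH-0001", {"visit": 1, "consented": True})

# Off-line: the upload fails, so the record stays queued on the device.
print(store.sync("collector01", lambda rid, rec: False))  # 0
# On-line again: the queued record is sent and marked as synchronized.
print(store.sync("collector01", lambda rid, rec: True))   # 1
```

The point of the sketch is that a failed upload leaves the record queued rather than lost, and that every create, update, and sync event is logged with a user and timestamp.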
Device requirements for mobile data collection in RLS are quite daunting. Devices need to be rugged, have little attraction for theft or repurposing, be convenient for mobile use, and have long battery life and/or convenient recharging options. Laptops, tablets, and smartphones offer a range of screen sizes, data entry modalities, and portability to match study use requirements. If used outdoors, high-contrast screens are important. To minimize loss of data, capability for synchronization via cellular data networks is ideal. In studies collecting biological specimens, built-in barcode readers are invaluable to enable direct linking of specimens to participant records at the time of data collection. Final selection of the devices seems to offer 2 approaches: (1) select an expensive, durable device that is hardened for use in challenging environments and will suffer fewer hardware failures or breakage or (2) select a low-cost device that can be inexpensively replaced if the device is lost or damaged. Ensuring compatibility between device operating systems and software has proved challenging: in our experience, slight mismatches between operating system versions, EDC mobile software, and local service providers can render the devices mysteriously inoperable, a problem that is complicated by the addition of external devices, such as barcode readers, global positioning system (GPS) receivers, or fingerprint readers.
Furthermore, integration with a mobile device management solution is a critical part of any mobile EDC system or device. A mobile device management solution provides the data team with the means to control access to the device, maintain a user database for authentication to support audit trails, control use, update applications, apply patches, and disable the device remotely if it is stolen or lost.
Protecting the devices from virus and malware requires planning a management system that will provide user authentication for the devices, and blocking the data collectors or other users from installing unapproved applications that can have spyware, malware, or adware included or consume costly bandwidth and storage.13 Furthermore, mobile devices can become malware vectors (transporting viruses to tablets, phones, laptops, and computers), leading to compromised patient confidentiality or disabling the devices and delaying data collection.
An important tool available on mobile devices is the GPS, which can insert the location of the data collection directly into the electronic form. Mobile devices acquire their location using the satellite GPS (uses more battery power), their location relative to the local mobile towers (least accurate), or a hybrid of the two.14 A note of caution: storing the location of a participant's residence or place of work within the electronic database requires implementation of appropriate protections to avoid compromising confidentiality.
Managing Patient Identifiable Data and Source Data in Mobile Data Capture
Most investigators, data managers, and monitors are familiar with using the clinic record, patient identifier, locator information, and CRF as source documentation (Figure 1). However, in the mobile EDC environment, the first place where data are recorded is the electronic record in the clinical data management system (CDMS), which thus becomes the source document.15,16 This has 2 immediate challenges: (1) source documents often have patient identifiable data (PID) and (2) there are specific regulatory guidelines for handling of source documents/data.17 We describe 2 different approaches to this challenge. One solution, used in HPTN 071, is a single CDMS hosted at the software vendor with internal access controls that separate the source PID (with the identifiable information link to the patient key) from the research or CRF data. This internal barrier provides 2 distinct environments for the study data: one, with PID, accessible by those overseeing the fieldwork at the site, and the other, with no access to PID, accessible by the statistical and data management center. National Institute of Allergy and Infectious Diseases' Laboratory of Parasitic Diseases used an alternate approach in a tuberculosis–filariasis coinfection study in southern India, where the support team built a separate database from a commercial product for the PID information collected by the mobile application. This database at the Indian collaborator site provides a function similar to a hospital electronic health record system for the study. The anonymized research data (the CRF data) automatically uploads to the CDMS at a US data center. In addition to addressing the regulatory challenges, the latter solution also answers an objection often encountered in international clinical protocols, namely that the data should reside in the country of the participants.
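Both approaches rest on the same idea: the identifiable fields and the research fields of a captured record are stored separately and linked only by a key. The following minimal sketch shows that separation. It is not the implementation used in either study. The function `split_record`, the field names, and the choice of a random hexadecimal study key are assumptions made for illustration.

```python
import secrets

# Assumed set of identifiable fields; a real study defines this in its protocol.
PID_FIELDS = {"name", "address", "phone"}


def split_record(record, pid_fields=PID_FIELDS):
    """Split one captured record into a PID part and a CRF part.

    Both parts carry the same random study key. Only someone with access to
    the PID store can map the key back to a person.
    """
    study_key = secrets.token_hex(8)
    pid_part = {"study_key": study_key}
    crf_part = {"study_key": study_key}
    for field, value in record.items():
        target = pid_part if field in pid_fields else crf_part
        target[field] = value
    return pid_part, crf_part


captured = {
    "name": "Example Participant",   # fictitious value
    "address": "Plot 12",            # fictitious value
    "hiv_test_done": True,
    "visit": 1,
}
pid_part, crf_part = split_record(captured)
print(sorted(crf_part))  # ['hiv_test_done', 'study_key', 'visit']
```

In the first approach described above, both parts would live in one CDMS behind different access controls. In the second, the PID part would stay in the database at the collaborator site and only the CRF part would upload to the remote data center.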
The Division of AIDS, as a study sponsor, requires the following for storage of source documents in electronic format: “All versions of application software, operating systems, and software development tools involved in processing of data or records need to be available as long as data or records associated with these versions are required to be retained.”18 Many countries have guidelines for the handling of specimens and biological agents in clinical protocols, but there is less clarity regarding source clinical data. There are strict guidelines for using CRFs as source documents and their accessibility or storage at the site. How these guidelines will translate to mobile EDC is beyond the scope of this article. In some regions, the electronic source data collected in a clinical protocol may be the only existing health record of the participants, and the use and storage of this “document” may be important to both local health practitioners and institutional review boards (IRBs) or ethics committees.
An Example: Data Collection in a Large Community-Randomized HIV Prevention Trial
An HIV prevention trial planned for 21 communities in Zambia and South Africa (HPTN 071, the PopART study) proposed annual data collection in 52,500 households over a 6-month period. Mobile data collection was seen as the only viable solution, given the scope and environment of the study. Table 1 describes the process of implementing mobile data collection for a group who were novices in mobile data collection, although experienced in data management for clinical trials. Some features of note in this project include the following: (1) repeated visits to the same household member over a 3-year period led to a decision to implement fingerprint data capture as an aid for participant identification; (2) because collection and storage of biological specimens was critical to the protocol, a barcode reader was a device requirement; (3) accurate identification of randomly sampled households required GPS; and (4) real-time random selection among eligible household members was implemented locally on the device.
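Feature (4), random selection performed on the device, needs no network connection and can be sketched briefly. This is a hypothetical illustration, not the study's algorithm. The eligibility rule (age 18 years or older) and the function name `select_participant` are assumptions, and the study's actual eligibility criteria and selection procedure are not described here.

```python
import secrets


def select_participant(household_members, is_eligible):
    """Pick one eligible household member uniformly at random, on the device.

    Returns None when no member is eligible.
    """
    eligible = [member for member in household_members if is_eligible(member)]
    if not eligible:
        return None
    # secrets.choice draws from the operating system's random source.
    return secrets.choice(eligible)


# Fictitious household; the age threshold is an assumed rule for illustration.
household = [
    {"member_id": "01", "age": 44},
    {"member_id": "02", "age": 16},
    {"member_id": "03", "age": 29},
]
chosen = select_participant(household, lambda member: member["age"] >= 18)
print(chosen["member_id"])
```

Recording the list of eligible members together with the selected member in the audit trail would allow monitors to verify afterward that the selection was made among the correct candidates.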
Improving the Ability to Analyze Previously Collected Data
Over the past 20 years, sites have collected data for protocols and cohorts at remote field locations and entered the data using multiple database technologies, with separate database files for each round or protocol revision. This is how systems such as FoxPro, dBase, DB2, and FileMaker stored their data. A recent project integrated 14 rounds of cohort census data from multiple FoxPro files into a single SQL database. With each annual census, the survey questions and labels were changed, added, deleted, or reused for slightly different questions. The multiple files and inconsistent labeling made longitudinal analysis of the historic data very difficult because it required custom queries to run against each database and specialized knowledge of the evolution of the questionnaires and field labels. In modern EDC systems, the stronger data management protocols require a single database that dramatically improves current and future accessibility of data. The single SQL database that was formed as noted above met these requirements and has simplified the data analysis process.19
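The core of such an integration is a per-round mapping from each historical field label to one harmonized column. The following is a minimal sketch using an in-memory SQLite database. It does not reproduce the project described above. The label maps, column names, and sample rows are hypothetical, and reading the legacy FoxPro files is outside the scope of the sketch.

```python
import sqlite3

# Hypothetical per-round label maps: historical field label -> harmonized column.
LABEL_MAPS = {
    1: {"AGE": "age_years", "SEX": "sex"},
    2: {"AGEYRS": "age_years", "GENDER": "sex"},
}


def load_rounds(conn, rounds):
    """Load rows from several census rounds into one harmonized table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS census ("
        "census_round INTEGER, person_id TEXT, age_years INTEGER, sex TEXT, "
        "PRIMARY KEY (census_round, person_id))"
    )
    for round_no, rows in rounds.items():
        label_map = LABEL_MAPS[round_no]
        for row in rows:
            # Rename each known historical label to its harmonized column.
            harmonized = {
                label_map[label]: value
                for label, value in row.items()
                if label in label_map
            }
            conn.execute(
                "INSERT INTO census (census_round, person_id, age_years, sex) "
                "VALUES (?, ?, ?, ?)",
                (round_no, row["ID"], harmonized.get("age_years"), harmonized.get("sex")),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_rounds(conn, {
    1: [{"ID": "P001", "AGE": 34, "SEX": "F"}],
    2: [{"ID": "P001", "AGEYRS": 35, "GENDER": "F"}],
})
# One query now follows a person across rounds, whatever the original labels were.
history = conn.execute(
    "SELECT census_round, age_years FROM census WHERE person_id = ? "
    "ORDER BY census_round",
    ("P001",),
).fetchall()
print(history)  # [(1, 34), (2, 35)]
```

Keeping the label maps as data rather than embedding them in queries documents the evolution of the questionnaire in one place, which is the specialized knowledge that the separate files previously demanded of every analyst.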
Conducting clinical research in RLS is challenging. Digital devices and laboratory equipment rely on continuity of quality power supplied by well-designed and maintained sources, whether utility, local diesel generator, or renewable energy. The performance of these devices, such as tablets, mobile phones, and laptops, depends on the networks to which they connect, which in turn depend on IT systems that are updated and secure.
Emerging EDC and mobile technologies can make conducting clinical research easier and improve the quality and timeliness of data collection. These technologies promise faster access to validated data, although this depends critically on reliable access to the Internet and support for local IT infrastructure. Implementation of these new technologies creates new challenges for the protection of participant confidentiality and the regulatory frameworks. Finally, despite their promise, these technologies are not a panacea for regions with poor IT infrastructure and technical support: without a reliable communication infrastructure to store and access data, researchers may not be able to utilize EDC in research studies.
The authors thank Dr J. J. McGowan of the National Institute of Allergy and Infectious Diseases (NIAID) Office of Science Management and Operations and Dr Katherine Zoon of the Division of Intramural Research, respectively, for their backing of the international IT infrastructure support program at the NIAID.
2. Seymour J. The Seven Types of Power Problems. Schneider Electric; 2011:1–21. Available at: http://whitepapers.apc.com. Accessed November 19, 2013.
3. The Institute of Electrical and Electronics Engineers Inc. IEEE Recommended Practice for Powering and Grounding Electronic Equipment. New York, NY: The Institute of Electrical and Electronics Engineers, Inc; 2005:1–603.
4. Emadi A, Bekiarov SB. Uninterruptible power supplies: classification, operation, dynamics, and control. IEEE. 2002;1:7.
5. Siu SK, Lopopolo J. Compatibility, sizing, and design considerations for generators and UPSs in Tiers I, II, III, and IV topologies. IEEE Trans Ind Appl. 2011;47:2324–2329.
7. Rubens P. Cloud Wi-Fi could become key for network management. In: Enterprise Networking Planet. Foster City, CA: IT Business Edge; 2013.
8. The United States Government Configuration Baseline (USGCB). National Institute of Standards and Technology. Available at: http://usgcb.nist.gov. Accessed November 19, 2013.
9. Padmavathi GD, Divya S. A survey on various security threats and classification of malware attacks, vulnerabilities and detection techniques. Int J Computer Sci Appl. 2013;2:7.
10. Standardisations of ISO 15189 Medical Laboratories—Requirements for Quality and Competence. Geneva, Switzerland: International Organization for Standardization; 2012.
11. Food and Drug Administration. Draft Guidance: Electronic Source Data in Clinical Investigations. Rockville, MD: Food and Drug Administration; 2012:10.
13. Felt AP, Finifter M, Chin E, et al. A Survey of Mobile Malware in the Wild. In: Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices. Chicago, IL: ACM; 2011:3–14.
14. Singhal M, Shukla A. Implementation of location based services in Android using GPS and Web services. Int J Computer Sci Issues. 2012;9:237–242.
15. IWG, GIWGG. Reflection Paper on Expectations for Electronic Source Data and Data Transcribed to Electronic Data Collection Tools in Clinical Trials. London, United Kingdom: European Medicines Agency; 2007.
16. Ohmann C, Kuchinke W, Canham S, et al. Standard requirements for GCP-compliant data management in multinational clinical trials. Trials. 2011;12:85.
17. International Conference on Harmonisation of technical requirements for registration of pharmaceuticals for human use (ICH). Guideline for Good Clinical Practice. Note for Guidance on Good Clinical Practice. London, United Kingdom: European Medicines Agency; 2002:1–48.
18. DAIDS. Source Documentation Requirements [Requirements for Source Documentation in DAIDS Funded and/or Sponsored Clinical Trials]. Bethesda, MD: DHHS/NIH/NIAID/DAIDS/Office for Policy in Clinical Research Operations (OPCRO) 2007.
19. Beaulah SA, Correll MA, Munro RE, et al. Addressing informatics challenges in Translational Research with workflow technology. Drug Discov Today. 2008;13:771–777.