Advances in technology, including the rapidly growing field of genomics, are transforming clinical medicine. The term ‘genomics’ was coined in 1986 by Dr Thomas Roderick, a geneticist in Bar Harbor, Maine, and was initially intended to encompass the study and comparison of genomes of various species, including their evolution and relationships.1 Essentially, genomics involves the application of DNA sequencing and the subsequent analyses using in vitro experiments and bioinformatic approaches to study the structure and function of genes, both human and pathogen.
In recent decades, genomics has been used extensively in a research capacity to study infectious agents, with the development of high throughput ‘next-generation’ sequencing technologies allowing detailed large scale analyses of entire pathogen genomes. However, despite the perceived benefits of sequencing technology to support traditional methods in diagnostic microbiology, there has been limited application in clinical and public health laboratories in Australasia to date.
This review aims to examine applications of current technologies in diagnostic microbiology and to outline the added value and current limitations of genomics, and in particular, bacterial whole genome sequencing (WGS), in order to support microbiologists in future implementation and use of these new technologies in clinical and public health practice.
WGS: METHODS, SEQUENCING TECHNOLOGY AND DATA ANALYSIS
The evolution of sequencing technology
The Human Genome Project instigated a revolution in sequencing technologies resulting in the establishment of high-throughput WGS as an important tool for the study of organisms, both human and microbial. Initial technological advances focussed on enhancing the chain termination sequencing method published by Sanger et al. in 1977.2 These modifications included fluorescent labelling of molecules, development and utilisation of capillary-based instruments, and automation of these processes to allow analysis of multiple samples in parallel.3
As Sanger sequencing was limited to <1000 bases, the search for more efficient methods for sequencing long, complex pieces of DNA such as entire chromosomes, led to other approaches. Initially described in 1979, ‘shotgun sequencing’, where longer segments of DNA were randomly fragmented into smaller segments for Sanger sequencing, was an early step towards facilitating genome sequencing, but was slow and labour-intensive for an entire genome, requiring a map to assemble the sequenced fragments.4 With the parallel advancements in computation technology and software, this strategy evolved into ‘whole-genome shotgun sequencing’, which bypassed the need for a genetic map by using bacterial clones to produce a large amount of redundant sequence read data across the genome and utilising newer computation technology to assemble the sequence reads. This method resulted in the landmark sequencing of the Haemophilus influenzae genome,5 the first genome from a free-living organism to be sequenced, and was the most popular and advanced sequencing method until the late 2000s.6
More recently, the invention of high-throughput ‘next-generation’ sequencing technology, with relatively simple benchtop technology and efficient library preparation protocols, has significantly improved the capacity to perform low-cost, efficient WGS, and has made it a feasible tool to enhance clinical diagnostic investigations in near real-time. Next-generation processes generally involve parallel sequencing, producing vast quantities of data that require modern computation methods to assemble the sequence reads.
Figure 1 shows the typical workflow and application of next-generation sequencing that could be applied to clinical microbiology.
There are a number of commercialised next-generation sequencing methods in use and novel technologies emerging onto the market, each with advantages and disadvantages, which have been reviewed in detail previously,6–12 although several of these reviews are now outdated given the rapid growth in technology. While this review is not exhaustive, a summary of the most common current sequencing technologies is shown in Tables 1–3.
Sequencing options for clinical microbiology: what needs to be considered?
There are a number of important considerations in comparing sequencing platforms for clinical microbiology, and deciding whether to perform in-house sequencing or to out-source to an experienced sequencing service provider.
The cost of implementation, including equipment set-up, routine sequencing costs for reagents and consumables, and post-processing bioinformatics costs, is an obvious but significant factor. These expenses can be measured in cost per sequencing run, cost per organism genome sequenced, or cost per megabase of output data. To be a financially viable option for clinical microbiology laboratories, WGS must be able to replace current technologies (e.g., methods for molecular characterisation of pathogens such as pulsed field gel electrophoresis), or provide additional benefits in patient outcomes and clinical or laboratory efficiency.
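As a minimal sketch of how the three cost measures relate, the following Python function divides a single run cost by the number of genomes and by the output volume. The function name and all figures are hypothetical and chosen for illustration only; they are not taken from any sequencing platform.

```python
def cost_metrics(run_cost, genomes_in_run, output_megabases):
    """Express the cost of one sequencing run in the three units discussed above."""
    if genomes_in_run <= 0 or output_megabases <= 0:
        raise ValueError("genomes_in_run and output_megabases must be positive")
    return {
        "per_run": run_cost,
        "per_genome": run_cost / genomes_in_run,
        "per_megabase": run_cost / output_megabases,
    }


# Hypothetical figures: the same run cost spread over a full and a part-filled run.
full_run = cost_metrics(run_cost=2000.0, genomes_in_run=20, output_megabases=5000.0)
part_run = cost_metrics(run_cost=2000.0, genomes_in_run=5, output_megabases=5000.0)
print(full_run["per_genome"], part_run["per_genome"])
```

Under these assumed figures, the cost per genome rises when fewer isolates share a run, while the cost per megabase is unchanged.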
In-house versus outsourced
In-house sequencing may improve turnaround times for data generation and analyses; however, this requires significant investment in technology and data analysis expertise. Although outsourcing may result in longer turnaround times, it may improve overall time and cost efficiency of sequencing by pooling isolates from smaller laboratories with insufficient sample numbers to fill a standard sequencing run. However, clear communication between referrer and provider is paramount to ensure that the clinical questions to be answered with WGS are clear, and that the subsequent analysis is understood and verified by both parties.
Some available technologies allow sequencing a handful of bacterial genomes in a few hours, while others have capacity to sequence 50–100 bacterial genomes in a single run that may take between 1 and 3 days. Flexibility in sequencing throughput, without significant financial implications of cost per sample, should also be considered. A reference microbiology laboratory needs to be able to sequence a large collection of 50–100 samples for epidemiological purposes, but also have the ability to sequence a small number of strains of pathogens of public health concern urgently for a similar cost per sample.
Adaptability of the sequencing platform to upgrades and changing sequencing practices is another factor, with sequencing technology rapidly evolving. The capability of the sequencer to be used for human genome sequencing and for research groups may also allow sharing of resources in smaller centres with lower demand for microbial WGS.
The quality of a sequence result can be reported using a score to indicate the quality and accuracy of each nucleotide base call. For example, Illumina uses a Phred-score (see Appendix 1: Glossary, http://links.lww.com/PAT/A30), with a score of 20 (Q20) equating to 1 error every 100 bases, or a 1% error rate, while a score of 30 (Q30) indicates an error rate of 1 every 1000 bases (0.1%). Modern WGS methods aim to achieve a quality score of 30 across the genome, although sequencing for different purposes may have different targets.
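The relationship between a Phred score Q and the probability P of an incorrect base call is Q = -10 log10(P). The Python sketch below converts between the two and decodes a quality string, assuming the common Phred+33 ASCII encoding of FASTQ files; the function names are illustrative and not part of any particular toolkit.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]


for q in (20, 30):
    print(q, phred_to_error_probability(q))
```

With this definition, Q20 and Q30 correspond to error rates of 1% and 0.1%, matching the figures quoted above.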
Despite the differences between sequencing platforms, in experienced hands, the output from several of the established next-generation sequencers (Tables 2 and 3) appears to be sufficient for most current clinical applications.13 The potential advantages of long sequence reads for clinical microbiology are still being investigated,14 although in a research environment, long reads (>5000 bp) have helped overcome many of the limitations of short read data.15,16 Examples include resolution of tandem repeat units and insertion sequences, identification of smaller circularised sequences such as plasmids, and bridging contiguous sequence gaps that litter de novo assemblies from short reads to assist with genome closure. Although this may soon become the standard in bacterial genome sequencing, the advantages are offset by the lower throughput, and higher implementation and sequencing costs, with other sequencers producing output data of sufficient quality and resolution for clinical purposes.
Bioinformatic analysis of sequencing data
With the technological advances in generating large amounts of high quality sequencing data, the bottleneck in implementing whole genome sequencing for clinical purposes has shifted to the post-sequencing data analysis. The term ‘bioinformatics’ encompasses the handling and analysis of sequencing data, usually with the assistance of computer-based algorithms.
Although both ‘open source’ and commercially available bioinformatic programs/tools have been specifically developed for use in a clinical setting by clinicians with limited bioinformatics knowledge,17–20 many of these lack the ability to batch analytical processes on large datasets and customise automation of data analysis pipelines, as a trade-off for the ease of use via a graphical user interface (GUI). The majority of available bioinformatic software requires some knowledge of the text-based command-line of the UNIX or Linux operating systems, allowing custom programming scripts and pipelines to automate data manipulation and analysis in a single step. Table 4 shows examples of bioinformatics tools commonly employed for analysis of bacterial genomes.
In assessing bioinformatics software for analysis of WGS data for clinical microbiology, there are several considerations and criteria to take into account.
Although Linux-based tools will continue to predominate due to the ability and ease in customising analyses, tools that can be operated through a GUI may be preferred by those unfamiliar with bioinformatics.
Another key advantage of Linux-based tools, although often requiring some initial work to establish, is a ‘pipeline’ for specific types of analyses. These pipelines enable ‘batching’ or sequential running of a number of processes on multiple genomes with a single command, compared with running each component individually, before manually entering the next command.
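The idea of a pipeline can be sketched in a few lines of Python. In this illustrative example, every step is a placeholder function that only records a hypothetical output file name; in practice each step would invoke a real trimming, assembly or variant-calling tool. All function and file names are assumptions made for the sketch.

```python
def quality_trim(record):
    # Placeholder: a real step would run a read-trimming tool here.
    return {**record, "trimmed": True}


def assemble(record):
    # Placeholder: records a hypothetical assembly file name.
    return {**record, "assembly": record["sample"] + ".contigs.fasta"}


def call_variants(record):
    # Placeholder: records a hypothetical variant file name.
    return {**record, "variants": record["sample"] + ".vcf"}


PIPELINE = [quality_trim, assemble, call_variants]


def run_pipeline(sample_names, steps=PIPELINE):
    """Run every step, in order, on every sample with a single call."""
    results = []
    for name in sample_names:
        record = {"sample": name}
        for step in steps:
            record = step(record)
        results.append(record)
    return results


results = run_pipeline(["isolate_A", "isolate_B"])
for record in results:
    print(record)
```

A single call to `run_pipeline` replaces entering each command manually for each genome, which is the practical benefit of batching.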
In a clinical setting, obtaining a result quickly often takes priority over correcting minor inaccuracies in single nucleotide polymorphism (SNP) calls that do not change the overall result. Bioinformatic tools that can analyse multiple samples together, and that use the processing power of modern multi-core computers to split large complex processes into smaller processes running in parallel (a feature known as multithreading or parallel processing), exemplify the ‘many hands make light work’ proverb.
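A minimal sketch of splitting work across parallel workers is shown below, using the Python standard library. The per-sample task (GC content of a short sequence) is a stand-in for a real analysis, and the sample names and sequences are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence):
    """Fraction of G and C bases in a sequence (stand-in for a real analysis)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def analyse_in_parallel(samples, worker=gc_content, max_workers=4):
    """Apply `worker` to every sample concurrently; results keep the input order."""
    names = list(samples)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(worker, [samples[name] for name in names]))
    return dict(zip(names, values))


samples = {"isolate_A": "GGCC", "isolate_B": "ATAT", "isolate_C": "ATGC"}
print(analyse_in_parallel(samples))
```

In CPython, threads mainly help when each worker waits on an external program or on input/output; CPU-bound work written in pure Python would more typically use `ProcessPoolExecutor` from the same module.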
Accuracy and detail
It naturally follows that the accuracy of the analysis is important for clinical microbiology, particularly for organism identification, typing, and resistance detection. However, while research pursuits require accurate and detailed analyses, the additional resolution from this level of detail is not always required for clinical decisions. For example, in inferring phylogenetic relationships between organisms, Bayesian methods have become popular in estimating a phylogenetic tree. However, while substantially faster neighbour-joining methods may not produce as accurate an evolutionary tree, the resolution is likely to be sufficient and rapid enough for analysing a public health outbreak in real time, where the organisms involved are highly clonal.
Although there is a large amount of free publicly available software for bioinformatic analysis, these tools tend to be command-line based with low adaptability across different sequencing platforms. GUI-based software that can be used with relatively little experience is available, though often at a cost, both financially and in speed and occasionally detail. For example, the Applied Maths BioNumerics suite offers a wide range of tools for analysis at a cost of approximately AU$17 per isolate. Galaxy is a free, open source web-based platform for bioinformatics, but requires data uploading and sharing of public servers, which limits the speed of analyses.
Documentation and support
An advantage of commercial software is the availability of user manuals and professional support for troubleshooting. In contrast, while there is usually some documentation for use and limited support available from open-source software developers, many issues require local computing expertise for implementation and troubleshooting.
Public genome data
A number of public repositories of sequencing data are available, with published sequences available for download for comparative genomic analysis. The National Center for Biotechnology Information's (NCBI) GenBank database currently lists just over 3000 annotated complete genome assemblies in addition to 25,000 draft genome assemblies. The NCBI genome data are exchanged with the European Molecular Biology Laboratory's (EMBL) European Nucleotide Archive and the DNA Data Bank of Japan (DDBJ), which together form the International Nucleotide Sequence Database Collaboration. The Global Microbial Identifier initiative (http://www.globalmicrobialidentifier.org/) is another independent collaboration that aims to coordinate a microbial sequencing data collection and collate the collective genomic and metadata to facilitate subsequent analysis on a global scale, although the data are only available to collaborators.
PERSPECTIVES ON GENOMICS IN CLINICAL MICROBIOLOGY
We recently conducted a qualitative online survey of infectious diseases physicians, microbiologists and other professionals involved in the management of infectious diseases on attitudes towards bacterial whole genome sequencing in Australia and New Zealand. Of 102 respondents, 74% were either clinical microbiologists or infectious diseases physicians, with the remaining 26% either infectious diseases/microbiology trainees (23%) or research-based professionals (3%). Respondents were predominantly based in Victoria (34%) or New South Wales (25%), although a number represented Western Australia (9%) and New Zealand (8%).
Of respondents, 32% had some prior involvement with WGS, although only 24% reported local capacity to perform WGS. The Illumina MiSeq and the Ion Torrent PGM were the only sequencers used. Although subject to survey bias, all respondents indicated that they thought WGS would be useful in clinical microbiology in the next 5–10 years, primarily for epidemiological surveillance typing, clonality testing for outbreak investigation, and for detection of antimicrobial resistance. Due to concerns about the cost of implementing and conducting WGS, and the current lack of expertise in WGS and bioinformatics, most respondents thought WGS would be most likely used in reference laboratories, tertiary hospital laboratories and research laboratories. However, 83% thought that WGS would be used at least once per month in their laboratory over the next 5–10 years.
Although a qualitative study, this survey indicates a perceived utility of WGS in clinical and public health microbiology, with realistic anticipation that in the current economic climate, this will only be feasible in reference laboratories and large tertiary hospitals. As others have alluded to,21,22 this model, with a few peripheral nodes and a centralised hub for WGS, would help to facilitate national/international collaboration and standardisation.
WGS IN CLINICAL AND PUBLIC HEALTH MICROBIOLOGY: HOW CAN IT HELP?
Structural and functional genomics
One of the primary investigation tools in microbial research is the use of genomics to characterise an organism, including identification of the genetic elements that may result in pathogenicity, survival, or antimicrobial resistance. As with human genetics, microbial genomics has the capacity to interrogate organisms for key genetic markers that may influence treatment and prognosis of infections. Currently, there are four main potential applications of WGS for bacterial pathogen characterisation in the diagnostic microbiology laboratory: identification, typing, resistance detection, and virulence gene detection.
Previous studies have illustrated proof-of-concept applications using next-generation sequencing for bacterial identification.23–28 Given the current costs of sequencing, this is unlikely to surpass current methods such as matrix-assisted laser desorption ionisation-time of flight (MALDI-TOF) for routine identification of standard isolates. However, WGS may play a key role with organisms that are unable to be identified using routine methods. This includes organisms that often require methods such as 16S rDNA sequencing or specific nucleic acid probes to confirm identification, such as Nocardia and non-tuberculous mycobacteria, and organisms that are not usually cultured or cannot be readily cultured. A recent report of neuroleptospirosis diagnosed through next-generation sequencing where conventional tests were non-diagnostic highlighted this potential role in diagnostic microbiology.23
Typing of bacterial pathogens for epidemiological surveillance, infection control and outbreak investigation is a more obvious and immediate application of WGS. There are numerous traditional typing methods for several key organisms that are generally performed in centralised reference laboratories, although they are occasionally performed for a specific purpose in routine diagnostic laboratories. For example, surveillance typing of Listeria monocytogenes for outbreak monitoring has been previously performed by a number of methods, including serotyping, binary typing, ribotyping, multilocus variable number tandem repeat analysis (MLVA), pulsed-field gel electrophoresis (PFGE), and multi-locus sequence typing (MLST). Such diversity of methods with different resolution power is difficult to maintain and may hinder rather than enhance strain comparisons. Furthermore, typing is organism specific and requires constant validation. In contrast, WGS has the capacity to supersede traditional typing methods, through either in silico typing, or superior discriminatory capacity.29,30 For instance, MLST, which is traditionally performed by sequencing of a set of housekeeping genes, can be simulated by mapping WGS reads to the reference sequences of those genes,31 or using the Basic Local Alignment Search Tool (BLAST) to identify the alleles of the housekeeping genes.32 The role of WGS as a superior method to typing for epidemiological surveillance and outbreak investigation is described in the ‘Comparative genomics’ section below.
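The principle of in silico MLST can be illustrated with a toy Python sketch. Here exact substring matching on both strands stands in for read mapping or BLAST, and the two loci, the allele sequences, and the profile table are all hypothetical; real schemes typically use around seven housekeeping genes with allele sequences of several hundred bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def call_alleles(contigs, allele_db):
    """For each locus, report the allele number found exactly in any contig."""
    searchable = list(contigs) + [reverse_complement(c) for c in contigs]
    calls = {}
    for locus, alleles in allele_db.items():
        calls[locus] = None  # None means no known allele was found
        for number, allele_seq in alleles.items():
            if any(allele_seq in contig for contig in searchable):
                calls[locus] = number
                break
    return calls


def sequence_type(calls, profiles, loci):
    """Look up the sequence type for the allele combination, if it is known."""
    return profiles.get(tuple(calls[locus] for locus in loci))


# Hypothetical toy scheme with two loci and very short alleles.
LOCI = ["geneA", "geneB"]
ALLELE_DB = {
    "geneA": {1: "ATGGCA", 2: "ATGGTA"},
    "geneB": {1: "TTGACC", 2: "TTGTCC"},
}
PROFILES = {(1, 1): "ST-1", (2, 2): "ST-7"}

contigs = ["CCATGGTACC", "AAGGACAATT"]
calls = call_alleles(contigs, ALLELE_DB)
print(calls, sequence_type(calls, PROFILES, LOCI))
```

In this toy example the second locus is found only on the reverse strand of a contig, which is why both orientations are searched.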
There are also potential applications for WGS to assist with antimicrobial resistance detection. A few studies have attempted to validate the accuracy of WGS for predicting antimicrobial resistance, with reasonable concordance.20,33–35 Current analyses using WGS data can readily detect acquired resistance such as beta-lactamases and aminoglycoside modifying enzymes, although characteristic mutations in critical genes such as rpoB can also be detected with prediction of resistance phenotypes. However, these methods are currently unable to reliably predict some resistance mechanisms, for example, resistance resulting from derepression of ampD, ampR and other regulatory genes of AmpC hyperproduction, or vancomycin heteroresistance conferred by mutations in the complex regulatory system that includes graRS, vraSR, walKR, agr and rpoB. Although current susceptibility methods from organism culture are likely to be more rapid and reliable for routine testing, as with organism identification, WGS methods may be useful for slow-growing organisms, organisms that are unable to be cultured, or where phenotypic susceptibility testing is unreliable, e.g., clarithromycin susceptibility testing for Mycobacterium abscessus. For example, WGS was used to rapidly diagnose a case of extensively drug-resistant (XDR) Mycobacterium tuberculosis, reducing time to diagnosis from weeks to days, subsequently reducing exposure to ineffective drugs and minimising risk of de novo resistance.36
The other main potential use of WGS data for organism characterisation is detection of genetic markers of virulence, such as Panton-Valentine leukocidin (PVL) in Staphylococcus aureus, or Shiga toxin in Escherichia coli, although this still remains investigational due to the uncertainty in gene expression and significance of gene presence.37
Comparative genomics
The emergence of WGS as a universal replacement for traditional bacterial typing has unveiled its potential as a powerful tool for epidemiological surveillance of bacterial pathogens, one of the cornerstones of infection control. Although largely performed in research environments, several studies have illustrated the capabilities of WGS to describe the evolution and epidemiology of important infections.38–46 In an era of increasing antimicrobial resistance, mapping the epidemiology of such multidrug resistant infections to direct public health responses and antimicrobial prescribing practices is vital. In addition to tracking resistant organisms, WGS allows tracking of specific resistance mechanisms, including motifs on mobile genetic elements such as plasmids and elucidation of mechanisms of gene transfer.47,48 For example, Wright et al. demonstrated that patients can be colonised with multiple strains of Acinetobacter baumannii capable of interacting within the patient, and that movement of patients and staff between healthcare facilities contributes to strain mixing and diversification.49
There have been numerous studies reporting the use of WGS to inform hospital infection control responses to suspected pathogen transmission.50–58 In 2012, investigators from the US National Institutes of Health (NIH) used WGS to track a suspected outbreak of carbapenem-resistant Klebsiella pneumoniae, identifying a single patient as the source for three independent transmission events.58 Another example was the paradigm-shifting evidence from WGS that multidrug-resistant Mycobacterium abscessus subspecies massiliense was frequently transmitted between patients with cystic fibrosis, prompting reconsideration of infection control measures.50 The high discriminatory capacity of WGS has promoted it as the new gold-standard method for strain comparison, surpassing more traditional typing methods for inferring disease transmission and providing one of the strongest arguments for the use of WGS in clinical and public health microbiology laboratories.
Comparative genomic studies have also attempted to clarify transmission events and outbreak propagation. These methods relied upon established ‘molecular clocks’ to estimate the time to the most recent common ancestor and dates of presumed transmission events, using phylogenomic models.59 Some defined thresholds for the number of SNPs between independent isolates that are required to infer whether they are epidemiologically linked,43,60 although mutation and recombination rates vary between species and lineages,50 and the rates of microevolution of endemic clones may need to be defined in each context.
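A SNP threshold of this kind can be applied with a short Python sketch that counts differences between aligned sequences and lists pairs at or below a cut-off. The isolate names, sequences, and the threshold of 2 SNPs are hypothetical; as noted above, an appropriate threshold would need to be defined for each species and context.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count positions that differ, ignoring ambiguous bases and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def pairwise_snp_distances(alignment):
    """SNP distance for every pair of isolates in an alignment."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


def pairs_within_threshold(distances, threshold):
    """Pairs whose SNP distance is at or below the chosen threshold."""
    return sorted(pair for pair, d in distances.items() if d <= threshold)


# Hypothetical aligned sequences for three isolates.
alignment = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "TCGAACCTNC",
}
distances = pairwise_snp_distances(alignment)
print(distances)
print(pairs_within_threshold(distances, threshold=2))
```

The sketch treats a threshold purely as a filter on pairwise distances; it does not model mutation rates or recombination.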
Culture-independent identification and metagenomics
As alluded to above, WGS has been demonstrated to be a useful tool as a culture-independent method of bacterial identification, predominantly through metagenomic analyses. Metagenomics, which is yet to be implemented in routine diagnostics, involves sequencing all DNA content in a clinical sample, before using bioinformatic analyses to filter out human and non-pathogenic organism DNA to identify the causative agent. Due to the extensive depth of sequencing required for species identification, metagenomic investigations performed on low diversity sterile site samples are likely to produce a greater yield of results, in comparison to high diversity samples such as faeces. High quality samples with sufficient concentrations of genomic nucleic acid, such as tissue or fluid aspirates, are paramount for this application of WGS.
Previous methods including broad-range 16S rRNA PCR and sequencing have been used for diagnosis of culture-negative bacterial infections.61 However, these methods frequently had low sensitivity if insufficient pathogen DNA was present, and were affected by the presence of contaminating DNA from other bacterial species. Metagenomic analysis of WGS data from a clinical sample has the capacity to overcome these limitations by filtering out unwanted DNA in the post-sequencing analysis. Sensitivity is also potentially greater, as organisms can be identified from a number of different segments of DNA, rather than a specific target segment which may have been altered or fragmented in the pre-testing process.
Aside from research studies on the human microbiome, the other potential application of metagenomics is in novel pathogen discovery. Although it has been successful for identification of some pathogens,62,63 further testing is required to confirm the validity of novel genomes discovered by next-generation sequencing.64
Overall, metagenomic investigations remain experimental, as sequencing technology and bioinformatic software to process and analyse metagenomic data are only just emerging.
CONSIDERATIONS FOR IMPLEMENTATION
Although WGS appears promising as an addition to the armoury of tests that are currently used in clinical and public health microbiology, it is yet to be widely implemented. With the significant improvements in cost and ease of sequencing, it is likely that WGS will supersede other molecular technologies including PFGE, MLST, DNA microarray and 16S rDNA sequencing in the near future, if not already. However, until costs become negligible, it is unlikely to be adopted for routine bacterial investigation over standard microscopy and culture, MALDI-TOF identification, and phenotypic antimicrobial susceptibility testing. Currently, the most immediately feasible applications are typing, epidemiological surveillance, and outbreak investigation to inform infection control procedures, most applicable to public health and tertiary hospital laboratories, although this would need to be matched by appropriate bioinformatic expertise. This situation is dynamic, and likely to change as sequencing technology evolves further and knowledge of bioinformatics develops.
Limitations of WGS
There are limitations of WGS that should be recognised. At present, the majority of analyses are based upon single nucleotide variants or SNPs identified from comparisons to a reference genome sequence. Consequently, analyses are dependent on the quality of sequencing and genome assembly, as well as the quality and selection of the reference genome. As current comparative analyses based on SNPs selectively exclude a significant proportion of phylogenetic data, some bioinformaticians have suggested conducting phylogenetic analyses based on all loci in a genome, rather than limiting the analysis to SNPs.65 However, the significant requirements in computation resources and time would render such analyses unusable in a clinical environment. In contrast to research applications, comparative genomic methods for clinical purposes should aim to utilise more time efficient estimations of phylogenetic relationships that may not be the most accurate approximation, but are of sufficient resolution and accuracy to inform clinical and public health decisions.
Although WGS data can be used to provide detailed genomic information, this does not necessarily translate into knowledge of gene expression and transcription. For example, the presence of lukSF-PV does not necessarily equate to PVL production and/or clinically aggressive Staphylococcus aureus infection. Detection of post-transcription RNA with next-generation sequencing can be performed and may help detect gene expression or enzyme hyperproduction (e.g., BlaZ or AmpC beta-lactamases), but requires a separate RNA sequencing run and is unlikely to replace current phenotypic screening methods.
A major limitation of WGS in clinical laboratories is the lack of validation and utility comparisons in clinical studies. Although proof-of-concept studies are frequently published, sequencing methods and data analyses for clinical projects have been customised around selected pathogens. The development and standardised evaluation of WGS pipelines for clinical and public health laboratories would guide further widespread implementation and add much needed evidence to an emerging field.
Quality control and standardisation
As with all tests that are incorporated into diagnostic and public health laboratory workflows, there is a need for a rigorous quality control process and standardisation of testing. Such measures have not yet been established, and benchmarks for quality control are yet to be determined. What should be the standard acceptable run quality? How should the quality of a genome assembly be assessed? Based on Illumina sequencing, we would recommend a quality score of >30 across the genome with a minimum depth of 30–50-fold (i.e., an average of 30–50 overlapping reads at any particular locus) to produce adequate quality sequence for clinical comparative genomics. Although these have not been validated as the optimal target sequencing metrics, they are consistent with the recommendations recently proposed by the Australian Public Health Laboratory Network.66 We currently interrogate our read quality and genome assemblies manually, though this is an arduous process for the large number of sequences that might arise from an outbreak, and a standardised automated quality control process is required. Figure 2 summarises some of the key considerations when assessing quality of WGS analyses.
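As an illustration of how such targets might be checked automatically, the Python sketch below estimates mean depth as total sequenced bases divided by genome size, and compares it with the suggested thresholds. The read count, read length, and genome size are hypothetical, and a single mean quality value is a simplifying assumption in place of per-base quality across the genome.

```python
def mean_depth(read_count, read_length, genome_size):
    """Estimated average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * read_length / genome_size


def passes_qc(depth, mean_quality, min_depth=30.0, quality_above=30.0):
    """Check a run against a minimum depth and a quality score above a cut-off."""
    return depth >= min_depth and mean_quality > quality_above


# Hypothetical run: one million 150 bp reads from a 5 Mb bacterial genome.
depth = mean_depth(read_count=1_000_000, read_length=150, genome_size=5_000_000)
print(depth, passes_qc(depth, mean_quality=32.5))
```

The default thresholds mirror the figures recommended above and could be changed for sequencing performed for other purposes.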
National and international standardisation is also required, particularly with comparative analyses. Decisions about which reference genome is selected, or whether a composite reference genome is used, or which typing method to correlate results with each organism, need to be made at an overarching level and accepted across sequencing sites nationally. However, inter-site comparative analyses based on core genome SNPs may also vary depending on the selection of test isolates. To address this, the Global Microbial Identifier (GMI) initiative requires individual sequencing sites to upload data to a centralised server, which updates real-time phylogenetic analyses with the addition of new strains. An alternative solution has been to use a globally standardised set of core genome housekeeping genes to perform MLST, or whole-genome MLST (wgMLST). This analysis involves typing based on approximately 2000 core genes for each species. At present, both wgMLST and GMI are internationally-driven subscription based services, and are yet to be globally accepted.
However, these methods can create ambiguity when inferring direct pathogen transmission. Although wgMLST will likely be useful as a high resolution typing tool, it excludes a large amount of potentially informative genomic information, such as phages, insertion sequences, and other mobile genetic elements that may indicate direct transmission. A centralised server that includes all strains and a constant reference genome in comparative analyses can generate a phylogenetic signal, but will frequently lack the resolution to infer transmission. As there is frequently substantial genetic diversity within a single species group, the number of core genes that are common to all strains within that group diminishes as the number of strains increases. Conversely, core genome analysis of a clonal subpopulation of that species group can involve many more common genes, resulting in amplification of the number of loci compared, and frequently the number of SNPs identified as well. Thus, for identification of outbreak transmission, a reference genome should ideally be as similar to the outbreak strains as possible, if not part of the outbreak, and comparative genomic analyses should exclude outlying taxa that are not suspected to be part of the outbreak.
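The effect of strain selection on the core genome can be sketched as a set intersection in Python. The strain names and gene identifiers below are hypothetical and far smaller than any real gene content.

```python
def core_genes(gene_sets):
    """Genes present in every strain of the comparison."""
    sets = [set(genes) for genes in gene_sets.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical gene content: two clonal outbreak isolates and one distant strain.
strains = {
    "outbreak_1": {"g1", "g2", "g3", "g4", "g5"},
    "outbreak_2": {"g1", "g2", "g3", "g4", "g6"},
    "distant": {"g1", "g2", "g7"},
}

clonal_only = {name: strains[name] for name in ("outbreak_1", "outbreak_2")}
print(len(core_genes(clonal_only)), len(core_genes(strains)))
```

In this toy example, adding the distant strain halves the number of core genes available for comparison, which is the reason for excluding outlying taxa when investigating a suspected outbreak.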
A further consideration is the level of similarity at which two isolates are considered to be ‘identical’ or ‘related’ for inferring disease transmission. In theory, immediate sampling of an isolate that has been passed directly from Host A to Host B should result in identical whole genome sequences with no SNPs identifiable when the strains are directly compared to one another. However, this rarely occurs, due to the background mutation rate of organisms over time and the possible influences of sampling, storage and sequencing. Hence, it is difficult to pre-specify a threshold that distinguishes an outbreak strain from one that is not. Within-host microbial genomic diversity has also been demonstrated for a number of pathogens including Klebsiella pneumoniae,58 Pseudomonas aeruginosa,67 Burkholderia dolosa,68 and Staphylococcus aureus. In particular, studies of S. aureus have demonstrated genetic diversity of up to 40 SNPs between nasal isolates of the same spa-type and MLST group from a single patient,69 as well as nasal carriage of multiple strain types.70 Consequently, although appealing, establishing a fixed threshold of genetic divergence for an outbreak definition is difficult, and while comparative analyses of genomic data can support epidemiological investigations, they are not definitive.
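For readers unfamiliar with how SNP differences between isolates are counted, the following Python sketch computes pairwise SNP distances from an alignment of equal-length sequences. The isolate names and sequences are invented, and the choice to skip positions containing ambiguous bases (such as N) is an assumption of this example rather than a fixed convention.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both sequences have an unambiguous, different base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return the SNP distance for every pair of isolates, keyed by sorted name pair."""
    return {
        (n1, n2): snp_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Hypothetical 10 bp alignment of variable sites from four isolates.
alignment = {
    "patient_A": "ACGTACGTAC",
    "patient_B": "ACGTACGTAC",
    "patient_C": "ACGTTCGTAN",
    "patient_D": "TCGAACGTCC",
}

distances = pairwise_distances(alignment)
for pair, d in distances.items():
    print(pair, d)
```

The resulting matrix is descriptive only: as discussed above, within-host diversity means that no single cut-off applied to these distances reliably separates transmission from non-transmission.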
Standardisation of data reporting is adopted to a certain degree for other microbiological tests. However, with the detail and complexity of genomic information, there are likely to be a limited number of clinicians who are able to fully comprehend and interpret all the data from a detailed analysis. Analogous to continuous data variables, results may not be able to be conveyed as a simple dichotomous ‘yes’ or ‘no’ report. For example, in an outbreak analysis, two isolates that differed by 100 SNPs might not be considered related if other clustered isolates differed by 10 SNPs. However, if 90 of the 100 SNPs were co-located within a small genomic segment suggestive of a recombination event, this may still represent evidence of transmission. Lengthy technical descriptions of possible interpretations are unhelpful, and there is a need to develop standardised plain language reports; however, there are no studies to guide reporting methods.
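The recombination example above can be made concrete with a simple sliding-window count of SNP positions. The sketch below is a naive illustration, not a substitute for dedicated recombination-detection software; the genome coordinates, the 5 kb window and the SNP positions are all hypothetical.

```python
from typing import List, Tuple


def densest_window(positions: List[int], window: int) -> Tuple[int, int]:
    """Return (start_position, snp_count) for the `window`-bp stretch holding the most SNPs."""
    pos = sorted(positions)
    best_start, best_count = (pos[0] if pos else 0), 0
    left = 0
    for right in range(len(pos)):
        # Shrink from the left until the span between the two SNPs fits in the window.
        while pos[right] - pos[left] >= window:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_start, best_count = pos[left], count
    return best_start, best_count


# Hypothetical pair of isolates differing by 100 SNPs:
# 10 scattered across the chromosome and 90 packed into ~4.5 kb.
scattered = [i * 300_000 + 7 for i in range(10)]
clustered = [1_650_000 + i * 50 for i in range(90)]
snp_positions = scattered + clustered

start, clustered_count = densest_window(snp_positions, window=5_000)
remaining = len(snp_positions) - clustered_count

print(f"Densest 5 kb window starts at {start} and holds {clustered_count} SNPs")
print(f"SNPs outside that window: {remaining}")
```

Here 90 of the 100 differences fall in one window, leaving 10 elsewhere, which is the situation in which a raw count of 100 SNPs would understate the relatedness of the two isolates.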
Resource and infrastructure requirements
The relative financial costs of WGS are discussed above. Additional costs for other equipment, such as a high-end fluorometer for assessing DNA quantity and a bioanalyser for assessing DNA quality, should be factored in. Fully automated systems covering DNA extraction, sample and library preparation, and sequencing have been marketed, although they are rare. These systems were developed to maximise the efficiency and minimise the costs of sequencing, but are geared more towards genomics reference centres.
The post-sequencing bioinformatic costs can also be significant. The emergence and accessibility of next-generation sequencing has resulted in an exponential increase in the amount of data generated from sequencing. The Illumina MiSeq is able to generate up to 15 gigabytes of raw data every 3 days before post-sequencing processing, while the HiSeq can generate up to one terabyte of data every week in high output mode. Although some clinical microbiology laboratories have previously invested in capacity for organism storage, physical sequencing data storage with backup needs to be considered with the implementation of WGS. Cloud-based options have been proposed for both workflow and data storage; however, these may be limited by bandwidth and data transfer capabilities. Handling of data confidentiality, security and integrity for these options also needs to be verified, though storage of sequencing data would seem easier than frozen organism storage.
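A rough sense of the storage implied by these output figures can be obtained with simple arithmetic. The Python sketch below annualises the quoted maximum outputs; it assumes continuous back-to-back runs, decimal units (1 terabyte = 1000 gigabytes) and a single backup copy, none of which is specified in the text, so the results are upper-bound illustrations only.

```python
def annual_raw_data_gb(
    gb_per_run: float,
    days_per_run: float,
    backup_copies: int = 1,
    days_per_year: int = 365,
) -> float:
    """Estimate raw data stored per year, including backup copies, in gigabytes."""
    runs_per_year = days_per_year / days_per_run
    return gb_per_run * runs_per_year * (1 + backup_copies)


# Figures quoted in the text: MiSeq up to 15 GB per 3 days, HiSeq up to 1 TB per week.
miseq_gb = annual_raw_data_gb(gb_per_run=15, days_per_run=3, backup_copies=0)
hiseq_gb = annual_raw_data_gb(gb_per_run=1000, days_per_run=7, backup_copies=0)

# Assumption: one full backup copy is kept alongside the primary data.
miseq_with_backup_gb = annual_raw_data_gb(gb_per_run=15, days_per_run=3, backup_copies=1)

print(f"MiSeq, no backup:  {miseq_gb / 1000:.1f} TB per year")
print(f"HiSeq, no backup:  {hiseq_gb / 1000:.1f} TB per year")
print(f"MiSeq, one backup: {miseq_with_backup_gb / 1000:.1f} TB per year")
```

These estimates cover raw output only; intermediate files produced during assembly and alignment would add to the total.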
Although many WGS analyses can be theoretically performed on standard desktop computers, the computational power required to process and analyse more than 50 genomes in a clinically actionable timeframe is considerable. The optimal specifications of this technology are beyond the scope of this review, and require local expertise.
Comparisons with human genomics
As with bacteria, human DNA has also been sequenced in research and clinical settings. While some parallels can be drawn between human and microbial genomics, there are a number of differences. The human genome is over 3 billion base pairs in size, a thousand times the size of the average bacterial genome. Somatic human cells are diploid with 23 pairs of chromosomes, while the majority of bacterial genomes, if not haploid in genome content, behave in a haploid manner. Several bacteria are polyploid organisms, e.g., Neisseria gonorrhoeae, although the significance of this has not been determined.
Clinical human genomics has focussed on identifying defined functional genetic mutations that result in disease. With the size of the human genome, high throughput next-generation sequencing has been used to perform targeted capture sequencing of exomes (the collective protein-coding regions of the genome) on large numbers of samples,71 replacing previous DNA microarrays. Rapidly declining costs of sequencing and improvements in sequencing technology have resulted in greater utilisation of whole genome sequencing, providing more comprehensive human genome data.
With the significant number of short reads required to assemble the human genome, human WGS is more resource intensive. For example, on Illumina's NextSeq 500 in high output mode, a single run can theoretically sequence the entire genomes of up to 500 bacteria with an average genome size of 3 Mbp with 30× coverage. Alternatively, a single human genome or up to 10 exomes can be sequenced with similar metrics.
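The sequencing yield behind this comparison follows from multiplying genome size by coverage depth and sample number. The Python sketch below works through that arithmetic using the figures given in the text; the human genome size is rounded to 3 billion base pairs, and instrument output limits are deliberately not modelled.

```python
def bases_required(genome_size_bp: int, coverage: int, n_samples: int = 1) -> int:
    """Total bases that must be sequenced to reach the given mean coverage for all samples."""
    return genome_size_bp * coverage * n_samples


# 500 bacterial genomes of 3 Mbp each at 30x coverage (figures from the text).
bacterial_run_bp = bases_required(genome_size_bp=3_000_000, coverage=30, n_samples=500)

# One human genome at 30x coverage (size rounded to 3 Gbp for illustration).
human_run_bp = bases_required(genome_size_bp=3_000_000_000, coverage=30, n_samples=1)

print(f"500 bacterial genomes: {bacterial_run_bp / 1e9:.0f} Gbp")
print(f"1 human genome:        {human_run_bp / 1e9:.0f} Gbp")
```

Under these simplified assumptions, a single human genome requires sequencing output of the same order of magnitude as several hundred bacterial genomes.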
There are also differences in post-sequencing bioinformatics. Human genomics largely involves searching for defined mutations in specific segments of DNA, although other analyses such as genome wide association studies (GWAS) have gained popularity to search for genetic markers of disease. Assuming the most significant mutations occur in coding regions, attention can be focussed on exome analysis. Adequate coverage over these regions to ensure high sequence accuracy is critical. In contrast, SNPs in non-coding regions can still be informative to comparative bacterial phylogenomics. Detection of small indels is essential to identify mutations in the human genome, but is not always required for bacterial outbreak investigation. While detection of recombination in bacterial genomes is important to identify horizontal genetic exchange and improve phylogenetic signal, such analyses are more of interest in the evolution of the human genome, rather than informing clinical human genetics.
Given these important differences, it follows that different expertise is required for human and bacterial genomics, and although there are similarities that may allow some sharing of certain resources such as a sequencing platform and reagents, each requires a different sequencing and bioinformatics configuration.
Whole genome sequencing has undisputed applications in research to enhance our understanding of numerous facets of infectious diseases and microbiology. Research into these aspects, including pathogen evolution, epidemiology and virulence determinants, and development and spread of antimicrobial resistance mechanisms, indirectly influences microbiology and clinical infectious diseases practices, and has the ultimate goal of improving patient care. Genomics is also increasingly being used in identifying potential drug and vaccine targets, and the increasing use of metagenomic analyses is starting to build our understanding of microbial ecosystems including the human microbiome.
There are still limitations that hinder widespread implementation of WGS in clinical and public health microbiology as a test performed in real-time to directly inform clinical practices. Even with rapidly improving sequencing efficiency, WGS is unlikely to surpass current methods for routine bacterial identification and antimicrobial susceptibility testing in the near future. It would seemingly have an expanding role in public health, reference and infection control laboratories for detailed isolate characterisation, outbreak investigation, and detection of disease transmission. As sequencing becomes more widely available and utilised, more routine use in diagnostic laboratories for pathogen identification in culture-negative samples, and metagenomic investigation of polymicrobial samples for pathogen and ‘resistome’ identification may be adopted.
It is clear that user-friendly bioinformatic pipelines, together with more widespread bioinformatics expertise, are key to facilitating broader use of WGS. Until this bottleneck is overcome, the most immediate implementation strategy is for centralised state reference laboratories to perform WGS and data analysis, with peripheral centres outsourcing to the reference laboratories as required. As uptake of WGS improves and costs decline, a more powerful epidemiological surveillance system could be established with sequencing performed at peripheral nodes, with bioinformatic analysis and oversight of the sequencing at the centralised reference centre. In an era of increasing drug resistance globally, ease of international travel, and little investment into antimicrobial drug development, utilising the few but powerful tools we have available to monitor and curb the spread of infectious diseases is paramount.
JCK is supported by a postgraduate scholarship from the National Health and Medical Research Council (NHMRC), Australia (APP1074824). BPH is supported by a NHMRC fellowship (APP1023526).
Conflicts of interest and sources of funding: The authors state that there are no conflicts of interest to disclose.
1. Kuska B. Beer, Bethesda, and biology: how “genomics” came into being. J Natl Cancer Inst
2. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA
3. Green ED. Strategies for the systematic sequencing of complex genomes. Nat Rev Genet
4. Staden R. A strategy of DNA sequencing employing computer programs. Nucleic Acids Res
5. Fleischmann RD, Adams MD, White O, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae
6. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet
7. Loman NJ, Misra RV, Dallman TJ, et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol
8. Loman NJ, Constantinidou C, Chan JZ, et al. High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nat Rev Microbiol
9. Junemann S, Sedlazeck FJ, Prior K, et al. Updating benchtop sequencing performance comparison. Nat Biotechnol
10. Miyamoto M, Motooka D, Gotoh K, et al. Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes. BMC Genomics
11. Quail MA, Smith M, Coupland P, et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics
12. Liu L, Li Y, Li S, et al. Comparison of next-generation sequencing systems. J Biomed Biotechnol
13. Harris SR, Torok ME, Cartwright EJ, et al. Read and assembly metrics inconsequential for clinical utility of whole-genome sequencing in mapping outbreaks. Nat Biotechnol
14. Chin CS, Sorenson J, Harris JB, et al. The origin of the Haitian cholera outbreak strain. N Engl J Med
15. McCoy RC, Taylor RW, Blauwkamp TA, et al. Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PLoS One
16. Conlan S, Thomas PJ, Deming C, et al. Single-molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae. Sci Transl Med
17. Blankenberg D, Coraor N, Von Kuster G, et al. Integrating diverse databases into an unified analysis framework: a Galaxy approach. Database 2011; 2011: bar011.
18. Lazarus R, Kaspi A, Ziemann M, et al. Creating reusable tools from scripts: the Galaxy Tool Factory. Bioinformatics
19. Gupta SK, Padmanabhan BR, Diene SM, et al. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. Antimicrob Agents Chemother
20. Zankari E, Hasman H, Cosentino S, et al. Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother
21. Köser CU, Ellington MJ, Cartwright EJP, et al. Routine use of microbial whole genome sequencing in diagnostic and public health microbiology. PLoS Pathog
22. Fricke WF, Rasko DA. Bacterial genome sequencing in the clinic: bioinformatic challenges and solutions. Nat Rev Genet
23. Wilson MR, Naccache SN, Samayoa E, et al. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N Engl J Med
24. Hasman H, Saputra D, Sicheritz-Ponten T, et al. Rapid whole-genome sequencing for detection and characterization of microorganisms directly from clinical samples. J Clin Microbiol
25. Loman NJ, Constantinidou C, Christner M, et al. A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4. J Am Med Assoc
26. Larsen MV, Cosentino S, Lukjancenko O, et al. Benchmarking of methods for genomic taxonomy. J Clin Microbiol
27. Naccache SN, Federman S, Veeraraghavan N, et al. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res
28. Byrd AL, Perez-Rogers JF, Manimaran S, et al. Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data. BMC Bioinformatics
29. Bartels MD, Petersen A, Worning P, et al. Comparing whole-genome sequencing with Sanger sequencing for spa typing of methicillin-resistant Staphylococcus aureus. J Clin Microbiol
30. Athey TB, Teatero S, Li A, et al. Deriving group A Streptococcus typing information from short-read whole-genome sequencing data. J Clin Microbiol
31. Inouye M, Conway TC, Zobel J, et al. Short read sequence typing (SRST): multi-locus sequence types from short reads. BMC Genomics
32. Larsen MV, Cosentino S, Rasmussen S, et al. Multilocus sequence typing of total-genome-sequenced bacteria. J Clin Microbiol
33. Stoesser N, Batty EM, Eyre DW, et al. Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella pneumoniae isolates using whole genomic sequence data. J Antimicrob Chemother
34. Gordon NC, Price JR, Cole K, et al. Prediction of Staphylococcus aureus antimicrobial resistance by whole-genome sequencing. J Clin Microbiol
35. Zankari E. Comparison of the web tools ARG-ANNOT and ResFinder for detection of resistance genes in bacteria. Antimicrob Agents Chemother
36. Koser CU, Bryant JM, Becq J, et al. Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis. N Engl J Med
37. Knobloch JK, Niemann S, Kohl TA, et al. Whole-genome sequencing for risk assessment of long-term Shiga toxin-producing Escherichia coli. Emerg Infect Dis
38. Leopold SR, Goering RV, Witten A, et al. Bacterial whole-genome sequencing revisited: portable, scalable, and standardized analysis for typing and detection of virulence and antibiotic resistance genes. J Clin Microbiol
39. Howden BP, Holt KE, Lam MM, et al. Genomic insights to control the emergence of vancomycin-resistant enterococci. mBio
40. Petty NK, Ben Zakour NL, Stanton-Cook M, et al. Global dissemination of a multidrug resistant Escherichia coli clone. Proc Natl Acad Sci USA
41. Grad YH, Kirkcaldy RD, Trees D, et al. Genomic epidemiology of Neisseria gonorrhoeae with reduced susceptibility to cefixime in the USA: a retrospective observational study. Lancet Infect Dis
42. Chua KYL, Seemann T, Harrison PF, et al. The dominant Australian community-acquired methicillin-resistant Staphylococcus aureus clone ST93-IV [2B] is highly virulent and genetically distinct. PLoS One
43. Eyre DW, Cule ML, Wilson DJ, et al. Diverse sources of C. difficile infection identified on whole-genome sequencing. N Engl J Med
44. Gire SK, Goba A, Andersen KG, et al. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science
45. Johnson JR, Tchesnokova V, Johnston B, et al. Abrupt emergence of a single dominant multidrug-resistant strain of Escherichia coli. J Infect Dis
46. Stinear TP, Holt KE, Chua K, et al. Adaptive change inferred from genomic population analysis of the ST93 epidemic clone of community-associated methicillin-resistant Staphylococcus aureus. Genome Biol Evol
47. Sivertsen A, Billstrom H, Melefors O, et al. A multicentre hospital outbreak in Sweden caused by introduction of a vanB2 transposon into a stably maintained pRUM-plasmid in an Enterococcus faecium ST192 clone. PLoS One
48. Stoesser N, Giess A, Batty EM, et al. Genome sequencing of an extended series of NDM-Klebsiella pneumoniae neonatal infections in a Nepali hospital characterizes the extent of community versus hospital-associated transmission in an endemic setting. Antimicrob Agents Chemother
49. Wright MS, Haft DH, Harkins DM, et al. New insights into dissemination and variation of the health care-associated pathogen Acinetobacter baumannii from genomic analysis. mBio
50. Bryant JM, Grogono DM, Greaves D, et al. Whole-genome sequencing to identify transmission of Mycobacterium abscessus between patients with cystic fibrosis: a retrospective cohort study. Lancet
51. Harris SR, Cartwright EJ, Torok ME, et al. Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study. Lancet Infect Dis
52. Long SW, Beres SB, Olsen RJ, et al. Absence of patient-to-patient intrahospital transmission of Staphylococcus aureus as determined by whole-genome sequencing
53. Price JR, Golubchik T, Cole K, et al. Whole-genome sequencing shows that patient-to-patient transmission rarely accounts for acquisition of Staphylococcus aureus in an intensive care unit. Clin Infect Dis
54. Epson EE, Pisney LM, Wendt JM, et al. Carbapenem-resistant Klebsiella pneumoniae producing New Delhi metallo-β-lactamase at an acute care hospital, Colorado, 2012. Infect Control Hosp Epidemiol
55. Wendt JM, Kaul D, Limbago BM, et al. Transmission of methicillin-resistant Staphylococcus aureus infection through solid organ transplantation: confirmation via whole genome sequencing. Am J Transplant
56. Sherry NL, Porter JL, Seemann T, et al. Outbreak investigation using high-throughput genome sequencing within a diagnostic microbiology laboratory. J Clin Microbiol
57. Reuter S, Harrison TG, Koser CU, et al. A pilot study of rapid whole-genome sequencing for the investigation of a Legionella outbreak. BMJ Open
58. Snitkin ES, Zelazny AM, Thomas PJ, et al. Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing. Sci Transl Med
59. Lindsay JA. Evolution of Staphylococcus aureus and MRSA during outbreaks. Infect Genet Evol
60. Walker TM, Lalor MK, Broda A, et al. Assessment of Mycobacterium tuberculosis transmission in Oxfordshire, UK, 2007-12, with whole pathogen genome sequences: an observational study. Lancet Respir Med
61. Rampini SK, Bloemberg GV, Keller PM, et al. Broad-range 16S rRNA gene polymerase chain reaction for diagnosis of culture-negative bacterial infections. Clin Infect Dis
62. Palacios G, Druce J, Du L, et al. A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med
63. Bhatt AS, Freeman SS, Herrera AF, et al. Sequence-based discovery of Bradyrhizobium enterica in cord colitis syndrome. N Engl J Med
64. Naccache SN, Greninger AL, Lee D, et al. The perils of pathogen discovery: origin of a novel parvovirus-like hybrid genome traced to nucleic acid extraction spin columns. J Virol
65. Bertels F, Silander OK, Pachkov M, et al. Automated reconstruction of whole-genome phylogenies from short-sequence reads. Mol Biol Evol
67. Smith EE, Buckley DG, Wu Z, et al. Genetic adaptation by Pseudomonas aeruginosa to the airways of cystic fibrosis patients. Proc Natl Acad Sci USA
68. Lieberman TD, Michel JB, Aingaran M, et al. Parallel bacterial evolution within multiple patients identifies candidate pathogenicity genes. Nat Genet
69. Golubchik T, Batty EM, Miller RR, et al. Within-host evolution of Staphylococcus aureus during asymptomatic carriage. PLoS One
70. Mongkolrattanothai K, Gray BM, Mankin P, et al. Simultaneous carriage of multiple genotypes of Staphylococcus aureus in children. J Med Microbiol
71. Ng SB, Turner EH, Robertson PD, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature
72. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res
73. Bankevich A, Nurk S, Antipov D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol
74. Chin CS, Alexander DH, Marks P, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods
75. Koren S, Harhay GP, Smith TP, et al. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biol
76. Li H, Homer N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform
77. Pabinger S, Dander A, Fischer M, et al. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform
78. Inouye M, Dashnow H, Raven L, et al. SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Genome Med 2014; 6:90.