Sun, Fang MD, PhD; Schoelles, Karen M. MD, SM, FACP; Coates, Vivian H. MBA
DURING the past few decades, advances in genetic science have greatly deepened the understanding of the mechanisms of diseases at the molecular level. The development of a broad range of genetic technologies has accompanied this understanding. These technologies include polymerase chain reaction (PCR) and its derivatives (eg, reverse transcription PCR, real-time PCR, multiplex PCR), PCR-like nucleic acid amplification techniques (eg, ligase chain reaction), in situ hybridization methods (eg, fluorescence in situ hybridization [FISH]), microarray, Southern blot, Northern blot, and next-generation high-throughput DNA sequencing techniques. As new genetic technologies emerge and become increasingly affordable, many genetic tests have been developed and used in clinical practice. According to GeneTests (http://www.genetests.org), as of December 20, 2011, there were 1061 clinics and 602 laboratories worldwide performing genetic tests for 2505 diseases. These tests allow fast and efficient detection or quantification of genetic variants and have a wide variety of clinical applications. These applications include making diagnoses, determining risk or susceptibility in asymptomatic individuals, revealing prognostic information to guide clinical management and treatment, and predicting response to treatments or environmental factors such as diet, behavioral factors, and drugs (Bonis et al., 2007; Bradley et al., 2009; Matchar et al., 2007; Palomaki et al., 2009; Secretary's Advisory Committee on Genetics, Health, and Society [SACGHS], 2008; Segal et al., 2009; Sun et al., 2011).
Currently, in the United States, a genetic test may reach the market either as a commercially distributed test kit approved or cleared by the US Food and Drug Administration (FDA) or as a laboratory-developed test (LDT) (SACGHS, 2008; Sun et al., 2010). Test kits, either FDA-cleared or FDA-approved, include all reagents and instructions needed to complete the test procedure and interpret the results. These test kits can be used in multiple laboratories and are currently regulated by the FDA as in vitro diagnostic devices. Laboratory-developed tests, also known as homebrew or in-house molecular tests, are developed in laboratories using either FDA-regulated or self-developed analyte-specific reagents and intended for use solely in the test developer's laboratory. The FDA claims jurisdiction over LDTs but, historically, has been exercising enforcement discretion (SACGHS, 2008; Sun et al., 2010). Only recently has the agency started to play a more active role in overseeing these tests (FDA, 2010). Laboratory-developed tests compose the majority of the genetic tests that have become available to clinical practice. The FDA has approved or cleared only a small number of genetic test kits for marketing in the United States (Sun et al., 2010). The US Centers for Medicare & Medicaid Services regulates laboratories that perform genetic LDTs under the Clinical Laboratory Improvement Amendments of 1988 (CLIA) (SACGHS, 2008; Sun et al., 2010).
While the introduction of new genetic tests creates tremendous potential for improving patient care, it is essential to adequately evaluate these tests to ensure their accuracy and utility for clinical practice. Inaccurate test results may mislead clinicians to make wrong decisions and potentially cause harm to patients. Tests without clinical utility do not lead to improved outcomes but could impose unnecessary burdens on patients and society. Many stakeholders have voiced concern about the quality, safety, and clinical utility of genetic tests (Javitt & Hudson, 2006; Kutz, 2006; SACGHS, 2008). Laboratory-developed tests are of particular concern due to the historical lack of active FDA regulation, although there is no solid evidence demonstrating that FDA-regulated test kits perform better than LDTs.
Evaluation of genetic tests is challenging for a variety of reasons. Many of these tests advance so quickly that they sometimes become obsolete before sufficient data have been generated to validate their performance. The test validation data generated by clinical laboratories often go unpublished and are rarely accessible to the public (SACGHS, 2006; Sun et al., 2010, 2011). There is also a lack of tools for rating the quality of the identified data (Sun et al., 2011). In this article, we discuss some major issues regarding the evaluation of genetic tests, including the general approaches to evaluation and common challenges evaluators face. This article's goal is to provide a starting point for those who are concerned with the safety and utility of genetic tests to develop an overall strategy for performing the assessment.
KEY CONCEPTS RELATED TO GENETIC TEST EVALUATION
Different authors may define “genetic tests” differently. In this article, we use the SACGHS's definition:
A genetic or genomic test involves an analysis of human chromosomes, deoxyribonucleic acid, ribonucleic acid, genes, and/or gene products (eg, enzymes and other types of proteins), which is predominately used to detect heritable or somatic mutations, genotypes, or phenotypes related to disease and health. (SACGHS, 2008)
This definition includes molecular tests (analysis performed on human DNA or RNA), cytogenetic tests (analysis performed on human chromosomes), and biochemical tests (analysis of human proteins and certain metabolites). The targeted analytes (eg, DNA, RNA, chromosomes, protein, metabolites) could be related to either acquired/somatic or heritable/germline genetic variants. This article does not discuss tests related to infectious pathogens, other analyses of microbial genomes, or exogenous analytes such as toxins and environmental chemicals.
Three intrinsically connected concepts that are commonly mentioned in genetic testing evaluation are analytic validity, clinical validity, and clinical utility (Centers for Disease Control and Prevention [CDC], 2007). Analytic validity refers to a laboratory test's ability to accurately and reliably measure the properties or characteristics it is intended to measure (eg, the presence of a gene mutation). Analytic validity is a function of many factors (Chen et al., 2009; SACGHS, 2008; Sun et al., 2010), including the following:
* Analytic accuracy: The closeness of agreement between a test result and true value of what is being measured. (International Organization for Standardization, 2007)
* Precision: The closeness of agreement between independent results of measurements obtained under stipulated conditions. (International Organization for Standardization, 2004)
* Analytic sensitivity: A measure describing how effectively a test can detect all true-positive specimens, as determined by a reference method. (SACGHS, 2008)
* Analytic specificity: The ability of a measurement procedure to measure solely the analyte of interest. (International Organization for Standardization, 2007)
* Reportable range: The span of test result values over which the laboratory can establish or verify the accuracy of the instrument or test system measurement response. (CDC, 2008)
* Reference range: The range of test values expected for a designated population of persons. (CDC, 2008)
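To make two of these definitions concrete, the sketch below summarizes hypothetical replicate measurements of a control sample against a known reference value: the bias of the mean is one component of analytic accuracy, and the coefficient of variation is a common index of precision. The function name and the data are illustrative, not drawn from any cited study.

```python
import statistics

def analytic_accuracy_and_precision(replicates, reference_value):
    """Summarize replicate measurements of a control sample.

    Bias (a component of analytic accuracy) is the difference between
    the mean measured value and the known reference value; the percent
    coefficient of variation (CV) is a common index of precision.
    """
    mean = statistics.mean(replicates)
    bias = mean - reference_value
    cv_percent = statistics.stdev(replicates) / mean * 100
    return {"mean": mean, "bias": bias, "cv_percent": cv_percent}

# Hypothetical replicates of a control with a known value of 100
print(analytic_accuracy_and_precision([98.0, 101.0, 99.5, 100.5, 101.0], 100.0))
```

In practice, acceptance criteria for bias and CV are set per analyte and per method during assay validation; the point here is only how the two quantities relate to the definitions above.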
Clinical validity (also known as diagnostic accuracy) refers to a test's accuracy in predicting the presence or absence of a clinical condition or predisposition. Clinical validity is usually described in terms of clinical sensitivity (ie, the probability of a positive test result when disease is present), clinical specificity (ie, the probability of a negative test result when disease is absent), and positive and negative predictive values (SACGHS, 2008; Sun et al., 2010).
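These four clinical validity metrics are simple functions of a 2x2 table cross-tabulating test results against disease status. A minimal sketch, using hypothetical counts:

```python
def clinical_validity(tp, fp, fn, tn):
    """Compute clinical validity metrics from a 2x2 table of
    test results versus presence or absence of the condition."""
    return {
        "sensitivity": tp / (tp + fn),  # P(positive test | disease present)
        "specificity": tn / (tn + fp),  # P(negative test | disease absent)
        "ppv": tp / (tp + fp),          # P(disease present | positive test)
        "npv": tn / (tn + fn),          # P(disease absent | negative test)
    }

# Hypothetical counts: 90 true positives, 10 false negatives,
# 20 false positives, 180 true negatives
print(clinical_validity(tp=90, fp=20, fn=10, tn=180))
```

Note that sensitivity and specificity are properties of the test itself, whereas the predictive values also depend on disease prevalence in the tested population.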
Clinical utility refers to the test's usefulness and the value of the information to medical practice. Clinical utility reflects a balance between the health-related benefits and the harms that can ensue from using the information a test provides. The potential benefits and harms of a genetic test should be compared with those of the standard-of-care test to assess incremental benefits and harms. Benefits and harms should be considered at multiple levels, including the patient, family, health care organizations, and society (SACGHS, 2008). Each level may have a different perspective on risk, which will ultimately affect a test's acceptance into routine clinical practice. This article's primary focus is the evaluation of the clinical utility of genetic testing. However, as discussed in the following section, the evaluation of utility cannot be isolated from the evaluation of the test's analytic validity and clinical validity. Therefore, we discuss the evaluation of analytic validity and clinical validity as well.
MAIN APPROACHES TO EVALUATING GENETIC TESTS
The general principles for genetic test evaluation are similar to those for other medical tests. However, differences exist in how the principles need to be applied and in the relevance of certain issues. Historically, many important concepts related to the evaluation of medical tests, such as sensitivity and specificity (Yerushalmy, 1947) and the “receiver operating characteristic” curve (Swets & Pickett, 1982), were initially developed for imaging tests. Authors in this field were also among the first to propose conceptual frameworks for evaluating the performance of tests (Fryback & Thornbury, 1991; Gatsonis, 2000; Guyatt et al., 1986; Loop & Lusted, 1978). These frameworks clarified the scope of the evaluation and the types of evidence required for addressing various issues regarding the tests' safety, clinical utility, and other impacts. Some evaluation frameworks (often referred to as analytic frameworks) also provide additional detail on the key questions (eg, relevant populations, interventions, comparators, outcomes, time points, and settings) and depict the evaluation process graphically.
Arguably the best-known evaluation framework for imaging tests is the hierarchical model proposed by Fryback and Thornbury (1991). This model synthesized the previous work by other authors in the field (Guyatt et al., 1986; Loop & Lusted, 1978) and described 6 levels of testing impacts that might need to be evaluated:
* Level 1: Technical efficacy (Does the test measure what it purports to measure?)
* Level 2: Diagnostic accuracy efficacy (What are the medical test characteristics of the test [eg, sensitivity, specificity]?)
* Level 3: Diagnostic thinking efficacy (Does the medical test help clinicians come to a diagnosis?)
* Level 4: Therapeutic efficacy (Does the medical test aid in planning treatment or change planned treatments?)
* Level 5: Patient outcome efficacy (Do patients who undergo this medical test fare better than similar patients who are not tested?)
* Level 6: Societal efficacy (cost-benefit and cost-effectiveness)
This model suggests that the lower levels in the hierarchy should be verified before the higher levels. This model, although originally developed for imaging tests, has been widely applied to other medical tests, including genetic tests (Myers et al., 2006).
The US Preventive Services Task Force (USPSTF, 2008) proposed another influential evaluation framework for medical tests. The mission of this Agency for Healthcare Research and Quality (AHRQ)-sponsored task force is to assess the evidence for clinical preventive services to be delivered in the primary care setting. The evaluation framework it proposed specifically addresses screening tests for disease prevention. This USPSTF evaluation approach has a strong preference for using data on “health outcomes,” which are defined as symptoms and conditions that patients can feel or experience, such as visual impairment, pain, dyspnea, impaired functional status or quality of life, and death (Harris et al., 2001). This approach contrasts health outcomes with “intermediate outcomes,” such as pathologic or physiologic measures, which patients cannot directly perceive (Harris et al., 2001). The USPSTF evaluation approach also emphasizes the balance between potential benefits and harms when evaluating health outcomes (Sun et al., 2011).
The CDC developed another widely used evaluation framework for genetic tests. The agency's National Office of Public Health Genomics worked with the Foundation for Blood Research beginning in 2000 to develop the ACCE model (analytic validity; clinical validity; clinical utility; and ethical, legal, and social implications) (CDC, 2007). The model's purpose is to assemble, analyze, disseminate, and update existing data on the safety and effectiveness of DNA-based genetic tests and testing algorithms. Under the model, a total of 44 questions are specified for the evaluation of DNA-based tests. These questions address all 4 ACCE aspects. For clinical utility, the ACCE model specifically suggests consideration of the following elements: (1) the natural history of the disorder, (2) availability and effectiveness of interventions, (3) potential adverse outcomes of the test, and (4) available resources (education and expertise) to manage all aspects of service. The ACCE concept has been widely accepted for evaluating genetic tests as well as other medical tests. However, although the ACCE approach is generally straightforward, addressing a set of 44 questions may be too cumbersome to be practical for many stakeholders.
The ACCE project was discontinued and replaced in 2004 by another CDC-funded initiative, the Evaluation of Genomic Applications in Practice and Prevention (EGAPP) project. This project focused on the review and synthesis of genomic applications to facilitate translation and dissemination into practice (Teutsch et al., 2009). The EGAPP Working Group, which was established in 2005 to make recommendations based on EGAPP-sponsored reviews, developed a set of evaluation frameworks for different testing purposes (eg, pharmacogenetics, diagnosis of disease, risk assessment for a heritable condition; EGAPP Working Group, 2005; Teutsch et al., 2009). Since the project began in 2004, the EGAPP frameworks have been used in several evidence reports on genetic testing topics (Bonis et al., 2007; Bradley et al., 2009; Matchar et al., 2007; Palomaki et al., 2009; Segal et al., 2009). The EGAPP frameworks incorporated the ACCE concepts and some of the components of the Fryback-Thornbury model (eg, asking whether the use of the test has an impact on clinical decision making). Similar to the USPSTF evaluation model for screening topics, the EGAPP frameworks attempted to build an explicit link among testing, intermediate outcomes, and health outcomes.
Figure 1 is a graphical summary and comparison of the 4 aforementioned evaluation frameworks. The comparison suggests that these 4 commonly used frameworks cover all 3 domains of evaluation: analytic validity, clinical validity, and clinical utility. The ACCE and Fryback-Thornbury models also cover a fourth domain: the societal impact of the test.
On the basis of this comparison and a systematic literature review, the ECRI Institute prepared a methodology report for the AHRQ Effective Health Care Program (Sun et al., 2011). In this report, AHRQ/ECRI Institute proposed a set of evaluation frameworks for 7 testing scenarios. The AHRQ/ECRI Institute frameworks expand on the EGAPP frameworks but cover some testing scenarios that the EGAPP frameworks did not address (eg, treatment monitoring, prenatal screening, susceptibility assessment involving detection of germline/heritable mutations). In addition, the AHRQ/ECRI Institute frameworks shed more light on the comparison between the index test and the current standard-of-care testing strategy, as well as on the balance between potential benefits and harms of the testing.
From individual patients' perspectives, the impact of a genetic test on health outcomes (ie, clinical utility) is typically the ultimate interest of evaluation, although, for society as a whole, the ethical, legal, and social implications of testing might also need to be evaluated. Clinical utility studies that directly correlate health outcomes with a clinical test are often unavailable (SACGHS, 2008; Sun et al., 2010). As a result, analytic validity, clinical validity, and potential impacts of the testing on medical decision making will, in most cases, need to be evaluated to establish a chain of evidence to evaluate clinical utility indirectly (Grosse & Khoury, 2006; Teutsch et al., 2009).
There appears to be a hierarchy of evidence among analytic validity, clinical validity, and clinical utility (see domains 1, 2, and 3 in Figure 1). That is, if a test's analytic validity is poor, the clinical validity will inevitably be poor; consequently, the clinical utility will also be poor. When clinical utility studies (eg, randomized controlled trials [RCTs] that correlate patient outcomes with testing) are missing, the evaluation of analytic or clinical validity studies could help establish an indirect chain of evidence supporting the test's potential utility. Even when clinical utility studies are available, the evaluation of analytic or clinical validity might still be needed. For example, when the number of clinical utility studies is small or the findings of the studies are contradictory, the evaluation of analytic and clinical validity could be helpful in reducing uncertainty about the conclusions.
Regardless of the potential hierarchical relationship between analytic validity and clinical validity, both types of validity need to be evaluated if studies are available for them. Analytic validity studies evaluate a broad range of testing performance aspects. Some of these aspects, such as testing repeatability and reproducibility, are typically not evaluated in diagnostic accuracy (clinical validity) studies but may have significant implications for how well the test performs in real-world laboratory settings (ie, the generalizability or applicability of the evidence). For example, if data suggest that a test's reproducibility is poor, the test may perform poorly in predicting the clinical condition in the real-world setting, even if landmark clinical validity studies conducted in a single institution yielded high diagnostic accuracy.
In the next section, we further discuss how to use the frameworks as general conceptual guidance to address key issues for the evaluation. The AHRQ/ECRI Institute frameworks cover 7 testing scenarios: diagnosis in symptomatic patients, disease screening in asymptomatic patients, prognosis assessment, treatment monitoring, pharmacogenetics, risk/susceptibility assessment, and germline mutation–related testing scenarios. For each scenario, a framework graphically depicts the relationship between the population, the test under consideration, subsequent interventions, and outcomes (including intermediate outcomes, patient outcomes, and potential harms). Each framework also includes a set of research questions that need to be addressed for the evaluation. Given the limited space for this article, we select 1 testing scenario for the discussion. The frameworks for other testing scenarios are available in the AHRQ methodology report (Sun et al., 2011).
ADDRESSING KEY ISSUES FOR THE EVALUATION: A SAMPLE CASE
To demonstrate how some of the key issues are addressed under the evaluation frameworks, we use 1 genetic test, ERBB2 testing with FISH assays for guiding trastuzumab treatment in patients with breast cancer, as a sample case. The ERBB2 (also referred to as the human epidermal growth factor receptor-2 or HER2/neu) gene is located at position 17q12 on chromosome 17. ERBB2 is amplified and overexpressed in approximately 18% to 20% of breast cancer cases, which is associated with poor prognosis (Hanna et al., 2007; Wolff et al., 2007). Laboratory testing of the ERBB2 gene and protein in tumor tissue has been used to determine the ERBB2 status of patients with breast cancer. One AHRQ evidence report, “HER2 Testing to Manage Patients With Breast Cancer or Other Solid Tumors,” has provided a good overview of the test(s) (Seidenfeld et al., 2009).
When evaluating a genetic test, the evaluator should first explicitly define the patient population for whom the test is intended to apply. For example, ERBB2 testing has been used to manage patients with breast, ovarian, lung, prostate, or head and neck tumors. For the sample testing case used in this section, the population of interest is patients who had received a diagnosis of breast cancer.
At the beginning of the evaluation process, the testing purpose (eg, diagnosis, prognosis, screening, even multiple purposes) and the testing techniques used should also be defined explicitly. Testing performed on the same gene variants can sometimes be used for multiple clinical purposes. For example, ERBB2 testing has been used to guide trastuzumab treatment targeting the ERBB2 molecule, to guide selection of breast cancer treatments other than trastuzumab (ie, chemotherapy regimen or hormonal therapy regimen), and to monitor treatment response or disease progression in patients. Meanwhile, several types of assays (eg, FISH, immunohistochemistry) have been used to analyze ERBB2 status in tumor tissues (Seidenfeld et al., 2009). For the sample case in this section, the relevant testing purpose is guiding trastuzumab treatment targeting the ERBB2 molecule and the testing technique used is the FISH assay.
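For the dual-probe FISH assay in this sample case, ERBB2 status is typically reported as the ratio of ERBB2 (HER2) signals to chromosome 17 centromere (CEP17) signals. As a hedged illustration, the sketch below applies the cutoffs from the 2007 ASCO/CAP guideline cited in this article (Wolff et al., 2007); the function is hypothetical, and later guideline revisions changed these thresholds, so treat the cutoffs as illustrative rather than current practice.

```python
def classify_erbb2_fish(her2_signals, cep17_signals):
    """Classify ERBB2 (HER2) amplification status from dual-probe FISH counts.

    Thresholds follow the 2007 ASCO/CAP guideline (Wolff et al., 2007):
    a HER2/CEP17 ratio > 2.2 is amplified, a ratio < 1.8 is non-amplified,
    and values in between are equivocal. Illustrative only; later guideline
    revisions changed these cutoffs.
    """
    ratio = her2_signals / cep17_signals
    if ratio > 2.2:
        return "amplified"
    if ratio < 1.8:
        return "non-amplified"
    return "equivocal"

print(classify_erbb2_fish(6.6, 2.0))  # ratio 3.3
print(classify_erbb2_fish(2.4, 2.0))  # ratio 1.2
```

An equivocal result would typically trigger reflex testing (eg, recounting additional cells or testing by an alternative method), which is one reason the testing strategy, not just the single assay, must be defined before evaluation.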
After these parameters are clearly defined, a set of key questions for the evaluation can be generated using the evaluation framework for genetic tests for treatment selection (Figure 2):
1. Does ERBB2 testing with FISH assays lead to improved health outcomes in patients with breast cancer compared with the use of other tests or no testing at all? This is an overarching question about whether the use of the test will lead to an incremental change in health outcomes among the patients being tested compared with using standard-of-care testing or no testing.
2. Do FISH assays for testing ERBB2 have analytic validity? This question addresses issues such as analytic accuracy, analytic sensitivity, analytic specificity, precision, reproducibility, and robustness of the test.
3. How accurate are the FISH assays for predicting patients' response to trastuzumab?
3a. How well do the FISH assays predict trastuzumab efficacy?
3b. How well do the FISH assays predict trastuzumab-related adverse reactions?
4. Does ERBB2 testing with FISH assays influence treatment decisions by patients and providers? Addressing this question is particularly helpful when evidence on health or intermediate outcomes is not available. Tests whose results have no impact on decision making by clinicians or patients will certainly not lead to any changes—positive or negative—in health outcomes.
5. Do personalized treatment strategies based on ERBB2 testing results lead to improved intermediate outcomes (eg, pathologic response, imaging response)? While health outcomes are what ultimately matter to patients, it could still be important to evaluate the testing's impact on intermediate outcomes, particularly when direct evidence on health outcomes is not available.
6. Do personalized treatment strategies based on ERBB2 testing results lead to improved health outcomes?
7. Are there harms associated with FISH assays for testing ERBB2? Does the testing cause more harms than alternative testing strategies?
8. Are there harms associated with the personalized treatment strategy that is based on ERBB2?
Potential harms that might be caused by the testing or the subsequent treatments based on the testing results need to be evaluated. While these potential harms could be reflected by incremental health outcomes (eg, mortality, quality of life), it is still important to ask these questions separately, particularly when evidence on incremental health outcomes is not available for evaluation.
The goal of addressing this set of questions is to establish the “chain of evidence” described previously. The overarching question (question 1) attempts to establish a direct link between ERBB2 testing and health outcomes. However, when this type of desirable evidence is unavailable or of low quality, the chain of evidence that has been established becomes valuable in answering the questions regarding the test's clinical utility. When a chain of evidence is used for the evaluation, the strength of the evidence for each link in the chain, including the quantity (ie, number and size), quality (ie, internal validity), and consistency, as well as the generalizability of the studies, should be adequately assessed.
The testing purpose of the sample case focuses on treatment selection. The evaluation process for other testing scenarios (eg, diagnosis in symptomatic patients, disease screening in asymptomatic patients, prognosis assessment, treatment monitoring, risk/susceptibility assessment) is generally similar. But for germline mutation–related testing scenarios, the evaluation process could be more complicated (Sun et al., 2011). For this type of testing scenario, the evaluator may need to evaluate the impact of the test on both probands and their relatives.
CHALLENGES IN ASSESSING CLINICAL UTILITY OF GENETIC TESTS
Although the evaluation frameworks provide useful conceptual guidance for assessing clinical utility of genetic tests, a variety of challenges remain. First, RCTs or other types of studies that directly correlate genetic testing with clinically important outcomes will continue to be lacking in quantity or quality. Randomized controlled trials, particularly effectiveness or pragmatic RCTs, provide the most relevant evidence for the clinical utility of genetic tests but are difficult to design or implement for the testing topics (Grosse & Khoury, 2006; SACGHS, 2008; Sun et al., 2010; Teutsch et al., 2009). Effectiveness RCTs involve large sample sizes, broad inclusion criteria, and modest data collection and provide estimates of effectiveness in typical care settings (SACGHS, 2008). Other study designs, such as case series (single-group designs), are more prone to threats to internal validity. For most genetic testing topics, the evaluation of clinical utility is likely to depend on the establishment of a chain of evidence that involves evaluating the test's analytic and clinical validity, as well as its impact on diagnostic thinking (ie, the value of information in understanding the diagnosis, cause, and prognosis of a condition) and therapeutic choice (ie, the use of test results in clinical management of an individual with a diagnosed disorder) (Grosse & Khoury, 2006; Sun et al., 2011; Teutsch et al., 2009).
Second, the evaluation of analytic and clinical validity itself is also challenging. Many technical problems (eg, flawed probe, primer, or array design) may occur in complex genetic testing processes and could affect the tests' analytic performance (Chen et al., 2009; Sun et al., 2010). Validation of the tests' analytic validity is challenging due to many technical hurdles such as lack of gold standard reference methods and difficult-to-obtain testing samples (SACGHS, 2008; Sun et al., 2010, 2011). To overcome these hurdles, a higher level of collaboration among the research community, professional societies, and test developers would be required in efforts such as increasing the availability of appropriately validated samples for test validation, developing effective reference methods, and building sample-splitting or sample-sharing programs (SACGHS, 2008).
Third, obtaining test validation information that already exists could also be challenging. This type of information may be scattered in different places, such as clinical laboratories and proficiency testing programs. Data generated by laboratories for validating analytic and clinical performance of LDTs are not publicly accessible unless published in peer-reviewed journals. Searches of the published medical literature yielded limited to no information on analytic validity of the tests being evaluated (Sun et al., 2010, 2011). Locating unpublished data is commonly necessary but can be difficult and time-consuming. In the long run, stakeholders in the field of genetic testing need to work together to establish more effective mechanisms to make the data accessible to any parties who need them. These mechanisms may include enforcing regulatory mandates for release of data, creating a database of de-identified validation data by an independent accrediting organization such as the College of American Pathologists, or providing incentives for laboratories to release what is generally considered proprietary analytic validity data.
Fourth, when data on analytic validity are identified, no widely accepted guidance is available for assessing their quality (Sun et al., 2011). As a result, it is difficult to judge whether the data identified meet the minimum quality standards. Rating the overall strength of evidence across multiple studies could be even more challenging.
Furthermore, in the absence of procedural or reimbursement codes for specific genetic tests, it is difficult to track practice patterns or to understand the impact of these tests on patient outcomes. Using a comprehensive coding system (eg, the Universal Medical Device Nomenclature System developed by ECRI Institute [https://www.ecri.org/Products/Pages/UMDNS.aspx] includes >500 codes for molecular diagnostic tests) to accurately identify the genetic tests performed on patients is essential for effectively assessing and monitoring the tests' clinical utility, safety, or other impacts. In November 2011, the American Medical Association presented a new Molecular Pathology section in its 2012 Current Procedural Terminology (CPT) code set to address coding issues with molecular pathology tests that a CPT Workgroup identified in October 2009 (Synovec & Myles, 2011). In 2012, 2 tiers of molecular pathology codes will be in place: tier 1 for the most common services (ie, gene-specific and genomic procedures), and tier 2 for less common services. Tier 1 services will each have a single service-specific CPT code. Tier 2 services will be grouped by complexity level: tests that require similar resources to perform, analyze, and interpret will share a single code. The new section describes 92 tier 1 codes and 9 levels of complexity for tier 2 codes. However, we have some concerns about this approach. For many molecular tests, particularly those falling into the tier 2 category, the new codes are still not specific enough to allow payers to identify which test was performed on the patient.
Currently, the vast majority of the clinically available genetic tests are LDTs. Laboratories that perform LDTs of moderate or high technical complexity (including most genetic tests) are regulated under CLIA and are required to establish the tests' analytic validity. However, because of the reasons previously discussed, establishing analytic validity is not an easy task. In addition, there are concerns about whether the clinical validity and utility of LDTs have been addressed adequately, or at all, under CLIA (Javitt & Hudson, 2006; SACGHS, 2008). The Clinical Laboratory Improvement Amendments of 1988 requires laboratory directors and clinical consultants to ensure the clinical relevance of the tests being performed. However, how these directors and consultants establish the clinical relevance (eg, what types of data were used, where the data came from, how the data were synthesized) is rarely revealed to the public.
The current oversight status of LDTs has generated concern among the public and medical community alike about the quality of LDTs. The complex nature of genetic testing further intensifies such concerns. Over the past decade, various efforts have been initiated to address the concerns. These efforts include the aforementioned CDC's ACCE and EGAPP projects, many recommendations made either by SACGHS or by the Clinical Laboratory Improvement Advisory Committee, and the publication of a series of AHRQ evidence or methodology reports on genetic testing topics. In 2010, the FDA also signaled that it might start to regulate certain types of genetic LDTs. However, none of these efforts can substitute for the due diligence by stakeholders (eg, patients, clinicians, payers) or their technology assessment agents (eg, evidence-based practice centers, other technology assessment groups) in the assessment of the safety, clinical utility, and other impacts of genetic tests. We hope that the general approach to evaluating genetic tests that we have introduced in this article helps those stakeholders in that endeavor.
Bonis P. A., Trikalinos T. A., Chung M., Chew P., Ip S., DeVine D., Lau J. (2007). Hereditary nonpolyposis colorectal cancer: Diagnostic strategies and their implications (Evidence Report/Technology Assessment No. 150). Rockville, MD: Agency for Healthcare Research and Quality.
Bradley L. A., Palomaki G. E., Dotson W. D. (2009). Can UGT1A1 genotyping reduce morbidity and mortality in patients with metastatic colorectal cancer treated with irinotecan? Atlanta, GA: Evaluation of Genomic Applications in Practice and Prevention. Retrieved December 26, 2011, from http://www.egappreviews.org/docs/topics_colorectal.pdf
Centers for Disease Control and Prevention. (2008). Current CLIA regulations (including all changes through 1/24/2004). Atlanta, GA: Centers for Disease Control and Prevention.
Chen B., Gagnon M., Shahangian S., Anderson N. L., Howerton D. A., Boone J. D. (2009). Good laboratory practices for molecular genetic testing for heritable diseases and conditions. MMWR Recommendations and Reports, 58(RR-6), 1–37.
Evaluation of Genomic Applications in Practice and Prevention Working Group. (2005). Draft evaluation frameworks for genetic tests [PowerPoint slideshow, pp. 32]. Atlanta, GA: Author.
Fryback D. G., Thornbury J. R. (1991). The efficacy of diagnostic imaging. Medical Decision Making, 11(2), 88–94.
Gatsonis C. (2000). Design of evaluations of imaging technologies: Development of a paradigm. Academic Radiology, 7(9), 681–683.
Grosse S. D., Khoury M. J. (2006). What is the clinical utility of genetic testing? Genetics in Medicine, 8(7), 448–450.
Guyatt G. H., Tugwell P. X., Feeny D. H., Haynes R. B., Drummond M. (1986). A framework for clinical evaluation of diagnostic technologies. Canadian Medical Association Journal, 134(6), 587–594.
Hanna W., O'Malley F. P., Barnes P., Berendt R., Gaboury L., Magliocco A., Thomson T. (2007). Updated recommendations from the Canadian National Consensus Meeting on HER2/neu testing in breast cancer. Current Oncology, 14(4), 149–153.
Harris R. P., Helfand M., Woolf S. H., Lohr K. N., Mulrow C. D., Teutsch S. M., Atkins D.; Methods Work Group, U.S. Preventive Services Task Force. (2001). Current methods of the U.S. Preventive Services Task Force: A review of the process. Rockville, MD: Agency for Healthcare Research and Quality.
International Organization for Standardization. (2004). ISO/IEC guide 2: Standardization and related activities—General vocabulary. Geneva, Switzerland: Author.
International Organization for Standardization. (2007). International vocabulary of basic and general terms in metrology (VIM). Geneva, Switzerland: Author.
Kutz G. (2006). Nutrigenetic testing: Tests purchased from four web sites mislead consumers. Statement of Gregory Kutz, Managing Director Forensic Audits and Special Investigations. Washington, DC: US Government Accountability Office. Retrieved December 23, 2011, from http://www.gao.gov/new.items/d06977t.pdf
Loop J. W., Lusted L. E. (1978). American College of Radiology diagnostic efficacy studies. AJR. American Journal of Roentgenology, 131(1), 173–179.
Matcher D. B., Thakur M. E., Grossman I., McCrory D. C., Orlando L. A., Steffens D. C., Gray R. N. (2007). Testing for cytochrome P450 polymorphisms in adults with non-psychotic depression treated with selective serotonin reuptake inhibitors (SSRIs). (AHRQ Publication No. 07-E002). Rockville, MD: Agency for Healthcare Research and Quality.
Myers E. R., Havrilesky L. J., Kulasingam S. L., Sanders G. D., Cline K. E., Gray R. N., McCrory D. C. (2006). Genomic tests for ovarian cancer detection and management (Evidence Report/Technology Assessment No. 145). Rockville, MD: Agency for Healthcare Research and Quality.
Palomaki G. E., Bradley L. A., Douglas M. P., Kolor K., Dotson W. D. (2009). Can UGT1A1 genotyping reduce morbidity and mortality in patients with metastatic colorectal cancer treated with irinotecan? An evidence-based review. Genetics in Medicine, 11(1), 21–34.
Segal J. B., Brotman D. J., Emadi A., Necochea A. J., Samal L., Wilson L. M., Bass E. B. (2009). Outcomes of genetic testing in adults with a history of venous thromboembolism (Evidence Report/Technology Assessment No. 180). Rockville, MD: Agency for Healthcare Research and Quality.
Seidenfeld J., Samson D. J., Rothenberg B. M., Bonnell C. J., Ziegler K. M., Aronson N. (2009). HER2 testing to manage patients with breast cancer or other solid tumors (Evidence Report/Technology Assessment No. 172). Rockville, MD: Agency for Healthcare Research and Quality.
Sun F., Bruening W., Uhl S., Ballard R., Tipton K., Schoelles K. (2010). Quality, regulation and clinical utility of laboratory-developed molecular tests, technology assessment report (prepared by ECRI Institute Evidence-based Practice Center under Contract No. 290-2007-1063-I). Rockville, MD: Agency for Healthcare Research and Quality. Retrieved from http://www.cms.gov/determinationprocess/downloads/id72TA.pdf
Swets J. A., Pickett R. M. (1982). Evaluation of diagnostic systems—Methods from signal detection theory. New York, NY: Academic Press.
Teutsch S. M., Bradley L. A., Palomaki G. E., Haddow J. E., Piper M., Calonge N., Berg A. O. (2009). The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) initiative: methods of the EGAPP Working Group. Genetics in Medicine, 11(1), 3–14.
U.S. Preventive Services Task Force. (2008). U.S. Preventive Services Task Force procedure manual. Rockville, MD: US Department of Health and Human Services, Public Health Service, Agency for Healthcare Research and Quality.
Wolff A. C., Hammond M. E., Schwartz J. N., Hagerty K. L., Allred D. C., Cote R. J., Hayes D. F. (2007). American Society of Clinical Oncology/College of American Pathologists guideline recommendations for human epidermal growth factor receptor 2 testing in breast cancer. Journal of Clinical Oncology, 25(1), 118–145.
Yerushalmy J. (1947). Statistical problems in assessing methods of medical diagnosis, with special reference to x-ray techniques. Public Health Reports, 62(39), 1432–1449.
Key words: analytic framework; analytic validity; clinical utility; clinical validity; evaluation framework; genetic test