The approximately 80 complete genomes or exomes (the 1.2% of the genome that encodes proteins) published thus far are the leading edge of an avalanche of data expected in the next several years. The International Cancer Genome Consortium, which was launched in 2007, includes researchers from 11 countries. Together they aim to sequence 500 genomes from each of 50 different cancer types or subtypes, and all of the data will be made public.
The NIH-based Cancer Genome Atlas (TCGA) project, which is the US group in the international consortium, plans to sequence genomes from more than 20 tumor types over the next five years. TCGA has also established genome data analysis centers to provide preliminary analyses and develop new informatics tools.
In the pilot phase of TCGA, investigators aimed to sequence genomes from ovarian, brain (glioblastoma), and lung tumors. The initial sequencing effort covered a small portion of the exome, including 601 selected genes in the glioblastoma samples. However, with the cost of sequencing declining quickly, TCGA scientists have shifted their approach.
“Our strategy currently is to perform whole genome or whole exome sequencing,” said the lead scientist on TCGA's glioblastoma and melanoma projects, Lynda Chin, MD, Professor of Medical Oncology at Dana-Farber Cancer Institute. Approximately 10% of the samples will be whole genome sequences, and 90% will be exome.
“Everyone would prefer the whole genome if the cost were the same, but we also get much deeper reads on whole exome.” The proportion of samples with complete genome sequencing will likely increase as the cost of sequencing drops further, she noted.
Genome Data Analysis Centers
Another big change in TCGA's approach is the establishment of genome data analysis centers. Dr. Chin said that while the team was very good about making data available to other researchers quickly, that approach was not adequate. “It wasn't enough to just get [the glioblastoma] data out to the public. If we look back at what has been cited and what information has been used [from the glioblastoma data set] in subsequent studies, 99% of it is what was put into the publications, and those only contained a small fraction of all the data available,” she said.
In hindsight, she said, the discrepancy makes sense because most cancer researchers have neither the expertise to handle large data sets nor the computational background to analyze them. To compensate, TCGA has funded a network of seven data analysis centers. The centers will produce basic analyses of the data, organize them into a format that non-computational biologists can understand, and develop new informatics tools that will be freely available to other investigators.
2nd-Generation Clustered Heat Map
In one such effort, John N. Weinstein, MD, PhD, Professor and Chair of Bioinformatics and Computational Biology at the University of Texas MD Anderson Cancer Center, and colleagues are developing a “second-generation clustered heat map.”
Dr. Weinstein introduced the clustered heat map, which is frequently used to represent gene expression data from numerous different samples, when he was at the National Institutes of Health in the 1990s. Unlike the first-generation heat map, which is a static image, the second-generation clustered heat map will allow users to point to any box and reveal the data behind it.
“One can zoom in and click on any data point to see underlying statistics, pathway relationships, and literature citations,” he explained.
The second-generation clustered heat map is only one of several projects Dr. Weinstein's team is working on as part of the TCGA data analysis center. And, like Dr. Chin, he thinks the need for user-friendly informatics tools and data analysis in cancer genomics is pressing. “For the first time, it is faster and easier to generate data than to analyze and interpret them biologically,” he said.
‘Promissory Notes for the Future’
Arul M. Chinnaiyan, MD, PhD, the S.P. Hicks Endowed Professor of Pathology and Urology at the University of Michigan Health System and a Howard Hughes Medical Institute Investigator, seconded that view. “We are now at the point where we can essentially crank through these tumors and understand the molecular differences that distinguish tumors and their various subtypes. The challenge moving forward is how we are going to handle the tsunami of data that is going to be generated.”
But even with these challenges ahead, experts remain optimistic that the cancer genome data will have an impact on patient care. “It is clear that we are already reaping the benefits of the molecular analysis of cancers, as reflected in the targeted therapies that we do have at the moment,” Dr. Weinstein said. “They may not be what we would like them to be.... Nonetheless they make a difference, as well as being promissory notes for the future.”
As for assessing whether the community has achieved as much as one might have hoped when the complete human genome was announced in 2000, Dr. Weinstein said, “Everyone knew at the time that it was not quite complete, that there were holes and it was somewhat arbitrary when victory was declared, and that it was only the beginning of a march, rather than the end of one.”