Objective: This study aims to harness public gene expression repositories to develop gene expression profiles that could accurately determine nodal status in colorectal cancer.
Background: Currently, techniques that determine lymph node positivity (before resection) have poor sensitivity and specificity. The ability to determine lymph node status, based on preoperative biopsies, would greatly assist in planning treatment in colorectal cancer. This is particularly relevant in polyp-detected cancers.
Methods: Public gene expression repositories were screened for experiments comparing metastatic and nonmetastatic colorectal cancer. A customized graphical user interface was developed to extract genes dysregulated across most identified studies (ie, consensus profiles). The utility of consensus profiles was tested by determining whether classifiers that determined nodal positivity or negativity could be derived from them. Consensus profile-derived classifiers were tested on separate Affymetrix- and Illumina-based experiments, and outputs were collated in summary receiver operating characteristic curve format, with area under the curve (AUC) reflecting accuracy. The association between classification and oncologic outcome was determined using an additional, independent data set. Final validation was conducted using the Ingenuity network-linkage environment.
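As an illustration of the consensus step described in the Methods, the following is a minimal sketch that assumes each study has already been reduced to a set of dysregulated gene symbols. The function name `consensus_profile`, the `min_fraction` threshold, and the example gene lists are hypothetical and are not taken from the study's software; "most studies" is interpreted here as a simple majority.

```python
from collections import Counter


def consensus_profile(study_gene_sets, min_fraction=0.5):
    """Return genes flagged as dysregulated in more than `min_fraction` of studies.

    `study_gene_sets` is a list of iterables of gene symbols, one per study.
    The majority threshold is an assumption made for illustration only.
    """
    if not study_gene_sets:
        return set()
    # Count each gene at most once per study.
    counts = Counter(gene for genes in study_gene_sets for gene in set(genes))
    threshold = min_fraction * len(study_gene_sets)
    return {gene for gene, n in counts.items() if n > threshold}


# Made-up per-study gene lists, for illustration only (not study data).
studies = [
    {"FYN", "WNT5A", "COL8A1"},
    {"FYN", "WNT5A", "SMAD4"},
    {"FYN", "COL8A1", "BMP4"},
]
profile = consensus_profile(studies)
print(sorted(profile))
```

With these made-up inputs, only genes present in at least two of the three lists are retained. A stricter or looser threshold could be explored by changing `min_fraction`.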
Results: Four consensus profiles were generated, from which classifiers were derived that accurately determined node-positive and node-negative status (pooled AUCs were 0.79 ± 0.04 and 0.8 ± 0.03 for nodal positivity and negativity, respectively). Overall AUC ranged from 0.73 to 0.86, demonstrating high accuracy across consensus profile type, classification technique, and array platform used. As consensus profiles enabled classification of nodal status, survival outcomes could be compared for those predicted node negative or positive. Patterns of disease-free and overall survival were identical to those observed for standard histopathologic nodal status. Genes contained within consensus profiles were strongly linked to the metastatic process and included (among others) FYN, WNT5A, COL8A1, BMP, and SMAD family members.
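For readers unfamiliar with the AUC metric reported above, the sketch below shows one standard way to compute it: as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The function `roc_auc` and the labels and scores are hypothetical illustrations; they do not reproduce the study's classifiers or its pooling procedure.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    `labels` holds 1 for node positive and 0 for node negative;
    `scores` holds the classifier output for each sample.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs in which the positive is ranked higher.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores, for illustration only (not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives context for the reported range of 0.73 to 0.86.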
Conclusions: Microarray expression data available in public gene expression repositories can be harnessed to generate consensus profiles. The latter are a source of classifiers that have prognostic and predictive properties.