Journal Logo

Research Article: Quality Improvement Study

Identification of potential gene drivers of cutaneous squamous cell carcinoma

Analysis of microarray data

Zheng, Yi MDa; Chi, Sumin MDb; Li, Chengxin MDa,∗

Editor(s): Alaibac., Mauro

Author Information
doi: 10.1097/MD.0000000000022257
  • Open


1 Introduction

Cutaneous squamous cell carcinoma (cSCC) is the second most frequent skin cancer, accounting for approximately 20% of all skin cancers.[1,2] The morbidity of cSCC is steadily increasing, posing a significant threat to public health.[3] The most significant risk factor for cSCC is actinic keratosis (AK), a precancerous lesion developed from the damage effects of chronical ultraviolet radiation.[4,5] Although it is difficult to determine whether an AK will evolve into cSCC, up to 65% to 97% of cSCCs are reported to originate in lesions previously diagnosed as AKs.[6] Defining critical molecular biomarkers to help identify which AKs are more likely to progress to cSCC is an urgent research need, which will contribute to the prevention, early diagnosis, and effective treatment of cSCC.

It is reported that the karyotypic profile of AK is similar to that of cSCC, but the degree of complexity is reduced, representing an earlier stage of tumor formation development.[7] Multiple genetic abnormalities have been observed in cSCC and important roles of some genes contributing to the progression of AK to cSCC as TP53, CDKN2A, and NOTCH have been proven.[8–11] Up-regulation of matrix metalloproteinases (MMP) has been validated in many different types of tumors, and in particular the expression and production of MMP-7 proves to be enhanced in cSCC specifically.[12] In addition, overexpression of interleukin (IL-24) was found in cSCC lesions via promoting focal expression of MMP7.[13] Besides, the mitogen-activated protein kinase (MAPK) pathway is reported to be vital in the transition from AK to cSCC.[14] However, current knowledge about the underlying molecular pathogenesis is poorly characterized and the development of molecular biomarkers for early diagnosis, treatment, and prognosis prediction of tumor remain underexplored.

Deoxyribonucleic acid (DNA) microarray is now widely used in clinical research to identify differentially expressed genes (DEGs) between normal samples and tumor specimens. Secondary analysis of exiting DNA microarray data provides a comprehensive and responsible comparison of gene expression among different tissues.[15] In this study, we tried to investigate the possible molecular mechanism of progression from AK to cSCC and to identify the key genes associated with cSCC and thus potentially provide novel treatment targets for cSCC management. DEGs of cSCC samples compared with controlled AK samples were screened out to explore the molecular biological mechanisms of cSCC.

2 Methods

2.1 Microarray data source

Microarray data containing both cSCC samples and AK samples were retrieved in the Gene Expression Omnibus (GEO), a database for gene expression managed by the National Center of Biotechnology Information (NCBI) ( Two datasets with the accession number of GSE42677 and GSE45216 were selected for the following analysis.[16,17] The dataset GSE42677 included 10 cSCC samples and 5 AK samples and the platform employed was GPL571 (HG-U133A_2) Affymetrix Human Genome U133A 2.0 Array. The dataset GSE45216 contained 30 cSCC samples and 10 AK samples and the platform used was GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array. The data used in this study were downloaded from the openly available GEO dataset and ethical approval was waived.

2.2 Data pre-processing and identification of DEGs

The affy package (, version 1.60.0), which contained functions for exploratory oligonucleotide array analysis, was used to perform the background adjustment and normalization so as to ensure the comparability and integrity of data.[18] The Robust Multi-array Average (RMA) method was used for normalizing and calculating expression measures.[19] The limma package (, version 3.38.3) was used to identify the DEGs between cSCC and AK.[20] Genes with an absolute value of log 2 (fold change) greater than 1 and P < .05 were screened out as DEGs. To discard probes that did not match any gene symbol, probes were annotated with corresponding platform annotation file.

2.3 Functional enrichment analysis of DEGs

The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEEG) enrichment analysis of DEGs was performed using the online analysis tool the Database for Annotation Visualization and Integrated Discovery (DAVID) (, version 6.8), a database providing investigators tools to comprehend biological meaning behind a number of genes.[21] Given a gene list, the DAVID could identify enriched biological terms and detect enriched functional-related gene groups through standard accumulative hypergeometric statistical test. Significance threshold P < .05 together with the enriched gene number ≥2 were regarded as the threshold of significant enrichment analysis.

2.4 Protein network analysis

Protein networks provide a complementary means to dynamically identify protein groupings that are functionally relevant. Proteins connected within a protein-protein interaction (PPI) network are likely to collaborate to perform various related biological processes (BP). PPI networks were constructed with the Metascape ( and Cytoscape software (, version 3.7.3).[22,23] In Metascape, PPI enrichment analysis was carried out with databases BioGrid, InWeb IM, and OmniPath.[24] The Molecular Complex Detection (MCODE) plugin (, version 1.5.1) of Cytoscape was used to cluster the PPI network to recognize the most densely connected network neighborhoods, where such neighborhood components were more likely to associate with a particular complex or functional unit.[25] The threshold score was set at ≥10. Pathway and process enrichment analysis was performed and the best-scoring terms appraised by P value were regarded as the functional description.

2.5 Drug-gene interaction analysis

The Drug Gene Interaction Database (DGIdb, collects and integrates genomic resources and therapeutic resources to search genes against the existing compendia of known or potential drug-gene interactions.[26] In this study, DGIdb 3.0 was used to identify possible effective drugs based on the above obtained module genes. All the gene-drug interactions related were predicted.

3 Results

3.1 DEGs screening

Data processing was applied on the raw data by affy and limma packages in R software. According to the above-mentioned screening criteria, we identified 302 DEGs from GSE42677 dataset, containing 137 upregulated genes and 165 downregulated genes. At the same time, 619 DEGs were screened out from GSE45216 dataset, the number of genes upregulated was 194 while 425 genes were downregulated. Figure 1 showed the clustering heatmaps of the top 200 DEGs. To avoid loss of disease-associated DEGs, DEGs from two datasets were combined. After removal of duplicate genes and genes with self-contradictory expression trends, 711 genes altogether were included in the next analysis, of which 238 were upregulated and 473 were downregulated.

Figure 1:
Heatmaps of the top 200 DEGs. (A) Heatmaps of DEGs in GSE42677. (B) Heatmaps of DEGs in GSE45216. DEGs = differentially expressed genes.

3.2 Functional enrichment analysis of DEGs

The KEEG pathway enrichment analysis results demonstrated that upregulated genes were enriched in 10 pathways, for example, transcriptional misregulation in cancer, rheumatoid arthritis, focal adhesion, and amoebiasis. Downregulated genes were enriched in 18 pathways, like extracellular matrix (ECM)-receptor interaction, hematopoietic cell lineage, phosphatidylinositol 3-kinase (PI3K-Akt) signaling pathway, and focal adhesion. The KEEG pathway terms enriched by DEGs were shown in Table 1. In terms of GO BP, genes upregulated were enriched in 62 GO BP terms, while the downregulated genes were enriched in 55 terms. The bubble plot showed the most significant 20 GO BP terms (Fig. 2). The results indicated that upregulated genes mainly participated in BP such as extracellular matrix organization, cell proliferation, inflammatory response, and blood coagulation. At the same time, genes downregulated mainly involved in BP such as epidermis development, cell adhesion, sphingolipid biosynthetic process, and skin development.

Table 1:
The main KEEG pathways enriched by DEGs.
Figure 2:
The bubble plot of the 20 most remarkable GO terms of genes upregulated and genes downregulated. (A) The 20 most remarkable GO terms of genes upregulated. (B) The 20 most remarkable terms of downregulated genes. GO = gene ontology.

3.3 PPI network analysis

Figure 3A showed the PPI network analysis results, which contained 369 nodes and 862 edges. Nodes with higher topological scores were key nodes of PPI network. The key 10 gene nodes included JUN (also known as Jun proto-oncogene or activator protein 1 transcription factor subunit, upregulated, degree = 52), filamin A (FLNA, upregulated, degree = 34), androgen receptor (AR, downregulated, degree = 29), heat shock protein family H member 1 (HSPH1, downregulated, degree = 28), casein kinase 1 delta (CSNK1D, upregulated, degree = 27), tropomyosin 1 (TPM1, downregulated, degree = 26), pyruvate kinase, muscle (PKM, downregulated, degree = 24), histone cluster 1 H3 family member f (HIST1H3F, upregulated, degree = 23), LIM domain and actin binding 1 (LIMA1, downregulated, degree = 23), and synaptopodin (SYNPO, downregulated, Degree = 19). Pathway and process enrichment analysis showed that these genes mainly involved in transmembrane receptor protein tyrosine kinase signaling pathway, extracellular matrix organization, and regulation of cell adhesion. The MCODE algorithm were applied to identify densely connected network components and the MCODE networks identified for individual gene lists were gathered and shown in Figure 3B. The biggest 10 nodes of the PPI network were also hub genes in the MCODE networks.

Figure 3:
PPI network and the MCODE networks. (A) PPI network. (B) MCODE networks. The network contains proteins that form physical interactions with at least one other member. The densely connected network components were identified with the MCODE algorithm. Node size was decided by the degree, higher degree represented a larger node. MCODE = molecular complex detection, PPI = protein-protein interaction,.

3.4 Drug-gene interaction analysis

By searching the DGIdb for the module genes identified by the MCODE networks, 52 gene-drug interactions were obtained, containing 13 genes (4 upregulated genes: NOTCH1, CXCL8, JUN, CYP27B1; 9 downregulated genes: CYP3A5, AR, MAP2, EDNRB, CYP7B1, LRRK2, PTGER3, SAA1, MMP7) and 44 drugs (preset filters: FDA approved and antineoplastic). The drug-gene interaction results were shown in Figure 4.

Figure 4:
Drug-gene interaction diagram, yellow squares represents the DEGs and blue squares represents the drugs. DEG = differentially expressed gene.

4 Discussion

cSCC usually arises from interfollicular epidermal keratinocytes and are characterized by different degrees of keratosis.[27] In the year 2015, up to 2.2 million people were troubled with cSCC.[28] Although prognosis of cSCC is generally favorable, its five-year survival rate reduces to only 34% when invasion and distant metastasis occurs.[1,29] Until now, the biomarker study of cSCC was insufficient and the identification of biomarkers indicating progression from AK to cSCC was significant and urgent, which might be beneficial in the prevention, earlier diagnosis and prognostic management of cSCC.

In this study, 53 samples, 40 cSCC samples and 13 AK samples from datasets GSE42677 and GSE45216, were used for bioinformatics analysis, aiming to explore the potential molecular mechanism of cSCC. Finally, 238 upregulated genes and 473 downregulated genes were identified. All the 711 DEGs were subjected to following gene annotation, enrichment analysis and PPI analysis. Upregulated genes JUN, FLNA, CSNK1D, and HIST1H3F, and down regulated genes AR, HSPH1, TPM1, PKM, LIMA1, and SYNPO were identified as the potential target genes. JUN, FLNA, AR, HSPH1, and CSNK1D were the genes most differentially expressed in cSCC and AK.

JUN, otherwise known as activator protein 1 transcription factor subunit, is a transcription factor that regulates gene expression to respond to a variety of stimuli, such as cytokines, stress, and viral or bacterial infections.[30] JUN governs many cellular processes like cell proliferation, cell differentiation, and apoptosis.[31] It is overexpressed in a variety types of human tumors, including cSCC.[32] The activated state of JUN in answer to extracellular signals can be modified under conditions in which the balance of keratinocyte proliferation and differentiation is rapidly changed.[33] Therefore, JUN was investigated as a possible target for cancer prevention and treatment.[34] As an actin-binding protein encoded by the FLNA gene, FLNA crosslinks actin filaments and attaches actin filaments to membrane glycoproteins. FLNA not only participates in remodeling the cytoskeleton to influence migration and cell shape, but also interacts with integrins and second messengers.[35] It is reported that regulation of epidermal growth factor dependent inactivation of α5β1 integrin is by FLNA phosphorylation and cellular contractility.[36] As a member of the casein kinase I gene family, CSNK1D mainly participates in the control of cytoplasmic and nuclear processes, including DNA replication and repair.[37] The upregulation of CSNK1D has been verified in squamous carcinoma.[38]

The AR has 3 major functional domains: the N-terminal domain, the androgen binding domain, and the DNA binding domain.[39] To act as a DNA-binding transcription factor and regulate gene expression is the main function of the AR. One of the known AR activation target genes is the insulin-like growth factor receptor (IGFR).[40] HSPH1, also known as HSP105, acts as a nucleotide exchange factor for the molecular chaperone heat shock cognate 71 kDa protein (Hsc70).[41] Furthermore, HSPH1 plays a unique but related role as a holdase that inhibits the aggregation of misfolded proteins, including the cystic fibrosis transmembrane conductance regulator (CFTR) protein.[42] However, the specific role of HSPH1 in the genesis and invasion of cSCC was still undefined, which warrants more studies. Interestingly, JUN and AR were also of significance in drug-gene interaction network.

Altogether, our study indicates that genes JUN, FLNA, AR, and HSPH1 may play vital roles in the occurrence and development of cSCC and may be used as potential biomarkers of cSCC, indicating an application in the improvement of prognostic tools and treatments of cSCC. However, it should be noted that the lack of experimental verifications of identified genes is a limitation of this study, which will be focused in future study.

5 Conclusion

Our study identified several candidate genes which may be closely related to the occurrence and progression of cSCC from AK via comprehensive analysis of microarray data. These genes may function as the biomarkers of cSCC and will contribute to the development of novel targeted therapies for cSCC.

Author contributions

Conceptualization: Yi Zheng, Chengxin Li.

Data curation: Yi Zheng, Sumin Chi.

Formal analysis: Yi Zheng, Sumin Chi.

Funding acquisition: Chengxin Li.

Investigation: Chengxin Li.

Methodology: Yi Zheng, Sumin Chi.

Project administration: Chengxin Li.

Resources: Chengxin Li.

Software: Yi Zheng, Sumin Chi.

Supervision: Chengxin Li.

Validation: Sumin Chi, Chengxin Li.

Writing – original draft: Yi Zheng.

Writing – review & editing: Yi Zheng, Sumin Chi, Chengxin Li.


[1]. Rogers HW, Weinstock MA, Feldman SR, et al. Incidence estimate of nonmelanoma skin cancer (keratinocyte carcinomas) in the U.S. population, 2012. JAMA Dermatol 2015;151:1081–6.
[2]. Waldman A, Schmults C. Cutaneous squamous cell carcinoma. Hematol Oncol Clin North Am 2019;33:1–2.
[3]. Karia P. Cutaneous squamous cell carcinoma: estimated incidence of disease, nodal metastasis, and deaths from disease in the United States, 2012. J Am Acad Dermatol 2013;68:957–66.
[4]. Brantsch K. Analysis of risk factors determining prognosis of cutaneous squa- mous cell carcinoma: a prospective study. Lancet Oncol 2008;9:713–20.
[5]. de Berker D, McGregor JM, Hughes BR. British Association of Dermatologists Therapy Guidelines and Audit Subcommittee. Guidelines for the management of actinic keratoses. Br J Dermatol 2007;156:222–30.
[6]. Rosen T, Lebwohl MG. Prevalence and awareness of actinic keratosis: barriers and opportunities. J Am Acad Dermatol 2013;68: (1 Suppl 1): S2–9.
[7]. Ashton KJ, Weinstein SR, Maguire DJ, et al. Chromosomal aberrations in squamous cell carcinoma and solar keratoses revealed by comparative genomic hybridization. Arch Dermatol 2003;139:876–82.
[8]. Jonason AS, Kunala S, Price GJ, et al. Frequent clones of p53-mutated keratinocytes in normal human skin. Proc Natl Acad Sci U S A 1996;93:14025–9.
[9]. Durinck S, Ho C, Wang NJ, et al. Temporal dissection of tumorigenesis in primary cancers. Cancer Discov 2011;1:137–43.
[10]. Brown VL, Harwood CA, Crook T, et al. p16INK4a and p14ARF tumor suppressor genes are commonly inactivated in cutaneous squamous cell carcinoma. J Invest Dermatol 2004;122:1284–92.
[11]. Wang NJ, Sanborn Z, Arnett KL, et al. Loss-of-function mutations in Notch receptors in cutaneous and lung squamous cell carcinoma. Proc Natl Acad Sci U S A 2011;108:17761–6.
[12]. Kivisaari A, Kähäri VM. Squamous cell carcinoma of the skin: emerging need for novel biomarkers. World J Clin Oncol 2013;4:85–90.
[13]. Mitsui H, Suárez-Fariñas M, Gulati N, et al. Gene expression profiling of the leading edge of cutaneous squamous cell carcinoma: IL-24-driven MMP-7. J Invest Dermatol 2014;134:1418–27.
[14]. Pourreyron C, Chen M, McGrath JA, et al. High levels of type VII collagen expression in recessive dystrophic epidermolysis bullosa cutaneous squamous cell carcinoma keratinocytes increases PI3K and MAPK signalling, cell migration and invasion. Br J Dermatol 2014;170:1256–65.
[15]. Ideker T, Thorsson V, Siegel AF, et al. Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. J Comput Biol 2000;7:805–17.
[16]. Mitsui H, Suárez-Fariñas M, Gulati N, et al. Gene expression profiling of the leading edge of cutaneous squamous cell carcinoma: IL-24-driven MMP-7. J Invest Dermatol 2014;134:1418–27.
[17]. Lambert SR, Mladkova N, Gulati A, et al. Key differences identified between actinic keratosis and cutaneous squamous cell carcinoma by transcriptome profiling. Br J Cancer 2014;110:520–9.
[18]. Gautier L, Cope L, Bolstad BM, et al. affy-analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 2004;20:307–15.
[19]. Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003;4:249–64.
[20]. Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47.
[21]. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protoc 2009;4:44–57.
[22]. Zhou Y, Zhou B, Pache L, et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun 2019;10:1523.
[23]. Stark C, Breitkreutz BJ, Reguly T, et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res 2006;34:D535–9.
[24]. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003;13:2498–504.
[25]. Saito R, Smoot ME, Ono K, et al. A travel guide to cytoscape plugins. Nat Methods 2012;9:1069–76.
[26]. Cotto KC, Wagner AH, Feng YY, et al. DGIdb 3.0: a redesign and expansion of the drug-gene interaction database. Nucleic Acids Res 2018;46:D1068–73.
[27]. Marks R. Squamous cell carcinoma. Lancet 1996;347:735–8.
[28]. Stratigos A, Garbe C, Lebbe C, et al. Diagnosis and treatment of invasive squamous cell carcinoma of the skin: European consensus-based interdisciplinary guideline. Eur J Cancer 2015;51:1989–2007.
[29]. Eisemann N, Waldmann A, Geller AC, et al. Non-melanoma skin cancer incidence and impact of skin cancer screening on incidence. J Invest Dermatol 2014;134:43–50.
[30]. Hess J, Angel P, Schorpp-Kistner M. AP-1 subunits: quarrel and harmony among siblings. J Cell Sci 2004;117(Pt 25):5965–73.
[31]. Ameyar M, Wisniewska M, Weitzman JB. A role for AP-1 in apoptosis: the case for and against. Biochimie 2003;85:747–52.
[32]. Kang MI, Baker AR, Dextras CR, et al. Targeting of noncanonical Wnt5a signaling by AP-1 blocker dominant-negative jun when it inhibits skin carcinogenesis. Genes Cancer 2012;3:37–50.
[33]. Angel P, Szabowski A, Schorpp-Kistner M. Function and regulation of AP-1 subunits in skin physiology and pathology. Oncogene 2001;20:2413–23.
[34]. Takkunen M, Hukkanen M, Liljeström M, et al. Podosome-like structures of non-invasive carcinoma cells are replaced in epithelial-mesenchymal transition by actin comet-embedded invadopodia. J Cell Mol Med 2010;14:1569–93.
[35]. Tewari D, Nabavi SF, Nabavi SM, et al. Targeting activator protein 1 signaling pathway by bioactive natural agents: possible therapeutic strategy for cancer prevention and intervention. Pharmacol Res 2018;128:366–75.
[36]. Vial D, McKeown-Longo PJ. Epidermal growth factor (EGF) regulates α5β1 integrin activation state in human cancer cell lines through the p90RSK-dependent phosphorylation of filamin A. J Biol Chem 2012;287:40371–80.
[37]. Rosenberg LH, Lafitte M, Quereda V, et al. Therapeutic targeting of casein kinase 1δ in breast cancer. Sci Transl Med 2015;7:318ra202.
[38]. Winters R, Naud S, Evans MF, et al. Ber-EP4, CK1, CK7 and CK14 are useful markers for basaloid squamous carcinoma: a study of 45 cases. Head Neck Pathol 2008;2:265–71.
[39]. Lu NZ, Wardell SE, Burnstein KL, et al. International Union of Pharmacology. LXV. The pharmacology and classification of the nuclear receptor superfamily: glucocorticoid, mineralocorticoid, progesterone, and androgen receptors. Pharmacol Rev 2006;58:782–97.
[40]. Pandini G, Mineo R, Frasca F, et al. Androgens up-regulate the insulin-like growth factor-I receptor in prostate cancer cells. Cancer Res 2005;65:1849–57.
[41]. Saito Y, Doi K, Yamagishi N, et al. Screening of Hsp105alpha-binding proteins using yeast and bacterial two-hybrid systems. Biochem Biophys Res Commun 2004;314:396–402.
[42]. Zuo D, Subjeck J, Wang XY. Unfolding the role of large heat shock proteins: new insights and therapeutic implications. Front Immunol 2016;7:75.

actinic keratosis; cutaneous squamous cell carcinoma; microarray analysis

Copyright © 2020 the Author(s). Published by Wolters Kluwer Health, Inc.