In the past decade, proteomic profiling using mass spectrometry (MS) has undergone a remarkable transformation from a highly specialized (and largely impractical) subdiscipline into a technique that is far more accessible to the nonspecialist and the clinical research scientist (1,10). The high-quality protein expression data generated by these advances have contributed greatly to the nascent field of systems biology, a new approach to biological research that seeks to model and predict concurrent processes in the cell, tissue, or organism (1,13).
Although microarrays were initially greeted with great fanfare by the exercise physiology and health sciences community, the sheer volume and daunting complexity of messenger RNA (mRNA) expression data have resulted in a protracted postgenomics hangover. Protein identification and quantitative analysis using MS have traditionally lagged behind high-throughput mRNA profiling techniques (complementary DNA and oligonucleotide microarrays) because of the highly technical nature of protein preparation, the expense, and a lack of well-defined quality-control and experimental design criteria (1,5,12,13). Unfortunately, this has limited the scope of past proteomic publications, which were produced in a handful of specialist laboratories (1,13). With the advent of new proteomic tools and their enabling technologies, physiologists are being lured back to proteins, which, after all, are the direct physical interface between the genome and the environment.
In this review, written for the nonspecialist end user, we summarize recent advances in MS-based proteomic profiling techniques and their use in interpreting physiological adaptations in skeletal muscle.
CLASSIC MASS SPECTROMETRY-BASED PROTEOMICS
The overarching goal of quantitative proteomics is to measure differences in protein abundance between samples representing different biological and physiological states. In all methods, MS plays a crucial role in the identification of proteins. In the past decade, the number of proteomics publications has increased considerably (Table 1) because of the increased availability of genome sequence data and the greater accessibility and affordability of MS equipment.
Proteomics still faces significant technical challenges presented by the complexity of cellular proteins (posttranslational modifications, alternative splicing isoforms), the vast dynamic range (10⁶) of cellular protein abundance, and the small amount of material available in most clinical samples. However daunting these challenges may seem, similar problems encountered during past microarray and DNA sequencing projects were overcome by unforeseen technical advances (1).
Unlike transcripts in microarray analysis, proteins cannot be identified in a simple one-step process. Instead, they must first be fractionated by a combination of differential detergent fractionation and two-dimensional gel electrophoresis (2DE). In addition, because the usable mass window of most MS-based instruments is limited, proteins must be digested into peptides before MS identification (Fig. 1) (6). Accordingly, the workflow for classic proteomic profiling (Fig. 1) starts with protein separation by 2DE. Each protein spot is then quantified from its staining intensity using densitometry coupled to pattern-matching software, which identifies spots whose intensity differs between two or more samples. This technique combines software-assisted pattern-matching algorithms with the skills of proteomics researchers and has been the standard method for quantitative proteomics for more than 20 yr (10). An example of this classic technique is our investigation of rectus abdominis skeletal muscle from lean, obese, and morbidly obese women (Fig. 1) (6). In this study, cytosolic proteins were separated by 2DE, and differentially expressed proteins were detected by a combination of densitometry and software-guided spot analysis. These proteins were subsequently identified from their peptide mass fingerprints using matrix-assisted laser desorption/ionization MS (MALDI-MS) and database matching with the Mascot search engine (http://www.matrixscience.com/search_form_select.html). Using these techniques, we demonstrated an increased abundance of several glycolytic enzymes and of the enzyme adenylate kinase 1 with the progression of obesity (6).
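The peptide mass fingerprinting step can be illustrated with a minimal sketch, assuming a fully tryptic digest with no missed cleavages, unmodified residues, monoisotopic masses, and singly protonated [M+H]+ ions. The function names and the toy candidate sequences are hypothetical, and the simple count of matched peaks stands in for the probability-based scoring that Mascot actually uses.

```python
"""Toy peptide mass fingerprint (PMF) matching.

Assumptions (illustrative only): trypsin cleaves after K or R unless the next
residue is P, no missed cleavages, no modifications, monoisotopic masses,
singly protonated [M+H]+ ions, and a fixed absolute tolerance in Da.
"""
import re

# Monoisotopic residue masses (Da); cysteine is left unmodified here
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # charge carrier for the [M+H]+ ion


def tryptic_digest(sequence: str) -> list[str]:
    """Split C-terminal to K/R, except before P (zero-width split, Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mh(peptide: str) -> float:
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed: list[float], sequence: str, tol: float = 0.2) -> int:
    """Number of observed peaks explained by at least one theoretical peptide."""
    theoretical = [peptide_mh(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


if __name__ == "__main__":
    # Hypothetical mini-database and peak list, for illustration only
    database = {"candidate_1": "GASPVKTLLR", "candidate_2": "MDEHWFYNQR"}
    peaks = [558.32, 502.33]
    best = max(database, key=lambda name: count_matches(peaks, database[name]))
    print(best)  # candidate_1
```

Real search engines also allow missed cleavages, fixed and variable modifications, and score matches against the chance of random hits in a large database, which a raw match count does not do.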
Peptide and protein abundance typically does not correlate well with the measured MS signal, which makes mass spectrometers, on their own, poor devices for quantitation (1). Most MS-based proteomic analyses are carried out at the peptide level: peptides are ionized and transferred into the gas phase, where the instrument can analyze them. A mass spectrometer consists of an ion source, a mass analyzer that measures the mass-to-charge ratio (m/z), and a detector that registers the number or intensity of ions at each m/z value (1,13). Electrospray ionization (ESI) and MALDI are the two most common techniques for ionizing peptides for MS analysis. Whereas MALDI, in the workflow described here, analyzes individual proteins that have been excised from a 2DE gel, digested, and desalted, ESI-MS lends itself to automation because it ionizes peptides from a continuous liquid stream and can therefore be coupled directly to liquid chromatography (LC) to separate complex mixtures of proteins and peptides before tandem MS analysis (LC-MS/MS) (1,13,14). One limitation of gel-based proteomics is that only the most abundant proteins are identified. In fact, many of the same proteins have been identified in very different proteomic studies, suggesting that, although the field has improved incrementally, the total pool of identifiable proteins is limited by the dynamic range of MS equipment (1,13). In summary, the specificity of gel-based proteomics has improved vastly in the past 20 yr, largely because of improvements in MS instrumentation and the proliferation of genomic and proteomic databases. However, the dynamic range and sensitivity of gel-based proteomics have not kept pace, and further improvement will require a paradigm shift.
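The relationship between a peptide's mass and the m/z value recorded by the mass analyzer can be sketched as follows, assuming positive-mode ions formed by adding z protons, [M + zH]^z+. MALDI typically produces singly charged ions, whereas ESI commonly yields multiply charged ions, so the same peptide can appear at several m/z values in an ESI spectrum. The function names and the 1500-Da example mass are hypothetical.

```python
"""Mass-to-charge (m/z) arithmetic for protonated peptide ions.

Assumption: positive-mode [M + zH]^z+ ions; M is the neutral monoisotopic mass.
"""
PROTON = 1.007276  # mass of a proton (Da)


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


def neutral_mass(mz_value: float, charge: int) -> float:
    """Invert mz(): recover M from an observed m/z and an assigned charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz_value * charge - charge * PROTON


if __name__ == "__main__":
    # A hypothetical 1500-Da peptide observed at charge states 1 to 3
    for z in (1, 2, 3):
        print(z, round(mz(1500.0, z), 4))  # 1501.0073, 751.0073, 501.0073
```

Because the m/z value falls as charge rises, multiple charging is one reason ESI can bring larger peptides into the mass window of the analyzer.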
STABLE ISOTOPE-BASED PROTEOMICS
The quantitative dimension of classic MS-based proteomics relies on densitometric measurements of spot intensity to ascertain differences in protein abundance between samples (1,6). Although densitometry scanners and image-processing software have improved significantly over the years, they still estimate relative changes in protein abundance poorly, particularly for spots that are saturated or poorly focused (5,9). To add a truly quantitative dimension to MS-based proteomics, techniques have been developed that exploit the ability of MS instruments to distinguish chemically identical peptides that differ in stable isotope composition (1,5,13) (Fig. 2). In practice, the ratio of signal intensities for a heavy and light peptide pair reflects the ratio of abundance of the corresponding protein in the two samples combined in a complex protein mixture (Fig. 3A). The three most common ways to label protein mixtures are stable isotope labeling by amino acids in cell culture (SILAC) using 13C-labeled amino acids, proteolytic (C-terminal) labeling with 18O-water, and chemical labeling using isotope-coded affinity tags (ICAT), which are covalently attached to cysteine residues (1,4,5,9). Presently, postisolation chemical tagging (ICAT) and 18O proteolytic labeling are the only practical stable isotope labeling techniques for comparing the proteomes of extant tissue samples (1). For cell cultures, our laboratory has found SILAC to be a highly satisfactory strategy for accurate comparative proteomics (5). Because the postlabeling workflow and the principle behind all stable isotope-based proteomic techniques are similar, we discuss only SILAC in this review.
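Locating a SILAC peptide pair and computing its heavy-to-light intensity ratio can be sketched as follows. The sketch assumes labeling with 13C6-lysine and 13C6-arginine (a common SILAC design, not necessarily the one used in ref. 5), so each K or R in a peptide shifts its mass by 6 × 1.00335 Da; the peak-list format, tolerance, and function names are hypothetical.

```python
"""Toy SILAC heavy/light pair detection in a centroided peak list.

Assumptions: 13C6-Lys and 13C6-Arg labels, peaks given as (m/z, intensity),
known peptide sequence, charge state, and light m/z; absolute m/z tolerance.
"""
C13_DELTA = 1.0033548                 # mass difference 13C - 12C (Da)
LABELED_CARBONS = {"K": 6, "R": 6}    # assumed 13C6-lysine and 13C6-arginine


def silac_shift(peptide: str) -> float:
    """Neutral mass difference between the heavy and light forms of a peptide."""
    return C13_DELTA * sum(LABELED_CARBONS.get(aa, 0) for aa in peptide)


def nearest_intensity(peaks, target, tol):
    """Intensity of the peak closest to target within tol, or None."""
    hits = [(abs(m - target), i) for m, i in peaks if abs(m - target) <= tol]
    return min(hits)[1] if hits else None


def heavy_light_ratio(peaks, peptide, light_mz, charge, tol=0.02):
    """Heavy/light intensity ratio for one peptide pair, or None if incomplete."""
    heavy_mz = light_mz + silac_shift(peptide) / charge  # spacing shrinks with charge
    light = nearest_intensity(peaks, light_mz, tol)
    heavy = nearest_intensity(peaks, heavy_mz, tol)
    if light is None or heavy is None or light == 0:
        return None
    return heavy / light


if __name__ == "__main__":
    # Hypothetical singly charged pair for the peptide TLLR
    peaks = [(502.3348, 1000.0), (508.3549, 2000.0)]
    print(heavy_light_ratio(peaks, "TLLR", 502.3348, charge=1))
```

Using peak heights mirrors the peak-height ratios described for the gel-based approach; integrating isotope envelopes or chromatographic peak areas is a common refinement.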
There are two practical methodological approaches to stable isotope proteomic profiling. The first (Fig. 2) combines labeling with a 2DE gel-based approach: a 1:1 mixture of labeled and unlabeled proteins is separated by 2DE, followed by in-gel digestion with trypsin and MS analysis of the resulting peptides for both identification and quantitation. This approach is very similar to classic gel-based proteomics, except that quantitation is based on the ratios of peak heights of labeled to unlabeled peptide pairs (Fig. 3A). As with most spot-based proteomics, this technique relies on MALDI-MS to identify and quantify proteins. Using this method, we have recently analyzed SILAC-labeled human retinal pigment epithelial (RPE) cells (5) and myotubes cultured from skeletal muscle of lean and very obese women (unpublished data, D. Hittel, June 2006). In both studies, 200 to 300 proteins were identified and quantified, a handful of which were differentially expressed. Figure 3A shows typical peptide mass fingerprints generated by SILAC analysis (note that each peptide has a 13C-labeled companion) for a protein that is unchanged in dividing versus differentiated RPE cells (β-actin) and a protein that is upregulated (2:1) in dividing RPE cells (60S acidic ribosomal protein) (5). In both cases, the protein ratio was calculated as the average ratio of all peptide pairs identified for that protein spot. The 25 differentially expressed proteins in obese (versus lean) cultured human myotubes (data not shown) were consistent with previous genomic and proteomic studies of obese human skeletal muscle (4,6-8,15) and have generated several nonobvious hypotheses about metabolic changes with the progression of obesity (6).
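The protein-level calculation described above, averaging the ratios of all peptide pairs assigned to a protein, can be sketched as follows. The input format, protein identifiers, and ratio values are hypothetical; reporting the standard deviation and peptide count alongside the mean is an added suggestion, not part of the original analysis.

```python
"""Protein-level SILAC ratios from peptide-pair ratios (illustrative sketch)."""
from collections import defaultdict
from statistics import mean, stdev


def protein_ratios(peptide_ratios):
    """Map protein -> (mean ratio, SD, number of peptide pairs).

    peptide_ratios: iterable of (protein_id, heavy/light ratio) tuples.
    SD is NaN when a protein is supported by a single peptide pair.
    """
    grouped = defaultdict(list)
    for protein, ratio in peptide_ratios:
        grouped[protein].append(ratio)
    return {
        protein: (mean(r), stdev(r) if len(r) > 1 else float("nan"), len(r))
        for protein, r in grouped.items()
    }


if __name__ == "__main__":
    # Hypothetical peptide-pair ratios for two proteins
    pairs = [("protein_X", 1.9), ("protein_X", 2.1), ("protein_X", 2.0),
             ("protein_Y", 1.0), ("protein_Y", 0.95), ("protein_Y", 1.05)]
    for protein, (avg, sd, n) in protein_ratios(pairs).items():
        print(f"{protein}: mean ratio {avg:.2f} (SD {sd:.2f}, n = {n})")
```

Because ratios are asymmetric (2:1 and 1:2 are not equidistant from 1:1), many analyses average log2 ratios instead; the arithmetic mean is kept here to match the text.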
Although gel-based stable isotope proteomic profiling is an accurate way to ascertain differences in protein abundance between samples (because several peptides are detected for each protein), it is still very time consuming and technically demanding. In addition, there are no software tools available for the arduous process of identifying peptide pairs and calculating their ratios, nor is there any statistical tool designed specifically for this purpose. Consequently, all raw data were compiled manually and entered into Microsoft Excel or an SAS script (5) for the calculation of ratios and statistical analyses.
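As a minimal sketch of the kind of calculation that was done by hand in Excel or SAS, the function below asks whether the peptide pairs for one protein consistently point in the same direction. It uses an exact two-sided sign test on log2 ratios, computed with the Python standard library; this is one possible choice, not the statistical procedure used in ref. 5, and the function name and example values are hypothetical.

```python
"""Exact sign test on peptide log2 ratios for a single protein (sketch)."""
import math


def sign_test(ratios):
    """Return (mean log2 ratio, two-sided exact sign-test p-value).

    Null hypothesis: a peptide ratio is equally likely to fall above or below
    1:1. Non-positive ratios are invalid and skipped; ratios of exactly 1
    carry no sign information and are excluded from the test.
    """
    logs = [math.log2(r) for r in ratios if r > 0]
    informative = [x for x in logs if x != 0]
    n = len(informative)
    if n == 0:
        return 0.0, 1.0
    up = sum(x > 0 for x in informative)
    k = min(up, n - up)
    # Probability of a split at least this lopsided in one direction
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return sum(logs) / len(logs), min(1.0, 2 * tail)


if __name__ == "__main__":
    # Six hypothetical peptide pairs, all above 1:1
    print(sign_test([1.8, 2.1, 1.9, 2.2, 2.0, 1.7]))
```

With only three peptide pairs the smallest attainable two-sided p-value is 0.25, so a sign test cannot flag proteins supported by few peptides; a one-sample t-test on log2 ratios (for example, `scipy.stats.ttest_1samp`) is a common parametric alternative when its assumptions are acceptable.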
Since the inception of stable isotope proteomics, researchers have tried to adapt it to a high-throughput workflow using a combination of shotgun proteomics and multidimensional LC (Fig. 2) (1,3,14). This strategy differs from the 2DE gel-based approach in that mixtures of stable isotope-labeled and unlabeled proteins are first fully digested into peptides, which are then separated by isoelectric focusing on an immobilized pH gradient strip (3). The strip is then cut into pieces, and the peptides from each piece are processed for LC and ESI-MS (LC-MS/MS) (Fig. 2). The high-density spectra produced by this process (Fig. 3B) can then be quantitatively analyzed using a combination of statistically guided software tools and manual interpretation.
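Because peptides focus on the immobilized pH gradient strip at their isoelectric point (pI), the first-dimension fractionation can be approximated by estimating each peptide's pI from its sequence and assigning it to a strip segment. The sketch below assumes EMBOSS-style pKa values, a linear pH 3 to 10 strip cut into equal pieces, and simple Henderson-Hasselbalch charge models; predicted pI values depend noticeably on the pKa set chosen, and all names are hypothetical.

```python
"""Estimate peptide pI and assign it to an IPG strip fraction (sketch)."""
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6                   # assumed terminal pKa values
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}     # assumed side-chain pKa values
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(peptide: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    positive = 1 / (1 + 10 ** (ph - N_TERM_PKA))
    negative = 1 / (1 + 10 ** (C_TERM_PKA - ph))
    for aa in peptide:
        if aa in POSITIVE_PKA:
            positive += 1 / (1 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            negative += 1 / (1 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return positive - negative


def isoelectric_point(peptide: str, tol: float = 1e-4) -> float:
    """Bisection for the pH of zero net charge (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(peptide, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def strip_fraction(pi: float, n_pieces: int = 24,
                   ph_min: float = 3.0, ph_max: float = 10.0) -> int:
    """0-based strip piece on a linear IPG strip; out-of-range pI is clamped."""
    width = (ph_max - ph_min) / n_pieces
    index = int((pi - ph_min) // width)
    return max(0, min(n_pieces - 1, index))


if __name__ == "__main__":
    for peptide in ("GASPVK", "TLLR", "DDEEK"):  # hypothetical peptides
        pi = isoelectric_point(peptide)
        print(peptide, round(pi, 2), strip_fraction(pi))
```

Predicting which piece a peptide should land in can also serve as a sanity check on identifications, since a peptide found far from its predicted pI is suspect.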
A recent proof-of-principle experiment conducted in our laboratory using SILAC-labeled RPE cells (5) showed this technique to be highly effective in identifying differentially expressed proteins in the cytosolic fraction of dividing versus resting RPE cells: more than 400 cytosolic proteins were identified and quantified simultaneously (Fig. 3C). Because one measure of success in proteomics is the number of proteins identified, a current challenge for high-throughput proteomics is identifying large numbers of proteins from complex peptide mixtures. Presently, large spectral data sets such as those produced by LC-MS/MS must go through several rounds of interpretation and validation to deconvolute tens of thousands of peptides, a capability found in few laboratories (1). In addition, filtering and quality-control criteria vary widely from one laboratory to the next and are highly instrument- and preparation-dependent, which makes it extremely difficult to compare data from different experimental platforms (1). Even with these drawbacks, the potential benefits of a high-throughput proteomics platform vastly outweigh its current technical limitations; in fact, many of the same issues plagued the early days of genome sequencing and microarray analysis until equipment improved and clear quality-control criteria were established and accepted (12). Technologies and tools are currently being developed to create robust platforms for quantitative high-throughput proteomics. In the future, stable isotope techniques combined with LC-MS/MS will be used to detect changes in quantitative protein profiles and to infer biological function from these patterns. Advances in this field will also make quantitative proteomics more accessible to the nonspecialist end user.
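One widely used way to make identification filtering explicit and comparable across laboratories, not described in the studies above, is the target-decoy strategy: spectra are also searched against reversed or shuffled (decoy) sequences, and the number of decoy hits above a score cutoff estimates the number of false target hits. The sketch assumes a concatenated target-decoy search, higher-is-better scores, and a false discovery rate (FDR) estimated as decoys/targets; the function name and example scores are hypothetical.

```python
"""Choose a peptide-spectrum match (PSM) score cutoff by target-decoy FDR (sketch)."""
from itertools import groupby


def score_cutoff(psms, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR (decoys / targets) is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    Tied scores are accepted or rejected together. Returns None if no cutoff
    satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, group in groupby(ranked, key=lambda psm: psm[0]):
        for _, is_decoy in group:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    # Hypothetical scores: 50 strong target hits, then a mix of weaker hits
    psms = ([(100, False)] * 50 + [(50, True)] + [(40, False)] * 10
            + [(30, True)] * 5)
    print(score_cutoff(psms, max_fdr=0.05))  # 40
```

Reporting the cutoff rule (for example, 1% FDR) alongside the results is one way to make data sets from different instruments and laboratories easier to compare.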
In fact, commercial kits are now available for ICAT and 18O labeling of protein samples, and many companies currently offer quantitative analysis of protein or whole-tissue samples. Outsourcing is often the more prudent choice for laboratories that lack access to core facilities or cannot devote the time to retooling for proteomic analysis.
PROTEOMICS AND SYSTEMS BIOLOGY
Mass spectrometry-based proteomics has become an indispensable tool for the emerging field of systems biology, the systematic study of all concurrent physiological processes in a cell, organ, or organism during differentially perturbed states (exercise, disease) (1,13). The field itself was born with the completion of, and public access to, fully sequenced genomes and has been greatly enhanced by the vast number of published gene expression studies (1,2,11,13). The goal of systems biology is to integrate genomic, proteomic, and metabolic data into predictive physiological models (1). The practical applications of this field will extend to all aspects of biology and medicine, including the exercise and sports sciences (1,2). The large volumes of information generated by protein and mRNA profiling studies require sophisticated software tools for interpretation within a meaningful physiological context. Considerable progress has been made on this front with the appearance of many free (www.GenMAPP.org) and commercially available (www.ingenuity.com) network analysis tools.
Although there have been many gene and protein expression studies of muscle disease, exercise adaptations, and obesity (2,6,7), producing a molecular model that links gene expression patterns with their corresponding physiological effects remains a challenge. Instead, these studies have traditionally focused on clusters of temporally or coordinately expressed genes with loosely defined gene ontologies. Network analysis software has recently matured to the point where it is useful for modeling complex gene expression data sets. We used protein (SILAC) and mRNA profiles of obese versus lean cultured human myotubes to generate a large network of interactions representing the biological processes that may underlie the pathophysiology of obesity and its associated disease states (Fig. 4). This network is a graphical representation of the molecular relationships among differentially expressed mRNAs and proteins. Genes or gene products are represented as nodes, and the biological relationship between two nodes is represented as an edge (line); node shapes indicate the functional class of the gene product (www.ingenuity.com). Every edge is supported by at least one reference from the literature or from a textbook. This analysis revealed significant associations between differentially expressed mRNAs and proteins, as well as potential regulatory, protein-protein, and protein-mRNA interactions that would otherwise have gone unnoticed using a classic approach. We believe that a systems biology approach to complex biological models will increasingly be used to interpret transcriptional and protein data sets and will help generate new and nonobvious hypotheses that would not arise from any single approach.
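The node-and-edge representation described above can be sketched with a small graph structure that, like the network in Fig. 4, refuses any edge without supporting literature. This is not Ingenuity's data model or API; the class, gene names, and citation keys are hypothetical placeholders, and ranking hubs by degree is only one simple way to find densely connected nodes.

```python
"""Toy mRNA/protein interaction network with literature-supported edges."""
from collections import defaultdict


class InteractionNetwork:
    def __init__(self):
        self.nodes = {}                     # name -> {"layer": ..., "function": ...}
        self.neighbours = defaultdict(set)  # name -> set of connected names
        self.evidence = {}                  # frozenset({a, b}) -> list of citations

    def add_node(self, name, layer, function):
        """layer: 'mRNA' or 'protein'; function: functional class (node shape)."""
        self.nodes[name] = {"layer": layer, "function": function}

    def add_edge(self, a, b, references):
        """Connect two existing nodes; at least one supporting reference is required."""
        if not references:
            raise ValueError("every edge needs at least one supporting reference")
        if a == b or a not in self.nodes or b not in self.nodes:
            raise ValueError("edges must join two distinct, existing nodes")
        self.neighbours[a].add(b)
        self.neighbours[b].add(a)
        self.evidence.setdefault(frozenset((a, b)), []).extend(references)

    def hubs(self, top=3):
        """Nodes with the most connections (ties broken alphabetically)."""
        return sorted(self.nodes, key=lambda n: (-len(self.neighbours[n]), n))[:top]

    def cross_layer_edges(self):
        """Edges linking an mRNA node to a protein node."""
        pairs = []
        for edge in self.evidence:
            a, b = sorted(edge)
            if self.nodes[a]["layer"] != self.nodes[b]["layer"]:
                pairs.append((a, b))
        return sorted(pairs)


if __name__ == "__main__":
    net = InteractionNetwork()
    net.add_node("GENE_A", "mRNA", "enzyme")        # hypothetical nodes
    net.add_node("GENE_B", "protein", "kinase")
    net.add_node("GENE_C", "protein", "transporter")
    net.add_edge("GENE_A", "GENE_B", ["ref_1"])     # placeholder citation keys
    net.add_edge("GENE_B", "GENE_C", ["ref_2"])
    print(net.hubs(1), net.cross_layer_edges())
```

Cross-layer edges are the kind of mRNA-protein associations that separate transcript and protein analyses would miss, which is the motivation for combining both profiles in one network.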
For proteomics to answer biologically meaningful questions, the data generated must be of high quality, and the analysis must be at least semiquantitative and statistically robust. Current limitations of proteomics for the nonspecialist end user include the cost of equipment and the lack of clearly defined quality-control standards and of software tools for analyzing large data sets. That being said, other techniques that were once highly specialized, such as genome sequencing and microarray-based mRNA profiling, are now available in simplified, low-cost forms that are relatively easy to use. Rapid advances and new technology suggest that the current limitations facing proteomics are temporary; in fact, whole proteome and interactome maps of simple organisms (budding yeast) and organelles have recently been published (4). In the near future, new MS instrumentation will likely be developed for whole-protein analysis, allowing clinical samples to be investigated immediately.
Proteomics is highly complementary to other functional genomics approaches such as microarray analysis. Integrating these measurements can build a powerful database of gene-environment interactions that will help exercise and sports science researchers both build and test hypotheses and that is crucial for the emerging field of systems biology. Such technological advances will undoubtedly improve our understanding of the complex biological processes underlying adaptations to exercise, obesity, and other inactivity-associated disease states. As proteomics becomes increasingly affordable and the density of known interactions increases, testable hypotheses should emerge from data sets at an increasing rate, and patterns predictive of disease should be developed from large clinical data sets.
The studies described in this article that were performed in the laboratories of the authors were supported by the A. James Clark Endowed Chair, The Parsons Family Foundation, and National Institutes of Health Programs in Genomic Applications (grant NHLBI U01-HL-66614).
1. Aebersold, R., and M. Mann. Mass spectrometry-based proteomics. Nature
2. Baldwin, K.M. Research in the exercise sciences: where do we go from here? J. Appl. Physiol.
3. Cargile, B.J., J.R. Sevinsky, A.S. Essader, J.L. Stephenson Jr., and J.L. Bundy. Immobilized pH gradient isoelectric focusing as a first-dimension separation in shotgun proteomics. J. Biomol. Tech.
4. Forner, F., L.J. Foster, S. Campanaro, G. Valle, and M. Mann. Quantitative proteomic comparison of rat mitochondria from muscle, heart, and liver. Mol. Cell Proteomics
5. Hathout, Y., J. Flippin, C. Fan, P. Liu, and K. Csaky. Metabolic labeling of human primary retinal pigment epithelial cells for accurate comparative proteomics. J. Proteome Res.
6. Hittel, D.S., Y. Hathout, E.P. Hoffman, and J.A. Houmard. Proteome analysis of skeletal muscle from obese and morbidly obese women. Diabetes
7. Hittel, D.S., W.E. Kraus, C.J. Tanner, J.A. Houmard, and E.P. Hoffman. Exercise training increases electron and substrate shuttling proteins in muscle of overweight men and women with the metabolic syndrome. J. Appl. Physiol.
8. Kussmann, M., F. Raymond, and M. Affolter. OMICS-driven biomarker discovery in nutrition and health. J. Biotechnol.
9. Ong, S.E., B. Blagoev, I. Kratchmarova, D.B. Kristensen, H. Steen, A. Pandey, and M. Mann. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell Proteomics
10. Rabilloud, T. Two-dimensional gel electrophoresis in proteomics: old, old fashioned, but it still climbs up the mountains. Proteomics
11. Storey, K.B. Genomic and proteomic approaches in comparative biochemistry and physiology. Physiol. Biochem. Zool.
12. The Tumor Analysis Best Practices Working Group. Guidelines: expression profiling-best practices for data generation and interpretation in clinical trials. Nat. Rev. Genet.
13. Tyers, M., and M. Mann. From genomics to proteomics. Nature
14. Wang, G., W.W. Wu, W. Zeng, C.L. Chou, and R.F. Shen. Label-free protein quantification using LC-coupled ion trap or FT mass spectrometry: reproducibility, linearity, and application with complex proteomes. J. Proteome Res.
15. Yuan, Q., J.D. Fontenele-Neto, and L.D. Fricker. Effect of voluntary exercise on genetically obese Cpe^fat/fat mice: quantitative proteomics of serum. Obes. Res.
Keywords: network analysis; stable isotope; mass spectrometry; skeletal muscle; obesity
©2007 The American College of Sports Medicine