INTRODUCTION
Living systems are complex molecular machines. To understand them and enhance their performance, or help them recover quickly and efficiently from injuries, we need to first catalog the components, understand their basic functions, and then understand how they interact with each other and work together. In this new era for biosciences, living systems are studied in a holistic approach, thanks to the development of new high-throughput molecular techniques, the so-called “omics,” such as genomics, transcriptomics, proteomics, metabolomics (and likely many more as this approach gets understood and used to greater degrees) (1,14,20,32 ). A 2001 review of bioinformatics by Luscombe et al. (25 ) defined bioinformatics as a management information system for molecular biology. In its purest sense, it serves as an intersection between molecular data and advanced mathematical approaches. As such, critical advances continue to be made that allow for more and more information to be computed and analyzed into meaningful outcomes. As one can see, the data generated from such studies are enormous, complex, and usually noisy. Highly empowered computers, sophisticated software, and statistical methods are nowadays absolutely essential to analyze the vast amounts of these genetic and molecular data, generated even by a single study, not to mention the integration of information from hundreds or even thousands of studies. Therefore, the field of exercise research faces the same challenges/needs as many other fields of biosciences.
The main directions of research in the field of physical exercise and elite sports continue to focus on understanding the physiological as well as genetic and molecular basis of certain biological attributes and responses. The discipline has made great strides in applying molecular biology and genetic techniques to provide additional layers of understanding (or inquiry depending on your perspective), and informatics approaches stand as a next transition point for researchers examining these types of questions. For example, what is the genetic and molecular basis of speed, endurance, and strength? For years now, research groups led by John Hawley, Jamie Timmons, Ken Baldwin, Fadia Haddad, Frank Booth, Keith Baar, and Mark Tarnopolsky (and many others) have all masterfully applied many molecular biology techniques (RT-PCR, Western blotting, etc.) to research questions that relate greatly to exercise physiology and performance. Notably, Claude Bouchard has led and continues to lead efforts in areas related to health, metabolic health, exercise, and genetic interactions. More of this work is needed in understanding the genetic/molecular basis of sports-related injuries, what makes an athlete vulnerable to them, and how the athlete may recover more quickly, efficiently, and completely. Furthermore, we need to understand the genetic/molecular basis of nutritional requirements during intense exercise and how this may affect performance or recovery from injuries. In particular, several biological processes and their underlying molecular pathways attract the attention of athletic performance research, such as (a) muscle, cartilage, and bone formation, (b) muscle energy production and metabolism (mitochondrial biogenesis, lactic acid removal), and (c) blood and tissue oxygenation (erythropoiesis, angiogenesis, vasodilatation) (20 ).
Finally, a burning issue is how to efficiently use these new technologies and tools to win the “war on doping” that threatens to destroy the spirit of athletics. This review will discuss how bioinformatics may assist the new generation of “omics” research in this exciting field. It is not intended to discuss or explain the various techniques and interpretations used by bioinformatics experts but rather to highlight which popular databases (Table 1 ) and tools are already available for analyzing such complex data and what needs to be done in terms of developing new tools to assist future research in this field. Through this introduction and discussion, the authors hope to intrigue, inspire, and motivate other researchers with the nearly infinite number of possibilities this analytical approach can offer to exercise physiology, sports nutrition, athletic training, and performance research.
Table 1: A collection of the most important publicly available databases.
Standing on the Shoulders of Giants—The Need to Efficiently Retrieve Hidden Knowledge From Past Research
Perhaps, the most important thing in any field of research is to be able to store, organize, and retrieve past knowledge, so as to build upon it, plan future experiments, and avoid reproducing studies performed by other groups, thus saving valuable time and resources. Toward this end, a database that contains information about the relevant literature is absolutely necessary and needs to be publicly available and easy to query. Indeed, PubMed is the database that performs this task and allows researchers to query over 24 million bibliographic citations and abstracts in the fields of biomedicine, veterinary medicine, nursing, dentistry, and health care (http://www.ncbi.nlm.nih.gov/books/NBK3827/ ) (34 ). A query in PubMed with the keyword “sports” yielded ∼185,055 publications (as of September 2014), whereas a query with the keywords “exercise” OR “physical exercise” yielded ∼282,283 publications. Moreover, there has been a steady increase in papers published in the field of physical exercise, from ∼1,019 articles published just in the year 1972 to 19,103 published in the year 2012 (and over 20,297 in 2013, accessed January 11, 2013). One important question that is not easy to answer just from a query in PubMed is how many of these publications are truly relevant. In addition, from the ∼282,283 publications, only ∼58,644 are accessible to everyone as full-text articles in PubMed Central.
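Keyword counts of the kind reported above can also be requested programmatically. The following is a minimal sketch that only builds the request URL, assuming the public NCBI E-utilities ESearch endpoint and its `[pdat]` publication-date field tag; the function name `build_pubmed_count_query` is hypothetical, and no network request is made here.

```python
from urllib.parse import urlencode

# Assumption: the public NCBI E-utilities ESearch endpoint (not cited in the text).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_count_query(term, year=None):
    """Return an ESearch URL asking PubMed how many records match `term`.

    Hypothetical helper for illustration; it builds the URL only.
    """
    if year is not None:
        # Restrict the query to one publication year with the [pdat] field tag.
        term = f"({term}) AND {year}[pdat]"
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # Boolean query string
        "rettype": "count",  # return only the number of matching records
        "retmode": "json",   # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example: the per-year query behind a publication-trend count.
print(build_pubmed_count_query('"exercise" OR "physical exercise"', year=2012))
```

Fetching the resulting URL for each year would reproduce a publication trend, although the counts will differ from those quoted above because PubMed grows continuously.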
Obviously, it is a major challenge to filter, organize, and make available all this relevant information to the community. One issue is how much of this literature is fully searchable: PubMed searches the title, abstract, and MeSH terms (keywords that describe the main theme of the study) but does not search within the full text. Thus, relevant papers may be missed when searching PubMed for something very specific (such as some aspect of exercise or dietary prescription, genetic or protein targets, a therapeutic agent, or dosing regimen). Therefore, a need exists to create a targeted database of sports- and physical exercise–related literature. Toward this end, the generic tools needed are already available. Textpresso is a very successful tool/database system that has already been widely used by various communities to organize the literature in their field and make it searchable (27 ). The teams that take on this responsibility first have to identify the relevant papers and obtain their full text in PDF or HTML form. Next, each document is converted to plain text and its fields are identified, such as Title, Authors, Abstract, Introduction, Materials and Methods, Results, Discussion, etc. The processed papers are then indexed by the Textpresso search engine, and a Web interface is used to query the database using Boolean logic, even within certain fields of the documents. For example, one may want to identify all those sports-related documents that use a certain experimental method. Therefore, it is possible to search for the appropriate term only within the Materials and Methods section. This level of sophisticated search is not yet available in PubMed, and the community needs to consider developing a physical exercise–related Textpresso database. Once all those relevant documents are retrieved and processed, they may be clustered in groups of related themes, based on available methods that use the content of keywords found within documents (8,30 ).
This will allow a researcher to have a more complete view of the relevant literature, instead of looking only at the papers that a specific document cites, or papers that cite this specific document, or search the literature for publications of a certain research team.
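The idea of a Boolean search restricted to one field of a paper can be sketched in a few lines. The example below is a toy illustration only and is not Textpresso's implementation or interface; the function names, the dictionary layout, and the two example papers are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case a text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(papers, section, all_of=(), none_of=()):
    """Return titles whose `section` contains every term in `all_of`
    and no term in `none_of` (a simple AND / NOT query)."""
    hits = []
    for paper in papers:
        words = tokenize(paper.get(section, ""))
        has_all = all(term.lower() in words for term in all_of)
        has_none = not any(term.lower() in words for term in none_of)
        if has_all and has_none:
            hits.append(paper["title"])
    return hits


# Hypothetical papers, already split into fields.
papers = [
    {
        "title": "Paper A",
        "methods": "Muscle biopsies were analyzed by Western blotting.",
        "discussion": "Results agree with earlier work.",
    },
    {
        "title": "Paper B",
        "methods": "Expression was measured with microarrays.",
        "discussion": "Western blotting could confirm these results.",
    },
]

# Only Paper A actually used the method; Paper B merely mentions it.
print(search(papers, "methods", all_of=["western", "blotting"]))
```

A search over the whole text would return both papers, whereas restricting the query to the methods field separates papers that used a technique from papers that only discuss it.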
Apart from the literature-oriented databases (such as PubMed, Scopus, Web of Science, or Google Scholar), key information is also stored in many other primary or secondary databases that organize and allow the retrieval of raw or processed genetic and molecular data. The complexity of these types of data and the vast amount of information generated by the various “omics” technologies is clearly demonstrated by the large number of databases publicly available. Nowadays, it is virtually impossible to follow the new databases that become available or the updates of existent databases. This problem has been solved by the compilation of an annual catalog of databases of Molecular Biology, in a special January issue of the Nucleic Acids Research journal (12,13 ). This catalog listed 1,552 different databases for 2014, organized in thematic units such as sequences, genomes, gene expression, proteomics, metabolic and signaling pathways, diseases, and structures. Therefore, this annual database issue is the point of entry for someone who wants to know what types of databases are available to query, to interpret their results or plan future experiments. Certainly, the construction of such a database comprising material focusing on exercise-related literature would be valuable for exercise researchers.
Moving Toward a Common Language—The Need for Sports Ontologies
Ontologies are composed of a controlled vocabulary with a hierarchical structure. In very simple terms, an ontology is a nomenclature: a list of keywords with specific and hierarchical relationships to each other that are agreed on by the community to describe processes, components (e.g., genes, proteins, tissues), functions, and phenotypes. Thus, the problems of polysemy and homonymy are dealt with in databases and publications. The best-known ontology is the Gene Ontology (15 ).
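The hierarchical structure of an ontology can be illustrated with a small sketch. The terms and relationships below are invented for illustration and do not come from any published ontology; the point is only that annotating an item with a specific term implicitly annotates it with all of that term's ancestors.

```python
# Hypothetical "is a" relationships: each term maps to its parent terms.
PARENTS = {
    "athletic performance": [],
    "endurance performance": ["athletic performance"],
    "sprint performance": ["athletic performance"],
    "marathon running": ["endurance performance"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    found = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


# A record annotated with the specific term is also retrievable
# through the broader terms above it in the hierarchy.
print(sorted(ancestors("marathon running")))
```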
The BBC (http://www.bbc.co.uk/ontologies/sport/ ) has already developed a sports ontology to describe competitive events. Other sports-related ontologies are described within papers by Liao et al. (23 ) and Zhai and Zhou (40 ). In addition, there exists the Human Phenotype Ontology (33 ). Probably, a combination of these efforts, together with an ontology that captures information about athletic performance, will need to be developed by the community to address issues of antidoping and fundamental research in athletics.
These ontologies, in combination with statistical methods, can simplify the analysis of exercise-related “omics” data and allow the identification of statistically over/underrepresented ontology terms in a certain experiment. For example, in a hypothetical future experiment where thousands of genomes from various categories of athletes are sequenced or genotyped, the use of these ontologies will help to identify mutations or genotypes that are associated with certain types of sports or performances.
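One common way to test whether an ontology term is overrepresented is the one-sided hypergeometric test. The sketch below assumes that test; it is not necessarily the method used by any particular tool, and the function name and example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: all genes (or athletes) considered
    annotated:  how many of them carry the ontology term
    sample:     size of the list of interest
    hits:       how many members of that list carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes, 5 carry the term, and all 5 genes
# in the list of interest carry it.
print(enrichment_p_value(population=20, annotated=5, sample=5, hits=5))
```

When many terms are tested at once, the resulting p-values need a multiple-testing correction before any term is reported as enriched.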
The Upcoming Genomic Revolution and Personalization
One of the main goals of genetics and genomics is to map from genotype to phenotype. Thus, in the future, the genotype of a person will be fed into an algorithm together with life-style choices, where predictions of variable accuracy will be made about certain attributes, skills, and diseases. In the same manner, exercise genomics aims to identify those genetic variations that affect sports-related attributes, such as endurance, strength, speed, vulnerability to sports-related injuries, and nutritional requirements. The idea is to obtain the genotype of an athlete, which will be fed into an algorithm and suggest an individualized and thus optimized training program. From there, extrapolations can be made toward the impact of a certain exercise program, therapeutic drug regimen, or nutritional intervention and how they may further impact the genotype and subsequent phenotype.
A recent review organized and presented the major genetic variants that are associated with sports performance based on 4 criteria of relevance, prevalence, modification, and measurability (20 ). In particular, genetic variations in ∼20 genes should be considered for the choice of sports activity, the tailoring of training programs, susceptibility to injuries, the psychological aptitude during highly stressful periods, false-positive results during antidoping control, and tailoring of a proper nutritional program. For example, actinin (ACTN) is an actin-binding protein, and the 2 ACTN2 and ACTN3 isoforms are found in skeletal muscle. Yang et al. (39 ) reported the positive association of an ACTN3-RR and ACTN3-RX genotype with power athletes and later reported the presence of the genotype being associated with strength increases in response to resistance training in women (9 ).
Currently, 2 main approaches are routinely used to identify genetic variations that affect sports-related attributes: single nucleotide polymorphism (SNP) array–based Genome-Wide Association Studies (GWAS) and Next Generation Sequencing (NGS). Genome-Wide Association Studies currently take advantage of SNP arrays and linkage disequilibrium to scan for the presence of tens of thousands of SNPs within a large group of controls and a large group of cases (people with a certain trait/disease). Briefly, an SNP is a variation in the normal expected sequence of DNA bases that can occur in coding and noncoding regions of a person's DNA; SNPs can be viewed as one of the most basic levels of natural selection between individuals, athletes, or otherwise. Consequently, for each particular SNP, statistical tests may reveal whether this SNP (or its associated SNPs because of linkage disequilibrium) is overrepresented in the trait/disease group (or exercise training group). This approach seemed very promising at its early stages; nevertheless, many studies revealed that the majority of phenotypic traits have a very complex genetic structure, where a large number of different SNPs may control the appearance of a certain phenotypic trait, each with a small effect and variable penetrance. A manually curated GWAS database from the National Human Genome Research Institute (NHGRI) (19 ) (http://www.genome.gov/gwastudies/ ) catalogs the published GWAS that have been performed (over 1,961 studies as of August 2014). Slightly more than 100 of these studies could be of interest to the sports community, as they study relevant phenotypic traits of metabolism, hormones, and obesity. Nevertheless, in that catalog, only 1 study specifically targeted the field of physical exercise. In particular, that study concluded that there is a heritability component in leisure time exercise behavior, but it is due to many genetic variants, each one of them individually having a small effect (10 ).
Although they serve as an exciting entry point in this line of work, SNP array–based GWAS suffer from several problems and shortcomings that limit their usefulness. As a result, they are rapidly being replaced by the upcoming genomic revolution from NGS technologies (29 ).
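The per-SNP test mentioned above can be sketched as a chi-square test on a 2 × 2 table of allele counts in cases and controls. This is one common choice rather than the only one, and the function name and counts below are hypothetical.

```python
from math import erfc, sqrt


def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value, odds_ratio). Assumes all four counts are > 0.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts for one SNP in athletes (cases) and controls.
chi2, p_value, odds_ratio = allele_association(30, 70, 10, 90)
print(chi2, p_value, odds_ratio)
```

Because a GWAS repeats such a test for every SNP on the array, the significance threshold has to be made far stricter than 0.05, which is one reason why variants with small effects require very large cohorts to be detected.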
A new generation of sequencing technologies, based for example on semiconductor technology or nanopores, promises to deliver personal genomes within the next 1 or 2 years at a cost estimated to be around $1,000 (28 ). Possibly, within the next 5–10 years, the cost will drop even further because of competition and high investment on this critical field. One only needs to observe the famous NHGRI graph (http://www.genome.gov/sequencingcosts/ ) on the cost of sequencing (that drops faster than what would be expected by Moore's law) to understand the trends. Therefore, in the near future, it will be very feasible to study whole genomes from large groups of athletes. In this new era, the bottleneck will no longer be the cost of sequencing (as has been the case up until this point), rather, the computing and storage demands and expertise required to perform bioinformatics analysis of the results will be the rate-limiting step (35 ). Creation of new databases will be required and tailored to the specific needs of the physical exercise–related research community. In addition to the genome of each individual being stored, metadata such as fitness, strength, endurance, and clinical representations will be equally valuable, to interpret the genomic results. A consensus metadata capturing scheme, specifically tailored to the physical exercise and athletic performance research community (that will accompany each personal genome), will be needed, just like Minimum Information About a Microarray Experiment (MIAME) standards in microarrays or the National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI) biosamples databases (6,17 ). This is an extremely important point as the community will need to extensively discuss what phenotypic information will accompany each genome, by taking into account issues of ethics and privacy. 
For example, how much of the accompanying information and the genome will be publicly available, or whether certain regions of the genome will be blocked from public view (a policy of sanction on unrelated sports polymorphisms). In cases of investigations (scientific or otherwise) that relate to drug use or abuse, the topics encroach on areas of privacy, ethics, and potential litigation. Furthermore, how much health-related information should be passed to the athletes participating in such sequencing projects? In a similar manner to the NCBI and EBI biosample databases, a new database focused on athletes could be generated that will link the sample with its personal genome and with different omics experiments. The structure will have to be more complex, as the individual will have some of its information stable and some in a time-series manner. Therein, a key consideration is made whereby involved scientists and other key players are strongly encouraged to work together to build a robust infrastructure upon which such massive amounts of valuable information can be cataloged and retrieved.
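A record structure of the kind discussed above, with some fields stable and others collected as a time series, might look like the following sketch. Every field name here is hypothetical and does not follow MIAME, the NCBI or EBI biosample schemas, or any agreed community standard; the `public_fields` set merely illustrates how a disclosure policy could be attached to each record.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Measurement:
    """One time-stamped phenotypic observation (hypothetical layout)."""
    taken_on: date
    name: str
    value: float
    unit: str


@dataclass
class AthleteSample:
    """Stable metadata plus a growing series of measurements."""
    sample_id: str
    sport: str
    sex: str
    genome_accession: str = ""
    # Which stable fields may be shown publicly (a policy decision).
    public_fields: set = field(default_factory=lambda: {"sample_id", "sport"})
    measurements: list = field(default_factory=list)

    def add_measurement(self, measurement):
        self.measurements.append(measurement)

    def series(self, name):
        """Return (date, value) pairs for one measurement, oldest first."""
        return sorted(
            (m.taken_on, m.value) for m in self.measurements if m.name == name
        )

    def public_view(self):
        """Return only the fields cleared for public release."""
        return {key: getattr(self, key) for key in sorted(self.public_fields)}


sample = AthleteSample(sample_id="S1", sport="rowing", sex="F")
sample.add_measurement(Measurement(date(2014, 3, 1), "vo2max", 52.0, "ml/kg/min"))
sample.add_measurement(Measurement(date(2014, 1, 1), "vo2max", 50.0, "ml/kg/min"))
print(sample.series("vo2max"))
print(sample.public_view())
```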
For those who wish to analyze NGS data, there is a plethora of bioinformatics tools, but these analyses are still quite complex, require tremendous expertise, time, patience, and perspective, and will need to be performed by expert bioinformaticians. Analyzing genomic data from NGS technologies is a demanding procedure that requires several steps. These are (a) quality control and filtering of sequence reads, (b) mapping/alignment of the filtered reads to the reference genome, (c) identification of genetic variations with the help of variant-calling statistical tools, and, most importantly, (d) statistical analysis and interpretation of the genetic variations. The bioinformatics field is undergoing a revolution, with several different sequencing technologies and many bioinformatics teams developing ever better tools to analyze more and more data in a sophisticated manner. Therefore, it becomes demanding just to keep up with the latest tools, technologies, and literature. How this field will evolve and allow for more entry-level analysis to be completed remains to be seen. Moreover, it has become evident in the NGS community that the bottleneck in such projects will not be the cost or time of sequencing but rather the cost and time of the people needed to analyze and interpret the data (35 ).
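As a small illustration of step (a), the sketch below filters reads by their mean base quality. It assumes the common Phred+33 encoding of FASTQ quality strings; the threshold of 20 and the function names are arbitrary choices for illustration, and production pipelines rely on dedicated tools rather than code like this.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred quality of a read, assuming Phred+33 encoding."""
    scores = [ord(char) - offset for char in quality_string]
    return sum(scores) / len(scores)


def filter_reads(reads, min_mean_quality=20):
    """Keep (name, sequence) for reads whose mean quality passes the cutoff.

    `reads` is an iterable of (name, sequence, quality_string) tuples.
    Reads with an empty quality string are discarded.
    """
    return [
        (name, sequence)
        for name, sequence, quality in reads
        if quality and mean_phred(quality) >= min_mean_quality
    ]


# Two made-up reads: "I" encodes Q40 (high), "#" encodes Q2 (very low).
reads = [
    ("read1", "ACGT", "IIII"),
    ("read2", "ACGT", "####"),
]
print(filter_reads(reads))
```

Steps (b) to (d) each require their own specialized software, so a full analysis is usually assembled as a chain of such programs rather than written from scratch.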
One very successful approach that the community has adopted to cope with the rapid advancements in the field is the development of an NGS-focused database, the SEQanswers wiki (22 ). This wiki is edited by members of the NGS community. It includes reviews on specific topics, tutorials on how to run certain analyses, and a structured, publicly available catalog of >500 bioinformatics tools (and their characteristics) for high-throughput analysis. A bioinformatician will be needed to follow the literature, evaluate the various computational tools, and install on a workstation or a computer cluster a series of programs, each of which performs a certain step of the NGS analysis. Nevertheless, there is also the possibility to run complex NGS analyses in Web interfaces, such as Galaxy (7 ), where the user may not need to run each program for every step of the analysis separately but may instead create workflows. Still, the user will need to understand the available components/tools, how to structure the workflow, and what parameters will need to be fed to each of the tools.
Among the various sequencing technologies, at present, Illumina contributes ∼85% of sequenced bases in the publicly available Sequence Read Archive (SRA) (21 ). This percentage may easily be turned upside down in the near future by any of the new generations of sequencing technologies that are based, for example, on semiconductors (e.g., Ion Proton) or nanopores (e.g., Oxford Nanopore). The Sequence Read Archive is a publicly available database, where raw data from high-throughput sequencing projects are submitted, together with metadata that describe the individual project, the samples, the technology, and the analyses, among other things. It also includes RNA-Seq data for gene expression analyses. A search in SRA with the keyword “exercise” revealed just 3 projects, with one of them published (26 ), all of them studying the gene expression (with RNA-Seq) of skeletal muscle in mice and horses. Therefore, not many NGS genomic data are available for physical exercise and athletic performance, but this will probably change very soon, and the teams involved will need to take into account the necessity for highly skilled bioinformaticians.
Functional Analyses
Functional genomics and proteomics are increasingly becoming necessary tools in the effort to understand the molecular basis of certain adaptations and responses to exercise as well as diagnostic tools for developing new more sensitive and more reliable biomarkers of physical condition as well as the abuse of prohibited substances.
The principle behind functional genomics and proteomics is the measurement of the expression of all RNAs or proteins (and possibly their modifications as well) in certain tissues or cells before and after the application of a certain stimulus of physical or chemical nature (such as an exercise, nutritional, or pharmaceutical intervention). For the global measurement of RNA gene expression, there exist 2 main technological approaches: microarray-based and RNA-Seq-based. The RNA molecules whose expression changes significantly after the application of the stimulus are identified; the list of these molecules is typically rather long, on the order of tens or hundreds of thousands. Next, with the help of gene ontologies and statistical tools, such as FatiGO, certain biological processes or molecular functions and pathways are identified, which are enriched within this list of significantly over/underexpressed molecules. Thus, we can understand what pathways and processes are heavily involved. For the analysis of gene expression data, there exist commercial tools and Bioconductor (16 ). The latter is a community project based on open source and open development (thus free) software that uses the R statistical programming language. There exist over 600 tools, developed to analyze genomic and functional genomic data, from high-throughput sequencing and microarrays.
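The before/after comparison described above can be sketched with a simple log2 fold-change calculation. This is a deliberately simplified illustration with made-up values and hypothetical function names; real analyses, for example with Bioconductor packages, also model variance and correct for multiple testing, which a plain fold-change cutoff does not.

```python
from math import log2
from statistics import mean


def log2_fold_changes(before, after, pseudocount=1.0):
    """Per-gene log2 ratio of mean expression after vs before a stimulus.

    `before` and `after` map gene names to lists of replicate values.
    The pseudocount avoids division by zero for unexpressed genes.
    """
    return {
        gene: log2(
            (mean(after[gene]) + pseudocount) / (mean(before[gene]) + pseudocount)
        )
        for gene in before
        if gene in after
    }


def select_changed(fold_changes, threshold=1.0):
    """Return genes whose absolute log2 fold change reaches the threshold."""
    return sorted(g for g, fc in fold_changes.items() if abs(fc) >= threshold)


# Made-up expression values for three hypothetical genes.
before = {"geneA": [7, 7], "geneB": [10, 12], "geneC": [15, 15]}
after = {"geneA": [31, 31], "geneB": [11, 11], "geneC": [3, 3]}

changes = log2_fold_changes(before, after)
print(select_changed(changes))
```

The resulting list of changed genes is the input to the ontology-based enrichment step described in the text.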
The data from such gene expression experiments performed with either of the 2 types of technologies (microarrays or RNA-Seq) are stored and organized in publicly available databases such as the Gene Expression Omnibus (GEO) (6 ). As of September 2014, GEO contained gene expression data (provided by various microarray or RNA-Seq platforms) from ∼50,775 experiments (defined as series in GEO) comprising ∼1,236,744 individual samples. This is a huge amount of data that reflects more than 10 years of intense research in large-scale expression studies. Although this number is very impressive, when the database (http://www.ncbi.nlm.nih.gov/geo/browse/ ) is queried with the keyword “sports,” there is no relevant entry. When the database (in the series field) is queried with the keyword “exercise,” 97 series of experiments are retrieved, which have resulted in 68 publications. As expected, the majority of experiments are for human (57%) and the rest are for mouse (23%), rat (14%), pig (4%), horse (2%), dog, and salmon. The majority of the gene expression experiments are based on microarray technology, whereas only 7 series of data are based on RNA-Seq technology. One study investigates the levels of micro-RNAs in PBMCs and neutrophils. Of those 97 series of studies, 50% investigate the effects of exercise on muscle (in various species), whereas 24% investigate the effects of exercise on heart and arteries, 16% on blood cells (predominantly PBMCs), and 6% on brain. There are also individual studies for adipose tissue, colon mucosa, liver, and mammary gland.
Although global measurements of gene expression can help to elucidate the molecular basis of certain physiological adaptations, they may also allow for identification of reliable biomarkers of health state or drug abuse. The principle is based on toxicogenomics, where the abuse of a substance triggers a cascade of molecular changes at the expression level. In an early study, it was demonstrated that substances of similar toxicity and mode of action may exhibit similar gene expression profiles. Thus, by identifying the expression profiles of a battery of substances of known toxicity and mode of action, one may predict the toxicity and mode of action of a new substance (38 ). The Athlete's Biological Passport is an initiative that is based exactly on this principle (36 ). Instead of continuously developing new drug tests to identify newly developed doping substances that are evolving all the time, one needs to have the expression profile of a certain substance of that family. A newly modified substance of the same family and mode of action should in principle have a very similar expression profile in a certain tissue.
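The principle that similar substances produce similar expression profiles can be sketched as a nearest-profile lookup. The example below uses Pearson correlation as the similarity measure, which is an assumption made for illustration and not a description of how the Athlete's Biological Passport is computed; the reference classes and values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def closest_profile(query, references):
    """Return the name of the reference profile most correlated with `query`."""
    return max(references, key=lambda name: pearson(query, references[name]))


# Invented log fold-change profiles over the same four genes.
references = {
    "substance class A": [2.0, 1.5, -1.0, 0.0],
    "substance class B": [-1.0, 0.2, 2.0, 1.5],
}
unknown = [1.8, 1.2, -0.7, 0.1]

print(closest_profile(unknown, references))
```

A usable classifier would need many genes, many reference substances, and a measure of how confident the match is, but the comparison itself follows this pattern.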
Lately, there has been an interest in developing new biomarkers based on the expression of mRNA or, even better, micro-RNA in body fluids (11 ). Micro-RNAs from body fluids have shown great potential as stable, sensitive, and early biomarkers of toxicity (37 ). It will be very interesting to assess whether these body fluid micro-RNAs can function as good biomarkers of stress, oxidative stress, tissue damage, and restoration, regarding athletic performance, especially in elite athletes. Although there have been more than 10 years of intense research in gene expression in general, a very recent study cast a shadow on the interpretation of gene expression results from thousands of experiments (24 ). The authors of that study observed that the assumptions that are made to normalize gene expression results, such as the hypothesis that total RNA levels do not change between conditions, do not always hold. Therefore, in the future, gene expression studies should incorporate cell count and cell volume to perform correct normalization of the measured gene expression values. Another fact that must be taken into account when interpreting these gene expression studies is that the correlation between RNA and protein levels is weak (18 ). This is because of limitations of the gene expression technologies and the fact that there exists a whole level of posttranscriptional regulation still being discovered. Therefore, proteomic experiments can shed more light and provide more accurate snapshots of the interior of a cell, but this work has been slow to make its way into exercise research.
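The normalization caveat raised above can be made concrete with a small worked example. The numbers are invented: every gene doubles its expression in the treated condition, a global change that normalization to total RNA cannot reveal, whereas normalization per cell can.

```python
def normalize_to_total(counts):
    """Express each gene as a fraction of the total signal in the sample."""
    total = sum(counts.values())
    return {gene: count / total for gene, count in counts.items()}


def normalize_per_cell(counts, cell_count):
    """Express each gene as signal per cell."""
    return {gene: count / cell_count for gene, count in counts.items()}


# Invented measurements from the same number of cells in both conditions.
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {gene: 2 * count for gene, count in control.items()}
cells = 1000

# Total-based normalization: the two conditions look identical.
print(normalize_to_total(control) == normalize_to_total(treated))

# Per-cell normalization: the two-fold global increase is visible.
per_cell_control = normalize_per_cell(control, cells)
per_cell_treated = normalize_per_cell(treated, cells)
print({g: per_cell_treated[g] / per_cell_control[g] for g in control})
```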
A recent review describes the gel-based (1- and 2-dimensional polyacrylamide gel electrophoresis, differential gel electrophoresis) and gel-free (Isotope Coded Affinity Tag, Stable Isotopic Labeling by Amino Acids in Cell Culture, and Isobaric Tags for Relative and Absolute Quantitation) systems used to analyze the proteome in various types of samples, for research in the field of physical exercise (32 ). As its authors note, these tools have not been extensively adopted by the exercise research community thus far, although some of them have been available for almost 30 years. It is much more informative to have protein expression profiles, as proteins are the molecular machines of the cell. In addition, high-throughput posttranslational modification analyses (e.g., phosphorylation) provide even more information about the molecular state of the cell, as these posttranslational modifications may function as molecular switches that turn on/off the function of an expressed protein (3 ). For studying proteomic data, there is a plethora of databases that are organized in a structured catalog in the annual Nucleic Acids Research database issue, in the ExPASy Bioinformatics Resource Portal of the Swiss Institute of Bioinformatics (http://www.expasy.org/proteomics ), which contains the 2DPAGE database and lists a plethora of related bioinformatics tools (4 ), as well as in the EMBL-EBI portal (http://www.ebi.ac.uk/services/proteins ) (Tables 2 and 3 ).
Table 2: Potential applications of bioinformatics and functional proteomics in exercise physiology and sport performance (2,5,31 ).
Table 3: Potential applications of toxicogenomics in exercise physiology and sport performance.
Practical Applications
Direct applications of this topic to the coach or athlete will not be immediate and may be viewed by many as nonexistent. However, it must be recognized that scientific effort and development in research areas that outline and explain various cellular and genetic outcomes will continue to give athletes, coaches, and researchers more information about how to optimally train and approach athletic performance. Research in this area has expanded and will continue to expand into more sophisticated analytical approaches, creating the need to develop efficient systems to retrieve appropriate, accurate, and necessary information. In addition to fostering a more complete approach to athletic performance, considerable potential exists within this context to ethically and accurately develop cataloging systems to best identify optimal training responses, inappropriate recovery, the illicit use of foreign substances, and other influences that may impact the overall health and performance of all types of athletes. As such, this review is written to better inform coaches and practitioners of the demands of performing this type of analysis along with the struggles and challenges of doing so. Additionally, this review is targeted to exercise science researchers as a way to highlight an upcoming approach to answering molecular and genetic questions. In this respect, practical applications from this paper are expected to stimulate future thought and discussion to better position scientists and clinicians to inform coaches and athletes about what best drives the health and performance of all athletes.
Conclusions
In conclusion, no other area of biomedical research is growing as quickly as areas related to gene and protein expression. Although advancements in these areas have led to an unprecedented ability to generate massive amounts of data, these achievements do not come without difficulties. Aside from the massive expense associated with these techniques and the need for highly qualified bioinformaticians to perform the analysis, cataloging and organizing the information is important. Because of the massive amount of data produced, authors' ability to locate and extract necessary information related to subject background, training prescriptions, physical responses to exercise, etc., can be compromised, which could lead to wasted time and financial resources. A number of strides have been made in several of the core areas of science. Disciplines such as physical activity and exercise physiology are ripe for massive changes in the coming decades, and it is important for researchers in these disciplines to adopt efficient strategies and databases to be able to extract and catalog key information. Research in this specific area continues to grow rapidly, and as more and more research laboratories worldwide begin to implement more “omic” approaches, the need for these considerations will grow. An open (and growing) concern is the currently limited number of studies in the field, while more and more commercial genetic tests have become available in the market to predict or personalize a phenotypic response when in reality there are far too few data for this to be done accurately and reliably. Finally, toxicogenomics is an area that may emerge from the further development of these technologies, which may afford researchers and scientists an improved ability to detect drug use and abuse as well as aid in developing more effective drugs for any number of therapeutic outcomes.
Acknowledgments
All authors contributed to the conception and drafting of the manuscript and provided edits and final approval. The authors declare that they have no financial or nonfinancial competing interests.
References
1. Afshari CA, Hamadeh HK, Bushel PR. The evolution of bioinformatics in toxicology: advancing toxicogenomics. Toxicol Sci 120(Suppl 1): S225–S237, 2011.
2. Ahmetov II, Rogozkin VA. Genes, athlete status and training: An overview. Med Sport Sci 54: 43–71, 2009.
3. Amoutzias GD, He Y, Lilley KS, Van de Peer Y, Oliver SG. Evaluation and properties of the budding yeast phosphoproteome. Mol Cell Proteomics 11: 1–13, 2012.
4. Artimo P, Jonnalagedda M, Arnold K, Baratin D, Csardi G, de Castro E, Duvaud S, Flegel V, Fortier A, Gasteiger E, Grosdidier A, Hernandez C, Ioannidis V, Kuznetsov D, Liechti R, Moretti S, Mostaguir K, Redaschi N, Rossier G, Xenarios I, Stockinger H. ExPASy: SIB bioinformatics resource portal. Nucleic Acids Res 40: W597–W603, 2012.
5. Baoutina A, Alexander IE, Rasko JEJ, Emslie KR. Potential use of gene transfer in athletic performance enhancement. Mol Ther 15: 1751–1766, 2007.
6. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy K, Sherman P, Holko M, Yefanov A, Lee H, Zhang N, Robertson C, Serova N, Davis S, Soboleva A. NCBI GEO: archive for high-throughput functional genomic data-update. Nucleic Acids Res 41: D991–D995, 2013.
7. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol 10: 11–21, 2010.
8. Chen D, Muller HM, Sternberg PW. Automatic document classification of biological literature. BMC Bioinformatics 7: 370, 2006.
9. Clarkson PM, Devaney JM, Gordish-Dressman H, Thompson PD, Hubal MJ, Urso M, Price TB, Angelopoulos TJ, Gordon PM, Moyna NM, Pescatello LS, Visich PS, Zoeller RF, Seip RL, Hoffman EP. ACTN3 genotype is associated with increases in muscle strength in response to resistance training in women. J Appl Physiol (1985) 99: 154–163, 2005.
10. De Moor MH, Liu YJ, Boomsma DI, Li J, Hamilton JJ, Hottenga JJ, Levy S, Liu XG, Pei YF, Posthuma D, Recker RR, Sullivan PF, Wang L, Willemsen G, Yan H, De Geus EJ, Deng HW. Genome-wide association study of exercise behavior in Dutch and American adults. Med Sci Sports Exerc 41: 1887–1895, 2009.
11. Etheridge A, Lee I, Hood L, Galas D, Wang K. Extracellular microRNA: a new source of biomarkers. Mutat Res 717: 85–90, 2011.
12. Fernandez-Suarez XM, Galperin MY. The 2013 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection. Nucleic Acids Res 41: D1–D7, 2013.
13. Fernandez-Suarez XM, Rigden D, Galperin MY. The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection. Nucleic Acids Res 42: D1–D6, 2014.
14. Fodor A. Utilizing “omics” tools to study the complex gut ecosystem. Adv Exp Med Biol 817: 25–38, 2014.
15. Gene Ontology Consortium. The Gene Ontology: enhancements for 2011. Nucleic Acids Res 40: D559–D564, 2012.
16. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smyth G, Tierney L, Yang JY, Zhang J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5: R80, 2004.
17. Gostev M, Faulconbridge A, Brandizi M, Fernandez-Banet J, Sarkans U, Brazma A, Parkinson H. The BioSample Database (BioSD) at the European Bioinformatics Institute. Nucleic Acids Res 40: D64–D70, 2012.
18. Griffin TJ, Gygi SP, Ideker T, Rist B, Eng J, Hood L, Aebersold R. Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol Cell Proteomics 1: 323–333, 2002.
19. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106: 9362–9367, 2009.
20. Kambouris M, Ntalouka F, Ziogas G, Maffulli N. Predictive genomics DNA profiling for athletic performance. Recent Pat DNA Gene Seq 6: 229–239, 2012.
21. Kodama Y, Shumway M, Leinonen R; International Nucleotide Sequence Database Collaboration. The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res 40: D54–D56, 2012.
22. Li JW, Robison K, Martin M, Sjodin A, Usadel B, Young M, Olivares EC, Bolser DM. The SEQanswers wiki: a wiki database of tools for high-throughput sequencing analysis. Nucleic Acids Res 40: D1313–D1317, 2012.
23. Liao SH, Chen JL, Hsu TY. Ontology-based data mining approach implemented for sport marketing. Expert Syst Appl 36: 11045–11056, 2009.
24. Loven J, Orlando DA, Sigova AA, Lin CY, Rahl PB, Burge CB, Levens DL, Lee TI, Young RA. Revisiting global gene expression analysis. Cell 151: 476–482, 2012.
25. Luscombe NM, Greenbaum D, Gerstein M. What is bioinformatics? A proposed definition and overview of the field. Methods Inf Med 40: 346–358, 2001.
26. McGivney BA, McGettigan PA, Browne JA, Evans AC, Fonseca RG, Loftus BJ, Lohan A, MacHugh DE, Murphy BA, Katz LM, Hill EW. Characterization of the equine skeletal muscle transcriptome identifies novel functional responses to exercise training. BMC Genomics 11: 398, 2010.
27. Muller HM, Kenny EE, Sternberg PW. Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol 2: e309, 2004.
28. Niedringhaus TP, Milanova D, Kerby MB, Snyder MP, Barron AE. Landscape of next-generation sequencing technologies. Anal Chem 83: 4327–4341, 2011.
29. Nielsen R, Paul JS, Albrechtsen A, Song YS. Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12: 443–451, 2011.
30. Papanikolaou N, Pafilis E, Nikolaou S, Ouzounis CA, Iliopoulos I, Promponas VJ. BioTextQuest: a web-based biomedical text mining suite for concept discovery. Bioinformatics 27: 3327–3328, 2011.
31. Pérusse L, Rankinen T, Hagberg HG, Ruth JFL, Roth SM, Sarzynski MA, Wolfarth B, Bouchard C. Advances in exercise, fitness, and performance genomics in 2012. Med Sci Sports Exerc 45: 824–831, 2013.
32. Petriz BA, Gomes CP, Rocha LA, Rezende TM, Franco OL. Proteomics applied to exercise physiology: a cutting-edge technology. J Cell Physiol 227: 885–898, 2012.
33. Robinson PN, Mundlos S. The Human Phenotype Ontology. Clin Genet 77: 525–534, 2010.
34. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Federhen S, Feolo M, Fingerman IM, Geer LY, Helmberg W, Kapustin Y, Krasnov S, Landsman D, Lipman DJ, Lu Z, Madden TL, Madej T, Maglott DR, Marchler-Bauer A, Miller V, Karsch-Mizrachi I, Ostell J, Panchenko A, Phan L, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Slotta D, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Wang Y, Wilbur WJ, Yaschenko E, Ye J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 40: D13–D25, 2012.
35. Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MB. The real cost of sequencing: higher than you think! Genome Biol 12: 125, 2011.
36. Sottas PE, Robinson N, Rabin O, Saugy M. The athlete biological passport. Clin Chem 57: 969–976, 2011.
37. Wang K, Zhang S, Marzolf B, Troisch P, Brightman A, Hu Z, Hood LE, Galas DJ. Circulating microRNAs, potential biomarkers for drug-induced liver injury. Proc Natl Acad Sci U S A 106: 4402–4407, 2009.
38. Waring JF, Ciurlionis R, Jolly RA, Heindel M, Ulrich RG. Microarray analysis of hepatotoxins in vitro reveals a correlation between gene expression profiles and mechanisms of toxicity. Toxicol Lett 120: 359–368, 2001.
39. Yang N, MacArthur DG, Gulbin JP, Hahn AG, Beggs AH, Easteal S, North K. ACTN3 genotype is associated with human elite athletic performance. Am J Hum Genet 73: 627–631, 2003.
40. Zhai J, Zhou K. Semantic retrieval for sports information based on ontology and SPARQL. Int Conf Inf Sci Management Eng (ISME) 395–398, 2010.