Der Teufel steckt im Detail [The devil is in the details]
Friedrich Wilhelm Nietzsche (1844–1900)
Designing, conducting, analyzing, reporting, and interpreting the findings of a research study require an understanding of the types and characteristics of data and variables. This basic statistical tutorial discusses the following fundamental concepts about research data and variables:
- Population parameter versus sample variable;
- Types of research variables;
- Descriptive statistics versus inferential statistics;
- Primary data versus secondary data and analyses;
- Measurement scales and types of data;
- Normal versus non-normal data distribution;
- Assessing for normality of data;
- Parametric versus nonparametric statistical tests; and
- Data transformation to achieve normality.
POPULATION PARAMETER VERSUS SAMPLE VARIABLE
In conducting a research study, one ideally would obtain the pertinent data from all the members of the specific, targeted population, which defines the population parameter. However, this is seldom feasible, unless the entire targeted population is relatively small, and all its members are easily and readily accessible.1–4
Pertinent data instead are typically collected on a random, representative subset or sample chosen from the members of the overall specific population, which defines the sample variable. The unknown population parameter, representing the characteristic or association of interest, is then estimated from this chosen study sample, with a varying degree of accuracy or precision. One essentially extrapolates from this sample to make conclusions about the population.1–4
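The relationship between a population parameter and its sample estimate can be illustrated with a minimal Python sketch. The population here is synthetic (hypothetical resting heart rates), and the names `population` and `sample`, the sample size of 100, and the random seed are all assumptions made purely for illustration.

```python
import random
from statistics import mean, stdev

# Hypothetical population: 6000 resting heart rates (beats/min), values 50-109.
population = [50 + (i % 60) for i in range(6000)]
population_mean = mean(population)  # the population parameter (rarely known in practice)

random.seed(2017)  # fixed seed so the sketch is reproducible
sample = random.sample(population, k=100)  # simple random sample without replacement

sample_mean = mean(sample)  # the estimate of the population parameter
standard_error = stdev(sample) / len(sample) ** 0.5  # precision of that estimate

print(f"population mean = {population_mean:.1f}")
print(f"sample mean     = {sample_mean:.1f} (standard error {standard_error:.1f})")
```

The sample mean will generally differ somewhat from the population mean; the standard error gives a sense of how precise the estimate is likely to be.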
TYPES OF RESEARCH VARIABLES
When undertaking research, there are 4 basic types of variables to consider and define:1,5–7
- Independent variable: A variable that is believed to be the cause of some of the observed effect or association and one that is directly manipulated by the researcher during the study or experiment.
- Dependent variable: A variable that is believed to be directly affected by changes in an independent variable and one that is directly measured by the researcher during the study or experiment.
- Predictor variable: A variable that is believed to predict another variable and one that is identified, determined, and/or controlled by the researcher during the study or experiment (essentially synonymous with an independent variable).
- Outcome variable: A variable that is believed to change as a result of a change in a predictor variable and one that is directly measured by the researcher during the study or experiment (essentially synonymous with a dependent variable).
DESCRIPTIVE STATISTICS VERSUS INFERENTIAL STATISTICS
Descriptive statistics are specific methods used simply to calculate, describe, and summarize the collected research data in a logical, meaningful, and efficient way. Descriptive statistics are reported numerically in the text and tables or in graphical forms.1,8 Descriptive statistics will be the topic of the next basic tutorial in this series.
Researchers often pose a hypothesis (“if this is done, then this occurs” or “if this occurs, then this happens”) and seek to describe and to compare the quantitative or qualitative characteristics of 2 or more populations: 1 with and 1 without a specific intervention, or before and after the intervention in the same group. Purely descriptive statistics alone do not allow a conclusion to be made about association or effect and thus cannot answer a research hypothesis.1,8
Inferential statistics involve using available data about a sample variable to make a valid inference (estimate) about its corresponding, underlying, but unknown population parameter. Inferential statistics also allow researchers to make a valid estimate of the association between an intervention and the treatment effect (cause and effect) in a specific population, based upon their randomly collected, representative sample data.1,3,8
For example, Castro-Alves et al9 recently reported on their prospective, randomized, placebo-controlled, double-blinded trial, in which the perioperative administration of duloxetine improved postoperative quality of recovery after abdominal hysterectomy. Based upon their study sample, these researchers made the valid inference that duloxetine appears to be an effective medication to improve postoperative quality of recovery in all similar patients undergoing abdominal hysterectomy.9
PRIMARY DATA VERSUS SECONDARY DATA AND ANALYSIS
Frequently, there is confusion about the terms primary data and primary data analysis versus secondary data and secondary data analysis.10
Primary data are intentionally and originally collected for the purposes of a specific research study and its a priori planned primary data analysis.10–12 Primary data are usually collected prospectively but can be collected retrospectively.12 Valid and reliable primary clinical data collection tends to be time consuming, labor intensive, and costly, especially if undertaken on a large scale and/or at multiple, independent, care-delivery locations or sites.13
For example, a large-scale randomized study is being undertaken in 40 centers in 5 countries over 3 years to determine whether a stronger association (and thus more likely causality) exists between relatively deep anesthesia, as guided by the bispectral index, and increased postoperative mortality.14
Likewise, the General Anesthesia compared to Spinal anesthesia study is an ongoing prospective randomized, controlled, multisite, trial designed to assess the influence of general anesthesia on neurodevelopment at 5 years of age.15
Secondary data are initially collected for other purposes, and these existing data are subsequently used for a research study and its secondary data analyses.11,16 Examples include the myriad of bedside clinical data recorded for routine patient care and administrative claims data utilized for billing and third-party payer purposes.13
Such hospital administrative data (health care claims data) represent an important alternative data source that can be used to answer a broad range of research questions, including those in perioperative and critical care medicine, which would be difficult to study with a prospective randomized controlled trial.17,18
Secondary clinical data can also be gathered and coalesced into a large-scale research data repository or warehouse, which is intentionally created for quality assurance, performance improvement, health services, or clinical outcomes research purposes. Data on study-specific variables are then extracted (“abstracted”) from one of these already existing secondary data sources.16,19,20 An example is the National Anesthesia Clinical Outcomes Registry, developed by the Anesthesia Quality Institute of the American Society of Anesthesiologists.21
Despite the resources needed for their creation, maintenance, and extraction, secondary data are typically less time consuming, labor intensive, and costly than primary data, especially if needed on a large scale (eg, health services and outcomes research questions in perioperative and critical care medicine).22 However, the possible study variables are limited to those that already exist.16,20,22 Furthermore, the validity of the findings of the research study can be adversely affected by a poorly constructed or executed secondary data collection or extraction process (“garbage in—garbage out”).16,22
The term “secondary analysis of existing data” is generally preferred to the traditional term “secondary data analysis” because the former avoids the need to decide whether the data used in an analysis are primary or secondary.10 An example is the predefined secondary analysis of existing data, prospectively collected in the Vascular Events in Non-Cardiac Surgery Patients Cohort Evaluation study, which assessed the association between preoperative heart rate and myocardial injury after noncardiac surgery.23
MEASUREMENT SCALES AND TYPES OF DATA
Some demographic and clinical characteristics can be parsed into and described using separate, discrete categories. The key distinction is the lack of rank order to these discrete categories. Categorical data can also be called nominal data (from the Latin word, nomen, for “name”), implying that there is no ordering to the categories, but rather simply names. Categorical data can be either dichotomous (2 categories) or polytomous (more than 2 categories).1,5,24,25
Dichotomous data have only 2 categories, and thus are considered binary (yes or no; positive or negative).1,5,24,25 Many clinical outcomes (eg, postoperative nausea/vomiting, myocardial infarction, stroke, sepsis, and mortality) can be recorded and reported as dichotomous data.
Polytomous data have more than 2 categories. Examples of such data include sex (man, woman, or transgender), race/ethnicity (American Indian or Alaska Native, Asian, black or African American, Hispanic or Latino, Native Hawaiian or other Pacific Islander, and white or Caucasian), body habitus (ectomorph, mesomorph, or endomorph), hair color (black, brown, blond, or red), blood type (A, B, AB, or O), and diet (carnivore, omnivore, vegetarian, or vegan).1,5,6,24,25
Unlike nominal or categorical data, ordinal data follow a logical order. Ordinal data are rank ordered, typically based on a numerical scale that is composed of a small set of discrete classes or integers.1,5,24,25 A key characteristic is that the response categories have a rank order, but the intervals between the values cannot be presumed to be equal.26 The numeric Likert scale (1 = strongly disagree to 5 = strongly agree), which is commonly used to measure respondent attitude, generates ordinal data.26 Other examples of ordinal data include socioeconomic status (low, medium, or high), highest educational level completed (elementary, middle school, high school, college, or postcollege graduate), the American Society of Anesthesiologists Physical Status Score (I, II, III, IV, or V), and the 11-point numerical rating scale (0–10) for pain intensity.
Categorical data that are counts or integers (eg, the number of episodes of intraoperative bradycardia or hypotension experienced by a patient) are typically called discrete data. Discrete data may be more appropriately analyzed using different statistical methods than ordinal data27; however, in practice, the same methods are often used for these 2 variable types. In general, “discrete data variables” refer to those which can only take on certain specific values and are thus distinguished from continuous data, which are discussed next.
Continuous (Interval or Ratio) Data
Continuous data are measured on a continuum and can have or occupy any numeric value over this continuous range. Continuous data can be meaningfully divided into smaller and smaller or finer and finer increments, depending upon the sensitivity or precision of the measurement instrument.25
Interval data are a form of continuous data in which equal intervals represent equal differences in the property being measured.1,5,6,28 For example, the 1° difference between a temperature of 37° and 36° is the same 1° difference as between a temperature of 36° and 35°. However, when using the Fahrenheit or Celsius scale, a temperature of 100° is not twice as hot as 50° because a temperature of 0° on either scale does not mean “no heat” (but this would be true for Kelvin temperature).28 This leads us naturally to a definition of ratio data.
Ratio data are another form of continuous data, which have the same properties as interval data, plus a true definition of an absolute zero point, and the ratios of the values on the measurement scale must make sense.1,5,6,28 Age, height, weight, heart rate, and blood pressure are also ratio data. For example, a weight of 4 g is twice the weight of 2 g.28 The visual analog scale (VAS) pain intensity tool generates ratio data.29 A VAS score of 0 represents no pain, and a VAS score of 60 actually represents twice as much pain as a VAS score of 30.
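The distinction between interval and ratio data can be checked with simple arithmetic. The following minimal Python sketch assumes that the 100° and 50° temperatures are expressed in Celsius; the helper name `celsius_to_kelvin` is hypothetical and introduced only for this illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return celsius + 273.15

# Dividing the Celsius readings suggests "twice as hot", which is misleading
# because 0 degrees Celsius does not mean "no heat".
naive_ratio = 100 / 50

# On the Kelvin scale the ratio is meaningful: about 1.15, not 2.
kelvin_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)

# Weight has a true zero, so 4 g really is twice 2 g.
weight_ratio = 4 / 2

print(naive_ratio, round(kelvin_ratio, 3), weight_ratio)
```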
NORMAL VERSUS NON-NORMAL DATA DISTRIBUTION
A statistical distribution is a graph of the possible specific values or the intervals of values of a variable (on the x-axis) and how often the observed values occur (on the y-axis). There are multiple types of data distribution, including the normal (Gaussian) distribution, binomial distribution, and Poisson distribution.30,31 The so-inclined reader is referred to a more in-depth discussion of the various types or patterns of data distribution.32
The normal (Gaussian) distribution (the “bell-shaped curve”) (Figure 1) is one of the most common statistical distributions.33,34 Many applied inferential statistical tests are predicated on the assumption that the analyzed data follow a normal distribution. Therefore, the normal distribution is also one of the most relevant to basic inferential statistics.30,31,33–35
METHODS FOR ASSESSING DATA NORMALITY
The histogram and the Q–Q plot are 2 graphical methods to visually assess if a set of data have a normal distribution (display “normality”). The Shapiro-Wilk test and Kolmogorov-Smirnov test are 2 well-known and historically widely applied quantitative methods to assess for data normality.36 Graphical methods and quantitative testing can complement one another; therefore, it is preferable that data normality be assessed both visually and with a statistical test.30,37,38 However, if one is uncertain about how to correctly interpret the more subjective histogram or the Q–Q plot, it is better to rely instead on a numerical test statistic.37 See the study by Kuhn et al39 for an example.
The histogram or frequency distribution of the study data can be used to graphically assess for normality. If the study data are normally distributed, the histogram or frequency distribution of these data will fall within the shape of a bell curve (Figure 2A), whereas if the study data are not normally distributed, the histogram or frequency distribution of these data will fall outside the shape of a bell curve (Figure 2B).35 When applicable, authors state in their manuscript their use of a histogram to assess the normality of their primary outcome data, but they do not reproduce this graph. See the study by Blitz et al40 for an example.
One can also use the output of a quantile–quantile or Q–Q plot to graphically assess if a set of data plausibly came from a normal distribution. The Q–Q plot is a scatterplot of the quantiles of a theoretical normal data set (on the x-axis) and the quantiles of the actual sample data set (on the y-axis). If the data are normally distributed, the data points on the Q–Q plot will be closely aligned with the 45° reference diagonal line (Figure 3A). If the individual data points stray from the reference diagonal line in an obviously nonlinear fashion, the data are not normally distributed (Figure 3B). When applicable, authors state in their manuscript their use of a Q–Q plot to assess the normality of their primary outcome data, but they do not reproduce this graph. See the study by Jæger et al41 for an example.
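The construction of a Q–Q plot can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plotting positions (i − 0.5)/n are one common convention among several, the two data sets are synthetic, and the function names `qq_points` and `max_departure_in_sd` are hypothetical. The sketch computes the plotted points and a crude numerical summary of how far they stray from the reference line; it does not draw the graph.

```python
import math
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    # Reference normal distribution with the same mean and SD as the sample,
    # so that normally distributed data fall along the 45-degree line.
    reference = NormalDist(mean(xs), stdev(xs))
    return [(reference.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def max_departure_in_sd(data):
    """Largest vertical distance from the 45-degree line, in units of the sample SD."""
    sd = stdev(data)
    return max(abs(sample_q - theory_q) for theory_q, sample_q in qq_points(data)) / sd

# Synthetic illustration data (not from any study).
n = 100
positions = [(i - 0.5) / n for i in range(1, n + 1)]
bell_shaped = [NormalDist(10, 2).inv_cdf(p) for p in positions]   # normal-like
right_skewed = [-math.log(1 - p) for p in positions]              # exponential-like

print(f"bell-shaped data:  max departure = {max_departure_in_sd(bell_shaped):.3f} SD")
print(f"right-skewed data: max departure = {max_departure_in_sd(right_skewed):.3f} SD")
```

The bell-shaped data stay very close to the reference line, whereas the right-skewed data depart from it markedly in the tails, which is the nonlinear pattern one would look for visually.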
Shapiro-Wilk Test and Kolmogorov-Smirnov Test
Both the Shapiro-Wilk and the Kolmogorov-Smirnov tests compare the scores in the study sample with a normally distributed set of scores with the same mean and SD; their null hypothesis is that the sample distribution is normal. Therefore, if the test is significant (P < .05), the sample data distribution is non-normal.30,36 When applicable, authors should state in their manuscript which test was used to assess the normality of their primary outcome data and report its corresponding P value.
The Shapiro-Wilk test is more appropriate for small sample sizes (N ≤ 50), but it can also be validly applied with large sample sizes. The Shapiro-Wilk test provides greater power than the Kolmogorov-Smirnov test (even with its Lilliefors correction). For these reasons, the Shapiro-Wilk test has been recommended as the numerical means for assessing data normality.30,36,37
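The logic of a numerical normality test can be sketched in Python. Note that this sketch does not implement the recommended Shapiro-Wilk test, which relies on tabulated coefficients and is normally run in statistical software. Instead, it illustrates the simpler Kolmogorov-Smirnov statistic with the Lilliefors correction, using the approximate 5% critical value of 0.886/√n for n > 30 taken from Lilliefors' published table. The function names and the synthetic data are assumptions made for illustration only.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_normal_statistic(data):
    """Largest gap between the empirical CDF and a normal CDF with the sample mean and SD."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def rejects_normality(data):
    """True if the statistic exceeds the approximate Lilliefors 5% critical value."""
    n = len(data)
    if n <= 30:
        raise ValueError("the approximate critical value assumes n > 30")
    return ks_normal_statistic(data) > 0.886 / math.sqrt(n)

# Synthetic illustration data (not from any study).
n = 200
positions = [(i - 0.5) / n for i in range(1, n + 1)]
bell_shaped = [NormalDist(10, 2).inv_cdf(p) for p in positions]   # normal-like
right_skewed = [-math.log(1 - p) for p in positions]              # exponential-like

print("bell-shaped data rejected: ", rejects_normality(bell_shaped))
print("right-skewed data rejected:", rejects_normality(right_skewed))
```

In practice, a statistical package would report an exact P value rather than a comparison against an approximate critical value.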
PARAMETRIC VERSUS NONPARAMETRIC STATISTICAL TESTS
The details and appropriate use of the wide array of available inferential statistical tests will be the topics of several future tutorials in this current series.
These statistical tests are commonly classified as parametric versus nonparametric. This distinction is generally predicated on the number and rigor of the assumptions (requirements) regarding the underlying study population.42 Parametric statistical tests make certain assumptions about the characteristics and/or parameters of the underlying population distribution upon which the test is based, whereas nonparametric tests make fewer or less rigorous assumptions.42
Specifically, parametric statistical tests assume that the data have been sampled from a specific probability distribution (a normal distribution); nonparametric statistical tests make no such distribution assumption.43,44 In general, parametric tests are more powerful than nonparametric tests when their assumptions are met, and so if possible, a parametric test should be applied.43
DATA TRANSFORMATION TO ACHIEVE NORMALITY
Researchers may find that their available study data are not normally distributed, ostensibly calling into question the validity of using a more powerful parametric statistical test.
While the results of the above tests of normality are typically reported (including in Anesthesia & Analgesia), they are not a panacea. With small sample sizes, these normality tests do not have much power to detect a non-Gaussian distribution. With large sample sizes, minor deviations from the Gaussian “ideal” might be deemed “statistically significant” by a normality test; however, the commonly applied parametric t test and analysis of variance are then fairly tolerant of a violation of the normality assumption.36,45 The decision to apply a parametric test versus a nonparametric test is thus sometimes a difficult one, requiring thought and perspective, and should not be simply automated.36
If the normality test concludes that study data deviate significantly from a Gaussian distribution, rather than applying a less powerful nonparametric test, the problem can potentially be remedied by judiciously and openly: (1) performing a data transformation of all the data values; or (2) eliminating any obvious data outlier(s).36,46 Most commonly, logarithmic, square root, or reciprocal data transformations are applied to achieve data normality.47 See the studies by Law et al48 and Maquoi et al49 for examples.
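The effect of a logarithmic transformation on right-skewed data can be illustrated with a minimal Python sketch. The data are synthetic (constructed so that their logarithms are symmetric), and the `skewness` helper, a simple moment-based coefficient, is an assumption introduced for this illustration rather than a method described in the cited sources.

```python
import math
from statistics import NormalDist, mean, pstdev

def skewness(data):
    """Moment-based skewness: about 0 for symmetric data, positive for a long right tail."""
    m = mean(data)
    s = pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

# Synthetic right-skewed data (not from any study): exponentiated normal quantiles.
n = 100
z_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
raw = [math.exp(z) for z in z_scores]

# Logarithmic transformation applied to all of the data values.
log_transformed = [math.log(x) for x in raw]

print(f"skewness before transformation: {skewness(raw):.2f}")
print(f"skewness after transformation:  {skewness(log_transformed):.2f}")
```

Real data will seldom become perfectly symmetric after transformation, so normality would still need to be reassessed on the transformed values.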
A basic understanding of data and variables is required to design, conduct, analyze, report, and interpret, as well as to understand and apply, the findings of a research study. The assumption of study data demonstrating a normal (Gaussian) distribution, and the corresponding choice of a parametric versus nonparametric statistical test, can be a complex and vexing issue. As will be discussed in detail in future tutorials, the type and characteristics of study data and variables essentially determine the appropriate descriptive statistics and inferential statistical tests to apply.
Name: Thomas R. Vetter, MD, MPH.
Contribution: This author wrote and revised the manuscript.
This manuscript was handled by: Jean-Francois Pittet, MD.
1. Urdan TC. Introduction to social science research principles and terminology. Statistics in Plain English. 2017:4th ed. New York, NY: Routledge, Taylor & Francis Group, 1–12.
2. Levy PS, Lemeshow S. The population and the sample. Sampling of Populations Methods and Applications. 2009:Hoboken, NJ: John Wiley & Sons Inc, 13–46.
3. Motulsky H. From sample to population. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. 2014:New York, NY: Oxford University Press, 22–28.
4. Glasser SP. Glasser SP. Introduction to clinical research concepts, essential characteristics of clinical research, overview of clinical research study designs. Essentials of Clinical Research. 2014:2nd ed. Cham, Switzerland: Springer, 11–32.
5. Field A. Why is my evil lecturer forcing me to learn statistics? Discovering Statistics Using IBM SPSS Statistics: And Sex and Drugs and Rock‘n’ Roll. 2013:Los Angeles, CA: Sage, 1–39.
6. StatSoft Inc. Overview of elementary concepts in statistics. Electronic Statistics Textbook. 2013:Tulsa, OK: StatSoft.
7. Maciejewski ML, Diehr P, Smith MA, Hebert P. Common methodological terms in health services research and their synonyms [correction of symptoms]. Med Care. 2002;40:477–484.
8. Salkind NJ. Statistics or sadistics? It’s up to you. Statistics for People Who (Think They) Hate Statistics. 2016:6th ed. Thousand Oaks, CA: Sage Publications, 5–18.
9. Castro-Alves LJ, Oliveira de Medeiros AC, Neves SP, et al. Perioperative duloxetine to improve postoperative recovery after abdominal hysterectomy: a prospective, randomized, double-blinded, placebo-controlled study. Anesth Analg. 2016;122:98–104.
10. Cheng HG, Phillips MR. Secondary analysis of existing data: opportunities and implementation. Shanghai Arch Psychiatry. 2014;26:371–375.
11. Hox JJ, Boeije HR. Data collection, primary vs. secondary. Encyclopedia Soc Measur. 2005;1:593–599.
12. Goodman C. HTA 101: Primary Data Methods. HTA 101: Introduction to Health Technology Assessment. Available at: https://www.nlm.nih.gov/nichsr/hta101/ta10105.html. Accessed May 10, 2017.
13. Selby JV, Whicher DM. Robertson D, Williams GH. The patient-centered outcomes research institute: current approach to funding clinical research and future directions. Clinical and Translational Science: Principles of Human Research. 2017:2nd ed. Amsterdam, the Netherlands: Elsevier/Academic Press, 72–91.
14. Short TG, Leslie K, Chan MT, Campbell D, Frampton C, Myles P. Rationale and design of the balanced anesthesia study: a prospective randomized clinical trial of two levels of anesthetic depth on patient outcome after major surgery. Anesth Analg. 2015;121:357–365.
15. McCann ME, Withington DE, Arnup SJ, et al. The GAS Consortium. Differences in blood pressure in infants after general anesthesia compared to awake regional anesthesia (GAS study-a prospective randomized trial). Anesth Analg. 2017;125:837–845.
16. Olsen J. Rothman KJ, Greenland S, Lash TL. Using secondary data. Modern Epidemiology. 2012:3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins, 481–491.
17. Garland A, Gershengorn HB, Marrie RA, Reider N, Wilcox ME. A practical, global perspective on using administrative data to conduct intensive care unit research. Ann Am Thorac Soc. 2015;12:1373–1386.
18. Ackland GL, Stephens RC. Big data: a cheerleader for translational perioperative medicine. Anesth Analg. 2016;122:1744–1747.
19. Prokosch HU, Ganslandt T. Perspectives for medical informatics. Reusing the electronic medical record for clinical research. Methods Inf Med. 2009;48:38–44.
20. Murphy SN, Cheuh HC, Herrick CD. Robertson D, Williams GH. Information technology. Clinical and Translational Science: Principles of Human Research. 2017:2nd ed. Amsterdam, the Netherlands: Elsevier/Academic Press, 228–242.
21. Liau A, Havidich JE, Onega T, Dutton RP. The national anesthesia clinical outcomes registry. Anesth Analg. 2015;121:1604–1610.
22. Cooke CR, Iwashyna TJ. Using existing data to address important clinical questions in critical care. Crit Care Med. 2013;41:886–896.
23. Abbott TE, Ackland GL, Archbold RA, et al. Preoperative heart rate and myocardial injury after non-cardiac surgery: results of a predefined secondary analysis of the VISION study. Br J Anaesth. 2016;117:172–181.
24. Campbell MJ, Swinscow TDV. Data display and summary. Statistics at Square One. 2009:Chichester, UK; Hoboken, NJ: Wiley-Blackwell/BMJ Books, 1–13.
25. Hulley SB, Newman TB, Cummings SR. Planning the measurements: precision, accuracy, and validity. Designing Clinical Research. 2013:4th ed. Philadelphia, PA: Wolters Kluwer Health/Lippincott Williams & Wilkins, 32–42.
26. Jamieson S. Likert scales: how to (ab)use them. Med Educ. 2004;38:1217–1218.
27. Nevill AM, Atkinson G, Hughes MD, Cooper SM. Statistical methods for analysing discrete and categorical data recorded in performance analysis. J Sports Sci. 2002;20:829–844.
28. Motulsky H. Type of variables. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. 2014:New York, NY: Oxford University Press, 72–76.
29. Price DD, Staud R, Robinson ME. How should we use the visual analogue scale (VAS) in rehabilitation outcomes? II: Visual analogue scales as ratio scales: an alternative to the view of Kersten et al. J Rehabil Med. 2012;44:800–801.
30. Ghasemi A, Zahediasl S. Normality tests for statistical analysis: a guide for non-statisticians. Int J Endocrinol Metab. 2012;10:486–489.
31. Motulsky H. The Gaussian distribution. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. 2014:New York, NY: Oxford University Press, 85–89.
32. Viti A, Terzi A, Bertolaccini L. A practical overview on probability distributions. J Thorac Dis. 2015;7:E7–E10.
33. Urdan TC. The normal distribution. Statistics in Plain English. 2017:4th ed. New York, NY: Routledge, Taylor & Francis Group, 33–41.
34. Salkind NJ. Are your curves normal? Probability and why it counts. Statistics for People Who (Think They) Hate Statistics. 2016:6th ed. Thousand Oaks, CA: Sage Publications, 149–174.
35. Altman DG, Bland JM. Statistics notes: the normal distribution. BMJ. 1995;310:298.
36. Motulsky H. Normality tests. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. 2014:New York, NY: Oxford University Press, 203–208.
37. Das KR, Rahmatullah Imon AHM. A brief review of tests for normality. Am J Theor Appl Stat. 2016;5:5–12.
38. D’Agostino RB. D’Agostino RB, Stephens MA. Tests for normal distribution. Goodness-of-Fit Techniques. 1986:New York, NY: Marcel Dekker, 367–420.
39. Kuhn JC, Hauge TH, Rosseland LA, Dahl V, Langesæter E. Hemodynamics of phenylephrine infusion versus lower extremity compression during spinal anesthesia for cesarean delivery: a randomized, double-blind, placebo-controlled study. Anesth Analg. 2016;122:1120–1129.
40. Blitz JD, Shoham MH, Fang Y, et al. Preoperative renal insufficiency: underreporting and association with readmission and major postoperative morbidity in an academic medical center. Anesth Analg. 2016;123:1500–1515.
41. Jæger P, Grevstad U, Koscielniak-Nielsen ZJ, Sauter AR, Sørensen JK, Dahl JB. Does dexamethasone have a perineural mechanism of action? A paired, blinded, randomized, controlled study in healthy volunteers. Br J Anaesth. 2016;117:635–641.
42. Sheskin DJ. Lovric M. Parametric versus nonparametric tests. International Encyclopedia of Statistical Science. 2011:Berlin, Heidelberg, Germany: Springer Berlin Heidelberg, 1051–1052.
43. Greenhalgh T. How to read a paper. Statistics for the non-statistician. I: Different types of data need different statistical tests. BMJ. 1997;315:364–366.
44. Motulsky H. Nonparametric methods. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. 2014:New York, NY: Oxford University Press, 390–400.
45. Motulsky H. Q&A: normality tests - why the term “normality?” Frequently Asked Questions. 2010. Available at: http://www.graphpad.com/support/faqid/959/. Accessed May 14, 2017.
46. Motulsky H. Outliers. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. 2014:New York, NY: Oxford University Press, 209–215.
47. Manikandan S. Data transformation. J Pharmacol Pharmacother. 2010;1:126–127.
48. Law LS, Lo EA, Gan TJ. Xenon anesthesia: a systematic review and meta-analysis of randomized controlled trials. Anesth Analg. 2016;122:678–697.
49. Maquoi I, Joris JL, Dresse C, et al. Transversus abdominis plane block or intravenous lignocaine in open prostate surgery: a randomized controlled trial. Acta Anaesthesiol Scand. 2016;60:1453–1460.