Letters

# A Simple, Interpretable Conversion from Pearson’s Correlation to Cohen’s d for Continuous Exposures

Mathur, Maya B.; VanderWeele, Tyler J.

doi: 10.1097/EDE.0000000000001105

## To the Editor:

Meta-analysts often must convert effect sizes reported on different scales to a common scale for analysis.1 In particular, it is common to convert Pearson’s correlation, $r$, computed between an exposure $X$ and an outcome $Y$ to Cohen’s $d$ (also called the “standardized mean difference”), which is the difference in expected $Y$ for a fixed contrast in $X$, standardized by the standard deviation of $Y$ conditional on $X$. Letting $N$ denote the total sample size, the standard conversion1 from $r$ to $d$ is:

$$d = \frac{2r}{\sqrt{1 - r^2}}, \qquad \widehat{\text{SE}}(d) = \frac{2}{\sqrt{(N - 1)(1 - r^2)}} \tag{1.1}$$

An important, yet infrequently discussed, point is that this conversion was derived for a Pearson correlation computed between a binary exposure $X$ and a continuous outcome $Y$, also called a “point-biserial” correlation.2–4 Note that when $X$ represents a dichotomization of a truly continuous underlying exposure, a special approach3 is required to estimate the correlation between the underlying, continuous exposure and $Y$; one cannot simply apply the standard Pearson’s correlation formula to the observed, dichotomized $X$. Stated otherwise, the point-biserial correlation does not consistently estimate the Pearson correlation that would have been obtained using the underlying continuous variable.

Despite the standard conversion’s origins in the binary-exposure setting, meta-analysts in practice often unknowingly apply Equation (1.1) to obtain Cohen’s $d$ from correlations and regression results computed using a continuous $X$. In fact, a widely referenced textbook on meta-analysis describes Equation (1.1) without stipulating that it is only known to apply for the point-biserial case.1 Even if Equation (1.1) can be used for correlations computed using a continuous $X$, its interpretation is unclear: the interpretation of Cohen’s $d$ depends on the choice of “groups” in $X$ whose means are compared, but because Equation (1.1) applies for a correlation in which $X$ is already binary, it is not clear which “groups” of $X$ are created when the conversion is instead applied to a correlation using a continuous $X$.

To allow direct computation of Cohen’s $d$ from Pearson’s $r$ or simple linear regression, we provide a similar conversion and approximate standard error that apply when $X$ is continuous. The resulting effect size represents the average increase in the standardized $Y$ associated with an increase in $X$ of $\Delta$ units. To preserve the sign of the effect size, $\Delta$ should be set to be positive regardless of the sign of $r$. Letting $S_X$ denote the sample standard deviation of $X$, the conversion is:

$$d = \frac{r\,\Delta}{S_X \sqrt{1 - r^2}}, \qquad \widehat{\text{SE}}(d) = |d| \sqrt{\frac{1}{r^2 (N - 3)} + \frac{1}{2(N - 1)}} \tag{1.2}$$

Derivations of these estimates of $d$ and its standard error are provided in the eAppendix, http://links.lww.com/EDE/B601. If $r = 0$ exactly, the standard error estimate is undefined, so $\widehat{\text{SE}}(d)$ could be replaced by $\lim_{r \to 0} \widehat{\text{SE}}(d)$, which is provided in the eAppendix. The standard error estimate assumes that $X$ is approximately normal and that $N$ is large. If the standard deviation of $X$ is known rather than estimated, then the term $\frac{1}{2(N - 1)}$ should be omitted. As a potential practical limitation, some papers to be meta-analyzed might not report $S_X$, in which case the meta-analyst might need to substitute an estimate from, for example, a comparable second study or a subsample of the study used to estimate $r$. In this case, the $N$ in the term $\frac{1}{2(N - 1)}$ should be replaced with the size of the second sample used to estimate $S_X$ (see Supplement, http://links.lww.com/EDE/B601). The conversion is easy to calculate manually or using the function r_to_d in the R package MetaUtility.
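As a rough illustration of the manual calculation, the following Python sketch takes the conversion to be $d = r\Delta / (S_X\sqrt{1 - r^2})$ with standard error $|d|\sqrt{1/(r^2(N-3)) + 1/(2(N-1))}$. The function name `r_to_d_continuous`, its argument names, and the `sx_known` flag are hypothetical choices made for this example; this is not the MetaUtility implementation, and the $r = 0$ limit from the eAppendix is deliberately not reproduced.

```python
import math


def r_to_d_continuous(r, sx, delta, n, sx_known=False):
    """Convert Pearson's r (continuous X) to Cohen's d for a contrast of `delta` units in X.

    Hypothetical helper for illustration only.
    r        : Pearson correlation between X and Y
    sx       : standard deviation of X
    delta    : contrast in X, in the units of X (kept positive to preserve the sign of d)
    n        : total sample size
    sx_known : if True, drop the 1 / (2(N - 1)) term that accounts for estimating sx
    Returns (d, se); se is None when r == 0, where the formula is undefined.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if sx <= 0 or delta <= 0:
        raise ValueError("sx and delta must be positive")
    if n <= 3:
        raise ValueError("n must exceed 3")

    d = r * delta / (sx * math.sqrt(1.0 - r**2))

    if r == 0:
        # The standard error formula divides by r^2; the limiting value
        # is given in the eAppendix and is not reproduced here.
        return d, None

    variance_factor = 1.0 / (r**2 * (n - 3))
    if not sx_known:
        variance_factor += 1.0 / (2.0 * (n - 1))
    se = abs(d) * math.sqrt(variance_factor)
    return d, se


# Example with made-up inputs: r = 0.5, S_X = 2, a 1-unit contrast, N = 103.
d, se = r_to_d_continuous(r=0.5, sx=2.0, delta=1.0, n=103)
print(f"d = {d:.4f}, SE = {se:.4f}")
```

Passing `sx_known=True` gives a smaller standard error for the same inputs, because the sampling variability of $S_X$ is no longer included.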

Comparing Equations (1.1) and (1.2) clarifies the meaning of the “Cohen’s $d$” that results from unknowingly applying Equation (1.1) to a correlation computed with a continuous $X$. Specifically, the result coincides with the effect size associated with an increase in $X$ of two standard deviations. (However, even with $\Delta = 2S_X$, the standard error estimates in Equations (1.1) and (1.2) will, in general, still not coincide.) In many applications, this may represent a rather extreme contrast: for example, if $X$ is normal, then a two-standard-deviation contrast with the reference level set to the mean would involve comparison to the 97.7th quantile of $X$. Alternatively, a two-standard-deviation contrast from one standard deviation below the mean to one standard deviation above is a comparison of the 15.9th quantile to the 84.1th quantile of $X$. Additionally, the absolute size of a two-standard-deviation contrast in $X$ may differ substantially across study populations and may therefore be challenging to interpret in practice.5 Thus, it is perhaps preferable, when possible, to instead fix a specific, scientifically meaningful contrast of interest, $\Delta$, which is held constant across all meta-analyzed studies, and then to apply the proposed conversion in Equation (1.2). The meta-analytic pooled estimate would then correspond to a well-defined contrast in $X$ of $\Delta$ units, rather than to a contrast whose size may vary arbitrarily across studies.
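The two points above can be checked numerically. The sketch below assumes the point estimates take the forms $d = 2r/\sqrt{1 - r^2}$ (standard conversion) and $d = r\Delta/(S_X\sqrt{1 - r^2})$ (continuous-exposure conversion); the function names and the input values for `r` and `sx` are made up for illustration.

```python
import math
from statistics import NormalDist


def d_standard(r):
    """Standard (point-biserial) conversion from r to d."""
    return 2.0 * r / math.sqrt(1.0 - r**2)


def d_continuous(r, sx, delta):
    """Continuous-exposure conversion for a contrast of `delta` units in X."""
    return r * delta / (sx * math.sqrt(1.0 - r**2))


# Hypothetical inputs: any r in (-1, 1) and any positive sx give the same conclusion.
r, sx = 0.3, 4.0

# Setting delta to two standard deviations of X recovers the standard conversion.
coincide = math.isclose(d_standard(r), d_continuous(r, sx, delta=2.0 * sx))
print("Point estimates coincide when delta = 2 * sx:", coincide)

# Quantiles of a normal X at the contrast endpoints, in standard deviation units.
std_normal = NormalDist()
for z in (-1.0, 1.0, 2.0):
    print(f"{z:+.0f} SD from the mean -> {100.0 * std_normal.cdf(z):.1f}th quantile")
```

Only the point estimates are compared here; the standard errors are left out because, as noted, they generally differ even when $\Delta = 2S_X$.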

The standard conversion is alternatively sometimes described in terms of the contrast that arises from dichotomizing $X$ at a given threshold,1 yet in fact, the conversion often substantially overestimates the contrast produced by dichotomization, even at extreme thresholds of $X$. For example, we simulated bivariate normal data on $X$ and $Y$ and applied the standard conversion to the resulting correlation (Figure, dashed red line). We also calculated the true two-group Cohen’s $d$ arising from dichotomizing $X$ at various thresholds (Figure, solid black curve). The Figure shows that the “Cohen’s $d$” from the standard conversion is 47% larger than the true two-group $d$ arising from dichotomization at the mean and still overestimates the true two-group $d$ at extreme dichotomization thresholds in either tail. For example, for dichotomization at $X = 2$ (i.e., the 97.7th percentile), the standard conversion still overestimates the true two-group Cohen’s $d$ by 12%.

FIGURE. True two-group Cohen’s $d$ corresponding to dichotomizing a standard normal $X$ at varying points (solid black curve) versus “Cohen’s $d$” calculated from the standard conversion in Equation (1.1) (dashed red line).
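A comparison of this kind can also be sketched analytically rather than by simulation. The Python example below makes several assumptions that are not taken from the letter: $(X, Y)$ is standard bivariate normal with a hypothetical correlation `rho = 0.7`, the two-group $d$ uses a pooled within-group variance weighted by the group proportions, and the standard conversion is $2\rho/\sqrt{1 - \rho^2}$. Because the degree of overestimation depends on the correlation, the printed ratios need not match the percentages quoted above.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_group_d(rho, c):
    """Population Cohen's d in Y between the groups X > c and X <= c.

    Assumes (X, Y) is standard bivariate normal with correlation rho, so that
    Y = rho * X + sqrt(1 - rho^2) * noise, and pools the within-group variances
    of Y with weights equal to the group proportions (an assumed definition).
    """
    p_lo = STD_NORMAL.cdf(c)
    p_hi = 1.0 - p_lo
    density = STD_NORMAL.pdf(c)

    # Truncated standard normal moments.
    mean_x_hi = density / p_hi            # E[X | X > c]
    mean_x_lo = -density / p_lo           # E[X | X <= c]
    var_x_hi = 1.0 + c * mean_x_hi - mean_x_hi**2
    var_x_lo = 1.0 + c * mean_x_lo - mean_x_lo**2

    # Within-group variance of Y: part due to X plus residual variance.
    var_y_hi = rho**2 * var_x_hi + (1.0 - rho**2)
    var_y_lo = rho**2 * var_x_lo + (1.0 - rho**2)
    pooled_sd = math.sqrt(p_hi * var_y_hi + p_lo * var_y_lo)

    return rho * (mean_x_hi - mean_x_lo) / pooled_sd


def standard_conversion(rho):
    """Standard (point-biserial) conversion applied to the continuous-X correlation."""
    return 2.0 * rho / math.sqrt(1.0 - rho**2)


rho = 0.7  # hypothetical correlation, chosen only for illustration
for threshold in (0.0, 1.0, 2.0):
    ratio = standard_conversion(rho) / two_group_d(rho, threshold)
    print(f"threshold {threshold:.1f}: standard conversion / two-group d = {ratio:.3f}")
```

With smaller correlations the ratio at extreme thresholds can fall below 1, so the direction and size of the discrepancy should be checked for the correlation at hand.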

In summary, when approximating Cohen’s $d$ from Pearson’s $r$ or simple linear regression with a continuous $X$, we caution against using conversions derived for a binary $X$. We provide a straightforward conversion designed to accommodate the case of a continuous $X$ through specification of a fixed contrast in $X$; we believe its use in meta-analysis would enable more precisely interpretable and scientifically meaningful effect sizes.

## ACKNOWLEDGMENTS

An anonymous contributor on a statistics forum made helpful contributions to the proof of Lemma 2.1 (Supplement).

Maya B. Mathur
Department of Epidemiology
Harvard T. H. Chan School of Public Health
Boston, MA
Quantitative Sciences Unit
Stanford University
Palo Alto, CA
mmathur@stanford.edu

Tyler J. VanderWeele
Department of Epidemiology
Harvard T. H. Chan School of Public Health
Boston, MA

## REFERENCES

1. Borenstein M, Hedges LV, Higgins J, Rothstein HR. Introduction to Meta-Analysis. Hoboken, NJ: Wiley Online Library; 2009.
2. McGrath RE, Meyer GJ. When effect sizes disagree: the case of r and d. Psychol Methods. 2006;11:386.
3. Jacobs P, Viechtbauer W. Estimation of the biserial correlation and its sampling variance for use in meta-analysis. Res Synth Methods. 2017;8:161–180.
4. Hunter JE, Schmidt FL. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, CA: Sage; 2004.
5. Greenland S, Schlesselman JJ, Criqui MH. The fallacy of employing standardized regression coefficients and correlations as measures of effect. Am J Epidemiol. 1986;123:203–208.