A cornerstone paper by Carlsen and coworkers1 triggered worldwide debate over a possible decline in male fertility. Although a decreasing trend in semen quality over time has not been firmly established,2 the possibility of such a trend has raised new concerns about environmental factors that might affect human fertility.3
Studies of time to pregnancy or infertility offer a more direct measure of fertility than do studies of semen quality. Three studies have examined time trends in time to pregnancy or infertility.4–6 Their findings are conflicting: fertility in Finland has reportedly decreased over recent decades,4 whereas an increase in fertility was reported in Britain6 and in Sweden,5 with the increase in Sweden likely to be smaller than originally reported.7,8
Reproductive behaviors have changed over recent decades, leading to changes in pregnancy planning and possible bias in measures of fertility. In this article, we consider how the study of time trends in fertility could be biased by (1) changes in the occurrence of unprotected intercourse among couples not intending to conceive and (2) changes in the proportion of women who decide to terminate unintended pregnancies. We explore the strength and direction of these biases with simulation studies, and use the results to consider the feasibility of studying fertility changes over recent decades.
We use “fertility” in the medical and colloquial sense to denote reproductive capacity (rather than the demographic sense of number of live births). Aside from some medically determined causes of sterility, the reproductive capacity of a couple cannot be measured directly. Rather, it has to be assessed stochastically, by whether pregnancy occurs and how long it takes the couple to conceive.
There is a range of reproductive capacity among nonsterile couples (eg, papers by Tietze9 and Dunson et al10). Specific methods, based on either time to pregnancy11,12 or infertility, have been developed to assess population fertility and environmental factors affecting fertility. Time to pregnancy refers to the number of menstrual cycles it takes a couple to conceive. Fecundability, in turn, is a couple-specific probability of conceiving a recognized pregnancy per menstrual cycle, given no contraception. An individual couple's fecundability is not observable, but the average fecundability for groups of individuals can be estimated from time-to-pregnancy data. We use “infertile” and “infertility” to refer to couples who try for more than a year to conceive. Many infertile couples will eventually conceive, but a sterile subset cannot conceive without medical intervention.
BIAS FROM UNCOUNTED AT-RISK CYCLES AND UNINTENDED PREGNANCIES
Many couples have 1 or more menstrual cycles with unprotected intercourse when they have no intention to start a pregnancy. These cycles, referred to as at-risk cycles, may or may not lead to pregnancy. At-risk cycles complicate the study of fertility because they add to a couple's time to pregnancy without being easily measurable. Lack of information on these unmeasured cycles can create bias.
The typical time-to-pregnancy study ignores unintended pregnancies and focuses instead on couples trying to conceive, because most women are able to remember the number of cycles to conception for planned pregnancies. (Design 1 in Fig. 1.) There is, however, bias in this approach;13 in cultures where fertility is imperfectly controlled, couples who eventually try to conceive are less fertile on average. The basis for this bias is simple: The more-fertile couples tend to conceive unintentionally, whereas less-fertile couples do not. Thus, less-fertile couples remain available for later planned pregnancies and hence contribute more than their share to time-to-pregnancy studies.
The effective use of contraception and elective abortion both mitigate the bias caused by unintended pregnancies. Couples who consistently use effective birth control have no at-risk cycles. Couples who conceive an unintended pregnancy and then abort the conceptus return to the pool of potential planners. In both cases, highly fertile couples are kept in the pool of couples available for time-to-pregnancy studies.
Another proposed design solution for dealing with unintended pregnancies has been simply to include couples as “fertile” if they had a previous unintended pregnancy that survived to birth (Design 2 in Fig. 1). Such a strategy can increase the sample size and alleviate the bias toward lower fertility that afflicts time-to-pregnancy studies. However, this approach presents a new problem. It still ignores previous uncounted at-risk cycles. Some couples who should be categorized as infertile will be wrongly defined as fertile because the number of “counted” cycles to their pregnancy is less than a year. This type of misclassification causes overestimation of fertility.
THE PROBLEM OF STUDYING FERTILITY OVER TIME
Studies of time trends in fertility based on time to pregnancy can easily be biased by the mechanisms discussed here. During a time period with high rates of unintended births (which would result in more exclusions of highly fertile couples), observed fertility will be low compared with fertility at times of fastidious use of contraception or widespread reliance on abortion for unintended pregnancies. A time-to-pregnancy-based comparison of fertility before and after the introduction of effective contraception and legal abortion could consequently appear to show increasing fertility over time.
Alternatively, consider a study of infertility that categorizes all couples as either fertile (pregnancy without intending to conceive or within 12 months of trying) or infertile. In times of ineffective contraception (many at-risk cycles) and few abortions, fertility will be overestimated compared with times of effective contraception and abortion of unintended pregnancies. Thus, a study of infertility will show an apparent decrease in fertility over recent decades. Figure 2 illustrates schematically the biases we have described.
In the following sections, we construct simulations to quantify these biases. In constructing these simulations, we make illustrative assumptions about the changes in numbers of at-risk cycles and abortion rates for unintended pregnancies. The findings provide a context for considering the feasibility of identifying true changes in fertility over time.
SIMULATIONS TO QUANTIFY BIAS
Simulation Populations and Assumptions
We generated a reference population of 100,000 couples, with 5% being sterile. The remaining 95% were assigned fecundabilities based on a beta distribution (a = 1.5, b = 3.5) with mean fecundability of 0.30, standard deviation 0.188. This combination of sterile and fertile couples produced an overall mean fecundability of 0.285, standard deviation 0.194, and an infertility rate of 15.1%, consistent with published estimates.14
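The summary figures for the reference population can be checked in closed form. The sketch below is illustrative only: it assumes that "infertile" means no conception in the first 13 cycles (the cutoff used later in the analysis), and the function names are hypothetical, not taken from the original simulation code.

```python
import math

A, B = 1.5, 3.5      # beta shape parameters from the text
STERILE = 0.05       # sterile fraction from the text
CYCLES = 13          # assumed cutoff: no conception in 13 cycles = infertile

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

def prob_no_conception(a, b, n):
    """E[(1 - p)^n] for p ~ Beta(a, b), using the product form of B(a, b+n)/B(a, b)."""
    out = 1.0
    for i in range(n):
        out *= (b + i) / (a + b + i)
    return out

mean_f, sd_f = beta_mean_sd(A, B)

# Mixture of sterile couples (fecundability 0) and nonsterile couples.
mean_all = (1 - STERILE) * mean_f
second_moment = (1 - STERILE) * (sd_f ** 2 + mean_f ** 2)
sd_all = math.sqrt(second_moment - mean_all ** 2)

infertile = STERILE + (1 - STERILE) * prob_no_conception(A, B, CYCLES)

print(f"nonsterile mean {mean_f:.3f}, SD {sd_f:.3f}")
print(f"overall mean {mean_all:.3f}, SD {sd_all:.3f}")
print(f"expected infertility rate {infertile:.3f}")
```

Under these assumptions the expected values fall close to the simulated figures quoted above; small differences are to be expected because the reference population is a single random draw of 100,000 couples.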
We then generated 12 study populations. Without varying the fecundability distribution, we allowed the values of 2 parameters to vary: (1) the number, N, of previous at-risk menstrual cycles during which pregnancy had not been intended but might have occurred; and (2) the probability, K (“keep”), that an unintended pregnancy continued to birth (ie, was not lost naturally or terminated by induced abortion). We assigned women 2, 4, 6, or 8 at-risk cycles before the first attempt to get pregnant. If pregnancies occurred in at-risk cycles, we assigned probabilities of 0.1, 0.5, or 0.9 that the unintended pregnancies would continue to birth. The 4 × 3 combinations of N and K were used to generate the 12 separate study populations.
Each simulation follows couples longitudinally and records (1) the number of cycles required to conceive, (2) whether the conception occurred during an at-risk cycle or during an intentional attempt (which automatically follows the at-risk cycles), and (3) whether the pregnancy continues to birth if conception occurs during an at-risk cycle. We impose no age structure (ie, we assume that all couples are the same age or that any heterogeneity due to age is captured by the fecundability distribution). Thus, a couple's assigned fecundability does not change during a simulation.
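The longitudinal follow-up described above can be sketched as a small Monte Carlo routine. This is a minimal illustration, not the original simulation code: the function names are hypothetical, an unintended pregnancy that does not continue to birth is assumed to take no time (the couple simply moves on to the next at-risk cycle), and only unintended pregnancies ending in birth are recorded.

```python
import random

A, B = 1.5, 3.5        # beta shape parameters from the text
STERILE = 0.05         # sterile fraction from the text
MAX_ATTEMPT = 13       # attempt cycles followed before censoring

def draw_fecundability(rng):
    """Sterile couples get 0; the rest draw from Beta(A, B)."""
    return 0.0 if rng.random() < STERILE else rng.betavariate(A, B)

def follow_couple(p, n_at_risk, keep, rng):
    """Return (unintended_birth, attempt_cycles, conceived) for one couple."""
    for _ in range(n_at_risk):
        # Unintended conception that continues to birth with probability `keep`.
        if rng.random() < p and rng.random() < keep:
            return True, 0, False
    # The intentional attempt automatically follows the at-risk cycles.
    for cycle in range(1, MAX_ATTEMPT + 1):
        if rng.random() < p:
            return False, cycle, True
    return False, MAX_ATTEMPT, False   # censored after 13 cycles

def simulate(n_couples, n_at_risk, keep, seed=0):
    """Simulate one study population with fixed N (n_at_risk) and K (keep)."""
    rng = random.Random(seed)
    return [
        follow_couple(draw_fecundability(rng), n_at_risk, keep, rng)
        for _ in range(n_couples)
    ]

records = simulate(20_000, n_at_risk=8, keep=0.9)
eligible = [r for r in records if not r[0]]          # no unintended birth
share_eligible = len(eligible) / len(records)
share_infertile = sum(1 for r in eligible if not r[2]) / len(eligible)
print(f"eligible: {share_eligible:.2f}, infertile among eligible: {share_infertile:.2f}")
```

A smaller population than 100,000 is used here only to keep the run time short; the fecundability of each couple is drawn once and held fixed, as in the text.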
The 12 study populations with differing values of N and K can be regarded as corresponding to the various demographic conditions found in industrialized countries between the early 1960s and the early 1990s,15–20 with a decline over time in the number of at-risk cycles (N) as better birth control methods became available.15 Our assumptions about “at-risk cycles” reflect the changes in contraceptive behavior seen in many Western countries.20 In the late 1960s or early 1970s, approximately half of women used no contraception or only traditional methods. In the early 1990s, this proportion had dropped to 20% to 30%. Thus, the higher N values (6 or 8 unprotected cycles) might represent the late 1950s or early 1960s, whereas the lower values (2 or 4 cycles) might represent more recent times.
We used information from United States surveys16 and nationwide statistics from the United Kingdom17,18 and Finland19 for our assumptions on trends in induced abortion. Clinical abortions were legalized in these countries starting in the late 1960s. The highest probability of keeping an unintended pregnancy, 0.9, assumes no induced abortions and a 10% natural miscarriage rate. This extreme is perhaps not far from the situation in some areas before legalization of abortion. The value of 0.5 corresponds roughly to U.S. statistics in the 1970s to 1990s,16 whereas a value of 0.1 may be close to that which would occur under stringent population policies such as the one-child policy in China.
We use 2 study designs (Fig. 1), 1 based on time-to-pregnancy data (design 1) and the other on infertility assessment (design 2). Design 1 assesses time to pregnancy among first-pregnancy planners. Couples with any previous unintended pregnancy that ended in birth are excluded. For a given couple with fecundability, p, the probability that the couple does not have a first child as a result of an unintended pregnancy, and hence is available for study, is $(1 - pK)^N$. We measure the number of cycles it takes nulliparous couples to become pregnant at their first attempt (or the number of cycles they attempt if no pregnancy is achieved). This study design resembles that used in a Danish study of fecundability.21
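The availability probability for design 1 can be averaged over the fecundability distribution to give the expected share of couples eligible in each study population. The sketch below is illustrative: it assumes the Beta(1.5, 3.5) distribution and 5% sterile fraction of the reference population, expands the power of (1 - pK) binomially, and uses hypothetical function names.

```python
from math import comb

A, B = 1.5, 3.5      # beta shape parameters of the reference population
STERILE = 0.05       # sterile couples (p = 0) are always available for study

def beta_moment(j, a=A, b=B):
    """E[p^j] for p ~ Beta(a, b)."""
    out = 1.0
    for i in range(j):
        out *= (a + i) / (a + b + i)
    return out

def eligible_fraction(n, k):
    """Expected share of couples with no unintended birth in n at-risk cycles."""
    # Binomial expansion: (1 - pK)^n = sum_j C(n, j) (-K)^j p^j
    nonsterile = sum(comb(n, j) * (-k) ** j * beta_moment(j) for j in range(n + 1))
    return STERILE + (1 - STERILE) * nonsterile

grid = {(n, k): eligible_fraction(n, k)
        for n in (2, 4, 6, 8) for k in (0.1, 0.5, 0.9)}

for (n, k), share in sorted(grid.items()):
    print(f"N={n}, K={k}: {share:.3f}")
```

Under these assumptions the share is lowest for N = 8, K = 0.9 and highest for N = 2, K = 0.1, and the more fertile a couple is, the less likely it is to remain available for study.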
In design 2, the end point is primary infertility. Couples with a previous unintended pregnancy leading to birth are classified as fertile. For all the other couples, we assume that they make an attempt at pregnancy, and fertility status is defined on the basis of their first attempt. Couples who are unsuccessful for at least a year (13 menstrual cycles) are categorized as infertile regardless of whether they eventually conceive. Those who conceive within a year are classified as fertile. This design approximates that used in the Finnish study of time trends in fertility.4
Statistical Analyses of Simulated Data
We used our 2 study designs to measure fertility in each of the 12 sample cohorts, comparing the cohort fertility to the “true” fertility as measured in the reference population. In the reference population, the number of at-risk cycles is zero, implying no couple has an unintended pregnancy.
The count of cycles required to conceive (or number of cycles followed for couples not conceiving) during an intentional attempt provided the time-to-pregnancy data for analysis. We analyzed time-to-pregnancy data using a proportional probability model,22 a discrete analog of the Cox23 proportional hazards model. In assessing time to pregnancy, we treated times after the 13th cycle (1 year) as censored. This is routinely done because 1 year is the time at which clinical intervention often begins. This model allows estimation of the fecundability ratio (FR), here the ratio of the cycle-specific conception probability in the sample cohort relative to the reference population. Because our analysis imposed no change on actual fecundability (ie, the true FR would be 1.0), an FR below 1 means that the fertility in the sample cohort is biased downward. Within the context of design 1, we also focused on clinical primary infertility using logistic regression to study the attempt-time data dichotomized at 13 cycles or less compared with 14 or more cycles. This analysis was also restricted to intentional attempt times for couples who had not previously had an unintended pregnancy resulting in birth. The outcome parameter in this analysis was the infertility odds ratio (OR).
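One way to write the fecundability ratio out explicitly is shown below. The notation is introduced here for illustration and does not appear in the cited references in this form: $T$ is the attempt cycle in which conception occurs, $h_t$ is the cycle-specific conception probability in the reference population, and $x$ indicates membership in the sample cohort ($x = 1$) or the reference population ($x = 0$). The sketch assumes the model acts multiplicatively on the cycle-specific probability, as the definition of the FR implies.

```latex
\[
  \Pr(T = t \mid T \ge t,\; x) = h_t \, e^{\beta x}, \qquad t = 1, \dots, 13,
\]
\[
  \mathrm{FR}
  = \frac{\Pr(T = t \mid T \ge t,\; x = 1)}{\Pr(T = t \mid T \ge t,\; x = 0)}
  = e^{\beta}.
\]
```

Attempts that continue beyond cycle 13 contribute 13 cycles without conception and are then censored, so an FR of 1.0 corresponds to $\beta = 0$.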
In design 2, couples with an unintended pregnancy leading to birth were not excluded, but rather were categorized as “fertile.” Once again, for couples attempting pregnancy, we dichotomized at up to 13 cycles and 14 or more cycles, with the latter couples regarded as infertile.
The infertility ORs from the 2 designs can be compared directly because the reference population is the same. An infertility OR above 1.0 indicates a higher rate of apparent infertility in the sample cohort, thus underestimating true fertility.
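The expected infertility ORs for the 2 designs can be approximated in closed form rather than by simulation. The sketch below is illustrative and rests on stated assumptions: the Beta(1.5, 3.5) fecundability distribution with 5% sterile couples, the availability probability $(1 - pK)^N$ from design 1, and infertility defined as no conception in 13 attempt cycles. The function names are hypothetical.

```python
from math import comb, exp, lgamma

A, B = 1.5, 3.5      # beta shape parameters of the reference population
STERILE = 0.05       # sterile fraction
CUTOFF = 13          # attempt cycles without conception = infertile

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def expect(n, k, cycles):
    """E[(1 - pK)^n (1 - p)^cycles] over all couples (sterile couples have p = 0)."""
    nonsterile = sum(
        comb(n, j) * (-k) ** j * exp(log_beta(A + j, B + cycles) - log_beta(A, B))
        for j in range(n + 1)
    )
    return STERILE + (1 - STERILE) * nonsterile

def odds(x):
    return x / (1 - x)

def infertility_ors(n, k):
    """Return (design 1 OR, design 2 OR) relative to the reference population."""
    reference = expect(0, 0.0, CUTOFF)   # no at-risk cycles
    eligible = expect(n, k, 0)           # no unintended birth
    both = expect(n, k, CUTOFF)          # no unintended birth and no conception in 13 cycles
    design1 = both / eligible            # infertile among eligible couples only
    design2 = both                       # couples with an unintended birth counted as fertile
    return odds(design1) / odds(reference), odds(design2) / odds(reference)

or1, or2 = infertility_ors(8, 0.9)
print(f"N=8, K=0.9: design 1 OR {or1:.2f}, design 2 OR {or2:.2f}")
```

For N = 8 and K = 0.9 this sketch gives an OR above 4 for design 1 and below 1 for design 2, so the 2 designs are biased in opposite directions even though the underlying fecundability distribution is unchanged.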
With design 1, 24% to 94% of couples survived the exclusion criteria of “no previous unintended pregnancy ending in birth” and were therefore eligible for analysis (Table 1). The fecundability ratio decreased with an increase in the number of previous at-risk cycles (N) and the probability that an unintended pregnancy continued to birth (K). When N and K were at their highest values (representing the 1950s), the measured fecundability was less than half of the true fecundability (Table 1), and the infertility odds ratio was overestimated by more than 4-fold (Table 2). This demonstrates that a study of fertility trends from the 1960s to the 1990s using this design could spuriously produce a more than 2-fold increase in fecundability.
With design 2, bias was in the opposite direction. Infertility was underestimated by approximately 30% for the cohort with the highest N and K (Table 2). An apparent decrease in fertility over calendar time arises from the nearly 40% difference in the rate of infertility that can be produced by bias. Comparing the bias in the 2 designs (Table 2), the strength of the bias is weaker in the second design but still substantial.
Our simulations suggest that studies of changes in fertility (ie, reproductive capacity) over recent decades may suffer substantial bias. In the presence of social changes that have led to more available and more effective contraception, and to more widespread abortion of unintended pregnancies, the apparent fertility of the population can appear to increase or decrease depending on the study design. This can occur despite no actual change in fertility.
Our analyses are relevant to the interpretation of the 3 cited studies of time trends in fertility. The studies in the United Kingdom6 and Sweden5 suggested an increase in fertility over time. The Swedish study5 used a design similar to our design 1, and the British study6 shared common features with it. In contrast, the Finnish study,4 whose design was almost identical to our design 2, found a decrease in fertility over recent decades. These findings are consistent with biases from changes in birth control and abortion practice. Thus, the conflicting results in the literature with regard to time trends in fertility could be due at least in part to conflicting biases.
The use of a standard study design over time has been suggested as a key element in estimating time trends.24–26 Our findings demonstrate that when key social factors change over time, a standard study design is not sufficient. Furthermore, we focused on only 2 factors—the number of at-risk menstrual cycles and the probability that an unintended pregnancy is carried to birth. Even if there were a way to adjust for the 2 factors we examined, changes in other social factors must also be considered in real-life studies. Examples include desired family size and desired age at first birth, factors which may also change over time. The problems that such factors present for monitoring fertility trends over time have been described.26
There are complicated interactions among various social factors. For example, women who have had 1 or more unwanted pregnancies may decide to use a more effective method of birth control. Thus, population-level data on relevant social changes are insufficient and we see no reasonable way to control for all those factors.
Not only has reproductive behavior changed over time, but the definition of intendedness is elusive.27 The terms “intended,” “unintended,” “mistimed,” “wanted,” “unwanted,” “planned,” and “unplanned” have been frequently used to address both behavioral and attitudinal aspects of pregnancy planning. In this article, we have used the term “unintended” to mean the woman has little or no intention to become pregnant. The other simulation variable, “probability that an unintended pregnancy is carried to birth,” is related to the decision to have an abortion. Ultimately, it is the woman herself who must recall the time taken to conceive and who defines the pregnancy as planned or unplanned. If social changes cause these self-assessments to change, then such changes could provide another artifact in studying fertility.
The biases described here apply to more than just comparisons of fertility over time. Other authors25,28–30 have already noted the difficulties in comparing fertility among countries or cultures or occupational groups with differing patterns of contraceptive use, pregnancy planning, or persistence of trying. To our knowledge, only the biasing effect of differing persistence of trying30 and poor comparability between sperm count and fecundability31,32 have been quantified. However, if the groups being compared are generally similar in pregnancy planning characteristics, the biases we describe are probably unimportant. Thus, time-to-pregnancy studies within an occupation or across occupational groups with similar social patterns are unlikely to be strongly affected by these problems.
We started with an important question: “Has fertility declined over recent decades?” We see no easy way to avoid the biases we describe. Proposed designs14,33,34 do not solve the problem. Perhaps one could restrict a study to settings in which either there has been no unprotected intercourse among those who do not want to be pregnant (N = 0) or, if such exposure occurs, unintended pregnancies do not survive to birth (K close to 0). Such opportunities are extremely limited. K might be small where national policies strongly discourage unintended births, such as in China. The other extreme is a setting where no contraception is used at all, for example, among the Hutterites in North America or the Laestadians in Finland, who traditionally practice no family planning after marriage. In such cultures, by definition, no pregnancy can be unintended, so the biases we describe would not pertain. Outside these settings, we see little possibility for clear and unbiased estimates of changes in fertility over recent time. As long as human fertility can be measured only by some assessment of time to conception, we believe that evolving social patterns are likely to produce differential attrition and frustrate efforts to detect changes in fertility on the population level.
We thank Chirayath Suchindran, Beth Gladen, and Olga Basso for their valuable comments and critical review of the manuscript.
1. Carlsen E, Giwercman A, Keiding N, et al. Evidence for decreasing quality of semen during past 50 years. BMJ
2. Bonde JP. Environmental fertility research at the turn of the century. Scand J Work Environ Health. 1999;25(special issue):529–536.
3. Toppari J, Larsen JC, Christiansen P, et al. Male reproductive health and environmental xenoestrogens. Environ Health Perspect. 1997;104(suppl 4):741–803.
4. Notkola I. New information on the prevalence of infertility. Uutta tietoa hedelmättömyyden yleisyydestä [in Finnish]. Suomen Lääkärilehti
5. Akre O, Cnattingius S, Bergström R, et al. Human fertility does not decline: evidence from Sweden. Fertil Steril
6. Joffe M. Time trends in biological fertility in Britain. Lancet
7. Jensen TK, Keiding N, Scheike T, et al. Declining human fertility? [Letter] Fertil Steril
8. Akre O, Cnattingius S, Kvist U, et al. Re: Declining human fertility? [Letter] Fertil Steril
9. Tietze C. Fertility after discontinuation of intrauterine and oral contraception. Int J Fertil
10. Dunson DB, Colombo B, Baird DD. Changes with age in the level and duration of fertility in the menstrual cycle. Hum Reprod
11. Baird DD, Wilcox AJ, Weinberg CR. Use of time to pregnancy to study environmental exposures. Am J Epidemiol
12. Wood JW. Dynamics of Human Reproduction. New York: Aldine de Gruyter; 1994;653:
13. Weinberg CR, Baird DD, Wilcox AJ. Sources of bias in studies of time to pregnancy. Stat Med
14. Weinberg CR, Gladen BC. The beta-geometric distribution applied to comparative fecundability studies. Biometrics
15. Henshaw SK. Unintended pregnancy in the United States. Fam Plann Perspect
16. Henshaw SK. Abortion incidence and service in the United States, 1995–1996. Fam Plann Perspect
17. Office for National Statistics, United Kingdom. Abortion Rates: by Age 1968–1999: Social Trends 31. Available at: www.statistics.gov.uk. Accessed December 12, 2002.
18. Office for National Statistics, United Kingdom. Abortion Rates: by Age: Social Trends 32. Available at: www.census2001.gov.uk. Accessed July 2, 2003.
19. STAKES: Statistical Yearbook on Social Welfare and Health Care 2002. Finland's Official Statistics: Social Security. Gummerus Kirjapaino Oy. Saarijärvi; 2002.
20. U.S. Bureau of Census, International Data Base. Available at: http://blue.census.gov. Accessed December 12, 2002.
21. Bonde JPE, Hjollund NHI, Jensen TK, et al. A follow-up study of environmental and biologic determinants of fertility among 430 Danish first-pregnancy planners: design and methods. Reprod Toxicol
22. Weinberg CR, Wilcox AJ. Reproductive epidemiology. In: Modern Epidemiology. Philadelphia: Lippincott-Raven; 1998:585–608.
23. Cox DR. Regression models and life tables (with discussion). J Royal Stat Soc Ser B
24. Juul S, Karmaus W, Olsen J, and The European Infertility and Subfecundity Study Group. Regional differences in waiting time to pregnancy: pregnancy-based surveys from Denmark, France, Germany, Italy and Sweden. Hum Reprod
25. Karmaus W, Juul S. Infertility and subfecundity in population-based samples from Denmark, Germany, Italy, Poland and Spain. Eur J Public Health
26. Olsen J, Rachootin P. Invited commentary: monitoring fertility over time—if we do it, then let's do it right. Am J Epidemiol
27. Klerman LV. The intendedness of pregnancy: a concept in transition. Matern Child Health J
28. Tuntiseranee P, Olsen J, Chongsuvivatwong V, et al. Fecundity in Thai and European regions: results based on waiting time to pregnancy. Hum Reprod
29. Zhu JL, Hjollund NH, Boggild H, et al. Shift work and subfecundity: a causal link or an artefact? Occup Environ Med
30. Basso O, Juul S, Olsen J. Time to pregnancy as a correlate of fecundity: differential persistence in trying to become pregnant as a source of bias. Int J Epidemiol
31. Bonde JP, Ernst E, Kold Jensen TK, et al. Relation between semen quality and fertility: a population-based study of 430 first-pregnancy planners. Lancet
32. Slama R, Kold-Jensen T, Scheike T, et al. How would a decline in sperm concentration over time influence the probability of pregnancy? Epidemiology
33. Keiding N, Kvist K, Hartvig H, et al. Estimating time to pregnancy from current durations in a cross-sectional sample. Biostatistics
34. Olsen J, Andersen PK. We should monitor human fecundity, but how? A suggestion for a new method that may also be used to identify determinants of low fecundity. Epidemiology