Mental Health and Abortions among Young Women: Time-varying Unobserved Heterogeneity, Health Behaviors, and Risky Decisions

In this paper, we provide causal evidence on abortions and risky health behaviors as determinants of mental health development among young women. Using administrative in- and outpatient records from Sweden, we apply a novel grouped fixed-effects estimator proposed by Bonhomme and Manresa (2015) to allow for time-varying unobserved heterogeneity. We show that the positive association obtained from standard estimators shrinks to zero once we control for grouped time-varying unobserved heterogeneity. We estimate the group-specific profiles of unobserved heterogeneity, which reflect differences in unobserved risk to be diagnosed with a mental health condition. We then analyze mental health development and risky health behaviors other than unwanted pregnancies across groups. Our results suggest that these are determined by the same type of unobserved heterogeneity, which we attribute to the same unobserved process of decision-making. We develop and estimate a theoretical model of risky choices and mental health, in which mental health disparity across groups is generated by different degrees of self-control problems. Our findings imply that mental health concerns cannot be used to justify restrictive abortion policies. Moreover, potential self-control problems should be targeted as early as possible to combat future mental health consequences.


Introduction
In recent years, economists have increasingly paid attention to mental health problems and their consequences, especially when they occur during adolescence and young adulthood (Biasi et al., 2021; Cuddy and Currie, 2020). Mental health problems are often first diagnosed in early adulthood and are very pervasive, in particular among young women (see Eaton et al., 2008). In 2017, about 13-19% of young people aged 15-25 in the US experienced at least one major depressive episode (NIH, 2019). As pointed out by Currie (2020), mental health problems can reflect deficits in non-cognitive skills that are crucial for human capital development and labor market outcomes in adulthood. Thus, knowing the potential determinants of mental health problems is of first-order importance.
One possible determinant that is often discussed in connection with mental health problems is abortion. In the US, abortions among women aged 15-24 account for almost 40% of all abortions (Kortsmit et al., 2020). As pointed out by Reardon (2018), abortion is consistently associated with elevated rates of mental illness compared to women without a history of abortion. While there are different perspectives on how to interpret this association, there is hardly any evidence of a causal relationship.
Yet, in many countries, the association between abortion and mental health seems to be sufficient for politicians to justify restrictions on abortion access, such as waiting times, mandatory disclosures, or parental consent laws (Guttmacher Institute, 2020). This paper investigates the impact of having an abortion after an unwanted pregnancy on the incidence of mental health conditions in young women in Sweden. We use individual-level administrative panel data that include the universe of inpatient and outpatient contacts with the healthcare system, including general practitioners and specialists. While most studies on mental health rely on inpatient records or prescription drug data as a proxy for diagnoses, our records contain detailed information on mental health diagnoses and abortions, thus providing a comprehensive picture of the prevalence of mental health conditions and abortions from unwanted pregnancies in the population. Our primary measure of mental health is diagnoses of mood disorders, which mainly consist of depression diagnoses. We also analyze anxiety and fear-related disorders as important dimensions of mental health problems.
In the absence of any policy variation in abortion legislation, identifying a causal effect is challenging. Traditional estimators using within-person variation, such as event-study or individual-specific fixed-effects estimators, assume that individual unobserved heterogeneity is time-constant. In our application, this seems too restrictive, as it neglects that selection into abortion is dynamic. To address this issue, we use the grouped fixed-effects (henceforth GFE) estimator proposed by Bonhomme and Manresa (2015). The basic idea of the GFE estimator is that individuals who share similar unobserved characteristics are clustered in groups. Within these groups, unobserved heterogeneity can vary with age, with no further restrictions on the functional form of these unobserved heterogeneity trajectories.
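The difference between the two estimators can be illustrated with a minimal Python sketch (standard library only). It assumes a balanced panel, a single regressor x (for instance, an abortion indicator), no further covariates, and a known number of groups; all function names and the toy data are hypothetical. The GFE part follows, in simplified form, the iterative k-means-style algorithm described by Bonhomme and Manresa (2015): alternately assign each individual to the group whose time profile fits her best, then re-estimate the common coefficient and the group-by-period effects, restarting from several random initial values. It is not the estimation code used in this paper.

```python
import random


def within_estimator(y, x):
    """Individual fixed-effects (within) estimator of theta in y_it = theta * x_it + a_i + e_it.

    Demeaning person by person removes only time-constant heterogeneity a_i.
    """
    num = den = 0.0
    for yi, xi in zip(y, x):
        ybar, xbar = sum(yi) / len(yi), sum(xi) / len(xi)
        num += sum((xt - xbar) * (yt - ybar) for xt, yt in zip(xi, yi))
        den += sum((xt - xbar) ** 2 for xt in xi)
    return num / den


def ssr(yi, xi, theta, profile):
    """Sum of squared residuals of one person under a given group-specific time profile."""
    return sum((yt - theta * xt - at) ** 2 for yt, xt, at in zip(yi, xi, profile))


def update_step(y, x, groups, theta, alpha):
    """Given memberships, regress y on x and group-by-period dummies (via cell demeaning)."""
    cells = {}
    for i, g in enumerate(groups):
        for t, (yt, xt) in enumerate(zip(y[i], x[i])):
            cells.setdefault((g, t), []).append((yt, xt))
    means = {key: (sum(v[0] for v in vals) / len(vals), sum(v[1] for v in vals) / len(vals))
             for key, vals in cells.items()}
    num = den = 0.0
    for key, vals in cells.items():
        ybar, xbar = means[key]
        num += sum((xv - xbar) * (yv - ybar) for yv, xv in vals)
        den += sum((xv - xbar) ** 2 for _, xv in vals)
    if den > 0:  # theta is identified only from variation in x within group-period cells
        theta = num / den
    alpha = [row[:] for row in alpha]  # an empty group keeps its previous profile
    for (g, t), (ybar, xbar) in means.items():
        alpha[g][t] = ybar - theta * xbar
    return theta, alpha


def gfe(y, x, n_groups, n_starts=20, max_iter=100, seed=0):
    """Grouped fixed-effects estimator for y_it = theta * x_it + alpha_{g_i, t} + v_it.

    Alternates assignment and update steps from several random starts and keeps
    the start with the lowest sum of squared residuals.
    Returns (theta, alpha, groups, objective).
    """
    rng = random.Random(seed)
    n, periods = len(y), len(y[0])
    best = None
    for _ in range(n_starts):
        theta = rng.uniform(-1.0, 1.0)
        # Start each group's profile at the residual path of a randomly drawn person.
        alpha = [[y[i][t] - theta * x[i][t] for t in range(periods)]
                 for i in rng.sample(range(n), n_groups)]
        groups = None
        for _ in range(max_iter):
            # Assignment step: each person joins the group whose profile fits her best.
            new_groups = [min(range(n_groups), key=lambda g, i=i: ssr(y[i], x[i], theta, alpha[g]))
                          for i in range(n)]
            if new_groups == groups:
                break
            groups = new_groups
            # Update step: re-estimate theta and the group-specific time profiles.
            theta, alpha = update_step(y, x, groups, theta, alpha)
        objective = sum(ssr(y[i], x[i], theta, alpha[groups[i]]) for i in range(n))
        if best is None or objective < best[3]:
            best = (theta, alpha, groups, objective)
    return best


def simulate_panel(n_per_group=20, periods=8):
    """Hypothetical toy data (assumes periods >= 8): the true theta is 0; the high-risk
    group's profile rises steeply with age and its members have their single abortion late."""
    y, x, truth = [], [], []
    for i in range(2 * n_per_group):
        high = i >= n_per_group
        k = 4 + i % 4 if high else i % periods  # period in which the abortion occurs
        x.append([1.0 if t == k else 0.0 for t in range(periods)])
        y.append([1.0 * t if high else 0.05 * t for t in range(periods)])
        truth.append(int(high))
    return y, x, truth
```

On the simulated panel (purely illustrative and not calibrated to the Skåne registers), `within_estimator(*simulate_panel()[:2])` returns a clearly positive coefficient of about 1.13 even though the true effect is zero, because the high-risk group's rising profile coincides with its later abortions. `gfe(y, x, n_groups=2)` on the same data recovers a coefficient of zero and the two groups. Because the objective is not convex in the group assignments, the sketch keeps the best of several random starts; this is a design choice, not a guarantee of reaching the global minimum.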
We compare the results from the individual-specific fixed-effects (OLS FE) estimator and the GFE estimator. The estimated OLS FE coefficient for abortion is positive and highly statistically significant. By contrast, we estimate a precise zero effect of abortion on mental health diagnoses when using the GFE estimator. The significant difference between the estimated coefficients stresses the importance of accounting for time-varying unobserved heterogeneity in addition to individual-specific time-constant fixed effects. We also compare the identifying assumptions of the difference-in-differences (DiD) estimator under random treatment assignment with those of the GFE estimator, showing that the assumptions are not nested. Thus, the choice of estimator depends on the particular application. Our estimated unobserved heterogeneity profiles would violate the parallel trends assumption of the DiD even with randomized treatment assignment, so that the DiD would fail to identify a causal effect (see Section 4.3).
Since the GFE estimator is a fixed-effects estimator, we rely on within-person comparisons to estimate causal effects. This implies that our estimates answer the question of how much a variable of interest affects the outcome trajectory of an individual. In our case, we estimate the effect of the joint event of an unwanted pregnancy followed by an abortion.
Because our estimated effect is close to zero, we can reasonably conclude that this adverse life event does not change the mental health trajectory of an affected woman. It implies that, in the counterfactual where a woman is denied an abortion, we would expect her mental health to deteriorate, unless we were willing to assume that continuing the unwanted pregnancy would improve her mental health trajectory. Thus, an abortion can offset the (potentially) adverse life event of an unwanted pregnancy as if it had never happened. Alternatively, an unwanted pregnancy could be a neutral event in terms of mental health costs. In that case, abortion restrictions would not affect mental health; given the other costs of denying an abortion documented in the literature, such restrictions would then have detrimental effects without improving mental health.
The GFE estimator requires the researcher to select the number of groups of time-varying unobserved heterogeneity. We employ several performance measures to select the number of groups and choose the GFE estimator with two groups as our main specification. The estimated unobserved mental health profiles differ considerably across groups in both level and slope. While most young women share a relatively flat age profile of unobserved heterogeneity, about 6% exhibit a profile that increases steeply with age. We interpret the profiles as the age-dependent, unobserved risk of developing mental health problems. This implies that the majority of women exhibit a low unobserved mental health risk as they age. By contrast, a small but significant share of women has a low mental health risk at age 16 that increases sharply as these women age.
To investigate the robustness of our main specification, we discuss alternative dynamic processes and implement several alternative estimators. We find no evidence of reverse causality or dynamic abortion effects. Moreover, we instrument abortion decisions with miscarriages and find that abortions have no detrimental mental health effects.
We next address the question of which factors are potentially picked up by the profiles of unobserved mental health risk. Since abortions from unwanted pregnancies are primarily the result of a woman's decision to engage in unprotected sex, we link mental health and abortions to other risky health behaviors observable in our data, i.e., chlamydia infections, STD screenings, and alcohol intoxication. The correlation between these observed behaviors and abortion is substantial, but controlling for them does not alter the point estimates of abortion. Moreover, the estimated coefficients of these other behaviors exhibit a similar pattern as the abortion coefficients across all considered specifications. Finally, we show that the estimated unobserved mental health risk profiles are strongly correlated with these behaviors. Overall, these results suggest that risky health behaviors are outcomes of the same choice process as abortion rather than omitted control variables.
We propose a model of inter-temporal choices and mental health to understand how dynamic decision-making may lead to diverging unobserved heterogeneity profiles. As discussed by O'Donoghue and Rabin (2001), adolescents may engage in unprotected sexual activities because they place a much higher weight on immediate gratification than on the considerable costs they may face in the future. We thus model women's time preferences as quasi-hyperbolic to induce self-control problems. We link the model to our empirical results by allowing for two groups of women who vary by the degree of present bias. This leads to different trade-offs, decisions, and a different evolution of risky behaviors and mental health. The estimated parameters indicate significant heterogeneity in the present bias across groups, resulting in different mental health trajectories.
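The model itself is presented in Section 6. As a minimal illustration only, the sketch below shows quasi-hyperbolic (β-δ) discounting, under which a payoff t ≥ 1 periods ahead is weighted by β·δ^t, so that β < 1 captures present bias. The function names and the parameter values in the usage note are hypothetical and are not the paper's estimates.

```python
def qh_weights(beta, delta, horizon):
    """Quasi-hyperbolic discount weights: 1 for today, beta * delta**t for t = 1..horizon."""
    return [1.0] + [beta * delta ** t for t in range(1, horizon + 1)]


def acts_today(benefit, cost, delay, beta, delta):
    """Choice made in the moment: immediate benefit vs. a cost incurred `delay` >= 1 periods later."""
    return benefit > beta * delta ** delay * cost


def plans_to_act(benefit, cost, delay, lead, beta, delta):
    """The same choice evaluated `lead` >= 1 periods in advance: both terms carry beta,
    so beta cancels and the plan coincides with that of an exponential discounter."""
    return beta * delta ** lead * benefit > beta * delta ** (lead + delay) * cost
```

With a benefit of 1 today, a cost of 2 one period later, and δ = 0.95, a woman with β = 0.4 takes the risky act when the choice arises (0.4 × 0.95 × 2 = 0.76 < 1), although one period earlier she would have planned to avoid it, because β then cancels from both sides (0.95 × 1 < 0.95² × 2). With β = 1, plan and action coincide. This preference reversal is the kind of self-control problem described above.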
Many studies have investigated fertility and economic outcomes of abortion (e.g., Currie et al., 1996; Gruber et al., 1999; Pop-Eleches, 2006; Ananat et al., 2007; Ananat et al., 2009; Myers, 2017). Nevertheless, mental health consequences have been understudied by economists. The medical literature has reached mixed conclusions on whether an abortion negatively affects mental health. To a large extent, these inconclusive results can be attributed to the methodological challenges of a difficult-to-study subject. Randomized controlled trials are not ethically feasible. Survey data often suffer from non-classical measurement error, under-reporting, and recall bias in the presence of stigma. Individual-level data are rarely available, even in countries where administrative data are widely used. Reardon (2018) provides a detailed discussion of the medical literature on abortion and mental health. Biggs et al. (2020) show that in the US, perceived abortion stigma at baseline is associated with higher self-reported psychological distress five years after an abortion.
Two medical studies address some methodological issues using an event-study design and Danish healthcare registers. Munk-Olsen et al. (2011) find no evidence of an increased risk of mental disorders after a first-trimester induced abortion. Steinberg et al. (2018) show that women who had a first-trimester induced abortion have higher rates of antidepressant use. Event-study approaches have the disadvantage of failing to identify key components of the model (Borusyak and Jaravel, 2017) and cannot account for time-varying unobserved heterogeneity. Thus, a causal interpretation is unlikely to be valid.
An innovative approach to quantifying the effect of abortion denial on women's lives is the Turnaway Study. It collects individual longitudinal information on women who received an abortion and women who were denied an abortion due to ineligibility based on cut-off dates in the US. The study followed women for five years after the initial abortion encounter to collect information about health, well-being, education, and labor market outcomes (Miller et al., 2020b). With these data, Biggs et al. (2017) find no effect of abortion on depression. However, there are two potential concerns with the Turnaway Study. First, the treatment and control groups differ substantially in their observable characteristics, which raises concerns about potential differences in unobservables and endogenous selection into treatment and control groups. Second, the sample size is very small, so power is an issue: effects would need to be very large to be detected. At least in Biggs et al. (2017), this leads to very wide confidence intervals and inconclusive results.
In economics, studies analyzing abortion effects typically exploit changes in legislation for identification and focus on the US (see, for instance, Ananat et al., 2007; Currie et al., 1996; Fischer et al., 2018; Gruber et al., 1999; Lindo et al., 2020; Miller et al., 2020a; Steingrimsdottir, 2016). Myers (2017) uses state-level variation in access to the contraceptive pill and abortions to estimate the impact on fertility and marriage. She shows that while legalizing the pill for minors does not significantly affect these outcomes, abortion legalization had a considerable impact. Only a few studies have looked at changes in abortion legislation outside the US (Mølland, 2016; Pop-Eleches, 2006). Clarke and Mühlrad (2021) examine the effect of abortion on health in Mexico, with mental health as a secondary outcome. Exploiting both progressive and regressive changes in abortion legislation, they show that the initial legalization resulted in a sharp decline in maternal morbidity but find no effect on mental health in either direction. However, the study uses inpatient postpartum depression as the only measure of mental health, limiting the scope of its results. A common limitation of the studies discussed above is that changes in legislation might be intertwined with changes in stigma, thus potentially violating the identifying assumption of the DiD estimation strategy. This may be particularly important when mental health is the outcome of interest (see Biggs et al., 2020).
We complement this economics literature in several ways. Our analysis uses administrative records covering all women in the region of Skåne over ten years. Hence, we observe all abortions from unwanted pregnancies and all mental health diagnoses at the individual level. Our identification strategy does not rely on state- or cohort-level variation in abortion legalization, as the Swedish abortion policy has not changed since the early 1970s.
Instead, we deal with unobserved heterogeneity in the abortion decision using a novel estimator, the GFE estimator, which allows for time-varying unobserved heterogeneity within groups of individuals (Bonhomme and Manresa, 2015). Our analysis is carried out in Sweden, a country with virtually no restrictions on abortion or contraception, which minimizes the potentially confounding effects of abortion stigma on mental health. The joint analysis of abortions and other risky health behaviors highlights the importance of accounting for dynamic unobserved heterogeneity. In particular, we show that it is not sufficient to control for other behaviors in conventional individual fixed-effects models, as they may be driven by a similar underlying decision-making process as abortion decisions.
Our theoretical model shows that heterogeneity in the degree of present bias is sufficient to explain heterogeneity in mental health trajectories. The use of non-standard time preferences is motivated by a large literature in behavioral and health economics (for comprehensive reviews, see Cawley and Ruhm (2011) in health economics, and Gruber (2001) and Frederick et al. (2002) in behavioral economics). Gruber and Köszegi (2001) is an early, highly influential paper showing that time-inconsistent preferences can rationalize risky health behaviors in economic models. Among adolescents, present-biased preferences have been analyzed in the context of smoking or alcohol consumption (Sutter et al., 2013) and risky sexual behavior (Chesson et al., 2006). Our theoretical model combines these insights and links them to results from a novel econometric estimation approach to illustrate the evolution of mental health among young women.
Finally, our study adds to a growing literature on the relationship between preferences, non-cognitive skills, and mental health. As pointed out by Currie (2020), mental health issues are an important determinant of human capital development as they reflect deficits in non-cognitive skills. Heckman et al. (2006) show that non-cognitive skills play a substantial role in explaining adolescents' decisions to engage in risky behavior, such as marijuana use or illegal activities. Studying the relationship between time-inconsistent preferences, non-cognitive skills, and depression, Cobb-Clark et al. (2020) show that self-control problems are strongly correlated with non-cognitive skills such as an internal locus of control and partly explain the depression gap in risky health behaviors among adults. Borghans et al. (2008) discuss how to incorporate preferences and personality traits in economic models. While we cannot incorporate a link between non-cognitive skills and present-biased preferences, our theoretical model illustrates how mental health develops as a consequence of dynamic decisions under preference heterogeneity.
Our work has several implications. First, the precisely estimated null effect indicates that an abortion from an unintended pregnancy has no detrimental effect on mental health. Thus, mental health concerns cannot justify policies that impose restrictions on abortions. On the contrary, such policies may even have unintended negative consequences if they lead to stronger political and social stigmatization of abortions (see, e.g., Biggs et al., 2020). Second, restricting abortion access seems inadvisable: there is previous evidence of adverse economic consequences of restrictive abortion policies (see, for instance, Felkey and Lybecker, 2018; Lindo and Pineda-Torres, 2021; Miller et al., 2020a,b). Our null results imply that unrestricted access to abortion does not lead to additional mental health costs. Taken together, restrictive abortion policies are thus unlikely to be welfare-enhancing. Third, the substantial differences in the estimated unobserved heterogeneity profiles between high-risk and low-risk women imply that general mental health screenings are unlikely to be very effective tools for combating mental illness in adolescents. Instead, interventions should target high-risk women at younger ages, using tools similar to those in Alan and Ertac (2018), to reduce self-control problems and the likelihood of developing severe mental illnesses. Aizer (2017) discusses different approaches to reducing self-control problems among adolescents; based on a model of skill formation, she argues that, to be effective, programs should be implemented at preschool age, as this allows controlling the environment that interacts with such investments. By intervening early, one may not only keep direct medical costs low but also reduce the indirect costs of mental health disorders, such as lower educational attainment and lower earnings (Biasi et al., 2021; Currie et al., 2010; Fletcher, 2010).
The paper is organized as follows. Section 2 outlines the Swedish health care system and the history of abortion in Sweden. Section 3 describes the data and our measures of mental health and abortion. Section 4 introduces our empirical strategy, and Section 5 discusses our results and associated robustness checks. The theoretical model is presented in Section 6. Section 7 concludes.

The Swedish health care system
In Sweden, health care is primarily public and organized at the regional level. Within a region (e.g., Skåne), different municipalities have different health care centers (or primary care units) that house all out-patient care. Here, "out-patient" refers to all contacts with care providers that do not include at least one night's stay, i.e., all ambulatory care, such as visits to physicians, emergency care, nurses, or physiotherapists. It also covers consultations by telephone. Typically, a small municipality has only one health care center, while larger cities have multiple centers. "In-patient" care, as opposed to out-patient care, refers to visits to health centers or hospitals that include at least one night's stay.
Every individual is assigned to one health care center, usually the nearest one. When necessary, an individual goes to the center and is helped by the next available health care worker. There is no path dependence in the identity of the health care worker across consecutive contacts: individuals are dealt with sequentially by the first available health care worker on a given day. The health care system is funded through a proportional regional income tax. Health care is free of charge, except for a small deductible capped at 900 SEK (about 117 USD) per year during our observation period.

Abortions in Sweden
In Sweden, abortions were first legalized by the Abortion Act of 1938, which guaranteed access in limited cases. The act states that pregnancies may be terminated if the child's birth threatens the mother's life or health or if the child is expected to have severe malformations or mental deficiencies (Glass, 1938). The current version of the abortion act took effect in January 1975. It grants access to abortion on request until week 18 without any restrictions. Importantly, minors do not require parental consent to receive an abortion (Socialstyrelsen Sweden, 2010). Thus, the decision to terminate a pregnancy is made solely by the pregnant woman, regardless of her age. In 1992, Sweden approved the "abortion pill" (mifepristone), which allows terminating a pregnancy at an early stage (at most 49-56 days after conception) without a hospital stay (Jones and Henshaw, 2002). Between weeks 9 and 13, abortions are performed surgically. After week 13, an overnight stay at the hospital is required. Since the mid-1990s, the emergency contraceptive pill (ECP), also known as the morning-after pill, has been available. In 2001, the ECP was approved for over-the-counter (OTC) purchase (Guleria et al., 2020). Figure B.1 in Appendix B shows the aggregate time trends in abortions by gestation week and age for Skåne and the whole of Sweden. There is a trend to substitute later abortions (weeks 9-11) with earlier abortions (before week 9), regardless of age. Moreover, there is no discernible discontinuity around the date of OTC availability.
In 1999, 26.3 per 1,000 women aged 20-24 and 19.0 per 1,000 women aged 19 and below had an abortion. These numbers increased over time to 34.7 and 24.4 abortions per 1,000 women in these age groups, respectively. In Skåne, the numbers are slightly lower, but with 33.9 and 22.3 abortions per 1,000 women, they are still very high (Socialstyrelsen Sweden, 2020), particularly compared with other developed countries (Haegele, 2005).
Figure 1 compares abortion rates and alternative birth outcomes among adolescents in Sweden with those in the US, a country in which access to abortion is more restricted in practice. Abortion rates are much higher in Sweden. However, teenage birth and miscarriage rates in Sweden are only about 15% and 20%, respectively, of those in the US. In Sweden, teenage women rarely bear children. According to Lager et al. (2012), approximately six children were born per 1,000 young women aged 15-19. In 2009, 80% of all pregnant women aged 15-19 and 41% of all pregnant women aged 20-24 opted for an abortion. These numbers are similar to our sample statistics: among all 16- to 19-year-old pregnant women, 76% opted for an abortion, 19% gave birth, and 5% had a miscarriage.
What would we expect from restricting access to abortions in Sweden? According to the literature, abortions could be substituted by increased birth rates, abstinence, or higher contraceptive use. Fischer et al. (2018) show that proclivities for risky sexual behavior are not very sensitive to restrictive abortion policies, at least not among adolescents in the US. This is in line with the finding that abstinence-only sex education programs are not effective in increasing abstinence (Santelli et al., 2017) or reducing birth rates (Kearney and Levine, 2015). Substituting abortions with higher contraceptive use is also unlikely, at least in Sweden, where contraception is widely available and easily accessible. Sydsjö et al. (2014) find no evidence that increased contraceptive use is associated with lower rates of induced abortions. Thus, introducing abortion restrictions in Sweden would most likely lead to an increase in teenage birth rates, all else being equal.
Abortion access may determine not only pregnancy outcomes but also the level of abortion stigma. Abortion stigma can be generated through negative judgments of the social environment and structurally through restrictive abortion policies of governments and institutions, which can increase social stigmatization. Biggs et al. (2020) show that abortions are associated with stigma, which increases psychological distress in women.
Restricting access to abortions may increase stigma and mental health problems in women seeking an abortion but leave the abortion effect itself unchanged. Thus, in a country with a very restrictive abortion policy and strong stigma, increasing abortion access without reducing stigma may not immediately lead to the desired reduction in mental health problems. Instead, such policies could even increase mental disorders in the short run.

Description of different data registers
Our empirical analysis is based on combined register data for Skåne, the third most populous and southernmost region in Sweden. It consists of individual-level longitudinal records from the intergenerational register, the Skåne inhabitant register, the income tax register, and the in- and out-patient registers. The in- and out-patient registers are from the "patient administrative register systems" administered by the Regional Council of Skåne. A unique feature of our data is the detailed record of all occurrences of in-patient and out-patient care for all inhabitants of the region. The registers have previously been used by Tertilt and van den Berg (2015), Nilsson and Paul (2018), and van den Berg and Siflinger (2021). The health care registers are collected to determine the monetary flows from the region to the health care centers and hospitals.
In Sweden, each individual has a unique identifier that is used to record all contacts with the health care system, the general public administration, tax boards, employment offices, and other public agencies. We use the identifier to merge the health care registers with the LISA dataset, which combines several other registers. LISA covers all persons born in Sweden between 1940 and 1985, their parents, and all their children (Meghir and Palme, 2005). For individuals aged 16 and above, LISA provides a rich set of annual socioeconomic information, such as employment status, income by type, level of education, or marital status. Further, the intergenerational register allows linking individuals to their children and parents. The merged dataset contains about 1 million individuals, which is the vast majority of inhabitants of Skåne in 1999-2008. From these data, we construct an annual panel data set that comprises all women born between 1983 and 1985 and living in Skåne in 1999-2008. We select these birth cohorts to guarantee that we observe every woman at all ages from 16 to 23.

Diagnosis variables & abortions
We define individual measures for mental health and abortions using ICD-10 diagnosis codes. Chapter 5 of the ICD-10 catalog comprises diagnosis codes for mental and behavioral disorders. The chapter is divided into 11 sub-chapters that classify diagnoses into, e.g., organic mental disorders, schizophrenia, affective and somatoform disorders, and behavioral or developmental disorders. Our main outcome of interest is the diagnosis of mental health problems, which we measure by diagnoses of mood disorders. Figure 2(a) shows the incidence of our mental health diagnoses per 1,000 women by age and birth cohort. Diagnoses are relatively rare at age 16, with about 2-4 diagnoses per 1,000 women in these birth cohorts. From age 17, the numbers steadily increase to about 30 diagnoses per 1,000 women at age 23. Trends are similar across the three cohorts.
In the subsequent analysis, we define mental health problems as an absorbing state (cumulative): once a woman is diagnosed with a mental disorder, she is classified as ill for the remaining observation period. This is motivated by the medical literature, which has shown that an episode of mood disorder, e.g., a depressive episode, among adolescents can last between a few months and several years (Eaton et al., 2008). While short-term recovery rates are high, recurrence rates increase after 1-2 years, up to more than 50% in the long run (see, e.g., Curry et al., 2011).
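A minimal sketch of this absorbing-state coding, assuming one 0/1 diagnosis indicator per year of age, ordered from 16 to 23 (the function name is hypothetical): the running maximum switches the indicator on at the first diagnosis and keeps it on thereafter.

```python
from itertools import accumulate


def absorbing_state(yearly_diagnosis):
    """Cumulative (absorbing) illness indicator: 1 from the first diagnosis onwards."""
    return list(accumulate(yearly_diagnosis, max))


# A woman diagnosed at ages 18 and 21 (positions 2 and 5 when ages run from 16 to 23):
print(absorbing_state([0, 0, 1, 0, 0, 1, 0, 0]))  # [0, 0, 1, 1, 1, 1, 1, 1]
```

The second diagnosis at age 21 does not change the coded state, which is the intended consequence of treating illness as absorbing.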
To measure abortions, we use pregnancy-related ICD-10 diagnosis codes. The codes O00-O08 refer to pregnancies with abortive outcomes. The code O04 defines induced medical abortions. These can be surgical or pharmaceutical abortions as well as voluntary and medically indicated terminations of pregnancy; they can be complete or incomplete, with or without complications, and we do not distinguish between them. The code Z64.0 defines an unwanted pregnancy. It includes women who later have an abortion, women who carry the pregnancy to term, and women who have a spontaneous abortion. We combine these two codes to define our measure of abortion as a medical abortion from an unwanted pregnancy. Figure 2(b) shows the incidence of abortions per 1,000 women by age and birth cohort. The cohorts exhibit similar trends in abortion rates. The rates increase sharply between ages 16 and 18 but remain roughly constant at later ages. The numbers in Figure 2(b) correspond to those reported by Socialstyrelsen Sweden (2020). Table 1 shows the descriptive statistics for all variables used in the empirical analysis.
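The exact rule for linking the two codes is not spelled out here. The sketch below assumes, as a labeled simplification, that a woman-year counts as an abortion from an unwanted pregnancy when both an O04 code and Z64.0 are recorded for her in the same year, and that codes may appear with or without a dot. Both function names are hypothetical.

```python
def normalise_icd10(code):
    """Harmonise ICD-10 codes written with or without a dot, e.g. 'O04.9' -> 'O049'."""
    return code.replace(".", "").strip().upper()


def abortion_from_unwanted_pregnancy(codes):
    """Assumed rule: a woman-year has both an induced abortion (O04*) and an unwanted pregnancy (Z64.0)."""
    cleaned = {normalise_icd10(c) for c in codes}
    induced = any(c.startswith("O04") for c in cleaned)
    unwanted = any(c.startswith("Z640") for c in cleaned)
    return induced and unwanted
```

Under this rule, a year with a miscarriage code (O03) and Z64.0 is not flagged, consistent with keeping miscarriages out of the main abortion measure.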
Our sample comprises 20,703 women aged 16-23, with an average age of 19.5 years. Women are, on average, born in 1984, which is consistent with our three birth cohorts being of similar size. As expected for such a young sample, most women are single, about 20% are employed, and less than 30% hold a college degree. The annual rate of abortions is about 2%, and the annual incidence of mental health problems is about 1.6%. In total, 10.6% of women had an abortion, and 6.5% had mental health problems during ages 16-23. Since our main estimation strategy requires a balanced panel, we construct two censoring indicators: one to flag missing observation periods and one to flag missing values. This balancing procedure leads to a final sample of N × T = 165,624 observations.
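With 20,703 women observed at eight ages (16-23), the balanced panel has N × T = 20,703 × 8 = 165,624 rows, matching the sample size above. The sketch below only illustrates how such a panel with the two censoring indicators could be built; how the indicators enter the estimation is not described here, and the record layout, the choice to flag missing values only in observed periods, and all names are hypothetical.

```python
AGES = range(16, 24)  # ages 16-23: T = 8 periods per woman


def balance_panel(records, person_ids, variables):
    """Expand person-age records into a balanced panel with two censoring indicators.

    records maps (person_id, age) -> {variable: value}; a value of None marks a missing value.
    """
    panel = []
    for pid in person_ids:
        for age in AGES:
            row = records.get((pid, age))
            missing_period = row is None  # woman not observed at this age
            values = row if row is not None else {}
            # Assumption: missing values are flagged only for periods that are observed.
            missing_value = (not missing_period) and any(values.get(v) is None for v in variables)
            panel.append({"id": pid, "age": age,
                          **{v: values.get(v) for v in variables},
                          "missing_period": int(missing_period),
                          "missing_value": int(missing_value)})
    return panel


print(20703 * len(AGES))  # 165624
```

Every woman contributes exactly `len(AGES)` rows regardless of how many periods she is actually observed, which is what makes the panel balanced.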
We also compare the incidence rates of mental health diagnoses by (non-)abortive outcomes. Figure 3 shows the fraction of women who were ever diagnosed with mental health problems among women who had an abortion after an unwanted pregnancy, experienced a miscarriage, or never had any abortion. Women with abortions are about twice as likely to be diagnosed with mental health problems as women without abortions. Women with a miscarriage have the highest incidence of mental health problems. Figure 3 suggests that there is a relationship between abortions and mental health. This relationship is the topic of the following sections.
The ICD-10 also codes spontaneous abortions (miscarriages) under O03. Miscarriages are not the focus of our main analysis but are used in a complementary analysis, see Subsection 5.3.1. Figure B.3 in Appendix B plots the number of abortions after an unwanted pregnancy per woman in our age group. Among women with an abortion, about 82% have one abortion between ages 16 and 23, about 14% have two, and about 3% have three; less than 1% undergo four or more abortions in this age group.
Our measure of mental health does not comprise mental health issues without an official diagnosis. For our analysis to be valid, we must assume that women who have an abortion are not systematically more underdiagnosed than other women. Our descriptive evidence does not indicate such an issue.

Underreporting of mental health issues
Our health records capture the universe of inpatient and outpatient contacts with the healthcare system and all diagnoses made by health care professionals. This provides us with comprehensive data on all diagnosed mental disorders in the region of Skåne. However, our data do not capture women who suffer from mild, non-clinical forms of mental disorders or who do not seek care. Thus, we may face underreporting of mild cases, which may lead us to underestimate the impact of abortions on mental health problems.
We first validate our health records against survey information. Since we do not have survey data on depression, we compare the incidence of diagnoses of anxiety and fear-related disorders with self-reported anxiety and fear-related symptoms from the survey of living conditions (ULF) in 2008. ULF asks respondents whether they have experienced problems or symptoms of anxiety or fear. If a respondent answers yes, she is asked whether the problems are mild or severe. In 2008, 27% of women aged 16-24 reported symptoms of anxiety or fear in the survey, and 7% had experienced severe symptoms. In our sample, 10% of women aged 16-23 were diagnosed with anxiety or fear-related disorders. Thus, our records may also capture less severe cases of anxiety. (ULF data can be accessed from https://www.scb.se/hitta-statistik/statistik-efter-amne/levnadsforhallanden/levnadsforhallanden/undersokningarna-av-levnadsforhallanden-ulf-silc/pong/tabell-och-diagram/halsa/halsa--fler-indikatorer/.)
ULF uses a single question to assess anxiety and fear-related problems, which may not reliably capture the complex nature of anxiety disorders (see, for instance, Turon et al., 2019). More reliable information about mental disorders is obtained from self-administered screening tools or diagnostic interviews. Olsson and von Knorring (1997) and Olsson and Von Knorring (1999) investigated the prevalence of depression among 16-17-year-old high school students in the Swedish city of Uppsala. Depending on the screening tools and cut-off values used, between 9 and 16 percent of women had depressive symptoms. Using diagnostic interviews, the lifetime depression prevalence among young women was 11.5%. Screening 13-20-year-old youths in a clinical youth center at the university hospital in Uppsala in 2006, Kristjánsdóttir et al. (2011) find that about one-third of young women screened positive for at least mild depression, and 12% for at least moderate depression. In our data, about 8% of women have received a depression diagnosis at ages 16-23, which is only slightly lower than what was found with screening tools and diagnostic interviews. Thus, while our data may not capture all potential cases and non-cases of depression, they do not seem to suffer from severe underreporting of mental disorders.

Empirical strategy
In this section, we present the empirical strategy to estimate the causal effect of abortion on mental health. We discuss the shortcomings of established linear methods, such as individual fixed-effects models, and introduce the grouped fixed-effects (GFE) estimator to overcome potential identification issues. We then discuss the identifying assumptions of the GFE estimator and compare them with those of the difference-in-differences (DiD) approach, one of the most popular methods for causal inference in applied microeconomics.

A linear model that links abortion and mental health diagnosis is
$$y_{it} = \beta a_{it} + x_{it}'\gamma + \alpha_{it} + \varepsilon_{it}, \qquad (1)$$
where $y_{it}$ is a mental health diagnosis of woman $i$ at age $t$, $a_{it}$ indicates an abortion from an unwanted pregnancy, and $x_{it}$ comprises covariates for the woman and her parents. $\varepsilon_{it}$ is an idiosyncratic error term with $E[\varepsilon_{it}] = 0$ and $\mathrm{Cov}(x_{it}, \varepsilon_{it}) = 0$.
$\alpha_{it}$ is an unobserved individual-specific fixed-effect that varies across age. The parameter of interest is $\beta$, capturing the association between an abortion from an unwanted pregnancy, $a_{it}$, and a mental health diagnosis, $y_{it}$.
Under the assumption that $\alpha_{i0} = \alpha_{i1} = \dots = \alpha_{iT}$ for all $i = 1, \dots, N$, i.e., individual unobserved heterogeneity is time constant, $\beta$ in Equation (1) can be consistently estimated with a standard model with individual-specific, time-constant fixed-effects. Here, time-constant unobserved heterogeneity implies that decisions affecting both mental health development and abortion probabilities are independent over time. If this assumption is violated, the estimates of $\beta$ are biased. In our application, it seems plausible that $\alpha_{it}$ is dynamic: abortions from an unwanted pregnancy are outcomes of decisions that depend on past decisions and are determined by preferences. Thus, selection into abortions is likely dynamic, and a standard fixed-effects model fails to estimate the causal effect of abortion on mental health.
Formally, for two time periods $t = 0, 1$ and two individuals $i$ and $j$ with $\alpha_{i0} > \alpha_{j0}$, dynamic heterogeneity can imply, for instance, $\alpha_{i0} - \alpha_{j0} < \alpha_{i1} - \alpha_{j1}$, i.e., the gap between the two women widens over time. In general, a time-varying $\alpha_{it}$ is indistinguishable from $\varepsilon_{it}$ without further assumptions.
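A small simulation sketch illustrates why a standard fixed-effects model fails when $\alpha_{it}$ is dynamic. All numbers are hypothetical: a 30% "high-risk" type whose heterogeneity rises linearly with age, an abortion probability that increases with $\alpha_{it}$, and a true abortion effect of zero. The "oracle" regression partials out type-by-age effects using the true types, which the GFE estimator instead has to learn from the data.

```python
import numpy as np

# Hypothetical DGP: true abortion effect is zero, but a high-risk type has
# unobserved heterogeneity alpha_it that rises with age and also raises the
# abortion probability (dynamic selection).
rng = np.random.default_rng(0)
N, T = 5000, 8
beta_true = 0.0
t = np.arange(T)

high = rng.random(N) < 0.3                                # 30% high-risk type
alpha = np.where(high[:, None], 0.05 * t[None, :], 0.0)   # (N, T) time-varying heterogeneity
eta = rng.normal(0.0, 0.05, N)                            # time-constant individual effect
p_abort = 0.01 + 0.3 * alpha                              # selection on alpha_it
a = (rng.random((N, T)) < p_abort).astype(float)
y = beta_true * a + alpha + eta[:, None] + rng.normal(0.0, 0.1, (N, T))


def within(z):
    """Subtract each individual's time average (individual fixed effects)."""
    return z - z.mean(axis=1, keepdims=True)


def remove_type_age_means(z):
    """Subtract type-by-age means, i.e. partial out oracle group-by-age effects."""
    out = z.copy()
    for g in (False, True):
        out[high == g] -= z[high == g].mean(axis=0)
    return out


def slope(u, v):
    """OLS slope without intercept (both variables are already demeaned)."""
    u, v = u.ravel(), v.ravel()
    return (u @ v) / (u @ u)


beta_fe = slope(within(a), within(y))
beta_oracle = slope(within(remove_type_age_means(a)), within(remove_type_age_means(y)))
print(f"individual FE: {beta_fe:.3f}, FE + type-by-age effects: {beta_oracle:.3f}")
```

In this setup the individual fixed-effects slope is clearly positive although the true effect is zero, while the regression that also absorbs type-by-age effects is close to zero.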

Time-varying grouped fixed-effects estimator (GFE)
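One solution to the problem described above is proposed by Bonhomme and Manresa (2015), who suggest clustering individuals with similar unobserved characteristics into a finite number of groups. This implies that women belonging to the same group share the same age profile of unobserved heterogeneity:
$$y_{it} = \beta a_{it} + x_{it}'\gamma + \alpha_{g_i t} + \varepsilon_{it}, \qquad (2)$$
where $\alpha_{g_i t}$ represents the time-varying, group-specific unobserved heterogeneity term for groups $g_i \in \{1, \dots, G\}$. The error term may contain an individual-specific, time-constant fixed-effect $\eta_i$, such that $\varepsilon_{it} = \eta_i + v_{it}$ with $E[v_{it} \mid a_{it}, x_{it}] = 0$. The GFE estimator is defined as the solution to
$$(\hat\beta, \hat\gamma, \hat\alpha) = \arg\min_{\beta, \gamma, \alpha} \sum_{i} \sum_{t} \left(y_{it} - \beta a_{it} - x_{it}'\gamma - \alpha_{\hat g_i(\beta, \gamma, \alpha)\, t}\right)^2, \qquad (3)$$
where $\hat g_i(\beta, \gamma, \alpha)$ is the optimal group assignment determined by
$$\hat g_i(\beta, \gamma, \alpha) = \arg\min_{g \in \{1, \dots, G\}} \sum_{t} \left(y_{it} - \beta a_{it} - x_{it}'\gamma - \alpha_{gt}\right)^2. \qquad (4)$$
For a given number of groups $G$, the estimator assigns individuals to groups via clustering and estimates the coefficients as well as the group profiles in an iterative procedure. Standard errors are clustered at the individual level and obtained from analytical expressions in Bonhomme and Manresa (2015).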
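The iterative procedure can be sketched as a k-means-like alternation between an update step (regression with group-by-age effects, given the groups) and an assignment step (each woman joins the closest group profile), in the spirit of the iterative algorithm described in Bonhomme and Manresa (2015). This is a minimal NumPy sketch of our own, not the code used in the paper: it uses random initial assignments with several restarts, omits the time-constant fixed effect and the analytical standard errors, and the function names and demo data are hypothetical.

```python
import numpy as np


def _demean_by_group_time(z, groups, G):
    """Subtract group-by-period means; z has shape (N, T) or (N, T, K)."""
    out = z.astype(float)
    for g in range(G):
        m = groups == g
        if m.any():
            out[m] -= z[m].mean(axis=0)
    return out


def gfe(y, X, G, n_starts=10, max_iter=100, seed=0):
    """Grouped fixed-effects by alternating minimisation with random restarts.

    y: (N, T) outcomes; X: (N, T, K) regressors (e.g. abortion and covariates).
    Returns (theta, alpha, groups, objective) for the start with the lowest objective.
    """
    N, T, K = X.shape
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        groups = rng.integers(G, size=N)              # random initial assignment
        alpha = np.zeros((G, T))
        for _ in range(max_iter):
            # Update step: OLS of y on X with group-by-period effects, via
            # Frisch-Waugh-Lovell (partial out group-by-period means first).
            yd = _demean_by_group_time(y, groups, G)
            Xd = _demean_by_group_time(X, groups, G)
            theta, *_ = np.linalg.lstsq(Xd.reshape(-1, K), yd.ravel(), rcond=None)
            resid = y - X @ theta
            for g in range(G):
                m = groups == g
                if m.any():
                    alpha[g] = resid[m].mean(axis=0)
            # Assignment step: each individual joins the closest group profile.
            dist = ((resid[:, None, :] - alpha[None, :, :]) ** 2).sum(axis=2)
            new_groups = dist.argmin(axis=1)
            if np.array_equal(new_groups, groups):
                break
            groups = new_groups
        objective = dist.min(axis=1).sum()
        if best is None or objective < best[3]:
            best = (theta, alpha.copy(), groups.copy(), objective)
    return best


# Usage on hypothetical simulated data with two well-separated profiles.
rng = np.random.default_rng(1)
N, T = 600, 8
true_groups = rng.integers(2, size=N)
true_alpha = np.vstack([np.full(T, 0.1), np.linspace(0.0, 0.6, T)])
X = rng.normal(size=(N, T, 2))
theta_true = np.array([0.0, 0.5])
y = X @ theta_true + true_alpha[true_groups] + rng.normal(0.0, 0.1, (N, T))

theta_hat, alpha_hat, groups_hat, obj = gfe(y, X, G=2)
print(theta_hat.round(3))
```

Group labels are only identified up to permutation, so recovered groups should be compared with the true ones allowing for relabeling. Keeping the best of several random starts is one simple response to the sensitivity to initial values discussed in the text.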
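In our main specification, we also account for individual-specific, time-constant unobserved heterogeneity by applying time demeaning. Thus, the solution is given as
$$(\hat\beta, \hat\gamma, \hat\alpha) = \arg\min_{\beta, \gamma, \alpha} \sum_{i} \sum_{t} \left(\ddot y_{it} - \beta \ddot a_{it} - \ddot x_{it}'\gamma - \alpha_{\hat g_i t}\right)^2, \qquad (5)$$
where $\ddot y_{it} = y_{it} - \bar y_i$, $\ddot a_{it} = a_{it} - \bar a_i$, and $\ddot x_{it} = x_{it} - \bar x_i$ are time-demeaned quantities, $\bar y_i$, $\bar a_i$, and $\bar x_i$ are individual-specific time averages, and the group assignment $\hat g_i$ is defined as in Equation (4) using the demeaned variables. A well-known issue with the GFE estimator is its sensitivity to the choice of initial values. To validate our results, we randomly vary the seed and thus the initial values. Our results are robust to different seed choices.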
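The effect of time demeaning can be checked directly. The sketch below, with a hypothetical group profile and hypothetical individual effects, shows that the within transformation removes $\eta_i$ exactly and leaves a group profile only as deviations from its own time average.

```python
import numpy as np


def within(z):
    """Within (time-demeaning) transformation: subtract each row's time average."""
    return z - z.mean(axis=1, keepdims=True)


rng = np.random.default_rng(2)
N, T = 4, 8
alpha_g = np.linspace(0.0, 0.3, T)        # hypothetical group profile
eta = rng.normal(size=N)                  # individual time-constant effects
noise = rng.normal(0.0, 0.1, (N, T))

y_with_eta = alpha_g + eta[:, None] + noise
y_without_eta = alpha_g + noise

# Demeaning removes eta_i exactly ...
same = np.allclose(within(y_with_eta), within(y_without_eta))
# ... and leaves the profile only in deviations from its own time average.
profile_left = within(np.tile(alpha_g, (N, 1)))[0]
print(same, np.allclose(profile_left, alpha_g - alpha_g.mean()))
```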

Choosing the number of groups
The GFE estimator requires the researcher to choose the correct number of groups. Ideally, we obtain this number by data-driven methods. Yet, selecting the correct number is nontrivial, as the choice of information criterion depends on the data generating process. This is a well-known problem when information criteria are used for model selection; see Choi and Jeong (2019) and Bai and Ng (2002). The number of groups selected by an information criterion is a function of the penalty, whose size depends on the number of groups $G$ and the numbers of covariates $K$, individuals $N$, and time periods $T$. Thus, no single criterion will select the correct number of groups in all potential applications.
Bonhomme and Manresa (2015) suggest a Bayesian information criterion (BIC),
$$\mathrm{BIC}(G) = \frac{1}{NT} \sum_{i} \sum_{t} \left(y_{it} - \hat\beta^{(G)} a_{it} - x_{it}'\hat\gamma^{(G)} - \hat\alpha^{(G)}_{\hat g_i t}\right)^2 + \hat\sigma^2 \, \frac{GT + N + K}{NT} \ln(NT), \qquad (6)$$
where the penalty is the second term. The estimated error variance $\hat\sigma^2$ is calculated using $G_{\max}$, the maximum feasible number of groups chosen by the researcher. This BIC only estimates $G$ consistently if $N$ and $T$ go to infinity at the same rate (Bonhomme and Manresa, 2015). In our application, it might thus overestimate the true number of groups.
In our simulation exercise, we show that this BIC chooses the correct number of groups if $N$ is not much larger than $T$. Otherwise, it does not sufficiently discriminate between different numbers of groups. As an alternative, we use a BIC with a modified penalty that puts more weight on $G$. However, this alternative criterion tends to penalize too much. We thus use both criteria, together with other sensitivity checks, to pick the number of groups.
In recent work on factor models, Moon and Weidner (2015) show that if both $N$ and $T$ grow to infinity, the limiting distribution of the least-squares estimator of the parameter of interest is robust to including additional factors. While it would be useful to know whether this also holds for the GFE estimator, exploring this is beyond the scope of this paper.
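The following sketch computes the criterion, assuming it takes the form $\mathrm{BIC}(G) = \frac{1}{NT}\sum_i\sum_t \hat e_{it}^2 + \hat\sigma^2 \frac{GT + N + K}{NT}\ln(NT)$ with $\hat\sigma^2 = \mathrm{SSR}(G_{\max}) / (NT - G_{\max}T - N - K)$, as we understand Bonhomme and Manresa (2015). $N$ and $T$ match our sample, but $K = 30$ and the residual sums of squares are hypothetical, not the paper's values. The sketch also shows why this criterion discriminates weakly when $N \gg T$: each additional group raises the penalty by only $\hat\sigma^2 T \ln(NT)/(NT)$.

```python
import numpy as np


def bic_bm(ssr, N, T, K, G_max):
    """BIC(G) in the assumed Bonhomme-Manresa form.

    ssr: dict mapping G -> sum of squared GFE residuals for that G.
    sigma2 is estimated once, from the model with G_max groups.
    """
    NT = N * T
    sigma2 = ssr[G_max] / (NT - G_max * T - N - K)
    return {G: s / NT + sigma2 * (G * T + N + K) / NT * np.log(NT) for G, s in ssr.items()}


# Hypothetical residual sums of squares for G = 1..4 (illustrative only).
N, T, K = 20_703, 8, 30
ssr = {1: 5200.0, 2: 5000.0, 3: 4998.0, 4: 4997.0}

bic = bic_bm(ssr, N, T, K, G_max=4)
G_hat = min(bic, key=bic.get)
step = ssr[4] / (N * T - 4 * T - N - K) * T * np.log(N * T) / (N * T)
print(G_hat, f"penalty per extra group: {step:.2e}")
```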

Assumptions on time-varying unobserved heterogeneity
In this section, we discuss the key assumption on individual time-varying unobserved heterogeneity needed to identify causal effects with the GFE estimator. We compare this assumption to that of the DiD estimator and discuss situations in which these assumptions can be maintained. For this illustration, we use potential outcomes notation.
Let $\tilde\alpha_{it}$ denote the time-varying unobserved heterogeneity that governs treatment assignment beyond the group profile and the individual fixed-effect, $\tilde\alpha_{it} = \alpha_{it} - \alpha_{g_i t} - \eta_i$.
Here, $\alpha_{g_i t}$ are the group-specific profiles (see Section 4.1), and $\eta_i$ is an individual-specific, time-constant fixed-effect. The key identifying assumption of the GFE estimator is that the expected value of mental health given that no abortion has taken place, denoted as $y_{it}(0)$, is the same regardless of the "treatment assignment", given covariates, time, and the unobserved group effects:
$$E[y_{it}(0) \mid \tilde\alpha_{it}, x_{it}, \alpha_{g_i t}, \eta_i] = E[y_{it}(0) \mid x_{it}, \alpha_{g_i t}, \eta_i],$$
where $x_{it}$ may contain covariates and a time indicator. Broadly speaking, the assumption states that $\alpha_{g_i t}$ captures the relevant time-varying variation determining dynamic selection into treatment. Under the assumption of constant treatment effects, the conditional expectation of the outcome under treatment is
$$E[y_{it}(1) \mid x_{it}, \alpha_{g_i t}, \eta_i] = E[y_{it}(0) \mid x_{it}, \alpha_{g_i t}, \eta_i] + \beta.$$
Further assuming a linear functional form of the conditional mean function leads to
$$E[y_{it} \mid a_{it}, x_{it}, \alpha_{g_i t}, \eta_i] = \beta a_{it} + x_{it}'\gamma + \alpha_{g_i t} + \eta_i.$$
The DiD estimator relies on a similar set of assumptions about the potential outcomes under treatment, $y_{it}(1)$, and under non-treatment, $y_{it}(0)$. The main difference to the GFE estimator lies in the restrictions imposed on time-varying unobserved heterogeneity. The reason is that identification of a causal effect with the DiD estimator relies on group differences in a before-and-after comparison (conditional on treatment assignment).
Suppose we have two groups $D_i \in \{0, 1\}$, where $D_i = 0$ indicates the control group and $D_i = 1$ the treatment group. We assume that
$$E[y_{it}(0) \mid D_i, x_{it}, \eta_i] = x_{it}'\gamma + \lambda_{D_i t} + \eta_i, \qquad \lambda_{1t} - \lambda_{0t} = \lambda_{1s} - \lambda_{0s} \ \text{for all } t, s,$$
where $\lambda_{D_i t}$ is time-varying unobserved heterogeneity that can only vary between the treatment and control group, i.e., $\tilde\alpha_{it} = \alpha_{it} - \lambda_{D_i t}$. The difference in the time trends is constrained to be constant. This restriction is necessary to fulfill the parallel-trends assumption used in DiD estimation. In practice, it restricts all individuals in the treatment and control group to have parallel unobserved heterogeneity profiles.
The crucial difference in the identifying assumptions of the GFE and the DiD estimators is the restriction on time-varying unobserved heterogeneity: the GFE estimator puts no restrictions on the shape of the profiles $\alpha_{gt}$ but restricts the number of distinct profiles to $G$. The DiD estimator allows all individuals to be on individual slopes, but only within the treatment and control group.
The identifying assumptions of the DiD estimator discussed above apply to situations in which treatment assignment is random. With non-random treatment assignment and individual-level panel data, the identifying assumptions of the DiD estimator and the standard individual-specific fixed-effects estimator are essentially identical.
Even if treatment assignment were random, time-varying unobserved heterogeneity in the population is restricted by the parallel-trends assumption. Suppose both the treatment and the control group contain two different types of individuals with non-parallel unobserved heterogeneity profiles, but in different proportions. Then the DiD estimator fails to recover the true treatment effect, even with random assignment, because the parallel-trends assumption is violated. For further discussion, see Lechner et al. (2010).
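A two-period, two-type calculation makes this point concrete. As a sketch under our own notation and assumptions: there are two types $k \in \{a, b\}$ with heterogeneity changes $\Delta\alpha_k = \alpha_{k1} - \alpha_{k0}$, the share of type $a$ in group $D_i = d$ is $\pi_d$, there are no covariates, and the treatment effect $\beta$ is constant and occurs in period 1. Differencing removes $\eta_i$, and the DiD estimand becomes

```latex
\begin{aligned}
E[y_{i1} - y_{i0} \mid D_i = d] &= \beta d + \pi_d \, \Delta\alpha_a + (1 - \pi_d) \, \Delta\alpha_b, \\
\beta_{\text{DiD}} = E[\Delta y_i \mid D_i = 1] - E[\Delta y_i \mid D_i = 0]
  &= \beta + (\pi_1 - \pi_0) \, (\Delta\alpha_a - \Delta\alpha_b).
\end{aligned}
```

The bias term vanishes only if the type shares coincide ($\pi_1 = \pi_0$) or the two profiles are parallel ($\Delta\alpha_a = \Delta\alpha_b$). A GFE estimator with $G = 2$ would instead model the two type-specific profiles directly.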

Results
In this section, we first present our main results on the effect of abortion on mental health obtained from different estimators. We then determine the optimal number of groups and present the group-specific age profiles of unobserved heterogeneity, $\hat\alpha_{gt}$. Because the GFE estimator is relatively new and has not been used extensively in empirical work, we provide a detailed simulation framework. In Appendix C, we introduce a data generating process based on Equation (2) that matches the key characteristics of our data. We refer to our simulation exercise when interpreting certain aspects of our estimation strategy, and we also use it to validate specification choices made in the empirical model.

Effect of abortion
We estimate the parameter of interest from Equation (1) with OLS, OLS with individual fixed-effects, and the GFE estimator. The GFE estimates are very close to zero and precisely estimated. We attribute this precision to the large differences in the unobserved heterogeneity profiles: accounting for these patterns drastically reduces the residual variance. Adding more groups further reduces the estimated standard errors. We observe similar behavior in our simulations: due to the objective minimized by the estimator, individuals with similar time-varying unobserved heterogeneity are grouped together. We also use our simulation to validate the inference results in our setting, since the asymptotic results in Bonhomme and Manresa (2015) only apply for large $N$ and $T$.
Our results have meaningful implications for the expected incidence of mental health diagnoses resulting from an abortion. At the sample mean, the OLS estimate predicts that an abortion increases the probability of mental health conditions from 3.2% to 6.3%, so mental health problems almost double. The OLS estimate with individual fixed-effects is much smaller but still predicts a significant increase in mental health problems of about 29%, to 4.1%. By contrast, the GFE estimator always predicts a marginal decrease in the incidence of mental health problems. For $G = 2$, for instance, the incidence of mental health issues slightly decreases to 3.1% at the sample mean. The 95% confidence interval of the OLS FE estimate, [0.00213, 0.01627], only marginally overlaps with those of the GFE estimates.
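The predicted incidence for the OLS FE estimate can be reproduced from the reported confidence interval, assuming it is a symmetric normal-approximation 95% interval, so that the point estimate is its midpoint. The 3.2% baseline is the sample-mean incidence quoted in the text; the implied standard error is our back-of-the-envelope calculation.

```python
# Assumption: symmetric normal-approximation 95% CI around the OLS FE estimate.
ci_low, ci_high = 0.00213, 0.01627
beta_fe = (ci_low + ci_high) / 2            # implied point estimate
se_fe = (ci_high - ci_low) / (2 * 1.96)     # implied standard error

baseline = 0.032                            # incidence at the sample mean (from the text)
predicted = baseline + beta_fe              # incidence after an abortion
rel_increase = beta_fe / baseline
print(f"beta={beta_fe:.4f}, se={se_fe:.4f}, predicted={predicted:.3f}, change={rel_increase:.0%}")
```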
These results illustrate that group-specific time-varying unobserved heterogeneity absorbs considerable variation that may otherwise be attributed to the effect of abortion on mental health. Ignoring time-varying unobserved heterogeneity would lead to severe overestimation of the true abortion effect.

Time profiles of group-specific unobserved heterogeneity
We next address the question of the optimal number of groups. First, we describe how individuals are assigned to groups as the number of groups increases. Second, we compute the BIC with two different penalties and discuss how the coefficients behave for different numbers of groups. Finally, we present the estimated profiles of unobserved heterogeneity. [...] the GFE estimator cannot deal with more than four groups in our application.
We next determine the optimal $G$ using the two BICs from Section 4.2. Figure 6 shows that both criteria are minimized at $G = 2$ (highlighted in red). The standard BIC hardly varies with $G$, making a clear selection difficult (Figure 6(a)). The BIC with the steeper penalty increases sharply in $G$ and is unambiguously minimized at $G = 2$ (Figure 6(b)). However, our simulations show that the performance of both BICs depends on the true DGP (see Figures C.4 and C.5). Thus, we interpret these results with caution.
As shown in Figure 4, the estimated GFE coefficients are stable once we reach $G = 2$. In our simulations, we observe similar coefficient behavior after reaching the true number of groups, suggesting that coefficient estimates are stable for any $G$ greater than the optimal one (see Figure C.2). We set $G_{\max} = 10$, which is the highest number of groups for which the algorithm converges reliably. By combining the insights from group movements, the BIC, and the coefficient behavior, we conclude that the true number of groups is likely $G = 2$. Figure 7 shows the two estimated profiles. [...] We call these women the "low-risk" group. The dashed line represents the profile of women with an unobserved mental health risk that is low at age 16 but rises steeply with age. We call these women the "high-risk" group. The two profiles differ greatly in both intercept and slope, revealing considerable time-varying unobserved heterogeneity.
The group assignment is conditional not only on abortions but on all covariates we control for. This implies that our estimated profiles are net of this information. After all, if covariates were sufficient to describe the dynamics of individual mental health trajectories, additionally controlling for unobserved time-varying heterogeneity would be redundant and the group profiles would be uninformative. Figure 7 shows the profiles from the GFE without individual-specific time-constant fixed-effects; the profiles net of individual-specific fixed-effects can be found in Figure B.5 in Appendix B. Due to a lack of variation in the group, the high-risk group profile remains rather flat after age 22. Yet, it may be informative to compare observed individual characteristics between groups. Table A. in Appendix A shows that women in the high-risk group have, on average, a lower socioeconomic background, such as lower parental earnings and higher parental unemployment rates. These women were also slightly younger when they terminated an unwanted pregnancy. One could also predict group membership nonlinearly from higher-order interactions of observed covariates, using, e.g., L2-Boosting or other supervised learning methods.

Robustness checks: alternative dynamic processes and alternative measures of mental health
Our findings suggest that the association between abortion and mental health disappears once we allow for group-specific time-varying unobserved heterogeneity. We also find that the estimated group-specific heterogeneity profiles diverge starkly with age. To rule out that dynamic processes other than time-varying unobserved heterogeneity cause these patterns, we employ several alternative identification strategies. Finally, we investigate whether abortion affects anxiety disorders and other dimensions of mental health.

Alternative identification strategies and dynamic processes
Several studies have shown that women who undergo an abortion have significantly more contacts for psychiatric care, show more symptoms of anxiety, and have a history of receiving anti-psychotic or anti-anxiety medication before the abortion takes place (see, for instance, Munk-Olsen et al., 2011; Steinberg et al., 2018; Steinberg and Russo, 2008). Thus, the probability of having an abortion from an unplanned pregnancy could be determined by a woman's past mental health condition. We address this potential reverse causality by adding the first lag of our mental health measure to the right-hand side of Equation (2). To tackle the endogeneity of the lagged mental health measure, we combine the GFE estimator for $G = 2$ in first differences with an instrumental variable strategy, using the second lag of mental health as an instrument (Anderson and Hsiao, 1982). Column (1) in Table 2 shows that the estimated impact of abortion on mental health is close to zero and insignificant when controlling for past mental health. This suggests that time-varying unobserved heterogeneity captures a large share of the mental health dynamics.
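The logic of the Anderson and Hsiao (1982) instrument can be illustrated with a stripped-down simulation: a pure AR(1) panel with individual fixed effects, no abortion regressor, no group profiles, and hypothetical parameters. First-differencing removes $\eta_i$ but makes $\Delta y_{i,t-1}$ correlated with $\Delta\varepsilon_{it}$; the level $y_{i,t-2}$ is a valid instrument because it is determined before $\varepsilon_{i,t-1}$.

```python
import numpy as np

# Hypothetical dynamic panel: y_it = rho * y_i,t-1 + eta_i + eps_it
rng = np.random.default_rng(3)
N, T, rho = 50_000, 8, 0.5
eta = rng.normal(size=N)
y = np.zeros((N, T))
y[:, 0] = eta / (1 - rho) + rng.normal(size=N)   # start near each individual's long-run mean
for t in range(1, T):
    y[:, t] = rho * y[:, t - 1] + eta + rng.normal(size=N)

dy = np.diff(y, axis=1)      # dy[:, s] = y_{s+1} - y_s; differencing removes eta_i

# First-differenced equation for t = 2..T-1: dy_t = rho * dy_{t-1} + d_eps_t
dep = dy[:, 1:].ravel()      # Delta y_t
lag = dy[:, :-1].ravel()     # Delta y_{t-1}, endogenous (contains eps_{t-1})
inst = y[:, :-2].ravel()     # y_{t-2}, the Anderson-Hsiao instrument

rho_iv = (inst @ dep) / (inst @ lag)   # just-identified IV estimator
rho_ols = (lag @ dep) / (lag @ lag)    # OLS in differences, biased downward
print(f"IV: {rho_iv:.3f}, OLS in differences: {rho_ols:.3f}")
```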
Another possibility is that abortion effects on mental health occur with some lag. In our empirical specification, such dynamic effects could bias the estimated GFE coefficients downward. To investigate potentially dynamic abortion effects, we plot mental health against the abortion event. Figure B.2 in Appendix B shows that our unconditional mental health measure exhibits a strong time trend but develops very smoothly around the abortion event. Next, we apply the dynamic difference-in-differences (dynamic-DiD) estimator suggested by Callaway and Sant'Anna (2021). This method estimates group-time average treatment effects on the treated (ATT), extending the doubly robust DiD estimator of Sant'Anna and Zhao (2020) to multiple time periods.
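The group-time ATT can be sketched in its simplest form. This is a simplified version, not the doubly robust, covariate-adjusted estimator used in the paper: there are no covariates, the base period is fixed at $g - 1$, and units not yet treated by $\max(t, g-1)$ serve as controls. Function names and the simulated cohorts are hypothetical.

```python
import numpy as np


def att_gt(y, first_treat, g, t):
    """Unconditional ATT(g, t) with not-yet-treated controls.

    y: (N, T) outcomes; first_treat: (N,) first treatment period, np.inf if never.
    Compares the change from base period g-1 to t for cohort g with the same
    change for units not yet treated by max(t, g-1).
    """
    base = g - 1
    cohort = first_treat == g
    control = (first_treat > max(t, base)) & ~cohort
    change_cohort = y[cohort, t] - y[cohort, base]
    change_control = y[control, t] - y[control, base]
    return change_cohort.mean() - change_control.mean()


# Hypothetical staggered design: cohorts first treated at t = 3 or t = 5, or never.
rng = np.random.default_rng(4)
N, T, effect = 30_000, 8, 0.2
first_treat = rng.choice([3.0, 5.0, np.inf], size=N)
treated_now = np.arange(T)[None, :] >= first_treat[:, None]
y = (rng.normal(size=N)[:, None]            # unit effects
     + 0.1 * np.arange(T)[None, :]          # common trend: parallel trends hold
     + effect * treated_now
     + rng.normal(0.0, 0.5, (N, T)))

placebo = att_gt(y, first_treat, g=5, t=2)  # pre-period, true value 0
post = att_gt(y, first_treat, g=5, t=6)     # post-period, true value 0.2
print(f"ATT(5,2) = {placebo:.3f}, ATT(5,6) = {post:.3f}")
```

Because parallel trends hold by construction here, pre-period estimates are close to zero. Pre-period estimates that trend away from zero, as described for Figure 8, are the pattern one would expect when the assumption fails.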
Our estimated GFE coefficient is downward biased if the dynamic effects have the same sign as the contemporaneous effect.
Table 2 notes: Standard errors clustered at the individual level; *** p < 0.01, ** p < 0.05, * p < 0.1. Column (1): IV estimates in first differences using estimated group assignments from GFE with individual-specific fixed effects and the second lag of mental health diagnoses as instrument. Column (2): Estimated coefficient of abortion from an unintended pregnancy obtained from TSLS. The sample contains only women who either gave birth, had a miscarriage, or had an abortion from an unintended pregnancy. Miscarriages are used as an instrument. Column (3): Estimated coefficient of all medical abortions obtained from TSLS. The sample contains only women who either gave birth, had a miscarriage, or had a medical abortion. Miscarriages are used as an instrument. Control variables: woman: relationship status (single, in a relationship), log earnings, college degree, employed; mother: log earnings, employed, college degree, relationship status; father: log earnings, employed, college degree; log household disposable income; year fixed-effects, municipality FE, year of birth FE for woman/mother/father; indicator for missing observations.
The main identifying assumption for the ATT is that parallel trends hold conditional on covariates. Figure 8 shows the estimated group-time ATT when not-yet-treated observations form the control group. As in Figure B.2, there is no discontinuity around the abortion event. However, the estimated coefficients follow a clear time trend, with some being significantly different from zero already several periods before the abortion event. This points towards a violation of the conditional parallel-trends assumption. A pre-test of the parallel-trends assumption further rejects the null hypothesis of parallel trends (Wald statistic [...]). We also estimated the dynamic-DiD using the never-treated as the control group; the results are very similar to those with the not-yet-treated control group and are thus not presented here.
Our empirical results could also be explained by age dynamics in abortion effects. If the effect of abortion on mental health depends on age, e.g., if early abortions have stronger effects than late abortions, we might mistakenly attribute an early-abortion effect to age-varying unobserved heterogeneity and thus underestimate the abortion effect. Table A.4 in Appendix A displays the results from an OLS model with individual fixed-effects and age-dependent abortion effects. We do not find any significant age-dependent abortion effects.
We finally apply an instrumental variables (IV) estimator to address general endogeneity concerns in the abortion decision. We follow Hotz et al. (1997) and Hotz et al. (2005), who used miscarriages as an instrument for teenage births to investigate the long-term consequences of teenage childbearing. We adapt their approach by using miscarriages as an instrument for abortion decisions. The instrument is valid if miscarriages are random and if the latent share of women with a miscarriage who would have had an abortion equals the observed share in the population (see Hotz et al., 1997).
To implement the IV strategy, we restrict the sample to women who either had an abortion, had a miscarriage, or gave birth at a given age. We construct two different samples: one that contains all medical abortions, and one that contains only abortions from an unintended pregnancy. In the former sample, 59% of women had an abortion and 35% gave birth. Only a minority of 6% had a miscarriage. The shares are similar for the sample that only contains abortions from an unwanted pregnancy. Columns (2) and (3) in Table 2 display the IV results. The first-stage results suggest that miscarriages are a reasonably relevant predictor of abortion.
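The miscarriage instrument can be illustrated with a hypothetical data generating process in which an unobserved risk factor raises both the latent abortion decision and mental health problems, while a miscarriage occurs at random and pre-empts the decision. This matches the validity condition above, since the latent abortion share among women with a miscarriage equals the population share. The sketch uses no covariates, unlike the paper's TSLS, and every parameter is an assumption; the just-identified IV estimate reduces to the Wald ratio of reduced form to first stage.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta = 0.0                                     # hypothetical true effect of abortion
u = rng.normal(size=n)                         # unobserved risk: raises abortion and y
would_abort = (u + rng.normal(size=n)) > 0.3   # latent abortion decision
miscarriage = rng.random(n) < 0.06             # instrument, assumed as good as random
abortion = would_abort & ~miscarriage          # a miscarriage pre-empts the decision
y = beta * abortion + 0.05 * u + rng.normal(0.0, 0.2, n)

a = abortion.astype(float)
z = miscarriage.astype(float)

# Wald / just-identified TSLS estimator: reduced form divided by first stage
first_stage = a[z == 1].mean() - a[z == 0].mean()
reduced_form = y[z == 1].mean() - y[z == 0].mean()
beta_iv = reduced_form / first_stage
beta_ols = np.cov(a, y)[0, 1] / a.var(ddof=1)  # confounded by u
print(f"first stage: {first_stage:.3f}, IV: {beta_iv:.3f}, OLS: {beta_ols:.3f}")
```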
The point estimates for abortion in the second stage are negative or close to zero and very imprecisely estimated. Both IV confidence intervals include the OLS point estimates obtained from these samples (0.0278 (0.011) for Column (2), 0.0245 (0.011) for Column (3)). As shown in Young (2021), imprecisely estimated coefficients are a common feature of IV strategies. Besides, there are concerns about whether miscarriages are a valid instrument: a significant share of miscarriages does not occur at random but is related to non-random, unobserved behavioral risk factors, which may be correlated with mental health problems (see Rellstab et al., 2021).

Other dimensions of mental health problems
Following the psychiatric literature, we consider alternative measures of mental health issues: anxiety and fear-related disorders, bipolar disorders, depression, and affective mood disorders excluding bipolar disorders (see, for instance, Foster et al., 2015; Steinberg et al., 2018; Steinberg and Russo, 2008). Table 3 presents the corresponding estimated coefficients on abortion obtained from OLS without and with individual-specific fixed effects and from the GFE estimator with two groups and individual-specific fixed effects. Line E shows the estimated coefficient from our main specification. Regardless of the estimation strategy, the estimated association between an abortion due to an unplanned pregnancy and mental health is strongest for anxiety and fear-related disorders. However, the estimated GFE coefficient of abortion on anxiety disorders is only about 10 percent of the magnitude of the OLS FE coefficient in Column (2) and not significantly different from zero. The coefficient estimates for bipolar disorders are relatively small.
The estimated coefficients for depression are similar in magnitude to those of our main specification, suggesting that depression is the primary driver. This is confirmed by the estimation results for mood disorders excluding bipolar disorders.
Table 3 notes: Standard errors clustered at the individual level; *** p < 0.01, ** p < 0.05, * p < 0.1. Column (1): OLS regression of alternative measures of cumulative mental health diagnoses on abortion. Column (2): OLS regression with individual fixed-effects. Column (3): GFE estimation with $G = 2$ groups and individual-specific fixed-effects. Control variables: woman: relationship status (single, in a relationship), log earnings, college degree, employed; mother: log earnings, employed, college degree, relationship status; father: log earnings, employed, college degree; log household disposable income; year fixed-effects, municipality FE, year of birth FE for woman/mother/father; indicator for missing observations.

Sources of unobserved heterogeneity: Abortions, unwanted pregnancies and other risky behavior
A natural question is what factors are captured by profiles of unobserved mental health risk.
While there may be several answers, one explanation is that the estimated profiles proxy common risky behaviors among young women. This includes unprotected sexual activity but also other risky behaviors such as drug and alcohol consumption (Cawley and Ruhm, 2011). In this case, controlling for such observed behaviors would alter the estimated association between abortion and mental health. Alternatively, the profiles might absorb choice processes underlying different behaviors. In that case, observed risky behavior would result from a similar decision process. This would imply that (1) the association between abortion and mental health is robust to controlling for other risky behaviors; (2) the GFE estimate for abortion is unaffected, but the estimates for other behaviors behave similarly to that for abortion; and (3) other risky behaviors are contemporaneously correlated with the estimated unobserved heterogeneity profiles. We now assess how other risky behaviors are related to abortions, mental health, and the estimated unobserved heterogeneity profiles.

Mental health, abortions and other risky health behaviors
An important determinant of having an abortion is a woman's decision to engage in unprotected sexual activities, resulting in an unwanted pregnancy. Ex ante, it is not clear whether an unwanted pregnancy reflects such a choice, including careless use of birth control, or whether it results from a random failure of contraception or from sexual assault. In the latter cases, unwanted pregnancies are not in the choice set that may be captured by time-varying unobserved heterogeneity. If abortions are outcomes of a woman's choice to engage in unprotected sex, then this choice may result not only in unwanted pregnancies but also in other byproducts of unprotected sex. In our data, we observe a few other risky sexual and health behaviors (e.g., Cawley and Ruhm, 2011; Markowitz et al., 2005; Mulligan, 2016): chlamydia infections and sexually transmitted disease (STD) screenings as risky sexual behavior, and excessive alcohol consumption as other risky behavior.
Even if all women face the same failure probability of contraception, one may still find a positive correlation between abortion probabilities and mental health diagnoses. For instance, women with mental health problems might start to have sex at earlier ages than women without mental health problems or have sex more frequently. Then this correlation would not be indicative of risky sexual behavior.
Chlamydia is the most frequently observed STD among young women (e.g., Danielsson et al. (2012) and European Centre for Disease Prevention and Control (2020) for Sweden, and Centers for Disease Control and Prevention (2019) for the US).
Table 4 shows that other risky behaviors are strongly correlated with abortions and unwanted pregnancies. Column (1) shows that women with a chlamydia infection at ages 16-23 have, on average, a 14.5 percentage points higher likelihood of an abortion, which translates into an increase of more than 130% at the sample mean. STD screenings are associated with a 1.4 percentage points higher likelihood of an abortion, or 13% at the sample mean. Excessive drinking at ages 16-23 is associated with a 9.8 percentage points higher probability of an abortion, or 92% at the sample mean. Column (2) shows that the correlations between risky behaviors and unwanted pregnancies are somewhat stronger than for abortions but otherwise very similar.
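The relative effects at the sample mean can be reproduced with simple arithmetic. Our assumption, not stated explicitly in the text, is that the relevant sample mean is the 10.6% share of women who had an abortion at ages 16-23; with this value the implied percentages match those reported.

```python
# Assumption: the "sample mean" is the 10.6% share of women with an abortion at ages 16-23.
mean_abortion = 0.106
effects_pp = {"chlamydia": 0.145, "STD screening": 0.014, "excessive drinking": 0.098}

relative = {name: pp / mean_abortion for name, pp in effects_pp.items()}
for name, r in relative.items():
    print(f"{name}: {r:.0%} increase relative to the sample mean")
```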
Abortions and unwanted pregnancies may follow differential selection. In our sample, 82% of women with an unwanted pregnancy have an abortion. When regressing mental health on both abortions and unwanted pregnancies, the abortion effect becomes small and insignificant, indicating no differential selection.
We next examine whether other risky behaviors are omitted controls or whether they result from a similar choice process as abortions. To this end, we re-estimate our main specifications and gradually add excessive drinking, chlamydia infections, and STD screenings. Excessive drinking is measured with the ICD-10 code F10.0, "acute drunkenness (in alcoholism)". Columns (1)-(4) in Table 5 display the estimated coefficients for models with individual fixed-effects. All associations are positive, suggesting that these behaviors increase the probability of being diagnosed with mental health problems, but only the coefficient on STD screenings is significantly different from zero. Adding these behaviors as controls barely changes the estimated association between abortion and mental health. Column (5) presents the results from the GFE estimator for $G = 2$. The added controls do not change the estimated impact of abortion on mental health. The estimated GFE coefficients for other behaviors are 5-10 times smaller than in Column (4).
Table 5 notes: Standard errors clustered at the individual level; *** p < 0.01, ** p < 0.05, * p < 0.1. Columns (1)-(4): OLS regression of cumulative mental health diagnoses on abortion and current risky health behavior, controlling for individual-specific FE. Column (5): GFE estimation with $G = 2$ groups and individual-specific FE. Control variables: woman: relationship status (single, in a relationship), log earnings, college degree, employed; mother: log earnings, employed, college degree, relationship status; father: log earnings, employed, college degree; log household disposable income; year fixed-effects, municipality FE, year of birth FE for woman/mother/father; indicator for missing observations.
The relationship between mental health and abortions could be influenced by past rather than current health behaviors (e.g., Elkington et al., 2010; Hallfors et al., 2005). Table A.5 in Appendix A shows the results when using lagged diagnoses of acute drunkenness, chlamydia infections, and STD screenings. The estimated associations between mental health and past health behaviors are strong and significant, but the abortion effect is again robust. Overall, our results suggest that the observed health behaviors are unlikely to be the source of the omitted variable bias in OLS-FE regressions. Instead, these behaviors seem to result from decision processes similar to those behind abortions following unwanted pregnancies.

Unobserved heterogeneity profiles and risky health behaviors
We finally investigate whether the estimated profiles of unobserved mental health risk are correlated with other risky behaviors. We regress STD screenings, chlamydia infections, and excessive drinking on the estimated profiles of unobserved heterogeneity and covariates, and plot the group-specific predictions against the estimated profiles. Figure 9 shows the predicted diagnosis risks for G = 2. In the high-risk group, the probability of STD screenings and chlamydia infections increases steeply with the estimated profile. By contrast, the probabilities are flat in the low-risk group. Alcohol intoxication is an exception: here, group differences in predicted probabilities are very small and slightly negative for high-risk women. Overall, Figure 9 shows that high-risk women have higher probabilities of risky sexual behavior, which supports the suggested correlation between risky behaviors and unobserved heterogeneity.
These findings strengthen our interpretation that time-varying unobserved heterogeneity captures choice processes for engaging in risky behavior. Researchers rarely observe these decisions; they typically measure only realized behaviors, which are outcomes of these decisions. We have shown that controlling for such observed behaviors is insufficient to obtain an unbiased estimate in our application. Instead, the GFE estimator seems necessary to account for the unobserved decision-making processes.
The respective coefficient estimates can be found in Table A.6 in Appendix A. Risky drinking typically happens at earlier ages than risky sex. Marcus and Siedler (2015) show that most hospitalizations for alcohol intoxication among women take place before age 16 and decline sharply thereafter. We observe a similar decline in excessive drinking at ages 16-20 (see Figure B.6, Appendix B).

A framework of mental health and risky behavior
The results obtained in Section 5.4 suggest the following explanation: women differ in their decisions to engage in risky behaviors, which are reflected in differences in estimated group profiles. One reason for a large amount of group-specific heterogeneity could be that women have different preferences, leading to differences in dynamic decisions and thus to different mental health trajectories. O'Donoghue and Rabin (2001) discuss the role of time-inconsistent preferences for risky behaviors among youths, e.g., unprotected sex.
Present-biased preferences make unprotected sex today more likely, since teenagers weigh today's benefits much more heavily than potential future costs (Levine, 2001). This behavioral bias affects dynamic decisions more generally, e.g., educational choices such as dropping out of school. Cobb-Clark et al. (2020) suggest that self-control problems explain differences in the correlation between depression and risky behaviors such as a lack of exercise.
Based on this discussion, we formulate a theoretical model of endogenous mental health and risky choices. Risky behavior leads to short-term benefits but harms mental health development. At the same time, mental health problems change the preferences for risky behavior and thereby shape its time paths. To allow for heterogeneity across women, we introduce non-standard time preferences. Women in the high-risk group have a high degree of present bias, over-weighting current pleasure compared to future mental health risks. Women in the low-risk group have preferences that are close to time consistent. As such, our model closely follows the literature in behavioral economics.
Our model offers an interpretation for differences in inter-temporal decision-making across the two groups and the consequences for mental health development. Of course, it is not the only model that could explain the observed patterns. For instance, heterogeneity in decision-making and mental health could be driven by heterogeneity in impatience, i.e., by different time discounting without present bias. Through the present bias, we stress the importance of the present moment regardless of the future (Laibson, 1997; O'Donoghue and Rabin, 2015). It seems plausible that a woman at a party who meets a handsome guy decides in the "heat of the moment" to have unprotected sex even though she may be aware of future costs, e.g., mental health costs. However, if asked beforehand whether she should behave this way at the next party, she might say no. The behavioral literature discusses several other models that incorporate anomalies in discounted utility, such as "visceral influences", habit formation, or projection bias (for a discussion, see Frederick et al., 2002). Our exploratory theoretical analysis does not aim at differentiating between these models.

A DGP for mental health, risky behavior and abortion
We formulate a data generating process (DGP) of risky decision making, abortion, and mental health. For simplicity, we assume that latent mental health evolves according to Equation (11): a woman's mental health at age t + 1 is determined by her mental health at age t, her risky choice at age t, and an iid mental health production shock. To keep the model tractable, we ignore covariates. Abortion probabilities do not enter Equation (11) directly but are correlated with risky choices. In Equation (12), we model the probability of having an abortion at age t as a function of the woman's unobserved risky choices and an idiosyncratic error. Together with Equation (11), Equation (12) implies that a regression of mental health on abortions would produce a spurious correlation even with individual-specific fixed-effects, because risky choices vary over time and are therefore not absorbed by the individual effects. Since we only observe the abortion but not women's decision to engage in unprotected sex, we can interpret the observed abortion as a signal of risky decision making.

In our empirical analysis, we proxy this latent mental health status by observed diagnoses.
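The spurious correlation can be illustrated with a small simulation. The functional forms and parameter values below are hypothetical stand-ins, not Equations (11) and (12) or estimates from our data. Risky choices rise with age in a high-risk group (10% of women) and are flat otherwise. Mental health problems accumulate with risky choices, abortion probabilities are proportional to the current risky choice, and abortion has no causal effect on mental health. A within-individual (fixed-effects) regression of mental health on abortion should then yield a positive coefficient. Additionally removing group-by-age effects, using the true group labels, should pull it to roughly zero.

```python
import numpy as np

def simulate_panel(n=5000, T=8, share_high=0.1, seed=0):
    """Hypothetical DGP: risky choices drive mental health; abortion has no causal effect."""
    rng = np.random.default_rng(seed)
    high = rng.random(n) < share_high
    ages = np.arange(T)
    # Risky choices rise with age in the high-risk group and are flat in the low-risk group.
    r = np.where(high[:, None], 0.2 + 0.1 * ages[None, :], 0.1)
    # Mental health problems accumulate with risky choices plus an iid shock.
    m = np.zeros((n, T))
    for t in range(T - 1):
        m[:, t + 1] = m[:, t] + 0.5 * r[:, t] + rng.normal(0.0, 0.1, n)
    # Abortion probability rises with the current risky choice; abortion never enters m.
    a = (rng.random((n, T)) < 0.3 * r).astype(float)
    return m, a, high

def within_individual(x):
    """Remove individual fixed effects by demeaning each row."""
    return x - x.mean(axis=1, keepdims=True)

def within_individual_and_group_age(x, groups):
    """Remove individual effects and group-by-age effects (balanced panel, known groups)."""
    out = np.empty_like(x)
    for g in np.unique(groups):
        xg = x[groups == g]
        out[groups == g] = (xg - xg.mean(axis=1, keepdims=True)
                            - xg.mean(axis=0, keepdims=True) + xg.mean())
    return out

def ols_slope(y, x):
    """Slope of a regression through the origin on already-demeaned data."""
    return float((x * y).sum() / (x * x).sum())

m, a, high = simulate_panel()
b_fe = ols_slope(within_individual(m), within_individual(a))
b_group_age = ols_slope(within_individual_and_group_age(m, high),
                        within_individual_and_group_age(a, high))
print(f"individual FE: {b_fe:.3f}; individual FE + group-by-age effects: {b_group_age:.3f}")
```

The second regression is infeasible in practice because it uses the true group labels; the GFE estimator instead has to estimate group membership and the group-specific profiles jointly.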

Preferences
We assume that women are sophisticated decision-makers who know about their self-control problems when making choices (O'Donoghue and Rabin, 1999). At each age, a woman enjoys flow utility that depends on her mental health and her chosen risky behavior. We assume that flow utility is a constant relative risk aversion (CRRA) function of risky behavior with mental-health-dependent risk aversion, minus quadratic mental health costs. A baseline level of risk aversion is modified by a mental-health-dependent term, the product of a slope parameter and current mental health. If this slope parameter is positive, women become more risk-averse as their mental health problems increase, which introduces additional preference heterogeneity. The second term captures the direct costs of mental health problems, whose size is governed by a separate cost parameter.
We subsequently suppress the individual subscript for ease of notation.
In the CRRA term, we multiply by the sign of the exponent rather than dividing by the exponent, as would be more common. This is useful when calibrating or estimating the model because, in this formulation, small changes in risk aversion do not have a strong impact on the level of utility.
By letting risk aversion depend on mental health, we follow an early version of Cronin et al. (2020). In our application, allowing risk aversion to vary with mental health leads to a considerably better model fit than holding it constant.
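The sketch below writes out one reading of the flow utility described above: a sign-multiplied CRRA term in the risky choice r, with risk aversion θ(m) = θ0 + κ·m, minus quadratic costs ψ·m². The symbols θ0, κ, ψ and all parameter values are hypothetical placeholders, not our notation or estimates. The comparison illustrates why multiplying by the sign of the exponent keeps utility levels stable when risk aversion crosses one, whereas dividing by the exponent makes them explode.

```python
import math

def crra_standard(r, theta):
    """Conventional CRRA term r**(1 - theta) / (1 - theta); undefined at theta == 1."""
    e = 1.0 - theta
    return r ** e / e

def crra_sign(r, theta):
    """CRRA term multiplied by the sign of the exponent instead of divided by it."""
    e = 1.0 - theta
    return math.copysign(1.0, e) * r ** e

def flow_utility(m, r, theta0, kappa, psi):
    """Hypothetical flow utility: risk aversion theta0 + kappa*m, minus quadratic costs psi*m**2.
    r must be positive whenever the exponent 1 - theta is negative."""
    return crra_sign(r, theta0 + kappa * m) - psi * m ** 2

# Risk aversion just below and just above one, for a risky choice of 0.5
for theta in (0.99, 1.01):
    print(theta, crra_standard(0.5, theta), crra_sign(0.5, theta))
```

Between the two values of θ, the conventional term jumps from about +99 to about -101, while the sign-multiplied term only moves from about +1 to about -1.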
We formulate an infinite-horizon decision problem and focus on the first eight periods, corresponding to ages 16-23 in our data. Realized future utility at age $t$ is given by

$$U_t = u(m_t, r_t) + \beta \sum_{k=1}^{\infty} \delta^{k}\, u(m_{t+k}, r_{t+k}),$$

where $m_t$ denotes mental health and $r_t$ the risky choice at age $t$. The first term, $u(m_t, r_t)$, is the current flow utility. The second term aggregates future flow utility using $\beta$-$\delta$ discounting: $\delta$ is the usual exponential discount factor, and the parameter $\beta$ induces the self-control problem. For $\beta = 1$, the model reduces to standard exponential discounting. For $0 < \beta < 1$, a woman exhibits some degree of present bias.
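To make the role of the present-bias parameter beta concrete, the sketch below computes beta-delta discount weights (weight 1 today, beta·delta^k for k periods ahead) and reproduces the "heat-of-the-moment" reversal described earlier. It uses the high-risk group's estimates from the results section (beta = 0.598, delta = 0.925). The benefit of 1 and cost of 1.5 are hypothetical numbers, and a cost arriving one period later is a simplification of the model's mental health costs.

```python
def discount_weights(beta, delta, horizon):
    """Weights on flow utility at t, t+1, ..., t+horizon under beta-delta discounting."""
    return [1.0] + [beta * delta ** k for k in range(1, horizon + 1)]

def value_of_risky_act(beta, delta, benefit, cost, periods_ahead):
    """Value, seen from today, of an act taken `periods_ahead` periods from now that
    yields `benefit` when taken and `cost` one period later."""
    w = discount_weights(beta, delta, periods_ahead + 1)
    return w[periods_ahead] * benefit - w[periods_ahead + 1] * cost

for label, beta in (("present-biased", 0.598), ("time-consistent", 1.0)):
    today = value_of_risky_act(beta, 0.925, benefit=1.0, cost=1.5, periods_ahead=0)
    planned = value_of_risky_act(beta, 0.925, benefit=1.0, cost=1.5, periods_ahead=1)
    print(f"{label}: decide today {today:+.3f}, plan one period ahead {planned:+.3f}")
```

For the present-biased agent, the act has positive value when the choice is immediate and negative value when it is planned one period in advance. This is the time inconsistency that beta introduces. With beta = 1, both values are negative, so plans and actions agree.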
For a current level of mental health $m_t$, the problem a woman solves at age $t$ when choosing her risky behavior $r_t$ is

$$\max_{r_t}\; E_t\Big[ u(m_t, r_t) + \beta \sum_{k=1}^{\infty} \delta^{k}\, u\big(m^*_{t+k}, r^*_{t+k}(m^*_{t+k})\big) \Big].$$

The functions $r^*_{t+k}$ are the optimal decisions of the future selves as functions of current mental health. $m^*_{t+k}$ is the mental health trajectory that starts at $m_t$ when choosing $r_t$ at age $t$ and choosing $r^*_{t+k}(m^*_{t+k})$ at later ages. Solving this problem for every possible value of $m_t$ pins down the decision function $r^*_t$. $E_t$ is the conditional expectation at age $t$. We solve the model by backward recursion. This is a variation of classical dynamic programming that takes into account the time inconsistency introduced through $\beta$. As in the empirical analysis, we allow for two groups of unobserved mental health risk. We assume two different degrees of present bias, defined by two parameters $\beta_1, \beta_2 \in (0, 1)$ (Laibson, 1997), while all other parameters are the same across groups.
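For a sophisticated agent, one standard way to write this backward recursion is shown below. The notation is a sketch: $W_t(m)$ denotes the continuation value that earlier selves attach to entering age $t$ with mental health $m$, $m'$ is next-period mental health, $\beta$ is the present-bias parameter, and $\delta$ is the exponential discount factor. The current self weights next period by $\beta\delta$, but the continuation value is built with $\delta$ alone, because from any earlier self's perspective all periods after the next are discounted exponentially relative to each other.

```latex
r_t^*(m) = \arg\max_{r}\; \Big\{ u(m, r) + \beta\,\delta\, E_t\big[ W_{t+1}(m') \,\big|\, m, r \big] \Big\},
\qquad
W_t(m) = u\big(m, r_t^*(m)\big) + \delta\, E_t\big[ W_{t+1}(m') \,\big|\, m, r_t^*(m) \big].
```

With $\beta = 1$ the two lines coincide with the standard Bellman equation.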
To estimate the parameters of interest, we match model moments to observed moments for group-specific unobserved mental health trajectories. We perform a simulated annealing procedure (e.g. Goffe et al., 1994) to avoid being stuck in local minima of the mean-squared objective function and refine the solution using the Nelder-Mead algorithm. Details about the model solution can be found in Appendix D.
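A minimal sketch of this two-stage strategy follows, applied to a toy moment-matching problem rather than our model. The moment function a·(1 − b^t), its parameters, the bounds, and the proposal scale are hypothetical. The annealing settings (initial temperature 1000, temperature reduction factor 0.8, 200 inner iterations) mirror those reported with the parameter estimates, but the annealer is a generic textbook version with geometric cooling, not the procedure of Husmann et al. (2017). The refinement step uses SciPy's Nelder-Mead implementation.

```python
import numpy as np
from scipy.optimize import minimize

AGES = np.arange(1, 9)  # eight periods, as for ages 16-23

def model_moments(params):
    """Toy moment function: a cumulative trajectory a * (1 - b**t)."""
    a, b = params
    return a * (1.0 - b ** AGES)

TARGET = model_moments((0.1, 0.8))  # stand-in for the observed data moments

def objective(params):
    """Mean-squared distance between model and data moments."""
    return float(np.mean((model_moments(params) - TARGET) ** 2))

def simulated_annealing(f, x0, lower, upper, t0=1000.0, cooling=0.8, n_inner=200,
                        t_min=1e-6, step=0.1, seed=0):
    """Generic simulated annealing with geometric cooling; returns the best point visited."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = np.asarray(x0, float)
    fx = f(x)
    best, f_best = x.copy(), fx
    temp = t0
    while temp > t_min:
        for _ in range(n_inner):
            # Gaussian proposal scaled to the box, clipped to stay feasible
            cand = np.clip(x + step * (upper - lower) * rng.standard_normal(x.size), lower, upper)
            f_cand = f(cand)
            # Metropolis rule: always accept improvements, sometimes accept worse points
            if f_cand < fx or rng.random() < np.exp(-(f_cand - fx) / temp):
                x, fx = cand, f_cand
                if fx < f_best:
                    best, f_best = x.copy(), fx
        temp *= cooling
    return best

x_sa = simulated_annealing(objective, x0=[0.05, 0.5], lower=[0.0, 0.0], upper=[0.2, 0.99])
res = minimize(objective, x_sa, method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-14, "maxiter": 5000})
print("SA:", x_sa, "Nelder-Mead:", res.x)
```

Because the toy objective is of order 10^-3, the early high-temperature iterations amount to a random walk; in practice the temperature schedule should be scaled to the size of the objective.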

Results
Figure 10(a) plots the average mental health trajectories for the two groups in our sample.
Women belonging to the high-risk group exhibit a steeper observed mental health trajectory than low-risk women. In the high-risk group, 7.6 of 100 women have been diagnosed with mental health problems by age 23. Low-risk women are on a somewhat lower mental health trajectory. About 6.4% of them have received a mental health diagnosis by age 23.
At age 23, the share of women with a diagnosis is thus about 19% higher in the high-risk group than in the low-risk group. Table 6 displays the estimated parameters obtained from moment matching. We find a clear difference in the estimated present bias between the two groups. For low-risk women, $\hat{\beta}_1$ is close to one, indicating that they have almost no present bias. The high-risk group exhibits a large degree of present bias, $\hat{\beta}_2 = 0.598$. The estimated period (yearly) discount factor $\hat{\delta}$ is 0.925. For the low-risk group, the estimated one-year discount factor is $\hat{\beta}_1 \cdot \hat{\delta} = 0.944$. The corresponding factor for the high-risk group is $\hat{\beta}_2 \cdot \hat{\delta} = 0.553$.
These values are well within the range found in the literature (e.g., Laibson (1997) or Frederick et al. (2002)). Overall, high-risk women discount the future much more strongly than low-risk women. Consequently, high-risk women are more willing to accept long-run mental health deficits in exchange for short-term utility from risky behavior, e.g., immediate sexual pleasure. As a result, these women face a more pronounced deterioration in mental health.

Estimated parameters for time preferences obtained from the final Nelder-Mead optimization after applying simulated annealing (SA) for global optimization. For SA, we set the initial temperature to 1000 and the temperature reduction factor to 0.8. We set the number of inner-loop iterations to 200. For more details about the SA procedure, see Husmann et al. (2017). The full set of parameter estimates for time preferences, flow utility, and mental health dynamics can be found in Table A.7 in Appendix A.
Figure 10(b) shows the group-specific mental health development obtained from the estimated parameters. While we cannot perfectly replicate the trajectories in the data (e.g., their intersection at ages 18-19), we obtain a close match between the simulated trajectories and the data moments. This suggests that heterogeneity in present bias can generate most of the group-specific heterogeneity in observed mental health trajectories. Figure 10(b) not only illustrates the mental health trajectories for high-risk and low-risk women; it also shows the counterfactual mental health trajectory for high-risk women if they did not exhibit self-control problems. If their self-control problems could be cured, mental health problems could be reduced by about 19% by age 23. A back-of-the-envelope calculation using data on mental health costs suggests that the total mental health costs of all women would be reduced by 15.1% if high-risk women had the same mental health trajectory as low-risk women.

Alan and Ertac (2018) investigate how a classroom intervention that aims at improving children's patience and self-control affects inter-temporal decision making. One result is that 9-10 year-old children who are present biased at baseline benefit the most, by delaying immediate gratification. Girls are particularly responsive to the intervention in the medium run. Alan and Ertac (2018) do not consider risky health behavior as an outcome. Yet, such an intervention fostering self-control could be a promising tool to reduce risky health behaviors among adolescents.

Since we do not observe risky choices, we cannot match moments to estimate risky choice trajectories.

From age 16-23, average mental health costs per woman are about 389 USD in the low-risk group and about 1,517 USD in the high-risk group. Given the share of women in the low-risk (93.9%) and high-risk group (6.1%), the average costs are about 458 USD.
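The back-of-the-envelope cost calculation can be reproduced from the rounded group averages and group shares reported above. The counterfactual simply assigns the low-risk average cost to high-risk women; the function name is illustrative.

```python
def average_cost(cost_low, cost_high, share_high):
    """Population-average cost per woman given group-specific average costs."""
    return (1.0 - share_high) * cost_low + share_high * cost_high

baseline = average_cost(389.0, 1517.0, share_high=0.061)       # both groups as observed
counterfactual = average_cost(389.0, 389.0, share_high=0.061)  # high-risk women at low-risk costs
reduction = 1.0 - counterfactual / baseline
print(f"average cost: {baseline:.1f} USD, reduction: {100 * reduction:.1f}%")
```

With these rounded inputs, the average is about 458 USD and the reduction about 15.0%; the small gap to 15.1% is consistent with rounding of the inputs.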

Conclusion
In this study, we use individual-level administrative records from Sweden and the novel GFE estimator to quantify the causal impact of abortion on mental health in young women. The GFE estimator clusters individuals with similar unobserved characteristics in groups. Within these groups, unobserved heterogeneity is allowed to vary with age. Using this method, we estimate a precise null effect of abortion on mental health. This result stands in contrast to the positive and significant associations between abortion and mental health obtained from several different identification strategies that do not take time-varying unobserved heterogeneity into account.
In our main specification with two groups, a small but significant share of women exhibits a high unobserved mental health risk while most women have a low unobserved risk.
We show that the estimated profiles of unobserved heterogeneity likely capture decisions that result in risky health behaviors, such as unprotected sex or excessive drinking. These decisions are generally unobserved by researchers. Thus, the GFE estimator is necessary to obtain an unbiased estimate of the parameter of interest. Based on these considerations, we propose a model of risky choices and mental health. The estimated parameters from moment matching suggest a large degree of self-control problems among high-risk women.
Our model can explain observed disparities in mental health trajectories across groups.
Our work has several implications. First, we show that an abortion following an unwanted pregnancy does not lead to more mental health problems, at least not in Sweden. In other countries, if the relationship between abortion and mental health appeared to be causal, this could be attributed to stigma rather than to the abortion itself. Abortion opponents thus cannot use mental health problems as an argument for more restrictive abortion policies.
Second, the estimated null effects imply no additional mental health care costs associated with abortion. Given existing evidence on adverse economic outcomes, restrictive abortion policies are thus unlikely to be welfare-enhancing. Third, self-control problems and associated risky behaviors, rather than abortions, may trigger mental health problems. Thus, policymakers should find tools to identify and reduce self-control problems early rather than provide cost-intensive general mental health screenings.

Standard errors clustered on the individual level; *** p < 0.01, ** p < 0.05, * p < 0.1. Columns (1)-(4): OLS regression of cumulative mental health diagnoses on abortion and past risky health behavior, controlling for individual-specific fixed-effects. Column (5): GFE estimation with G = 2 and individual-specific fixed-effects. Control variables: woman: relationship status, log earnings, college degree, employed; mother: log earnings, employed, college degree, relationship status; father: log earnings, employed, college degree; log household disposable income; year fixed-effects, municipality FE, year of birth FE for woman/mother/father; indicator missing observations.

Standard errors clustered on the individual level; *** p < 0.01, ** p < 0.05, * p < 0.1. OLS regression of current risky health behaviors on estimated profiles of unobserved mental health risk; estimated profiles of unobserved heterogeneity for G = 2. Control variables: woman: relationship status, log earnings, college degree, employed; mother: log earnings, employed, college degree, relationship status; father: log earnings, employed, college degree; log household disposable income; municipality FE, year of birth FE for woman/mother/father; indicator missing observations.

C Simulation Example
We build a general simulation set-up to check for known problems, such as dependence on starting values. This simulation set-up also aids us with the interpretation of our results and with validating some specification choices, specifically determining the optimal number of groups. All replication files for this simulation are available at https://github.com/LJanys/Mental_Health_Abortions_Risky_Behaviors.
Our data are generated by the general data generating process outlined in Equations (C.2) and (C.3). Group membership is determined by the value of the unobserved, individual-specific fixed-effects: for each individual, group membership is drawn from a binomial distribution. To match the characteristics of our real data, we define the groups to be of unequal size. The largest group is group one ("low-risk group"), which comprises 70% of individuals; group two ("medium-risk group") comprises 20% of individuals; and group three ("high-risk group") is the smallest group, with 10% of individuals.
With this DGP, we compare the results of the simulations along three margins:
(1) whether the estimated profiles of unobserved heterogeneity are comparable to the true ones, including when "superfluous" groups are added;
(2) whether the estimated parameters of the OLS estimator, the individual-specific fixed-effects estimator, and the grouped fixed-effects estimator behave similarly to the pattern we observe in our empirical analysis;
(3) whether the chosen information criterion is reliably minimized at the correct number of groups.
The right-hand side of Figure C.1 shows the estimated unobserved heterogeneity profiles obtained from the GFE estimator for 10,000 observations. The estimated profiles look very similar to the true ones, indicating that the GFE can reliably estimate the group-specific profiles of unobserved heterogeneity.
The resulting estimates of the parameter of interest in the different specifications for the effect on mental health are displayed in Figure C.2. The OLS estimator overestimates the effect considerably due to omitted variable bias. Even in OLS with individual-specific fixed-effects, the effect estimate remains sizable and significant for sample sizes similar to ours, although we halved the sample size to reduce computation time.
When we control for dynamic grouped fixed-effects, the estimate shrinks toward zero and the confidence interval includes zero.
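The iterative logic behind the GFE estimator can be sketched as follows. This is a minimal illustration in the spirit of the iterative algorithm of Bonhomme and Manresa (2015), not the implementation used for our results. It handles a single regressor and, as an assumption of this sketch, removes individual fixed effects by within-demeaning before grouping. It alternates between two steps: estimating the coefficient and the group-by-time profiles given the group assignments, and reassigning each individual to the closest profile. It keeps the best of several random starts. The function name `gfe` and its arguments are hypothetical.

```python
import numpy as np

def gfe(y, x, n_groups, n_starts=10, max_iter=100, seed=0):
    """Grouped fixed-effects sketch for balanced panels y, x of shape (N, T).
    Returns (theta, groups, alpha, objective) for the best random start."""
    rng = np.random.default_rng(seed)
    y = y - y.mean(axis=1, keepdims=True)   # remove individual fixed effects
    x = x - x.mean(axis=1, keepdims=True)
    n, T = y.shape
    best = None
    for _ in range(n_starts):
        groups = rng.integers(n_groups, size=n)
        for _ in range(max_iter):
            # Step 1: given groups, OLS with group-by-time effects (demean within group and period)
            yd, xd = y.copy(), x.copy()
            for g in range(n_groups):
                mask = groups == g
                if mask.any():
                    yd[mask] -= y[mask].mean(axis=0)
                    xd[mask] -= x[mask].mean(axis=0)
            theta = (xd * yd).sum() / (xd * xd).sum()
            resid = y - theta * x
            # Group-specific time profiles; an empty group is re-seeded with a random individual
            alpha = np.array([resid[groups == g].mean(axis=0) if (groups == g).any()
                              else resid[rng.integers(n)] for g in range(n_groups)])
            # Step 2: assign each individual to the profile with the smallest squared distance
            dist = ((resid[:, None, :] - alpha[None, :, :]) ** 2).sum(axis=2)
            new_groups = dist.argmin(axis=1)
            if np.array_equal(new_groups, groups):
                break
            groups = new_groups
        obj = dist.min(axis=1).sum()
        if best is None or obj < best[3]:
            best = (theta, groups, alpha, obj)
    return best
```

In a panel where the regressor only tracks a group-specific time trend in the outcome, the individual fixed-effects slope is positive, while `gfe(y, x, n_groups=2)` should return a coefficient close to zero once the two groups are separated.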
The GFE estimator is not "identified" in the sense that it requires the number of groups to be known, i.e., chosen by the researcher. Note that the true number of groups in our simulation example is three. As shown in Figure C.2, the GFE correctly estimates a zero effect when the correct number of groups is chosen. However, for G = 2 the estimated coefficient is heavily biased upward, by a similar amount as the OLS estimator with individual-specific fixed-effects. By contrast, selecting too many groups does not bias the estimated coefficients. This suggests that, at least for our data generating process, the GFE estimator recovers the true effect as long as the number of groups is at least the true number.

to these superfluous groups. Rather, the GFE splits existing groups, which leads to an "overfitting" of the time profiles (see Figures C.3(d) and C.3(e)). This behavior is similar to what we observe in our empirical, real-data application. Adding more groups splits up the existing groups, and the generated trajectories of unobserved mental health profiles for the additional groups are similar to that of the group that was split up.
Finally, we investigate the finite-sample behavior of the BIC criterion with two different penalty terms (Figures C.4 and C.5) in a setting with a large number of observations and a fixed number of time periods. As discussed in Section 5.2, the BIC preferred by Bonhomme and Manresa (2015) (BIC standard) does not discriminate sufficiently between different numbers of groups in our application. Our simulation exercise clearly shows that the number of groups selected by the BIC standard depends on the number of observations relative to the number of time periods. As shown in Figure C.5, the BIC selects the correct number of groups, G = 3, for 1,000 observations. However, when we increase the number of observations, the number of groups selected by this BIC increases, indicating that its penalty is not steep enough. As in our application, the BIC standard remains practically unchanged when increasing the number of groups once G > 1.
By contrast, the BIC with a steeper penalty term always chooses two groups, regardless of the number of observations. As indicated by the steep increase in the value of this BIC, its penalty with respect to the number of groups is too strong (Figure C.4). We observe similar behavior in our application.

D Model solution
Numerically, the decision function can be computed by backward induction over age, starting with a guess in the far future that does not affect behavior in the initial periods. The backward recursion is a variation of classical dynamic programming that takes into account the time inconsistency introduced by present bias. To make the problem computationally tractable, we first discretize the state variables over a suitable grid. Thus, the maximizations do not have to be performed for every possible value of the state variables, but only for every value on the grid. The resulting discrete functions are interpolated using monotone Hermite splines. To compute the conditional expectations at each age, we take a Monte Carlo average over 1,000 possible scenarios for next-period mental health given each combination of current mental health and risky choice. In this way, the functions entering the recursion, including the decision functions, can be computed backward in time one period at a time. With the resulting decision functions, we then simulate 10,000 optimal mental health trajectories and the associated risky behavior.

This guess is incorrect but can be expected not to affect behavior in the early time periods t = 1, ..., 8. We verify this by checking that the results do not change if we instead initialize the recursion at period 200.
We assume that the risky choice is discrete, with possible values 0, 0.05, 0.1, ..., 0.95, 1. For mental health, we simulate 1,000 trajectories for the two most extreme values of the risky choice: always 0 and always 1. We use the maximum and minimum of the resulting mental health trajectories to determine the boundaries of the grid, and we choose 200 equidistant levels in between.
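The solution steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not our solution code. The law of motion m' = rho·m + gamma·r + eps with normal shocks, the flow utility parameters, and the grid bounds are all hypothetical. Linear interpolation with constant extrapolation (`numpy.interp`) replaces the monotone Hermite splines, and the far-future guess for the continuation value is zero. Each current self weights next period by beta·delta, while the continuation value passed to earlier selves uses delta alone.

```python
import numpy as np

R_GRID = np.round(np.arange(0.0, 1.0001, 0.05), 2)  # risky choices 0, 0.05, ..., 1

def flow_utility(m, r, theta0=0.5, kappa=0.0, psi=1.0):
    """Hypothetical flow utility: sign-multiplied CRRA term minus quadratic mental health costs.
    Parameters should keep theta0 + kappa*m below one, since r = 0 is a feasible choice."""
    e = 1.0 - (theta0 + kappa * m)
    return np.where(e >= 0, 1.0, -1.0) * np.power(r, e) - psi * m ** 2

def solve_sophisticated(beta, delta, m_grid, n_periods=60, rho=0.8, gamma=0.5,
                        sigma=0.1, n_draws=1000, seed=0, **util_kw):
    """Backward induction for a sophisticated beta-delta agent; returns the decision rules
    on m_grid, ordered from the earliest period to the last."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma, n_draws)   # Monte Carlo draws for next-period shocks
    u = flow_utility(m_grid[:, None], R_GRID[None, :], **util_kw)  # shape (n_m, n_r)
    rows = np.arange(m_grid.size)
    W = np.zeros_like(m_grid)                  # far-future guess for the continuation value
    policies = []
    for _ in range(n_periods):
        # Next-period mental health for every (m, r, draw), shape (n_m, n_r, n_draws)
        m_next = rho * m_grid[:, None, None] + gamma * R_GRID[None, :, None] + shocks
        EW = np.interp(m_next, m_grid, W).mean(axis=2)  # Monte Carlo E[W(m') | m, r]
        idx = np.argmax(u + beta * delta * EW, axis=1)  # current self discounts by beta*delta
        W = u[rows, idx] + delta * EW[rows, idx]        # earlier selves' view: delta only
        policies.append(R_GRID[idx])
    return policies[::-1]

def simulate(policies, m_grid, n_women=10_000, m0=0.0, rho=0.8, gamma=0.5, sigma=0.1, seed=1):
    """Simulate mental health paths and risky choices under interpolated decision rules."""
    rng = np.random.default_rng(seed)
    m = np.full(n_women, m0)
    path_m, path_r = [], []
    for policy in policies:
        r = np.interp(m, m_grid, policy)
        path_m.append(m)
        path_r.append(r)
        m = rho * m + gamma * r + rng.normal(0.0, sigma, n_women)
    return np.array(path_m), np.array(path_r)

m_grid = np.linspace(0.0, 3.0, 100)  # hypothetical bounds; the text derives them by simulation
for beta in (0.598, 1.0):
    rules = solve_sophisticated(beta, 0.925, m_grid, n_periods=40, n_draws=200)
    m_path, r_path = simulate(rules[:8], m_grid)  # first eight periods, as for ages 16-23
    print(beta, r_path.mean(axis=1).round(2))
```

Starting the recursion 40 periods ahead and keeping only the first eight decision rules mirrors the far-future guess described above; whether results are insensitive to that starting point can be checked by increasing `n_periods`.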