Pathways to Science and Engineering Bachelor’s Degrees for Men and Women

Despite the striking reversal of the gender gap in educational attainment and the near–gender parity in math performance, women pursue science and engineering (S/E) degrees at much lower rates than their male peers do. Current efforts to increase the number of women in these fields focus on different life-course periods but lack a clear understanding of the importance of these periods and how orientations toward S/E fields develop over time. In this article, we examine the gendered pathways to a S/E bachelor’s degree from middle school to high school and college based on a representative sample from the 1973 to 1974 birth cohort. Using a counterfactual decomposition analysis, we determine the relative importance of these different life-course periods and thereby inform the direction of future research and policy. Our findings confirm previous research that highlights the importance of early encouragement for gender differences in S/E degrees, but our findings also attest to the high school years as a decisive period for the gender gap, while challenging the focus on college in research and policy. Indeed, if female high school seniors had the same orientation toward and preparation for S/E fields as their male peers, the gender gap in S/E degrees would be closed by as much as 82 percent.


Data and Coding of Variables
The analyses presented in this article are based on the NELS 1988 to 2000, which is a nationally representative sample of about 25,000 eighth-grade students who were first surveyed in spring 1988. Subsamples of these students were resurveyed in 1990, 1992, 1994, and 2000. We restrict the NELS 1988 to 2000 sample to students who participated in the 8th- and 12th-grade surveys and the 2000 follow-up, and we exclude high school dropouts. As reported in the article, the size of this restricted sample is 10,230. Of these cases, 3,140 (30.7 percent) had missing information on at least one of the variables (mostly 12th-grade test scores). We use multiple imputation based on the chained-equations approach to recover missing values. Auxiliary variables such as 10th-grade test scores are used to improve the imputation, mainly for the cases with missing 12th-grade test scores. Essentially the same results were obtained using casewise deletion. Note that all sample sizes are rounded to the nearest 10 (data contract requirement).
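To make the role of an auxiliary variable concrete, the sketch below fills in missing 12th-grade scores from 10th-grade scores by stochastic regression imputation, repeated m times. It is a deliberately simplified stand-in, not the chained-equations procedure used in the article: it handles a single incomplete variable, it does not draw the regression parameters from their sampling distribution, and the function names and toy scores are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """OLS slope, intercept, and residual standard deviation for y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return slope, intercept, sd

def impute_scores(score10, score12, m=5, seed=0):
    """Return m completed copies of score12 (None marks a missing value).

    Assumption: score10 is fully observed and serves as the auxiliary variable.
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(score10, score12) if y is not None]
    slope, intercept, sd = fit_line([x for x, _ in observed],
                                    [y for _, y in observed])
    datasets = []
    for _ in range(m):
        # Observed values are kept; missing ones get prediction plus noise.
        filled = [y if y is not None
                  else intercept + slope * x + rng.gauss(0, sd)
                  for x, y in zip(score10, score12)]
        datasets.append(filled)
    return datasets
```

Each completed dataset would then be analyzed separately and the estimates combined across the m copies.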
The coding of the variables is described in Table A1. The coding of the S/E variables (planned major in 12th grade and major of bachelor's degree) separates science and engineering fields that are traditionally gender typed and usually require precollege preparation in math and science from other fields. On the basis of this general motivation, we exclude nursing and other health care majors from S/E fields. Sensitivity analysis that includes clinical and health sciences such as nursing in the S/E category showed essentially the same results, with small differences in the point estimates.

Field of Study Classifications
Non-S/E fields. No bachelor's degree earned, agricultural business/production, agriculture/animal/plant sci, conservation/natural resources, forestry, architect/environmental design, graphic/industrial design, drama, speech, film arts, music, fine arts/art history, fpa: other, accounting, finance, ops research/administrative science, business admin/management, hrd/labor relations, other business, other business support, medical office support, marketing/distribution, journalism, communications, radio/TV/film, communication technologies, early childhood education, elementary education, secondary education, special education, physical education, education: other, med/vet lab tech/assist, dental assist/hygiene, hper, allied health: other, physical therapy, occupational therapy, other therapies, speech path/audiology, clinical health sci, nursing, health/hospital admin, public health, oth health sci/profess, para-legal/pre-law, law, psychology, anthropol/archaeology, economics, geography, history, sociology, political science, internat relations, other, amer studies/civiliz, area studies, ethnic studies, retailing, hospitality mgmnt, real estate, information technols, other personal service, engin tech: el/electron, computer technology, foreign languages, nutrition/food sci, textiles/fashion, fcs and oth human ecology, child study/guidance, culinary arts/food mgt, english/amer literature, writing: creative/tech, letters: other, liberal/general studies, library/archival sci, women's studies, environ studies, biopsychology, integrated/gen science, interdisc humanities, social sci: general, interior design, recreation/sports, philosophy, religious studies, theology, Bible studies, clin/counsel psych, admin of justice, social work, public adminis-

Coding based on the survey question "What kind of work do you expect to be doing when you are 30 years old?" The question provides thirteen answer categories, such as craftspersons, housewife, and business owner. Eighth-grade expectations for a career in S/E are defined based on the answer category "science or engineering professional, such as engineer or scientist."

S/E major plans (12th grade)
Our coding is based on two questions from the NELS 1994 second follow-up survey. These questions are the filter question "Do you plan to continue your education past high school at some time in the future?" and the intended field of study question "Indicate the field that comes closest to what you would most like to study if you go to school." We categorize the responses into two groups: (1) no college or college without S/E major and (2) college with plans to major in a S/E field. The first category includes students who do not intend to study at a four-year college and those who intend to study agriculture, architecture, art, business, communications, education, English, ethnic studies, foreign language, health occupations, home economics, interdisciplinary studies, music, philosophy/religion, preprofessional, psychology, social sciences, and other fields. The second category includes students who intend to study in S/E fields, which are defined as biological science, computer science, engineering, mathematics, and physical science.

Bachelor's degree in S/E field
Attainment of a bachelor's degree in S/E fields is defined as graduating from a four-year college in a S/E field by 2000 (eight years after the normal high school graduation). S/E field follows the same definition as the planned major in 12th grade. The coding of S/E fields is based on the recoded variable "majcod4" from the Postsecondary Education Transcript Study (PETS), a NELS supplement. This variable combines information from the fourth follow-up survey with postsecondary transcript data to obtain detailed field codes for bachelor's degrees. See Appendix A for a detailed description of the field of study codes.
Performance (8th and 12th grade)
Eighth- and 12th-grade reading, math, and science test scores (separate continuous variables).

Math and science interest (8th grade)
Four variables with 4-point Likert scale: "I usually look forward to mathematics class," "I usually look forward to science class," "Math will be useful in my future," and "Science will be useful in my future."

Math and science interest (12th grade)
Measured based on two derived variables from the NELS PETS supplement, mainly based on questions about the reasons why a student took certain classes.
S/E fields. Biochemistry, biological science: other, math sciences/statistics, chemistry, geology/earth science, physics, phys sci: other, computer programming, data/information management, computer science, electrical/communication engineer, chemical engineering, civil engineering, mechanical engineering, engineering: other, computer engineering, and engineering tech: nonelect.
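The field classification above amounts to a binary indicator. The following minimal sketch shows one way to code it, assuming the degree field is available as a text label matching the list above; the actual "majcod4" variable may use numeric codes, and the names SE_FIELDS and is_se_degree are hypothetical.

```python
# S/E field labels copied from the list above; every other field,
# including "no bachelor's degree earned", counts as non-S/E.
SE_FIELDS = {
    "biochemistry",
    "biological science: other",
    "math sciences/statistics",
    "chemistry",
    "geology/earth science",
    "physics",
    "phys sci: other",
    "computer programming",
    "data/information management",
    "computer science",
    "electrical/communication engineer",
    "chemical engineering",
    "civil engineering",
    "mechanical engineering",
    "engineering: other",
    "computer engineering",
    "engineering tech: nonelect",
}

def is_se_degree(field_label):
    """Return 1 if the bachelor's degree field is S/E, else 0.

    None stands for "no bachelor's degree earned" (an assumed encoding),
    which falls in the non-S/E category.
    """
    if field_label is None:
        return 0
    return int(field_label.strip().lower() in SE_FIELDS)
```

Under this coding, nursing and the other health fields map to 0, in line with the exclusion described in the coding section.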

Counterfactual Decomposition
We use Blinder-Oaxaca decomposition techniques for nonlinear models (Fairlie 2005) to analyze the gender gap in S/E bachelor's degree attainment and determine the contribution of different life-course periods. Blinder-Oaxaca decomposition is a popular approach to study wage gaps by race and gender. It divides the differences between two groups into the part that is explained by a set of observed characteristics, such as education and work experience (the endowment effect), and an unexplained part related to residual differences that are often interpreted as discrimination but also subsume group differences in unobserved characteristics (the coefficient effect). Here we use this counterfactual decomposition technique to divide the gender gap in S/E bachelor's degree attainment into the part that is explained by gender differences in 8th- and 12th-grade orientation toward and preparation for S/E (endowment effect) and the unexplained part. The unexplained portion of the gap is generally difficult to interpret but can be understood as an upper-bound estimate for the role of post-middle school (first decomposition) and post-high school (second decomposition) choices and transitions.[1] This approach allows us to estimate the gender gap under two counterfactual scenarios: how would the gender gap in S/E degrees change if women had the same orientation toward and preparation for S/E as men in middle school (decomposition 1) and at the end of high school (decomposition 2)? Following Fairlie (2005), the decomposition of the gender gap in S/E bachelor's degree attainment can be expressed as

$$\bar{Y}^M - \bar{Y}^W = \left[\sum_{i=1}^{N^M}\frac{\Phi(X_i^M\hat{\beta}^M)}{N^M} - \sum_{i=1}^{N^W}\frac{\Phi(X_i^W\hat{\beta}^M)}{N^W}\right] + \left[\sum_{i=1}^{N^W}\frac{\Phi(X_i^W\hat{\beta}^M)}{N^W} - \sum_{i=1}^{N^W}\frac{\Phi(X_i^W\hat{\beta}^W)}{N^W}\right]$$

where $Y$ is an $n \times 1$ vector for the dependent variable so that $\bar{Y}^M$ denotes the proportion of men with a S/E bachelor's degree, $X$ is an $n \times k$ matrix of independent variables with rows $X_i$, $\beta$ is a $k \times 1$ vector of coefficients from a regression $P(Y) = \operatorname{logit}^{-1}(X\beta)$, and $\Phi$ is the inverse-logit function, $\Phi(x) = e^x/(1 + e^x)$. $N^M$ and $N^W$ are the numbers of men and women. The superscripts $M$ and $W$ index gender-specific vectors and matrices.
[1] The estimates for 8th- and 12th-grade characteristics are lower bounds, and those for post-middle school and post-high school choices are upper bounds, because unobserved factors are subsumed in the second part. In contrast to research on discrimination, this nature of the estimates further supports our main argument that the gender gap is largely explained by gender differences at the end of high school. In the end, however, we believe that our set of variables is comprehensive and captures the most important characteristics, considering that the variables are directly related to the selection of students in schools, which is important from a causal perspective (Legewie 2012).
The first term on the right-hand side corresponds to the part of the gap that can be attributed to differences in observed characteristics (endowment effect), and the second term corresponds to the unexplained part related to differences in coefficients (coefficient effect). The contribution of each variable can be determined by assigning women the men's distribution of that variable while leaving the distributions of the other variables unchanged (Fairlie 2005). Assuming that $N^W = N^M$, the contribution of $x_1$ is computed following Fairlie (2005), where $\hat{\beta}^*$ denotes the logistic regression coefficients from the pooled sample. This approach relies on the assumption that the two groups are equally large (Fairlie 2005). To circumvent this assumption, we average over multiple computations of the decomposition based on 1,000 equally large samples from the two groups. The results of this decomposition analysis are sensitive to the reference coefficients and the ordering of the covariates. The formulas use the male coefficients as the reference coefficients, but the same analyses with women as the reference category are presented in Table B1 and discussed throughout the article. In one case, the effect of differences in covariates is weighted by the male coefficients, and in the other case it is weighted by the female coefficients. If the female coefficients for the covariates that have very different means for males and females are smaller than the male coefficients, then the percentage explained by covariates will be smaller when females are used as the reference category.
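The two-term decomposition and the role of the reference coefficients can be illustrated with a minimal sketch. It assumes the gender-specific logit coefficients have already been estimated and are passed in as plain lists with the intercept first; all function and argument names are hypothetical, and this is not the authors' code.

```python
import math

def inv_logit(z):
    """Inverse-logit function e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

def mean_pred(X, beta):
    """Average predicted probability over rows of X; beta[0] is the intercept."""
    total = 0.0
    for row in X:
        total += inv_logit(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
    return total / len(X)

def decompose(X_m, X_w, beta_m, beta_w, reference="men"):
    """Return (gap, endowment effect, coefficient effect).

    With a logit model that includes an intercept, the average predicted
    probability in each group equals that group's observed proportion.
    """
    p_mm = mean_pred(X_m, beta_m)  # men's covariates, men's coefficients
    p_ww = mean_pred(X_w, beta_w)  # women's covariates, women's coefficients
    if reference == "men":
        p_cf = mean_pred(X_w, beta_m)  # women's covariates, men's coefficients
        endowment = p_mm - p_cf
        coefficient = p_cf - p_ww
    else:
        p_cf = mean_pred(X_m, beta_w)  # men's covariates, women's coefficients
        endowment = p_cf - p_ww
        coefficient = p_mm - p_cf
    return p_mm - p_ww, endowment, coefficient
```

The share of the gap explained by observed characteristics is then endowment / gap. Switching reference from "men" to "women" changes which coefficients weight the covariate differences, which is why the two sets of results can differ.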
The ordering effect only matters for the detailed decomposition and reflects different underlying causal models of educational decisions. As discussed in the main text of the article, we report ranges of estimates relating to different assumptions about the causal order of the covariates.
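The detailed decomposition, the averaging over equally large samples, and the range over orderings can be sketched as follows. Several choices here are assumptions rather than details confirmed by the text: covariates are switched one at a time from women's to men's values, rows are paired by the rank of their predicted probability, and the range is taken over all permutations of the covariates. The coefficient list (intercept first) stands for the pooled-sample estimates, and all names are hypothetical.

```python
import itertools
import math
import random

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob(row, beta):
    """Predicted probability for one row; beta[0] is the intercept."""
    return inv_logit(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))

def contributions(X_m, X_w, beta, order):
    """Contribution of each covariate for one ordering.

    X_m and X_w must be equally long. Rows are paired by rank of predicted
    probability, then each woman's covariates are replaced by her match's
    values one at a time, in the given order.
    """
    men = sorted(X_m, key=lambda r: prob(r, beta))
    women = sorted(X_w, key=lambda r: prob(r, beta))
    n = len(women)
    out = [0.0] * (len(beta) - 1)
    for m_row, w_row in zip(men, women):
        current = list(w_row)
        p_prev = prob(current, beta)
        for j in order:
            current[j] = m_row[j]  # assign the man's value of covariate j
            p_new = prob(current, beta)
            out[j] += (p_new - p_prev) / n
            p_prev = p_new
    return out

def averaged_contributions(X_m, X_w, beta, order, reps=1000, seed=0):
    """Average contributions over reps equally large random subsamples."""
    rng = random.Random(seed)
    n = min(len(X_m), len(X_w))
    total = [0.0] * (len(beta) - 1)
    for _ in range(reps):
        c = contributions(rng.sample(X_m, n), rng.sample(X_w, n), beta, order)
        total = [t + ci for t, ci in zip(total, c)]
    return [t / reps for t in total]

def ordering_ranges(X_m, X_w, beta, reps=1000, seed=0):
    """Minimum and maximum contribution of each covariate across orderings."""
    k = len(beta) - 1
    results = [averaged_contributions(X_m, X_w, beta, order, reps, seed)
               for order in itertools.permutations(range(k))]
    return [(min(r[j] for r in results), max(r[j] for r in results))
            for j in range(k)]
```

For any single ordering the contributions sum to the total endowment effect evaluated at the supplied coefficients, so the ordering changes only how that total is split among covariates. Enumerating all permutations grows factorially with the number of covariates; restricting the orderings to those consistent with a plausible causal order keeps the computation manageable.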

Additional Analyses
We conducted the same counterfactual decomposition with women as the reference group, for different S/E subfields, and for the subset of students who graduate from college with a four-year bachelor's degree. First, Table B1 shows the results with women as the reference, which are discussed throughout the main text of the article. Second, Table C1 presents the results for physical science and engineering (PS/E), which includes math and computer science and excludes biological and life science. As such, PS/E represents the fields in which women have made relatively few gains, as compared to biological and life science, in which they have largely caught up with and partly surpassed men. Finally, Table C2 presents the results conditional on graduating from a four-year college, so that it examines gender differences among the students who actually obtain a bachelor's degree. The findings from these additional analyses show small differences but are largely comparable to the ones presented in the main paper. They further support the argument that the gender gap would be reduced substantially if men and women had the same orientation toward and preparation for science and engineering at the end of high school.

Note: The decomposition results presented in this table use the male coefficients as the reference coefficients. The corresponding analysis with women as the reference is presented in Table A1. The results from the detailed decomposition depend on the ordering of the variables, which reflects different causal models of educational decisions. The table presents the range of results indicated by the notation [x, y], which corresponds to the minimum and maximum contributions from the range of possible causal orderings.