Problem Set 2: Sources of Empathy in the Circuit Court
You can find instructions for obtaining and submitting problem sets here.
For Gov 51 students, you can find the GitHub Classroom link to download the template repository here.
For Gov E-1005 students, you can find the GitHub Classroom link to download the template repository here.
In this problem set, we will analyze the relationship between the gender composition among a judge’s children and voting behavior among circuit court judges. In a recent paper, Adam N. Glynn and Maya Sen argue that having a female child causes circuit court judges to make more pro-feminist decisions. The paper can be found at:
Glynn, Adam N., and Maya Sen. (2015). “Identifying Judicial Empathy: Does Having Daughters Cause Judges to Rule for Women’s Issues?” American Journal of Political Science Vol. 59, No. 1, pp. 37–54.
judges.csv contains the following variables about individual judges:
||The judge’s name|
||The number of children each judge has.|
||Which federal circuit the judge serves in.|
||The number of female children the judge has.|
||The proportion of the judge’s votes on women’s issues which were decided in a pro-feminist direction.|
||The judge’s race (
||The judge’s religion (
||Takes a value of
||The number of male children the judge has.|
||Takes a value of
||The year the judge was born.|
Question 1 (5 points)
Load the judges.csv file into a data frame called
judges. Create a cross-tab (of proportions, not counts) of judge gender on the rows and whether the appointing president was Republican on the columns. Save this table with the name
knitr::kable() to create a nicely formatted version of this table. In your write-up, answer the following questions:
- How many judges are in this data set?
- What proportion of the judges are men?
- Is the party composition different for male and female judges?
NOTE: to change the row and column labels for the output table using
knitr::kable(), you can change the row and column names of the table you pass to it. For example, if I am passing a table called
mytab, then I can use the following:
rownames(mytab) <- c("Row 1 Label", "Row 2 Label")
colnames(mytab) <- c("Column 1 Label", "Column 2 Label")
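As a rough illustration of the cross-tab and relabeling steps, here is a minimal sketch on made-up data. The data frame toy and its columns is_woman and rep_appointee are hypothetical placeholders, not the variable names in judges.csv, and the choice of whole-table proportions (rather than row or column proportions) is an assumption you should check against what the question is asking.

```r
# Toy data; the column names is_woman and rep_appointee are hypothetical.
toy <- data.frame(
  is_woman      = c(0, 0, 0, 1, 1, 0, 1, 0),
  rep_appointee = c(1, 0, 1, 0, 1, 1, 0, 0)
)

# table() gives counts; prop.table() converts them to proportions of the
# whole table (pass margin = 1 for row proportions instead).
toy_tab <- prop.table(table(toy$is_woman, toy$rep_appointee))

# Relabel rows and columns before handing the table to knitr::kable().
rownames(toy_tab) <- c("Men", "Women")
colnames(toy_tab) <- c("Democratic appointee", "Republican appointee")

# Only call kable() when knitr is installed; otherwise print the raw table.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(toy_tab, digits = 2))
} else {
  print(round(toy_tab, 2))
}
```

Whether you want whole-table, row, or column proportions changes how the table reads, so it is worth deciding that before writing up your answers.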
Question 2 (4 points)
Our outcome in this exercise will be the proportion of feminist rulings on issues related to gender,
progressive_vote. Create a nicely formatted histogram of this variable (remember to use
freq = FALSE) and provide a written summary of this graph. Roughly speaking, where is the region of highest density of this variable?
NOTE: nicely formatted means informative axis labels and a main title, none of which contain raw R syntax.
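For reference, a density histogram with readable labels might look like the sketch below. It uses simulated values (toy_votes is a hypothetical vector, not the real data), so the shape of the plot says nothing about the judges.

```r
# Simulated proportions in [0, 1]; stands in for the real outcome variable.
set.seed(51)
toy_votes <- rbeta(200, 2, 3)

# freq = FALSE puts density, not counts, on the y-axis.
h <- hist(
  toy_votes,
  freq = FALSE,
  xlab = "Proportion of pro-feminist votes",
  ylab = "Density",
  main = "Distribution of pro-feminist voting (simulated data)"
)

# With freq = FALSE, the bar areas (density times bin width) sum to one.
sum(h$density * diff(h$breaks))
```

Because the y-axis is density, the tallest bars mark the region where the values are most concentrated, which is what the question asks you to describe.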
Question 3 (6 points)
Next, we consider differences between some groups. Create a new factor variable called
judges$gender_party that takes on separate values for each of the four groups:
"Dem. Woman" for women appointed by Democratic presidents
"Rep. Woman" for women appointed by Republican presidents
"Dem. Man" for men appointed by Democratic presidents
"Rep. Man" for men appointed by Republican presidents
Use tapply to calculate the mean of
progressive_vote in each of these groups and save this vector as
gender_party_means. Plot these means using a barplot.
Briefly interpret the results of the analysis. For example, do any of the results surprise you? Does it appear that partisanship, gender, or both contribute to progressive voting patterns? Should we interpret any of these effects causally? Why or why not?
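The general pattern for building a four-level factor and computing group means with tapply can be sketched as follows. The data are invented, and the indicator columns is_woman and rep_appointee are hypothetical names chosen for illustration; this is one possible approach, not the required one.

```r
# Toy data; is_woman and rep_appointee are hypothetical column names.
toy <- data.frame(
  is_woman         = c(1, 1, 0, 0, 1, 0, 1, 0),
  rep_appointee    = c(0, 1, 0, 1, 0, 1, 1, 0),
  progressive_vote = c(0.8, 0.4, 0.6, 0.2, 0.6, 0.4, 0.2, 0.4)
)

# Start with NA and fill in each group using logical subsetting.
toy$gender_party <- NA
toy$gender_party[toy$is_woman == 1 & toy$rep_appointee == 0] <- "Dem. Woman"
toy$gender_party[toy$is_woman == 1 & toy$rep_appointee == 1] <- "Rep. Woman"
toy$gender_party[toy$is_woman == 0 & toy$rep_appointee == 0] <- "Dem. Man"
toy$gender_party[toy$is_woman == 0 & toy$rep_appointee == 1] <- "Rep. Man"

# Convert to a factor with an explicit level order (controls bar order).
toy$gender_party <- factor(
  toy$gender_party,
  levels = c("Dem. Woman", "Rep. Woman", "Dem. Man", "Rep. Man")
)

# Mean outcome within each group.
toy_means <- tapply(toy$progressive_vote, toy$gender_party, mean)

barplot(
  toy_means,
  ylim = c(0, 1),
  ylab = "Mean proportion of pro-feminist votes",
  main = "Mean pro-feminist voting by group (toy data)"
)
```

Setting the factor levels explicitly keeps the bars in a deliberate order instead of the default alphabetical one.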
Question 4 (4 points)
What is the difference in the proportion of pro-feminist decisions between judges who have at least one daughter and those who do not have any? To compute this difference, first create a variable called
judges$any_girls that is
1 when the judge has at least 1 girl and
0 otherwise. Then, create a subset of the data called
parents that contains judges that have at least one child. Create an object called
ate that is the difference in means of
progressive_vote between judges that have at least one girl versus those that have no girls among those judges with any children.
Why might we worry about interpreting this estimate causally, considering number of children as a possible confounder?
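A difference in means of this kind can be sketched on toy data as shown below. The column n_girls and the object names toy, toy_parents, and toy_diff are hypothetical; only num_kids, any_girls, and progressive_vote come from the problem set text.

```r
# Toy data; n_girls is a hypothetical name for the count of daughters.
toy <- data.frame(
  num_kids         = c(0, 1, 2, 2, 3, 1, 3, 0),
  n_girls          = c(0, 1, 0, 2, 1, 0, 0, 0),
  progressive_vote = c(0.5, 0.6, 0.3, 0.5, 0.4, 0.2, 0.4, 0.7)
)

# Indicator: 1 if at least one girl, 0 otherwise.
toy$any_girls <- ifelse(toy$n_girls >= 1, 1, 0)

# Keep only judges with at least one child.
toy_parents <- subset(toy, num_kids > 0)

# Difference in mean outcome: any girls minus no girls, among parents.
toy_diff <- mean(toy_parents$progressive_vote[toy_parents$any_girls == 1]) -
  mean(toy_parents$progressive_vote[toy_parents$any_girls == 0])
toy_diff
```

Note that judges with more children are mechanically more likely to have at least one girl, which is why the number of children is a natural confounder to think about here.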
Question 5 (6 points)
Given that the number of children might be a confounder for the relationship between number of girls and voting, let’s estimate the effects using statistical control for the number of children using the following steps:
- Create one subset of the data called
girls_123 that restricts to judges who have one, two, or three children and at least one girl.
- Create another subset of the data called
nogirls_123 that restricts to judges who have one, two, or three children and no girls.
- Calculate the mean of
progressive_vote within levels of the number of kids (
num_kids) for each of these subsets and save these vectors as
- Use these two vectors to estimate the average treatment effect within levels, saving this vector as
Are these estimated effects largely similar or largely different than what you found using all of the data? What assumption do you need to make to interpret these effects causally? Do you think it is plausible in this case?
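Estimating a difference within each level of a control variable can be sketched as below, again on invented data. All object names here (toy, toy_girls, toy_nogirls, girls_means, nogirls_means, toy_effects) are hypothetical and are not the names the problem set asks you to use.

```r
# Toy data with an any_girls indicator already constructed.
toy <- data.frame(
  num_kids         = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4),
  any_girls        = c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1),
  progressive_vote = c(0.5, 0.7, 0.4, 0.4, 0.6, 0.4, 0.3, 0.5,
                       0.5, 0.5, 0.2, 0.4, 0.9)
)

# Restrict to one, two, or three children, split by the indicator.
toy_girls   <- subset(toy, num_kids %in% 1:3 & any_girls == 1)
toy_nogirls <- subset(toy, num_kids %in% 1:3 & any_girls == 0)

# Mean outcome within each level of num_kids, separately per subset.
girls_means   <- tapply(toy_girls$progressive_vote, toy_girls$num_kids, mean)
nogirls_means <- tapply(toy_nogirls$progressive_vote, toy_nogirls$num_kids, mean)

# Both vectors are ordered by num_kids (1, 2, 3), so element-wise
# subtraction gives one estimated difference per family size.
toy_effects <- girls_means - nogirls_means
toy_effects
```

The subtraction only lines up correctly because both subsets contain every level of num_kids; if a level were missing from one subset, the two vectors would have different lengths and would need to be matched by name.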
Question 6 (EXTRA CREDIT, 5 points)
This problem is optional. Any points earned on this problem can be applied to lost points on other parts of the problem set. You cannot earn more than the maximum score on the problem set. There will be no autograder for this question.
Let’s consider the design of this study. The original authors assume that, conditional on the number of children a judge has, the number of daughters is random (as we did in the previous question). If this is true, half of a judge’s children should be female, on average. A deviation from this proportion could indicate a gender preference among judges, expressed through a stopping rule such as “have children until we get one girl,” which would violate the randomization assumption.
To check this assumption, use the subset of judges with at least one child and create and save a vector that contains the total number of girls (across all judges) for each level of the number of children. Create and save another vector that is the total number of children (across all judges) for each level of the number of children. (HINT: you might check out the
sum function to help carry out these steps.) Next, divide the total number of girls vector by the total number of children vector to create a new vector that is the proportion of girls for each family type. Create a barplot that plots these proportions on the y-axis with the number of children on the x-axis. This barplot should have (a) informative labels on each axis, (b) a y-axis range that runs from 0 to 1, and (c) a horizontal line at 0.5 to compare against. Does it appear that there is strong gender preference/selection happening among judges?
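One way to organize this check is sketched below on toy data. The data frame toy_parents, the column n_girls, and the vector names are hypothetical, and combining tapply with sum is just one way to get totals by group.

```r
# Toy data: one row per judge with at least one child.
# n_girls is a hypothetical name for the count of daughters.
toy_parents <- data.frame(
  num_kids = c(1, 1, 2, 2, 3, 3),
  n_girls  = c(1, 0, 1, 2, 2, 1)
)

# Total girls and total children within each family size.
girls_total <- tapply(toy_parents$n_girls, toy_parents$num_kids, sum)
kids_total  <- tapply(toy_parents$num_kids, toy_parents$num_kids, sum)

# Proportion of girls for each family size.
prop_girls <- girls_total / kids_total

barplot(
  prop_girls,
  ylim = c(0, 1),
  xlab = "Number of children",
  ylab = "Proportion of children who are girls",
  main = "Share of girls by family size (toy data)"
)

# Reference line at one half.
abline(h = 0.5, lty = 2)
```

With real data, bars that sit well above or below the dashed line for particular family sizes would be the pattern to look for.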