Problem Set 0: Bias in Self-reported Turnout
You can find instructions for obtaining and submitting problem sets here.
For Gov 51 students, you can find the GitHub Classroom link to download the template repository here.
For Gov E-1005 students, you can find the GitHub Classroom link to download the template repository here.
Surveys are frequently used to measure political behavior such as
voter turnout, but some researchers are concerned about the accuracy
of self-reports. In particular, they worry about possible social
desirability bias where in post-election surveys, respondents who did
not vote in an election lie about not having voted because they may
feel that they should have voted. Is such a bias present in the
American National Election Studies (ANES)? The ANES is a nation-wide
survey that has been conducted for every election since 1948. The
ANES conducts face-to-face interviews with a nationally representative
sample of adults. The table below displays the names and descriptions
of variables in the
turnout.csv data file.
||ANES estimated turnout (percentage)|
||Voting Eligible Population (in thousands)|
||Voting Age Population (in thousands)|
||Total ballots cast for highest office (in thousands)|
||Total ineligible felons (in thousands)|
||Total non-citizens (in thousands)|
||Total eligible overseas voters (in thousands)|
||Total ballots counted by overseas voters (in thousands)|
Load the data into R and assign it the name
turnout. Save two
year_range that are the dimensions of the
data and the range of the
year variable. Use these to answer the
following questions (in the text): How many observations are there?
What is the range of years covered in this data set?
Create two new variables in the data set:
turnout$VAPtr: the turnout rate (as a percentage) based on the voting age population or VAP. Note that for this data set, we must add the total number of eligible overseas voters since the
VAPvariable does not include these individuals in the count.
turnout$VEPtr: the turnout rate (as a percentage) based on the voting eligible population or VEP.
Create a table using the line of code
knitr::kable(turnout[,c("year", "VAPtr", "VEPtr")]). What difference do you observe between the different measures of turnout?
Compute the difference between the ANES and VAP estimates of turnout
rate and the difference between the ANES and VEP estimates of
turnout. Save these as new columns in the data called
turnout$diffVEP. How big are each of these
differences on average? What is the range of the differences?