Syllabus for Gov 51: Data Analysis and Politics

Head Instructor

Prof. Matthew Blackwell
mblackwell@gov.harvard.edu
https://www.mattblackwell.org
Office Hours: Thursdays, 10am-noon

Teaching Fellows

Mafalda Pratas Fernandes
pratasfernandes@g.harvard.edu
Office Hours: Fridays, 12:30pm-2:15pm ET

Soichiro Yamauchi
syamauchi@g.harvard.edu
https://soichiroy.github.io/
Office Hours: Tuesdays, 3:00pm-4:30pm

Description

How can we measure racial discrimination in job hiring? What is the best way to predict election outcomes? What factors drive the onset of civil wars? Is it possible to determine which members of Congress are more or less liberal given their voting record? These are just a few of the numerous questions that social scientists are tackling with quantitative data. Beyond academia, companies and non-profits have invested heavily in data science techniques to learn about their users, platforms, and programs. Data scientists at these institutions are essentially applied social scientists and employ many of the same techniques you will learn in this course.

What will you learn in this course? Our goal is to give you the ability to understand, explain, and perform social science research, with a special focus on data analysis and causal reasoning. You will be able to read and understand the methodology of most academic articles in the social sciences, but more importantly you will have a foot in the door of the data science world. The ability to collect and analyze data in a sophisticated manner is becoming a crucial skill set for the modern job market across industries. Finally, you will obtain data literacy that will help you be a critical consumer of evidence for the rest of your life.

Expectations

In this course, you will be expected to

  • complete four problem sets,
  • complete ten weekly tutorials,
  • take two take-home, open-book exams, and
  • write one final data analysis project.

Course objectives

In this course, you will learn to:

  • Evaluate claims about causality
  • Summarize and visualize data
  • Use linear regression to analyze data
  • Understand uncertainty in data analysis and how to quantify it
  • Use professional tools for data analysis such as R, RStudio, git, and GitHub

We will also attempt to inspire a passion for data analysis and create a community among the students to deepen their learning.

Prerequisites

Most students will take Gov 51 after Gov 50 (formerly Gov 1005: Data), so that class will provide sufficient background material. Students with previous statistics exposure (via AP Statistics or similar) or a solid background in high-school level math should also be prepared for the course.

We will assume a basic familiarity with high-school algebra and a working knowledge of computers. If you are unfamiliar with downloading and installing software programs on your Mac or PC, you may want to allocate additional time to make sure those aspects of the course go smoothly. In particular, we have developed a Problem Set 0 to guide you through installing R, RStudio, and git to give you a sense of the tools we will be using. Previous experience with statistical methods and computing (Stata, R, SAS, SPSS) is helpful, but not required. You can always get in touch with the teaching staff for additional help on these issues.

No matter your background, you should be prepared to engage the class material on a regular, almost daily basis even beyond the time dedicated to assignments and exam review. Furthermore, you should feel comfortable with engaging in real-life data analysis using statistical software. We will guide you through that process, but it may be unfamiliar and therefore challenging. You should especially take this course if you plan to use quantitative data at all in a senior thesis.

Credit

This course satisfies the Methods requirement for the Government department and the “Quantitative Reasoning with Data” requirement in the Harvard College Curriculum. It also counts toward completion of the Government department data science track.

Grading

You (the student) and I (the instructor) should care the most about what you learn, not what numerical/letter summary of that learning you get at the end of the semester. So I would love to not have grades at all, but unfortunately we humans are very good at procrastinating on our good intentions when there is no incentive not to. Thus, we have grades to help solve this commitment problem and to encourage you to put effort into learning the course material.

Here is how each portion of the course contributes to the overall grade:

Category             Percent of Final Grade
R Tutorials          10%
Four Problem Sets    40%
Midterm Exams        30%
Final Project        20%
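
As a rough illustration of how these weights could combine, here is a minimal Python sketch. It assumes each category is scored as a percentage from 0 to 100 and that the overall grade is a simple weighted average. The function name `overall_grade` and the sample scores are hypothetical, and the actual calculation may differ.

```python
# Weights come from the grading table; everything else is an illustrative assumption.
WEIGHTS = {
    "tutorials": 0.10,
    "problem_sets": 0.40,
    "exams": 0.30,
    "final_project": 0.20,
}


def overall_grade(scores):
    """Return the weighted average of category scores (each on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS)


# Hypothetical student: full tutorial credit, 85 on problem sets,
# 80 on exams, and 90 on the final project.
example = {"tutorials": 100, "problem_sets": 85, "exams": 80, "final_project": 90}
print(round(overall_grade(example), 1))
```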

We will use Gradescope for submission of the various assignments throughout the semester. Once enrollment is finished, Gradescope will automatically connect through Canvas.

Bump-up policy: We reserve the right to “bump up” the grades of students who have made valuable contributions to the course in the lecture, sections, study halls, or discussion/Slack. This also applies to students who show tremendous progress and growth over the semester.

Video Lectures

Each week we will release 2-4 short (10-20 min) videos to watch at the beginning of the week. These videos will cover core material for the week and help augment the reading for better understanding. We expect that all students will watch these videos ahead of the first lecture and post any questions they may have about the lectures to Canvas for students and course staff to answer. The purpose of these videos is to make sure that all students are on the same page when we meet together as a group.

We will encourage and facilitate students watching these videos in watch parties with a small group of students. These parties are useful for two reasons. First, by having a fixed time to watch with others, you are more likely to actually watch them. Second, social viewing of the videos allows opportunities to stop the video to ask questions among your group (what did Blackwell just say?).

Live class meetings

We will hold live class meetings over Zoom once per week to perform an in-class activity related to that week's material. In these sessions, a member of the course staff will usually walk through a data example and you will be expected to follow along. These meetings will take place Tuesdays at 1:30pm ET. Depending on enrollment, we may add an additional live meeting to accommodate students in other time zones.

Sections

Every student will be assigned to a small-group section of the course, which will meet over Zoom. These sections will focus on reviewing material from class that is useful for the homeworks and exams. These section meetings are crucial for your understanding of the material in this course.

Tutorials

We will assign short weekly tutorials to assess your knowledge of the material covered in the reading and video lectures that week. You will complete these in RStudio. While you are expected to complete them on time, they will be graded based on completion, not on how correct the answers are. They will be due Mondays at 11:59pm ET.

At the beginning of the term, these tutorials will focus on getting up to speed in R. Over the course of the term, they will focus more on the theoretical aspects of data analysis.

Problem Sets

Only reading about data science is about as instructive as reading a lot about hammers or watching someone else wield a hammer. You need to get your hands on a hammer or two. Thus, in this course, you will have 4 problem sets to complete throughout the semester that will give you an opportunity to apply the statistical techniques you are learning. They will usually be focused on data analysis in general and will often involve a real dataset.

We encourage students to rely on peer working groups as they work on these homeworks, but each student will submit their own work individually. We will facilitate the formation of homework peer groups.

The schedule for the problem sets will be:

Problem Set     Topic                    Release Date               Due Date
Problem Set 1   Randomized Experiments   Thu, Sep 10th 12:00pm ET   Wed, Sep 16th 11:59pm ET
Problem Set 2   Summarizing Data         Thu, Sep 24th 12:00pm ET   Wed, Sep 30th 11:59pm ET
Problem Set 3   Regression               Thu, Oct 29th 12:00pm ET   Wed, Nov 4th 11:59pm ET
Problem Set 4   Inference                Thu, Nov 19th 12:00pm ET   Wed, Nov 25th 11:59pm ET

Grace policy: When calculating the final homework portion of the overall grade, we will drop the lowest of the four scores and use the remaining scores. Thus, if you have an emergency that forces you to miss one homework, your grade will not be severely affected.
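
To make the grace policy concrete, here is a minimal Python sketch of a drop-the-lowest average. It assumes the remaining scores are weighted equally and that a missed problem set is recorded as 0. The function name `homework_average` and the sample scores are hypothetical, not the course's actual grading code.

```python
def homework_average(scores):
    """Average the problem set scores after dropping the single lowest one."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop one")
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


# Hypothetical scores: a missed Problem Set 3 is recorded as 0 and then dropped.
print(homework_average([90, 80, 0, 100]))
```

Under these assumptions, the missed problem set does not pull down the homework portion of the grade, because only the three highest scores are averaged.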

Exams

There will be two take-home exams during the course. Each exam will be similar to a homework in format and in the sense that it will be open book and open internet, but you will not be allowed to collaborate with other students or communicate with any other person about the exam. You will be given several days to complete each exam. We will provide more information about each exam as it approaches.

Exam     Release Date              Due Date
Exam 1   Thu, Oct 15th 5:00pm ET   Wed, Oct 21st 11:59pm ET
Exam 2   Tue, Dec 1st 5:00pm ET    Mon, Dec 7th 11:59pm ET

Final Project

The final project for the course will be a data analysis project where students will find a dataset of interest, state an interesting research question about that data, and answer this question using that data. Students may work individually or in groups of up to 3 students.

Milestone        Due Date
Proposal         Wed, Oct 28th 11:59pm ET
Draft Analyses   Wed, Nov 18th 11:59pm ET
Final Report     Mon, Dec 14th 11:59pm ET

Flow of the Course

The course will follow a basic flow each week, with small differences depending on whether a tutorial or problem set is due.

  • Monday: Watch the week’s lecture videos; submit tutorial (if due).
  • Tuesday: Live class meetings.
  • Wednesday: Submit problem set or exam (if due).
  • Thursday: Problem set posted; sections meet.
  • Friday: Sections meet.

Discussion

We will be using the Ed platform for discussions on course material. There is a user's guide to help you get oriented on the platform. We will enroll you in the platform toward the start of the semester.

Regrading Policy

If you feel there has been an error in the grading of one of your assignments, you may request (in writing) a regrade of the assignment. A member of the teaching staff will regrade the entire assignment, not just the part you are disputing. Therefore, your regrade might increase or decrease the overall grade on the assignment.

Office Hours and Availability

My office hours are Wednesdays 10am-12pm ET. If you have questions about the course material, computational issues, or other course-related issues, please do not hesitate to set up an appointment with any of us.

If you have a general question, you can also post it on Canvas. This is almost always the fastest way to get an answer. However, you can also email me directly at mblackwell@gov.harvard.edu. If the question is of general interest, I will forward the question and my answer to the class. Make sure to tell me explicitly in your email if you would like to stay anonymous.

Books

The following textbook is required for this course:

  • Imai, Kosuke. Quantitative Social Science: An Introduction. Princeton University Press.

The following books are optional, but may be helpful to build your understanding of the material:

  • Diez, David M., Christopher D. Barr, and Mine Cetinkaya-Rundel. 2015. OpenIntro Statistics. 3rd edition. https://www.openintro.org/
  • Freedman, David, Robert Pisani, and Roger Purves. 2007. Statistics. 4th edition. W.W. Norton & Company.
  • Gonick, Larry, and Woollcott Smith. 1993. The Cartoon Guide to Statistics. HarperPerennial.

Computing

We’ll use R in this class to conduct data analysis. R is free, open source, and available on all major platforms (including Solaris, so no excuses). RStudio (also free) is a graphical interface to R that is widely used to work with the R language. You can find a virtually endless set of resources for R and RStudio on the internet. For beginners, there are several web-based tutorials that teach the basic syntax of R. We’ll post more R resources on the course website. We will also use git and GitHub to manage our projects.

You can get set up with all of these tools by completing Problem Set 0.

Mental Health

College is a stressful time in one’s life, and mixing it with a global pandemic, remote learning, and dislocation makes this one of the most fraught semesters any of us have probably faced. We acknowledge that nothing is quite normal and that there may be times when you feel overwhelmed by this course or by life more generally. Please feel free to reach out to any of the course staff if you want to talk about any issues you are having with the course or anything else. We will always try to help and we are committed to being extra accommodating this semester on course policy issues. Please just get in touch.

Of course, there are other resources at Harvard if you need them. A few are listed below:

Academic Honesty

The work that you do in the problem sets should be your own work. You may seek help from others so long as this does not result in someone else completing your work for you. When asking for help, you may show others your code to help diagnose a bug or highlight a potential issue, but you should not view their (working) code. You should cite any discussions you have with other students in your problem set and note if they helped you with your code. You should never copy and paste code from another student or elsewhere (e.g., websites, former students).

I also strongly suggest that you make a solo effort at all the problems before consulting others. The exams will be very difficult if you have no experience working on your own. There is no collaboration allowed on the exams.