FAQ
Should I take this course?
Meet some of the types of students you will find in this class.
Jeri
- General background
- PhD student in sociology
- Studies the science of science
- Uses advanced metrics to set her fantasy football lineup. Seems to be effective as she has won two years in a row.
- Starting points
- Has been analyzing data in Stata for the past three years
- Feels comfortable with regression and econometric methods
- Tried to learn Git on her own once, and quickly became frustrated and gave up
- Needs
- Will be analyzing a large-scale dataset for her dissertation
- Wants to produce high-quality visualizations
- Seeks a reproducible workflow to manage all her exploratory and confirmatory analysis
Ryan
- General background
- Entering the MPS program
- Undergraduate degree in journalism
- Enjoys attending Weird Al Yankovic concerts
- Starting points
- Hasn’t taken a statistics class in five years
- Isn’t sure whether to pursue a PhD or go back into the private sector after graduating
- Took an online introductory course in R, but hasn’t used it in his day-to-day work
- Needs
- Writing a master’s thesis in a single year
- Expects to analyze a collection of published news articles
- Wants to understand code samples he finds online so he can repurpose them for his own work
Fernando
- General background
- Third-year undergraduate student
- Majoring in political science
- Makes an annual pilgrimage to Comic-Con where he traditionally cosplays as Spock
- Starting points
- Has taken general education math courses, plus the departmental methods course
- Isn’t afraid to tackle a new challenge
- Possesses some experience writing scripts in Stata to automate statistical analysis
- Needs
- Wants to work as a research assistant on a project exploring the onset of civil conflict
- Faculty advisor’s lab works exclusively in R
- Needs to start contributing to a new research paper next quarter
Fang
- General background
- Born and raised in Shenzhen, China
- First time living in the United States
- Improves her English skills by watching the Great British Bake-Off (but was heartbroken when Mary Berry, Mel, and Sue left)
- Starting points
- Background in psychology, plans to apply for doctoral programs in marketing
- Uses a mix of Excel, SPSS, and Matlab
- Needs
- Is going to run 300 experiments on Amazon MTurk in the next six months
- Wants to easily share her analysis notebooks with peers in her research lab
- Expects to take courses in machine learning and Bayesian statistics which require a background in R
General description
This course is open to any graduate (or advanced undergraduate) student at Cornell. I anticipate drawing students from a wide range of departments such as Information Science, Sociology, Psychology, and Political Science. Typically these students are looking to learn basic computational and analytic skills they can apply to master’s projects or dissertation research.
If you have never programmed before or don’t even know what the shell is, prepare for a shock. This class will prove to be immensely beneficial if you stick with it, but that will require you to commit for the full semester. I do not presume any prior programming experience, so everyone starts from the same knowledge level. I guarantee that the first few weeks and assignments will be rough - but the good news is that they will be rough for everyone! Your classmates are struggling with you and you can lean on one another to get through the worst part of the learning curve.
A highly selective sampling of feedback from when I taught a similar course at the University of Chicago:
I think this class is really well-organized. The homework is craftily designed as a way to solidify the materials learned in class. Dr. Soltoff is wonderful and helpful! He came to class fully prepared and made the lectures enjoyable and productive. I suggest that all Ph.D. students in Political Science (at least in my field), who likes to conduct quantitative research, should choose this class in the first year, because this class can well set students up with a good understanding of programming techniques.
Very useful material that I hated learning until 2/3 through the quarter.
This class can set you up really nicely with conversant knowledge in R programming and a large amount of coding materials that are helpful for future research. And it also provides students with a first-hand experience of using some of the cutting edge methods and makes students have a taste of them.
I’m so so so glad I ended up taking this course. I had a lot of doubts about my own (lack of) skills and my inability to handle so many things in one quarter. I had a lot of apprehensions about this course but they all quickly disappeared. It’s quite honestly been one of the most valuable courses I’ve taken at this University and I attribute all of that to your excellence as a lecturer. You and the TAs have always been extremely accessible to everyone and I can’t appreciate that enough. In other classes, I would’ve been more hesitant to ask “dumb questions” but you all have made me comfortable doing so, and I have benefited immensely from it.
I feel like every time I have a question or want to participate, I am always acknowledged. I also built a strong relationship with my classmates which is crucial for some of the difficult assignments.
What do I need for this course?
You will need to bring a computer to class each day. Class sessions are a mix of lecture, demonstration, and live coding. It is essential to have a computer so you can follow along and complete the exercises.
Textbooks/Readings
R for Data Science – Garrett Grolemund and Hadley Wickham
- Hardcover available for purchase online
- Open-source online version is available for free
Completing the exercises in the book? No official solution manual exists, but several can be found online. I recommend this version by Jeffrey B. Arnold. Your exact solutions may vary, but these can be a good starting point.
Additional resources
- ggplot2: Elegant Graphics for Data Analysis, 3rd Edition – Hadley Wickham
- Excellent resource for the ggplot2 graphics library.
- Advanced R – Hadley Wickham
- A deep dive into R as a programming language, not just a tool for data science.
- An Introduction to Statistical Learning: with Applications in R – Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
- A thorough introduction to statistical learning and machine learning methods, focusing on the fundamentals of how these methods work and the assumptions that go into them.
- ISLR tidymodels Labs - all the R demonstrations in ISL are written using base R. This site demonstrates how to implement all the labs using tidymodels.
- RStudio Cheatsheets - printable cheat sheets for common R tasks and features
Resources for under-represented groups in programming
- R LGBTQ Twitter: Affinity group for queer people in the R community – Twitter often promotes events, panels and talks by and for queer R users.
- Gayta Science Twitter: Alliance that uses data science techniques to give LGBTQ+ experiences a voice – Twitter will often share data-driven work concerning the LGBTQ+ community.
- RLadies Community Slack: A global programming meetup for non-binary, trans, and female R users.
- RLadies Remote Twitter: Remote chapter of R Ladies – has Slack coffee chats to discuss programming topics in a supportive environment.
- People of Color Code Meetup: A meetup for POC software developers – has events where POC developers can work on personal projects, collaborate, and learn.
- R Forwards: A task force set up by the R Foundation to address under-representation in the R community – collects representation data and produces workshops and teaching materials.
- R Community Diversity and Inclusion Working Group: Working group set up by the R Consortium to encourage and support diversity and inclusion across a variety of events and platforms in the R community
Software
By the end of the first week (or even better, before the course starts), you should make sure you can access the following software:
- R - easiest approach is to select a pre-compiled binary appropriate for your operating system.
- RStudio’s IDE - this is a powerful user interface for programming in R. You could use base R, but you would regret it.
- Git - Git is a version control system which is used to manage projects and track changes in computer files. Once installed, it can be integrated into RStudio to manage your course assignments and other projects.
Comprehensive instructions for downloading and setting up this software can be found here.
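Once everything is installed, a quick check from the R console can confirm that R is running and that it can see Git. This is only a minimal sketch, not part of the official setup instructions: it assumes Git was installed in a way that puts it on your system PATH, and the variable name git_path is just an illustration.

```r
# Print the version of R that is currently running
print(R.version.string)

# Look for the Git executable on the system PATH;
# an empty string means R could not find it
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  message("Git found at: ", git_path)
} else {
  message("Git not found on PATH; revisit the setup instructions")
}
```

If the second message appears, Git may still be installed but in a location R does not search, so the setup instructions are the place to look next.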
How will I be evaluated?
Students will complete a series of (roughly) weekly programming assignments linked to class materials. These assignments will generally be due the following week prior to Monday’s class. While students are encouraged to assist one another in debugging programs and solving problems in these assignments, it is imperative students also learn how to do this for themselves. That is, students need to understand, write, and submit their own work.
Each homework will be evaluated by either me or a TA, as well as by two peer reviewers. Each of you is required to provide two peer reviews for each assignment; failure to complete these reviews will result in a deduction from your final grade.
- General guidelines for submitting homework
- Evaluation criteria for homework
- How to perform peer review
- How to properly ask for help
Academic integrity
Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Under the provisions of the Code, anyone who gives or receives unauthorized assistance in the preparation of work at home or during tests in class will be subject to disciplinary action. A student’s name on any piece of work is our assurance that they have neither given nor received any unauthorized help in its preparation. Students may assist each other on assignments by answering questions and explaining various concepts. However, one student should not allow another student to copy their work directly. All University policies with respect to cheating will be enforced. A student who is found to have cheated on an exam, or any other graded assignment, will receive an “F” in the course.
Statement on diversity, inclusion, and disability
Cornell University (as an institution) and I (as a human being and instructor of this course) are committed to full inclusion in education for all persons. Services and reasonable accommodations are available to persons with temporary and permanent disabilities, to students with DACA or undocumented status, to students facing mental health or other personal challenges, and to students with other kinds of learning challenges. Please feel free to let me know if there are circumstances affecting your ability to participate in class. Some resources that might be of use include:
- Office of Student Disability Services
- Cornell Health CAPS (Counseling & Psychological Services)
- Undocumented/DACA Student support
Disability accommodations
Your access in this course is important to me. Please request your accommodation letter early in the semester, or as soon as you become registered with Student Disability Services (SDS), so that we have adequate time to arrange your approved academic accommodations.
- Once SDS approves your accommodation letter, it will be emailed to both you and me. Please follow up with me to discuss the necessary logistics of your accommodations.
- If you experience any access barriers in this course, such as with printed content, graphics, online materials, or any communication barriers, reach out to me or SDS right away.
- If you need immediate accommodation, please speak with me after class or send an email message to me and SDS.
If you have, or think you may have, a disability, please contact Student Disability Services (SDS) for a confidential discussion, or visit sds.cornell.edu to learn more.
Acknowledgments
- Stock photos of student learners by Generated Photos