Machine learning

Overview

  • Review the major goals of machine learning
  • Introduce the tidymodels and parsnip packages for estimating regression models
  • Define resampling methods for evaluating model performance
  • Demonstrate how to conduct cross-validation using rsample
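
As a rough preview of where these goals lead, the sketch below estimates a linear regression with parsnip and evaluates it with 10-fold cross-validation from rsample. It is only an illustration under stated assumptions: the built-in mtcars data and the formula mpg ~ wt + hp are placeholders, not the class exercises, and fit_resamples() and collect_metrics() come from the tune package, which is attached along with the rest of tidymodels.

```r
library(tidymodels)

# Assumption: mtcars stands in for the class data, purely for illustration
set.seed(123)                      # make the random fold assignment reproducible
folds <- vfold_cv(mtcars, v = 10)  # rsample: split the data into 10 folds

# parsnip: specify a linear regression estimated with base R's lm()
lm_spec <- set_engine(linear_reg(), "lm")

# tune: fit the model on each analysis set, evaluate on each held-out fold
cv_results <- fit_resamples(lm_spec, mpg ~ wt + hp, resamples = folds)

# Average the performance metrics (RMSE and R-squared by default) across folds
cv_metrics <- collect_metrics(cv_results)
cv_metrics
```

Each fold is held out once for evaluation while the model is estimated on the remaining nine, so the reported metrics describe performance on data the model did not see during estimation.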

Before class

This is not a math/stats class. In class we will briefly summarize how statistical learning methods work and spend the bulk of our time estimating and interpreting models. That said, you should have some understanding of the mathematical underpinnings of these methods before implementing them yourselves. See below for some recommended readings:

For those with little/no statistics training
  • Chapters 7-8 of OpenIntro Statistics - an open-source statistics textbook written at the level of an introductory undergraduate course on statistics
For those with prior statistics training
  • Chapters 2-3, 4.1-3 in An Introduction to Statistical Learning - a book on statistical learning written at the level of an advanced undergraduate/master’s level course
  • Chapters 4-5 in Hands-On Machine Learning with R - a recent publication which approaches these methods from the perspective of machine learning rather than traditional statistical inference. Includes code examples using R and the caret package.

Class materials

Run the code below in your console to download the exercises for today.

usethis::use_course("cis-ds/machine-learning")

Additional readings

  • caret - a package which unifies hundreds of separate algorithms for generating statistical/machine learning models into a single standardized interface. Very robust, but pre-tidyverse and on the path to deprecation.
  • tidymodels - a collection of packages for machine and statistical learning using tidyverse principles.

What you need to do after class

Benjamin Soltoff
Lecturer in Information Science