Machine learning

Oct 19, 2022

Overview

Review the major goals of machine learning
Introduce the tidymodels and parsnip packages for estimating regression models
Define resampling methods for evaluating model performance
Demonstrate how to conduct cross-validation using rsample

Before class

Read Statistical learning: the basics
Read Build a model
Read Evaluate your model with resampling

This is not a math/stats class. In class, we will briefly summarize how these methods work and spend the bulk of our time estimating and interpreting the models. That said, you should have some understanding of the mathematical underpinnings of statistical learning methods before implementing them yourselves. Some recommended readings are listed below:

For those with little/no statistics training

Chapters 7-8 of OpenIntro Statistics - an open-source textbook written at the level of an introductory undergraduate statistics course

For those with prior statistics training

Chapters 2-3 and Sections 4.1-4.3 of An Introduction to Statistical Learning - a book on statistical learning written at the level of an advanced undergraduate or master’s course
Chapters 4-5 of Hands-On Machine Learning with R - a recent publication that approaches these methods from the perspective of machine learning rather than traditional statistical inference. It includes code examples using R and the caret package.

Class materials

Run the code below in your console to download the exercises for today.

usethis::use_course("cis-ds/machine-learning")

Materials derived from Tidymodels, Virtually: An Introduction to Machine Learning with Tidymodels by Alison Hill.

Additional readings

caret - a package that unifies hundreds of separate algorithms for generating statistical/machine learning models under a single standardized interface. It is very robust, but it predates the tidyverse and is on the path to deprecation.
tidymodels - a collection of packages for machine and statistical learning using tidyverse principles.