Machine learning
Overview
- Review the major goals of machine learning
- Introduce the tidymodels and parsnip packages for estimating regression models
- Define resampling methods for evaluating model performance
- Demonstrate how to conduct cross-validation using rsample
Before class
This is not a math/stats class. In class we will briefly summarize how these methods work and spend the bulk of our time on estimating and interpreting these models. That said, you should have some understanding of the mathematical underpinnings of statistical learning methods prior to implementing them yourselves. See below for some recommended readings:
For those with little/no statistics training
- Chapters 7-8 of OpenIntro Statistics - an open-source statistics textbook written at the level of an introductory undergraduate course on statistics
For those with prior statistics training
- Chapters 2-3, 4.1-3 in An Introduction to Statistical Learning - a book on statistical learning written at the level of an advanced undergraduate/master’s level course
- Chapters 4-5 in Hands-On Machine Learning with R - a recent publication which approaches these methods from the perspective of machine learning rather than traditional statistical inference. Includes code examples using R and the caret package.
Class materials
Run the code below in your console to download the exercises for today.
usethis::use_course("cis-ds/machine-learning")
Materials derived from Tidymodels, Virtually: An Introduction to Machine Learning with Tidymodels by Alison Hill.
Additional readings
- caret - a package which unifies hundreds of separate algorithms for generating statistical/machine learning models into a single standardized interface. Very robust, but it predates the tidyverse and is on the path to deprecation.
- tidymodels - a collection of packages for machine and statistical learning using tidyverse principles.
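
To give a rough sense of how the tidymodels packages fit together, here is a minimal sketch of estimating a linear regression with parsnip and evaluating it with 5-fold cross-validation from rsample. It is not taken from the class exercises: the built-in mtcars data, the formula mpg ~ wt + hp, the seed, and the object names (folds, lm_spec, cv_fits, cv_metrics) are all assumptions chosen for illustration.

```r
library(tidymodels)  # attaches parsnip, rsample, tune, and yardstick, among others

set.seed(123)  # arbitrary seed so the random fold assignment is reproducible

# Split the built-in mtcars data into 5 cross-validation folds (illustrative data)
folds <- vfold_cv(mtcars, v = 5)

# Specify a linear regression model estimated with base R's lm()
lm_spec <- linear_reg() %>%
  set_engine("lm")

# Fit the model once per fold and assess it on that fold's held-out rows
cv_fits <- fit_resamples(lm_spec, mpg ~ wt + hp, resamples = folds)

# Average the performance metrics (RMSE and R-squared by default) across folds
cv_metrics <- collect_metrics(cv_fits)
cv_metrics
```

Swapping in a different model should only require changing the model specification; the resampling and metric collection steps stay the same.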