Build better training data
Overview
- Identify the importance of preprocessing data sets
- Introduce the recipes package for preprocessing data
- Utilize usemodels to automatically construct code templates for common model types
- Construct workflows for machine learning
Before class
- Read Preprocess your data
This is not a math/stats class. In class we will briefly summarize how these methods work and spend the bulk of our time on estimating and interpreting these models. That said, you should have some understanding of the mathematical underpinnings of statistical learning methods prior to implementing them yourselves. See below for some recommended readings:
- Chapter 5 in An Introduction to Statistical Learning
- Chapters 2-3 in Hands-On Machine Learning with R
- Feature Engineering and Selection: A Practical Approach for Predictive Models
Class materials
Run the code below in your console to download the exercises for today.
usethis::use_course("cis-ds/machine-learning")
Materials derived from Tidymodels, Virtually: An Introduction to Machine Learning with Tidymodels by Allison Hill.
Additional readings
caret
tidymodels
- Tidy Modeling with R - a book-length introduction to tidy modeling in R
- ISLR tidymodels Labs - a complement to the 2nd edition of An Introduction to Statistical Learning that translates the labs to use the tidymodels set of packages.