Machine learning
The basics of statistical learning
Statistical models attempt to summarize relationships between variables by reducing the dimensionality of the data. For example, here we have some simulated data on sales of Shamwow in 200 different markets.
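As a minimal sketch of that idea in base R: the data below are simulated on the spot, and the predictor name `tv`, the coefficients, and the noise level are illustrative assumptions rather than the course's actual data set. The point is only that a linear model condenses 200 observations into two numbers.

```r
# Simulated stand-in data; variable names and true coefficients are assumptions
set.seed(123)
n_markets <- 200
tv <- runif(n_markets, min = 0, max = 300)           # hypothetical advertising budget
sales <- 7 + 0.05 * tv + rnorm(n_markets, sd = 3)    # hypothetical sales response
sim <- data.frame(tv = tv, sales = sales)

# The fitted model summarizes the relationship with an intercept and a slope
fit <- lm(sales ~ tv, data = sim)
coef(fit)
```

The estimated slope should land close to the value used in the simulation, which is what makes simulated data useful for checking a modeling approach.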
Build a linear model
library(tidymodels)
library(tidyverse)
library(rcis)
library(rstanarm)
library(broom.mixed)

set.seed(123)
theme_set(theme_minimal())

Introduction

There are several different approaches to fitting a linear model in R. Here, we introduce tidymodels and demonstrate how to construct a basic linear regression model.
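A basic tidymodels linear regression might look like the sketch below. It uses the built-in `mtcars` data and the formula `mpg ~ wt` purely as stand-ins, since the excerpt does not show which data the full page uses.

```r
library(tidymodels)

# Specify the model type, then the engine that will do the fitting
lm_spec <- linear_reg() %>%
  set_engine("lm")

# Fit the specification; mtcars and mpg ~ wt are illustrative assumptions
lm_fit <- lm_spec %>%
  fit(mpg ~ wt, data = mtcars)

# One row per coefficient, returned as a tibble
tidy(lm_fit)
```

Separating the specification from the fit is the design choice worth noticing: changing the engine string (for example to `"stan"`, which requires rstanarm) swaps the estimation method without changing the rest of the code.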
Logistic regression
library(tidyverse)
library(tidymodels)

set.seed(123)
theme_set(theme_minimal())

Run the code below in your console to download this exercise as a set of R scripts.

usethis::use_course("cis-ds/statistical-learning")

Classification problems

The sinking of RMS Titanic provided the world with many things:
Working with statistical models
library(tidyverse)
library(tidymodels)
library(rcis)

set.seed(123)
theme_set(theme_minimal())

Run the code below in your console to download this exercise as a set of R scripts.

usethis::use_course("cis-ds/statistical-learning")

Exercise: linear regression with scorecard

Recall the scorecard data set which contains information on U.
Preprocess your data
library(tidyverse)
library(tidymodels)
library(rcis)
library(naniar) # visualize missingness
library(skimr) # summary statistics tables

set.seed(123)
theme_set(theme_minimal())

Introduction

So far we have learned to build linear and logistic regression models, using the parsnip package to specify and train models with different engines.
Evaluate your model with resampling
library(tidyverse)
library(tidymodels)
library(ranger)
library(rcis)

set.seed(123)
theme_set(theme_minimal())

Introduction

So far, we have built a model and preprocessed data with a recipe. We also introduced workflows as a way to bundle a parsnip model and recipe together.
Tune model parameters
library(tidymodels)
library(rpart)
library(modeldata)
library(kableExtra)
library(vip)

set.seed(123)
doParallel::registerDoParallel()
theme_set(theme_minimal())

Introduction

Some model parameters cannot be learned directly from a data set during model training; these kinds of parameters are called hyperparameters. Some examples of hyperparameters include the number of predictors that are sampled at splits in a tree-based model (we call this mtry in tidymodels) or the learning rate in a boosted tree model (we call this learn_rate).
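To make the two named hyperparameters concrete, here is a small sketch of how they can be flagged for tuning and paired with candidate values. The choice of a boosted tree specification, the `mtry` range of 1 to 10, and the three grid levels are assumptions made for illustration; the full page may tune a different model.

```r
library(tidymodels)

# tune() is a placeholder: the value is left open instead of being fixed
bt_spec <- boost_tree(mtry = tune(), learn_rate = tune()) %>%
  set_mode("classification")

# Candidate values to try; mtry needs an explicit range because its upper
# bound depends on the number of predictors (1 to 10 is assumed here)
candidates <- grid_regular(
  mtry(range = c(1L, 10L)),
  learn_rate(),
  levels = 3
)

candidates
```

Each row of `candidates` is one combination of hyperparameter values. None of them is estimated from the data during fitting, which is why they have to be compared by evaluating each combination on resampled data.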