Text analysis: classification and topic modeling
Overview
- Introduce supervised text classification
- Implement a tidymodels workflow using text features
- Define topic modeling
- Explain latent Dirichlet allocation (LDA) and how it works
- Demonstrate how to use LDA to recover the topic structure of a corpus whose topics are unknown
- Identify methods for selecting an appropriate value of $k$, the number of topics
Before class
- Read Supervised classification with text data
- For greater depth of coverage, read the Classification chapter of Supervised Machine Learning for Text Analysis in R
- Read chapter 6 in Tidy Text Mining with R
- Topic modeling, from the lecture notes, demonstrates how to implement topic models in a (semi-)tidy workflow