Text analysis

  • Basic workflow for text analysis

    Obtain your text sources

    Text data can come from many places: web sites, Twitter, databases, PDF documents, and digital scans of printed materials. The easier it is to convert your text data into digitally stored text, the cleaner your results and the fewer transcription errors you will have.

  • Practicing tidytext with song titles

    library(tidyverse)
    library(acs)
    library(tidytext)
    library(here)

    set.seed(1234)
    theme_set(theme_minimal())

    Run the code below in your console to download this exercise as a set of R scripts.

    usethis::use_course("cis-ds/text-analysis-fundamentals-and-sentiment-analysis")

    Today let’s practice our tidytext skills with a basic analysis of song titles.

  • Practicing sentiment analysis with Harry Potter

    library(tidyverse)
    library(tidytext)
    library(harrypotter)

    set.seed(1234)
    theme_set(theme_minimal())

    Run the code below in your console to download this exercise as a set of R scripts.

    usethis::use_course("cis-ds/text-analysis-fundamentals-and-sentiment-analysis")

    Load Harry Potter text

    Run the following code to download the harrypotter package:

  • Practicing tidytext with Hamilton

    library(tidyverse)
    library(tidytext)
    library(ggtext)
    library(here)

    set.seed(123)
    theme_set(theme_minimal())

    About seven months ago, my wife and I became addicted to Hamilton.

    My name is Alexander Hamilton

    I admit, we were quite late to the party.

  • Supervised classification with text data

    library(tidyverse)
    library(tidymodels)
    library(tidytext)

    set.seed(1234)
    theme_set(theme_minimal())

    A common task in social science involves hand-labeling sets of documents for specific variables (e.g. manual coding). In previous years, this required hiring a set of research assistants and training them to read and evaluate text by hand.

  • Predicting song artist from lyrics

    library(tidyverse)
    library(tidymodels)
    library(here)
    library(stringr)
    library(textrecipes)
    library(themis)
    library(vip)

    set.seed(123)
    theme_set(theme_minimal())

    Run the code below in your console to download this exercise as a set of R scripts.

    usethis::use_course("cis-ds/text-analysis-classification-and-topic-modeling")

    Beyoncé and Taylor Swift at the 2009 MTV Video Music Awards.

  • Topic modeling

    library(tidyverse)
    library(tidymodels)
    library(tidytext)
    library(textrecipes)
    library(topicmodels)
    library(here)
    library(rjson)
    library(tm)
    library(tictoc)
    library(appa)

    set.seed(1234)
    theme_set(theme_minimal())

    Typically when we search for information online, there are two primary methods:

      Keywords - use a search engine and type in words that relate to whatever it is we want to find
      Links - use the networked structure of the web to travel from page to page.