Getting data from the web

  • Using APIs to get data

    library(tidyverse)
    library(forcats)
    library(broom)
    library(wbstats)
    library(wordcloud)
    library(tidytext)
    library(viridis)

    set.seed(1234)
    theme_set(theme_minimal())

    There are many ways to obtain data from the Internet. Four major categories are: click-and-download on the internet as a “flat” file, such as .

  • Practice getting data from the Twitter API

    library(tidyverse)
    library(rtweet)

    set.seed(1234)
    theme_set(theme_minimal())

    Run the code below in your console to download this exercise as a set of R scripts.

    usethis::use_course("cis-ds/getting-data-from-the-web-api-access")

    There are several R packages for accessing and searching Twitter.

  • Writing API queries

    library(tidyverse)
    library(stringr)
    library(jsonlite)
    library(httr)

    theme_set(theme_minimal())

    What happens if someone has not already written a package for the API from which we want to obtain data? We have to write our own function!

  • Simplifying lists

    library(tidyverse)
    library(httr)
    library(repurrrsive)

    set.seed(123)
    theme_set(theme_minimal())

    Run the code below in your console to download this exercise as a set of R scripts.

    usethis::use_course("cis-ds/getting-data-from-the-web-api-access")

    Not all lists are easily coerced into data frames by simply calling content() %>% as_tibble().

  • Scraping web pages

    library(tidyverse)
    library(rvest)
    library(lubridate)

    theme_set(theme_minimal())

    Run the code below in your console to download this exercise as a set of R scripts.

    usethis::use_course("cis-ds/getting-data-from-the-web-scraping")

    What if data is present on a website, but isn’t provided in an API at all?
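The “Writing API queries” item says that when no package wraps an API, we have to write our own function. The sketch below shows what such a function might look like. It assumes an API that takes its parameters as a URL query string and returns JSON, and it uses httr and jsonlite, both of which that item loads. The helper names `build_query_url()` and `fetch_json()`, the base URL `https://api.example.com/v1/search`, and the parameters `q` and `limit` are all hypothetical placeholders, not a real service.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: append named arguments to a base URL as a query string
build_query_url <- function(base_url, ...) {
  modify_url(base_url, query = list(...))
}

# Hypothetical helper: send a GET request, stop on an HTTP error status,
# and parse the JSON body into R objects (lists and/or data frames)
fetch_json <- function(base_url, ...) {
  response <- GET(build_query_url(base_url, ...))
  stop_for_status(response)
  fromJSON(content(response, as = "text", encoding = "UTF-8"))
}

# Placeholder endpoint and parameters; this only builds the URL, no request is sent
query_url <- build_query_url(
  "https://api.example.com/v1/search",
  q = "economics",
  limit = 10
)
query_url
```

With a real API, you would swap in the endpoint and parameter names from its documentation and then call `fetch_json()`. Keeping URL construction separate from the request also makes the query easy to inspect before you hit the server.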
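The “Simplifying lists” item notes that not every list coerces cleanly with `content() %>% as_tibble()`. One common workaround is to pull out one field at a time with purrr’s typed `map_*()` functions and assemble the columns with `tibble()`. The sketch below does this on a small hand-built list. It stands in for the nested JSON an API might return and is not data from the original exercise; its fields (`name`, `born`, `titles`) are made up for illustration.

```r
library(tidyverse)

# Hand-built nested list (illustrative only): each element is itself a list,
# and the `titles` field has a different length for each element
characters <- list(
  list(name = "Ada", born = 1815, titles = list("Countess", "Analyst")),
  list(name = "Alan", born = 1912, titles = list()),
  list(name = "Grace", born = 1906, titles = list("Rear Admiral"))
)

# Extract each field element-by-element with a typed map function,
# so every column comes out as an ordinary atomic vector
characters_df <- tibble(
  name = map_chr(characters, "name"),
  born = map_dbl(characters, "born"),
  n_titles = map_int(characters, ~ length(.x$titles)),
  # Collapse the variable-length field into one string per element
  titles = map_chr(characters, ~ paste(unlist(.x$titles), collapse = ", "))
)

characters_df
```

Collapsing `titles` into a single string is only one design choice. Another is to keep it as a list-column and expand it later with `tidyr::unnest_longer()`.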
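For the “Scraping web pages” item, a minimal rvest sketch might look like the one below. To keep it runnable offline, it parses a small inline HTML string instead of a live site. The markup and the `item` class are invented for the example. It assumes rvest 1.0 or later, which provides `html_elements()`, `html_element()`, and `html_text2()`.

```r
library(tidyverse)
library(rvest)

# Inline HTML standing in for a downloaded page (markup is made up)
page <- read_html('
  <html><body>
    <ul>
      <li class="item">Apples</li>
      <li class="item">Pears</li>
    </ul>
    <table>
      <tr><th>fruit</th><th>price</th></tr>
      <tr><td>Apples</td><td>1.5</td></tr>
      <tr><td>Pears</td><td>2</td></tr>
    </table>
  </body></html>
')

# Select all nodes matching a CSS selector, then extract their text
items <- page %>%
  html_elements(".item") %>%
  html_text2()

# Parse the first <table> into a tibble; the <th> row becomes the header
prices <- page %>%
  html_element("table") %>%
  html_table()

items
prices
```

On a real site, you would pass the page URL to `read_html()` and find suitable CSS selectors by inspecting the page source.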