Getting data from the web: scraping

Overview

  • Define HTML and CSS selectors
  • Introduce the rvest package
  • Demonstrate how to extract information from HTML pages
  • Demonstrate how to extract tables and convert to data frames
  • Practice scraping data
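
As a rough illustration of the goals above, here is a minimal sketch of selecting elements with CSS selectors and converting an HTML table to a data frame. It assumes rvest 1.0.0 or later (for html_element() and html_elements()). The HTML snippet, the film names, the example.com links, and the variable names are all made up for illustration and are not part of the class materials.

```r
library(rvest)

# Hypothetical inline HTML, so the sketch runs without network access
page <- read_html('
  <html><body>
    <h1 class="title">Sample films</h1>
    <ul>
      <li><a class="film" href="https://example.com/a">Film A</a></li>
      <li><a class="film" href="https://example.com/b">Film B</a></li>
    </ul>
    <table id="films">
      <tr><th>title</th><th>year</th></tr>
      <tr><td>Film A</td><td>1999</td></tr>
      <tr><td>Film B</td><td>2005</td></tr>
    </table>
  </body></html>
')

# CSS selector "h1.title": an <h1> element with class "title"
heading <- page %>%
  html_element("h1.title") %>%
  html_text()

# CSS selector "a.film": every <a> element with class "film"
links <- page %>% html_elements("a.film")
film_names <- html_text(links)          # visible link text
film_urls  <- html_attr(links, "href")  # value of the href attribute

# CSS selector "#films": the element with id "films", parsed as a table
films <- page %>%
  html_element("#films") %>%
  html_table() %>%
  as.data.frame()
```

For a live page, the same calls should apply after replacing the inline string with a URL passed to read_html(); the selectors would need to be adapted to that page's markup.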

Before class

Class materials

  • Web scraping
  • rvest
    • Load the library (library(rvest))
    • demo("tripadvisor") - scraping a Trip Advisor page
    • demo("united") - how to scrape a web page that requires a login
    • Scraping IMDb

What you need to do after class

Benjamin Soltoff
Lecturer in Information Science