class: center, middle, inverse, title-slide .title[ # Getting data from the web: API access 🎃 ] .author[ ### INFO 5940
Cornell University
]

---

class: inverse, middle

# Methods for obtaining data online

---

## Methods for obtaining data online

* Click and download
* Install and play
* API query
* Scraping

---

## Click and download

* `read.csv` or `readr::read_csv`
* `downloader` package or `curl`

---

## Application programming interface (API)

- Representational State Transfer (REST)
- Uniform Resource Locator (URL)
- HTTP methods
    - GET
    - POST

---

## Application programming interface (API)

<img src="../../../../../../../../img/wikipedia.png" width="80%" style="display: block; margin: auto;" />

---

## RESTful queries

1. Submit request to server via URL
1. Return result in a structured format
1. Parse results into a local format

---

## Install and play packages

* Packages with R functions written for existing APIs
* Useful because
    * Reproducible
    * Up-to-date (ideally)
    * Ease of access

---

class: inverse, middle

# Using APIs with existing R packages

---

## `manifestoR`

* Collects and organizes political party manifestos from around the world
* Over 1000 parties from 1945 until today in over 50 countries on five continents
* [`manifestoR`](https://github.com/ManifestoProject/manifestoR)

---

## API authentication

* Key/token
* Obtain key
* Store in `.Rprofile`

```r
# in .Rprofile
options(this_is_my_key = "XXXX")

# later, in the R script:
key <- getOption("this_is_my_key")
```

--

* `usethis::edit_r_profile()`
* Read the documentation: different packages have different storage methods

---

## Load library and set API key

```r
library(manifestoR)

# retrieve API key stored in .Rprofile
mp_setapikey(key = getOption("manifesto_key"))
```

---

## Retrieve the database

```r
(mpds <- mp_maindataset())
```

```
## Connecting to Manifesto Project DB API... 
## Connecting to Manifesto Project DB API... 
corpus version: 2022-1 ``` ``` ## # A tibble: 4,778 × 174 ## country countryname oecdmem…¹ eumem…² edate date party party…³ party…⁴ ## <dbl> <chr> <dbl> <dbl> <date> <dbl> <dbl> <chr> <chr> ## 1 11 Sweden 0 0 1944-09-17 194409 11220 Commun… "SKP" ## 2 11 Sweden 0 0 1944-09-17 194409 11320 Social… "SAP" ## 3 11 Sweden 0 0 1944-09-17 194409 11420 People… "FP" ## 4 11 Sweden 0 0 1944-09-17 194409 11620 Right … "" ## 5 11 Sweden 0 0 1944-09-17 194409 11810 Agrari… "" ## 6 11 Sweden 0 0 1948-09-19 194809 11220 Commun… "SKP" ## 7 11 Sweden 0 0 1948-09-19 194809 11320 Social… "SAP" ## 8 11 Sweden 0 0 1948-09-19 194809 11420 People… "FP" ## 9 11 Sweden 0 0 1948-09-19 194809 11620 Right … "" ## 10 11 Sweden 0 0 1948-09-19 194809 11810 Agrari… "" ## # … with 4,768 more rows, 165 more variables: parfam <dbl>, coderid <dbl>, ## # manual <dbl>, coderyear <dbl>, testresult <dbl>, testeditsim <dbl>, ## # pervote <dbl>, voteest <dbl>, presvote <dbl>, absseat <dbl>, ## # totseats <dbl>, progtype <dbl>, datasetorigin <dbl>, corpusversion <chr>, ## # total <dbl>, peruncod <dbl>, per101 <dbl>, per102 <dbl>, per103 <dbl>, ## # per104 <dbl>, per105 <dbl>, per106 <dbl>, per107 <dbl>, per108 <dbl>, ## # per109 <dbl>, per110 <dbl>, per201 <dbl>, per202 <dbl>, per203 <dbl>, … ``` --- <img src="index_files/figure-html/manifesto-dist-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Download manifestos <img src="index_files/figure-html/manifestor-corpus-wordcloud-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Census data with `tidycensus` * API to access data from US Census Bureau * Decennial census * American Community Survey * Returns tidy data frames with (optional) `sf` geometry * Search for variables with `load_variables()` --- ## Store API key ```r library(tidycensus) ``` ```r census_api_key("YOUR API KEY GOES HERE") ``` --- ## Obtain data ```r usa_inc <- get_acs( geography = "state", variables = c(medincome = "B19013_001"), year = 2020 ) usa_inc ``` 
``` ## # A tibble: 52 × 5 ## GEOID NAME variable estimate moe ## <chr> <chr> <chr> <dbl> <dbl> ## 1 01 Alabama medincome 52035 377 ## 2 02 Alaska medincome 77790 1134 ## 3 04 Arizona medincome 61529 286 ## 4 05 Arkansas medincome 49475 431 ## 5 06 California medincome 78672 270 ## 6 08 Colorado medincome 75231 379 ## 7 09 Connecticut medincome 79855 587 ## 8 10 Delaware medincome 69110 1112 ## 9 11 District of Columbia medincome 90842 1580 ## 10 12 Florida medincome 57703 269 ## # … with 42 more rows ``` --- ## Visualize data <img src="index_files/figure-html/income-usa-plot-1.png" width="70%" style="display: block; margin: auto;" /> --- # Twitter API * REST API * Streaming API -- * [`rtweet`](https://docs.ropensci.org/rtweet/) --- # Using `rtweet` ```r library(rtweet) ``` * Requires a Twitter account * Prompt to authorize application on first usage --- # Searching tweets ```r rt <- search_tweets( q = "#rstats", n = 3000, include_rts = FALSE ) rt ``` ``` ## # A tibble: 1,021 × 43 ## created_at id id_str full_…¹ trunc…² displ…³ entities ## <dttm> <dbl> <chr> <chr> <lgl> <dbl> <list> ## 1 2022-10-03 13:39:02 1.58e18 15769902220… "Códig… FALSE 165 <named list> ## 2 2022-10-03 09:38:00 1.58e18 15769295627… "I wan… FALSE 135 <named list> ## 3 2022-09-29 00:51:18 1.58e18 15753474632… "The c… FALSE 272 <named list> ## 4 2022-10-05 11:32:28 1.58e18 15776831442… "Me re… FALSE 51 <named list> ## 5 2022-10-05 11:27:54 1.58e18 15776819973… "Wie m… FALSE 95 <named list> ## 6 2022-10-05 11:27:29 1.58e18 15776818927… "Metap… FALSE 71 <named list> ## 7 2022-10-05 11:23:00 1.58e18 15776807620… "🎫 Do… FALSE 285 <named list> ## 8 2022-10-05 11:22:54 1.58e18 15776807400… "Here’… FALSE 210 <named list> ## 9 2022-10-05 11:22:39 1.58e18 15776806760… "Daily… FALSE 198 <named list> ## 10 2022-10-05 11:21:55 1.58e18 15776804886… "Image… FALSE 73 <named list> ## # … with 1,011 more rows, 36 more variables: metadata <list>, source <chr>, ## # in_reply_to_status_id <dbl>, 
in_reply_to_status_id_str <chr>, ## # in_reply_to_user_id <dbl>, in_reply_to_user_id_str <chr>, ## # in_reply_to_screen_name <chr>, geo <list>, coordinates <list>, ## # place <list>, contributors <lgl>, is_quote_status <lgl>, ## # retweet_count <int>, favorite_count <int>, favorited <lgl>, ## # retweeted <lgl>, possibly_sensitive <lgl>, lang <chr>, … ``` --- # Searching users ```r countvoncount <- get_timeline(user = "countvoncount", n = 4000) countvoncount ``` ``` ## # A tibble: 3,250 × 43 ## created_at id id_str full_…¹ trunc…² displ…³ entities ## <dttm> <dbl> <chr> <chr> <lgl> <dbl> <list> ## 1 2022-10-04 22:52:10 1.58e18 15774918103… Three … FALSE 42 <named list> ## 2 2022-10-04 16:52:10 1.58e18 15774012115… Three … FALSE 40 <named list> ## 3 2022-10-04 11:52:09 1.58e18 15773257125… Three … FALSE 41 <named list> ## 4 2022-10-03 20:52:08 1.58e18 15770992154… Three … FALSE 51 <named list> ## 5 2022-10-03 10:52:08 1.58e18 15769482177… Three … FALSE 42 <named list> ## 6 2022-10-02 22:52:07 1.58e18 15767670198… Three … FALSE 40 <named list> ## 7 2022-10-02 12:52:06 1.58e18 15766160220… Three … FALSE 40 <named list> ## 8 2022-10-01 15:52:04 1.58e18 15762989262… Three … FALSE 36 <named list> ## 9 2022-09-30 22:52:03 1.58e18 15760422298… Three … FALSE 38 <named list> ## 10 2022-09-30 15:52:03 1.58e18 15759365312… Three … FALSE 38 <named list> ## # … with 3,240 more rows, 36 more variables: source <chr>, ## # in_reply_to_status_id <lgl>, in_reply_to_status_id_str <lgl>, ## # in_reply_to_user_id <lgl>, in_reply_to_user_id_str <lgl>, ## # in_reply_to_screen_name <lgl>, geo <list>, coordinates <list>, ## # place <list>, contributors <lgl>, is_quote_status <lgl>, ## # retweet_count <int>, favorite_count <int>, favorited <lgl>, ## # retweeted <lgl>, lang <chr>, possibly_sensitive <list>, … ``` --- # Visualizing tweets ```r ts_plot(countvoncount, by = "1 week") ``` <img src="index_files/figure-html/rstats-freq-1.png" width="70%" style="display: block; margin: auto;" /> --- # 
Visualizing tweets ```r ts_plot(countvoncount, by = "1 month") ``` <img src="index_files/figure-html/rstats-freq-day-1.png" width="70%" style="display: block; margin: auto;" /> --- # Visualizing tweets .panelset.sideways[ .panel[.panel-name[Code] ```r ts_plot(countvoncount, by = "1 week") + theme(plot.title = element_text(face = "bold")) + labs( x = NULL, y = NULL, title = "Frequency of @countvoncount Twitter posts", subtitle = "Twitter status (tweet) counts aggregated using one week intervals", caption = "\nSource: Data collected from Twitter's REST API via rtweet" ) ``` ] .panel[.panel-name[Output] <img src="index_files/figure-html/rstats-freq-clean-1.png" width="80%" style="display: block; margin: auto;" /> ] ] --- # Exercise: Practice using `rtweet` <img src="https://media.giphy.com/media/xTka01sEV7sFaU3PUY/giphy.gif" width="80%" style="display: block; margin: auto;" /> --- class: inverse, middle # Writing an API function --- ## Writing an API function * No package for API * Write your own function! * [Open Movie Database](http://www.omdbapi.com/) --- class: center, middle <iframe width="840" height="473" src="https://www.youtube.com/embed/9LmAEVdPhl4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> --- ## Expected elements 1. Authentication Key/Token 1. Base URL 1. Search Parameters 1. Response Format --- ## Determine the shape of an API request [Use the documentation](https://www.omdbapi.com/#examples) -- ```http http://www.omdbapi.com/?apikey=[apikey]&t=Sharknado&y=2013 ``` ``` ## { ## "Title": "Sharknado", ## "Year": "2013", ## "Rated": "Not Rated", ## "Released": "11 Jul 2013", ## "Runtime": "86 min", ## "Genre": "Action, Adventure, Comedy", ## "Director": "Anthony C. 
Ferrante", ## "Writer": "Thunder Levin", ## "Actors": "Ian Ziering, Tara Reid, John Heard", ## "Plot": "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of sharks terrorize the waterlogged populace.", ## "Language": "English", ## "Country": "United States", ## "Awards": "1 win & 2 nominations", ## "Poster": "https://m.media-amazon.com/images/M/MV5BODcwZWFiNTEtNDgzMC00ZmE2LWExMzYtNzZhZDgzNDc5NDkyXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_SX300.jpg", ## "Ratings": [ ## { ## "Source": "Internet Movie Database", ## "Value": "3.3/10" ## }, ## { ## "Source": "Rotten Tomatoes", ## "Value": "74%" ## } ## ], ## "Metascore": "N/A", ## "imdbRating": "3.3", ## "imdbVotes": "50,032", ## "imdbID": "tt2724064", ## "Type": "movie", ## "DVD": "03 Sep 2013", ## "BoxOffice": "N/A", ## "Production": "N/A", ## "Website": "N/A", ## "Response": "True" ## } ## ``` --- ## `httr::GET()` ```r sharknado <- GET(url = "http://www.omdbapi.com/?", query = list(t = "Sharknado", y = 2013, apikey = getOption("omdb_key")) ) ``` --- ## JavaScript Object Notation (JSON) ``` ## { ## "Title": "Sharknado", ## "Year": "2013", ## "Rated": "Not Rated", ## "Released": "11 Jul 2013", ## "Runtime": "86 min", ## "Genre": "Action, Adventure, Comedy", ## "Director": "Anthony C. 
Ferrante", ## "Writer": "Thunder Levin", ## "Actors": "Ian Ziering, Tara Reid, John Heard", ## "Plot": "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of sharks terrorize the waterlogged populace.", ## "Language": "English", ## "Country": "United States", ## "Awards": "1 win & 2 nominations", ## "Poster": "https://m.media-amazon.com/images/M/MV5BODcwZWFiNTEtNDgzMC00ZmE2LWExMzYtNzZhZDgzNDc5NDkyXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_SX300.jpg", ## "Ratings": [ ## { ## "Source": "Internet Movie Database", ## "Value": "3.3/10" ## }, ## { ## "Source": "Rotten Tomatoes", ## "Value": "74%" ## } ## ], ## "Metascore": "N/A", ## "imdbRating": "3.3", ## "imdbVotes": "49,884", ## "imdbID": "tt2724064", ## "Type": "movie", ## "DVD": "03 Sep 2013", ## "BoxOffice": "N/A", ## "Production": "N/A", ## "Website": "N/A", ## "Response": "True" ## } ## ``` --- ## JSON ```r sharknado_df <- content(sharknado) %>% as_tibble() sharknado_df ``` ``` ## # A tibble: 2 × 25 ## Title Year Rated Relea…¹ Runtime Genre Direc…² Writer Actors Plot Langu…³ ## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 Sharkna… 2013 Not … 11 Jul… 86 min Acti… Anthon… Thund… Ian Z… When… English ## 2 Sharkna… 2013 Not … 11 Jul… 86 min Acti… Anthon… Thund… Ian Z… When… English ## # … with 14 more variables: Country <chr>, Awards <chr>, Poster <chr>, ## # Ratings <list>, Metascore <chr>, imdbRating <chr>, imdbVotes <chr>, ## # imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, Production <chr>, ## # Website <chr>, Response <chr>, and abbreviated variable names ¹Released, ## # ²Director, ³Language ``` --- ## Additional information from `GET()` ```r sharknado$url ``` ``` ## [1] "http://www.omdbapi.com/?t=Sharknado&y=2013&apikey=[apikey]" ``` -- ```r status_code(sharknado) ``` ``` ## [1] 200 ``` --- ## HTTP status code Code | Status -------|--------| 1xx | Informational 2xx | Success 3xx | Redirection 4xx | Client error (you did something 
wrong)
5xx | Server error (server did something wrong)

> [A more intuitive guide](https://www.flickr.com/photos/girliemac/sets/72157628409467125)

---

## Iteration through a set of movies

```r
omdb_api <- function(title, api_key){
  # send GET request
  response <- GET(url = "http://www.omdbapi.com/?",
                  query = list(t = title,
                               apikey = api_key)
  )

  # parse the JSON response into an R list, then convert it to a tibble
  response_df <- content(response) %>%
    as_tibble()

  # print a message to track progress
  message(glue::glue("Scraping {title}..."))

  return(response_df)
}
```

---

## Iteration through a set of movies

```r
sharknados <- c("Sharknado", "Sharknado 2", "Sharknado 3",
                "Sharknado 4", "Sharknado 5")
```

```r
# modify function to pause for one second between calls
omdb_api_slow <- purrr::slowly(f = omdb_api, rate = rate_delay(1))

# iterate over all the films
sharknados_df <- map_dfr(.x = sharknados, .f = omdb_api_slow,
                         api_key = getOption("omdb_key"))
```

```
## Scraping Sharknado...
```

```
## Scraping Sharknado 2...
```

```
## Scraping Sharknado 3...
```

```
## Scraping Sharknado 4...
```

```
## Scraping Sharknado 5... 
``` --- ## Iteration through a set of movies ```r sharknados_df ``` ``` ## # A tibble: 10 × 25 ## Title Year Rated Relea…¹ Runtime Genre Direc…² Writer Actors Plot Langu…³ ## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 Sharkn… 2013 Not … 11 Jul… 86 min Acti… Anthon… Thund… Ian Z… When… English ## 2 Sharkn… 2013 Not … 11 Jul… 86 min Acti… Anthon… Thund… Ian Z… When… English ## 3 Sharkn… 2014 TV-14 30 Jul… 95 min Acti… Anthon… Thund… Ian Z… Fin … English ## 4 Sharkn… 2014 TV-14 30 Jul… 95 min Acti… Anthon… Thund… Ian Z… Fin … English ## 5 Sharkn… 2015 TV-14 22 Jul… 93 min Acti… Anthon… Thund… Ian Z… A mo… English ## 6 Sharkn… 2015 TV-14 22 Jul… 93 min Acti… Anthon… Thund… Ian Z… A mo… English ## 7 Sharkn… 2016 TV-14 31 Jul… 95 min Acti… Anthon… Thund… Ian Z… Fin,… English ## 8 Sharkn… 2016 TV-14 31 Jul… 95 min Acti… Anthon… Thund… Ian Z… Fin,… English ## 9 Sharkn… 2017 TV-14 06 Aug… 93 min Acti… Anthon… Thund… Ian Z… With… English ## 10 Sharkn… 2017 TV-14 06 Aug… 93 min Acti… Anthon… Thund… Ian Z… With… English ## # … with 14 more variables: Country <chr>, Awards <chr>, Poster <chr>, ## # Ratings <list>, Metascore <chr>, imdbRating <chr>, imdbVotes <chr>, ## # imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, Production <chr>, ## # Website <chr>, Response <chr>, and abbreviated variable names ¹Released, ## # ²Director, ³Language ``` --- ## Messy API responses ```r content(sharknado) %>% as_tibble() ``` ``` ## # A tibble: 2 × 25 ## Title Year Rated Relea…¹ Runtime Genre Direc…² Writer Actors Plot Langu…³ ## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 Sharkna… 2013 Not … 11 Jul… 86 min Acti… Anthon… Thund… Ian Z… When… English ## 2 Sharkna… 2013 Not … 11 Jul… 86 min Acti… Anthon… Thund… Ian Z… When… English ## # … with 14 more variables: Country <chr>, Awards <chr>, Poster <chr>, ## # Ratings <list>, Metascore <chr>, imdbRating <chr>, imdbVotes <chr>, ## # imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, 
Production <chr>, ## # Website <chr>, Response <chr>, and abbreviated variable names ¹Released, ## # ²Director, ³Language ``` --- ## Whoops ``` ## List of 25 ## $ Title : chr "Sharknado" ## $ Year : chr "2013" ## $ Rated : chr "Not Rated" ## $ Released : chr "11 Jul 2013" ## $ Runtime : chr "86 min" ## $ Genre : chr "Action, Adventure, Comedy" ## $ Director : chr "Anthony C. Ferrante" ## $ Writer : chr "Thunder Levin" ## $ Actors : chr "Ian Ziering, Tara Reid, John Heard" ## $ Plot : chr "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of s"| __truncated__ ## $ Language : chr "English" ## $ Country : chr "United States" ## $ Awards : chr "1 win & 2 nominations" ## $ Poster : chr "https://m.media-amazon.com/images/M/MV5BODcwZWFiNTEtNDgzMC00ZmE2LWExMzYtNzZhZDgzNDc5NDkyXkEyXkFqcGdeQXVyMTQxNzM"| __truncated__ ## $ Ratings :List of 2 ## ..$ :List of 2 ## .. ..$ Source: chr "Internet Movie Database" ## .. ..$ Value : chr "3.3/10" ## ..$ :List of 2 ## .. ..$ Source: chr "Rotten Tomatoes" ## .. 
..$ Value : chr "74%" ## $ Metascore : chr "N/A" ## $ imdbRating: chr "3.3" ## $ imdbVotes : chr "49,884" ## $ imdbID : chr "tt2724064" ## $ Type : chr "movie" ## $ DVD : chr "03 Sep 2013" ## $ BoxOffice : chr "N/A" ## $ Production: chr "N/A" ## $ Website : chr "N/A" ## $ Response : chr "True" ``` --- class: inverse, middle # Rectangling messy data --- ## Rectangling and `tidyr` .task[ Art and craft of taking a deeply nested list and taming it into a tidy data set of rows and columns ] -- * `unnest_longer()` - each row contains multiple observations * `unnest_wider()` - each row contains a single observation * `unnest_auto()` - make an educated guess * `hoist()` - extract a specific element --- ## `unnest_wider()` and `hoist()` ```r str(gh_users, list.len = 3) ``` ``` ## List of 6 ## $ :List of 30 ## ..$ login : chr "gaborcsardi" ## ..$ id : int 660288 ## ..$ avatar_url : chr "https://avatars.githubusercontent.com/u/660288?v=3" ## .. [list output truncated] ## $ :List of 30 ## ..$ login : chr "jennybc" ## ..$ id : int 599454 ## ..$ avatar_url : chr "https://avatars.githubusercontent.com/u/599454?v=3" ## .. [list output truncated] ## $ :List of 30 ## ..$ login : chr "jtleek" ## ..$ id : int 1571674 ## ..$ avatar_url : chr "https://avatars.githubusercontent.com/u/1571674?v=3" ## .. 
[list output truncated] ## [list output truncated] ``` --- ## `unnest_wider()` and `hoist()` ```r (users <- tibble(user = gh_users)) ``` ``` ## # A tibble: 6 × 1 ## user ## <list> ## 1 <named list [30]> ## 2 <named list [30]> ## 3 <named list [30]> ## 4 <named list [30]> ## 5 <named list [30]> ## 6 <named list [30]> ``` -- ```r names(users$user[[1]]) ``` ``` ## [1] "login" "id" "avatar_url" ## [4] "gravatar_id" "url" "html_url" ## [7] "followers_url" "following_url" "gists_url" ## [10] "starred_url" "subscriptions_url" "organizations_url" ## [13] "repos_url" "events_url" "received_events_url" ## [16] "type" "site_admin" "name" ## [19] "company" "blog" "location" ## [22] "email" "hireable" "bio" ## [25] "public_repos" "public_gists" "followers" ## [28] "following" "created_at" "updated_at" ``` --- ## `unnest_wider()` ```r users %>% unnest_wider(col = user) ``` ``` ## # A tibble: 6 × 30 ## login id avata…¹ grava…² url html_…³ follo…⁴ follo…⁵ gists…⁶ starr…⁷ ## <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 gaborcsa… 6.60e5 https:… "" http… https:… https:… https:… https:… https:… ## 2 jennybc 5.99e5 https:… "" http… https:… https:… https:… https:… https:… ## 3 jtleek 1.57e6 https:… "" http… https:… https:… https:… https:… https:… ## 4 juliasil… 1.25e7 https:… "" http… https:… https:… https:… https:… https:… ## 5 leeper 3.51e6 https:… "" http… https:… https:… https:… https:… https:… ## 6 masalmon 8.36e6 https:… "" http… https:… https:… https:… https:… https:… ## # … with 20 more variables: subscriptions_url <chr>, organizations_url <chr>, ## # repos_url <chr>, events_url <chr>, received_events_url <chr>, type <chr>, ## # site_admin <lgl>, name <chr>, company <chr>, blog <chr>, location <chr>, ## # email <chr>, hireable <lgl>, bio <chr>, public_repos <int>, ## # public_gists <int>, followers <int>, following <int>, created_at <chr>, ## # updated_at <chr>, and abbreviated variable names ¹avatar_url, ²gravatar_id, ## # ³html_url, ⁴followers_url, 
⁵following_url, ⁶gists_url, ⁷starred_url ``` --- ## `hoist()` ```r users %>% hoist( .col = user, followers = "followers", login = "login", url = "html_url" ) ``` ``` ## # A tibble: 6 × 4 ## followers login url user ## <int> <chr> <chr> <list> ## 1 303 gaborcsardi https://github.com/gaborcsardi <named list [27]> ## 2 780 jennybc https://github.com/jennybc <named list [27]> ## 3 3958 jtleek https://github.com/jtleek <named list [27]> ## 4 115 juliasilge https://github.com/juliasilge <named list [27]> ## 5 213 leeper https://github.com/leeper <named list [27]> ## 6 34 masalmon https://github.com/masalmon <named list [27]> ``` --- ## `gh_repos` and nested list structures ```r (repos <- tibble(repo = gh_repos)) ``` ``` ## # A tibble: 6 × 1 ## repo ## <list> ## 1 <list [30]> ## 2 <list [30]> ## 3 <list [30]> ## 4 <list [26]> ## 5 <list [30]> ## 6 <list [30]> ``` --- ## `unnest_longer()` ```r repos <- repos %>% unnest_longer(col = repo) repos ``` ``` ## # A tibble: 176 × 1 ## repo ## <list> ## 1 <named list [68]> ## 2 <named list [68]> ## 3 <named list [68]> ## 4 <named list [68]> ## 5 <named list [68]> ## 6 <named list [68]> ## 7 <named list [68]> ## 8 <named list [68]> ## 9 <named list [68]> ## 10 <named list [68]> ## # … with 166 more rows ``` --- ## `unnest_longer()` ```r repos %>% hoist( .col = repo, login = c("owner", "login"), name = "name", homepage = "homepage", watchers = "watchers_count" ) ``` ``` ## # A tibble: 176 × 5 ## login name homepage watchers repo ## <chr> <chr> <chr> <int> <list> ## 1 gaborcsardi after <NA> 5 <named list [65]> ## 2 gaborcsardi argufy <NA> 19 <named list [65]> ## 3 gaborcsardi ask <NA> 5 <named list [65]> ## 4 gaborcsardi baseimports <NA> 0 <named list [65]> ## 5 gaborcsardi citest <NA> 0 <named list [65]> ## 6 gaborcsardi clisymbols "" 18 <named list [65]> ## 7 gaborcsardi cmaker <NA> 0 <named list [65]> ## 8 gaborcsardi cmark <NA> 0 <named list [65]> ## 9 gaborcsardi conditions <NA> 0 <named list [65]> ## 10 gaborcsardi crayon <NA> 
52 <named list [65]> ## # … with 166 more rows ``` --- count: false .panel1-gh-repos-auto-auto[ ```r *tibble(repo = gh_repos) ``` ] .panel2-gh-repos-auto-auto[ ``` ## # A tibble: 6 × 1 ## repo ## <list> ## 1 <list [30]> ## 2 <list [30]> ## 3 <list [30]> ## 4 <list [26]> ## 5 <list [30]> ## 6 <list [30]> ``` ] --- count: false .panel1-gh-repos-auto-auto[ ```r tibble(repo = gh_repos) %>% * unnest_auto(col = repo) ``` ] .panel2-gh-repos-auto-auto[ ``` ## # A tibble: 176 × 1 ## repo ## <list> ## 1 <named list [68]> ## 2 <named list [68]> ## 3 <named list [68]> ## 4 <named list [68]> ## 5 <named list [68]> ## 6 <named list [68]> ## 7 <named list [68]> ## 8 <named list [68]> ## 9 <named list [68]> ## 10 <named list [68]> ## # … with 166 more rows ``` ] --- count: false .panel1-gh-repos-auto-auto[ ```r tibble(repo = gh_repos) %>% unnest_auto(col = repo) %>% * unnest_auto(col = repo) ``` ] .panel2-gh-repos-auto-auto[ ``` ## # A tibble: 176 × 68 ## id name full_name owner private html_url description fork url ## <int> <chr> <chr> <list> <lgl> <chr> <chr> <lgl> <chr> ## 1 6.12e7 after gaborcsa… <named list> FALSE https:/… Run Code i… FALSE http… ## 2 4.05e7 argu… gaborcsa… <named list> FALSE https:/… Declarativ… FALSE http… ## 3 3.64e7 ask gaborcsa… <named list> FALSE https:/… Friendly C… FALSE http… ## 4 3.49e7 base… gaborcsa… <named list> FALSE https:/… Do we get … FALSE http… ## 5 6.16e7 cite… gaborcsa… <named list> FALSE https:/… Test R pac… TRUE http… ## 6 3.39e7 clis… gaborcsa… <named list> FALSE https:/… Unicode sy… FALSE http… ## 7 3.72e7 cmak… gaborcsa… <named list> FALSE https:/… port of cm… TRUE http… ## 8 6.80e7 cmark gaborcsa… <named list> FALSE https:/… CommonMark… TRUE http… ## 9 6.32e7 cond… gaborcsa… <named list> FALSE https:/… <NA> TRUE http… ## 10 2.43e7 cray… gaborcsa… <named list> FALSE https:/… R package … FALSE http… ## # … with 166 more rows, and 59 more variables: forks_url <chr>, keys_url <chr>, ## # collaborators_url <chr>, teams_url <chr>, 
hooks_url <chr>, ## # issue_events_url <chr>, events_url <chr>, assignees_url <chr>, ## # branches_url <chr>, tags_url <chr>, blobs_url <chr>, git_tags_url <chr>, ## # git_refs_url <chr>, trees_url <chr>, statuses_url <chr>, ## # languages_url <chr>, stargazers_url <chr>, contributors_url <chr>, ## # subscribers_url <chr>, subscription_url <chr>, commits_url <chr>, … ``` ] <style> .panel1-gh-repos-auto-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-gh-repos-auto-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-gh-repos-auto-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ## ASOIAF characters ```r chars <- tibble(char = got_chars) chars ``` ``` ## # A tibble: 30 × 1 ## char ## <list> ## 1 <named list [18]> ## 2 <named list [18]> ## 3 <named list [18]> ## 4 <named list [18]> ## 5 <named list [18]> ## 6 <named list [18]> ## 7 <named list [18]> ## 8 <named list [18]> ## 9 <named list [18]> ## 10 <named list [18]> ## # … with 20 more rows ``` --- ## ASOIAF characters ```r chars2 <- chars %>% unnest_wider(col = char) chars2 ``` ``` ## # A tibble: 30 × 18 ## url id name gender culture born died alive titles aliases father ## <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr> ## 1 https://w… 1022 Theo… Male "Ironb… "In … "" TRUE <chr> <chr> "" ## 2 https://w… 1052 Tyri… Male "" "In … "" TRUE <chr> <chr> "" ## 3 https://w… 1074 Vict… Male "Ironb… "In … "" TRUE <chr> <chr> "" ## 4 https://w… 1109 Will Male "" "" "In … FALSE <chr> <chr> "" ## 5 https://w… 1166 Areo… Male "Norvo… "In … "" TRUE <chr> <chr> "" ## 6 https://w… 1267 Chett Male "" "At … "In … FALSE <chr> <chr> "" ## 7 https://w… 1295 Cres… Male "" "In … "In … FALSE <chr> <chr> "" ## 8 https://w… 130 Aria… Female "Dorni… "In … "" TRUE <chr> <chr> "" ## 9 https://w… 1303 Daen… Female "Valyr… "In … "" TRUE <chr> 
<chr> "" ## 10 https://w… 1319 Davo… Male "Weste… "In … "" TRUE <chr> <chr> "" ## # … with 20 more rows, and 7 more variables: mother <chr>, spouse <chr>, ## # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>, ## # playedBy <list> ``` --- ## Nested list objects ```r chars2 %>% select(where(is.list)) ``` ``` ## # A tibble: 30 × 7 ## titles aliases allegiances books povBooks tvSeries playedBy ## <list> <list> <list> <list> <list> <list> <list> ## 1 <chr [3]> <chr [4]> <chr [1]> <chr [3]> <chr [2]> <chr [6]> <chr [1]> ## 2 <chr [2]> <chr [11]> <chr [1]> <chr [2]> <chr [4]> <chr [6]> <chr [1]> ## 3 <chr [2]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [1]> <chr [1]> ## 4 <chr [1]> <chr [1]> <NULL> <chr [1]> <chr [1]> <chr [1]> <chr [1]> ## 5 <chr [1]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [2]> <chr [1]> ## 6 <chr [1]> <chr [1]> <NULL> <chr [2]> <chr [1]> <chr [1]> <chr [1]> ## 7 <chr [1]> <chr [1]> <NULL> <chr [2]> <chr [1]> <chr [1]> <chr [1]> ## 8 <chr [1]> <chr [1]> <chr [1]> <chr [4]> <chr [1]> <chr [1]> <chr [1]> ## 9 <chr [5]> <chr [11]> <chr [1]> <chr [1]> <chr [4]> <chr [6]> <chr [1]> ## 10 <chr [4]> <chr [5]> <chr [2]> <chr [1]> <chr [3]> <chr [5]> <chr [1]> ## # … with 20 more rows ``` --- ## Choose your own adventure <img src="https://images-na.ssl-images-amazon.com/images/I/81E88dflPeL.jpg" width="30%" style="display: block; margin: auto;" /> --- class: inverse, middle # Every appearance per book/season --- count: false ## Every appearance per book/season .panel1-got-appearances-auto[ ```r *select( * .data = chars2, * name, books, tvSeries *) ``` ] .panel2-got-appearances-auto[ ``` ## # A tibble: 30 × 3 ## name books tvSeries ## <chr> <list> <list> ## 1 Theon Greyjoy <chr [3]> <chr [6]> ## 2 Tyrion Lannister <chr [2]> <chr [6]> ## 3 Victarion Greyjoy <chr [3]> <chr [1]> ## 4 Will <chr [1]> <chr [1]> ## 5 Areo Hotah <chr [3]> <chr [2]> ## 6 Chett <chr [2]> <chr [1]> ## 7 Cressen <chr [2]> <chr [1]> ## 8 Arianne Martell <chr [4]> 
<chr [1]> ## 9 Daenerys Targaryen <chr [1]> <chr [6]> ## 10 Davos Seaworth <chr [1]> <chr [5]> ## # … with 20 more rows ``` ] --- count: false ## Every appearance per book/season .panel1-got-appearances-auto[ ```r select( .data = chars2, name, books, tvSeries ) %>% * pivot_longer( * cols = c(books, tvSeries), * names_to = "media", * values_to = "value" * ) ``` ] .panel2-got-appearances-auto[ ``` ## # A tibble: 60 × 3 ## name media value ## <chr> <chr> <list> ## 1 Theon Greyjoy books <chr [3]> ## 2 Theon Greyjoy tvSeries <chr [6]> ## 3 Tyrion Lannister books <chr [2]> ## 4 Tyrion Lannister tvSeries <chr [6]> ## 5 Victarion Greyjoy books <chr [3]> ## 6 Victarion Greyjoy tvSeries <chr [1]> ## 7 Will books <chr [1]> ## 8 Will tvSeries <chr [1]> ## 9 Areo Hotah books <chr [3]> ## 10 Areo Hotah tvSeries <chr [2]> ## # … with 50 more rows ``` ] --- count: false ## Every appearance per book/season .panel1-got-appearances-auto[ ```r select( .data = chars2, name, books, tvSeries ) %>% pivot_longer( cols = c(books, tvSeries), names_to = "media", values_to = "value" ) %>% * unnest_longer(col = value) ``` ] .panel2-got-appearances-auto[ ``` ## # A tibble: 180 × 3 ## name media value ## <chr> <chr> <chr> ## 1 Theon Greyjoy books A Game of Thrones ## 2 Theon Greyjoy books A Storm of Swords ## 3 Theon Greyjoy books A Feast for Crows ## 4 Theon Greyjoy tvSeries Season 1 ## 5 Theon Greyjoy tvSeries Season 2 ## 6 Theon Greyjoy tvSeries Season 3 ## 7 Theon Greyjoy tvSeries Season 4 ## 8 Theon Greyjoy tvSeries Season 5 ## 9 Theon Greyjoy tvSeries Season 6 ## 10 Tyrion Lannister books A Feast for Crows ## # … with 170 more rows ``` ] <style> .panel1-got-appearances-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-got-appearances-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-got-appearances-auto { color: black; width: NA%; hight: 33%; float: left; 
padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle # Match character's title to their name --- count: false ## Match character's title to their name .panel1-got-title-name-auto[ ```r *select( * .data = chars2, * name, * title = titles *) ``` ] .panel2-got-title-name-auto[ ``` ## # A tibble: 30 × 2 ## name title ## <chr> <list> ## 1 Theon Greyjoy <chr [3]> ## 2 Tyrion Lannister <chr [2]> ## 3 Victarion Greyjoy <chr [2]> ## 4 Will <chr [1]> ## 5 Areo Hotah <chr [1]> ## 6 Chett <chr [1]> ## 7 Cressen <chr [1]> ## 8 Arianne Martell <chr [1]> ## 9 Daenerys Targaryen <chr [5]> ## 10 Davos Seaworth <chr [4]> ## # … with 20 more rows ``` ] --- count: false ## Match character's title to their name .panel1-got-title-name-auto[ ```r select( .data = chars2, name, title = titles ) %>% * unnest_longer(col = title) ``` ] .panel2-got-title-name-auto[ ``` ## # A tibble: 60 × 2 ## name title ## <chr> <chr> ## 1 Theon Greyjoy "Prince of Winterfell" ## 2 Theon Greyjoy "Captain of Sea Bitch" ## 3 Theon Greyjoy "Lord of the Iron Islands (by law of the green lands)" ## 4 Tyrion Lannister "Acting Hand of the King (former)" ## 5 Tyrion Lannister "Master of Coin (former)" ## 6 Victarion Greyjoy "Lord Captain of the Iron Fleet" ## 7 Victarion Greyjoy "Master of the Iron Victory" ## 8 Will "" ## 9 Areo Hotah "Captain of the Guard at Sunspear" ## 10 Chett "" ## # … with 50 more rows ``` ] <style> .panel1-got-title-name-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-got-title-name-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-got-title-name-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # May the force be with you <img src="https://media.giphy.com/media/C0ZArORmrDQCRTIFnQ/giphy.gif" width="80%" style="display: block; margin: auto;" />
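
---

## Rectangling the OMDb response

The duplicated rows in the "Messy API responses" output come from the nested `Ratings` element: `as_tibble()` recycles every other field once per rating. What follows is a minimal sketch of one way to apply the `tidyr` rectangling verbs to that response instead. It makes no API call: `sharknado_list` is a hypothetical, hand-written stand-in for `content(sharknado)`, trimmed to a few fields copied from the JSON shown earlier, and the names `movie` and `ratings` are illustrative only.

```r
library(tibble)
library(tidyr)

# hand-written stand-in for content(sharknado); an assumption for
# illustration, not fetched from the OMDb API
sharknado_list <- list(
  Title = "Sharknado",
  Year = "2013",
  Ratings = list(
    list(Source = "Internet Movie Database", Value = "3.3/10"),
    list(Source = "Rotten Tomatoes", Value = "74%")
  ),
  imdbID = "tt2724064"
)

# wrap the whole response in a list-column so the film stays a single row,
# then spread its top-level fields into columns
movie <- tibble(movie = list(sharknado_list)) %>%
  unnest_wider(col = movie)

# one row per rating source, with Source and Value as regular columns
ratings <- movie %>%
  unnest_longer(col = Ratings) %>%
  unnest_wider(col = Ratings)

ratings
```

Here `movie` keeps one row per film with `Ratings` as a list-column, while `ratings` has one row per rating source. Which shape is preferable depends on whether the unit of analysis is the film or the rating.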