class: center, middle, inverse, title-slide

.title[
# Pipes and functions in R
]
.author[
### INFO 5940
Cornell University
]

---

class: inverse, middle

# Using the pipe operator

---

> Using the [`penguins`](https://github.com/allisonhorst/palmerpenguins) dataset, calculate the average body mass for Adelie penguins on different islands.

1. Filter `penguins` to only keep observations where the species is "Adelie"
1. Group the filtered `penguins` data frame by island
1. Summarize the grouped and filtered `penguins` data frame by calculating the average body mass

---

## Intermediate steps

```r
penguins_1 <- filter(penguins, species == "Adelie")
penguins_2 <- group_by(penguins_1, island)
(penguins_3 <- summarize(penguins_2, body_mass = mean(body_mass_g, na.rm = TRUE)))
```

```
## # A tibble: 3 × 2
##   island    body_mass
##   <fct>         <dbl>
## 1 Biscoe        3710.
## 2 Dream         3688.
## 3 Torgersen     3706.
```

---

## Overwrite the original

```r
penguins <- filter(penguins, species == "Adelie")
penguins <- group_by(penguins, island)
(penguins <- summarize(penguins, body_mass = mean(body_mass_g, na.rm = TRUE)))
```

```
## # A tibble: 3 × 2
##   island    body_mass
##   <fct>         <dbl>
## 1 Biscoe        3710.
## 2 Dream         3688.
## 3 Torgersen     3706.
```

---

## Function composition

```r
summarize(
  group_by(
    filter(
      penguins,
      species == "Adelie"
    ),
    island
  ),
  body_mass = mean(body_mass_g, na.rm = TRUE)
)
```

```
## # A tibble: 3 × 2
##   island    body_mass
##   <fct>         <dbl>
## 1 Biscoe        3710.
## 2 Dream         3688.
## 3 Torgersen     3706.
```

---

## Function composition

```r
summarize(group_by(filter(penguins, species == "Adelie"), island), body_mass = mean(body_mass_g, na.rm = TRUE))
```

```
## # A tibble: 3 × 2
##   island    body_mass
##   <fct>         <dbl>
## 1 Biscoe        3710.
## 2 Dream         3688.
## 3 Torgersen     3706.
```

---

## Piping

```r
penguins %>%
  filter(species == "Adelie") %>%
  group_by(island) %>%
  summarize(body_mass = mean(body_mass_g, na.rm = TRUE))
```

```
## # A tibble: 3 × 2
##   island    body_mass
##   <fct>         <dbl>
## 1 Biscoe        3710.
## 2 Dream         3688.
## 3 Torgersen     3706.
```

---

<img src="../../../../../../../../img/Pipe_baking_magrittr_backAssign.gif" width="50%" style="display: block; margin: auto;" />

.footnote[Source: [Arthur Welle](https://github.com/arthurwelle/VIS/blob/master/Pipe_Cake/Pipe_baking_magrittr_backAssign.gif)]

---

class: inverse, middle

# Functions

---

## Functions

* Easy to reuse
* Self-documenting
* Easier to debug

--

.task[
If you have copied and pasted a block of code more than twice, convert it to a function
]

---

## Function components

* Name
* Arguments
* Body

---

## Rescale function

```r
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(0, 5, 10))
## [1] 0.0 0.5 1.0
rescale01(c(-10, 0, 10))
## [1] 0.0 0.5 1.0
rescale01(c(1, 2, 3, NA, 5))
## [1] 0.00 0.25 0.50   NA 1.00
```

* Name
* Arguments
* Body
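---

## Applying `rescale01()` to several columns

Here is a minimal sketch of the copy-and-paste rule in practice. It uses a small toy data frame `df`, which is hypothetical and made up for illustration. Without a function, rescaling each column would mean pasting the `range()` arithmetic three times. With `rescale01()` the logic lives in one place, and each column needs a single call.

```r
# Hypothetical toy data, for illustration only
df <- data.frame(
  a = c(1, 2, 3),
  b = c(10, 20, 40),
  c = c(-5, 0, 5)
)

# Same definition as on the Rescale function slide
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# One call per column instead of three pasted copies of the arithmetic
df$a <- rescale01(df$a)
df$b <- rescale01(df$b)
df$c <- rescale01(df$c)

df
```

If the rescaling logic ever needs to change, for example how missing values are handled, only the body of `rescale01()` has to be edited.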
---

## What is that?

```r
pythagorean <- function(a, b) {
  hypotenuse <- sqrt(a^2 + b^2)
  return(hypotenuse)
}
```

* Name
* Arguments
* Body
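---

## Implicit return

This short sketch is not part of the original slides. An R function returns the value of the last expression it evaluates, so the explicit `return()` in `pythagorean()` is optional. `pythagorean2()` is a hypothetical name used only for this illustration. Because `sqrt()` and `^` are vectorized, both versions also accept vectors of side lengths.

```r
pythagorean <- function(a, b) {
  hypotenuse <- sqrt(a^2 + b^2)
  return(hypotenuse)
}

# Hypothetical variant: the last evaluated expression is returned automatically
pythagorean2 <- function(a, b) {
  sqrt(a^2 + b^2)
}

pythagorean(a = 3, b = 4)
pythagorean2(a = 3, b = 4)

# Vectorized: one hypotenuse per (a, b) pair
pythagorean2(a = c(3, 5), b = c(4, 12))
```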
---

## How to use a function

.small[
```r
# print the output of the function
pythagorean(a = 3, b = 4)
```

```
## [1] 5
```

```r
# save the output as a new object
(tri_c <- pythagorean(a = 3, b = 4))
```

```
## [1] 5
```

```r
# what happens to the hypotenuse from inside the function?
pythagorean(a = 3, b = 4)
```

```
## [1] 5
```

```r
hypotenuse
```

```
## Error in eval(expr, envir, enclos): object 'hypotenuse' not found
```
]

---

## Write a function

<img src="https://media.giphy.com/media/nVLg9q1hL6yre/giphy.gif" width="60%" style="display: block; margin: auto;" />
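---

## Where does `hypotenuse` live?

This minimal sketch shows why `hypotenuse` was "not found" on the *How to use a function* slide. A variable created inside a function body lives in that call's own environment and disappears when the call returns. To make this visible, the example adds a global object with the same name (hypothetical, for this illustration only). Calling the function leaves that global object untouched.

```r
pythagorean <- function(a, b) {
  hypotenuse <- sqrt(a^2 + b^2)  # local to this call
  return(hypotenuse)
}

# Hypothetical global object that happens to share the name
hypotenuse <- "global value"

tri_c <- pythagorean(a = 3, b = 4)
tri_c       # the returned value, 5
hypotenuse  # still "global value": the assignment inside the function stayed local
```

The only way a result leaves the function is through its return value, which is why the slide saves it as `tri_c`.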
---

class: inverse, middle

# Conditional execution

---

## Conditional execution

```r
if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
```

---

## Conditional execution

```r
if (this) {
  # do that
} else if (that) {
  # do something else
} else {
  # do something completely different
}
```

---

## Conditional execution and `cut()`

```r
penguins %>%
  select(body_mass_g) %>%
  mutate(
    body_mass_g_autobin = cut(body_mass_g, breaks = 5),
    body_mass_g_manbin = cut(body_mass_g,
      breaks = c(2700, 3600, 4500, 5400, 6300),
      labels = c("Small", "Medium", "Large", "Huge")
    )
  )
```

```
## # A tibble: 344 × 3
##    body_mass_g body_mass_g_autobin body_mass_g_manbin
##          <int> <fct>               <fct>
##  1        3750 (3.42e+03,4.14e+03] Medium
##  2        3800 (3.42e+03,4.14e+03] Medium
##  3        3250 (2.7e+03,3.42e+03]  Small
##  4          NA <NA>                <NA>
##  5        3450 (3.42e+03,4.14e+03] Small
##  6        3650 (3.42e+03,4.14e+03] Medium
##  7        3625 (3.42e+03,4.14e+03] Medium
##  8        4675 (4.14e+03,4.86e+03] Large
##  9        3475 (3.42e+03,4.14e+03] Small
## 10        4250 (4.14e+03,4.86e+03] Medium
## # … with 334 more rows
```

---

## `if()` versus `if_else()`

```r
library(rcis)
data("gun_deaths")
(educ <- select(gun_deaths, education))
```

```
## # A tibble: 100,798 × 1
##    education
##    <fct>
##  1 BA+
##  2 Some college
##  3 BA+
##  4 BA+
##  5 HS/GED
##  6 Less than HS
##  7 HS/GED
##  8 HS/GED
##  9 Some college
## 10 <NA>
## # … with 100,788 more rows
```

---

## `if()` versus `if_else()`

```r
educ_if <- educ %>%
  mutate(hsPlus = if (education == "Less than HS") {
    "Less than HS"
  } else {
    "HS+"
  })
```

```
## Error in `mutate()`:
## ! Problem while computing `hsPlus = if (...) NULL`.
## Caused by error in `if (education == "Less than HS") ...`:
## ! the condition has length > 1
```

---

## `if()` versus `if_else()`

```r
(educ_ifelse <- educ %>%
  mutate(hsPlus = if_else(
    condition = education == "Less than HS",
    true = "Less than HS",
    false = "HS+"
  )))
## # A tibble: 100,798 × 2
##    education    hsPlus
##    <fct>        <chr>
##  1 BA+          HS+
##  2 Some college HS+
##  3 BA+          HS+
##  4 BA+          HS+
##  5 HS/GED       HS+
##  6 Less than HS Less than HS
##  7 HS/GED       HS+
##  8 HS/GED       HS+
##  9 Some college HS+
## 10 <NA>         <NA>
## # … with 100,788 more rows
```

---

## `if()` versus `if_else()`

```r
count(educ_ifelse, hsPlus)
## # A tibble: 3 × 2
##   hsPlus           n
##   <chr>        <int>
## 1 HS+          77553
## 2 Less than HS 21823
## 3 <NA>          1422
```

---

## Playing children's games

<img src="https://media.giphy.com/media/l2JJJJUUQBvUspZV6/giphy.gif" width="80%" style="display: block; margin: auto;" />
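---

## `if` inside a function

This sketch combines the `if` / `else if` / `else` chain with the manual `cut()` bins from the `cut()` slide. `size_label()` is a hypothetical helper and is not part of any package. Because `if` accepts only a single `TRUE` or `FALSE`, the function handles one value at a time, and `vapply()` applies it element by element. The same length-one restriction is what caused the `if()` error inside `mutate()`.

```r
# Hypothetical helper: label one body mass (in grams) using the same
# breaks and labels as the manual cut() example
size_label <- function(mass) {
  if (is.na(mass) || mass <= 2700 || mass > 6300) {
    NA_character_  # cut() also returns NA outside (2700, 6300]
  } else if (mass <= 3600) {
    "Small"
  } else if (mass <= 4500) {
    "Medium"
  } else if (mass <= 5400) {
    "Large"
  } else {
    "Huge"
  }
}

masses <- c(3750, 3250, 4675, NA)

# Apply the scalar function to each element, expecting one string back each time
by_if <- vapply(masses, size_label, character(1))

# The vectorized cut() call should give the same labels
by_cut <- as.character(cut(masses,
  breaks = c(2700, 3600, 4500, 5400, 6300),
  labels = c("Small", "Medium", "Large", "Huge")
))

identical(by_if, by_cut)
```

For whole data frame columns, vectorized tools such as `cut()` or `if_else()` are usually the better fit. The scalar version is mainly useful for seeing how the `else if` chain is evaluated.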