Vectors and iteration

---

# Atomic vectors

---

## Logical vectors

```r
parse_logical(c("TRUE", "TRUE", "FALSE", "TRUE", "NA"))
## [1]  TRUE  TRUE FALSE  TRUE    NA
```

## Numeric vectors

```r
parse_integer(c("1", "5", "3", "4", "12423"))
## [1]     1     5     3     4 12423
parse_double(c("4.2", "4", "6", "53.2"))
## [1]  4.2  4.0  6.0 53.2
```

## Character vectors

```r
parse_character(c("Goodnight Moon", "Runaway Bunny", "Big Red Barn"))
## [1] "Goodnight Moon" "Runaway Bunny"  "Big Red Barn"
```

---

## Scalars

```r
(x <- sample(10))
```

```
##  [1] 10  6  5  4  1  8  2  7  9  3
```

```r
x + c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100)
```

```
##  [1] 110 106 105 104 101 108 102 107 109 103
```

```r
x + 100
```

```
##  [1] 110 106 105 104 101 108 102 107 109 103
```
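
R has no separate scalar type: a single number is an atomic vector of length one, which is why `x + 100` matches the longer form. A minimal base R check (the name `v` is made up for this sketch):

```r
# A "scalar" is just a vector of length one
length(100)
is.vector(100)

# Adding it to a longer vector repeats it across every element
v <- c(10, 6, 5)
identical(v + 100, v + c(100, 100, 100))
```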

---

## Vector recycling

```r
# create a sequence of numbers between 1 and 2
(x1 <- seq(from = 1, to = 2))
```

```
## [1] 1 2
```

```r
(x2 <- seq(from = 1, to = 10))
```

```
##  [1]  1  2  3  4  5  6  7  8  9 10
```

```r
# add together two sequences of numbers
x1 + x2
```

```
##  [1]  2  4  4  6  6  8  8 10 10 12
```
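
The result above comes from R silently repeating the shorter vector. A small sketch of the same addition with the repetition written out, plus what happens when the lengths do not divide evenly (`short_vec` and `long_vec` are illustrative names):

```r
short_vec <- 1:2
long_vec <- 1:10

# Implicit recycling: short_vec is repeated five times to match long_vec
implicit <- short_vec + long_vec

# The same computation with the repetition made explicit
explicit <- rep(short_vec, times = 5) + long_vec
identical(implicit, explicit)

# If the longer length is not a multiple of the shorter one, R still
# recycles but signals a warning; here the warning message is captured
tryCatch(1:3 + 1:2, warning = function(w) conditionMessage(w))
```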

---

## Subsetting vectors

```r
x <- c("one", "two", "three", "four", "five")
```

* With positive integers

```r
x[c(3, 2, 5)]
## [1] "three" "two"   "five"
```

* With negative integers

```r
x[c(-1, -3, -5)]
## [1] "two"  "four"
```

* Don't mix positive and negative

```r
x[c(-1, 1)]
## Error in x[c(-1, 1)]: only 0's may be mixed with negative subscripts
```
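
Two more subsetting patterns that may be useful alongside the ones above: subsetting by name and repeating a position. This is a sketch with a made-up named vector, `books`:

```r
# A named character vector (the names are invented for this example)
books <- c(first = "one", second = "two", third = "three")

# Subset by name: elements come back in the order requested
books[c("third", "first")]

# Repeating a position repeats the element
books[c(1, 1, 2)]
```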

---

## Subset with a logical vector

```r
(x <- c(10, 3, NA, 5, 8, 1, NA))
```

```
## [1] 10  3 NA  5  8  1 NA
```

```r
# All non-missing values of x
!is.na(x)
```

```
## [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
```

```r
x[!is.na(x)]
```

```
## [1] 10  3  5  8  1
```

```r
# All even (or missing!) values of x
x[x %% 2 == 0]
```

```
## [1] 10 NA  8 NA
```
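
The `NA`s appear because a missing value in a logical index produces a missing value in the result. Two ways to keep only the even values, sketched with a copy of the vector named `v`:

```r
v <- c(10, 3, NA, 5, 8, 1, NA)

# which() returns only the positions that are TRUE, so the NAs drop out
v[which(v %% 2 == 0)]

# Equivalent: require the value to be non-missing and even
v[!is.na(v) & v %% 2 == 0]
```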

---

# Lists

---

## Lists

```r
x <- list(1, 2, 3)
x
```

```
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
```

---

## Lists: `str()`

```r
str(x)
```

```
## List of 3
##  $ : num 1
##  $ : num 2
##  $ : num 3
```

```r
x_named <- list(a = 1, b = 2, c = 3)
str(x_named)
```

```
## List of 3
##  $ a: num 1
##  $ b: num 2
##  $ c: num 3
```

---

## Store a mix of objects

```r
y <- list("a", 1L, 1.5, TRUE)
str(y)
```

```
## List of 4
##  $ : chr "a"
##  $ : int 1
##  $ : num 1.5
##  $ : logi TRUE
```
   
---

## Nested lists

```r
z <- list(list(1, 2), list(3, 4))
str(z)
```

```
## List of 2
##  $ :List of 2
##   ..$ : num 1
##   ..$ : num 2
##  $ :List of 2
##   ..$ : num 3
##   ..$ : num 4
```
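
To pull pieces out of a nested list, `[` and `[[` behave differently. A short sketch using the same `z`:

```r
z <- list(list(1, 2), list(3, 4))

# `[` keeps the list wrapper: the result is a list of length one
str(z[1])

# `[[` extracts the element itself: here, the inner list of length two
str(z[[1]])

# Chain `[[` to reach a value inside the nested list
z[[2]][[1]]
```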

---

## Secret lists

```r
str(gun_deaths)
```

```
## spec_tbl_df [100,798 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id       : num [1:100798] 1 2 3 4 5 6 7 8 9 10 ...
##  $ year     : num [1:100798] 2012 2012 2012 2012 2012 ...
##  $ month    : chr [1:100798] "Jan" "Jan" "Jan" "Feb" ...
##  $ intent   : chr [1:100798] "Suicide" "Suicide" "Suicide" "Suicide" ...
##  $ police   : num [1:100798] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sex      : chr [1:100798] "M" "F" "M" "M" ...
##  $ age      : num [1:100798] 34 21 60 64 31 17 48 41 50 NA ...
##  $ race     : chr [1:100798] "Asian/Pacific Islander" "White" "White" "White" ...
##  $ place    : chr [1:100798] "Home" "Street" "Other specified" "Home" ...
##  $ education: Factor w/ 4 levels "Less than HS",..: 4 3 4 4 2 1 2 2 3 NA ...
```
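
A data frame is a list of equal-length columns. The check below uses the built-in `mtcars` as a stand-in, on the assumption that `gun_deaths` may not be installed:

```r
# mtcars ships with R; it stands in for gun_deaths here
is.list(mtcars)
typeof(mtcars)

# Each column is one element of the list
length(mtcars) == ncol(mtcars)
identical(mtcars[["mpg"]], mtcars$mpg)
```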

---

## Exercise on subsetting vectors

---

# Iteration

---

## Iteration

```r
df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
```

```r
median(df$a)
## [1] 0.1642894
median(df$b)
## [1] 0.01641118
median(df$c)
## [1] 0.2734794
median(df$d)
## [1] -0.639297
```

---

## Iteration three ways

1. `for` loops
1. `map_*()` functions
1. `across()`

---

# Iteration with `for` loops

---

## Iteration with `for` loop

```r
output <- vector(mode = "double", length = ncol(df))
for (i in seq_along(df)) {
  output[[i]] <- median(df[[i]])
}
output
```

```
## [1]  0.16428940  0.01641118  0.27347942 -0.63929695
```

---

## Output

```r
output <- vector(mode = "double", length = ncol(df))
```

```r
vector(mode = "double", length = ncol(df))
## [1] 0 0 0 0
vector(mode = "logical", length = ncol(df))
## [1] FALSE FALSE FALSE FALSE
vector(mode = "character", length = ncol(df))
## [1] "" "" "" ""
vector(mode = "list", length = ncol(df))
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
```

---

## Sequence

```r
i in seq_along(df)
```

```r
seq_along(df)
```

```
## [1] 1 2 3 4
```
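
One reason to prefer `seq_along()` over `1:length()` is the zero-length case. A minimal sketch, where `empty` and `iterations` are illustrative names:

```r
empty <- vector(mode = "double", length = 0)

# 1:length(x) counts down from 1 to 0 when x is empty
1:length(empty)

# seq_along(x) returns a zero-length sequence, so the loop body never runs
seq_along(empty)

iterations <- 0
for (i in seq_along(empty)) {
  iterations <- iterations + 1
}
iterations
```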

---

## Body

```r
output[[i]] <- median(df[[i]])
```

---

## Preallocation

```r
# without preallocation: start from an empty tibble and grow it
mpg_no_preall <- tibble()

for(i in 1:100){
  mpg_no_preall <- bind_rows(mpg_no_preall, mpg)
}

# with preallocation using a list
mpg_preall <- vector(mode = "list", length = 100)

for(i in 1:100){
  mpg_preall[[i]] <- mpg
}

mpg_preall <- bind_rows(mpg_preall)
```

<img src="index_files/figure-html/unnamed-chunk-28-1.png" width="70%" style="display: block; margin: auto;" />
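
A rough way to compare the two strategies yourself. This sketch swaps in base R stand-ins (`mtcars` for `mpg`, `rbind()` for `bind_rows()`), and the helper names `grow()` and `prealloc()` are hypothetical. Timings depend on the machine, so none are shown.

```r
n <- 100

# Without preallocation: the data frame is copied and regrown on every pass
grow <- function() {
  out <- mtcars[0, ]
  for (i in seq_len(n)) {
    out <- rbind(out, mtcars)
  }
  out
}

# With preallocation: fill a list of known length, then combine once
prealloc <- function() {
  pieces <- vector(mode = "list", length = n)
  for (i in seq_len(n)) {
    pieces[[i]] <- mtcars
  }
  do.call(rbind, pieces)
}

# Both approaches produce the same number of rows
nrow(grow()) == nrow(prealloc())

# Elapsed times vary from run to run
system.time(grow())
system.time(prealloc())
```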

---

## Exercise on `for()` loops

---

# Iteration with `map_*()` functions

---

## Map functions

* Why `for` loops are good
* Why `map()` functions may be better
* Types of `map()` functions
    * `map()` makes a list
    * `map_lgl()` makes a logical vector
    * `map_int()` makes an integer vector
    * `map_dbl()` makes a double vector
    * `map_chr()` makes a character vector

---

## Map functions

```r
map_dbl(df, mean)
```

```
##          a          b          c          d 
##  0.1694536 -0.1974360  0.3113976 -0.5095255
```

```r
map_dbl(df, median)
```

```
##           a           b           c           d 
##  0.16428940  0.01641118  0.27347942 -0.63929695
```

```r
map_dbl(df, sd)
```

```
##         a         b         c         d 
## 0.5311992 1.0300788 0.8834578 1.0414939
```

---

## Map functions

```r
map_dbl(df, mean, na.rm = TRUE)
```

```
##          a          b          c          d 
##  0.1694536 -0.1974360  0.3113976 -0.5095255
```

```r
df %>%
  map_dbl(mean, na.rm = TRUE)
```

```
##          a          b          c          d 
##  0.1694536 -0.1974360  0.3113976 -0.5095255
```
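
Extra arguments can also be supplied through an anonymous function, which keeps them next to the call they belong to. A sketch with a made-up list, `vals`, that contains missing values:

```r
library(purrr)

vals <- list(a = c(1, 2, NA), b = c(4, NA, 6))

# Extra arguments passed through `...`
map_dbl(vals, mean, na.rm = TRUE)

# The same call with a formula shortcut; `.x` is the current element
map_dbl(vals, ~ mean(.x, na.rm = TRUE))

# And with an ordinary anonymous function
map_dbl(vals, function(x) mean(x, na.rm = TRUE))
```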

---

## Exercise on writing `map_*()` functions

---

# Iteration in data frames with `across()`

---

# Single column

```r
car_prices %>%
  summarize(Price = mean(Price))
```

```
## # A tibble: 1 × 1
##    Price
##    <dbl>
## 1 21343.
```

---

# Multiple columns

```r
car_prices %>%
  summarize(
    Price = mean(Price),
    Mileage = mean(Mileage),
    Cylinder = mean(Cylinder),
    Doors = mean(Doors),
    Cruise = mean(Cruise),
    Sound = mean(Sound),
    Leather = mean(Leather),
    Buick = mean(Buick),
    Cadillac = mean(Cadillac),
    Chevy = mean(Chevy),
    Pontiac = mean(Pontiac),
    Saab = mean(Saab),
    Saturn = mean(Saturn),
    convertible = mean(convertible),
    coupe = mean(coupe),
    hatchback = mean(hatchback),
    sedan = mean(sedan),
    wagon = mean(wagon)
  )
```

```
## # A tibble: 1 × 18
##    Price Mileage Cylin…¹ Doors Cruise Sound Leather  Buick Cadil…² Chevy Pontiac
##    <dbl>   <dbl>   <dbl> <dbl>  <dbl> <dbl>   <dbl>  <dbl>   <dbl> <dbl>   <dbl>
## 1 21343.  19832.    5.27  3.53  0.752 0.679   0.724 0.0995  0.0995 0.398   0.187
## # … with 7 more variables: Saab <dbl>, Saturn <dbl>, convertible <dbl>,
## #   coupe <dbl>, hatchback <dbl>, sedan <dbl>, wagon <dbl>, and abbreviated
## #   variable names ¹Cylinder, ²Cadillac
```

---

## `dplyr::across()`

`across()` has two primary arguments:

* `.cols`, selects the columns you want to operate on
* `.fns`, is a function or list of functions to apply to each column

---

## `summarize()`, `across()`, and `everything()`

```r
car_prices %>%
  summarize(across(.cols = everything(), .fns = mean))
```

```r
car_prices %>%
  summarize(across(everything(), .fns = list(min, max)))
```

```
## # A tibble: 1 × 36
##   Price_1 Price_2 Mileage_1 Mileage_2 Cylinder_1 Cylin…¹ Doors_1 Doors_2 Cruis…²
##     <dbl>   <dbl>     <int>     <int>      <int>   <int>   <int>   <int>   <int>
## 1   8639.  70755.       266     50387          4       8       2       4       0
## # … with 27 more variables: Cruise_2 <int>, Sound_1 <int>, Sound_2 <int>,
## #   Leather_1 <int>, Leather_2 <int>, Buick_1 <int>, Buick_2 <int>,
## #   Cadillac_1 <int>, Cadillac_2 <int>, Chevy_1 <int>, Chevy_2 <int>,
## #   Pontiac_1 <int>, Pontiac_2 <int>, Saab_1 <int>, Saab_2 <int>,
## #   Saturn_1 <int>, Saturn_2 <int>, convertible_1 <int>, convertible_2 <int>,
## #   coupe_1 <int>, coupe_2 <int>, hatchback_1 <int>, hatchback_2 <int>,
## #   sedan_1 <int>, sedan_2 <int>, wagon_1 <int>, wagon_2 <int>, and …
```

```r
car_prices %>%
  summarize(across(everything(), .fns = list(min = min, max = max)))
```

```
## # A tibble: 1 × 36
##   Price_min Price_max Mileage_…¹ Milea…² Cylin…³ Cylin…⁴ Doors…⁵ Doors…⁶ Cruis…⁷
##       <dbl>     <dbl>      <int>   <int>   <int>   <int>   <int>   <int>   <int>
## 1     8639.    70755.        266   50387       4       8       2       4       0
## # … with 27 more variables: Cruise_max <int>, Sound_min <int>, Sound_max <int>,
## #   Leather_min <int>, Leather_max <int>, Buick_min <int>, Buick_max <int>,
## #   Cadillac_min <int>, Cadillac_max <int>, Chevy_min <int>, Chevy_max <int>,
## #   Pontiac_min <int>, Pontiac_max <int>, Saab_min <int>, Saab_max <int>,
## #   Saturn_min <int>, Saturn_max <int>, convertible_min <int>,
## #   convertible_max <int>, coupe_min <int>, coupe_max <int>,
## #   hatchback_min <int>, hatchback_max <int>, sedan_min <int>, …
```
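
The output column names can be controlled further with the `.names` argument of `across()`, a glue-style template where `{.col}` is the column name and `{.fn}` is the function name. A sketch using `mtcars` as a stand-in for `car_prices`:

```r
library(dplyr)

renamed <- mtcars %>%
  summarize(across(
    .cols = c(mpg, hp),
    .fns = list(min = min, max = max),
    .names = "{.fn}_{.col}"  # function name first, then column name
  ))

names(renamed)
```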

```r
car_prices %>%
  group_by(Cylinder) %>%
  summarize(across(everything(), .fns = mean))
```

```
## # A tibble: 3 × 18
##   Cylinder  Price Mileage Doors Cruise Sound Leather Buick Cadil…¹ Chevy Pontiac
##      <int>  <dbl>   <dbl> <dbl>  <dbl> <dbl>   <dbl> <dbl>   <dbl> <dbl>   <dbl>
## 1        4 17863.  20108.  3.44  0.599 0.698   0.746 0      0      0.457   0.127
## 2        6 20081.  19564.  3.74  0.868 0.706   0.606 0.258  0.0645 0.387   0.258
## 3        8 38968.  19575.  3.2   1     0.52    1     0      0.6    0.2     0.2  
## # … with 7 more variables: Saab <dbl>, Saturn <dbl>, convertible <dbl>,
## #   coupe <dbl>, hatchback <dbl>, sedan <dbl>, wagon <dbl>, and abbreviated
## #   variable name ¹Cadillac
```

---

## `worldbank`

```r
data("worldbank", package = "rcis")
worldbank
```

```
## # A tibble: 78 × 14
##    iso3c date  iso2c country   perc_en…¹ rnd_g…² percg…³ real_…⁴ gdp_c…⁵ top10…⁶
##    <chr> <chr> <chr> <chr>         <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 ARG   2005  AR    Argentina      89.1   0.379    15.5   6198.   5110.    35  
##  2 ARG   2006  AR    Argentina      88.7   0.400    22.1   7388.   5919.    33.9
##  3 ARG   2007  AR    Argentina      89.2   0.402    22.8   8182.   7245.    33.8
##  4 ARG   2008  AR    Argentina      90.7   0.421    21.6   8576.   9021.    32.5
##  5 ARG   2009  AR    Argentina      89.6   0.519    18.9   7904.   8225.    31.4
##  6 ARG   2010  AR    Argentina      89.5   0.518    17.9   8803.  10386.    32  
##  7 ARG   2011  AR    Argentina      88.9   0.537    17.9   9528.  12849.    31  
##  8 ARG   2012  AR    Argentina      89.0   0.609    16.5   9301.  13083.    29.7
##  9 ARG   2013  AR    Argentina      89.0   0.612    15.3   9367.  13080.    29.4
## 10 ARG   2014  AR    Argentina      87.7   0.613    16.1   8903.  12335.    29.9
## # … with 68 more rows, 4 more variables: employment_ratio <dbl>,
## #   life_exp <dbl>, pop_growth <dbl>, pop <dbl>, and abbreviated variable names
## #   ¹perc_energy_fosfuel, ²rnd_gdpshare, ³percgni_adj_gross_savings,
## #   ⁴real_netinc_percap, ⁵gdp_capita, ⁶top10perc_incshare
```

---

## `summarize()`, `across()`, and `where()`

```r
worldbank %>%
  group_by(country) %>%
  summarize(across(.cols = where(is.numeric), .fns = mean, na.rm = TRUE))
```

```
## # A tibble: 6 × 11
##   country        perc_…¹ rnd_g…² percg…³ real_…⁴ gdp_c…⁵ top10…⁶ emplo…⁷ life_…⁸
##   <chr>            <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Argentina         89.1  0.501     17.5   8560.  10648.    31.6    55.4    75.4
## 2 China             87.6  1.67      48.3   3661.   5397.    30.8    69.8    74.7
## 3 Indonesia         65.3  0.0841    30.5   2041.   2881.    31.2    62.5    69.5
## 4 Norway            58.9  1.60      37.2  70775.  85622.    21.9    67.3    81.3
## 5 United Kingdom    86.3  1.68      13.5  34542.  43416.    26.2    58.7    80.4
## 6 United States     84.2  2.69      17.6  42824.  51285.    30.1    60.2    78.4
## # … with 2 more variables: pop_growth <dbl>, pop <dbl>, and abbreviated
## #   variable names ¹perc_energy_fosfuel, ²rnd_gdpshare,
## #   ³percgni_adj_gross_savings, ⁴real_netinc_percap, ⁵gdp_capita,
## #   ⁶top10perc_incshare, ⁷employment_ratio, ⁸life_exp
```

```r
worldbank %>%
  group_by(country) %>%
  summarize(across(
    .cols = where(is.numeric) & starts_with("perc"),
    .fns = mean, na.rm = TRUE
  ))
```

```
## # A tibble: 6 × 3
##   country        perc_energy_fosfuel percgni_adj_gross_savings
##   <chr>                        <dbl>                     <dbl>
## 1 Argentina                     89.1                      17.5
## 2 China                         87.6                      48.3
## 3 Indonesia                     65.3                      30.5
## 4 Norway                        58.9                      37.2
## 5 United Kingdom                86.3                      13.5
## 6 United States                 84.2                      17.6
```
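
In dplyr 1.1.0 and later, passing extra arguments such as `na.rm = TRUE` through the `...` of `across()` is deprecated in favor of an anonymous function. A sketch of that form, using the built-in `airquality` data as a stand-in for `worldbank` (an assumption, so the example runs without the `rcis` package):

```r
library(dplyr)

# airquality has missing values in Ozone and Solar.R
monthly <- airquality %>%
  group_by(Month) %>%
  summarize(across(
    .cols = where(is.numeric) & !Day,   # numeric columns except Day
    .fns = ~ mean(.x, na.rm = TRUE)     # na.rm lives inside the function
  ))

monthly
```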

---

## `across()` and `mutate()`

```r
car_prices %>%
  mutate(across(.cols = Price:Doors, .fns = log10))
```

```
## # A tibble: 804 × 18
##    Price Mileage Cylinder Doors Cruise Sound Leather Buick Cadil…¹ Chevy Pontiac
##    <dbl>   <dbl>    <dbl> <dbl>  <int> <int>   <int> <int>   <int> <int>   <int>
##  1  4.36    4.30    0.778 0.602      1     0       0     1       0     0       0
##  2  4.34    4.13    0.778 0.301      1     1       0     0       0     1       0
##  3  4.46    4.50    0.602 0.301      1     1       1     0       0     0       0
##  4  4.49    4.35    0.602 0.301      1     0       0     0       0     0       0
##  5  4.52    4.25    0.602 0.301      1     1       1     0       0     0       0
##  6  4.48    4.37    0.602 0.301      1     0       0     0       0     0       0
##  7  4.52    4.24    0.602 0.301      1     1       1     0       0     0       0
##  8  4.48    4.44    0.602 0.301      1     0       1     0       0     0       0
##  9  4.48    4.40    0.602 0.301      1     0       0     0       0     0       0
## 10  4.43    4.24    0.602 0.602      1     0       1     0       0     0       0
## # … with 794 more rows, 7 more variables: Saab <int>, Saturn <int>,
## #   convertible <int>, coupe <int>, hatchback <int>, sedan <int>, wagon <int>,
## #   and abbreviated variable name ¹Cadillac
```

---

## ~~`across()`~~ and `filter()`

```r
worldbank %>%
  filter(if_any(everything(), ~ !is.na(.x)))
```

```r
worldbank %>%
  filter(if_all(everything(), ~ !is.na(.x)))
```

```
## # A tibble: 42 × 14
##    iso3c date  iso2c country   perc_en…¹ rnd_g…² percg…³ real_…⁴ gdp_c…⁵ top10…⁶
##    <chr> <chr> <chr> <chr>         <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 ARG   2005  AR    Argentina      89.1   0.379    15.5   6198.   5110.    35  
##  2 ARG   2006  AR    Argentina      88.7   0.400    22.1   7388.   5919.    33.9
##  3 ARG   2007  AR    Argentina      89.2   0.402    22.8   8182.   7245.    33.8
##  4 ARG   2008  AR    Argentina      90.7   0.421    21.6   8576.   9021.    32.5
##  5 ARG   2009  AR    Argentina      89.6   0.519    18.9   7904.   8225.    31.4
##  6 ARG   2010  AR    Argentina      89.5   0.518    17.9   8803.  10386.    32  
##  7 ARG   2011  AR    Argentina      88.9   0.537    17.9   9528.  12849.    31  
##  8 ARG   2012  AR    Argentina      89.0   0.609    16.5   9301.  13083.    29.7
##  9 ARG   2013  AR    Argentina      89.0   0.612    15.3   9367.  13080.    29.4
## 10 ARG   2014  AR    Argentina      87.7   0.613    16.1   8903.  12335.    29.9
## # … with 32 more rows, 4 more variables: employment_ratio <dbl>,
## #   life_exp <dbl>, pop_growth <dbl>, pop <dbl>, and abbreviated variable names
## #   ¹perc_energy_fosfuel, ²rnd_gdpshare, ³percgni_adj_gross_savings,
## #   ⁴real_netinc_percap, ⁵gdp_capita, ⁶top10perc_incshare
```
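
The difference between the two helpers is easier to see on a small example. This sketch uses an invented tibble, `toy`:

```r
library(dplyr)

# A tiny tibble made up for illustration
toy <- tibble(
  id = 1:3,
  x = c(1, NA, NA),
  y = c(2, 5, NA)
)

# if_all(): keep rows where every selected column is non-missing
complete_rows <- toy %>%
  filter(if_all(c(x, y), ~ !is.na(.x)))

# if_any(): keep rows where at least one selected column is non-missing
partial_rows <- toy %>%
  filter(if_any(c(x, y), ~ !is.na(.x)))

complete_rows$id
partial_rows$id
```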

---

## Exercise on `across()` iteration