case_when(), case_match(), and consecutive_id()
dplyr 1.1.0
Install dplyr 1.1.0 with:
pak::pak("cran/dplyr@1.1.0")
Load the package with:
library(dplyr)
case_when()
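case_when() is a general vectorised if-else: it evaluates a series of condition ~ value formulas in order and, for each element, uses the value from the first condition that is TRUE.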
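To make "vectorised if-else" concrete, here is a small sketch comparing case_when() with nested if_else() calls. The vector and labels are illustrative, not from the original post. The vector deliberately has no NAs, because if_else() returns NA for an NA condition while case_when() treats it as not matched.

```r
library(dplyr)

# Illustrative input with no NAs (NA conditions are handled differently
# by if_else() and case_when(), so they are left out of this comparison)
x <- c(1, 12, -5, 6, -2, 0)

# Nested if-else: each inner test only applies where the outer one failed
nested <- if_else(x >= 10, "large", if_else(x >= 0, "small", "negative"))

# case_when(): conditions are checked in order and the first TRUE one wins
vectorised <- case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  .default = "negative"
)

identical(nested, vectorised)
#> [1] TRUE
```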
NA
Have you ever run case_when() and gotten the error message:
x <- c(1, 12, -5, 6, -2, NA, 0)
case_when(
x >= 10 ~ "large",
x >= 0 ~ "small",
x < 0 ~ NA
)
Error: `NA` must be <character>, not <logical>.
In this case, you had to use NA_character_ instead of NA.
But not anymore!
In dplyr 1.1.0, the switch to vctrs means that the above code now “just works”:
case_when(
x >= 10 ~ "large",
x >= 0 ~ "small",
x < 0 ~ NA
)
[1] "small" "large" NA "small" NA NA "small"
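A short sketch of what changed. The output type is now worked out with vctrs, which treats a bare logical NA as "unspecified" and therefore compatible with any other type. This assumes vctrs is installed, which it is as a dependency of dplyr 1.1.0. The old NA_character_ spelling still works and gives the same result.

```r
library(dplyr)

x <- c(1, 12, -5, 6, -2, NA, 0)

# A bare NA on the right-hand side is combined with the character outputs
result <- case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  x < 0 ~ NA
)
typeof(result)
#> [1] "character"

# The pre-1.1.0 workaround gives an identical vector
old_style <- case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  x < 0 ~ NA_character_
)
identical(result, old_style)
#> [1] TRUE

# vctrs: the common type of a logical NA and a string is character
vctrs::vec_ptype_common(NA, "large")
#> character(0)
```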
TRUE
To set a default in case_when(), you used to have to do this:
[1] "small" "large" "other" "small" "other" "missing" "small"
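The call that produced the output above isn't shown. Below is a minimal reconstruction that reproduces it, assuming the same x as before. The is.na() branch and the order of the conditions are our guess, not necessarily the author's code. The final TRUE ~ "other" acts as the catch-all because TRUE matches every element that an earlier condition hasn't already claimed.

```r
library(dplyr)

x <- c(1, 12, -5, 6, -2, NA, 0)

# Pre-1.1.0 idiom: a final TRUE ~ ... formula supplies the default
old_default <- case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  is.na(x) ~ "missing",  # NAs fail the comparisons above, so test them here
  TRUE ~ "other"         # everything left over (the negative numbers)
)
old_default
```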
Now there’s an explicit argument, .default:
[1] "small" "large" "other" "small" "other" "missing" "small"
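The code for this output isn't shown either. Here is a reconstruction under the same assumptions, with .default in place of the TRUE ~ formula. The final check confirms that both spellings give the same vector.

```r
library(dplyr)

x <- c(1, 12, -5, 6, -2, NA, 0)

# dplyr >= 1.1.0: the catch-all value goes in the explicit .default argument
new_default <- case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  is.na(x) ~ "missing",
  .default = "other"
)

# The older TRUE ~ spelling, for comparison
old_default <- case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  is.na(x) ~ "missing",
  TRUE ~ "other"
)

identical(new_default, old_default)
#> [1] TRUE
```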
The TRUE ~ idiom isn’t deprecated yet, but the team is planning to deprecate it in the future.
case_match()
Sometimes, case_when() can be a bit repetitive:
x <-
c("USA", "Canada", "Wales", "UK", "China", NA, "Mexico", "Russia")
case_when(
x %in% c("USA", "Canada", "Mexico") ~ "North America",
x %in% c("Wales", "UK") ~ "Europe",
x %in% "China" ~ "Asia"
)
[1] "North America" "North America" "Europe" "Europe"
[5] "Asia" NA "North America" NA
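case_match() is a special case of case_when() that matches values, which makes it a nice way to do replacements. You can streamline the code above. Note that this version groups "France" rather than "Wales" with "UK", so "Wales" now falls through to NA: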
case_match(
x,
c("USA", "Canada", "Mexico") ~ "North America",
c("France", "UK") ~ "Europe",
"China" ~ "Asia"
)
[1] "North America" "North America" NA "Europe"
[5] "Asia" NA "North America" NA
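Because the left-hand sides of case_match() are plain values, for this example it behaves like case_when() with an %in% test on each one. Here is a quick check that uses the original "Wales" grouping in both calls so that they are comparable.

```r
library(dplyr)

x <- c("USA", "Canada", "Wales", "UK", "China", NA, "Mexico", "Russia")

# case_when(): each left-hand side is a logical vector built with %in%
via_when <- case_when(
  x %in% c("USA", "Canada", "Mexico") ~ "North America",
  x %in% c("Wales", "UK") ~ "Europe",
  x %in% "China" ~ "Asia"
)

# case_match(): the left-hand sides are just the values to look for in x
via_match <- case_match(
  x,
  c("USA", "Canada", "Mexico") ~ "North America",
  c("Wales", "UK") ~ "Europe",
  "China" ~ "Asia"
)

identical(via_when, via_match)
#> [1] TRUE
```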
The left-hand sides are no longer logical vectors, just the values to match. You can also put NA on the left-hand side to match missing values:
case_match(
x,
c("USA", "Canada", "Mexico") ~ "North America",
c("France", "UK") ~ "Europe",
"China" ~ "Asia",
NA ~ "missing"
)
[1] "North America" "North America" NA "Europe"
[5] "Asia" "missing" "North America" NA
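For comparison, here is a sketch of the same lookup written with case_when(), where the missing value needs its own is.na() test. Both calls keep the "France" grouping from the call above, so "Wales" and "Russia" still come out as NA.

```r
library(dplyr)

x <- c("USA", "Canada", "Wales", "UK", "China", NA, "Mexico", "Russia")

# case_match(): NA can be matched directly on the left-hand side
with_match <- case_match(
  x,
  c("USA", "Canada", "Mexico") ~ "North America",
  c("France", "UK") ~ "Europe",
  "China" ~ "Asia",
  NA ~ "missing"
)

# case_when(): the equivalent needs an explicit is.na() condition
with_when <- case_when(
  x %in% c("USA", "Canada", "Mexico") ~ "North America",
  x %in% c("France", "UK") ~ "Europe",
  x %in% "China" ~ "Asia",
  is.na(x) ~ "missing"
)

identical(with_match, with_when)
#> [1] TRUE
```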
case_match() also supports .default, which sets the value for anything left unmatched (here "Wales" and "Russia"):
case_match(
x,
c("USA", "Canada", "Mexico") ~ "North America",
c("France", "UK") ~ "Europe",
"China" ~ "Asia",
NA ~ "missing",
.default = "unknown"
)
[1] "North America" "North America" "unknown" "Europe"
[5] "Asia" "missing" "North America" "unknown"
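.default can also be a vector the same length as the input. One idiom, similar to an example on the case_match() help page, is to pass the input itself as the default. Only the listed values are then replaced, and everything else passes through unchanged. The replacement label below is illustrative.

```r
library(dplyr)

x <- c("USA", "Canada", "Wales", "UK", "China", NA, "Mexico", "Russia")

# Replace one value; unmatched elements (including NA) fall back to x itself
recoded <- case_match(x, "UK" ~ "United Kingdom", .default = x)
recoded
```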
consecutive_id()
Here’s an example transcript, stored in a data frame called friends_dialogue with one row per line of dialogue:
friends_dialogue
# A tibble: 10 × 2
text speaker
<chr> <chr>
1 There's nothing to tell! He's just some guy I work with! Monica…
2 C'mon, you're going out with the guy! There's gotta be something wro… Joey T…
3 All right Joey, be nice. So does he have a hump? A hump and a hairpi… Chandl…
4 Wait, does he eat chalk? Phoebe…
5 Just, 'cause, I don't want her to go through what I went through wit… Phoebe…
6 Okay, everybody relax. This is not even a date. It's just two people… Monica…
7 Sounds like a date to me. Chandl…
8 Alright, so I'm back in high school, I'm standing in the middle of t… Chandl…
9 Then I look down, and I realize there's a phone... there. Chandl…
10 Instead of...? Joey T…
What if we want to put each stretch of continuous dialogue together in one row? A first attempt is to summarise by speaker:
friends_dialogue |>
summarise(text = stringr::str_flatten(text, collapse = " "),
.by = speaker)
# A tibble: 4 × 2
speaker text
<chr> <chr>
1 Monica Geller There's nothing to tell! He's just some guy I work with! Okay,…
2 Joey Tribbiani C'mon, you're going out with the guy! There's gotta be somethi…
3 Chandler Bing All right Joey, be nice. So does he have a hump? A hump and a …
4 Phoebe Buffay Wait, does he eat chalk? Just, 'cause, I don't want her to go …
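The .by argument used here is also new in dplyr 1.1.0. It groups for a single operation only. One difference from group_by() is ordering: .by returns groups in order of first appearance, which is why Monica comes first above, whereas group_by() sorts the keys. The sketch below uses a hypothetical stand-in for friends_dialogue: the speaker order matches the transcript, but the text values are placeholders.

```r
library(dplyr)

# Hypothetical stand-in for friends_dialogue: same speaker order,
# placeholder text
dialogue <- tibble(
  text = paste("line", 1:10),
  speaker = c("Monica", "Joey", "Chandler", "Phoebe", "Phoebe",
              "Monica", "Chandler", "Chandler", "Chandler", "Joey")
)

# .by: grouping lasts for this call only; groups keep first-appearance order
with_by <- dialogue |>
  summarise(n_lines = n(), .by = speaker)
with_by$speaker  # Monica, Joey, Chandler, Phoebe

# group_by(): the group keys are sorted before summarising
with_group_by <- dialogue |>
  group_by(speaker) |>
  summarise(n_lines = n())
with_group_by$speaker  # Chandler, Joey, Monica, Phoebe
```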
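This smushes all of each speaker’s lines together, regardless of when they were said. What if we want to keep the consecutive runs separate? Enter consecutive_id(), which gives each run of identical values its own id!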
friends_dialogue |>
mutate(id = consecutive_id(speaker))
# A tibble: 10 × 3
text speaker id
<chr> <chr> <int>
1 There's nothing to tell! He's just some guy I work with! Monica… 1
2 C'mon, you're going out with the guy! There's gotta be somethi… Joey T… 2
3 All right Joey, be nice. So does he have a hump? A hump and a … Chandl… 3
4 Wait, does he eat chalk? Phoebe… 4
5 Just, 'cause, I don't want her to go through what I went throu… Phoebe… 4
6 Okay, everybody relax. This is not even a date. It's just two … Monica… 5
7 Sounds like a date to me. Chandl… 6
8 Alright, so I'm back in high school, I'm standing in the middl… Chandl… 6
9 Then I look down, and I realize there's a phone... there. Chandl… 6
10 Instead of...? Joey T… 7
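consecutive_id() also works on a plain vector. The sketch below feeds it the speaker order from the transcript above (first names only) and compares it with a base R construction based on rle(), which assigns one number per run of identical values. The rle() version is our own comparison, not how dplyr implements the function. It is only used here on input without NAs, because rle() treats each NA as a run of its own.

```r
library(dplyr)

# Speaker order from the transcript above (first names only)
speakers <- c("Monica", "Joey", "Chandler", "Phoebe", "Phoebe",
              "Monica", "Chandler", "Chandler", "Chandler", "Joey")

# The id increases by one every time the value differs from the previous one
ids <- consecutive_id(speakers)
ids
#> [1] 1 2 3 4 4 5 6 6 6 7

# Base R comparison: number the runs found by rle(), then repeat each
# run number as many times as that run is long
runs <- rle(speakers)
base_ids <- rep(seq_along(runs$lengths), runs$lengths)

identical(ids, base_ids)
#> [1] TRUE
```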
With this, we can group by both id and speaker, so each run of dialogue becomes one row:
friends_dialogue |>
mutate(id = consecutive_id(speaker)) |>
summarise(text = stringr::str_flatten(text, collapse = " "),
.by = c(id, speaker))
# A tibble: 7 × 3
id speaker text
<int> <chr> <chr>
1 1 Monica Geller There's nothing to tell! He's just some guy I work with!
2 2 Joey Tribbiani C'mon, you're going out with the guy! There's gotta be s…
3 3 Chandler Bing All right Joey, be nice. So does he have a hump? A hump …
4 4 Phoebe Buffay Wait, does he eat chalk? Just, 'cause, I don't want her …
5 5 Monica Geller Okay, everybody relax. This is not even a date. It's jus…
6 6 Chandler Bing Sounds like a date to me. Alright, so I'm back in high s…
7 7 Joey Tribbiani Instead of...?
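The same pipeline also runs without stringr if the lines are collapsed with base R’s paste(). This sketch uses a hypothetical stand-in for friends_dialogue (same speaker order, placeholder text). It shows that Chandler’s three consecutive lines end up in one row, while his earlier line stays in a row of its own.

```r
library(dplyr)

# Hypothetical stand-in for friends_dialogue (placeholder text)
dialogue <- tibble(
  text = paste("line", 1:10),
  speaker = c("Monica", "Joey", "Chandler", "Phoebe", "Phoebe",
              "Monica", "Chandler", "Chandler", "Chandler", "Joey")
)

turns <- dialogue |>
  mutate(id = consecutive_id(speaker)) |>
  # Grouping by id (and keeping speaker) turns each run into one row
  summarise(text = paste(text, collapse = " "), .by = c(id, speaker))

turns
```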