case_when(), case_match(), and consecutive_id()

dplyr 1.1.0

A grab bag of new dplyr updates and functions.
Published January 29, 2023

Install dplyr 1.1.0 with:

pak::pak("cran/dplyr@1.1.0")

Load the package with:

library(dplyr)

case_when()

case_when() is a general vectorised if-else.

NA

Have you ever run case_when() and gotten an error message like this one?

x <- c(1, 12, -5, 6, -2, NA, 0)
case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  x < 0 ~ NA
)
Error: `NA` must be <character>, not <logical>.

In this case, you had to use NA_character_ instead of NA.

But not anymore!

In dplyr 1.1.0, the switch to vctrs means that the above code now “just works”:

case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  x < 0 ~ NA
)
[1] "small" "large" NA      "small" NA      NA      "small"
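
As a quick sanity check, here is a minimal sketch comparing the old workaround with the new behaviour. It assumes dplyr 1.1.0 or later is installed; the names old_style and new_style are made up for this illustration.

```r
library(dplyr)

x <- c(1, 12, -5, 6, -2, NA, 0)

# Pre-1.1.0 workaround: a typed missing value on the right-hand side
old_style <- case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  x < 0 ~ NA_character_
)

# dplyr >= 1.1.0: a bare NA is cast to the common type of the other outputs
new_style <- case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  x < 0 ~ NA
)

# Both versions should give the same character vector
identical(old_style, new_style)
```

So existing code that uses NA_character_ should keep working; the typed NA is just no longer required.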

TRUE

To set a default in case_when(), you used to have to do this:

case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  is.na(x) ~ "missing",
  TRUE ~ "other"
)
[1] "small"   "large"   "other"   "small"   "other"   "missing" "small"  

Now there’s an explicit .default argument:

case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  is.na(x) ~ "missing",
  .default = "other"
)
[1] "small"   "large"   "other"   "small"   "other"   "missing" "small"  

Using TRUE as the catch-all condition isn’t deprecated yet, but the dplyr team plans to deprecate it in the future.
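
One detail worth seeing in isolation: .default is used whenever no condition is TRUE for an element, and that includes elements where the conditions evaluate to NA. The sketch below drops the is.na() branch to show this; it assumes dplyr 1.1.0 or later, and with_default is just an illustrative name.

```r
library(dplyr)

x <- c(1, 12, -5, 6, -2, NA, 0)

# No is.na() branch here: both comparisons return NA for the missing value,
# so no condition is TRUE for it and it falls through to .default
with_default <- case_when(
  x >= 10 ~ "large",
  x >= 0 ~ "small",
  .default = "other"
)

with_default
#> [1] "small" "large" "other" "small" "other" "other" "small"
```

If you want missing values labelled separately, keep an explicit is.na(x) branch as in the example above.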

case_match()

Sometimes, case_when() can be a bit repetitive:

x <-
  c("USA", "Canada", "Wales", "UK", "China", NA, "Mexico", "Russia")

case_when(
  x %in% c("USA", "Canada", "Mexico") ~ "North America",
  x %in% c("France", "UK") ~ "Europe",
  x %in% "China" ~ "Asia"
)
[1] "North America" "North America" NA              "Europe"       
[5] "Asia"          NA              "North America" NA             

case_match() is a special case of case_when() that matches on values rather than logical conditions, which makes it a nice way to do replacements. You can streamline the code above:

case_match(
  x,
  c("USA", "Canada", "Mexico") ~ "North America",
  c("France", "UK") ~ "Europe",
  "China" ~ "Asia"
)
[1] "North America" "North America" NA              "Europe"       
[5] "Asia"          NA              "North America" NA             

The left-hand sides are no longer logical vectors, just the values to match against x. You can also put NA on the left-hand side:

case_match(
  x,
  c("USA", "Canada", "Mexico") ~ "North America",
  c("France", "UK") ~ "Europe",
  "China" ~ "Asia",
  NA ~ "missing"
)
[1] "North America" "North America" NA              "Europe"       
[5] "Asia"          "missing"       "North America" NA             

It also works with .default:

case_match(
  x,
  c("USA", "Canada", "Mexico") ~ "North America",
  c("France", "UK") ~ "Europe",
  "China" ~ "Asia",
  NA ~ "missing",
  .default = "unknown"
)
[1] "North America" "North America" "unknown"       "Europe"       
[5] "Asia"          "missing"       "North America" "unknown"      
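
A related pattern, sketched here under the assumption that .default may be a vector the same size as x (as in dplyr 1.1.0 and later), is to pass x itself as the default. That replaces only the values you list and keeps everything else unchanged. The replacement labels and the name recoded are invented for this example.

```r
library(dplyr)

x <- c("USA", "Canada", "Wales", "UK", "China", NA, "Mexico", "Russia")

# Recode two values; any element that is not matched keeps its original value
recoded <- case_match(
  x,
  "USA" ~ "United States",
  "UK" ~ "United Kingdom",
  .default = x
)

# "USA" and "UK" are replaced; every other element, including NA, is untouched
recoded
```

This is handy inside mutate() when you only want to tidy up a few values of a column.
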

Note: if_else() has received the same updates as case_when(). In particular, it is no longer as strict about typed missing values.
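
To make that concrete, here is a small sketch of if_else() with a bare NA, plus the missing argument for the case where the condition itself is NA. It assumes dplyr 1.1.0 or later; the vector x and the names relaxed and labelled are illustrative only.

```r
library(dplyr)

x <- c(1, -5, NA, 0)

# A bare NA is accepted alongside a character value
relaxed <- if_else(x < 0, NA, "non-negative")
relaxed
#> [1] "non-negative" NA             NA             "non-negative"

# `missing` supplies the value used where the condition is NA
labelled <- if_else(x < 0, "negative", "non-negative", missing = "missing")
labelled
#> [1] "non-negative" "negative"     "missing"      "non-negative"
```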

consecutive_id()

Here’s an example transcript:

friends_dialogue
# A tibble: 10 × 2
   text                                                                  speaker
   <chr>                                                                 <chr>  
 1 There's nothing to tell! He's just some guy I work with!              Monica…
 2 C'mon, you're going out with the guy! There's gotta be something wro… Joey T…
 3 All right Joey, be nice. So does he have a hump? A hump and a hairpi… Chandl…
 4 Wait, does he eat chalk?                                              Phoebe…
 5 Just, 'cause, I don't want her to go through what I went through wit… Phoebe…
 6 Okay, everybody relax. This is not even a date. It's just two people… Monica…
 7 Sounds like a date to me.                                             Chandl…
 8 Alright, so I'm back in high school, I'm standing in the middle of t… Chandl…
 9 Then I look down, and I realize there's a phone... there.             Chandl…
10 Instead of...?                                                        Joey T…

What if we want to combine each speaker’s consecutive lines into a single row?

friends_dialogue |>
  summarise(text = stringr::str_flatten(text, collapse = " "),
            .by = speaker)
# A tibble: 4 × 2
  speaker        text                                                           
  <chr>          <chr>                                                          
1 Monica Geller  There's nothing to tell! He's just some guy I work with! Okay,…
2 Joey Tribbiani C'mon, you're going out with the guy! There's gotta be somethi…
3 Chandler Bing  All right Joey, be nice. So does he have a hump? A hump and a …
4 Phoebe Buffay  Wait, does he eat chalk? Just, 'cause, I don't want her to go …

This smushes together everything each speaker says, no matter when they said it. What if we want to keep the consecutive runs separate?

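Enter consecutive_id()! It generates an identifier that increments every time the value changes, so each consecutive run gets its own id: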

friends_dialogue |>
  mutate(id = consecutive_id(speaker))
# A tibble: 10 × 3
   text                                                            speaker    id
   <chr>                                                           <chr>   <int>
 1 There's nothing to tell! He's just some guy I work with!        Monica…     1
 2 C'mon, you're going out with the guy! There's gotta be somethi… Joey T…     2
 3 All right Joey, be nice. So does he have a hump? A hump and a … Chandl…     3
 4 Wait, does he eat chalk?                                        Phoebe…     4
 5 Just, 'cause, I don't want her to go through what I went throu… Phoebe…     4
 6 Okay, everybody relax. This is not even a date. It's just two … Monica…     5
 7 Sounds like a date to me.                                       Chandl…     6
 8 Alright, so I'm back in high school, I'm standing in the middl… Chandl…     6
 9 Then I look down, and I realize there's a phone... there.       Chandl…     6
10 Instead of...?                                                  Joey T…     7
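
If it helps to see the function on its own, here is a minimal sketch on a plain character vector, with a rough base R equivalent built from rle() for comparison. The speaker vector is made up, and the rle() version is only meant as an approximation for data without missing values.

```r
library(dplyr)

speaker <- c("Monica", "Joey", "Joey", "Monica", "Monica", "Joey")

# The id increments each time the value differs from the previous element
ids <- consecutive_id(speaker)
ids
#> [1] 1 2 2 3 3 4

# Rough base R equivalent: number the runs found by rle() and repeat each
# run number by the length of its run
runs <- rle(speaker)
from_rle <- rep(seq_along(runs$lengths), runs$lengths)

all(ids == from_rle)
```

Note that "Monica" gets ids 1 and 3 here: the id tracks runs, not distinct values.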

With this, we can correctly group the dialogue:

friends_dialogue |>
  mutate(id = consecutive_id(speaker)) |>
  summarise(text = stringr::str_flatten(text, collapse = " "),
            .by = c(id, speaker))
# A tibble: 7 × 3
     id speaker        text                                                     
  <int> <chr>          <chr>                                                    
1     1 Monica Geller  There's nothing to tell! He's just some guy I work with! 
2     2 Joey Tribbiani C'mon, you're going out with the guy! There's gotta be s…
3     3 Chandler Bing  All right Joey, be nice. So does he have a hump? A hump …
4     4 Phoebe Buffay  Wait, does he eat chalk? Just, 'cause, I don't want her …
5     5 Monica Geller  Okay, everybody relax. This is not even a date. It's jus…
6     6 Chandler Bing  Sounds like a date to me. Alright, so I'm back in high s…
7     7 Joey Tribbiani Instead of...?                                           
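
Since friends_dialogue isn’t defined in this post, here is a self-contained sketch of the same pipeline on a tiny made-up tibble. The dialogue data and the name combined are hypothetical, and paste() stands in for stringr::str_flatten() only to avoid the extra dependency.

```r
library(dplyr)

# Hypothetical stand-in for friends_dialogue
dialogue <- tibble(
  text = c("Hi.", "Hello!", "How are you?", "Fine."),
  speaker = c("A", "B", "B", "A")
)

combined <- dialogue |>
  # Number each consecutive run of the same speaker
  mutate(id = consecutive_id(speaker)) |>
  # Collapse the lines within each run into a single string
  summarise(text = paste(text, collapse = " "), .by = c(id, speaker))

# Three rows: A's opening line, B's two lines joined, then A's reply
combined
```

Grouping by both id and speaker keeps the speaker column in the result while still treating each run separately.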
