pak::pak("cran/tidyr@1.3.0")
New separate_* functions
tidyr 1.3.0
tidyr
Install tidyr 1.3.0 with:
separate_*()
family of functions
A new family of separate_*()
functions supersedes separate()
, separate_rows()
, and extract()
.
Make columns | Make rows | |
---|---|---|
Separate with delimiter | separate(sep = string) |
separate_rows() |
Separate by position | separate(sep = integer_vector |
N/A |
Separate with regular expression | extract() |
Make columns | Make rows | |
---|---|---|
Separate with delimiter | separate_wider_delim() |
separate_longer_delim() |
Separate by position | separate_wider_position() |
separate_longer_position() |
Separate with regular expression | separate_wider_regex() |
Let’s grab some data from the tidyhydat package. The Date
column is made up of a date in YYYY-MM-DD format and time in HH:MM:SS format:
library(tidyhydat)
library(tidyr)
library(dplyr)
station <-
realtime_dd(station_number = c("01CD005", "08MF005"))
station
Queried on: 2023-03-22 03:31:40 (UTC)
Date range: 2023-02-20 to 2023-03-22
# A tibble: 34,144 × 8
STATION_NUMBER PROV_TE…¹ Date Param…² Value Grade Symbol Code
<chr> <chr> <dttm> <chr> <dbl> <chr> <chr> <chr>
1 01CD005 PE 2023-02-20 04:00:00 Flow 0.31 <NA> <NA> 1
2 01CD005 PE 2023-02-20 04:05:00 Flow 0.31 <NA> <NA> 1
3 01CD005 PE 2023-02-20 04:10:00 Flow 0.313 <NA> <NA> 1
4 01CD005 PE 2023-02-20 04:15:00 Flow 0.31 <NA> <NA> 1
5 01CD005 PE 2023-02-20 04:20:00 Flow 0.31 <NA> <NA> 1
6 01CD005 PE 2023-02-20 04:25:00 Flow 0.31 <NA> <NA> 1
7 01CD005 PE 2023-02-20 04:30:00 Flow 0.31 <NA> <NA> 1
8 01CD005 PE 2023-02-20 04:35:00 Flow 0.31 <NA> <NA> 1
9 01CD005 PE 2023-02-20 04:40:00 Flow 0.31 <NA> <NA> 1
10 01CD005 PE 2023-02-20 04:45:00 Flow 0.31 <NA> <NA> 1
# … with 34,134 more rows, and abbreviated variable names ¹PROV_TERR_STATE_LOC,
# ²Parameter
Use separate_wider_position()
to move these into their own columns:
station |>
select(Date) |>
separate_wider_position(Date,
widths = c(ymd = 10, space = 1, hms = 8))
# A tibble: 34,144 × 3
ymd space hms
<chr> <chr> <chr>
1 2023-02-20 " " 04:00:00
2 2023-02-20 " " 04:05:00
3 2023-02-20 " " 04:10:00
4 2023-02-20 " " 04:15:00
5 2023-02-20 " " 04:20:00
6 2023-02-20 " " 04:25:00
7 2023-02-20 " " 04:30:00
8 2023-02-20 " " 04:35:00
9 2023-02-20 " " 04:40:00
10 2023-02-20 " " 04:45:00
# … with 34,134 more rows
What if we don’t want the space
column?
station |>
select(Date) |>
separate_wider_position(Date,
widths = c(ymd = 10, hms = 8))
Error in `separate_wider_position()`:
! Expected 18 characters in each element of `Date`.
! 34144 values were too long.
ℹ Use `too_many = "debug"` to diagnose the problem.
ℹ Use `too_many = "drop"` to silence this message.
station |>
select(Date) |>
separate_wider_position(Date,
widths = c(ymd = 10, hms = 8),
too_many = "debug")
Warning: Debug mode activated: adding variables `Date_ok`, `Date_width`, and
`Date_remainder`.
# A tibble: 34,144 × 6
ymd hms Date Date_width Date_remainder Date_ok
<chr> <chr> <dttm> <int> <chr> <lgl>
1 2023-02-20 " 04:00:0" 2023-02-20 04:00:00 19 0 FALSE
2 2023-02-20 " 04:05:0" 2023-02-20 04:05:00 19 0 FALSE
3 2023-02-20 " 04:10:0" 2023-02-20 04:10:00 19 0 FALSE
4 2023-02-20 " 04:15:0" 2023-02-20 04:15:00 19 0 FALSE
5 2023-02-20 " 04:20:0" 2023-02-20 04:20:00 19 0 FALSE
6 2023-02-20 " 04:25:0" 2023-02-20 04:25:00 19 0 FALSE
7 2023-02-20 " 04:30:0" 2023-02-20 04:30:00 19 0 FALSE
8 2023-02-20 " 04:35:0" 2023-02-20 04:35:00 19 0 FALSE
9 2023-02-20 " 04:40:0" 2023-02-20 04:40:00 19 0 FALSE
10 2023-02-20 " 04:45:0" 2023-02-20 04:45:00 19 0 FALSE
# … with 34,134 more rows
Use NA
if there are components that you don’t want to appear in the output:
station_split <-
station |>
select(Date) |>
separate_wider_position(Date,
widths = c(ymd = 10, 1, hms = 8))
station_split
# A tibble: 34,144 × 2
ymd hms
<chr> <chr>
1 2023-02-20 04:00:00
2 2023-02-20 04:05:00
3 2023-02-20 04:10:00
4 2023-02-20 04:15:00
5 2023-02-20 04:20:00
6 2023-02-20 04:25:00
7 2023-02-20 04:30:00
8 2023-02-20 04:35:00
9 2023-02-20 04:40:00
10 2023-02-20 04:45:00
# … with 34,134 more rows
Use separate_wider_delim()
to break things further down:
station_split |>
separate_wider_delim(ymd,
delim = "-",
names = c("year", "month", "day")) |>
separate_wider_delim(hms,
delim = ":",
names = c("hour", "minute", "second"))
# A tibble: 34,144 × 6
year month day hour minute second
<chr> <chr> <chr> <chr> <chr> <chr>
1 2023 02 20 04 00 00
2 2023 02 20 04 05 00
3 2023 02 20 04 10 00
4 2023 02 20 04 15 00
5 2023 02 20 04 20 00
6 2023 02 20 04 25 00
7 2023 02 20 04 30 00
8 2023 02 20 04 35 00
9 2023 02 20 04 40 00
10 2023 02 20 04 45 00
# … with 34,134 more rows
stations <-
tidyhydat::hy_monthly_levels()