I'd like to use the new native pipe, |>, with purrr::map_dfr(). (To make it reproducible, I'm passing the datasets as strings instead of paths, but that shouldn't make a difference.)
csvs <- c(
"csv_a" = "a,b,c\n1,2,3\n4,5,6",
"csv_b" = "a,b,c\n-1,-2,-3"
)
col_types <- readr::cols(.default = readr::col_character())
# Approach 1
csvs |>
purrr::map_dfr(
.f = function(p) {
readr::read_csv(
file = I(p),
col_types = col_types
)
}
)
# Approach 2
library(magrittr)
csvs %>%
purrr::map_dfr(
.x = .,
.f = ~readr::read_csv(
file = I(.),
col_types = col_types
)
)
I have two questions, mostly to continue my understanding of the native pipe.
How do I replace the explicit function(p) part with the new {\(x)...}() syntax? The attempt below throws "Error in standardise_path(file): argument "p" is missing, with no default".
csvs |>
purrr::map_dfr(
.f =
{\(p)
readr::read_csv(
file = I(p),
col_types = col_types
)
}()
)
Can I also mimic the magrittr approach (#2)? This somehow reads each row twice, including the header.
csvs |>
{\(p)
purrr::map_dfr(
.x = p,
.f = ~readr::read_csv(
file = I(p),
col_types = col_types
)
)
}()
# Produces
# A tibble: 8 x 3
#   a     b     c
#   <chr> <chr> <chr>
# 1 1     2     3
# 2 4     5     6
# 3 a     b     c
# 4 -1    -2    -3
# 5 1     2     3
# 6 4     5     6
# 7 a     b     c
# 8 -1    -2    -3
Edit: In response to @MrFlick's comment, I've wrapped the argument to file with I() in case that becomes a requirement in future versions of readr (it seems to work fine now without it). If you're passing typical file paths (instead of literal strings), remove the call to I().
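Answer for Question 1: pass the lambda itself as .f, without wrapping it in {...}(). In your attempt, the trailing () calls the function immediately with no arguments, which is why p is reported as missing.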
csvs |>
purrr::map_dfr(
.f = \(k) {
readr::read_csv(
file = k,
col_types = col_types
)
}
)
#   a     b     c
#   <chr> <chr> <chr>
# 1 1     2     3
# 2 4     5     6
# 3 -1    -2    -3
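To see why the trailing () matters, here is a minimal base-R sketch that needs neither purrr nor readr. The add_one function is a hypothetical stand-in for the read_csv wrapper, and R 4.1 or later is assumed for the \(x) shorthand.

```r
# Requires R >= 4.1 for the \(x) lambda shorthand.
# add_one is a hypothetical stand-in for the read_csv wrapper.
add_one <- \(x) x + 1

# Passing the function object: sapply() calls it once per element.
res_ok <- sapply(c(1, 2, 3), add_one)
print(res_ok)

# Writing {\(x) ...}() calls the lambda right away with no arguments,
# so its body fails before any mapping function gets to use it.
res_err <- tryCatch(
  {\(x) x + 1}(),
  error = function(e) conditionMessage(e)
)
print(res_err)
```

The second call fails for the same reason as the attempt in the question: the lambda is invoked on the spot instead of being handed over as a function.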
Answer for Question 2: inside the inner function you use p, which refers to the whole csvs vector on every call. So the inner function ignores the element it's mapping over and instead reads the entire vector each time. You can avoid that by using the .x pronoun:
csvs |>
{\(p)
purrr::map_dfr(
.x = p,
.f = ~readr::read_csv(
file = I(.x),
col_types = col_types
)
)
}()
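The scoping issue can be reproduced in base R as a rough sketch. The vals vector and both lambdas are made up for illustration; the point is only that the inner function sees the outer p unless it uses its own argument.

```r
# Requires R >= 4.1 for the \(x) lambda shorthand.
vals <- c(a = 1, b = 2)

# Inner lambda ignores its own argument `el` and reads the outer `p`,
# so every iteration works on the whole vector.
wrong <- (\(p) sapply(p, \(el) sum(p)))(vals)

# Inner lambda uses its own argument, so each iteration sees one element.
right <- (\(p) sapply(p, \(el) el * 10))(vals)

print(wrong)
print(right)
```

In wrong, both results are the sum of the full vector, which mirrors how each map_dfr() iteration in the question read every CSV string.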
Stylistically, it might be nicer to avoid the formula mapper altogether, since your function adds no custom behavior beyond calling readr::read_csv. Any extra arguments supplied through the ... of purrr::map_dfr are passed on to the function on each call. 1
csvs |>
{\(p) purrr::map_dfr(.x = p, .f = readr::read_csv, col_types = col_types)}()
Since you don't reuse the p argument, the anonymous function is also unnecessary:
csvs |>
purrr::map_dfr(.f = readr::read_csv, col_types = col_types)
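Base R's sapply() forwards extra arguments through ... in the same spirit, which makes for a dependency-free sketch of the idea. Here scale_by is a hypothetical helper standing in for readr::read_csv, and mult plays the role of col_types.

```r
# Requires R >= 4.1 for the \(x) lambda shorthand.
# scale_by is a hypothetical helper standing in for readr::read_csv;
# `mult` plays the role of col_types.
scale_by <- function(x, mult) x * mult

# Extra named arguments after the function are forwarded on every call.
direct <- sapply(c(1, 2, 3), scale_by, mult = 10)

# Equivalent, but with an explicit wrapper that adds nothing.
wrapped <- sapply(c(1, 2, 3), \(x) scale_by(x, mult = 10))

print(direct)
print(identical(direct, wrapped))
```

Both forms give the same result, so the wrapper only earns its place when the function body does something beyond forwarding arguments.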
1 @MrFlick is correct that I() should be used in principle if you're passing literal strings instead of a file name. In your case, however, you do not need it because every string in the csvs vector contains a newline. See here for details. I took it out to illustrate your alternatives.