
Native pipe with purrr::map_dfr()

I'd like to use the new native pipe, |>, with purrr::map_dfr(). (To make the example reproducible, I'm passing the datasets as strings instead of paths, but that shouldn't make a difference.)

csvs <- c(
  "csv_a" = "a,b,c\n1,2,3\n4,5,6",
  "csv_b" = "a,b,c\n-1,-2,-3"
)
col_types <- readr::cols(.default = readr::col_character())

# Approach 1
csvs |> 
  purrr::map_dfr(
    .f = function(p) {
      readr::read_csv(
        file = I(p),
        col_types = col_types
      )
    }
  )

# Approach 2
library(magrittr)
csvs %>%
  purrr::map_dfr(
    .x = .,
    .f = ~readr::read_csv(
      file      = I(.),
      col_types = col_types
    )
  )

I have two questions, mostly to further my understanding of the native pipe.

Question 1

How do I replace the explicit function(p) part with the new {\(x)...}() syntax? The attempt below throws "Error in standardise_path(file): argument "p" is missing, with no default".

csvs |> 
  purrr::map_dfr(
    .f = 
      {\(p)
        readr::read_csv(
          file      = I(p),
          col_types = col_types
        )
      }()
  )

Question 2

Can I also mimic the magrittr approach (#2)? This somehow reads each row twice, including the header.

csvs |> 
  {\(p)
    purrr::map_dfr(
      .x = p,
      .f = ~readr::read_csv(
        file      = I(p),
        col_types = col_types
      )
    )
  }()

# Produces
# A tibble: 8 x 3
  a     b     c    
  <chr> <chr> <chr>
1 1     2     3    
2 4     5     6    
3 a     b     c    
4 -1    -2    -3   
5 1     2     3    
6 4     5     6    
7 a     b     c    
8 -1    -2    -3   

Edit: In response to @MrFlick's comment, I've wrapped the argument to file in I() in case that becomes a requirement in future versions of readr (it seems to work fine now without it). If you're passing typical file paths (instead of literal strings), remove the call to I().

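Answer for Question 1: pass the lambda itself rather than calling it. The trailing () in your attempt invokes the function immediately with no arguments, which is why p is reported as missing.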

csvs |> 
  purrr::map_dfr(
    .f = \(k) {
      readr::read_csv(
        file      = k,
        col_types = col_types
      )
    }
  )

#       a     b     c
#   <chr> <chr> <chr>
# 1     1     2     3
# 2     4     5     6
# 3    -1    -2    -3
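
To see why the trailing () matters, here is a minimal base-R sketch that needs neither purrr nor readr (it assumes R 4.1 or later for the \(p) shorthand). The names shout, passed, and called are hypothetical and only serve the illustration.

```r
# Hypothetical helper for illustration: upper-cases its argument.
shout <- \(p) toupper(p)

# Passing the function object: the mapper supplies `p` for each element.
passed <- vapply(c("a", "b"), shout, character(1), USE.NAMES = FALSE)
print(passed)

# Calling the lambda immediately with no arguments: `p` is never supplied,
# so the body fails as soon as it touches `p`. The error is captured as text.
called <- tryCatch(
  (\(p) toupper(p))(),
  error = function(e) conditionMessage(e)
)
print(called)
```

The same thing happens in the question's attempt: the value handed to .f is the result of calling the lambda with nothing, not the lambda itself.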

Answer for Question 2: the inner function uses p, which refers to the whole csvs vector on every call. So the inner function ignores the element it's mapping over and reads the entire vector each time; since csvs has two elements, the combined data is read twice. You can avoid that by using the .x pronoun:

csvs |> 
  {\(p)
    purrr::map_dfr(
      .x = p,
      .f = ~readr::read_csv(
        file      = I(.x),
        col_types = col_types
      )
    )
  }()
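
The scoping issue can be reproduced without readr. The following is a minimal base-R sketch (R 4.1 or later), in which the inner argument x plays the role of purrr's .x; the names inputs, whole, and each are hypothetical.

```r
inputs <- c(first = "a", second = "b")

# The inner function refers to the outer argument `p`,
# so every call sees the whole vector.
whole <- inputs |>
  (\(p) lapply(p, \(x) paste(p, collapse = "+")))()

# The inner function refers to its own argument `x`,
# so every call sees only the current element.
each <- inputs |>
  (\(p) lapply(p, \(x) paste(x, collapse = "+")))()

str(whole)
str(each)
```

In whole, both elements hold the full vector pasted together, which mirrors the duplicated rows in the question; in each, every element holds only its own value.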

Stylistically, it might be nicer to avoid the formula mapper altogether, since you don't have any custom behavior in your function. The ... in purrr::map_dfr will be passed on to the function on each call. [1]

csvs |> 
  {\(p) purrr::map_dfr(.x = p, .f = readr::read_csv, col_types = col_types)}()

Since you don't reuse the p argument, the anonymous function is also unnecessary:

csvs |> 
  purrr::map_dfr(.f = readr::read_csv, col_types = col_types)

[1] @MrFlick is correct that I() should be used in principle if you're passing literal strings instead of file names. In your case, however, you do not need it, because every string in the csvs vector contains a newline. See here for details. I took it out of the examples above to illustrate your alternatives.
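
For completeness, a minimal sketch of the explicit form, assuming readr 2.0 or later is installed (the version that accepts I() for literal data). The names csv_text and one_row are hypothetical.

```r
# Literal CSV text; I() marks it as data rather than a file path,
# so readr does not have to infer that from the embedded newline.
csv_text <- "a,b,c\n1,2,3"

one_row <- readr::read_csv(
  file      = I(csv_text),
  col_types = readr::cols(.default = readr::col_character())
)
print(one_row)
```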
