R - Using purrr::pmap() for row-wise iteration

I am trying to understand how pmap() works. The tibble below contains a list-column, values. I would like to create a new column, New, that depends on whether or not the corresponding elements in the values column are NULL. Since is.null() is not vectorised, I initially thought of using rowwise() before coming across pmap().

Using rowwise() prior to mutate() gives me the desired result, as shown below:

tbl = as.data.frame(do.call(rbind, pars)) %>%
  rowwise() %>%
  mutate(New = ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", ")))

> tbl
Source: local data frame [2 x 6]
Groups: <by row>

# A tibble: 2 x 6
  id        lower     upper     values     default   New        
  <list>    <list>    <list>    <list>     <list>    <chr>        
1 <chr [1]> <dbl [1]> <dbl [1]> <NULL>     <dbl [1]> a 5          
2 <chr [1]> <NULL>    <NULL>    <list [3]> <chr [1]> b 0, b 1, b 2

However, pmap() does not:

tbl = as.data.frame(do.call(rbind, pars)) %>%
      mutate(New = pmap(., ~ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", "))))

> tbl
  id lower upper  values default                         New
1  a     1    10    NULL       5 a NULL, b list("0", "1", "2")
2  b  NULL  NULL 0, 1, 2       1 a NULL, b list("0", "1", "2")

It seems to work if I use an anonymous function in place of the tilde:

tbl = as.data.frame(do.call(rbind, pars)) %>%
  mutate(Value = pmap(., function(values, default, id, ...) ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", "))))

> tbl
  id lower upper  values default         Value
1  a     1    10    NULL       5           a 5
2  b  NULL  NULL 0, 1, 2       1 b 0, b 1, b 2

But I don't understand why the tilde version fails. I would prefer not to have to specify the arguments in full, as I need to map the function over multiple columns. Where am I going wrong?
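One likely explanation, offered as a sketch rather than a definitive diagnosis: a purrr formula lambda only exposes its inputs as ..1, ..2, and so on (or through ...), so a bare name such as values is not an argument of the generated function. Inside mutate() it is then looked up in the data and resolves to the whole column, which would explain why every row receives the same value. The toy tibble below is a hypothetical stand-in for the data in the question, not the original pars object.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the question's data: `values` is a list-column
# in which one element is NULL.
toy <- tibble(
  id      = c("a", "b"),
  default = c(5, 1),
  values  = list(NULL, c(0, 1, 2))
)

# Formula lambda: `id` is not an argument of the generated function, so it
# resolves to the whole `id` column and every row gets the same string.
toy_formula <- toy %>%
  mutate(New = pmap_chr(list(id, default, values), ~ paste(id, collapse = "+")))

# Function with named arguments: each call receives one row's elements.
toy_named <- toy %>%
  mutate(New = pmap_chr(
    list(id = id, default = default, values = values),
    function(id, default, values) {
      if (is.null(values)) {
        paste(id, default)
      } else {
        paste(id, values, collapse = ", ")
      }
    }
  ))

toy_formula$New
toy_named$New
```

In the named version, if/else replaces ifelse() because the test is a single TRUE or FALSE per row.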

I was about to ask a very similar question to this: how to use pmap() within mutate() without having to use the variable names more than once. Instead, I'll post it as an 'answer' here, as it includes a reprex and a number of options that I've found, none of which is completely satisfactory to me. Hopefully somebody else can answer how to do it as required.

I often want to use purrr::pmap inside dplyr::mutate when working with a data.frame with list-columns. Occasionally this involves a lot of repetition of variable names. I'd like to be able to do this more succinctly, using an anonymous function, so that the variables are only used once, when passed to pmap's .f argument.

Take this small dataset as an example:

library('dplyr')
library('purrr')

df <- tribble(
  ~x,   ~y,      ~z,         
  c(1), c(1,10), c(1, 10, 100),
  c(2), c(2,20), c(2, 20, 200),
)

Say the function I want to apply to each row is:

func <- function(x, y, z){c(sum(x), sum(y), sum(z))}

In practice the function will be more complex, with lots of variables. The function is only needed once, so I'd prefer not to have to name it explicitly and clog up my script and my working environment.

Here are the options. Each creates exactly the same data.frame, but in a different way. The reason for including avg will become clear. Note that I'm not considering position matching using ..1, ..2, etc., as this is easy to mess up.

# Explicitly create a function for `.f`.
# This requires using the variable names (x, y, z) three times.
# It's completely clear what it's doing, but needs a lot of typing.
# It might sometimes fail - see https://github.com/tidyverse/purrr/issues/280

df_explicit <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x, y, z), .f = function(x, y, z){ c(sum(x), sum(y), sum(z)) })
  )

# Pass the whole of `df` to `.l` and add `...` in an explicit function to deal with any unused columns.
# Variable names are used twice.
# `df` will have to be passed explicitly if not using pipes (e.g., `mutate(.data = df, a = pmap(.l = df, ...`).
# This is probably inefficient for large datasets.

df_dots <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = ., .f = function(x, y, z, ...){ c(sum(x), sum(y), sum(z)) })
  )

# Use `pryr::f` (as discussed in https://stackoverflow.com/a/51123520/4269699).
# Variable names are used twice.
# Potentially unexpected behaviour.
# Not obvious to the casual reader why the extra `pryr::f` is needed and what it's doing

df_pryrf <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x,y,z), .f = pryr::f({c(sum(x), sum(y), sum(z))} ))
  )

# Use `rowwise()` similar to this: https://stackoverflow.com/a/47734073/4269699
# Variable names are used once.
# It will mess up any vectorised functions used elsewhere in mutate, hence the two `mutate()`s

df_rowwise <- df %>%
  mutate( avg = x - mean(x) ) %>%
  rowwise() %>%
  mutate( a = list( {c(sum(x), sum(y), sum(z))} ) ) %>%
  ungroup()
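To illustrate why the rowwise() option above needs two mutate() calls, here is a minimal sketch with a made-up one-column tibble (df_num is a hypothetical name used only for this illustration). After rowwise(), mean(x) is computed within each row, so x - mean(x) collapses to zero.

```r
library(dplyr)

# Hypothetical numeric column, used only for this illustration.
df_num <- tibble(x = c(1, 2))

# Ungrouped: mean(x) is the mean of the whole column (1.5).
avg_columnwise <- df_num %>%
  mutate(avg = x - mean(x))

# Row-wise: mean(x) is the mean of a single value, so avg is always 0.
avg_rowwise <- df_num %>%
  rowwise() %>%
  mutate(avg = x - mean(x)) %>%
  ungroup()

avg_columnwise$avg
avg_rowwise$avg
```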

# Use Romain Francois' neat {rap} package.
# Variable names used once.
# Like `rowwise()` it will mess up any vectorised functions so it needs two `mutate()`s for this particular problem
#

library('rap') #devtools::install_github("romainfrancois/rap")
df_rap <- df %>%
  mutate( avg = x - mean(x) ) %>%
  rap( a = ~ c(sum(x), sum(y), sum(z)) )

# Another solution discussed here https://stackoverflow.com/a/51123520/4269699 doesn't seem to work inside `mutate()`, but maybe could be tweaked?
# Like the `pryr::f` solution, it's not immediately obvious what the purpose of the `with(list(...` bit is.

df_with <- df %>%
  mutate(
    avg = x-mean(x),
    a = pmap(.l = list(x,y,z), .f = ~with(list(...), { c(sum(x), sum(y), sum(z))} ))
  )
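One possible tweak, assuming the `with(list(...` version fails because list(x,y,z) is unnamed: naming the elements of `.l` means list(...) carries the names x, y and z, so with() can find the per-row values. This is a sketch rather than a tested recommendation, and it still uses the variable names more than once.

```r
library(dplyr)
library(purrr)

df <- tribble(
  ~x, ~y,       ~z,
  1,  c(1, 10), c(1, 10, 100),
  2,  c(2, 20), c(2, 20, 200)
)

# Naming the list elements makes `list(...)` a named list, so `with()`
# evaluates the expression against one row's x, y and z.
df_with_named <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(
      .l = list(x = x, y = y, z = z),
      .f = ~ with(list(...), c(sum(x), sum(y), sum(z)))
    )
  )

df_with_named$a
```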

As far as I know these are the options, excluding position matching.
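As a quick sanity check on the claim that the options produce the same result, the sketch below rebuilds three of them (explicit function, whole-df with `...`, and rowwise()) and compares the new column. It leaves out the pryr and rap versions to avoid extra dependencies, and the via_* names are made up for this check.

```r
library(dplyr)
library(purrr)

df <- tribble(
  ~x, ~y,       ~z,
  1,  c(1, 10), c(1, 10, 100),
  2,  c(2, 20), c(2, 20, 200)
)

# Explicit function over a list of the three columns.
via_explicit <- df %>%
  mutate(a = pmap(list(x, y, z), function(x, y, z) c(sum(x), sum(y), sum(z))))

# Whole data frame passed to pmap(), with `...` absorbing unused columns.
via_dots <- df %>%
  mutate(a = pmap(., function(x, y, z, ...) c(sum(x), sum(y), sum(z))))

# rowwise(), wrapping the per-row result in list() to keep a list-column.
via_rowwise <- df %>%
  rowwise() %>%
  mutate(a = list(c(sum(x), sum(y), sum(z)))) %>%
  ungroup()

all.equal(via_explicit$a, via_dots$a)
all.equal(via_explicit$a, via_rowwise$a)
```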

Ideally, something like the following would be possible, where the function qmap knows to find the (row-wise) variables x, y, and z in the object passed to mutate's .data argument.

df_new <- df %>%
  mutate(
    avg = x-mean(x),
    a = qmap( ~c(sum(x), sum(y), sum(z)) )
  )

But I don't know how to do this, so consider this only a partial answer.
