R: purrr: using pmap for row-wise operations, but this time involving LOTS of columns
R - Using purrr::pmap() for row-wise iteration
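I am trying to understand how pmap works. The tibble below contains a list-column, values. I want to create a new column, New, that depends on whether the corresponding element of values is NULL. Since is.null is not vectorised, I initially thought of using rowwise() before turning to pmap().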
Using rowwise() before mutate() gives the desired result, as shown below:
tbl = as.data.frame(do.call(rbind, pars)) %>%
  rowwise() %>%
  mutate(New = ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", ")))
> tbl
Source: local data frame [2 x 6]
Groups: <by row>
# A tibble: 2 x 6
id lower upper values default New
<list> <list> <list> <list> <list> <chr>
1 <chr [1]> <dbl [1]> <dbl [1]> <NULL> <dbl [1]> a 5
2 <chr [1]> <NULL> <NULL> <list [3]> <chr [1]> b 0, b 1, b 2
However, pmap() does not:
tbl = as.data.frame(do.call(rbind, pars)) %>%
  mutate(New = pmap(., ~ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", "))))
> tbl
id lower upper values default New
1 a 1 10 NULL 5 a NULL, b list("0", "1", "2")
2 b NULL NULL 0, 1, 2 1 a NULL, b list("0", "1", "2")
If I use an anonymous function instead of the tilde, it seems to work:
tbl = as.data.frame(do.call(rbind, pars)) %>%
  mutate(Value = pmap(., function(values, default, id, ...) ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", "))))
> tbl
id lower upper values default Value
1 a 1 10 NULL 5 a 5
2 b NULL NULL 0, 1, 2 1 b 0, b 1, b 2
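But I don't understand why the tilde version fails. I would rather not have to spell out the arguments in full, because I need to map the function over many columns. Where am I going wrong?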
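A minimal sketch that may help illustrate the difference. The likely cause is that a formula lambda exposes its arguments only by position (..1, ..2, ...) or through ..., so names such as values and id are not bound per row; inside mutate() they appear to resolve to the whole columns instead. The rows list below is a hypothetical stand-in for the question's data (pars is not shown), and label_row is an illustrative name.

```r
library(purrr)

# Hypothetical stand-in for the question's data: parallel lists, one element per row.
rows <- list(
  id      = list("a", "b"),
  values  = list(NULL, c(0, 1, 2)),
  default = list(5, 1)
)

# Named function: pmap() matches the list names to the function's arguments.
# `if` is used instead of ifelse() because the test is a single TRUE/FALSE.
label_row <- function(id, values, default, ...) {
  if (is.null(values)) paste(id, default) else paste(id, values, collapse = ", ")
}
by_name <- pmap_chr(rows, label_row)

# Formula lambda: the same elements are reachable only by position.
by_position <- pmap_chr(
  rows,
  ~ if (is.null(..2)) paste(..1, ..3) else paste(..1, ..2, collapse = ", ")
)
```

Both calls should produce the same labels as the rowwise() version; the positional form avoids naming the arguments but depends on the order of the list elements.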
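I was about to ask a question very similar to this one: basically, how to use pmap inside mutate without having to use the variable names multiple times. Instead, I'm posting it here as an "answer", since it contains a reprex and a number of options I've found, none of which satisfies me completely. Hopefully someone else will be able to answer how to do what I want.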
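When working with a data.frame that has list-columns, I often want to use purrr::pmap inside dplyr::mutate. Sometimes this involves a lot of repeated variable names. I'd like to be able to do this more concisely with an anonymous function, so that each variable is used only once in what is passed to pmap's .f argument.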
Take this small dataset as an example:
library('dplyr')
library('purrr')
df <- tribble(
~x, ~y, ~z,
c(1), c(1,10), c(1, 10, 100),
c(2), c(2,20), c(2, 20, 200),
)
Say the function I want to apply to each row is
func <- function(x, y, z){c(sum(x), sum(y), sum(z))}
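In reality the function will be more complex and involve many variables. It only needs to be used once, so I'd rather not have to name it explicitly and clog up the script and working environment.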
Here are the options. Each creates exactly the same data.frame, but in a different way. The reason for including `avg` will become clear. Note I'm not considering position matching using `..1`, `..2` etc., because this is easy to mess up.
# Explicitly create a function for `.f`.
# This requires using the variable names (x, y, z) three times.
# It's completely clear what it's doing, but needs a lot of typing.
# It might sometimes fail - see https://github.com/tidyverse/purrr/issues/280
df_explicit <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x, y, z), .f = function(x, y, z){ c(sum(x), sum(y), sum(z)) })
  )
# Pass the whole of `df` to `.l` and add `...` in an explicit function to deal with any unused columns.
# Variable names are used twice.
# `df` will have to be passed explicitly if not using pipes (eg, `mutate(.data = df, a = pmap(.l = df, ...`).
# This is probably inefficient for large datasets.
df_dots <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = ., .f = function(x, y, z, ...){ c(sum(x), sum(y), sum(z)) })
  )
# Use `pryr::f` (as discussed in https://stackoverflow.com/a/51123520/4269699).
# Variable names are used twice.
# Potentially unexpected behaviour.
# Not obvious to the casual reader why the extra `pryr::f` is needed and what it's doing
df_pryrf <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x, y, z), .f = pryr::f({c(sum(x), sum(y), sum(z))} ))
  )
# Use `rowwise()` similar to this: https://stackoverflow.com/a/47734073/4269699
# Variable names are used once.
# It will mess up any vectorised functions used elsewhere in mutate, hence the two `mutate()`s
df_rowwise <- df %>%
  mutate( avg = x - mean(x) ) %>%
  rowwise() %>%
  mutate( a = list( {c(sum(x), sum(y), sum(z))} ) ) %>%
  ungroup()
# Use Romain Francois' neat {rap} package.
# Variable names used once.
# Like `rowwise()` it will mess up any vectorised functions so it needs two `mutate()`s for this particular problem
#
library('rap') #devtools::install_github("romainfrancois/rap")
df_rap <- df %>%
  mutate( avg = x - mean(x) ) %>%
  rap( a = ~ c(sum(x), sum(y), sum(z)) )
# Another solution discussed here https://stackoverflow.com/a/51123520/4269699 doesn't seem to work inside `mutate()`, but maybe could be tweaked?
# Like the `pryr::f` solution, it's not immediately obvious what the purpose of the `with(list(...` bit is.
df_with <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x, y, z), .f = ~with(list(...), { c(sum(x), sum(y), sum(z))} ))
  )
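A possible tweak to the with(list(...)) attempt above, offered as a sketch rather than a confirmed fix: list(x, y, z) is unnamed, so list(...) inside the lambda carries no names, and x, y and z probably fall through to the whole columns. Naming the list elements should let with() find the per-row values. The name df_with_named is illustrative, and the dataset is repeated so the snippet runs on its own.

```r
library(dplyr)
library(purrr)

# Same small dataset as above, repeated so this sketch is self-contained.
df <- tribble(
  ~x, ~y,        ~z,
  1,  c(1, 10),  c(1, 10, 100),
  2,  c(2, 20),  c(2, 20, 200)
)

# Assumption: naming the elements of `.l` is what makes with(list(...)) see row-level values.
df_with_named <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(
      .l = list(x = x, y = y, z = z),
      .f = ~ with(list(...), c(sum(x), sum(y), sum(z)))
    )
  )
```

The trade-off is that the variable names now appear three times (twice in `.l`, once in the body), so this does not solve the repetition problem.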
As far as I know, these are the options, not including position matching.
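For completeness, here is a sketch of the position-matching variant that was left out. It uses each variable name only once in `.l`, but the body refers to `..1`, `..2`, `..3`, so reordering the list silently changes the result. The name df_position is illustrative, and the dataset is repeated so the snippet runs on its own.

```r
library(dplyr)
library(purrr)

# Same small dataset as above, repeated so this sketch is self-contained.
df <- tribble(
  ~x, ~y,        ~z,
  1,  c(1, 10),  c(1, 10, 100),
  2,  c(2, 20),  c(2, 20, 200)
)

# Position matching: ..1 is x, ..2 is y, ..3 is z, purely by their order in `.l`.
df_position <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x, y, z), .f = ~ c(sum(..1), sum(..2), sum(..3)))
  )
```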
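Ideally, something like the following would be possible, where a qmap function knows to look for the (row-wise) variables x, y and z in the object passed to mutate's .data argument.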
df_new <- df %>%
  mutate(
    avg = x - mean(x),
    a = qmap( ~c(sum(x), sum(y), sum(z)) )
  )
But I don't know how to do that, so please consider this only a partial answer.