R: purrr: using pmap for row-wise operations, but this time involving LOTS of columns

Question

I understand how to use pmap() to perform row-wise operations on a data frame:

library(tidyverse)

df1 = tribble(~col_1, ~col_2, ~col_3,
               1,      5,      12,
               9,      3,      3,
               6,     10,     7)

foo = function(col_1, col_2, col_3) {
  mean(c(col_1, col_2, col_3))
}

df1 %>% pmap_dbl(foo)

This applies the function foo to each row:

[1] 6.000000 5.000000 7.666667

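However, this becomes very unwieldy when I have many columns, because I have to pass all of them in explicitly. What if, say, my data frame df2 had 8 columns and I wanted to apply a function bar that potentially involves every one of those columns?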

set.seed(12345)
df2 = rnorm(n=24) %>% matrix(nrow=3) %>% as_tibble() %>%
  setNames(c("col_1", "col_2", "col_3", "col_4", "col_5", "col_6", "col_7", "col_8"))

bar = function(col_1, col_2, col_3, col_4, col_5, col_6, col_7, col_8) {
  # imagine we do some complicated row-wise operation here
  mean(c(col_1, col_2, col_3, col_4, col_5, col_6, col_7, col_8))
}

df2 %>% pmap_dbl(bar)

which gives:

[1]  0.45085420  0.02639697 -0.28121651

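This is clearly inadequate: I have to add a new argument to bar for every single column. That is a lot of typing, and it makes the code less readable and more fragile. It seems there should be a way to have the function take a single argument x and then access the variables I want with x$col_1 and so on, or at any rate something more elegant than the above. Is there any way to clean up this code using purrr?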

Answer 1

You can use ... and collect the arguments into a list once they are inside the function.

dot_tester <- function(...) {
  dots <- list(...)
  dots$Sepal.Length + dots$Petal.Width
}

purrr::pmap(head(iris), dot_tester)

[[1]]
[1] 5.3

[[2]]
[1] 5.1

[[3]]
[1] 4.9

[[4]]
[1] 4.8

[[5]]
[1] 5.2

[[6]]
[1] 5.8

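However, this does not make the code any less "fragile", since the names used inside the function must still match the column names exactly. The benefit is that you do not have to list them in the <- function() call.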
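The same list(...) pattern carries over to pmap_dbl() on a wider data frame. A minimal sketch, assuming a small made-up tibble df and a hypothetical helper row_fn (neither comes from the question):

```r
library(purrr)
library(tibble)

# Hypothetical example data: four columns, three rows
df <- tibble(
  col_1 = c(1, 9, 6),
  col_2 = c(5, 3, 10),
  col_3 = c(12, 3, 7),
  col_4 = c(2, 4, 6)
)

# Collect every column of the current row into a named list,
# then use only the columns that are actually needed
row_fn <- function(...) {
  row <- list(...)
  row$col_1 + row$col_4
}

result <- pmap_dbl(df, row_fn)
print(result)
```

Columns that the function does not mention (col_2 and col_3 here) are simply ignored, so adding more columns to the data frame does not require touching the function signature.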

Answer 2

The simplest (and probably not the safest) approach I can think of is to use the ... argument to accept any number of columns:

library(tidyverse)

set.seed(12345)
df2  <-  rnorm(n=24) %>% matrix(nrow=3) %>% as_tibble() %>%
  setNames(c("col_1", "col_2", "col_3", "col_4", "col_5", "col_6", "col_7", "col_8"))
#> Warning: `as_tibble.matrix()` requires a matrix with column names or a `.name_repair` argument. Using compatibility `.name_repair`.
#> This warning is displayed once per session.

bar <- function(...){
  mean(c(...))
}
df2 %>% pmap_dbl(bar)
#> [1]  0.45085420  0.02639697 -0.28121651

Created on 2019-08-05 by the reprex package (v0.3.0)
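One reason this may be unsafe is that c(...) coerces everything to a common type, so a single character column would turn the whole row into strings. A possible guard, sketched here with a made-up tibble df that has a character id column, is to keep only the numeric columns before calling pmap_dbl(). The names df, row_mean and numeric_cols are illustrative assumptions:

```r
library(purrr)
library(tibble)

# Hypothetical example data with one non-numeric column
df <- tibble(
  id    = c("a", "b", "c"),
  col_1 = c(1, 9, 6),
  col_2 = c(5, 3, 10),
  col_3 = c(12, 3, 7)
)

# Same idea as bar above: average whatever columns are passed in
row_mean <- function(...) {
  mean(c(...))
}

# Drop non-numeric columns first so c(...) only ever sees numbers
numeric_cols <- keep(df, is.numeric)
result <- pmap_dbl(numeric_cols, row_mean)
print(result)
```

This keeps the ... signature while making the set of columns fed to the function explicit.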

Answer 3

The ... answer works, but I also found another approach using purrr::transpose, which lets me use a single named variable x instead of ... and access any column by name:

foo = function(x) {
  (x$col_1 + x$col_2 + x$col_3)/3
}

df1 %>% transpose() %>% map_dbl(foo)

This gives the correct answer:

[1] 6.000000 5.000000 7.666667
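To see why x$col_1 works inside the function, it can help to look at what transpose() returns. A minimal sketch on a made-up two-row tibble (df and rows are illustrative names):

```r
library(purrr)
library(tibble)

# Hypothetical example data: two columns, two rows
df <- tibble(col_1 = c(1, 9), col_2 = c(5, 3))

# transpose() turns a list of columns into a list of rows;
# each row is a named list with one element per column
rows <- transpose(df)
str(rows)

# Any column of a given row can be accessed by name
print(rows[[1]]$col_1)
print(rows[[2]]$col_2)
```

Each element handed to map_dbl() is therefore one row as a named list, which is what the single argument x receives.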

As for the other data frame:

set.seed(12345)
df2 = rnorm(n=24) %>% matrix(nrow=3) %>% as_tibble() %>%
  setNames(c("col_1", "col_2", "col_3", "col_4", "col_5", "col_6", "col_7", "col_8"))

bar = function(x) {
  mean(as.double(x))
}

df2 %>% transpose() %>% map_dbl(bar)

which gives:

[1]  0.45085420  0.02639697 -0.28121651

But I can also do this by referring to individual columns:

bar_2 = function(x) {
  x$col_2 + x$col_5 / x$col_3
}

df2 %>% transpose() %>% map_dbl(bar_2)

[1]  0.1347090 -1.2776983  0.8232767

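I realize these particular examples could easily be done with mutate, but for the cases where true row-wise iteration is needed, I think this works well enough.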
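For comparison, the vectorized mutate version of a calculation like bar_2 might look like the sketch below. It uses a small made-up tibble df rather than df2, and res is a hypothetical name for the new column:

```r
library(dplyr)
library(tibble)

# Hypothetical example data with the three columns the formula needs
df <- tibble(
  col_2 = c(5, 3, 10),
  col_3 = c(12, 3, 7),
  col_5 = c(6, 9, 14)
)

# Arithmetic on columns is already vectorized, so no row-wise iteration is needed
result <- df %>% mutate(res = col_2 + col_5 / col_3)
print(result)
```

This only works because + and / operate element-wise on whole columns; a function that is not vectorized would still need pmap or the transpose approach.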
