dplyr::mutate when a custom function returns a vector

Question

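I am trying to use dplyr::mutate with a custom function that returns a vector, on group_by data, to create new columns. The function takes a long time to bootstrap.

I know this can be done in base R, but is there a more elegant way in dplyr?

Example (discarded):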

iris %>% 
  group_by(Species) %>% 
  mutate(t1 = f(iris$Sepal.Length)[1], t2 = f(iris$Sepal.Length)[2])

f <- function(x) {
  return(c(2*x, x+1))
}

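Is it possible to create both columns while calling the function only once per group?

I made a mistake in the previous example. Please check this one instead:

Example: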

f <- function(x) {
  return(c(x*2, x+1))
}

iris %>% 
  group_by(Species) %>% 
  
  group_modify(~ {
    .x %>% 
      mutate(t1 := f(mean(.x$Sepal.Length))[1], t2 := f(mean(.x$Sepal.Length))[2])
  })

Approach 1:

Thanks to Darren Tsai for the answer! Using unnest_wider in the new example solved the problem:

library(dplyr)
library(tidyr)

iris %>% 
  group_by(Species) %>% 
  group_modify(~ {
    .x %>% 
      mutate(t = list(f(mean(.x$Sepal.Length)))) %>% 
      unnest_wider(t, names_sep = "")
  })

# A tibble: 150 × 7
# Groups:   Species [3]
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width    t1    t2
   <fct>          <dbl>       <dbl>        <dbl>       <dbl> <dbl> <dbl>
 1 setosa           5.1         3.5          1.4         0.2  10.0  6.01
 2 setosa           4.9         3            1.4         0.2  10.0  6.01
 3 setosa           4.7         3.2          1.3         0.2  10.0  6.01
 4 setosa           4.6         3.1          1.5         0.2  10.0  6.01
 5 setosa           5           3.6          1.4         0.2  10.0  6.01
 6 setosa           5.4         3.9          1.7         0.4  10.0  6.01
 7 setosa           4.6         3.4          1.4         0.3  10.0  6.01
 8 setosa           5           3.4          1.5         0.2  10.0  6.01
 9 setosa           4.4         2.9          1.4         0.2  10.0  6.01
10 setosa           4.9         3.1          1.5         0.1  10.0  6.01
# … with 140 more rows
# ℹ Use `print(n = ...)` to see more rows

Approach 2:

Thanks to Konrad Rudolph for the suggestion! A more flexible approach to this problem:

to_tibble <- function (x, colnames) {
  x %>%
    matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
    as_tibble()
}
iris %>%
  group_by(Species) %>%
  mutate(to_tibble(f(mean(Sepal.Length)), c("t1", "t2")))

Answer 1

The problem with your code is that it passes a vector to f, so the result is probably not what you expect:

f(1 : 5)
# [1]  2  4  6  8 10  2  3  4  5  6

Your calling code will have to disentangle that.
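To make the concatenation concrete, here is a small base R sketch. The names `res` and `m` are illustrative and not part of the original post. Because f returns c(2*x, x+1), the first half of the result holds the doubled values and the second half the incremented ones, so filling a two-column matrix column by column separates them again.

```r
f <- function(x) {
  c(2 * x, x + 1)
}

res <- f(1:5)  # length 10: five doubled values followed by five incremented values

# matrix() fills column-wise, so column "t1" receives 2*x and column "t2" receives x+1
m <- matrix(res, ncol = 2, dimnames = list(NULL, c("t1", "t2")))

m[, "t1"]  # the 2*x half
m[, "t2"]  # the x+1 half
```

This only works because both halves have the same length; a function returning pieces of different lengths would need a different way of splitting.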

You can do this, for example, with the following helper:

to_tibble <- function (x, colnames) {
    x %>%
        matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
        as_tibble()
}

With it, you can now call f inside mutate and supply the target column names:

iris %>%
    group_by(Species) %>%
    mutate(to_tibble(f(Sepal.Length), c("t1", "t2")))

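The advantage of this approach is that it simplifies the calling code and leverages mutate's built-in support for generating multiple columns, with no manual unnesting required.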
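As a rough check that this pattern really calls f only once per group, the sketch below adds a call counter. The counter `calls` and the result name `out` are hypothetical additions for illustration; f and to_tibble are the same as in the post. It relies on mutate splicing an unnamed data frame result into separate columns.

```r
library(dplyr)

calls <- 0  # hypothetical counter, only used to observe how often f runs

f <- function(x) {
  calls <<- calls + 1  # record each invocation
  c(2 * x, x + 1)
}

to_tibble <- function(x, colnames) {
  x %>%
    matrix(ncol = length(colnames), dimnames = list(NULL, colnames)) %>%
    as_tibble()
}

out <- iris %>%
  group_by(Species) %>%
  # The unnamed tibble returned by to_tibble() becomes the columns t1 and t2
  mutate(to_tibble(f(Sepal.Length), c("t1", "t2")))

calls       # number of times f was invoked
names(out)  # original columns plus t1 and t2
```

With the three species in iris, the counter should end at one call per group, which is the behaviour the question asks for.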

Regarding your updated code/requirements, you can also use the helper function to simplify it:

iris %>%
    group_by(Species) %>%
    mutate(to_tibble(f(mean(Sepal.Length)), c("t1", "t2")))

Answer 2

You can store the mutated values in a list column and use unnest_wider from tidyr to unnest them into multiple columns.

library(dplyr)
library(tidyr)

iris %>% 
  group_by(Species) %>% 
  mutate(t = list(f(mean(Sepal.Length)))) %>%
  unnest_wider(t, names_sep = "")

# A tibble: 150 × 7
# Groups:   Species [3]
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species       t1    t2
           <dbl>       <dbl>        <dbl>       <dbl> <fct>      <dbl> <dbl>
  1          5.1         3.5          1.4         0.2 setosa      10.0  6.01
  2          4.9         3            1.4         0.2 setosa      10.0  6.01
  3          4.7         3.2          1.3         0.2 setosa      10.0  6.01
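To see in isolation what unnest_wider does with a list column, here is a minimal sketch on a toy tibble. The data and the names `df` and `wide` are made up for illustration. Each cell of t holds an unnamed length-2 vector, and names_sep = "" pastes the column name and the element position together, which is how the t1 and t2 columns in the output above arise.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  id = 1:2,
  t  = list(c(10, 6), c(12, 7))  # each cell holds an unnamed length-2 vector
)

# Spread each vector across columns; names are built as "t" + position
wide <- df %>%
  unnest_wider(t, names_sep = "")

names(wide)
```

Without names_sep, tidyr has no names to use for unnamed vector elements, so supplying it is what makes the column names predictable here.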

Answer 3

I don't have enough reputation to post this data.table solution as a comment, but with data.table you can do the following:

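library(data.table)

# Work on a data.table copy; setDT(iris) can fail because the built-in
# iris dataset lives in a locked package environment.
iris_dt <- as.data.table(iris)

# Return a list with one element per new column
ff <- function(x) {
  list(2 * x, x + 1)
}

iris_dt[, c("t1", "t2") := ff(Sepal.Length), by = "Species"]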

I would appreciate it if someone with more reputation could comment on this.
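The same multi-column assignment can be tried on a toy table. This is a minimal sketch with made-up data; the names `dt`, `g`, and `x` are hypothetical. The function returns a list, and := assigns each list element to the matching name on the left-hand side, evaluated separately within each group.

```r
library(data.table)

dt <- data.table(g = c("a", "a", "b"), x = c(1, 2, 3))  # hypothetical toy data

# One list element per target column
ff <- function(x) {
  list(2 * x, x + 1)
}

# Add t1 and t2 by reference, computing ff() on each group's x
dt[, c("t1", "t2") := ff(x), by = g]

dt
```

Because := modifies dt in place, no reassignment is needed after the call.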

3 solutions

Solution 1
4 Accepted 2022-08-18 07:40:06

Solution 2
3 2022-08-18 07:17:43

Solution 3
2 2022-08-18 07:14:58
