Applying `dplyr::rowwise` across all variables

Question

I have a data frame:

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

The following works:

library(tidyverse)

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(c(x.1, x.3)))

But the following attempts, which should apply to all variables, don't work:

With `.`:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(.))

With `select_if`:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(select_if(., is.numeric)))

Both methods return:

Source: local data frame [30 x 5]
Groups: <by row>

# A tibble: 30 x 5
     x.1   x.2   x.3   x.4   var
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  32.7  42.7  50.1  20.8 7091.
 2  75.9  71.3  83.6  77.6 7091.
 3  49.6  28.7  97.0  59.7 7091.
 4  47.4  96.1  31.9  79.7 7091.
 5  54.2  47.1  81.7  41.6 7091.
 6  27.9  58.1  97.4  25.9 7091.
 7  61.8  78.3  52.6  67.7 7091.
 8  85.4  51.3  38.8  82.0 7091.
 9  27.9  72.6  68.9  25.2 7091.
10  87.2  42.1  27.6  73.9 7091.
# ... with 20 more rows

Here, 7091 is an incorrect sum.
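The repeated value is consistent with `.` referring to the whole piped data frame rather than to the current row, so `sum(.)` adds up all 120 `x` values for every row. A minimal check, assuming a fixed seed (`set.seed(1)`, not part of the original question) so the data are reproducible:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed for a reproducible check
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

res <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(var = sum(.))

# `.` is the entire data frame piped into mutate(), not the current row,
# so every row receives the same grand total of all x values
grand_total <- sum(select(df_1, -y))
length(unique(res$var))             # 1
all.equal(res$var[1], grand_total)  # TRUE
```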

How can I adjust these calls?
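For comparison, later dplyr releases (1.0.0 and newer, which postdate the answers below) added `c_across()` for exactly this row-wise case. A minimal sketch, assuming dplyr >= 1.0.0 is installed; the `rowSums()` line is only a cross-check:

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides c_across()

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

res <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  # c_across() gathers the current row's values from the selected columns
  mutate(var = sum(c_across(everything()))) %>%
  ungroup()

# cross-check against the vectorised base R row sums
all.equal(res$var, unname(rowSums(select(df_1, -y))))
```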

Answer 1

This can be done using `purrr::pmap`, which passes a list of arguments to a function that accepts "dots" (`...`). Since most functions like `mean`, `sd`, etc. work with vectors, you need to pair the call with a domain lifter:

df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(mean)) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 68.48282
# 2  49.40462 47.00752 21.99248 78.87789 49.32063

df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(sd)) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 13.88555
# 2  49.40462 47.00752 21.99248 78.87789 23.27958
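A domain lifter such as `lift_vd()` turns a function that expects a single vector into one that accepts `...`. A hand-written equivalent is sketched below; the name `mean_dots` is hypothetical, and writing the wrapper by hand also avoids relying on the `lift_*()` family, which newer purrr releases mark as deprecated:

```r
library(purrr)

# hypothetical helper: collect the ... arguments into one vector, then call mean()
mean_dots <- function(...) mean(c(...))

mean_dots(1, 2, 3, 6)
#> [1] 3

# pmap_dbl() calls mean_dots(a = 1, b = 3), then mean_dots(a = 2, b = 4)
pmap_dbl(list(a = c(1, 2), b = c(3, 4)), mean_dots)
#> [1] 2 3
```

In the pipeline above this would read `mutate(var = pmap_dbl(., mean_dots))`; `pmap_dbl()` returns a plain numeric column instead of the list column that `pmap()` creates.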

The function `sum` accepts dots directly, so you don't need to lift its domain:

df_1 %>% select(-y) %>% mutate( var = pmap(., sum) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 273.9313
# 2  49.40462 47.00752 21.99248 78.87789 197.2825
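The difference is visible with plain numbers: `sum()` combines all of its `...` arguments, while `mean()` treats only its first argument as data and matches the rest to its other parameters (`trim`, `na.rm`). That is why `mean` needs lifting before `pmap` can feed it a row:

```r
sum(70, 60, 50)      # all three values are summed
#> [1] 180
mean(70, 60, 50)     # 60 and 50 are matched to trim and na.rm, not averaged
#> [1] 70
mean(c(70, 60, 50))  # wrapping the values in a vector gives the intended mean
#> [1] 60
```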

Everything conforms to standard dplyr data processing, so all three can be combined as separate arguments to `mutate`:

df_1 %>% select(-y) %>% 
  mutate( v1 = pmap(., lift_vd(mean)),
          v2 = pmap(., lift_vd(sd)),
          v3 = pmap(., sum) )
#         x.1      x.2      x.3      x.4       v1       v2       v3
# 1  70.12072 62.99024 54.00672 86.81358 68.48282 13.88555 273.9313
# 2  49.40462 47.00752 21.99248 78.87789 49.32063 23.27958 197.2825

Answer 2

A few approaches I've taken in the past:

use a pre-existing row-wise function (e.g. `rowSums`)
use `reduce` (which doesn't apply to all functions)
complicated transposing
a custom function with `pmap`

Using pre-existing row-wise functions

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

library(tidyverse)

# rowSums
df_1 %>%
  mutate(var = rowSums(select(., -y))) %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
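Base R has other vectorised row-wise helpers in the same family, such as `rowMeans()`. A minimal sketch using it the same way inside `mutate()`, with `apply()` only as a slower cross-check (the column name `avg` is illustrative):

```r
library(dplyr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

res <- df_1 %>%
  # rowMeans() processes the whole selected block at once; no rowwise() needed
  mutate(avg = rowMeans(select(., -y)))

# cross-check: apply() calls mean() once per row
all.equal(res$avg, unname(apply(select(df_1, -y), 1, mean)))
```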

Using `reduce`

df_1 %>%
  mutate(var = reduce(select(., -y),`+`))  %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
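`reduce` fits when the row-wise statistic can be built by repeatedly combining two columns with a vectorised binary function such as `+` or `pmax`; statistics like the variance do not reduce to such a simple pairwise step on the raw columns. A short sketch of two cases that do fit:

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)
xs <- select(df_1, -y)

# row-wise maximum: pmax() folded across the columns
row_max <- reduce(xs, pmax)

# row-wise mean: fold with `+`, then divide by the number of columns
row_mean <- reduce(xs, `+`) / ncol(xs)

all.equal(row_max, do.call(pmax, unname(as.list(xs))))
all.equal(row_mean, unname(rowMeans(xs)))
```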

Ugly transposing and matrix/data.frame conversion

df_1 %>%
  mutate(var = select(., -y) %>% as.matrix %>% t %>% as.data.frame %>% map_dbl(var)) %>%
  head()
#>        x.1      x.2      x.3      x.4 y       var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.95228
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.37221
#> 3 65.82827 59.48330 56.72526 71.38306 2  43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.50087
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.72241
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.16785
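Because the selected columns are all numeric, base R's `apply()` over rows (`MARGIN = 1`) should give the same variances without the transpose-and-convert chain; Answer 4 below notes that `apply()` becomes unreliable once character columns are involved. A minimal sketch comparing the two:

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)
xs <- select(df_1, -y)

# apply() converts xs to a numeric matrix and calls var() on each row
via_apply <- apply(xs, 1, var)

# the transposing approach from above, for comparison
via_transpose <- xs %>% as.matrix() %>% t() %>% as.data.frame() %>% map_dbl(var)

all.equal(unname(via_apply), unname(via_transpose))
```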

Custom function with `pmap`

my_var <- function(...){
  vec <-  c(...)
  var(vec)
}

df_1 %>%
  mutate(var = select(., -y) %>% pmap(my_var)) %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.9523
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.3722
#> 3 65.82827 59.48330 56.72526 71.38306 2 43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.5009
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.7224
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.1679

Created on 2019-04-30 by the reprex package (v0.2.1)
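The custom-function approach generalises to any function that expects a single vector. `row_stat` below is a hypothetical helper, not part of purrr; it uses `pmap_dbl()`, which returns a plain numeric column rather than the list column that `pmap()` produces:

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# hypothetical helper: apply a vector function `f` to each row of `data`
row_stat <- function(data, f) {
  pmap_dbl(data, function(...) f(c(...)))
}

res <- df_1 %>%
  mutate(
    row_var = row_stat(select(., -y), var),
    row_sd  = row_stat(select(., -y), sd)
  )

# sd() is the square root of var(), so the two columns should agree
all.equal(res$row_sd, sqrt(res$row_var))
```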

Answer 3

I think this is tricky because the scoped variants of `mutate` (`mutate_at`, `mutate_all`, `mutate_if`) are generally aimed at executing a function on each selected column separately, instead of creating an operation that uses all columns.

The simplest solution I can come up with amounts to creating a vector of column names (`cols`) that is then used to execute the summary operation:

library(dplyr)
library(purrr)

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

# create vector of columns to operate on
cols <- names(df_1)
cols <- cols[map_lgl(df_1, is.numeric)]
cols <- cols[! cols %in% c("y")]

cols
#> [1] "x.1" "x.2" "x.3" "x.4"

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(
    var = sum(!!!map(cols, as.name), na.rm = TRUE)
  )
#> Source: local data frame [30 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 30 x 5
#>      x.1   x.2   x.3   x.4   var
#>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  46.1  28.9  28.9  50.7  155.
#>  2  26.8  68.0  67.1  26.5  188.
#>  3  35.2  63.8  62.5  28.5  190.
#>  4  31.3  44.9  67.3  68.2  212.
#>  5  52.6  23.9  83.2  43.4  203.
#>  6  55.7  92.8  86.3  57.2  292.
#>  7  56.9  50.0  77.6  25.6  210.
#>  8  95.0  82.6  86.1  22.7  286.
#>  9  62.7  26.5  61.0  88.9  239.
#> 10  65.2  23.1  25.5  51.0  165.
#> # … with 20 more rows

Created on 2019-04-30 by the reprex package (v0.2.1)

NOTE: if you are unfamiliar with `purrr`, you can also use something like `lapply`.
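With base R's `lapply` in place of `purrr::map`, the splice works the same way. Printing the call that `!!!` builds (via `rlang::expr()`, assuming the rlang package that dplyr depends on is available) shows what `sum()` actually receives:

```r
library(dplyr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# numeric columns other than y, found without purrr
cols <- setdiff(names(df_1)[vapply(df_1, is.numeric, logical(1))], "y")

# the call that !!! splices together from the column symbols
rlang::expr(sum(!!!lapply(cols, as.name), na.rm = TRUE))
#> sum(x.1, x.2, x.3, x.4, na.rm = TRUE)

res <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(var = sum(!!!lapply(cols, as.name), na.rm = TRUE)) %>%
  ungroup()

all.equal(res$var, unname(rowSums(select(df_1, -y))))
```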

You can read more about these trickier dplyr operations (`!!`, `!!!`, etc.) here:

https://dplyr.tidyverse.org/articles/programming.html

Answer 4

This is a tricky problem since dplyr operates column-wise for many operations. I originally used `apply` from base R to apply a function over rows, but `apply` is problematic when the data mix character and numeric types, because it converts the data frame to a matrix first.
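The pitfall is that a matrix holds a single type, so one character column turns every value in the row into a string before the function sees it. A minimal sketch with a small made-up data frame (`df_mixed` is illustrative only):

```r
# illustrative data: two numeric columns and one character column
df_mixed <- data.frame(a = c(1, 2), b = c(3, 4), label = c("p", "q"))

# every row arrives as a character vector, so sum(row) would fail here
apply(df_mixed, 1, is.character)

# selecting the numeric columns first keeps the rows numeric
apply(df_mixed[c("a", "b")], 1, sum)
```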

Instead, we can use the (aging) `plyr` package and its `adply` function to do this simply, since `plyr` lets us treat a one-row data frame as a vector:

# plyr::adply avoids attaching plyr on top of dplyr (requires plyr to be installed)
df_1 %>% select(-y) %>% plyr::adply(1, function(df) c(v1 = sd(df[1, ])))

Note that some functions, like `var`, won't work on a one-row data frame (`var` treats each column as a separate variable), so we need to convert the row to a vector using `as.numeric`.
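Following that note, a sketch of the `var` case with each one-row data frame converted by `as.numeric()` first; it assumes the plyr package is installed and calls it as `plyr::adply` so plyr is not attached over dplyr:

```r
library(dplyr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

res <- df_1 %>%
  select(-y) %>%
  # each piece is a one-row data frame; as.numeric() turns it into a plain vector
  plyr::adply(1, function(df) c(v1 = var(as.numeric(df[1, ]))))

# cross-check against apply() over the rows of the numeric columns
all.equal(res$v1, unname(apply(select(df_1, -y), 1, var)))
```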

Answer overview (4 answers):

Answer 1: score 3, accepted, 2019-04-30 19:34:51
Answer 2: score 2, 2019-04-30 17:51:08
Answer 3: score 1, 2019-04-30 17:21:17
Answer 4: score 1, 2019-06-21 13:13:52
