
dplyr: Why do some operations work "rowwise" without calling rowwise() and others don't?

I am still trying to figure out how rowwise works exactly in R/dplyr.

For example, I have this code:

library(dplyr)
df = data.frame(
  group = c("a", "a", "a", "b", "b", "c"),
  var1 = 1:6,
  var2 = 7:12
)

res <- df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group), # works on rows
    meanNotRW = mean(c(var1, var2)), # does not work on rows
    charsNotRW = strsplit(concatNotRW, "-") # works on rows
  ) %>%
  rowwise() %>%
  mutate(
    concatRW = paste0(var1, "-", group), # all work on rows
    meanRW = mean(c(var1, var2)),
    charsRW = strsplit(concatRW, "-")
  )

The res data frame looks like this:

  group  var1  var2 concatNotRW meanNotRW charsNotRW concatRW meanRW charsRW  
  <chr> <int> <int> <chr>           <dbl> <list>     <chr>     <dbl> <list>   
1 a         1     7 1-a               6.5 <chr [2]>  1-a           4 <chr [2]>
2 a         2     8 2-a               6.5 <chr [2]>  2-a           5 <chr [2]>
3 a         3     9 3-a               6.5 <chr [2]>  3-a           6 <chr [2]>
4 b         4    10 4-b               6.5 <chr [2]>  4-b           7 <chr [2]>
5 b         5    11 5-b               6.5 <chr [2]>  5-b           8 <chr [2]>
6 c         6    12 6-c               6.5 <chr [2]>  6-c           9 <chr [2]>

What I do not understand is why paste0 can take each cell of a row and paste them together (essentially performing a rowwise operation), yet mean can't do that. What am I missing, and are there any rules on what already works rowwise without the call to rowwise()? I did not find much info in the rowwise() vignette here: https://dplyr.tidyverse.org/articles/rowwise.html

paste can take several vectors through its variadic argument ( ... ) and returns a vector of the same length as its inputs, whereas mean takes a single vector, uses the variadic argument only for other options ( trim etc.), and returns a single value. Here we need rowMeans. Regarding strsplit, it returns a list with one element of split pieces per input string, so it also lines up with the rows.

library(dplyr)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),
    meanNotRW = rowMeans(across(c(var1, var2))),
    charsNotRW = strsplit(concatNotRW, "-") 
  )

> mean(c(1:5, 6:10))
[1] 5.5

Note that we are passing a single vector here, made by concatenating the two vectors 1:5 and 6:10 with c

whereas

> paste(1:5, 6:10)
[1] "1 6"  "2 7"  "3 8"  "4 9"  "5 10"

passes two separate vectors into paste, which combines them element by element.
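To connect this back to mutate(): a result with one value per row fills the column directly, while a length-1 result is recycled to every row, which is why meanNotRW shows 6.5 everywhere. The following is a minimal base R sketch of that difference, using plain vectors that mirror the question's columns in place of the data frame.

```r
# Plain vectors standing in for the columns of df from the question
var1 <- 1:6
var2 <- 7:12
group <- c("a", "a", "a", "b", "b", "c")

# paste0() works element by element: 6 inputs give 6 outputs
pasted <- paste0(var1, "-", group)
length(pasted)

# c() first glues both columns into one vector of 12 values,
# then mean() collapses that vector to a single number
overall <- mean(c(var1, var2))
length(overall)

# strsplit() returns a list with one element per input string
pieces <- strsplit(pasted, "-")
length(pieces)

# Vectorized arithmetic gives the per-row mean without rowwise()
row_means <- (var1 + var2) / 2
```

For just two columns, the arithmetic in the last line is an alternative to rowMeans; rowMeans scales better when there are many columns.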


To split the column into two columns, we can use separate

library(tidyr)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),
    meanNotRW = rowMeans(across(c(var1, var2)))
  ) %>%
  separate(concatNotRW, into = c("ind", "chars"))
 group var1 var2 ind chars meanNotRW
1     a    1    7   1     a         4
2     a    2    8   2     a         5
3     a    3    9   3     a         6
4     b    4   10   4     b         7
5     b    5   11   5     b         8
6     c    6   12   6     c         9

Whether an operation works row by row without rowwise depends on the function. If the function is vectorized, it works on the whole column and doesn't need rowwise. Here, both paste and mean accept vectors, but paste works element-wise across its variadic inputs and returns one result per element, while mean takes a single vector and reduces it to a single value. Suppose we have a function that checks each value with if/else; it is not vectorized, because if/else expects a single logical value. In that case, we can either use rowwise or Vectorize the function.
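As a rough illustration of that last case, here is a sketch with a made-up helper, size_label (a hypothetical name, not from the question), that uses if/else and so only handles one value at a time. Calling it on a whole column would warn or fail depending on the R version. The sketch shows both workarounds mentioned above, plus the vectorized ifelse as a third option.

```r
library(dplyr)

# Hypothetical helper: if/else needs a single TRUE/FALSE, so this is not vectorized
size_label <- function(x) {
  if (x > 3) "big" else "small"
}

df <- data.frame(var1 = 1:6)

# Option 1: rowwise() makes mutate() call the function once per row
res_rw <- df %>%
  rowwise() %>%
  mutate(label = size_label(var1)) %>%
  ungroup()

# Option 2: Vectorize() wraps the function so it loops over the elements itself
size_label_v <- Vectorize(size_label)
res_vec <- df %>%
  mutate(label = size_label_v(var1))

# Option 3: rewrite the check with the vectorized ifelse()
res_ifelse <- df %>%
  mutate(label = ifelse(var1 > 3, "big", "small"))
```

All three give the same label column. When a vectorized rewrite like ifelse is available, it is usually the simplest choice, since rowwise and Vectorize both still call the function once per element.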
