dplyr: Why do some operations work "rowwise" without calling rowwise() and others don't?
I am still trying to figure out how rowwise() works exactly in R/dplyr.
For example, I have this code:
library(dplyr)
df = data.frame(
group = c("a", "a", "a", "b", "b", "c"),
var1 = 1:6,
var2 = 7:12
)
df %>%
mutate(
concatNotRW = paste0(var1, "-", group), # works on rows
meanNotRW = mean(c(var1, var2)), # does not work on rows
charsNotRW = strsplit(concatNotRW, "-") # works on rows
) %>%
rowwise() %>%
mutate(
concatRW = paste0(var1, "-", group), # all work on rows
meanRW = mean(c(var1, var2)),
charsRW = strsplit(concatRW, "-")
) -> res
The res data frame looks like this:
group var1 var2 concatNotRW meanNotRW charsNotRW concatRW meanRW charsRW
<chr> <int> <int> <chr> <dbl> <list> <chr> <dbl> <list>
1 a 1 7 1-a 6.5 <chr [2]> 1-a 4 <chr [2]>
2 a 2 8 2-a 6.5 <chr [2]> 2-a 5 <chr [2]>
3 a 3 9 3-a 6.5 <chr [2]> 3-a 6 <chr [2]>
4 b 4 10 4-b 6.5 <chr [2]> 4-b 7 <chr [2]>
5 b 5 11 5-b 6.5 <chr [2]> 5-b 8 <chr [2]>
6 c 6 12 6-c 6.5 <chr [2]> 6-c 9 <chr [2]>
What I do not understand is why paste0 can take each cell of a row and paste them together (essentially performing a rowwise operation), yet mean can't do that. What am I missing, and are there any rules about what already works rowwise without calling rowwise()? I did not find much information in the rowwise() vignette here: https://dplyr.tidyverse.org/articles/rowwise.html
paste can take any number of vectors as input through its variadic argument (...) and returns a character vector of the same length as those inputs, combining them element by element. mean, on the other hand, summarises a single vector and uses ... only to pass on further arguments such as trim, so it returns a single value. Here we need rowMeans. Regarding strsplit, it is vectorized over its input and returns a list with one element (the split pieces) per input string, which is why it already works on rows without rowwise().
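A minimal base-R sketch of the difference described above, rebuilding the df from the question so the snippet runs on its own (the names overall, per_row, concat and parts are illustrative only): mean() collapses everything it receives into one number, rowMeans() gives one value per row, and strsplit() gives one list element per input string.

```r
# Same data as in the question
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "c"),
  var1 = 1:6,
  var2 = 7:12
)

# mean() sees one long vector (all 12 numbers) and returns a single value;
# inside mutate() that single value is recycled to every row
overall <- mean(c(df$var1, df$var2))   # 6.5

# rowMeans() returns one mean per row
per_row <- rowMeans(df[, c("var1", "var2")])   # 4 5 6 7 8 9

# strsplit() is vectorized over its input: one list element per string
concat <- paste0(df$var1, "-", df$group)
parts <- strsplit(concat, "-")
length(parts)   # 6, one element per row
parts[[1]]      # "1" "a"
```

This is why meanNotRW shows 6.5 in every row while meanRW (and rowMeans) gives the per-row values.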
library(dplyr)
df %>%
mutate(
concatNotRW = paste0(var1, "-", group),
meanNotRW = rowMeans(across(c(var1, var2))),
charsNotRW = strsplit(concatNotRW, "-")
)
> mean(c(1:5, 6:10))
[1] 5.5
Note that what we pass to mean here is a single vector, built by concatenating the two vectors 1:5 and 6:10 with c(), whereas
> paste(1:5, 6:10)
[1] "1 6" "2 7" "3 8" "4 9" "5 10"
here two separate vectors are passed into paste, which pairs up their elements position by position.
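The contrast can be made concrete with the same two vectors. This sketch also shows two ways (not from the original answer) to get an element-wise mean without rowwise(): plain vectorized arithmetic, and mapply(), which calls mean() once per pair of elements.

```r
a <- 1:5
b <- 6:10

# paste() pairs up elements position by position: 5 results
paired <- paste(a, b)        # "1 6" "2 7" "3 8" "4 9" "5 10"

# mean() of the concatenated vector: 1 result
pooled <- mean(c(a, b))      # 5.5

# Element-wise mean built from vectorized arithmetic: 5 results
elementwise <- (a + b) / 2   # 3.5 4.5 5.5 6.5 7.5

# Same result, calling mean() once per pair (a loop in disguise)
per_pair <- mapply(function(x, y) mean(c(x, y)), a, b)
```

The arithmetic version is the idiomatic one for simple cases; mapply() is the general fallback when the per-element function is not vectorized.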
To split the concatenated column into two separate columns instead of a list column, we can use separate() from tidyr:
library(tidyr)
df %>%
mutate(
concatNotRW = paste0(var1, "-", group),
meanNotRW = rowMeans(across(c(var1, var2)))) %>%
separate(concatNotRW, into = c("ind", "chars"))
group var1 var2 ind chars meanNotRW
1 a 1 7 1 a 4
2 a 2 8 2 a 5
3 a 3 9 3 a 6
4 b 4 10 4 b 7
5 b 5 11 5 b 8
6 c 6 12 6 c 9
Whether an operation works row by row without rowwise() depends on the function. If the function is vectorized, it works on the whole column element by element and doesn't need rowwise(). Here, both paste and mean are vectorized, except that paste is vectorized over its variadic inputs, while mean only takes a single vector and returns a single value as output. Now suppose we have a function that checks each value with if/else: it is not vectorized, because if/else expects a single logical value.
In that case, we can either use rowwise() or wrap the function with Vectorize().
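A small sketch of that case, using a hypothetical scalar function label_size() (not part of the original answer). Called on a whole column, its if/else fails. Vectorize() fixes this by calling the function once per element, and ifelse() is the vectorized rewrite. The dplyr lines are shown only as comments.

```r
# Hypothetical scalar function: if/else needs a single TRUE/FALSE
label_size <- function(x) {
  if (x > 3) "big" else "small"
}

x <- 1:6
# label_size(x) on a whole vector is an error in R >= 4.2.0
# ("the condition has length > 1"); older R only warned and used x[1]

# Option 1: Vectorize() wraps the function so it runs once per element
label_size_v <- Vectorize(label_size)
labels_v <- label_size_v(x)   # "small" "small" "small" "big" "big" "big"

# Option 2: rewrite it with the vectorized ifelse()
labels_ifelse <- ifelse(x > 3, "big", "small")

# In dplyr, either of these would then work per row:
#   df %>% mutate(size = label_size_v(var1))
#   df %>% rowwise() %>% mutate(size = label_size(var1))
```

Vectorize() is essentially a convenience wrapper around mapply(), so it is no faster than rowwise(). A genuinely vectorized rewrite such as ifelse() is usually preferable when one exists.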