group_by() and summarise() by row
I have data with several ids (rows) per time point and some -Inf values. I would like to use the R packages dplyr and tidyverse to calculate the average number of -Inf values per id per time, i.e. the proportion of columns that are -Inf in each id/time row.
This is my data:
dt <- data.frame(id = rep(1:3, each = 4),
                 time = rep(1:4, times = 3),
                 x = c(1, 2, 1, -Inf, 2, -Inf, 1, 1, 5, 1, 2, 1),
                 y = c(2, -Inf, -Inf, -Inf, -Inf, 5, -Inf, 2, 1, 2, 2, 2))
In the real data I have more than 100 columns, but to simplify I only included x and y.
The expected result:
  id time   n
2  1    2 0.5
3  1    3 0.5
4  1    4 1.0
5  2    1 0.5
6  2    2 0.5
7  2    3 0.5
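Reading the expected result against the data, n appears to be the share of -Inf values among the measurement columns of each id/time row, with the rows where n is 0 left out. As a worked check, writing K for the number of measurement columns (here K = 2, x and y) and v for the values, the row id = 1, time = 2 has x = 2 and y = -Inf:

```latex
n_{i,t} = \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}\left[ v_{i,t,k} = -\infty \right],
\qquad
n_{1,2} = \frac{\mathbf{1}[2 = -\infty] + \mathbf{1}[-\infty = -\infty]}{2} = \frac{0 + 1}{2} = 0.5
```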
The idea is to use some specific columns to generate a vector according to a specific calculation function. After searching I found the rowwise() function, but it did not help. Here are my attempts:
dt %>%
  group_by(id,time) %>%
  summarise(n = across(x:y, ~mean(is.infinite(x) & x < 0, na.rm=TRUE)))

dt %>%
  group_by(id,time) %>%
  rowwise() %>%
  summarise(n = across(everything(), ~mean(is.infinite(x) & x < 0, na.rm=TRUE)))

dt %>%
  rowwise() %>%
  summarise(n = across(everything(), ~mean(is.infinite(x) & x < 0, na.rm=TRUE)))

# same results:
`summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
# A tibble: 12 x 3
# Groups:   id [3]
      id  time   n$x    $y
   <int> <int> <dbl> <dbl>
 1     1     1     0     0
 2     1     2     0     0
 3     1     3     0     0
 4     1     4     1     1
 5     2     1     0     0
 6     2     2     1     1
 7     2     3     0     0
 8     2     4     0     0
 9     3     1     0     0
10     3     2     0     0
11     3     3     0     0
12     3     4     0     0
Could you help me generate this vector n?
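A likely reason all three attempts return identical 0/1 values for both columns: inside the ~ lambda, x refers to the data column x itself, not to the column currently being processed by across() (that is .x). Even with .x, across() computes one summary per column, so it yields separate x and y results rather than a single combined n. A minimal sketch of the corrected per-column call, assuming dplyr 1.0 or later:

```r
# Sketch assuming dplyr >= 1.0 (across() and summarise(.groups = ))
library(dplyr)

dt <- data.frame(id = rep(1:3, each = 4),
                 time = rep(1:4, times = 3),
                 x = c(1, 2, 1, -Inf, 2, -Inf, 1, 1, 5, 1, 2, 1),
                 y = c(2, -Inf, -Inf, -Inf, -Inf, 5, -Inf, 2, 1, 2, 2, 2))

# .x (not x) is the column currently being processed by across()
per_column <- dt %>%
  group_by(id, time) %>%
  summarise(across(x:y, ~ mean(is.infinite(.x) & .x < 0)), .groups = "drop")

# Still one result per column (x and y), not one combined n per id/time
per_column
```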
I think I understand better what you're aiming to do here. across() isn't needed, as it is more for modifying columns in place, one column at a time. Since each id/time combination is a single row in your data, either rowwise() or group_by() would work:
library(dplyr)
dt <- data.frame(id = rep(1:3, each = 4),
time = rep(1:4, times = 3),
x = c(1, 2, 1, -Inf, 2, -Inf,1, 1, 5, 1, 2, 1),
y = c(2, -Inf, -Inf, -Inf, -Inf, 5, -Inf, 2, 1, 2, 2, 2))
dt %>%
group_by(id, time) %>%
summarise(n = mean(c(is.infinite(x), is.infinite(y)))) %>%
filter(n != 0)
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.
#> # A tibble: 6 × 3
#> # Groups:   id [2]
#>      id  time     n
#>   <int> <int> <dbl>
#> 1     1     2   0.5
#> 2     1     3   0.5
#> 3     1     4   1
#> 4     2     1   0.5
#> 5     2     2   0.5
#> 6     2     3   0.5
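The answer notes that rowwise() would work as well but only shows group_by(). A sketch of the rowwise() variant, assuming dplyr 1.0 or later for c_across(); unlike is.infinite(), the comparison == -Inf counts only negative infinity:

```r
# Sketch assuming dplyr >= 1.0 (c_across())
library(dplyr)

dt <- data.frame(id = rep(1:3, each = 4),
                 time = rep(1:4, times = 3),
                 x = c(1, 2, 1, -Inf, 2, -Inf, 1, 1, 5, 1, 2, 1),
                 y = c(2, -Inf, -Inf, -Inf, -Inf, 5, -Inf, 2, 1, 2, 2, 2))

res <- dt %>%
  rowwise() %>%
  # c_across(x:y) collects the current row's x and y values into one vector
  mutate(n = mean(c_across(x:y) == -Inf)) %>%
  ungroup() %>%
  filter(n != 0) %>%
  select(id, time, n)

res
```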
Here's a possible way of doing the calculation across any number of columns after grouping, using a small helper function that checks for values that are both infinite and negative:
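library(dplyr)
dt <- data.frame(id = rep(1:3, each = 4),
                 time = rep(1:4, times = 3),
                 x = c(1, 2, 1, -Inf, 2, -Inf, 1, 1, 5, 1, 2, 1),
                 y = c(2, -Inf, -Inf, -Inf, -Inf, 5, -Inf, 2, 1, 2, 2, 2),
                 # z is random, so your z and n values will differ from the output below
                 z = sample(c(1, 2, -Inf), 12, replace = TRUE))
# TRUE only for values that are both infinite and negative, i.e. -Inf
is_minus_inf <- function(x) is.infinite(x) & x < 0
dt %>%
  group_by(id, time) %>%
  # c_across() skips the grouping columns, so only x, y and z are checked
  mutate(n = mean(is_minus_inf(c_across(everything()))))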
#> # A tibble: 12 × 6
#> # Groups:   id, time [12]
#>       id  time     x     y     z     n
#>    <int> <int> <dbl> <dbl> <dbl> <dbl>
#>  1     1     1     1     2     2 0
#>  2     1     2     2  -Inf  -Inf 0.667
#>  3     1     3     1  -Inf     2 0.333
#>  4     1     4  -Inf  -Inf     1 0.667
#>  5     2     1     2  -Inf     1 0.333
#>  6     2     2  -Inf     5     2 0.333
#>  7     2     3     1  -Inf  -Inf 0.667
#>  8     2     4     1     2     2 0
#>  9     3     1     5     1     1 0
#> 10     3     2     1     2     1 0
#> 11     3     3     2     2     2 0
#> 12     3     4     1     2  -Inf 0.333
(Or, even simpler, use mutate(n = mean(c_across(everything()) == -Inf, na.rm = TRUE)), in which case no helper function is needed.)
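With 100+ columns and many rows, computing row by row (rowwise() or a per-row group_by() with c_across()) tends to be slow. A vectorized alternative, not taken from the answers here and sketched assuming dplyr 1.0 or later, builds a TRUE/FALSE data frame with across() and averages each row with base R's rowMeans(); no grouping is needed because each id/time pair is one row:

```r
# Sketch assuming dplyr >= 1.0 (across() inside mutate())
library(dplyr)

dt <- data.frame(id = rep(1:3, each = 4),
                 time = rep(1:4, times = 3),
                 x = c(1, 2, 1, -Inf, 2, -Inf, 1, 1, 5, 1, 2, 1),
                 y = c(2, -Inf, -Inf, -Inf, -Inf, 5, -Inf, 2, 1, 2, 2, 2))

res <- dt %>%
  # across() returns one TRUE/FALSE column (is the value -Inf?) for every
  # column except id and time; rowMeans() then averages each row
  mutate(n = rowMeans(across(-c(id, time), ~ .x == -Inf), na.rm = TRUE)) %>%
  filter(n != 0) %>%
  select(id, time, n)

res
```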
How about this solution? It appears to give the desired output, and it scales to any number of columns.
First I "melt" the columns x and y into long format, then simply summarise over id and time:
library(dplyr)
dt %>%
  # reshape to long format: one row per id/time/column
  reshape2::melt(id.vars = c("id", "time")) %>%
  group_by(id, time) %>%
  summarise(count_neg_inf = mean(value == -Inf, na.rm = TRUE))
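reshape2 is retired, and its maintainers recommend tidyr instead. A sketch of the same long-format idea with tidyr::pivot_longer(), assuming tidyr 1.0 or later and dplyr 1.0 or later, with a filter() added so that only the non-zero rows of the expected result remain:

```r
# Sketch assuming tidyr >= 1.0 (pivot_longer()) and dplyr >= 1.0
library(dplyr)
library(tidyr)

dt <- data.frame(id = rep(1:3, each = 4),
                 time = rep(1:4, times = 3),
                 x = c(1, 2, 1, -Inf, 2, -Inf, 1, 1, 5, 1, 2, 1),
                 y = c(2, -Inf, -Inf, -Inf, -Inf, 5, -Inf, 2, 1, 2, 2, 2))

res <- dt %>%
  # one row per id/time/column; the column name goes to "variable"
  pivot_longer(-c(id, time), names_to = "variable", values_to = "value") %>%
  group_by(id, time) %>%
  summarise(n = mean(value == -Inf, na.rm = TRUE), .groups = "drop") %>%
  # keep only rows with at least one -Inf, as in the expected result
  filter(n != 0)

res
```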
Regards,
Samuel