简体   繁体   English

如何使用 R 中的 data.table 对多行、多列进行平均?

[英]How to average across several rows, for many columns, using data.table in R?

I have a dataset where pairs of rows can have the same value on variable X1.我有一个数据集,其中的行对在变量 X1 上可以具有相同的值。 I would like to average these paired rows' values in columns 2:40, into a new single row for each.我想将 2:40 列中这些成对行的值平均到每个新的单行中。 Is there an easy way to do this?是否有捷径可寻?

If it were just one column I was averaging I think I could do this:如果它只是我平均的一列,我想我可以这样做:

d[, X2 := X2, by = X1]    

But this becomes very tedious for multiple columns.但这对于多列来说变得非常乏味。 Is there a way to do this in data.table without having to type out X := X for each column?有没有办法在 data.table 中做到这一点,而不必为每列输入X := X

Edit:编辑:

Here is a reproducible example.这是一个可重现的示例。 I would essentially like to end up with ten rows, one for each value of "cat".我基本上想以十行结束,“cat”的每个值一行。 These rows would contain averages of x1, x2 and x3, for that level of "cat".对于该级别的“猫”,这些行将包含 x1、x2 和 x3 的平均值。

cat <- rep(1:10, times = 2)
x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- rnorm(20)

dat <- cbind(cat, x1, x2, x3)

dat <- as.data.frame(dat)

I'm not sure if this solution will suit, as you haven't provided a minimal reproducible example , but perhaps something like this?我不确定这个解决方案是否适合,因为你没有提供一个最小的可重现示例,但也许是这样的?

library(data.table)

df <- data.frame(X1 = rep(1:50, each = 2),
                 X2 = rep(x = 1:2, times = 50),
                 X3 = rep(x = 1:2, times = 50),
                 X4 = rep(x = 1:2, times = 50),
                 X5 = rep(x = 1:2, times = 50),
                 X6 = rep(x = 1:2, times = 50),
                 X7 = rep(x = 1:2, times = 50),
                 X8 = rep(x = 1:2, times = 50),
                 X9 = rep(x = 1:2, times = 50),
                 X10 = rep(x = 1:2, times = 50)
                 )
setDT(df)
head(df)
#>    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#> 1:  1  1  1  1  1  1  1  1  1   1
#> 2:  1  2  2  2  2  2  2  2  2   2
#> 3:  2  1  1  1  1  1  1  1  1   1
#> 4:  2  2  2  2  2  2  2  2  2   2
#> 5:  3  1  1  1  1  1  1  1  1   1
#> 6:  3  2  2  2  2  2  2  2  2   2

df2 <- df[ ,lapply(.SD, mean), by = X1, .SDcols = X2:X10]
head(df2)
#>    X1  X2  X3  X4  X5  X6  X7  X8  X9 X10
#> 1:  1 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 2:  2 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 3:  3 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 4:  4 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 5:  5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 6:  6 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5

Created on 2021-07-16 by the reprex package (v2.0.0)reprex 包( v2.0.0 ) 于 2021 年 7 月 16 日创建

-- ——

Or maybe this?或者这个?

library(data.table)

df <- data.frame(X1 = 1:100,
                 X2 = rep(x = 1:2, times = 50),
                 X3 = rep(x = 1:2, times = 50),
                 X4 = rep(x = 1:2, times = 50),
                 X5 = rep(x = 1:2, times = 50),
                 X6 = rep(x = 1:2, times = 50),
                 X7 = rep(x = 1:2, times = 50),
                 X8 = rep(x = 1:2, times = 50),
                 X9 = rep(x = 1:2, times = 50),
                 X10 = rep(x = 1:2, times = 50)
                 )
setDT(df)
head(df)
#>    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#> 1:  1  1  1  1  1  1  1  1  1   1
#> 2:  2  2  2  2  2  2  2  2  2   2
#> 3:  3  1  1  1  1  1  1  1  1   1
#> 4:  4  2  2  2  2  2  2  2  2   2
#> 5:  5  1  1  1  1  1  1  1  1   1
#> 6:  6  2  2  2  2  2  2  2  2   2

df2 <- df[, lapply(.SD, mean, na.rm=TRUE), X1-0:1]
head(df2)
#>    X1  X2  X3  X4  X5  X6  X7  X8  X9 X10
#> 1:  1 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 2:  3 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 3:  5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 4:  7 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 5:  9 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 6: 11 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5

Created on 2021-07-16 by the reprex package (v2.0.0)reprex 包( v2.0.0 ) 于 2021 年 7 月 16 日创建

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM