How to compute a single mean of multiple columns?

I have a data frame with 4 columns and 8 observations:

> df1
  Rater1 Rater2 Rater4 Rater5
1      3      3      3      3
2      3      3      2      3
3      3      3      2      2
4      0      0      1      0
5      0      0      0      0
6      0      0      0      0
7      0      0      1      0
8      0      0      0      0

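I would like to get the mean, median, IQR, and sd of all Rater1 and Rater4 observations pooled together (16 values), and likewise of all Rater2 and Rater5 observations (16 values), without creating a new data frame with 2 variables like this one: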

> df2
   var1 var2
1     3    3
2     3    3
3     3    3
4     0    0
5     0    0
6     0    0
7     0    0
8     0    0
9     3    3
10    2    3
11    2    2
12    1    0
13    0    0
14    0    0
15    1    0
16    0    0

That is, I want this output (without a new data frame, working only on the first one):

> stat.desc(df2)
                   var1       var2
nbr.val      16.0000000 16.0000000
nbr.null      8.0000000 10.0000000
nbr.na        0.0000000  0.0000000
min           0.0000000  0.0000000
max           3.0000000  3.0000000
range         3.0000000  3.0000000
sum          18.0000000 17.0000000
median        0.5000000  0.0000000
mean          1.1250000  1.0625000
SE.mean       0.3275541  0.3590352
CI.mean.0.95  0.6981650  0.7652653
var           1.7166667  2.0625000
std.dev       1.3102163  1.4361407
coef.var      1.1646367  1.3516618

How can I do this in R?

Thank you in advance.

We can loop over the paired column names, concatenate each pair into a single vector, and compute the mean, median, IQR, and sd:

out <- do.call(rbind, Map(function(x, y) {
  v1 <- c(df1[[x]], df1[[y]])          # pool the two columns into one vector
  data.frame(Mean = mean(v1), Median = median(v1),
             IQR = IQR(v1), SD = sd(v1))
}, names(df1)[1:2], names(df1)[3:4]))



row.names(out) <- paste(names(df1)[1:2], names(df1)[3:4], sep="_")
out
#                Mean Median  IQR       SD
#Rater1_Rater4 1.1250    0.5 2.25 1.310216
#Rater2_Rater5 1.0625    0.0 3.00 1.436141

Data

df1 <- structure(list(Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0), Rater2 = c(3, 
3, 3, 0, 0, 0, 0, 0), Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0), Rater5 = c(3, 
3, 2, 0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA, 
-8L))

One possible base R approach:

df <- data.frame(                     # construct your original dataframe
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

combined <- data.frame(               # make a new dataframe with your desired variables
  R14 = with(df, c(Rater1, Rater4)),  
  R25 = with(df, c(Rater2, Rater5))  
)

sapply(combined, mean)                # compute mean of each column
sapply(combined, median)              # median
sapply(combined, sd)                  # standard deviation
sapply(combined, IQR)                 # interquartile range
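
If storing a combined data frame is not wanted, the same pooling can be done on the fly and all four statistics collected in one call. This is a minimal sketch using the question's data; `pooled_stats` and the labels `R14` and `R25` are hypothetical names introduced here for illustration.

```r
# Same data as in the question
df1 <- data.frame(
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

# Hypothetical helper: pool the named columns into one vector and summarise it
pooled_stats <- function(cols, data) {
  v <- unlist(data[cols], use.names = FALSE)
  c(mean = mean(v), median = median(v), sd = sd(v), IQR = IQR(v))
}

# Column pairs to pool; the labels R14 and R25 are illustrative
pairs <- list(R14 = c("Rater1", "Rater4"), R25 = c("Rater2", "Rater5"))

# sapply() binds the named vectors into a matrix:
# statistics in rows, one column per pair
res <- sapply(pairs, pooled_stats, data = df1)
res
```

The result has the statistics as rows and the pooled pairs as columns, which is close to the layout of the `stat.desc()` output in the question, and no intermediate data frame is kept.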

Another solution, using a for loop to compute the statistics in one go. First, create vectors for the raters you want to combine:

# Raters 1 and 4:
r14 <- as.integer(unlist(df1[, c("Rater1", "Rater4")]))
# Raters 2 and 5:
r25 <- as.integer(unlist(df1[, c("Rater2", "Rater5")]))

Combine these vectors in a data frame:

df <- data.frame(r14, r25)

And compute the statistics:

for (i in seq_along(df)) {
  # printed order: mean, IQR, median, sd
  print(c(mean(df[, i]), IQR(df[, i]), median(df[, i]), sd(df[, i])))
}
[1] 1.125000 2.250000 0.500000 1.310216
[1] 1.062500 3.000000 0.000000 1.436141

A tidyverse / dplyr solution:

library(dplyr)

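# Stack Rater4 under Rater1 (r14) and Rater5 under Rater2 (r25),
# working from the question's df1.
bind_rows(select(df1, r14 = Rater1, r25 = Rater2),
          select(df1, r14 = Rater4, r25 = Rater5)) %>%
  summarise_all(list(    # superseded by across() in dplyr >= 1.0.0, but still works
    mean = mean,
    median = median,
    sd = sd,
    iqr = IQR
  ))
#>   r14_mean r25_mean r14_median r25_median   r14_sd   r25_sd r14_iqr r25_iqr
#> 1    1.125   1.0625        0.5          0 1.310216 1.436141    2.25       3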

If you want the output laid out more like the one in the question, transpose the result with t().

t(.Last.value)
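
The answers cover mean, median, IQR, and sd, but the `stat.desc()` output in the question also lists `SE.mean`, `CI.mean.0.95`, `var`, and `coef.var`. The sketch below shows base R formulas that reproduce the values shown for `var1`. That these are exactly the formulas `stat.desc()` uses internally is an assumption, supported only by the numbers matching.

```r
# Same data as in the question
df1 <- data.frame(
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

# Pool Rater1 and Rater4 without storing a combined data frame
v <- c(df1$Rater1, df1$Rater4)
n <- length(v)

se <- sd(v) / sqrt(n)               # standard error of the mean
ci <- qt(0.975, df = n - 1) * se    # half-width of the 95% CI (t distribution)
cv <- sd(v) / mean(v)               # coefficient of variation

extra <- c(SE.mean = se, CI.mean.0.95 = ci, var = var(v), coef.var = cv)
round(extra, 7)
```

Replacing the two columns in `v` with `df1$Rater2` and `df1$Rater5` gives the corresponding figures for the second pair.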
