In R, how do you compute multiple mean scores based on partial variable names in a single function / loop?
How to compute a single mean of multiple columns?
I have a data frame with 4 columns and 8 observations:
> df1
  Rater1 Rater2 Rater4 Rater5
1      3      3      3      3
2      3      3      2      3
3      3      3      2      2
4      0      0      1      0
5      0      0      0      0
6      0      0      0      0
7      0      0      1      0
8      0      0      0      0
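I want the mean, median, IQR and sd of all Rater1 and Rater4 observations pooled (16 values) and of all Rater2 and Rater5 observations pooled (16 values), without having to create a new df with 2 variables like this one: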
> df2
   var1 var2
1     3    3
2     3    3
3     3    3
4     0    0
5     0    0
6     0    0
7     0    0
8     0    0
9     3    3
10    2    3
11    2    2
12    1    0
13    0    0
14    0    0
15    1    0
16    0    0
This is what I want to obtain (without a new data frame, working only on the first one):
> stat.desc(df2)
                   var1       var2
nbr.val      16.0000000 16.0000000
nbr.null      8.0000000 10.0000000
nbr.na        0.0000000  0.0000000
min           0.0000000  0.0000000
max           3.0000000  3.0000000
range         3.0000000  3.0000000
sum          18.0000000 17.0000000
median        0.5000000  0.0000000
mean          1.1250000  1.0625000
SE.mean       0.3275541  0.3590352
CI.mean.0.95  0.6981650  0.7652653
var           1.7166667  2.0625000
std.dev       1.3102163  1.4361407
coef.var      1.1646367  1.3516618
How can I do this in R?
Thanks in advance.
We can loop over the paired column names, concatenate each pair of columns into a single vector, and get the mean, median, IQR and sd:
out <- do.call(rbind, Map(function(x, y) {
  v1 <- c(df1[[x]], df1[[y]])   # pool the two columns into one vector
  data.frame(Mean = mean(v1), Median = median(v1),
             IQR = IQR(v1), SD = sd(v1))
}, names(df1)[1:2], names(df1)[3:4]))
row.names(out) <- paste(names(df1)[1:2], names(df1)[3:4], sep = "_")
out
#                 Mean Median  IQR       SD
# Rater1_Rater4 1.1250    0.5 2.25 1.310216
# Rater2_Rater5 1.0625    0.0 3.00 1.436141
Data:
df1 <- structure(list(Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
                      Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
                      Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
                      Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)),
                 class = "data.frame", row.names = c(NA, -8L))
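The same idea can be written with the grouping spelled out by name instead of by column position, which may be easier to adapt when the columns to pool are not adjacent. This is only a minimal sketch: `pairs` and `describe` are hypothetical names introduced here, and `df1` is assumed to be the data frame from the question.

```r
# Data from the question
df1 <- data.frame(
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

# Hypothetical grouping: each element lists the columns to pool together
pairs <- list(
  Rater1_Rater4 = c("Rater1", "Rater4"),
  Rater2_Rater5 = c("Rater2", "Rater5")
)

# Pool the listed columns into one vector and summarise it
describe <- function(cols) {
  v <- unlist(df1[cols], use.names = FALSE)
  c(Mean = mean(v), Median = median(v), IQR = IQR(v), SD = sd(v))
}

# sapply() puts the statistics in rows; t() gives one row per group
res <- t(sapply(pairs, describe))
res
```

No intermediate data frame is kept: each pooled vector exists only inside the call to `describe`.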
One possible base R approach:
df <- data.frame(                 # construct your original data frame
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)
combined <- data.frame(           # make a new data frame with your desired variables
  R14 = with(df, c(Rater1, Rater4)),
  R25 = with(df, c(Rater2, Rater5))
)
sapply(combined, mean)            # compute mean of each column
sapply(combined, median)          # median
sapply(combined, sd)              # standard deviation
sapply(combined, IQR)             # interquartile range
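The four separate `sapply()` calls above can also be collapsed into one by passing a function that returns a named vector. The following is a sketch of that variant, not part of the original answer; the name `stats` is introduced here for illustration.

```r
# Same data and pooled columns as in the approach above
df <- data.frame(
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)
combined <- data.frame(
  R14 = c(df$Rater1, df$Rater4),
  R25 = c(df$Rater2, df$Rater5)
)

# One pass: each column yields a named vector, so sapply() returns a
# matrix with one row per statistic and one column per pooled variable
stats <- sapply(combined, function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v), IQR = IQR(v))
})
stats
```

The layout (statistics in rows, variables in columns) matches the orientation of the `stat.desc()` output shown in the question.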
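Another solution, using a for loop to compute all the statistics in one pass. First, create a vector for each pair of raters to be pooled:
# Raters 1 and 4:
r14 <- as.integer(unlist(df1[, c("Rater1", "Rater4")]))
# Raters 2 and 5:
r25 <- as.integer(unlist(df1[, c("Rater2", "Rater5")]))
Combine these vectors in a data frame:
df <- data.frame(r14, r25)
And compute the statistics (mean, IQR, median and sd, in that order, for each column):
for (i in seq_len(ncol(df))) {
  print(c(mean(df[, i]), IQR(df[, i]), median(df[, i]), sd(df[, i])))
}
[1] 1.125000 2.250000 0.500000 1.310216
[1] 1.062500 3.000000 0.000000 1.436141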
A tidyverse / dplyr solution.
library(dplyr)
# Stack Rater4 under Rater1 (r14) and Rater5 under Rater2 (r25), then summarise
bind_rows(select(df1, r14 = Rater1, r25 = Rater2),
          select(df1, r14 = Rater4, r25 = Rater5)) %>%
  summarise_all(list(
    mean = mean,
    median = median,
    sd = sd,
    iqr = IQR
  ))
#>   r14_mean r25_mean r14_median r25_median   r14_sd   r25_sd r14_iqr r25_iqr
#> 1    1.125   1.0625        0.5          0 1.310216 1.436141    2.25       3
If you want the output to look like the one in the question, transpose the result with t().
t(.Last.value)
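A long-format variant is another option in the tidyverse: reshape so that every score sits in one column, label each row with its group, and summarise per group. This is a sketch under assumptions not made in the original answer: it needs the tidyr package in addition to dplyr, and the names `rater`, `score`, `group`, `r14` and `r25` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Data from the question
df1 <- data.frame(
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

res <- df1 %>%
  # one row per (rater, score) combination
  pivot_longer(everything(), names_to = "rater", values_to = "score") %>%
  # assign each rater to its pooled group
  mutate(group = if_else(rater %in% c("Rater1", "Rater4"), "r14", "r25")) %>%
  group_by(group) %>%
  summarise(mean = mean(score), median = median(score),
            sd = sd(score), iqr = IQR(score))
res
```

The result has one row per group, so adding a third group only requires extending the `mutate()` step rather than renaming columns by hand.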