Conditional Statistics in R (dplyr solution preferred)
I have the following data frame:
df <- data.frame("num1" = 1:3, "num2" = 4:6, "num3" = c(NA, 10, 12), stringsAsFactors = FALSE)
num1 num2 num3
1 4 NA
2 5 10
3 6 12
Is there a way to generate a summary table using the mean for every column conditionally? To elaborate, if a column in the data frame contains a missing value (NA), then na.exclude that value and compute the mean of the rest: (10 + 12) / 2 = 11. If a column does not have any missing values, then just compute the mean, e.g. (1 + 2 + 3) / 3 = 2 for the num1 column.
Desired output:
mean_num1 mean_num2 mean_num3
2 5 11
You could loop over all columns with an apply-family function such as sapply(), calling mean() with the na.rm = TRUE argument so that NA values are dropped before averaging.
Something like:
mean_col <- sapply(df, mean, na.rm = TRUE)
mean_col
num1 num2 num3
2 5 11
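To see why na.rm = TRUE matters here, the following minimal sketch compares the two calls on the num3 values alone. The variable names x, default_mean, and clean_mean are illustrative and not part of the original answer.

```r
# The num3 column from the question, containing one missing value
x <- c(NA, 10, 12)

# By default, any NA in the input makes the mean NA
default_mean <- mean(x)

# With na.rm = TRUE the NA is dropped first, so this is (10 + 12) / 2
clean_mean <- mean(x, na.rm = TRUE)

default_mean  # NA
clean_mean    # 11
```

Columns without missing values are unaffected by na.rm = TRUE, so the same call can be applied to every column without checking for NA first.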
Then you could rename the vector as you please:
names(mean_col) <- paste0("mean_", names(mean_col))
mean_col
mean_num1 mean_num2 mean_num3
2 5 11
With dplyr:
library(dplyr)
df %>% summarize(across(everything(), ~ mean(.x, na.rm = TRUE), .names = "mean_{.col}"))
Edit
Or the simplest of all, with colMeans():
colMeans(df, na.rm=TRUE)
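colMeans() returns a named numeric vector, whereas the question asks for a one-row summary table. One possible way to bridge the two, sketched here with the same df as in the question (col_means and summary_tbl are illustrative names, not from the original answer):

```r
df <- data.frame(num1 = 1:3, num2 = 4:6, num3 = c(NA, 10, 12))

# Column means, ignoring NA values
col_means <- colMeans(df, na.rm = TRUE)

# Transpose the named vector into a 1 x 3 matrix, then convert it
# to a one-row data frame and prefix the column names with "mean_"
summary_tbl <- as.data.frame(t(col_means))
names(summary_tbl) <- paste0("mean_", names(summary_tbl))

summary_tbl
```

This keeps everything in base R; the dplyr version above produces a one-row data frame directly, so the reshaping step is only needed for the sapply() and colMeans() approaches.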