R loop for replacing NAs in multiple columns with mean of selected columns
I have a data table with 96 variables, including weekly attendance at NFL games for 17 different weeks.
The colnames of df look like this:
colnames(df)
[1] "NFL_team_name" "year" "season_performance" "margin_of_victory" "strength_of_schedule"
[6] "simple_rating" "offensive_ranking" "defensive_ranking" "playoffs" "sb_winner"
[11] "price" "weekly_attendance.1" "day.1" "time.1" "home_ind.1"
[16] "winner.1" "weekly_attendance.2" "day.2" "time.2" "home_ind.2"
[21] "winner.2" "weekly_attendance.4" "day.4" "time.4" "home_ind.4"
[26] "winner.4" "weekly_attendance.5" "day.5" "time.5" "home_ind.5"
[31] "winner.5" "weekly_attendance.6" "day.6" "time.6" "home_ind.6"
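and so on...
Some of the weekly attendance columns contain NAs, and I want to impute them, row by row, with the mean of the rest of the weekly attendance columns in that row. The weekly attendance columns are columns 12, 17, 22, 27, ... as shown below. I have tried something like the following, but I really don't know how to get it to work:
Mean of the weekly att. columns for a single row: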
mean(df[1,c(12,17,22,27,32,37,42,47,52,57,62,67,72,77,82,87,92)])
Means of the weekly attendance columns for every row (team and year):
rowmeans <- as.data.table(rowMeans(df[,c(12,17,22,27,32,37,42,47,52,57,62,67,72,77,82,87,92)], na.rm = T))
Replacing the NAs using rowmeans (something like this):
for (i in 1:nrow(df)) {
if (is.na(df[i,])) {
df[i,] <- rowmeans[i,]
}
else
next
}
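So what I want is to fill in the NAs in each row with the mean of that row's weekly attendance columns.
I hope this makes sense, and that some of you can tell me what I am missing.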
It is hard to be sure without a minimal reproducible example, but you could try something like this:
# find column names with weekly attendance figures
wa_cols <- grep("weekly_attendance", colnames(df), value=TRUE)
# calculate the mean for each row for just those columns
wa_mean <- rowMeans(df[, wa_cols], na.rm=TRUE)
# loop over weekly attendance columns, filling in if missing
for (x in wa_cols) {
df[[x]] <- ifelse(is.na(df[[x]]), wa_mean, df[[x]])
}
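The loop in the question likely fails for two reasons: is.na(df[i,]) returns one value per column while if expects a single TRUE or FALSE, and df[i,] <- rowmeans[i,] would overwrite the entire row rather than only the missing attendance cells. As a rough check of the approach above, here is a small self-contained sketch on made-up data. The toy values and the three-week layout are assumptions for illustration only, and the code assumes df is a plain data.frame (with a data.table, selecting columns by a character vector needs with = FALSE or the ..wa_cols form). It also shows a loop-free variant that uses matrix indexing instead of ifelse.

```r
# Toy data (hypothetical values): three weekly attendance columns, one NA per row
df <- data.frame(
  NFL_team_name       = c("A", "B", "C"),
  weekly_attendance.1 = c(100, NA, 300),
  weekly_attendance.2 = c(200, 400, NA),
  weekly_attendance.4 = c(NA, 600, 500)
)

# Select the attendance columns by name rather than by position
wa_cols <- grep("^weekly_attendance", colnames(df), value = TRUE)

# Row means over the non-missing attendance values only
wa_mean <- rowMeans(df[, wa_cols], na.rm = TRUE)

# Loop-free fill: find the (row, col) position of every NA cell,
# then look up the matching row mean by row number
wa <- as.matrix(df[, wa_cols])
na_pos <- which(is.na(wa), arr.ind = TRUE)
wa[na_pos] <- wa_mean[na_pos[, "row"]]
df[, wa_cols] <- wa

# Row 1's missing week is now 150, the mean of 100 and 200
print(df)
```

Selecting columns with grep rather than hard-coded positions such as 12, 17, 22, ... keeps the code working if columns are added or reordered. Note that a row whose attendance values are all NA gets a row mean of NaN, so those cells would be filled with NaN rather than a number.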