
R loop for replacing NA's in multiple columns with mean of selected columns

I have a data table with 96 variables, including weekly attendance at NFL games for 17 different weeks.

The column names of df look like this:

colnames(df)
 [1] "NFL_team_name"        "year"                 "season_performance"   "margin_of_victory"    "strength_of_schedule"
 [6] "simple_rating"        "offensive_ranking"    "defensive_ranking"    "playoffs"             "sb_winner"           
[11] "price"                "weekly_attendance.1"  "day.1"                "time.1"               "home_ind.1"          
[16] "winner.1"             "weekly_attendance.2"  "day.2"                "time.2"               "home_ind.2"          
[21] "winner.2"             "weekly_attendance.4"  "day.4"                "time.4"               "home_ind.4"          
[26] "winner.4"             "weekly_attendance.5"  "day.5"                "time.5"               "home_ind.5"          
[31] "winner.5"             "weekly_attendance.6"  "day.6"                "time.6"               "home_ind.6"  

and so on.

Some of the weekly attendance columns contain NAs, and I want to replace each NA with the mean of the other weekly attendance columns in the same row. The weekly attendance columns are columns 12, 17, 22, 27, ... as listed in the code below. I have tried something like the following, but I don't really know how to get it to work:

The mean of the weekly attendance columns for a single row:

mean(df[1,c(12,17,22,27,32,37,42,47,52,57,62,67,72,77,82,87,92)])

The means of the weekly attendance columns for each row (team and year):

rowmeans <- as.data.table(rowMeans(df[,c(12,17,22,27,32,37,42,47,52,57,62,67,72,77,82,87,92)], na.rm = T))

Use the row means to replace the NAs (something like this):

for (i in 1:nrow(df)) {
  if (is.na(df[i,])) {
    df[i,] <- rowmeans[i,]
  }
  else
    next
}

So what I want is to fill in the NAs in each row with the mean of the weekly attendance columns in that row.

I hope this makes sense and that some of you can tell me what is missing.

Hard to be certain without a minimal reproducible example, but you might try something like this:

# find column names with weekly attendance figures
wa_cols <- grep("weekly_attendance", colnames(df), value=TRUE)

# calculate the mean for each row for just those columns
wa_mean <- rowMeans(df[, wa_cols], na.rm=TRUE)

# loop over weekly attendance columns, filling in if missing
for (x in wa_cols) {
  df[[x]] <- ifelse(is.na(df[[x]]), wa_mean, df[[x]])
}
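
To make the approach above easy to check, here is a minimal sketch on a toy data frame. The team names and attendance values are invented for illustration, and the matrix-indexing fill is an alternative to the loop, not something taken from the question. It assumes df is a plain data.frame and that the attendance columns are numeric.

# Toy data standing in for the real table (values are invented)
df <- data.frame(
  NFL_team_name       = c("A", "B", "C"),
  weekly_attendance.1 = c(100, NA, 300),
  day.1               = c("Sun", "Sun", "Mon"),
  weekly_attendance.2 = c(200, 400, NA),
  weekly_attendance.4 = c(NA, 600, 500)
)

# Anchor the pattern so only names starting with "weekly_attendance." match
wa_cols <- grep("^weekly_attendance\\.", colnames(df), value = TRUE)

# Row-wise mean of the observed (non-NA) attendance values
wa_mean <- rowMeans(df[, wa_cols], na.rm = TRUE)

# Locate every NA in the attendance block, then assign the mean of the
# row it sits in
wa <- as.matrix(df[, wa_cols])
na_pos <- which(is.na(wa), arr.ind = TRUE)
wa[na_pos] <- wa_mean[na_pos[, "row"]]
df[, wa_cols] <- wa

Two caveats. A row whose attendance values are all NA gets NaN, because rowMeans() has nothing to average. And if df is a data.table rather than a data.frame, df[, wa_cols] does not select columns from a character variable; df[, ..wa_cols], or converting with as.data.frame(df) first, is one way around that.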
