
Fill missing values in dataframe columns with column median in R

I have a dataframe with some columns of type "factor" and others "numeric". There are no missing values in any of the "factor" columns.

I am trying to replace the missing values in each column with that column's median, using the following code:

for(i in 1:ncol(df3)){
  df3[is.na(df3[,i]), i] <- median(df3[,i], na.rm = TRUE)
}

However, I am getting the error:

Error in median.default(df3[, i], na.rm = TRUE) : need numeric data

I am sure that the missing values occur only in numeric columns, so why am I getting this error?

More importantly, how do I fill missing values in each column with respective column medians?

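The loop visits every column, including the factor ones. Even if df3[is.na(df3[, i]), i] selects zero rows, R still has to evaluate the right-hand side, median(df3[, i], na.rm = TRUE), before it can assign anything, and median() fails on a factor with "need numeric data". You could add a check so that only numeric columns are processed: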

for(i in seq_along(df3)) {
  # Skip factor (and other non-numeric) columns so median() is never called on them
  if (is.numeric(df3[, i])) {
    df3[is.na(df3[, i]), i] <- median(df3[, i], na.rm = TRUE)
  }
}
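
The same idea can be written without an explicit loop. The following is only a sketch under assumptions: the contents of df3 are made up for illustration (one factor column and two numeric columns containing NAs), and num_cols is a helper name that does not appear in the question.

```r
# Hypothetical example data: the real df3 is not shown in the question.
df3 <- data.frame(
  group = factor(c("a", "b", "a", "b")),
  x     = c(1, NA, 3, 10),
  y     = c(NA, 2, 4, 6)
)

# Logical vector marking which columns are numeric.
num_cols <- vapply(df3, is.numeric, logical(1))

# For each numeric column, replace its NAs with that column's median.
# Non-numeric columns are never passed to median().
df3[num_cols] <- lapply(df3[num_cols], function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})

print(df3)
```

Note that if a column is stored as integer and its median is not a whole number, the assignment will turn that column into a double.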

