
Comparing multiple columns, including NAs, in an R data frame

I have a data frame that contains the values 1 and 2 and a lot of NAs.

I would like to compare these columns row by row and save the result in a new column (let's say F): if all values in a row are 1, the new column gets 1 for that row; if all values are 2, it gets 2; if the row contains a mix of 1 and 2, it gets a new number such as 3.

Do you have any idea how to do this?

Try this. If the variance of the non-NA values in a row is 0, the new column equals their mean. Otherwise, the new column equals the sum of the unique values (for example, a row containing 2, 2, 4 gives 2 + 4 = 6; a row mixing 1 and 2 gives 3). If a row has only one non-NA value, var() returns NA, so the first "if" statement takes care of that case.
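The three cases above can be checked directly at the console. This is only a small sketch with made-up vectors, not part of the original data:

```r
# A single non-NA value: the sample variance is undefined, so var() gives NA
single <- var(c(1, NA, NA), na.rm = TRUE)

# All non-NA values equal: variance is 0, and the mean is the common value
equal_var  <- var(c(2, 2, NA), na.rm = TRUE)
equal_mean <- mean(c(2, 2, NA), na.rm = TRUE)

# Values differ: sum of the distinct non-NA values
mixed <- sum(unique(c(2, 2, 4, NA)), na.rm = TRUE)

print(c(single = single, equal_var = equal_var,
        equal_mean = equal_mean, mixed = mixed))
```

Because var() returns NA for a single value, comparing it to 0 inside if() would raise an error, which is why the length check has to come first.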

df <- as.data.frame(matrix(c(1, 1, 1, NA, NA, NA, 2, 2, NA, NA, NA, NA, 3, 2, 1, 2,
                           NA, NA, 4, 2, NA, 2, NA, NA, 5, 1, NA, NA, NA, NA),
                         ncol = 5, byrow = TRUE))

colnames(df) <- c("A", "B", "C", "D", "F")

for (i in 1:nrow(df)) {
  # Non-NA values of row i across the five data columns
  vals <- as.numeric(df[i, 1:5])
  vals <- vals[!is.na(vals)]
  if (length(vals) == 1) {
    # A single value: var() would return NA, so use the value itself
    df[i, "col3"] <- vals
  } else if (var(vals) == 0) {
    # All values equal: the mean is that common value
    df[i, "col3"] <- mean(vals)
  } else {
    # Values differ: sum of the distinct values (1 and 2 give 3)
    df[i, "col3"] <- sum(unique(vals))
  }
}
df

*Updated to work for more than two columns.
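If the data really contains only 1, 2 and NA, the rule from the question can also be written without the variance trick. The following is a rough sketch, not the method above: the example data, the helper name classify_row, and the choice to return NA for a row with no values at all are my own assumptions.

```r
# Hypothetical example data: columns A-D hold 1, 2 or NA
df <- data.frame(
  A = c(1, 2, 1, NA, NA),
  B = c(1, NA, 2, 2, NA),
  C = c(NA, 2, 2, NA, NA),
  D = c(1, NA, NA, NA, NA)
)

# Classify one row: the single distinct value if there is only one,
# 3 if the row mixes values, NA if the row has no data at all (assumption)
classify_row <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 0) return(NA_real_)
  if (length(vals) == 1) return(vals)
  3
}

cols <- c("A", "B", "C", "D")
df$F <- apply(df[, cols], 1, classify_row)
df
```

With this made-up data, F comes out as 1, 2, 3, 2 and NA. Listing the columns in cols keeps the new column out of the comparison if the code is run a second time.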
