
Faster alternative to a for loop with if statements in R

I have a data frame with more than 2,000,000 rows and 22 columns. In three of the columns the entries are either 0, 1 or NA. I want a new column that holds the row-wise sum of these three columns, treating NA as 0. A for loop over the rows is far too slow.

Are there faster alternatives? Another idea was to use mutate in a pipe, but I have trouble selecting the columns I want to add up by name.
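Selecting the columns by name works with a plain character vector, and rowSums() then does the whole sum without a loop. A minimal sketch, assuming the three columns are called blue, blue.x and blue.y (the names used in the loop below) and using a small hypothetical stand-in for T12; cols and total are illustrative names:

```r
# Hypothetical stand-in for T12: three 0/1/NA columns plus an unrelated one
T12 <- data.frame(
  id     = 1:4,
  blue   = c(1, NA, 0, NA),
  blue.x = c(NA, 1, 1, NA),
  blue.y = c(0, 1, NA, NA)
)

# The columns to add, selected by name
cols <- c("blue", "blue.x", "blue.y")
# or, if they share a prefix: names(T12)[startsWith(names(T12), "blue")]

# Row-wise sum with NA treated as 0; a row that is all NA sums to 0
T12$total <- rowSums(T12[cols], na.rm = TRUE)
```

With dplyr (version 1.0 or later), roughly the same thing inside a pipe should be mutate(total = rowSums(across(c(blue, blue.x, blue.y)), na.rm = TRUE)).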

First attempt:

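drop <- logical(nrow(T12))  # marks rows to delete after the loop

for (i in seq_len(nrow(T12))) {
  if (is.na(T12$blue[i]) && is.na(T12$blue.x[i]) && is.na(T12$blue.y[i])) {
    # all three missing: checked first, otherwise this branch is never reached
    drop[i] <- TRUE
  } else if (is.na(T12$blue[i]) && is.na(T12$blue.y[i])) {
    T12$blue[i] <- T12$blue.x[i]
  } else if (is.na(T12$blue[i]) && is.na(T12$blue.x[i])) {
    T12$blue[i] <- T12$blue.y[i]
  }
}

# T12[i, ] <- NULL cannot delete a row; subset once after the loop instead
T12 <- T12[!drop, ]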
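The loop itself can also be vectorized: compute each condition as a logical vector over whole columns, then do one assignment per case. A sketch, assuming the intended rules are the ones in the loop (fill blue from blue.x or blue.y when only that one is available, drop rows where all three are NA); the toy data and the names all_na, use_x and use_y are hypothetical:

```r
# Hypothetical toy data with the same column names as T12
T12 <- data.frame(
  blue   = c(1, NA, NA, NA, NA),
  blue.x = c(NA, 1, NA, NA, 0),
  blue.y = c(NA, NA, 0, NA, 1)
)

# Compute every condition once, on whole columns, before changing anything
all_na <- is.na(T12$blue) & is.na(T12$blue.x) & is.na(T12$blue.y)
use_x  <- is.na(T12$blue) & is.na(T12$blue.y) & !is.na(T12$blue.x)
use_y  <- is.na(T12$blue) & is.na(T12$blue.x) & !is.na(T12$blue.y)

# Vectorized assignments replace the per-row if/else branches
T12$blue[use_x] <- T12$blue.x[use_x]
T12$blue[use_y] <- T12$blue.y[use_y]

# Drop the rows where all three values were missing
T12 <- T12[!all_na, ]
```

As in the loop, a row where blue is missing but both blue.x and blue.y are present keeps blue as NA (the last row of the toy data).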

Thank you!

I am going to assume that the columns you wish to add are the first three. If you need different columns, just change c(1,2,3) in the code below.

apply(T12[,c(1,2,3)], 1, sum, na.rm=TRUE)

Note: @27ϕ9 comments that a faster solution is

rowSums(T12[, c(1, 2, 3)], na.rm = TRUE)
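apply() calls sum() once per row from R code, whereas rowSums() processes the whole matrix in a single internal call, which is why it scales better to millions of rows. A small check on hypothetical random data that both give the same totals, and that na.rm = TRUE gives the same result as replacing NA with 0 first:

```r
# Hypothetical data: 0/1/NA values in the first three columns
set.seed(1)
n <- 1000
T12 <- data.frame(
  a = sample(c(0, 1, NA), n, replace = TRUE),
  b = sample(c(0, 1, NA), n, replace = TRUE),
  c = sample(c(0, 1, NA), n, replace = TRUE)
)

# apply() loops over rows, calling sum() for each one
s_apply <- apply(T12[, c(1, 2, 3)], 1, sum, na.rm = TRUE)

# rowSums() has no MARGIN argument; a positional 1 would just be matched
# to dims, whose default is already 1, so it is left out here
s_rowsums <- rowSums(T12[, c(1, 2, 3)], na.rm = TRUE)

# Replacing NA with 0 first and summing should give the same totals
m <- as.matrix(T12[, c(1, 2, 3)])
m[is.na(m)] <- 0
s_zero <- rowSums(m)
```

To see the speed difference, wrap each call in system.time() with n set to a few million; the timings depend on the machine.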

You can first replace all NAs with 0 and then add the three columns with data.table:

library(data.table)
df[is.na(df)] <- 0                 # replaces NA with 0 in every column
setDT(df)[, newcol := a + b + c]   # a, b, c: the three columns to add
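The same replace-then-add idea also works in base R without data.table, and restricting the replacement to the summed columns leaves NAs in the other columns untouched. A minimal sketch, assuming the columns are named a, b and c as in the answer; the data and the extra column other are hypothetical:

```r
# Hypothetical data: three 0/1/NA columns plus an unrelated column with an NA
df <- data.frame(
  a     = c(1, NA, 0),
  b     = c(NA, NA, 1),
  c     = c(1, 1, NA),
  other = c(NA, 5, 6)
)

# Replace NA with 0 only in the columns that are summed
cols <- c("a", "b", "c")
df[cols][is.na(df[cols])] <- 0

# Plain vectorized addition, the base R counterpart of newcol := a + b + c
df$newcol <- df$a + df$b + df$c
```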

If your columns are named a, b and c, you could also try:

T12 <- within(T12, new <- rowSums(cbind(a, b, c), na.rm = TRUE))
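within() evaluates the expression with the data frame's columns visible as plain variables and returns a modified copy, so the result has to be assigned back to keep the new column. A small sketch on hypothetical data with columns a, b and c:

```r
# Hypothetical data with three 0/1/NA columns named a, b and c
T12 <- data.frame(a = c(1, NA, NA), b = c(0, 1, NA), c = c(NA, 1, NA))

# within() returns a copy; without the assignment the new column is not stored
T12 <- within(T12, new <- rowSums(cbind(a, b, c), na.rm = TRUE))
```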
