I have a data frame with more than 2,000,000 rows and 22 columns. In three of the columns the entries are either 0, 1 or NA. I want a new column that holds the sum of these three columns for every row, treating NA as 0. Using a for loop is definitely way too slow.
Have you got any alternatives for me? Another idea was using mutate in a pipe, but I have problems selecting the columns that I want to add up by name.
First attempt:
for(i in 1:nrow(T12)){
  if(is.na(T12$blue[i]) & is.na(T12$blue.y[i])) {
    T12$blue[i] <- T12$blue.x[i]
  }else if(is.na(T12$blue[i]) & is.na(T12$blue.x[i])){
    T12$blue[i] <- T12$blue.y[i]
  }else if(is.na(T12$blue[i]) & is.na(T12$blue.x[i]) & is.na(T12$blue.y[i]) )
    T12[i,] <- NULL
}
Thank you!
I am going to assume that the columns you wish to add are the first three. If you need different columns, just change c(1,2,3) in the code below.
apply(T12[,c(1,2,3)], 1, sum, na.rm=TRUE)
Note: @27ϕ9 comments that a faster solution is
rowSums(T12[,c(1,2,3)], na.rm=TRUE)
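Since the question mentions trouble selecting the columns by name, here is a minimal sketch of the same rowSums idea with a character vector of names instead of positions. The toy data frame and the column names blue, blue.x and blue.y are assumptions taken from the question's loop, and total is a hypothetical name for the new column.

```r
# Toy stand-in for T12; the real data has about 2 million rows and 22 columns.
T12 <- data.frame(
  id     = 1:4,
  blue   = c(1, NA, 0, NA),
  blue.x = c(NA, 1, 1, NA),
  blue.y = c(0, NA, 1, NA)
)

# Columns to add up, selected by name (assumed names).
cols <- c("blue", "blue.x", "blue.y")

# rowSums() works on all rows at once; na.rm = TRUE drops NA from each
# row's sum, which is equivalent to treating NA as 0.
T12$total <- rowSums(T12[, cols], na.rm = TRUE)
```

With this toy data the totals are 1, 1, 2 and 0. A row where all three entries are NA gets a sum of 0.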
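You can first replace all the NAs with 0 and then add the columns with data.table. Note that this overwrites NA in every column of df, not only in the three being summed.
library(data.table)
df[is.na(df)] <- 0
setDT(df)[, newcol := a + b + c]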
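Replacing NA across the whole table also changes the other columns. If that is not wanted, a possible variant is to sum only the chosen columns through .SDcols and let rowSums ignore the NAs. This is a sketch on made-up data; the column names a, b, c follow the answer above, and other is a hypothetical extra column.

```r
library(data.table)

# Made-up example data; `other` stands in for the remaining columns.
df <- data.frame(
  a     = c(1, NA, 0),
  b     = c(NA, NA, 1),
  c     = c(1, 0, NA),
  other = c(NA, "x", "y")
)
setDT(df)  # convert to data.table by reference

cols <- c("a", "b", "c")

# .SD holds only the columns listed in .SDcols, so NAs elsewhere are untouched.
df[, newcol := rowSums(.SD, na.rm = TRUE), .SDcols = cols]
```

Here newcol comes out as 2, 0 and 1, and the NA in other is left as it was.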
If your columns are named a, b and c, you can try the code below:
within(T12, new <- rowSums(cbind(a,b,c),na.rm = TRUE))
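The question also asks about mutate in a pipe. One way that might work, assuming dplyr 1.0.0 or later is installed, is to select the columns by name with across() inside rowSums(). The toy data, the column names and the new column name total are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for T12 with the column names used in the question's loop.
T12 <- data.frame(
  blue   = c(1, NA, 0),
  blue.x = c(NA, 1, 1),
  blue.y = c(0, NA, NA)
)

# across() returns the selected columns as a data frame,
# which rowSums() then adds up row by row, skipping NA.
T12 <- T12 %>%
  mutate(total = rowSums(across(c(blue, blue.x, blue.y)), na.rm = TRUE))
```

For this toy data total is 1 in every row. On a table with millions of rows, this should run at roughly the speed of plain rowSums, since the row-wise work is still done by rowSums rather than by a loop.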