简体   繁体   中英

how to blend two columns by stubsituting NA's?

I have got a huge dataframe with two columns that I need to combine. On may wonder if this isn't just the same problem mentionend in this post here , but my dataframe in fact doesn' exist of just two columns. There are many more (wit a lot of NA's) and I only want to deal with two of them, which maked it a bit more complicated I guess.

This is a piece of those columns:

    col1 col2
15   NA    4
16   NA    5
17   NA    5
18   NA    5
19   NA    1
20   NA   NA
21   NA   NA
22   1    NA
23   5    NA
24   2    NA
25   4    NA
26   3    NA
27   NA    2
28   NA    4
29   NA    5
30   NA    3

I need to combine the columns into one by replacing eachother's NA's. No preplacements take place when both columns contain an NA (but that's obvious).

The result should look like this:

    col1
15   4    
16   5  
17   5  
18   5  
19   1  
20   NA 
21   NA 
22   1  
23   5  
24   2  
25   4  
26   3  
27   2  
28   4  
29   5  
30   3  

I tried an ugly for loop:

for (i in 1:nrow(df)){
  if (is.na(df[i,1])==TRUE){
    df[i,1] <- df[i,2]
  }
 }
df <- df[,1]

but that code appeared to be way too slow. Does anyone have an idea about how two handle this problem?

Many thanks in advance!

无需循环,因为您可以分配给子集:

df$col1[is.na(df$col1)] <- df$col2[is.na(df$col1)]

Languages like SQL have a function called coalesce which returns the first non-missing value given a set of lists. I wrote a function that does this behavior in R.

coalesce<-function(...) {
    x<-lapply(list(...), function(z) {if (is.factor(z)) as.character(z) else z})
    m<-is.na(x[[1]])
    i<-2
    while(any(m) & i<=length(x)) {
        if ( length(x[[i]])==length(x[[1]])) {
            x[[1]][m]<-x[[i]][m]
        } else if (length(x[[i]])==1) {
            x[[1]][m]<-x[[i]]
        } else {
            stop(paste("length mismatch in argument",i," - found:", length( x[[i]] ),"expected:",length( x[[1]] ) ))
        }
        m<-is.na(x[[1]])
        i<-i+1
    }
    return(x[[1]])
}

And you would use it like

coalesce(col1,col2)

And you can also add a default value if all rows are NA

coalesce(col1,col2, -9)

It returns a new column rather than modifying any of the originals.

I try to keep the latest version of the function on this gist

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM