简体   繁体   中英

Replacing values of dataframe with NA values of another dataframe

I would like to replace values of one dataframe with NA of another dataframe that have the same identifier. That is, for all values of df1 that have the same id, assign the "NA" values of df2 at the corresponding id and indices.

I have df1 and df2:

df1 =data.frame(id = c(1,1,2,2,6,6),a = c(2,4,1,7,5,3), b = c(5,3,0,3,2,5),c = c(9,3,10,33,2,5))
df2 =data.frame(id = c(1,2,6),a = c("NA",0,"NA"), b= c("NA", 9, 9),c=c(0,"NA","NA"))

what i would like is df3:

df3 = data.frame(id = c(1,1,2,2,6,6),a = c("NA","NA",1,7,"NA","NA"), b = c("NA","NA",0,3,2,5),c = c(9,3,"NA","NA","NA","NA"))

I have tried the lookup function and the library "data.table", but i could get the correct df3. Could anyone please help me with this?

We can do a join on 'id' and then replace the NA values by multiplying the .

library(data.table)
nm1 <- names(df1)[-1]
setDT(df1)[df2,  (nm1) := Map(function(x, y) x*(NA^is.na(y)), .SD, 
                  mget(paste0('i.', nm1))), on = .(id), .SDcols = nm1]
df1
#   id  a  b  c
#1:  1 NA NA  9
#2:  1 NA NA  3
#3:  2  1  0 NA
#4:  2  7  3 NA
#5:  6 NA  2 NA
#6:  6 NA  5 NA

data

df2 =data.frame(id = c(1,2,6),a = c(NA,0,NA), b= c(NA, 9, 9),c=c(0,NA,NA))

NOTE: In the OP's post NA were "NA"

Since your NA values are actually text "NA" you will have to turn all your variables into text (with as.character ). You can join both dataframes by id column. Since both dataframes have columns a,b, and c R will rename then ax, bx and cx (df1) and ay, by and cy (df2). After that you can create new columns a,b, and c. These than have "NA" whenever ay == "NA" and ax otherwise (and so on). If your NA values were real NA you need to test differently is.na(value) (see example below in the code).

library(dplyr)

df1 %>%  
  mutate_all(as.character) %>% # allvariables as text
  left_join(df2 %>% 
              mutate_all(as.character) ## all variables as text
            , by = "id") %>% ## join tables by 'id'; a.x from df1 and a.y from df2 and so on
  mutate(a = case_when(a.y == "NA" ~ "NA", TRUE ~ a.x), ## if a.y == "NA" take this,else  a.x 
         b = case_when(b.y == "NA" ~ "NA", TRUE  ~ b.x),
         c = case_when(c.y == "NA" ~ "NA", TRUE ~ c.x)) %>%
  select(id, a, b, c) ## keep only these initial columns

  id  a  b  c
1  1 NA NA  9
2  1 NA NA  3
3  2  1  0 NA
4  2  7  3 NA
5  6 NA  2 NA
6  6 NA  5 NA

##if your dataframe head real NA this is how you can test:
missing_value <- NA

is.na(missing_value) ## TRUE
missing_value == NA  ## Does not work with R

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM