at the beginning I'd like to note that I found multiple solutions on SO but none of them met my expectations.
I have to DF's:
1.
E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 NA NA NA
chr2_202492077_202492078 NA NA NA
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 NA NA NA
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 NA NA NA
chr2_31167747_31167748 NA NA NA
2.
E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 NA NA NA
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 NA NA NA
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,
Desired output:
E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,
As you see DF1 is updated by DF2 columns F, G, H, where column E is my unique index. I tried to do merge()
but this function didin't update my rows, it jus added columns of DF2 to DF1. I also tried updating with data.table
and tidyverse
, and my rows have been updated but other ones went to NAs
... Finally I decided to do simple lapply()
with nested ifelse()
, however I don't know how to update all three columns simultaneously, what is more this is terrible slow for my over 50000 rows of data in each DF...
What I did so far:
DF1$F <- sapply(1:nrow(DF1), function(i) ifelse(DF1[i,1]==DF2[i,1] & is.na(DF1[i,1]), DF2[i,1], DF[i,1]))
You cand do this in base R:
as.data.frame(Map(function(x,y) ifelse(is.na(x),y,x),DF1,DF2))
with library purrr
you can have a prettier more compact form (see Soto's answer for an even more compact one with dplyr
):
library(purrr)
map2_df(DF1,DF2,~ifelse(is.na(.x),.y,.x))
In both cases (technically a data.frame
in the first case and a tibble
in the second case):
output
E F G H
1 chr1_100203723_100203724 <NA> <NA> <NA>
2 chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
3 chr1_10032235_10032236 <NA> <NA> <NA>
4 chr1_100327060_100327061 <NA> <NA> <NA>
5 chr1_100346889_100346890 <NA> <NA> <NA>
6 chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
7 chr1_100357190_100357191 <NA> <NA> <NA>
8 chr1_100358057_100358058 <NA> <NA> <NA>
9 chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
10 chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
11 chr2_203760838_203760839 <NA> <NA> <NA>
12 chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
13 chr2_220354644_220354645 <NA> <NA> <NA>
14 chr2_234749403_234749404 <NA> <NA> <NA>
15 chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
16 chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,
data
DF1 <- read.table(text="E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 NA NA NA
chr2_202492077_202492078 NA NA NA
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 NA NA NA
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 NA NA NA
chr2_31167747_31167748 NA NA NA",header=T,stringsAsFactors=F)
DF2 <- read.table(text="E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 NA NA NA
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 NA NA NA
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,",header=T,stringsAsFactors=F)
The function coalesce
from dplyr
does exactly that. I m sure we can use purrr
functions to map the 2 data frames, but here is one using base R mapply
,
DF1[-1] <- mapply(dplyr::coalesce, DF1[-1], DF2[-1])
which gives,
EFGH 1 chr1_100203723_100203724 <NA> <NA> <NA> 2 chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992, 3 chr1_10032235_10032236 <NA> <NA> <NA> 4 chr1_100327060_100327061 <NA> <NA> <NA> 5 chr1_100346889_100346890 <NA> <NA> <NA> 6 chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983, 7 chr1_100357190_100357191 <NA> <NA> <NA> 8 chr1_100358057_100358058 <NA> <NA> <NA> 9 chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033, 10 chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987, 11 chr2_203760838_203760839 <NA> <NA> <NA> 12 chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066, 13 chr2_220354644_220354645 <NA> <NA> <NA> 14 chr2_234749403_234749404 <NA> <NA> <NA> 15 chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956, 16 chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,
NOTE: As mentioned by @Moody_Mudskipper, the purrr
version, which produces a new data frame without altering DF1
or DF2
, would be,
library(purrr)
map2_df(DF1,DF2,dplyr::coalesce)
Another naive way of doing this is using paste0
> df1 <- data.frame(E = c('A','B','C'), F=c('0.9,1',NA,NA), G=c(NA,'0.98,0.34',NA), H=c(NA,'0.98,0.34',NA), stringsAsFactors = F)
> df2 <- data.frame(E = c('A','B','C'), F=c(NA,'1,3',NA), G=c(NA,NA,'5,6,7'), H=c(NA,NA,NA), stringsAsFactors = F)
> df1[is.na(df1)] <- ''
> df2[is.na(df2)] <- ''
>
> mapply(paste, df1[-1], df2[-1])
F G H
[1,] "0.9,1 " " " " "
[2,] " 1,3" "0.98,0.34 " "0.98,0.34 "
[3,] " " " 5,6,7" " "
Updated Based on Sotos suggestion with mapply
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.