Select and combine rows from two data frames with different columns and length in R

Question

I have two data frames. df1 looks like this:

V1    V2    V3   V4        V5
1   1  7506 10949    3 0.2284710
2   1 28272 29965  147 0.6033058
3   1 36598 37518  843 0.7459016
4   1 37512 40365   52 0.4121901
5   1 48795 50666  150 0.8050847
6   1 50660 52365   92 0.6995614
7   1 52850 54453 1337 0.8991597
8   1 54447 54527  279 0.9858824
9   1 54816 64015    2 0.2787356
10  1 70664 74349   17 0.5549451

And df2 looks like this:

1     1     1  7512
2     1  7506 10949
3     1 10943 13175
4     1 13169 20070
5     1 20064 28278
6     1 28272 29965
7     1 29959 36604
8     1 36598 37518
9     1 37512 40365
10    1 40359 48801

I would like to combine them into a new data frame, df3, so that whenever a row of df2 matches a row of df1, it takes the values of df1$V4 and df1$V5; if there is no match, those values should be NA or 0. The final data frame should look like this:

 1     1  7512    0 0
 1  7506 10949    3 0.2284710
 1 10943 13175    0 0
 1 13169 20070    0 0
 1 20064 28278    0 0
 1 28272 29965  147 0.6033058
 1 29959 36604    0 0
 1 36598 37518  843 0.7459016
 1 37512 40365   52 0.4121901
 1 40359 48801    0 0
 ......
 ......
 etc. until the end of the files

Could you please help me? Which function does this?

Thank you in advance.

Answer 1

First, to make your example easier to reproduce, it is nice to include your data in a form like this (this is what dput() prints):

df1 <- structure(list(V1 = 1:10, V2 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), V3 = c(7506L, 28272L, 36598L, 37512L, 48795L, 50660L,
52850L, 54447L, 54816L, 70664L), V4 = c(10949L, 29965L, 37518L,
40365L, 50666L, 52365L, 54453L, 54527L, 64015L, 74349L), V5 = c(3L,
147L, 843L, 52L, 150L, 92L, 1337L, 279L, 2L, 17L), V6 = c(0.228471,
0.6033058, 0.7459016, 0.4121901, 0.8050847, 0.6995614, 0.8991597,
0.9858824, 0.2787356, 0.5549451)), class = "data.frame", row.names = c(NA,
-10L))


df2 <- structure(list(V1 = 1:10, V2 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), V3 = c(1L, 7506L, 10943L, 13169L, 20064L, 28272L,
29959L, 36598L, 37512L, 40359L), V4 = c(7512L, 10949L, 13175L,
20070L, 28278L, 29965L, 36604L, 37518L, 40365L, 48801L)), class = "data.frame", row.names = c(NA,
-10L))

Then build a key from the two position columns (V3 and V4) in each data set and use match() to find the matching positions:

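# paste start and end together with a separator, so that e.g. 1/7512 and 17/512 give different keys
index <- match(paste(df2$V3, df2$V4, sep = "_"), paste(df1$V3, df1$V4, sep = "_"))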
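A caveat about building keys with paste0(): without a separator, two different start/end pairs can collapse to the same string. The pair (17, 512) below is hypothetical and only used to illustrate the collision; it is not in the data:

```r
# Without a separator, different start/end pairs can produce the same key
key_a <- paste0(1, 7512)    # "17512"
key_b <- paste0(17, 512)    # "17512" as well
identical(key_a, key_b)     # TRUE, so match() could report a false hit

# With a separator the keys stay distinct
key_c <- paste(1, 7512, sep = "_")    # "1_7512"
key_d <- paste(17, 512, sep = "_")    # "17_512"
identical(key_c, key_d)               # FALSE
```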

Then use that index to fill in the values in your second data frame:

df2$V5 <- df1$V5[index]
df2$V6 <- df1$V6[index]

Your actual column names may differ, of course: I quickly copied and pasted your data, so the row numbers came along as an extra column (V1).

df2

   V1 V2    V3    V4  V5        V6
1   1  1     1  7512  NA        NA
2   2  1  7506 10949   3 0.2284710
3   3  1 10943 13175  NA        NA
4   4  1 13169 20070  NA        NA
5   5  1 20064 28278  NA        NA
6   6  1 28272 29965 147 0.6033058
7   7  1 29959 36604  NA        NA
8   8  1 36598 37518 843 0.7459016
9   9  1 37512 40365  52 0.4121901
10 10  1 40359 48801  NA        NA
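The question allows either NA or 0 for rows without a match. If zeros are preferred, the NA values left by the lookup can be overwritten afterwards. A minimal sketch on a shortened copy of the data (two rows of df1 and three rows of df2, same layout as in this answer):

```r
# Shortened versions of the data frames above
df1 <- data.frame(V1 = 1:2, V2 = c(1L, 1L),
                  V3 = c(7506L, 28272L), V4 = c(10949L, 29965L),
                  V5 = c(3L, 147L), V6 = c(0.228471, 0.6033058))
df2 <- data.frame(V1 = 1:3, V2 = c(1L, 1L, 1L),
                  V3 = c(1L, 7506L, 10943L), V4 = c(7512L, 10949L, 13175L))

# Same match() lookup as above (here with a "_" separator between start and end)
index <- match(paste(df2$V3, df2$V4, sep = "_"),
               paste(df1$V3, df1$V4, sep = "_"))
df2$V5 <- df1$V5[index]
df2$V6 <- df1$V6[index]

# Rows without a match received NA; replace those with 0
df2$V5[is.na(df2$V5)] <- 0L
df2$V6[is.na(df2$V6)] <- 0
df2
```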

Answer 2

If I understand correctly, the OP wants to right join df1 with df2 on the key columns V1, V2, and V3. The result will consist of all rows of df2, with columns V4 and V5 appended from df1 where the keys match (NA elsewhere).

One possible implementation uses data.table:

library(data.table)
setDT(df1)[setDT(df2), on = .(V1, V2, V3)]

    V1    V2    V3  V4        V5
 1:  1     1  7512  NA        NA
 2:  1  7506 10949   3 0.2284710
 3:  1 10943 13175  NA        NA
 4:  1 13169 20070  NA        NA
 5:  1 20064 28278  NA        NA
 6:  1 28272 29965 147 0.6033058
 7:  1 29959 36604  NA        NA
 8:  1 36598 37518 843 0.7459016
 9:  1 37512 40365  52 0.4121901
10:  1 40359 48801  NA        NA
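The same right join can also be written with base R's merge(), without extra packages: with all.x = TRUE every row of the first argument (df2 here) is kept, and rows without a match get NA. A minimal sketch, assuming the column layout of the Data section below (row numbers dropped, so V1, V2, V3 are the key columns) and using only a few rows:

```r
# A few rows in the same layout as the fread() data below
df1 <- data.frame(V1 = c(1L, 1L), V2 = c(7506L, 28272L), V3 = c(10949L, 29965L),
                  V4 = c(3L, 147L), V5 = c(0.2284710, 0.6033058))
df2 <- data.frame(V1 = c(1L, 1L, 1L), V2 = c(1L, 7506L, 10943L),
                  V3 = c(7512L, 10949L, 13175L))

# Keep all rows of df2 and pull V4 and V5 from df1 where V1, V2 and V3 all match.
# By default (sort = TRUE) the result is ordered by the key columns.
df3 <- merge(df2, df1, by = c("V1", "V2", "V3"), all.x = TRUE)
df3
```

Because merge() sorts by the keys by default, the row order follows V1, V2, V3 rather than the original order of df2; here both coincide because df2 is already sorted by start position.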

Data

library(data.table)
df1 <- fread("rn V1    V2    V3   V4        V5
1   1  7506 10949    3 0.2284710
2   1 28272 29965  147 0.6033058
3   1 36598 37518  843 0.7459016
4   1 37512 40365   52 0.4121901
5   1 48795 50666  150 0.8050847
6   1 50660 52365   92 0.6995614
7   1 52850 54453 1337 0.8991597
8   1 54447 54527  279 0.9858824
9   1 54816 64015    2 0.2787356
10  1 70664 74349   17 0.5549451", drop = 1L)
df2 <- fread("rn V1    V2    V3
1     1     1  7512
2     1  7506 10949
3     1 10943 13175
4     1 13169 20070
5     1 20064 28278
6     1 28272 29965
7     1 29959 36604
8     1 36598 37518
9     1 37512 40365
10    1 40359 48801", drop = 1L)

2 answers

solution1 0 ACCEPTED 2019-02-25 11:07:32

solution2 0 2019-02-25 11:47:17