
Specific Join of two Dataframes

I have two data frames, df1 and df2:

> df1

     ID  Gender      age      cd       evnt     scr     test_dt
1 C0004    MALE       22       1          1      82    7/3/2014
2 C0004    MALE       22       1          2      76    7/3/2014
3 C0005    MALE       22       1          3    1514    7/3/2014
4 C0005    MALE       23       2          1      81   11/3/2014
5 C0006    MALE       23       2          2      75   11/3/2014
6 C0006    MALE       23       2          3     878   11/3/2014

and

> df2

     ID    hgt    wt     phys_dt
1 C0004     70   147   6/29/2015
2 C0004     70   157   6/27/2016
3 C0005     67   175   6/27/2016
4 C0005     65   171    7/2/2014
5 C0006     69   160   6/29/2015
6 C0006     64   143    7/2/2014

I want to join df1 and df2 in a way that yields the following data frame, call it df3:

> df3

     ID   Gender      age      cd       evnt     scr     hgt     wt
1 C0004     MALE       22       1          1      82      70    147
2 C0004     MALE       22       1          2      76      70    157
3 C0005     MALE       22       1          3    1514      67    175
4 C0005     MALE       23       2          1      81      65    171
5 C0006     MALE       23       2          2      75      69    160
6 C0006     MALE       23       2          3     878      64    143

I'm trying to add df2$hgt and df2$wt to the proper ID row. The tricky part is that I want to join hgt and wt to the ID row whose dates (df1$test_dt and df2$phys_dt) most closely align. I was thinking I could first sort the two data frames by ID, then by their respective dates, and then try to join? I'm not quite sure how to approach this. Thanks.

If you want to merge by matching only df1$ID and df2$ID, the following should do it:

df3 <- left_join(df1, df2, by = c("ID" = "ID"))  

If the date should match exactly as well as the ID, you could try:

df3 <- left_join(df1, df2, by = c("ID" = "ID", "test_dt" = "phys_dt")) 

left_join() comes from the dplyr package, so load it first with library(dplyr).
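Neither join above picks the closest date, which is what the question asks for. Here is a minimal sketch in base R of one way to do that, under some assumptions: the dates are month/day/year strings, "most closely align" means the smallest absolute gap in days, and a df2 row may be reused for several df1 rows. The helper column names row_id and gap are made up for illustration. Only the first four rows of each table from the question are used.

```r
# Sample data copied from the question (first four rows, fewer columns).
df1 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005"),
  evnt    = c(1, 2, 3, 1),
  scr     = c(82, 76, 1514, 81),
  test_dt = c("7/3/2014", "7/3/2014", "7/3/2014", "11/3/2014"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID      = c("C0004", "C0004", "C0005", "C0005"),
  hgt     = c(70, 70, 67, 65),
  wt      = c(147, 157, 175, 171),
  phys_dt = c("6/29/2015", "6/27/2016", "6/27/2016", "7/2/2014"),
  stringsAsFactors = FALSE
)

# Parse the date strings (assumed to be month/day/year).
df1$test_dt <- as.Date(df1$test_dt, format = "%m/%d/%Y")
df2$phys_dt <- as.Date(df2$phys_dt, format = "%m/%d/%Y")

# Tag each df1 row so it can still be identified after the join.
df1$row_id <- seq_len(nrow(df1))

# Pair every df1 row with every df2 row that shares its ID;
# all.x = TRUE keeps df1 rows that have no match in df2.
pairs <- merge(df1, df2, by = "ID", all.x = TRUE)

# Absolute gap in days between the test date and the physical date.
pairs$gap <- abs(as.numeric(pairs$test_dt - pairs$phys_dt))

# Sort by df1 row, then by gap, and keep the first (closest) pair per row.
pairs   <- pairs[order(pairs$row_id, pairs$gap), ]
nearest <- pairs[!duplicated(pairs$row_id), ]
rownames(nearest) <- NULL

print(nearest[, c("ID", "evnt", "scr", "hgt", "wt", "gap")])
```

Because a df2 row can be picked more than once here, the result will not necessarily reproduce the df3 shown in the question, where each df2 row appears exactly once. If that one-to-one pairing is required, an extra rule for assigning rows would be needed.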
