
Merge/join longitudinal data based on two variables

I am trying to merge two longitudinal datasets, both of which are in long format.

df1:
patientid visit mental-health
703-FD    1     depressed
703-FD    2     depressed
703-FD    3     depressed
707-NM    1     non-depressed
707-NM    2     non-depressed
707-NM    3     depressed 

df2:
patientid visit HIV_disclosure 
703-FD    1     yes
703-FD    2     yes
703-FD    3     yes
707-NM    1     no
707-NM    2     no
707-NM    3     yes

Code I've tried:

data_combined <- full_join(x=df1, y=df2, by="patientid")

This returns every combination of visits within each patient:

patientid visit.x mental-health  visit.y   HIV_disclosure
703-FD    1       depressed      1         yes
703-FD    1       depressed      2         yes
703-FD    1       depressed      3         yes
703-FD    2       depressed      1         yes
703-FD    2       depressed      2         yes
703-FD    2       depressed      3         yes
703-FD    3       depressed      1         yes
703-FD    3       depressed      2         yes
703-FD    3       depressed      3         yes
707-NM    1     non-depressed    1         no
707-NM    1     non-depressed    2         no
707-NM    1     non-depressed    3         yes
707-NM    2     non-depressed    1         no
707-NM    2     non-depressed    2         no
707-NM    2     non-depressed    3         yes
707-NM    3     depressed        1         no
707-NM    3     depressed        2         no
707-NM    3     depressed        3         yes

How do I edit the above code to merge by both the patientid and the visit variable?

I've tried:

library(dplyr)
data_combined <- full_join(x=df1, y=df2, by="patientid", "visit")

Desired joined/merged dataframe:

patientid visit mental-health  HIV_disclosure
703-FD    1     depressed      yes
703-FD    2     depressed      yes
703-FD    3     depressed      yes
707-NM    1     non-depressed  no
707-NM    2     non-depressed  no
707-NM    3     depressed      yes

I'm sure it's a simple fix, but I've been struggling with it for a while; please assist.

By default, the dplyr join functions join on all variables that the two data frames have in common. In your data, those variables are patientid and visit. So, for the sample data you provide, the following simplified code should work:

library(dplyr)
data_combined <- full_join(x=df1, y=df2)

If you want to name the two columns explicitly (for example, because the data frames share other columns that should not be used as keys), pass a character vector to the by argument:

data_combined <- full_join(x=df1, y=df2, by = c("patientid", "visit"))

Your original code only supplied by = "patientid". Because "visit" came after the comma, it was not part of by; R matched it positionally to the next unfilled argument of full_join() instead of treating it as a second join column.
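
To check this on the sample data, here is a minimal reproducible sketch. It rebuilds the two data frames from the question and compares the two-key join with the patientid-only join. One assumption: the column is named mental_health rather than mental-health, since a hyphenated name would need backticks in R.

```r
library(dplyr)

# Sample data from the question.
# Assumption: `mental_health` replaces `mental-health` to keep the name syntactic.
df1 <- data.frame(
  patientid = rep(c("703-FD", "707-NM"), each = 3),
  visit = rep(1:3, times = 2),
  mental_health = c("depressed", "depressed", "depressed",
                    "non-depressed", "non-depressed", "depressed")
)
df2 <- data.frame(
  patientid = rep(c("703-FD", "707-NM"), each = 3),
  visit = rep(1:3, times = 2),
  HIV_disclosure = c("yes", "yes", "yes", "no", "no", "yes")
)

# Join on both keys: one row per patient and visit.
data_combined <- full_join(df1, df2, by = c("patientid", "visit"))

# Join on patientid only: every visit in df1 is paired with every visit
# in df2 for the same patient. Recent dplyr versions may warn about a
# many-to-many relationship here, so the warning is suppressed.
by_id_only <- suppressWarnings(full_join(df1, df2, by = "patientid"))

nrow(data_combined)  # 6
nrow(by_id_only)     # 18
```

The row counts show the difference: 6 rows when both keys are used, versus 18 (3 x 3 visits for each of 2 patients) when only patientid is used, which matches the duplicated output in the question.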
