How to merge two dataframes by columns and alternating the columns that don't match in R

I have to do a difference-in-differences (DiD) analysis, and for that I have two dataframes with exactly the same columns but different values (pre treatment and post treatment). In addition, I lost some participants from one period to the next, so some values won't be taken into account:

Survey1:

ID  City    Children  Q1   Q2
1   Paris   Yes       0.5  Yes
2   NY      No        1    No
3   London  No        NA   Yes
4   Madrid  Yes       2.1  No
5   Paris   Yes       1.8  Yes
6   Paris   No        NA   Yes
7   NY      Yes       3    Yes
8   Madrid  Yes       0.8  No
9   Paris   No        2.5  No
10  Paris   No        1    Yes

Survey2:

ID  City    Children  Q1   Q2
1   Paris   Yes       1    Yes
3   London  No        2    Yes
4   Madrid  Yes       0.5  Yes
6   Paris   No        2    Yes
7   NY      Yes       1.8  Yes
9   Paris   Yes       2.5  Yes
10  Paris   No        1    No

As you can see, in Survey2 I have lost subjects 2, 5 and 8, and subject 9 had a baby in the meantime.

I would like to merge both dataframes by ID, alternating the columns and, if possible, renaming them to make clear which columns are pre treatment and which are post treatment:

Result:

ID  City    Children  Q1   Q1_t  Q2   Q2_t
1   Paris   Yes       0.5  1     Yes  Yes
3   London  No        NA   2     Yes  Yes
4   Madrid  Yes       2.1  0.5   No   Yes
6   Paris   No        NA   2     Yes  Yes
7   NY      Yes       3    1.8   Yes  Yes
9   Paris   Yes       2.5  2.5   No   Yes
10  Paris   No        1    1     Yes  No

When I use merge(Survey1, Survey2, by = "ID") it keeps the correct IDs, but since subject 9 had a baby it gives me an extra observation, so my Result dataframe has 8 obs. instead of only 7 obs. (since I lost 3 subjects). For subject 9, I would like to keep only the observation from after the baby was born. It also tries to merge some of my questions, which eliminates my pre and post treatment observations, and it puts the columns in the order Q1, Q2, Q1, Q2 instead of Q1, Q1, Q2, Q2. Also, I don't know how to merge while adding "_t" to the end of the second dataframe's column names.

Does anybody have an idea?

#For Survey 1
a <- c("1","2","3","4","5","6","7","8","9","10")
b <- c("Paris", "NY", "London", "Madrid", "Paris", "Paris", "NY", "Madrid", "Paris", "Paris")
c <- c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No")
d <- c(0.5, 1, NA, 2.1, 1.8, NA, 3, 0.8, 2.5, 1)
e <- c("Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes")
Survey1 <- data.frame(a,b,c,d,e)
names(Survey1) <- c("ID", "City", "Children", "Q1", "Q2")
Survey1

#For Survey2
a <- c("1","3","4","6","7","9","10")
b <- c("Paris", "London", "Madrid", "Paris", "NY", "Paris", "Paris")
c <- c("Yes", "No", "Yes", "No", "Yes", "Yes", "No")
d <- c(1, 2, 0.5, 2, 1.8, 2.5, 1)
e <- c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No")
Survey2 <- data.frame(a,b,c,d,e)
names(Survey2) <- c("ID", "City", "Children", "Q1", "Q2")
Survey2 

Your data is not reproducible, so I haven't tried my solution, but by the looks of your data, it seems you want to do:

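library(tidyverse)
results <- Survey1 %>%
  # drop the pre-treatment Children so the value from Survey2 is the one kept
  select(-Children) %>%
  # right_join keeps only the subjects still present in Survey2
  right_join(Survey2 %>%
               rename(Q1_t = Q1,
                      Q2_t = Q2),
             by = c("ID", "City")) %>%
  # reorder to ID, City, Children, Q1, Q1_t, Q2, Q2_t
  relocate(Children, .after = City) %>%
  relocate(Q1_t, .after = Q1)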
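
For comparison, here is a base R sketch of the same idea that avoids the tidyverse dependency. It rebuilds the example data from the question, uses the `suffixes` argument of `merge()` to tag the Survey2 answers with "_t", and then interleaves the pre and post columns. The assumptions are mine, not the asker's: every question column is taken to start with "Q", City and Children are taken from Survey2 (the post-treatment values), and the names `q_cols`, `merged`, `ordered` and `result` are just illustrative.

```r
# Example data from the question
Survey1 <- data.frame(
  ID       = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
  City     = c("Paris", "NY", "London", "Madrid", "Paris",
               "Paris", "NY", "Madrid", "Paris", "Paris"),
  Children = c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No"),
  Q1       = c(0.5, 1, NA, 2.1, 1.8, NA, 3, 0.8, 2.5, 1),
  Q2       = c("Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)
Survey2 <- data.frame(
  ID       = c("1", "3", "4", "6", "7", "9", "10"),
  City     = c("Paris", "London", "Madrid", "Paris", "NY", "Paris", "Paris"),
  Children = c("Yes", "No", "Yes", "No", "Yes", "Yes", "No"),
  Q1       = c(1, 2, 0.5, 2, 1.8, 2.5, 1),
  Q2       = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Assumption: the question columns are exactly those whose names start with "Q"
q_cols <- grep("^Q", names(Survey1), value = TRUE)

# Inner join on ID only. Survey1 contributes just ID and the question columns,
# so City and Children come from Survey2. suffixes = c("", "_t") leaves the
# Survey1 names unchanged and appends "_t" to the clashing Survey2 names.
merged <- merge(Survey1[c("ID", q_cols)], Survey2,
                by = "ID", suffixes = c("", "_t"))

# merge() sorts the character IDs alphabetically ("1", "10", "3", ...),
# so restore the row order of Survey2
merged <- merged[match(Survey2$ID, merged$ID), ]
rownames(merged) <- NULL

# Interleave each question with its "_t" counterpart: Q1, Q1_t, Q2, Q2_t, ...
ordered <- c("ID", "City", "Children",
             as.vector(rbind(q_cols, paste0(q_cols, "_t"))))
result <- merged[ordered]
result
```

Because the join key is only ID, a subject whose City or Children changed between surveys (such as subject 9) still ends up as a single row, and the interleaving step scales to any number of question columns without listing them by hand.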
