Join and match two data frames in R
I have two data frames. The first data frame consists of four columns: 1) ID, 2) Site, 3) Depth, and 4) Density. The second data frame consists of three columns: 1) ID, 2) Site, and 3) Choice (i.e., favorite site).
df1
ID Site Depth Density
1 B 0.1 0
2 C 0.2 0
3 C 0.2 1
4 A 0.05 0
5 A 0.05 1
6 B 0.1 1
7 B 0.1 2
8 B 0.1 3
9 D 0.3 0
10 C 0.2 2
11 D 0.3 1
12 D 0.3 2
13 D 0.3 3
14 D 0.3 4
15 D 0.3 5
df2
ID Site Choice
1 A No
1 B Yes
1 C No
1 D No
2 A No
2 B No
2 C Yes
2 D No
3 A No
3 B No
3 C Yes
3 D No
4 A Yes
4 B No
4 C No
4 D No
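To make the problem reproducible, the two tables above can be entered as data frames. This is a minimal sketch in base R; it assumes the third column of df2 is named Choice, which is the name the code in this post refers to.

```r
# Rebuild the example data from the tables above (values copied verbatim).
df1 <- data.frame(
  ID      = 1:15,
  Site    = c("B", "C", "C", "A", "A", "B", "B", "B",
              "D", "C", "D", "D", "D", "D", "D"),
  Depth   = c(0.1, 0.2, 0.2, 0.05, 0.05, 0.1, 0.1, 0.1,
              0.3, 0.2, 0.3, 0.3, 0.3, 0.3, 0.3),
  Density = c(0, 0, 1, 0, 1, 1, 2, 3, 0, 2, 1, 2, 3, 4, 5)
)

# One row per ID and site; Choice marks the site that ID selected.
# Assumption: the column is called Choice (not Choices).
df2 <- data.frame(
  ID     = rep(1:4, each = 4),
  Site   = rep(c("A", "B", "C", "D"), times = 4),
  Choice = c("No",  "Yes", "No",  "No",
             "No",  "No",  "Yes", "No",
             "No",  "No",  "Yes", "No",
             "Yes", "No",  "No",  "No")
)
```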
I am trying to add a column to df2 that holds the density at each site at the moment each ID selected its favorite site.
Desired output:
ID Site Depth Density Choice
1 A 0.05 0 No
1 B 0.1 0 Yes
1 C 0.2 0 No
1 D 0.3 0 No
2 A 0.05 0 No
2 B 0.1 1 No
2 C 0.2 0 Yes
2 D 0.3 0 No
3 A 0.05 0 No
3 B 0.1 1 No
3 C 0.2 1 Yes
3 D 0.3 0 No
4 A 0.05 0 Yes
4 B 0.1 1 No
4 C 0.2 2 No
4 D 0.3 0 No
Explanation: When ID 1 selected site B, the density at sites A, B, C, and D was 0. When ID 2 selected site C, the density was 0 at site A, 1 at site B, 0 at site C, and 0 at site D. When ID 3 selected site C, the density at site A was still 0 (no ID had chosen site A yet), site B had 1, site C had 1, and site D had 0, and so on.
I've tried using full_join and mutate, but I am not getting my desired output:
df3<-df2 %>%
full_join(df1, by = c("ID", "Site")) %>%
group_by(ID) %>%
mutate(Density= Density[Choice == "Yes"] ) %>%
distinct(ID, Site, .keep_all = TRUE)
I think Density is the running total of how many IDs have selected each site. To calculate that, I would do this:
library(dplyr)

df3 <- df2 %>%
  full_join(df1, by = c("ID", "Site")) %>%
  arrange(ID, Site) %>%  # make sure IDs are in ascending order
  group_by(Site) %>%
  mutate(Density = cumsum(Choice == "Yes"))  # running count of "Yes" per site
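One caveat: a plain cumsum includes the current ID's own choice, whereas the explanation in the question describes the density just before each ID chooses (ID 1 sees 0 at site B, ID 2 sees 1). The sketch below is one possible way to get that "earlier IDs only" count, by subtracting the current row's own "Yes" from the running total. It rests on two assumptions: Depth depends only on Site, and the hypothetical lookup table `depths` stands in for something like distinct(df1, Site, Depth). The data are typed in by hand so that the block runs on its own.

```r
library(dplyr)

# Example input copied from the question (IDs 1 to 4).
df2 <- data.frame(
  ID     = rep(1:4, each = 4),
  Site   = rep(c("A", "B", "C", "D"), times = 4),
  Choice = c("No",  "Yes", "No",  "No",
             "No",  "No",  "Yes", "No",
             "No",  "No",  "Yes", "No",
             "Yes", "No",  "No",  "No")
)

# Hypothetical lookup: one Depth per Site (assumed constant per site, as in df1).
depths <- data.frame(
  Site  = c("A", "B", "C", "D"),
  Depth = c(0.05, 0.1, 0.2, 0.3)
)

df3 <- df2 %>%
  left_join(depths, by = "Site") %>%   # attach Depth to every ID/Site row
  arrange(ID, Site) %>%                # IDs must be in order of choosing
  group_by(Site) %>%
  # Running count of "Yes" per site, minus the current row's own "Yes",
  # so each row sees only the choices made by earlier IDs.
  mutate(Density = cumsum(Choice == "Yes") - (Choice == "Yes")) %>%
  ungroup() %>%
  select(ID, Site, Depth, Density, Choice)

print(df3)
```

With this counting rule, ID 3 sees a density of 1 at site B and 1 at site C, which agrees with the written explanation in the question. If the inclusive count is what is wanted instead, drop the subtraction.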