Join and match two data frames in R

Question

I have two data frames. The first data frame consists of four columns: 1) ID, 2) Site, 3) Depth, and 4) Density. The second data frame consists of three columns: 1) ID, 2) Site, and 3) Choice (i.e., favorite site).

df1

  ID  Site Depth Density      
  1     B   0.1       0
  2     C   0.2       0
  3     C   0.2       1
  4     A  0.05       0
  5     A  0.05       1
  6     B   0.1       1
  7     B   0.1       2
  8     B   0.1       3
  9     D   0.3       0
 10     C   0.2       2
 11     D   0.3       1
 12     D   0.3       2
 13     D   0.3       3
 14     D   0.3       4
 15     D   0.3       5

df2

     ID     Site   Choice
      1       A     No
      1       B     Yes
      1       C     No
      1       D     No
      2       A     No
      2       B     No
      2       C     Yes
      2       D     No
      3       A     No
      3       B     No
      3       C     Yes
      3       D     No
      4       A     Yes
      4       B     No
      4       C     No
      4       D     No

I am trying to add a column to df2 that gives the density at each site at the moment each ID selected its favorite site.

Desired output:

     ID     Site   Depth  Density    Choice
      1       A      0.05     0         No
      1       B      0.1      0         Yes
      1       C      0.2      0         No
      1       D      0.3      0         No
      2       A      0.05     0         No
      2       B      0.1      1         No
      2       C      0.2      0         Yes
      2       D      0.3      0         No
      3       A      0.05     0         No
      3       B      0.1      0         No
      3       C      0.2      1         Yes
      3       D      0.3      0         No
      4       A      0.05     0         Yes
      4       B      0.1      1         No
      4       C      0.2      2         No
      4       D      0.3      0         No

df2 explanation: When ID 1 selected site B, the density was 0 at sites A, B, C, and D. When ID 2 selected site C, the density was 0 at site A, 1 at site B, 0 at site C, and 0 at site D. When ID 3 selected site C, the density at A was still 0 (no ID had chosen site A yet), B had 1, C had 1, and D had 0, and so on.

I've tried using the full_join and mutate functions, but I am not getting my desired output:

df3 <- df2 %>%
  full_join(df1, by = c("ID", "Site")) %>%
  group_by(ID) %>%
  mutate(Density = Density[Choice == "Yes"]) %>%
  distinct(ID, Site, .keep_all = TRUE)

Answer 1

I think Density is the running total of how many groups have selected each site. To calculate that, I would do this:

library(dplyr)

df3 <- df2 %>%
  full_join(df1, by = c("ID", "Site")) %>%
  arrange(ID, Site) %>%  ## make sure IDs are in ascending order
  group_by(Site) %>%
  mutate(Density = cumsum(Choice == "Yes"))
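
A possible refinement, offered as a sketch rather than a definitive fix: a plain running total also counts the current ID's own choice, while the question's explanation describes the density just before each ID chooses. Subtracting the current row's choice from the running total gives the number of earlier IDs that picked the site. The sketch below also swaps the full join for a left join against a per-site depth lookup (`site_depth`, a helper name introduced here), on the assumption that each site has a single depth and that only the IDs present in df2 are wanted. The sample data is copied from the question, with df1's Density column left out because it is recomputed. Note that this follows the written explanation, which gives site B a density of 1 when ID 3 chooses, whereas the desired-output table shows 0 on that row.

```r
library(dplyr)

# Sample data from the question (df1 without its Density column)
df1 <- data.frame(
  ID    = 1:15,
  Site  = c("B", "C", "C", "A", "A", "B", "B", "B", "D", "C",
            "D", "D", "D", "D", "D"),
  Depth = c(0.1, 0.2, 0.2, 0.05, 0.05, 0.1, 0.1, 0.1, 0.3, 0.2,
            0.3, 0.3, 0.3, 0.3, 0.3)
)
df2 <- data.frame(
  ID     = rep(1:4, each = 4),
  Site   = rep(c("A", "B", "C", "D"), times = 4),
  Choice = c("No",  "Yes", "No",  "No",
             "No",  "No",  "Yes", "No",
             "No",  "No",  "Yes", "No",
             "Yes", "No",  "No",  "No")
)

# Hypothetical helper: one row per site with its depth
site_depth <- distinct(df1, Site, Depth)

df3 <- df2 %>%
  left_join(site_depth, by = "Site") %>%   # attach Depth to every ID/Site row
  arrange(ID, Site) %>%                    # IDs must be in choice order
  group_by(Site) %>%
  # running total of "Yes" per site, minus the current row's own choice,
  # i.e. how many earlier IDs had already picked this site
  mutate(Density = cumsum(Choice == "Yes") - (Choice == "Yes")) %>%
  ungroup() %>%
  select(ID, Site, Depth, Density, Choice)

print(df3)
```

If the ordering of choices is determined by something other than ID (a timestamp, for instance), the `arrange()` call would need to sort by that column instead.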

Solution 1
0 · Accepted · 2020-12-02 04:49:18
