Assign progressive IDs based on two criteria

Question

I have two columns holding the IDs of participants in my study. The column ID numbers the subjects sequentially, as if they were all distinct people. The second column, new_ID, indicates which IDs correspond to the same person. Unfortunately, its values are not in sequential order.

ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6)
new_ID <- c(8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10)

data.frame(ID, new_ID)

#   ID  new_ID
#1   1       8
#2   1       8
#3   1       8
#4   1       8
#5   2      10
#6   2      10
#7   2      10
#8   2      10
#9   2      10
#10  2      10
#11  3       8
#12  3       8
#13  3       8
#14  3       8
#15  3       8
#16  4       4
#17  4       4
#18  4       4
#19  4       4
#20  4       4
#21  4       4
#22  5       5
#23  5       5
#24  5       5
#25  5       5
#26  6      10
#27  6      10
#28  6      10
#29  6      10
#30  6      10
#31  6      10
#32  6      10

Below is what I would like to achieve, i.e. assigning a new ID (ID_final) based on the information in the first two columns. Any help will be appreciated (ideally using dplyr)!


#   ID new_ID ID_final
#1   1      8        1
#2   1      8        1
#3   1      8        1
#4   1      8        1
#5   2     10        2
#6   2     10        2
#7   2     10        2
#8   2     10        2
#9   2     10        2
#10  2     10        2
#11  3      8        1
#12  3      8        1
#13  3      8        1
#14  3      8        1
#15  3      8        1
#16  4      4        4
#17  4      4        4
#18  4      4        4
#19  4      4        4
#20  4      4        4
#21  4      4        4
#22  5      5        5
#23  5      5        5
#24  5      5        5
#25  5      5        5
#26  6     10        2
#27  6     10        2
#28  6     10        2
#29  6     10        2
#30  6     10        2
#31  6     10        2
#32  6     10        2

Answer 1

Here's a data.table solution as well.

EDIT: added a dplyr solution too, at the request of the OP.

library(data.table)
ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6)
new_ID <- c(8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10)

d <- data.table(ID, new_ID)
# For each new_ID group, assign the smallest original ID
d[, ID_final := min(ID), by = new_ID]
d
#>     ID new_ID ID_final
#>  1:  1      8        1
#>  2:  1      8        1
#>  3:  1      8        1
#>  4:  1      8        1
#>  5:  2     10        2
#>  6:  2     10        2
#>  7:  2     10        2
#>  8:  2     10        2
#>  9:  2     10        2
#> 10:  2     10        2
#> 11:  3      8        1
#> 12:  3      8        1
#> 13:  3      8        1
#> 14:  3      8        1
#> 15:  3      8        1
#> 16:  4      4        4
#> 17:  4      4        4
#> 18:  4      4        4
#> 19:  4      4        4
#> 20:  4      4        4
#> 21:  4      4        4
#> 22:  5      5        5
#> 23:  5      5        5
#> 24:  5      5        5
#> 25:  5      5        5
#> 26:  6     10        2
#> 27:  6     10        2
#> 28:  6     10        2
#> 29:  6     10        2
#> 30:  6     10        2
#> 31:  6     10        2
#> 32:  6     10        2
#>     ID new_ID ID_final

library(dplyr)
df <- data.frame(ID, new_ID)
# Within each new_ID group, take the smallest original ID
df <- df %>% group_by(new_ID) %>%
  mutate(ID_final = min(ID))
df
#> # A tibble: 32 x 3
#> # Groups:   new_ID [4]
#>       ID new_ID ID_final
#>    <dbl>  <dbl>    <dbl>
#>  1     1      8        1
#>  2     1      8        1
#>  3     1      8        1
#>  4     1      8        1
#>  5     2     10        2
#>  6     2     10        2
#>  7     2     10        2
#>  8     2     10        2
#>  9     2     10        2
#> 10     2     10        2
#> # ... with 22 more rows

Created on 2019-09-30 by the reprex package (v0.3.0)
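
For comparison, the same per-group minimum can be sketched in base R without loading any package. This is only a minimal sketch built on the question's sample data: df_base is a hypothetical name introduced here, and ave() is swapped in for the grouped operations used in the two solutions above.

```r
# Sample data from the question, written compactly with rep()
ID     <- rep(c(1, 2, 3, 4, 5, 6),   times = c(4, 6, 5, 6, 4, 7))
new_ID <- rep(c(8, 10, 8, 4, 5, 10), times = c(4, 6, 5, 6, 4, 7))
df_base <- data.frame(ID, new_ID)  # df_base is a hypothetical name

# ave() applies FUN within each new_ID group and returns a vector
# aligned with the original rows, so no join is needed
df_base$ID_final <- ave(df_base$ID, df_base$new_ID, FUN = min)

# One row per distinct (ID, new_ID, ID_final) combination
unique(df_base)
```

Like the grouped versions, this keeps the original row order, so ID_final can be compared row by row with the desired output in the question.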

Answer 2

What you want to do is find the correct ID for each new_ID, and then join to that mapping.

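library(dplyr)

# df is the data frame from the question: data.frame(ID, new_ID)
final_id_map <- df %>% group_by(new_ID) %>% summarise(ID_final = min(ID))
final_id_map
#> # A tibble: 4 x 2
#>   new_ID ID_final
#>    <dbl>    <dbl>
#> 1      4        4
#> 2      5        5
#> 3      8        1
#> 4     10        2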

Then you can join the mapping back to the original data on new_ID

df %>% left_join(final_id_map, by = "new_ID")

to produce the desired output.
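
The two steps above can be put together as one self-contained sketch. It assumes that dplyr's left_join() is the join intended here and rebuilds the question's data frame inline; the name result is hypothetical and used only for illustration.

```r
library(dplyr)

# Sample data from the question, written compactly with rep()
df <- data.frame(
  ID     = rep(c(1, 2, 3, 4, 5, 6),   times = c(4, 6, 5, 6, 4, 7)),
  new_ID = rep(c(8, 10, 8, 4, 5, 10), times = c(4, 6, 5, 6, 4, 7))
)

# Step 1: one row per new_ID, holding the smallest original ID in that group
final_id_map <- df %>%
  group_by(new_ID) %>%
  summarise(ID_final = min(ID))

# Step 2: left_join() keeps every row of df in its original order
# and attaches the matching ID_final
result <- df %>% left_join(final_id_map, by = "new_ID")

head(result)
```

Compared with group_by() plus mutate(), this route leaves a small lookup table (final_id_map) that can be inspected or reused on other data that shares the new_ID column.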

Answer 1: score 2, accepted, 2019-09-27 17:44:56

Answer 2: score 0, 2019-09-27 17:18:36
