
Assign progressive ID based on two criteria

I have two columns holding the IDs of the participants in my study. The column ID contains consecutive numbers, assigned as if the subjects were all distinct people. The second column, new_ID, records which IDs correspond to the same person. Unfortunately, its values are not in progressive order.

ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6)
new_ID <- c(8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10)

data.frame(ID, new_ID)

#   ID  new_ID
#1   1       8
#2   1       8
#3   1       8
#4   1       8
#5   2      10
#6   2      10
#7   2      10
#8   2      10
#9   2      10
#10  2      10
#11  3       8
#12  3       8
#13  3       8
#14  3       8
#15  3       8
#16  4       4
#17  4       4
#18  4       4
#19  4       4
#20  4       4
#21  4       4
#22  5       5
#23  5       5
#24  5       5
#25  5       5
#26  6      10
#27  6      10
#28  6      10
#29  6      10
#30  6      10
#31  6      10
#32  6      10

Below is what I would like to achieve, i.e. assigning a new ID (ID_final) based on the information in the first two columns. Any help would be appreciated (ideally using dplyr)!


#   ID new_ID ID_final
#1   1      8        1
#2   1      8        1
#3   1      8        1
#4   1      8        1
#5   2     10        2
#6   2     10        2
#7   2     10        2
#8   2     10        2
#9   2     10        2
#10  2     10        2
#11  3      8        1
#12  3      8        1
#13  3      8        1
#14  3      8        1
#15  3      8        1
#16  4      4        4
#17  4      4        4
#18  4      4        4
#19  4      4        4
#20  4      4        4
#21  4      4        4
#22  5      5        5
#23  5      5        5
#24  5      5        5
#25  5      5        5
#26  6     10        2
#27  6     10        2
#28  6     10        2
#29  6     10        2
#30  6     10        2
#31  6     10        2
#32  6     10        2

Here's a data.table solution as well.

EDIT: added a dplyr solution too, at the request of the OP.

library(data.table)
ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6)
new_ID <- c(8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10)

d <- data.table(ID, new_ID)
d[, ID_final := min(ID), by = new_ID]
d
#>     ID new_ID ID_final
#>  1:  1      8        1
#>  2:  1      8        1
#>  3:  1      8        1
#>  4:  1      8        1
#>  5:  2     10        2
#>  6:  2     10        2
#>  7:  2     10        2
#>  8:  2     10        2
#>  9:  2     10        2
#> 10:  2     10        2
#> 11:  3      8        1
#> 12:  3      8        1
#> 13:  3      8        1
#> 14:  3      8        1
#> 15:  3      8        1
#> 16:  4      4        4
#> 17:  4      4        4
#> 18:  4      4        4
#> 19:  4      4        4
#> 20:  4      4        4
#> 21:  4      4        4
#> 22:  5      5        5
#> 23:  5      5        5
#> 24:  5      5        5
#> 25:  5      5        5
#> 26:  6     10        2
#> 27:  6     10        2
#> 28:  6     10        2
#> 29:  6     10        2
#> 30:  6     10        2
#> 31:  6     10        2
#> 32:  6     10        2
#>     ID new_ID ID_final

library(dplyr)
df <- data.frame(ID, new_ID)
df <- df %>% group_by(new_ID)  %>%
  mutate(ID_final = min(ID))
df
#> # A tibble: 32 x 3
#> # Groups:   new_ID [4]
#>       ID new_ID ID_final
#>    <dbl>  <dbl>    <dbl>
#>  1     1      8        1
#>  2     1      8        1
#>  3     1      8        1
#>  4     1      8        1
#>  5     2     10        2
#>  6     2     10        2
#>  7     2     10        2
#>  8     2     10        2
#>  9     2     10        2
#> 10     2     10        2
#> # ... with 22 more rows

Created on 2019-09-30 by the reprex package (v0.3.0)
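
For readers who want to avoid extra packages, the same "smallest ID per new_ID group" idea can be sketched in base R. This is only an illustrative alternative, not part of the original answer; it assumes the ID and new_ID vectors from the question and relies on ave() from the stats package, which is attached by default.

```r
ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6)
new_ID <- c(8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10)
df <- data.frame(ID, new_ID)

# ave() applies FUN within each group of new_ID and returns a vector
# aligned with the original rows, so no join or reordering is needed.
# FUN must be passed by name, because unnamed arguments are treated
# as grouping variables.
df$ID_final <- ave(df$ID, df$new_ID, FUN = min)

head(df)
```

As in the grouped versions above, the row order of df is left unchanged.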

What you want to do is find the correct ID for each new_ID, and then join to that mapping.

final_id_map <- df %>% group_by(new_ID) %>% summarise(ID_final = min(ID))
final_id_map
#> # A tibble: 4 x 2
#>   new_ID ID_final
#>    <dbl>    <dbl>
#> 1      4        4
#> 2      5        5
#> 3      8        1
#> 4     10        2

Then you can just do a

df %>% left_join(final_id_map, by = "new_ID")

to produce the desired output.
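
The two steps above can be put together as a self-contained sketch. It assumes dplyr is installed and that df holds only the original ID and new_ID columns; the name result is purely illustrative, and the explicit by = "new_ID" key is added here to make the join column unambiguous.

```r
library(dplyr)

ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6)
new_ID <- c(8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10)
df <- data.frame(ID, new_ID)

# Step 1: build the lookup table, one row per new_ID,
# holding the smallest original ID seen in that group.
final_id_map <- df %>%
  group_by(new_ID) %>%
  summarise(ID_final = min(ID))

# Step 2: attach the lookup to every row of df.
# left_join() keeps all rows of df in their original order.
result <- df %>% left_join(final_id_map, by = "new_ID")

head(result)
```

Keeping the mapping as a separate table can be handy if the same new_ID to ID_final lookup has to be applied to other data sets later; when it is needed only once, the group_by() plus mutate() version does the same work in one step.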
