Repeating sets of rows according to the number of rows by column in R with data.table

Currently in R, I am trying to do the following with a data.table:

Suppose my data looks like:

Class   Person ID      Index
A       1              3
A       2              3
A       5              3
B       7              2
B       12             2
C       18             1
D       25             2
D       44             2

Here, Class refers to the class a person belongs to. The Person ID variable is a unique identifier for a person. Finally, Index tells us how many people are in each class.

From this, I would like to create a new data table like so:

Class   Person ID      Index
A       1              3
A       2              3
A       5              3
A       1              3
A       2              3
A       5              3
A       1              3
A       2              3
A       5              3
B       7              2
B       12             2
B       7              2
B       12             2
C       18             1
D       25             2
D       44             2
D       25             2
D       44             2

where each set of persons is repeated by class according to the Index variable. Hence, the rows for class A are repeated 3 times because their Index is 3.

So far, my code looks like:

setDT(data)[, list(Class = rep(Person ID, seq_len(.N)), Person ID = sequence(seq_len(.N)), by = Index]

However, I am not getting the correct result, and I feel like there is a simpler way to do this. Would anyone have any ideas? Thank you!

If that particular order is important to you, then perhaps something like this should work:

setDT(data)[, list(PersonID, sequence(rep(.N, Index))), by = list(Class, Index)]
#     Class Index PersonID V2
#  1:     A     3        1  1
#  2:     A     3        2  2
#  3:     A     3        5  3
#  4:     A     3        1  1
#  5:     A     3        2  2
#  6:     A     3        5  3
#  7:     A     3        1  1
#  8:     A     3        2  2
#  9:     A     3        5  3
# 10:     B     2        7  1
# 11:     B     2       12  2
# 12:     B     2        7  1
# 13:     B     2       12  2
# 14:     C     1       18  1
# 15:     D     2       25  1
# 16:     D     2       44  2
# 17:     D     2       25  1
# 18:     D     2       44  2
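
The call above relies on PersonID being recycled against the longer sequence column. As a hedged alternative, the repetition can be spelled out with rep(). This is only a sketch: it assumes the question's table has been rebuilt as shown, with the ID column named PersonID as in the answer's output.

```r
library(data.table)

# Example data from the question (ID column named PersonID, an assumption).
data <- data.table(
  Class    = c("A", "A", "A", "B", "B", "C", "D", "D"),
  PersonID = c(1, 2, 5, 7, 12, 18, 25, 44),
  Index    = c(3, 3, 3, 2, 2, 1, 2, 2)
)

# Inside j, a grouping column such as Index is a single value per group,
# so rep() repeats the group's whole PersonID vector Index times.
result <- data[, .(PersonID = rep(PersonID, times = Index)), by = .(Class, Index)]

# Optional: restore the column order used in the question.
setcolorder(result, c("Class", "PersonID", "Index"))
print(result)
```

Because the repetition is explicit, no helper column (V2 above) is created.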

If the order is not important, perhaps:

setDT(data)[rep(1:nrow(data), Index)]
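
The difference between the two approaches comes down to the row-index vector used for subsetting. The base R sketch below builds both vectors from the question's Class and Index columns. The names each_row and each_set are illustrative only, and split() is assumed to be acceptable here because the classes are already in sorted order.

```r
Class <- c("A", "A", "A", "B", "B", "C", "D", "D")
Index <- c(3, 3, 3, 2, 2, 1, 2, 2)

# Row-wise: every row is repeated Index times in place
# (the vector used by the order-agnostic one-liner).
each_row <- rep(seq_along(Index), times = Index)
print(each_row)
# 1 1 1 2 2 2 3 3 3 4 4 5 5 6 7 7 8 8

# Set-wise: each class's block of rows is repeated as a whole.
each_set <- unlist(
  lapply(split(seq_along(Index), Class), function(rows) {
    rep(rows, times = Index[rows[1]])
  }),
  use.names = FALSE
)
print(each_set)
# 1 2 3 1 2 3 1 2 3 4 5 4 5 6 7 8 7 8
```

Both vectors select the same rows the same number of times; only the ordering of the result differs.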

Here is a way using dplyr, in case you want to try it:

library(dplyr)
data %>%
  group_by(Class) %>%
  do(data.frame(.[sequence(.$Index[row(.)[, 1]]), ]))

which gives the output:

 #      Class Person.ID Index
 #1      A         1     3
 #2      A         2     3
 #3      A         5     3
 #4      A         1     3
 #5      A         2     3
 #6      A         5     3
 #7      A         1     3
 #8      A         2     3
 #9      A         5     3
 #10     B         7     2
 #11     B        12     2
 #12     B         7     2
 #13     B        12     2
 #14     C        18     1
 #15     D        25     2
 #16     D        44     2
 #17     D        25     2
 #18     D        44     2
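
Since do() is marked as superseded in recent dplyr releases, the same set-wise repetition can probably be written with slice() instead. The following is a minimal sketch, not the answerer's code: it assumes a data frame named data with the ID column called PersonID, and that every row of a class carries the same Index value.

```r
library(dplyr)

# Example data from the question (ID column named PersonID, an assumption).
data <- data.frame(
  Class    = c("A", "A", "A", "B", "B", "C", "D", "D"),
  PersonID = c(1, 2, 5, 7, 12, 18, 25, 44),
  Index    = c(3, 3, 3, 2, 2, 1, 2, 2)
)

result <- data %>%
  group_by(Class) %>%
  # Within each class, repeat the row positions 1..n() as a block,
  # as many times as the class's (first) Index value says.
  slice(rep(seq_len(n()), times = first(Index))) %>%
  ungroup()

print(result, n = 18)
```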
