
Grouping rows in a matrix by the value of one vector in R

I'm working with a very large matrix that looks something like this (VNUMBER is the number of that particular visit for the person with the corresponding ID):

ID  VNUMBER
23  1
23  2
23  3
37  1
37  2
15  4
15  5
47  1
47  2
47  3
47  4
15  1
15  2
15  3

I'd like to group all the rows so that I have blocks of the same ID number, in order by visit. For this example, I'd like to rearrange the matrix so that all the rows where ID=15 are together and in order by VNUMBER, so the resulting matrix would look like:

ID  VNUMBER
23  1
23  2
23  3
37  1
37  2
15  1
15  2
15  3
15  4
15  5
47  1
47  2
47  3
47  4

As you can see, it doesn't matter to me what order the IDs are in, as long as they're in groups and the corresponding visit numbers within each group are in ascending order.

Thus far all I can come up with is to create a new matrix using something like:

id2 <- sort(ID)
f <- as.numeric(levels(factor(ID)))
vnum2 <- c(VNUMBER[ID==f[1]],VNUMBER[ID==f[2]],VNUMBER[ID==f[3]],VNUMBER[ID==f[4]])

I can then make a new matrix with the id2 and vnum2 vectors that has the format I want. But there must be some simpler way to do so. Like I said, the actual matrix I'm working with is large (about 100,000 rows and 1,000 columns), so the method above is not feasible and I'd like to avoid long loops.

Sorry if my question is too long or ill-worded; this is my first time using the site. Any help would be great.

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1). We then order by the 'ID' column converted to a factor whose levels are the unique elements of 'ID' (so the groups keep their order of first appearance), followed by 'VNUMBER'. This gives the expected output shown in the OP's post.

library(data.table)
setDT(df1)[order(factor(ID, levels=unique(ID)), VNUMBER)]
#    ID VNUMBER
# 1: 23       1
# 2: 23       2
# 3: 23       3
# 4: 37       1
# 5: 37       2
# 6: 15       1
# 7: 15       2
# 8: 15       3
# 9: 15       4
#10: 15       5
#11: 47       1
#12: 47       2
#13: 47       3
#14: 47       4
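The factor trick above does not depend on data.table itself. As a rough base R sketch, assuming the question's example data is stored in a data.frame named df1 (the name used in this answer; the variables id_f and res_df are introduced here only for illustration), the same ordering can be reproduced like this:

```r
# Example data copied from the question; the name df1 follows the answer.
df1 <- data.frame(
  ID      = c(23, 23, 23, 37, 37, 15, 15, 47, 47, 47, 47, 15, 15, 15),
  VNUMBER = c(1, 2, 3, 1, 2, 4, 5, 1, 2, 3, 4, 1, 2, 3)
)

# Factor levels follow the order of first appearance: 23, 37, 15, 47.
id_f <- factor(df1$ID, levels = unique(df1$ID))

# order() sorts a factor by its integer level codes, so the groups keep
# their first-appearance order; VNUMBER breaks ties within each group.
res_df <- df1[order(id_f, df1$VNUMBER), ]
rownames(res_df) <- NULL  # renumber the rows 1..n for readability
print(res_df)
```

Because the levels are taken from unique(ID), the group order is whatever order the IDs first occur in the data, which is why 15 lands between 37 and 47 here.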

Or we can use match. If the initial dataset is a matrix, then

m1[order(match(m1[,'ID'], unique(m1[,'ID'])), m1[,'VNUMBER']),]
#   ID VNUMBER
#1  23       1
#2  23       2
#3  23       3
#4  37       1
#5  37       2
#12 15       1
#13 15       2
#14 15       3
#6  15       4
#7  15       5
#8  47       1
#9  47       2
#10 47       3
#11 47       4
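To make the match step easier to follow, here is a small self-contained sketch. It rebuilds the question's matrix and adds one hypothetical extra column, SCORE, standing in for the many other columns of the real data; SCORE, grp, idx and res_m are names made up for this illustration. The row index is computed once and then applied to the whole matrix, so every column moves together.

```r
# Example matrix from the question (ID and VNUMBER only).
m1 <- cbind(
  ID      = c(23, 23, 23, 37, 37, 15, 15, 47, 47, 47, 47, 15, 15, 15),
  VNUMBER = c(1, 2, 3, 1, 2, 4, 5, 1, 2, 3, 4, 1, 2, 3)
)
# Hypothetical extra column so we can see that whole rows are moved.
m1 <- cbind(m1, SCORE = seq_len(nrow(m1)) * 10)

# match() maps each ID to the position of its first occurrence in
# unique(ID), i.e. 23 -> 1, 37 -> 2, 15 -> 3, 47 -> 4.
grp <- match(m1[, "ID"], unique(m1[, "ID"]))

# Sort by group position first, then by visit number within the group.
idx <- order(grp, m1[, "VNUMBER"])

# One row-index operation reorders all columns at once.
res_m <- m1[idx, , drop = FALSE]
print(res_m)
```

The index idx is an ordinary integer vector, so no explicit loop over IDs is needed, which is the part the question wanted to avoid.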

A similar approach using dplyr would be

library(dplyr)
df1 %>% 
    arrange( match(ID, unique(ID)), VNUMBER)

NOTE: Both the dplyr and data.table methods assume the initial dataset is a data.frame. We can convert the matrix to a data.frame by

df1 <- as.data.frame(m1)
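The question says the order of the ID groups themselves does not matter. Under that assumption, a plain two-key sort in base R may be enough, without match or factor; the groups then come out in ascending ID order rather than first-appearance order. A minimal sketch on the question's example data (res_sorted is a name chosen here for illustration):

```r
# Example matrix from the question.
m1 <- cbind(
  ID      = c(23, 23, 23, 37, 37, 15, 15, 47, 47, 47, 47, 15, 15, 15),
  VNUMBER = c(1, 2, 3, 1, 2, 4, 5, 1, 2, 3, 4, 1, 2, 3)
)

# Sort by ID, then by VNUMBER within each ID.
# Groups end up in ascending ID order: 15, 23, 37, 47.
res_sorted <- m1[order(m1[, "ID"], m1[, "VNUMBER"]), , drop = FALSE]
print(res_sorted)
```

If the first-appearance order of the groups does matter, the match or factor versions above are the ones to use.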
