
Grouping rows in a matrix by the value of one vector in R

I'm working with a very large matrix that looks something like this (VNUMBER is the number of that particular visit for the person with the corresponding ID):

ID  VNUMBER
23  1
23  2
23  3
37  1
37  2
15  4
15  5
47  1
47  2
47  3
47  4
15  1
15  2
15  3

I'd like to group all the rows so that I have blocks of the same ID number in order by visit. For this example, I'd like to rearrange the matrix so that all the rows where ID=15 are together and in order by VNUMBER, so the resulting matrix would look like:

ID  VNUMBER
23  1
23  2
23  3
37  1
37  2
15  1
15  2
15  3
15  4
15  5
47  1
47  2
47  3
47  4

As you can see, it doesn't really matter to me what order the IDs are in, as long as they're in groups and the corresponding visit numbers in those groups are in ascending order.

Thus far all I can come up with is to create a new matrix using something like:

id2 <- sort(ID)
f <- as.numeric(levels(factor(ID)))
vnum2 <- c(VNUMBER[ID==f[1]],VNUMBER[ID==f[2]],VNUMBER[ID==f[3]],VNUMBER[ID==f[4]])

I can then make a new matrix with the id2 and vnum2 vectors that has the format I want. But there must be some simpler way to do so. Like I said, the actual matrix I'm working with is large (about 100,000 rows and 1,000 columns) so the method above is not feasible and I'd like to avoid long loops.

Sorry if my question is too long or ill-worded, this is my first time using the site. Any help would be great.

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1). We order on the 'ID' column after converting it to a factor whose levels are the unique elements of 'ID' (in order of first appearance), followed by 'VNUMBER'. This gives the expected output shown in the OP's post.

library(data.table)
setDT(df1)[order(factor(ID, levels=unique(ID)), VNUMBER)]
#    ID VNUMBER
# 1: 23       1
# 2: 23       2
# 3: 23       3
# 4: 37       1
# 5: 37       2
# 6: 15       1
# 7: 15       2
# 8: 15       3
# 9: 15       4
#10: 15       5
#11: 47       1
#12: 47       2
#13: 47       3
#14: 47       4
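
To see why the factor trick preserves the original group order, here is a minimal sketch that uses base R's `order` on a plain data.frame instead of data.table (a swapped-in equivalent, not the data.table call above). The `df1` below is rebuilt from the sample data in the question; `id_f` and `res` are names introduced only for illustration.

```r
# Sample data from the question, rebuilt as a data.frame (assumed layout)
df1 <- data.frame(
  ID      = c(23, 23, 23, 37, 37, 15, 15, 47, 47, 47, 47, 15, 15, 15),
  VNUMBER = c(1, 2, 3, 1, 2, 4, 5, 1, 2, 3, 4, 1, 2, 3)
)

# Levels follow first appearance (23, 37, 15, 47) rather than numeric order,
# so ordering on the factor keeps the groups in their original sequence
id_f <- factor(df1$ID, levels = unique(df1$ID))

# Sort by group first, then by visit number within each group
res <- df1[order(id_f, df1$VNUMBER), ]
rownames(res) <- NULL  # drop the original row labels
```

Because `order` sorts a factor by its integer codes, the first ID seen gets code 1, the second code 2, and so on.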

Or we can use match. If the initial dataset is a matrix, then

m1[order(match(m1[,'ID'], unique(m1[,'ID'])), m1[,'VNUMBER']),]
#   ID VNUMBER
#1  23       1
#2  23       2
#3  23       3
#4  37       1
#5  37       2
#12 15       1
#13 15       2
#14 15       3
#6  15       4
#7  15       5
#8  47       1
#9  47       2
#10 47       3
#11 47       4
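
The one-liner above can be broken into steps to show what `match` contributes. This is a sketch under the assumption that `m1` is a numeric matrix with columns named 'ID' and 'VNUMBER', rebuilt here from the question's sample data; `grp`, `idx` and `sorted` are illustrative names.

```r
# Sample data from the question as a numeric matrix (assumed layout)
m1 <- cbind(
  ID      = c(23, 23, 23, 37, 37, 15, 15, 47, 47, 47, 47, 15, 15, 15),
  VNUMBER = c(1, 2, 3, 1, 2, 4, 5, 1, 2, 3, 4, 1, 2, 3)
)

# Position of each ID among the unique IDs, i.e. its rank by first
# appearance: 23 -> 1, 37 -> 2, 15 -> 3, 47 -> 4
grp <- match(m1[, "ID"], unique(m1[, "ID"]))

# Row permutation: by group rank, then by visit number within the group
idx <- order(grp, m1[, "VNUMBER"])

# Index the rows once; every column of the matrix is carried along
sorted <- m1[idx, , drop = FALSE]
```

Since only a single integer index vector is computed, the same `idx` reorders all columns of a wide matrix in one step, with no loop over IDs.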

A similar approach using dplyr would be

library(dplyr)
df1 %>% 
    arrange( match(ID, unique(ID)), VNUMBER)

NOTE: Both the dplyr and data.table methods assume the initial dataset is a data.frame. We can convert the matrix to a data.frame with

df1 <- as.data.frame(m1)
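
The question mentions that the order of the ID groups themselves does not matter. If that is the case, a simpler sketch is to order directly on both columns, which groups the rows by ascending ID rather than by first appearance. The `m1` below is again rebuilt from the question's sample data, and `sorted_simple` is an illustrative name.

```r
# Sample data from the question as a numeric matrix (assumed layout)
m1 <- cbind(
  ID      = c(23, 23, 23, 37, 37, 15, 15, 47, 47, 47, 47, 15, 15, 15),
  VNUMBER = c(1, 2, 3, 1, 2, 4, 5, 1, 2, 3, 4, 1, 2, 3)
)

# Sort by ID (ascending), breaking ties by VNUMBER; groups come out
# as 15, 23, 37, 47 instead of in order of first appearance
sorted_simple <- m1[order(m1[, "ID"], m1[, "VNUMBER"]), , drop = FALSE]
```

This drops the `match`/`factor` step entirely, at the cost of not reproducing the exact group order shown in the question.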
