How to pair rows with the same value in one column of a dataframe in R

I have data in the following form:

set.seed(1234)
data <- data.frame(cbind(runif(40,0,10), rep(seq(1,20,1), each = 2)))
data <- data[sample(nrow(data)),]
colnames(data) <- c("obs","subject")
head(data)

    obs      subject
1.5904600      12
8.1059855      13
5.4497484       6
0.3999592      12
2.5880982      19
2.6682078       9
   ...         ...

Each subject (column "subject", numbered from 1 to 20) has exactly two observations (column "obs").

I would like to group rows by the value of the "subject" column. More precisely, I would like to reorder the data so that rows for the same subject are adjacent, while keeping subjects in the order in which they first appear above. The final data would look something like this:

    obs      subject
1.5904600      12
0.3999592      12
8.1059855      13
2.3656473      13
5.4497484       6
7.2934746       6

Any ideas? I thought of identifying the rows corresponding to each subject with which:

which(data$subject==x)

and then combining these rows with rbind in a loop, but I am sure there is a simpler and faster way to do this, isn't there?
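
For reference, the loop-style approach described above could be sketched as follows. This is a minimal illustration in base R on a small made-up data frame (`toy`, with hypothetical values), not the data above; `lapply` with `do.call(rbind, ...)` stands in for an explicit loop.

```r
# Toy data (hypothetical values), subjects in shuffled order
toy <- data.frame(obs = c(1.5, 8.1, 5.4, 0.4, 2.3, 7.2),
                  subject = c(12, 13, 6, 12, 13, 6))

# For each subject, in order of first appearance, pick its rows with which()
pieces <- lapply(unique(toy$subject),
                 function(x) toy[which(toy$subject == x), ])

# Stack the per-subject pieces back into one data frame
grouped <- do.call(rbind, pieces)
grouped$subject   # 12 12 13 13 6 6
```

This works, but it builds one small data frame per subject, which is why a vectorised ordering is usually preferable on larger data.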

Convert subject to a factor whose levels follow the order of first appearance, then order by that factor:

data$group <- factor(data$subject, levels = unique(data$subject))
data[ order(data$group), ]

#           obs subject group
# 1  1.59046003      12    12
# 4  0.39995918      12    12
# 2  8.10598552      13    13
# 30 2.18799541      13    13
# ...
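
If adding a helper column is undesirable, the same ordering can probably be obtained by ranking each row by the position at which its subject first appears. The sketch below uses only base R (`match`, `unique`, `order`) on a made-up data frame (`toy`, hypothetical values).

```r
# Toy data (hypothetical values), subjects in shuffled order
toy <- data.frame(obs = c(1.5, 8.1, 5.4, 0.4, 2.3, 7.2),
                  subject = c(12, 13, 6, 12, 13, 6))

# For each row, the index of its subject in order of first appearance
first_seen <- match(toy$subject, unique(toy$subject))   # 1 2 3 1 2 3

# order() leaves ties in their original order, so rows within a subject keep their order
result <- toy[order(first_seen), ]
result$subject   # 12 12 13 13 6 6
```

No extra column is left behind, so there is nothing to drop afterwards.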

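Nest the obs column within each subject and unnest again. The resulting tibble retains the order in which subjects first appear, but rows for the same subject are grouped together.

library(tidyr)  # also re-exports %>%
# tidyr >= 1.0 syntax; older versions used nest(obs) and unnest()
data %>% nest(rows = obs) %>% unnest(rows)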

# A tibble: 6 × 2
#  subject       obs
#    <int>     <dbl>
#1      12 1.5904600
#2      12 0.3999592
#3      13 8.1059855
#4       6 5.4497484
#5      19 2.5880982
#6       9 2.6682078

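This is based on zx8754's answer, but it preserves the data type:

library(dplyr)  # provides arrange()

# Factor whose levels follow the order in which subjects first appear
group <- factor(data[, "subject"], levels = unique(data[, "subject"]))
data <- cbind(data, group)
data <- arrange(as.data.frame(data), group)  # sort rows by the helper factor
data <- as.matrix(data[, -3])                # drop the helper column; result is a numeric matrix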

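dplyr is a great package with various useful verbs, one of which is arrange(variable), which puts the rows of each subject next to each other, and more elegantly (the result is also a data.frame, so you don't need cbind). Note that it sorts by the subject values themselves, so subjects come out in numeric order, not in order of first appearance: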

require(dplyr)
as.data.frame(data) %>% arrange(subject)
# or, if you want reverse order:
as.data.frame(data) %>% arrange(-subject)
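
To keep subjects in order of first appearance with `arrange`, one option is to sort on a computed key and not on `subject` itself. A minimal sketch on made-up data (`toy`, hypothetical values), assuming dplyr is installed:

```r
library(dplyr)

# Toy data (hypothetical values), subjects in shuffled order
toy <- data.frame(obs = c(1.5, 8.1, 5.4, 0.4, 2.3, 7.2),
                  subject = c(12, 13, 6, 12, 13, 6))

# Sort on the index of each subject's first appearance, not on its value
result <- toy %>% arrange(match(subject, unique(subject)))
result$subject   # 12 12 13 13 6 6
```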

(For that matter, data.table is great too. The dtplyr package lets you combine the two by using data.table as a backend for dplyr.)
