Order multiple columns in R

Sample data:

now <- data.frame(id=c(123,123,123,222,222,222,135,135,135),year=c(2002,2001,2003,2006,2007,2005,2001,2002,2003),freq=c(3,1,2,2,3,1,3,1,2))

Desired output:

wanted <- data.frame(id=c(123,123,123,222,222,222,135,135,135),year=c(2001,2002,2003,2005,2006,2007,2001,2002,2003),freq=c(1,2,3,1,2,3,1,2,3))

This solution works, but I get a memory error (cannot assign 134kb...):

library(plyr)
ddply(now, .(id), transform, year = sort(year))

Please note that I need a fast solution, as my data frame has 300K rows and 50 columns. Thanks.

You can sort it with dplyr, where sorting is done by the arrange function. dplyr is also faster than plyr.

library(dplyr)
wanted <- now %>% arrange(id, year)
# or: wanted <- arrange(now, id, year)

> wanted
#   id year freq
#1 123 2001    1
#2 123 2002    3
#3 123 2003    2
#4 135 2001    3
#5 135 2002    1
#6 135 2003    2
#7 222 2005    1
#8 222 2006    2
#9 222 2007    3

You could do the same with base R:

wanted <- now[order(now$id, now$year),]

However, there is a difference between your now and wanted data frames for id == 123 and year == 2002: in your now df the freq is 3, while it is 2 in the wanted df. Based on your question, I assume this is a typo and that you did not actually want to change the freq values.
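Note also that wanted in the question lists the ids in their original order (123, 222, 135), whereas sorting on id puts 135 before 222. If that original order matters, one possible base R sketch is to sort on the position at which each id first appears instead of on its value. The names first_seen and sorted_keep_ids are made up for this illustration, and the freq values travel with their rows, so they will not match the wanted data frame from the question.

```r
now <- data.frame(
  id   = c(123, 123, 123, 222, 222, 222, 135, 135, 135),
  year = c(2002, 2001, 2003, 2006, 2007, 2005, 2001, 2002, 2003),
  freq = c(3, 1, 2, 2, 3, 1, 3, 1, 2)
)

# Rank each id by where it first appears: 123 -> 1, 222 -> 2, 135 -> 3
first_seen <- match(now$id, unique(now$id))

# Sort by that rank first, then by year within each id
sorted_keep_ids <- now[order(first_seen, now$year), ]

# Drop the old row names so the rows are numbered 1..n again
rownames(sorted_keep_ids) <- NULL
```

This stays a single call to order, so it should scale in much the same way as the plain order(now$id, now$year) version.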

You could use the base R function order here:

now <- now[order(now$id, now$year), ]

or data.table for faster performance:

library(data.table)
# setDT converts now to a data.table by reference; assign the sorted result to keep it
now <- setDT(now)[order(id, year)]

or

now <- data.table(now, key = c("id", "year"))

or

setDT(now)
setkey(now, id, year)
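If you only need the rows reordered and do not need a key, data.table also provides setorder, which sorts in place. A minimal sketch, assuming the data.table package is installed; dt is a name chosen for this example, and as.data.table is used so that the original now stays untouched:

```r
library(data.table)

now <- data.frame(
  id   = c(123, 123, 123, 222, 222, 222, 135, 135, 135),
  year = c(2002, 2001, 2003, 2006, 2007, 2005, 2001, 2002, 2003),
  freq = c(3, 1, 2, 2, 3, 1, 3, 1, 2)
)

# as.data.table makes a copy; setDT(now) would convert now itself by reference
dt <- as.data.table(now)

# Reorder the rows of dt in place by id, then by year (ascending by default)
setorder(dt, id, year)
```

Unlike setkey, setorder does not mark the columns as a key, and it accepts a minus sign for descending order, for example setorder(dt, id, -year).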
