How can I split a data.frame by two columns and count the rows in each group more efficiently?

I have a data.frame with more than 120,000 rows. It looks like this:

> head(mydf)
ID MONTH.YEAR VALUE
1 110  JAN. 2012  1000
2 111  JAN. 2012  1000
3 121  FEB. 2012  3000
4 131  FEB. 2012  3000
5 141  MAR. 2012  5000
6 142  MAR. 2012  4000

I want to split the data.frame by the MONTH.YEAR and VALUE columns and count the rows in each group. My expected result looks like this:

MONTH.YEAR VALUE count
JAN. 2012  1000  2
FEB. 2012  3000  2
MAR. 2012  5000  1
MAR. 2012  4000  1

I tried splitting it and using sapply to count the rows in each group. This is my code:

sp <- split(mydf, list(mydf$MONTH.YEAR, mydf$VALUE), drop = TRUE)
counts <- sapply(sp, nrow)
result <- data.frame(yearandvalue = names(counts), count = counts)

But this is very slow. Is there a more efficient way to implement this? Thank you very much.

Try

aggregate(ID~., mydf, length)
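
One thing to be aware of with the aggregate() call: the result keeps the name of the left-hand variable, so the counts end up in a column called ID rather than count. A minimal sketch, assuming the data consists of just the six sample rows from the question (the name `res` and the renaming step are additions for illustration, not part of the original answer):

```r
# Sample rows copied from the question
mydf <- data.frame(
  ID = c(110, 111, 121, 131, 141, 142),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                 "FEB. 2012", "MAR. 2012", "MAR. 2012"),
  VALUE = c(1000, 1000, 3000, 3000, 5000, 4000)
)

# Count the ID entries within each MONTH.YEAR / VALUE combination
res <- aggregate(ID ~ ., mydf, length)

# aggregate() keeps the left-hand variable's name, so rename it to "count"
names(res)[names(res) == "ID"] <- "count"
print(res)
```

The rows come back ordered by the grouping columns, not in the order they first appear in mydf.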

Or

library(dplyr)
mydf %>%
  group_by(MONTH.YEAR, VALUE) %>%
  summarise(count = n())

Or

library(data.table)
setDT(mydf)[, list(count=.N) , list(MONTH.YEAR, VALUE)]
