How can I split a data frame by two columns and count the rows in each group more efficiently?

Question

I have a data.frame with more than 120,000 rows. It looks like this:

> head(mydf)
   ID MONTH.YEAR VALUE
1 110  JAN. 2012  1000
2 111  JAN. 2012  1000
3 121  FEB. 2012  3000
4 131  FEB. 2012  3000
5 141  MAR. 2012  5000
6 142  MAR. 2012  4000

I want to split the data.frame by the MONTH.YEAR and VALUE columns and count the rows in each group. The result I expect looks like this:

MONTH.YEAR VALUE count
 JAN. 2012  1000     2
 FEB. 2012  3000     2
 MAR. 2012  5000     1
 MAR. 2012  4000     1

I tried splitting it and using sapply to count the rows in each group. This is my code:

sp <- split(mydf, list(mydf$MONTH.YEAR, mydf$VALUE), drop = TRUE)
# Count the rows of each piece once and reuse the result
counts <- sapply(sp, nrow)
result <- data.frame(yearandvalue = names(counts), count = counts)

However, this is very slow. Is there a more efficient way to implement this? Thank you very much.

Answer 1

Try

aggregate(ID~., mydf, length)

Or

library(dplyr)
mydf %>%
    group_by(MONTH.YEAR, VALUE) %>%
    summarise(count = n())

Or

library(data.table)
setDT(mydf)[, list(count = .N), by = list(MONTH.YEAR, VALUE)]
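
To make the first option easier to check, here is a minimal base R sketch. The sample data is an assumption: it is rebuilt from the six rows of head(mydf) shown in the question, not from the full 120,000-row data set. The name `counts` and the renaming of the ID column to count are added for illustration only; aggregate itself names the result column ID.

```r
# Sample data reconstructed from the head(mydf) output in the question
# (an assumption: the real data has more than 120,000 rows)
mydf <- data.frame(
  ID = c(110, 111, 121, 131, 141, 142),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                 "FEB. 2012", "MAR. 2012", "MAR. 2012"),
  VALUE = c(1000, 1000, 3000, 3000, 5000, 4000),
  stringsAsFactors = FALSE
)

# ID ~ . groups by every column other than ID (here MONTH.YEAR and VALUE);
# length() then counts how many IDs fall into each group
counts <- aggregate(ID ~ ., data = mydf, FUN = length)

# aggregate() keeps the name ID for the counted column; rename it to count
names(counts)[names(counts) == "ID"] <- "count"

counts
```

The row order of the result may differ from the order in the question. Note also that the formula `ID ~ .` uses all remaining columns as grouping variables, so if the real data frame has columns beyond these three, it is safer to write the formula explicitly as `ID ~ MONTH.YEAR + VALUE`.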

solution1
9 ACCEPTED 2015-05-18 04:14:03