How can I split a data frame by two columns and count the rows in each group more efficiently?
I have a data.frame with more than 120,000 rows. It looks like this:
> head(mydf)
   ID MONTH.YEAR VALUE
1 110  JAN. 2012  1000
2 111  JAN. 2012  1000
3 121  FEB. 2012  3000
4 131  FEB. 2012  3000
5 141  MAR. 2012  5000
6 142  MAR. 2012  4000
I want to split the data.frame by the MONTH.YEAR and VALUE columns and count the rows in each group. My expected result looks like this:
MONTH.YEAR VALUE count
 JAN. 2012  1000     2
 FEB. 2012  3000     2
 MAR. 2012  5000     1
 MAR. 2012  4000     1
I tried splitting it and using sapply to count the rows in each group. This is my code:
sp <- split(mydf, list(mydf$MONTH.YEAR, mydf$VALUE), drop=TRUE);
result <- data.frame(yearandvalue = names(sapply(sp, nrow)), count = sapply(sp, nrow))
But I find the process very slow. Is there a more efficient way to implement this? Thank you very much.
Try
aggregate(ID~., mydf, length)
Or
library(dplyr)
mydf %>%
  group_by(MONTH.YEAR, VALUE) %>%
  summarise(count = n())
Or
library(data.table)
setDT(mydf)[, list(count=.N) , list(MONTH.YEAR, VALUE)]
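To check any of these against the sample in the question, here is a minimal base R sketch. It rebuilds only the six rows shown there (the column types are my assumption) and runs the aggregate version. Note that aggregate(ID ~ ., ...) leaves the counts in a column named ID, so the rename at the end is my addition to match the expected output.

```r
# Sample data copied from the question (first six rows only);
# the column types are assumed, since the original data is not shown in full
mydf <- data.frame(
  ID = c(110, 111, 121, 131, 141, 142),
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012",
                 "FEB. 2012", "MAR. 2012", "MAR. 2012"),
  VALUE = c(1000, 1000, 3000, 3000, 5000, 4000),
  stringsAsFactors = FALSE
)

# Count rows per (MONTH.YEAR, VALUE) pair: length(ID) is the group size
counts <- aggregate(ID ~ ., data = mydf, FUN = length)

# The counts are stored in the ID column, so rename it to "count"
names(counts)[names(counts) == "ID"] <- "count"

print(counts)
```

The row order of the result follows the sorted grouping values, so it may differ from the order in the question.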