
Aggregating according to factor levels in R (create new columns?)

I have a data frame (called array) with a column of dates and a column of categories. I want to aggregate the categories by date, counting the occurrences. If I just do:

array <- aggregate(array$category,by=list(array$date),FUN="length")

I get the total number of rows per date, with all categories lumped together. What I want is the number of occurrences of each level of the category factor within each date.

I have several data sets, each with its own set of categories. The number of categories varies from 5 to 9, and a given date may contain only some of them.

Example data:

category dateop
   3 05/07/2012
   3 05/07/2012
   4 05/07/2012
   4 05/07/2012
   4 05/07/2012
   4 05/07/2012
   5 05/07/2012
   5 05/07/2012
   5 05/07/2012
   3 05/07/2012
   3 05/07/2012
   3 05/07/2012
   3 03/07/2012
   1 04/07/2012
   5 05/07/2012
   5 05/07/2012
   5 05/07/2012
   5 05/07/2012
   5 05/07/2012
   3 05/07/2012
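For anyone who wants to reproduce this, the sample above can be built as a data frame roughly as follows. This is only a sketch: the name df and the column name date are assumptions (the header above says dateop, while the code in this thread uses date), and the dates are kept as plain strings.

```r
# Sample data from the table above, in the same row order.
# Assumptions: the data frame is called `df` and the date column `date`.
df <- data.frame(
  category = c(3, 3, 4, 4, 4, 4, 5, 5, 5, 3, 3, 3, 3, 1, 5, 5, 5, 5, 5, 3),
  date = c(rep("05/07/2012", 12), "03/07/2012", "04/07/2012",
           rep("05/07/2012", 6)),
  stringsAsFactors = FALSE
)

# Quick sanity check on the shape of the data.
str(df)
```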

I guess I have to create new columns that hold the occurrences of each category. I have written a function that loops over each level and creates a new column, but I was wondering whether there is a faster way to do this with less code.

Thank you!

Here are a few simple solutions (I'll call your data set df, though that's no better a name than array):

library(data.table)  
setDT(df)[, .(occurrences  = .N), .(date, category)]

#          date category occurrences
# 1: 05/07/2012        3           6
# 2: 05/07/2012        4           4
# 3: 05/07/2012        5           8
# 4: 03/07/2012        3           1
# 5: 04/07/2012        1           1

Or

library(dplyr)
df %>%
  group_by(date, category) %>%
  summarise(occurrences = n())

# Source: local data table [5 x 3]
# Groups: date
# 
#         date category occurrences
# 1 05/07/2012        3           6
# 2 05/07/2012        4           4
# 3 05/07/2012        5           8
# 4 03/07/2012        3           1
# 5 04/07/2012        1           1

Or with base R

df$occurrences <- 1
aggregate(occurrences ~ date + category, df, sum)
#         date category occurrences
# 1 04/07/2012        1           1
# 2 03/07/2012        3           1
# 3 05/07/2012        3           6
# 4 05/07/2012        4           4
# 5 05/07/2012        5           8
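The call from the question can also be kept nearly as it is: grouping on both columns in by makes length count rows per date and category, without the helper column. A minimal sketch, assuming a data frame df with columns category and date rebuilt here from the question's sample; the output column name occurrences is set by hand, since aggregate names it x by default in this form.

```r
# Rebuild the sample data (assumed names: df, category, date).
df <- data.frame(
  category = c(3, 3, 4, 4, 4, 4, 5, 5, 5, 3, 3, 3, 3, 1, 5, 5, 5, 5, 5, 3),
  date = c(rep("05/07/2012", 12), "03/07/2012", "04/07/2012",
           rep("05/07/2012", 6)),
  stringsAsFactors = FALSE
)

# Group by date AND category; length() then counts rows in each group.
res <- aggregate(df$category,
                 by = list(date = df$date, category = df$category),
                 FUN = length)

# The counted column is called "x" by default; rename it for readability.
names(res)[3] <- "occurrences"
print(res)
```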

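And @akrun's fully vectorized solution. It assumes df is a plain data.frame whose first two columns are category and date; after setDT(df), df[2:1] would select rows rather than columns.

# !!Freq is shorthand for Freq != 0: drop date/category pairs that never occur
subset(as.data.frame(table(df[2:1])), !!Freq)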
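If the goal really is one new column per category, as the title suggests, the same cross-tabulation can be kept in wide form instead of being melted back to long form. A rough sketch using base R's xtabs, again assuming a data frame df with columns category and date built from the question's sample; the name wide is just illustrative.

```r
# Rebuild the sample data (assumed names: df, category, date).
df <- data.frame(
  category = c(3, 3, 4, 4, 4, 4, 5, 5, 5, 3, 3, 3, 3, 1, 5, 5, 5, 5, 5, 3),
  date = c(rep("05/07/2012", 12), "03/07/2012", "04/07/2012",
           rep("05/07/2012", 6)),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per date, one column per category level.
counts <- xtabs(~ date + category, data = df)

# Convert the contingency table to an ordinary data frame, keeping the
# wide layout (dates become row names, category levels become columns).
wide <- as.data.frame.matrix(counts)
print(wide)
```

Unlike the long-format answers, this keeps date/category combinations that never occur, filled with 0, and it works unchanged however many category levels a data set has.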
