
Adding row/column totals when aggregating data using the plyr and reshape2 packages in R

At work I frequently build aggregate tables using the workflow below:

library(plyr)      # ddply
library(reshape2)  # dcast

set.seed(1)
temp.df <- data.frame(var1=sample(letters[1:5],100,replace=TRUE),
                      var2=sample(11:15,100,replace=TRUE))
# count rows for every var1/var2 combination
temp.output <- ddply(temp.df,
                     c("var1","var2"),
                     function(df) {
                       data.frame(count=nrow(df))
                     })
# count rows per var2 only, labelled "all", to serve as the total
temp.output.all <- ddply(temp.df,
                         c("var2"),
                         function(df) {
                           data.frame(var1="all",
                                      count=nrow(df))
                         })

# append the totals, put "all" last, then reshape to wide format
temp.output <- rbind(temp.output,temp.output.all)
temp.output[,"var1"] <- factor(temp.output[,"var1"],levels=c(letters[1:5],"all"))
temp.output <- dcast(temp.output,formula=var2~var1,value.var="count",fill=0)

I am starting to feel silly writing this "boilerplate" code to add the row/column totals every time I create a new aggregate table. Is there a way to skip it?

Looking at your desired output (now that I'm in front of a computer), perhaps you should look at the margins argument of dcast:

library(reshape2)
dcast(temp.df, var2 ~ var1, value.var = "var2", 
      fun.aggregate=length, margins = "var1")
#   var2 a b c d e (all)
# 1   11 3 1 6 4 2    16
# 2   12 1 3 6 5 5    20
# 3   13 5 9 3 6 1    24
# 4   14 4 7 3 6 2    22
# 5   15 0 5 1 5 7    18
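
The call above adds only the "(all)" column. If a totals row is wanted as well, a minimal sketch (assuming, as the reshape2 documentation describes, that margins = TRUE computes every margin) could look like this. The data is rebuilt here so the snippet runs on its own:

```r
library(reshape2)

set.seed(1)
temp.df <- data.frame(var1 = sample(letters[1:5], 100, replace = TRUE),
                      var2 = sample(11:15, 100, replace = TRUE))

# margins = TRUE requests all margins: an "(all)" row and an "(all)" column
temp.output <- dcast(temp.df, var2 ~ var1, value.var = "var2",
                     fun.aggregate = length, margins = TRUE)

# the bottom-right cell is the grand total, i.e. nrow(temp.df)
temp.output[nrow(temp.output), "(all)"]
```

This would replace both ddply calls and the rbind step from the question with a single dcast call.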

Also look into the addmargins function in base R.
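
As a rough sketch of the addmargins route, assuming plain counts are all that is needed: build a contingency table with table() and let addmargins() append the totals. The names counts, with.totals and row.totals.only are only illustrative.

```r
set.seed(1)
temp.df <- data.frame(var1 = sample(letters[1:5], 100, replace = TRUE),
                      var2 = sample(11:15, 100, replace = TRUE))

# cross-tabulate: var2 in rows, var1 in columns
counts <- table(var2 = temp.df$var2, var1 = temp.df$var1)

# by default addmargins() adds a "Sum" row and a "Sum" column
with.totals <- addmargins(counts)

# margin = 2 adds only a "Sum" column holding the row totals
row.totals.only <- addmargins(counts, margin = 2)

# convert to a data frame if a table object is inconvenient downstream
as.data.frame.matrix(with.totals)
```

This needs neither plyr nor reshape2, but it only covers counts and sums over a table; for other aggregation functions the dcast approach is more flexible.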
