
summaryBy and lots of variables

I want to use summaryBy with three grouping variables (the right-hand side of my formula) and about 170 variables to be summarised (in my case, by computing the median). How can I specify them all in the same formula?

Instead of typing out

var1+var2+var3...

etc., I thought I could build a string like that. That was a whole project in itself, but at least I now have a stored string containing all the variable names with plus signs in between. I call it z1.
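Building such a string need not be a project in itself. A minimal sketch, assuming the data frame is called `df` (a hypothetical name with toy columns, since the real dataset is not shown) and that every column other than the three grouping variables should be summarised:

```r
# Toy stand-in for the real data; `df` and its columns are hypothetical.
df <- data.frame(year = c(2001, 2001), month = c(1, 2), ID = c("a", "b"),
                 var1 = c(1.5, 2.5), var2 = c(3, 4), var3 = c(5, 6))

group_vars <- c("year", "month", "ID")

# Every column that is not a grouping variable is a variable to summarise.
value_vars <- setdiff(names(df), group_vars)

# Collapse the names into one string with plus signs in between.
z1 <- paste(value_vars, collapse = " + ")
z1
```

With 170 columns the same three lines apply unchanged, because the names come from `names(df)` rather than being typed out.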

Now, simply asking for z1 or even paste(z1) in my summaryBy script does not work:

d <- summaryBy(paste(z1) ~ year + month + ID,
                data=.., 
                FUN=c(median,sum), 
                na.rm=TRUE)

Giving error:

Error in tapply(currVAR, rh.string.factor, function(x) { :
arguments must have same length

I imagine it has to do with the fact that in summaryBy I specify my data. But I am new to R and therefore am not able to comprehend the problem beyond this.
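A likely explanation, sketched with a toy value for z1: inside a formula, paste(z1) is kept as a call, and when it is evaluated it yields one character string rather than the columns that string names. A length-one value cannot be grouped against columns with one entry per row, which would be consistent with the "same length" complaint.

```r
# Toy value; the real z1 holds about 170 names.
z1 <- "var1 + var2 + var3"

f <- paste(z1) ~ year + month + ID

class(f)           # still a formula
f[[2]]             # the left side is the call paste(z1), not column names
all.vars(f)        # the only left-side variable R sees is z1 itself
length(paste(z1))  # evaluating the left side gives a single string
```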

I also tried a different method, as suggested:

d<-summaryBy(paste(z1,"~year+month+ID"),
                data=..,
                FUN=c(median,sum),
                na.rm=TRUE)

This instead gives the error

Error in .get_variables(formula, data, id, debug.info) : 'formula' must be a formula or a list
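This message can be read literally: paste() returns a character string, and a string that merely looks like a formula is not a formula object. A small sketch of the difference, again with a toy z1:

```r
z1 <- "var1 + var2 + var3"

s <- paste(z1, "~ year + month + ID")
class(s)     # character: this is what summaryBy was handed

f <- as.formula(s)
class(f)     # formula: the string has been parsed into a formula object
all.vars(f)  # the individual column names are now visible
```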

So I am not sure how to go from there.

From the help documentation:

Computations on several variables is done using cbind( )

summaryBy(cbind(Weight, Feed) ~ Evit + Cu, data=subset(dietox, Time > 1), FUN=fun)

And testing this, this time with z2 being a string of all my variables separated by commas.

d<-summaryBy(cbind(z2)~year+month+ID,
                data=..,
                FUN=c(median,sum),
                na.rm=TRUE)

or the variation

d<-summaryBy(cbind(paste(z2))~year+month+ID,
                data=..,
                FUN=c(median,sum),
                na.rm=TRUE)

Both give the same argument-length error as my original attempt above.
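The cbind( ) variant appears to fail for the same reason: cbind(z2) binds one string, not the columns named in it. If the cbind form is preferred, the whole formula can be assembled as text and then converted. A sketch with a toy z2 (no data is touched here, only the formula is built):

```r
# Toy value; the real z2 holds the variable names separated by commas.
z2 <- "var1, var2, var3"

dim(cbind(z2))  # a 1 x 1 character matrix, not three data columns

# Build the complete formula as a string, then convert it.
f <- as.formula(paste0("cbind(", z2, ") ~ year + month + ID"))
f[[2]]          # the left side is now the call cbind(var1, var2, var3)
```

The resulting `f` has the same shape as the cbind formula from the help page, so it should be usable wherever that one is.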

Another suggestion (thanks @akrun):

d<-summaryBy(as.formula(paste(z1,"~year+month+ID")),
                data=..,
                FUN=c(median,sum),
                na.rm=TRUE)

Reminder: z1 is variables with pluses in between.

In this case, R gives no error. It seems to be either still loading or waiting for additional commands. The console (screenshot not reproduced here) shows the command but no > prompt at the bottom. What does that mean?

Final edit and solution:

The as.formula approach worked! Thanks so much! I now understand that if console does not have an arrow at the bottom, like in my screenshot above, it means R is computing haha.

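The issue is that paste wraps only the variables of interest, so the left-hand side reaches summaryBy as a single character string rather than as column names. Build the whole formula as one string and convert it with as.formula: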

library(doBy)
summaryBy(as.formula(paste(z1, "~ year + month + ID")),
            data=.., 
            FUN=c(median,sum), 
            na.rm=TRUE)

where

z1 <- paste0('var', 1:3, collapse=" + ")

Using a reproducible example from ?summaryBy

data(dietox)
dietox12    <- subset(dietox,Time==12)
fun <- function(x){
   c(m=mean(x), v=var(x), n=length(x))
 }

out1 <-  summaryBy(cbind(Weight, Feed) ~ Evit + Cu, data=dietox12,
       FUN=fun)

out2 <-  summaryBy(Weight +  Feed ~ Evit + Cu, data=dietox12,
                      FUN=fun)

z2 <- paste(c("Weight", "Feed"), collapse=" + ")
out3 <- summaryBy(as.formula(paste(z2,  "~ Evit + Cu")), data=dietox12,
       FUN=fun)
identical(out1, out2)
#[1] TRUE
identical(out1, out3)
#[1] TRUE

So, thanks to @akrun, the following code now works:

d<-summaryBy(as.formula(paste(z1,"~year+month+ID")),
              data=..,
              FUN=c(median,sum),
              na.rm=TRUE)

The reason I thought it didn't work at first is that it took so long to compute! It is a massive dataset, after all. I edited my original post but left all my attempts in, including the question about the "missing arrow", which I now understand to mean that R is working. Hard. Thanks!
